SitemapReader module
Read and parse a sitemap and return its data.
- class SitemapReader.SitemapReader(sitemap_url, sitemap_data={})
Bases: object
Sitemap reader class.
- Parameters
sitemap_url (str) – URL to the primary sitemap
sitemap_data (dict, optional) – Data from a parsed sitemap, by default {}
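The signature above uses a dict literal as the default for sitemap_data. In Python, such a default is created once and reused on every call that omits the argument. Whether SitemapReader copies it internally is not stated here, so the following is only a sketch of the general language behavior, using a hypothetical stand-in class named Reader that stores the argument as-is.

```python
class Reader:
    """Hypothetical stand-in, not the real SitemapReader."""

    def __init__(self, sitemap_url, sitemap_data={}):
        # The default dict is built once, when the function is defined,
        # and this stand-in stores it without copying.
        self.sitemap_url = sitemap_url
        self.sitemap_data = sitemap_data


first = Reader("https://example.com/sitemap.xml")
second = Reader("https://example.org/sitemap.xml")

# Mutating one instance's data is visible through the other,
# because both hold the same default dict object.
first.sitemap_data["https://example.com/"] = "2024-01-01"
shared = first.sitemap_data is second.sitemap_data

# Passing a fresh dict explicitly avoids the sharing.
third = Reader("https://example.net/sitemap.xml", sitemap_data={})
isolated = third.sitemap_data is not first.sitemap_data
```

If the real class behaves like this stand-in, passing an explicit dict for each instance is the safer calling pattern.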
- sitemap_url
URL to the primary sitemap
- Type
str
- sitemap_data
Data from a parsed sitemap, by default {}
- Type
dict
- get_sitemap_data() → dict
Getter for the parsed sitemap data.
- Returns
Each crawled URL is a key, and its value is the “lastmod” data taken from the sitemap
- Return type
dict
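To make the shape of the returned dictionary concrete, here is a minimal sketch that builds a URL-to-lastmod mapping from sitemap XML with the standard library. It is not the module's implementation: the helper name urls_with_lastmod, the sample XML, and the choice of None for a missing lastmod element are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample sitemap; the second entry has no <lastmod>.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>
"""


def urls_with_lastmod(xml_text):
    """Map each <loc> to its <lastmod> text, or None when absent."""
    root = ET.fromstring(xml_text)
    data = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        if loc:
            # Assumption: a missing <lastmod> is recorded as None.
            data[loc] = url.findtext("sm:lastmod", default=None, namespaces=NS)
    return data


sitemap_data = urls_with_lastmod(SAMPLE)
```

A caller could then look up a single page with sitemap_data.get(url) to check when it was last modified.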
- get_sitemap_url() → str
Getter for the sitemap_url.
- Returns
URL to the sitemap
- Return type
str
- parse_sitemap() → bool
Process the data within the sitemap.
- Returns
True if sitemap was successfully parsed
- Return type
bool
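The boolean return value suggests a pattern where the caller checks for success before reading the parsed data. The sketch below illustrates that pattern with a hypothetical helper, parse_into. It assumes that malformed XML yields False, which this page does not state, and it works on an in-memory string rather than fetching sitemap_url.

```python
import xml.etree.ElementTree as ET

# Element-name prefix for the sitemaps.org namespace (Clark notation).
SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_into(xml_text, sitemap_data):
    """Fill sitemap_data from xml_text; return True on success.

    Assumption: malformed XML is reported as False rather than raised.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    for url in root.iter(SM + "url"):
        loc = url.findtext(SM + "loc")
        if loc:
            sitemap_data[loc.strip()] = url.findtext(SM + "lastmod")
    return True


good = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>"
    "</urlset>"
)
bad = "<urlset><url>"  # truncated document

data = {}
ok = parse_into(good, data)
failed = parse_into(bad, {})
```

With the documented class, the equivalent check would be to call parse_sitemap() and read get_sitemap_data() only when it returns True.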
- print_stats() → None
Print the number of URLs found in the sitemaps.