SitemapReader module
Read and parse a sitemap and return its data.
-
class SitemapReader.SitemapReader(sitemap_url, sitemap_data={})
Bases: object
Sitemap reader class.
- Parameters
sitemap_url (str) – URL to the primary sitemap
sitemap_data (dict, optional) – Data from a parsed sitemap, by default {}
-
sitemap_url
URL to the primary sitemap
- Type
str
-
sitemap_data
Data from a parsed sitemap, by default {}
- Type
dict
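The signature shows a mutable default, sitemap_data={}. In Python a default value is created once, when the function is defined, so a class that stores the default dict without copying it would share that dict between instances. Whether SitemapReader copies the argument is not stated here. The sketch below uses a hypothetical stand-in class, SharedDefaultReader (not part of this module), only to illustrate the general behaviour and why passing a fresh dict explicitly is a safe habit.

```python
class SharedDefaultReader:
    """Hypothetical stand-in; NOT the real SitemapReader implementation."""

    def __init__(self, sitemap_url, sitemap_data={}):
        self.sitemap_url = sitemap_url
        # Assumption: the argument is stored as-is, without copying.
        self.sitemap_data = sitemap_data


first = SharedDefaultReader("https://example.com/sitemap.xml")
second = SharedDefaultReader("https://example.org/sitemap.xml")
first.sitemap_data["https://example.com/"] = "2024-01-15"

# Both instances see the entry because they share the one default dict.
shared = second.sitemap_data is first.sitemap_data

# Passing a fresh dict explicitly keeps instances independent.
third = SharedDefaultReader("https://example.net/sitemap.xml", sitemap_data={})
independent = third.sitemap_data is not first.sitemap_data
```

If the real class behaves like the stand-in, constructing each reader with its own dict avoids one reader's results leaking into another.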
-
get_sitemap_data() → dict
Getter for the parsed sitemap data.
- Returns
A dictionary in which each key is a crawled URL and each value is
the “lastmod” value taken from the sitemap for that URL
- Return type
dict
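To make the shape of the returned dictionary concrete, here is a minimal sketch that builds a URL-to-“lastmod” mapping from an inline sitemap using only the standard library. It is not the module's implementation: extract_lastmod and SAMPLE_SITEMAP are hypothetical names, and storing None for an entry without a lastmod element is an assumption, since the behaviour for missing values is not documented here.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol (version 0.9).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Small inline sitemap so the example needs no network access.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>
"""


def extract_lastmod(xml_text: str) -> dict:
    """Map each <loc> URL to its <lastmod> text (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    data = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        if loc:
            # Assumption: a missing <lastmod> is stored as None.
            data[loc] = lastmod.strip() if lastmod else None
    return data


result = extract_lastmod(SAMPLE_SITEMAP)
```

A dictionary of this shape can be iterated with result.items() to get (URL, lastmod) pairs.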
-
get_sitemap_url() → str
Getter for the sitemap_url.
- Returns
URL to the sitemap
- Return type
str
-
parse_sitemap() → bool
Process the data within the sitemap.
- Returns
True if the sitemap was successfully parsed
- Return type
bool
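A boolean return value lets the caller decide what to do when parsing does not succeed. The sketch below shows one way such a success flag could work, under the assumption that malformed XML counts as a failure. parse_into is a hypothetical function, not the module's code, and what the real method returns or raises on failure is not documented here.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol (version 0.9).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_into(xml_text: str, sitemap_data: dict) -> bool:
    """Fill sitemap_data from xml_text; report success as a bool."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Assumption: malformed XML is reported as False, not raised.
        return False
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        if loc:
            sitemap_data[loc.strip()] = url.findtext("sm:lastmod", namespaces=NS)
    return True


good_xml = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>"
    "</urlset>"
)
bad_xml = "<urlset><url>"  # unclosed tags, not well-formed

good_data, bad_data = {}, {}
good_ok = parse_into(good_xml, good_data)
bad_ok = parse_into(bad_xml, bad_data)
```

Callers would typically check the flag before reading the data, for example by skipping get_sitemap_data() when parse_sitemap() returns a falsy value.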
-
print_stats() → None
Print the number of URLs found in the sitemaps.
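As a rough illustration, the count could be derived from the parsed data dictionary, since it holds one entry per URL. The helper names and the message format below are assumptions; the actual output of print_stats() is not specified here.

```python
def format_stats(sitemap_data: dict) -> str:
    """Build a one-line summary (hypothetical format)."""
    # One dictionary key per crawled URL, so len() gives the URL count.
    return f"URLs found: {len(sitemap_data)}"


def print_stats_sketch(sitemap_data: dict) -> None:
    """Print the summary; returns None like the documented method."""
    print(format_stats(sitemap_data))


sample = {
    "https://example.com/": "2024-01-15",
    "https://example.com/about": None,
}
summary = format_stats(sample)
print_stats_sketch(sample)
```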