SitemapReader module

Read and parse a sitemap and return its data.

class SitemapReader.SitemapReader(sitemap_url, sitemap_data={})

Bases: object

Sitemap reader class.

Parameters
  • sitemap_url (str) – URL to the primary sitemap

  • sitemap_data (dict, optional) – Data from a parsed sitemap, by default {}

sitemap_url

URL to the primary sitemap

Type

str

sitemap_data

Data from a parsed sitemap, by default {}

Type

dict
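The constructor signature above shows a mutable default (`sitemap_data={}`). This reference does not say whether the class copies that default or stores it directly, so the following is only a sketch of the general Python behaviour. It uses a hypothetical stand-in class, `SharedDefault`, that stores the argument as-is. It is not the real SitemapReader, and the URLs are placeholders.

```python
class SharedDefault:
    """Hypothetical stand-in, NOT the real SitemapReader implementation."""

    def __init__(self, sitemap_url, sitemap_data={}):
        # Assumption for illustration: the argument is stored without copying.
        self.sitemap_url = sitemap_url
        self.sitemap_data = sitemap_data


first = SharedDefault("https://example.com/sitemap.xml")
second = SharedDefault("https://example.org/sitemap.xml")

# Both instances received the same default dict object, so a write through
# one instance is visible through the other.
first.sitemap_data["https://example.com/"] = "2024-01-15"
shared = second.sitemap_data

# Passing a fresh dict explicitly gives the instance its own storage.
third = SharedDefault("https://example.net/sitemap.xml", sitemap_data={})
```

If the real class behaves like this stand-in, passing an explicit dict for `sitemap_data` when creating several readers in one process would keep their data separate.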

get_sitemap_data() → dict

Getter for the parsed sitemap data.

Returns

A dictionary in which each key is a crawled URL and each value is the “lastmod” data taken from the sitemap for that URL

Return type

dict
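As an illustration of working with a dictionary of this shape, the sketch below filters URLs by their “lastmod” value. The sample data, the helper name `modified_since`, and the assumption that each value is a string beginning with a full `YYYY-MM-DD` date are all hypothetical. The actual format of the values depends on the sitemap being read.

```python
from datetime import date

# Hypothetical data in the documented shape: URL -> "lastmod" string.
sitemap_data = {
    "https://example.com/": "2024-01-15",
    "https://example.com/about": "2023-11-02",
    "https://example.com/blog": "2024-03-09",
}


def modified_since(data: dict, cutoff: date) -> list:
    """Return the URLs whose lastmod date is on or after cutoff, sorted."""
    recent = []
    for url, lastmod in data.items():
        if not lastmod:
            # Skip entries with no lastmod information.
            continue
        # Assumes the value starts with YYYY-MM-DD; any time part is ignored.
        if date.fromisoformat(lastmod[:10]) >= cutoff:
            recent.append(url)
    return sorted(recent)


recent_urls = modified_since(sitemap_data, date(2024, 1, 1))
```

With a real reader, the same helper could be applied to the value returned by `get_sitemap_data()`, provided the “lastmod” values match the assumed format.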

get_sitemap_url() → str

Getter for the sitemap_url.

Returns

URL to the sitemap

Return type

str

parse_sitemap() → bool

Process the data within the sitemap.

Returns

True if the sitemap was successfully parsed

Return type

bool
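The internals of `parse_sitemap()` are not described here. As a rough sketch of what processing sitemap data can look like, the example below extracts `loc` and `lastmod` values from an inline XML string using the standard library. The function name `parse_sitemap_xml` and the sample document are assumptions for illustration, not the module's implementation. It also does not fetch anything over the network or follow sitemap index files.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample document; the second entry has no <lastmod>.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def parse_sitemap_xml(xml_text: str) -> dict:
    """Map each <loc> URL to its <lastmod> text, or None when absent."""
    root = ET.fromstring(xml_text)
    data = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default=None, namespaces=SITEMAP_NS)
        if loc is None:
            # An entry without a location cannot be used as a key.
            continue
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        data[loc.strip()] = lastmod.strip() if lastmod else None
    return data


parsed = parse_sitemap_xml(SAMPLE)
```

The result has the same URL-to-“lastmod” shape that `get_sitemap_data()` is documented to return, which is why a dictionary is used here.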

print_stats() → None

Print the number of URLs found in the sitemaps.