SiteCrawler module

Crawl a site and optionally take snapshots of its pages.

class SiteCrawler.SiteCrawler(urls=None, output_dir=None, template_dir=None, page_template=None, mode=None)

Bases: object

Crawl the site and save snapshots if instructed to.

Parameters
  • urls (list) – list of URLs to crawl

  • output_dir (str) – directory in which snapshots are written (default: “output”)

  • template_dir (str) – directory containing the Jinja2 templates (default: “templates”)

  • page_template (str) – name of the Jinja2 page template used for snapshots

  • mode (str) – crawler mode: quick or snapshot (default: “quick”)
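
The constructor arguments and getters can be illustrated with a small stand-in class. This is only a sketch, not the module's actual implementation: SiteCrawlerSketch is a hypothetical name, and it assumes that an argument left as None falls back to the default listed above. No default is documented for page_template, so the sketch leaves it as None, and it returns an empty list when no URLs are given.

```python
class SiteCrawlerSketch:
    """Hypothetical stand-in that mirrors the documented SiteCrawler interface."""

    # Defaults copied from the parameter list in this reference.
    DEFAULT_OUTPUT_DIR = "output"
    DEFAULT_TEMPLATE_DIR = "templates"
    DEFAULT_MODE = "quick"

    def __init__(self, urls=None, output_dir=None, template_dir=None,
                 page_template=None, mode=None):
        # Assumption: None means "use the documented default".
        self._urls = list(urls) if urls is not None else []
        self._output_dir = (
            output_dir if output_dir is not None else self.DEFAULT_OUTPUT_DIR
        )
        self._template_dir = (
            template_dir if template_dir is not None else self.DEFAULT_TEMPLATE_DIR
        )
        # The default page template is not documented, so None is kept as-is.
        self._page_template = page_template
        self._mode = mode if mode is not None else self.DEFAULT_MODE

    def get_urls(self):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._urls)

    def get_output_dir(self):
        return self._output_dir

    def get_template_dir(self):
        return self._template_dir

    def get_page_template(self):
        return self._page_template

    def get_mode(self):
        return self._mode


# Only urls and mode are supplied; the directories fall back to the defaults.
crawler = SiteCrawlerSketch(urls=["https://example.com/"], mode="snapshot")
print(crawler.get_mode(), crawler.get_output_dir(), crawler.get_template_dir())
```

With the real class, the equivalent call would presumably be SiteCrawler(urls=[...], mode="snapshot") followed by crawl_site(); how the real constructor treats None arguments should be checked against the source.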

crawl_site()

Crawl the site and, if snapshots are enabled, save a snapshot of each page.

get_mode() → str

Get the configured crawler mode.

Returns

the crawler mode configured for use (quick or snapshot)

Return type

str

get_output_dir() → str

Get the name of the output directory.

Returns

name of the output directory

Return type

str

get_page_template() → str

Get the name of the Jinja2 page template for snapshots.

Returns

name of the Jinja2 “page” template

Return type

str

get_template_dir() → str

Get the name of the directory containing the Jinja2 templates.

Returns

name of the directory containing the Jinja2 templates for snapshots

Return type

str

get_urls() → list

Get the URLs configured for crawling.

Returns

list of URLs contained within the urls attribute

Return type

list