SiteCrawler module
Crawl a site and optionally perform snapshots of the pages.
-
class SiteCrawler.SiteCrawler(urls=None, output_dir=None, template_dir=None, page_template=None, mode=None)
Bases: object
Crawl the given URLs and, in "snapshot" mode, save snapshots of the pages.
- Parameters
urls (list) – list of URLs to crawl
output_dir (str) – directory in which to save snapshots (default: “output”)
template_dir (str) – directory containing the Jinja2 templates (default: “templates”)
page_template (str) – name of the Jinja2 “page” template used for snapshots
mode (str) – crawler mode: “quick” or “snapshot” (default: “quick”)
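The documented defaults can be illustrated with a minimal stand-in class. This is a sketch, not the module's source: the class name SiteCrawlerSketch, the VALID_MODES tuple and the ValueError raised for an unknown mode are all hypothetical, and because no default is documented for page_template, the sketch leaves it as None.

```python
from typing import List, Optional

# Hypothetical: the two modes named in the parameter description above.
VALID_MODES = ("quick", "snapshot")


class SiteCrawlerSketch:
    """Hypothetical stand-in mirroring the documented constructor and getters."""

    def __init__(
        self,
        urls: Optional[List[str]] = None,
        output_dir: Optional[str] = None,
        template_dir: Optional[str] = None,
        page_template: Optional[str] = None,
        mode: Optional[str] = None,
    ) -> None:
        # Copy the list so later changes by the caller do not affect the crawler.
        self.urls = list(urls) if urls is not None else []
        # Fall back to the documented defaults when an argument is omitted.
        self.output_dir = output_dir if output_dir is not None else "output"
        self.template_dir = template_dir if template_dir is not None else "templates"
        # No default is documented for page_template, so it stays None here.
        self.page_template = page_template
        self.mode = mode if mode is not None else "quick"
        # Assumption: reject anything other than the two documented modes.
        if self.mode not in VALID_MODES:
            raise ValueError(f"unknown mode {self.mode!r}; expected one of {VALID_MODES}")

    def get_mode(self) -> str:
        return self.mode

    def get_output_dir(self) -> str:
        return self.output_dir

    def get_page_template(self) -> Optional[str]:
        return self.page_template

    def get_template_dir(self) -> str:
        return self.template_dir

    def get_urls(self) -> list:
        return self.urls
```

For example, SiteCrawlerSketch(urls=["https://example.com/"], mode="snapshot") keeps “output” and “templates” as its directories while switching the mode; translating None to the documented defaults inside the constructor keeps the signature identical to the one shown above.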
-
crawl_site()
Crawl the site and, if the mode is “snapshot”, save snapshots of the pages.
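The crawl loop might look roughly like the sketch below. It is a free function, not the documented crawl_site() method, and its parameters are assumptions: fetch is any hypothetical callable returning a page's HTML (a real one might wrap urllib.request.urlopen), and the page-0001.html naming scheme is invented for illustration. Snapshots here are the raw fetched HTML; the real module presumably renders them through its Jinja2 page template.

```python
from pathlib import Path
from typing import Callable, Dict, List


def crawl_site(
    urls: List[str],
    fetch: Callable[[str], str],
    mode: str = "quick",
    output_dir: str = "output",
) -> Dict[str, int]:
    """Hypothetical sketch: visit each URL and, in snapshot mode, save its HTML.

    Returns a mapping of URL to the length of the fetched HTML.
    """
    sizes: Dict[str, int] = {}
    out = Path(output_dir)
    if mode == "snapshot":
        # Create the output directory only when snapshots will be written.
        out.mkdir(parents=True, exist_ok=True)
    for index, url in enumerate(urls, start=1):
        html = fetch(url)
        # Quick mode only records what was fetched.
        sizes[url] = len(html)
        if mode == "snapshot":
            # Assumed naming scheme: page-0001.html, page-0002.html, ...
            (out / f"page-{index:04d}.html").write_text(html, encoding="utf-8")
    return sizes
```

Passing fetch in as a parameter keeps the sketch testable without network access: a dictionary's __getitem__ can stand in for an HTTP client.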
-
get_mode() → str
Get the configured crawl mode.
- Returns
the crawl mode in use (“quick” or “snapshot”)
- Return type
str
-
get_output_dir() → str
Get the name of the output directory.
- Returns
name of the output directory
- Return type
str
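One way a snapshot's file path could be derived from a page URL and the output directory is sketched below. The function name snapshot_path and the layout (one subdirectory per host, index.html for directory-style URLs, .html appended otherwise) are hypothetical and are not documented for this module; query strings are ignored and ".." path segments are not sanitised.

```python
from pathlib import Path
from urllib.parse import urlparse


def snapshot_path(output_dir: str, url: str) -> Path:
    """Hypothetical scheme mapping a page URL to a file under output_dir."""
    parsed = urlparse(url)
    path = parsed.path.strip("/")
    if not path or parsed.path.endswith("/"):
        # The site root and directory-style URLs become index.html.
        path = f"{path}/index.html" if path else "index.html"
    elif not path.endswith(".html"):
        path += ".html"
    # Use the host as a subdirectory so pages from different sites do not collide.
    return Path(output_dir) / parsed.netloc / path
```

For instance, snapshot_path("output", "https://example.com/about") gives output/example.com/about.html.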
-
get_page_template() → str
Get the name of the Jinja2 page template for snapshots.
- Returns
name of the Jinja2 “page” template
- Return type
str
-
get_template_dir() → str
Get the name of the directory containing the Jinja2 templates.
- Returns
name of the directory containing the Jinja2 templates for snapshots
- Return type
str
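The template directory and the page template name together locate the file used for snapshots. With Jinja2, the pair would typically be passed as jinja2.Environment(loader=jinja2.FileSystemLoader(template_dir)).get_template(page_template); the minimal sketch below only checks that the file exists, using pathlib, so it runs without Jinja2 installed. The function name resolve_page_template is hypothetical.

```python
from pathlib import Path


def resolve_page_template(template_dir: str, page_template: str) -> Path:
    """Hypothetical helper: return the page template's path or fail clearly."""
    candidate = Path(template_dir) / page_template
    if not candidate.is_file():
        raise FileNotFoundError(
            f"page template {page_template!r} not found in {template_dir!r}"
        )
    return candidate
```

Checking for the template before a crawl starts means a misconfigured template_dir fails immediately rather than after pages have already been fetched.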
-
get_urls() → list
Get the URLs to be crawled.
- Returns
list of URLs contained within the urls attribute
- Return type
list