A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker
Method as.list() for class robotstxt_text
fix_url
Storage for HTTP request and response objects
Download a robots.txt file
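A minimal download sketch, assuming this topic corresponds to get_robotstxt() and that the host below (chosen only for illustration) is reachable:

  library(robotstxt)
  # fetch the robots.txt of a host as text (class "robotstxt_text")
  rtxt <- get_robotstxt(domain = "wikipedia.org")
  rtxt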
Download multiple robots.txt files
Guess the domain from a path
http_domain_changed
http_subdomain_changed
http_was_redirected
is_suspect_robotstxt
Check whether a file is a valid / parsable robots.txt file
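A validity-check sketch, assuming the topic documents is_valid_robotstxt() and that it accepts the file content as a single character string:

  library(robotstxt)
  txt <- "User-agent: *\nDisallow: /private/\n"
  is_valid_robotstxt(txt)   # expected TRUE for a parsable file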
Merge a number of named lists in sequential order
Make an automatically named list
null_to_defeault
Parse a robots.txt file
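A parsing sketch, assuming the topic documents parse_robotstxt(); the element names inspected below are assumptions about the structure of the returned list:

  library(robotstxt)
  txt <- "User-agent: *\nDisallow: /private/\nSitemap: https://example.com/sitemap.xml\n"
  parsed <- parse_robotstxt(txt)
  str(parsed)           # inspect all parsed components
  parsed$permissions    # allow/disallow rules (assumed element name)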
parse_url
paths_allowed_worker (spiderbar flavor)
Check whether a bot has permission to access page(s)
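A minimal permission-check sketch, assuming the topic documents paths_allowed(); the domain and paths are purely illustrative:

  library(robotstxt)
  paths_allowed(
    paths  = c("/", "/search"),
    domain = "wikipedia.org",
    bot    = "*"
  )
  # returns one TRUE/FALSE per path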
Re-export the magrittr pipe operator
Print a robotstxt_text object
Print a robotstxt object
Remove the domain from a path
request_handler_handler
Generate a representation of a robots.txt file
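An object-oriented sketch, assuming the topic documents the robotstxt() constructor and that the returned object exposes a permissions field and a check() method (both assumptions); the domain is illustrative:

  library(robotstxt)
  rt <- robotstxt(domain = "wikipedia.org")
  rt$permissions                              # parsed allow/disallow rules
  rt$check(paths = c("/", "/w/"), bot = "*")  # per-path permission check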
get_robotstxt() cache
Extract comments from a robots.txt file
Extract robots.txt fields
Extract permissions from a robots.txt file
Load robots.txt files saved along with the package
Extract HTTP user agents from a robots.txt file
List robots.txt files saved along with the package
rt_request_handler
Make paths uniform
Provides functions to download and parse 'robots.txt' files. Ultimately, the package makes it easy to check whether bots (spiders, crawlers, scrapers, ...) are allowed to access specific resources on a domain.
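As a quick end-to-end illustration (hedged: the URL is arbitrary, and inferring the domain from a full URL is assumed to be supported):

  library(robotstxt)
  # check a full URL directly; the domain part is guessed from the URL
  paths_allowed("https://en.wikipedia.org/wiki/Main_Page")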
Useful links