Parse and Test Robots Exclusion Protocol Files and Rules
Test URL paths against a robxp robots.txt object
Retrieve all agent crawl delay values in a robxp robots.txt object
Custom printer for `robxp` objects
Parse a robots.txt file and create a robxp object
Retrieve a character vector of sitemaps from a parsed robots.txt objec...
The 'Robots Exclusion Protocol' <https://www.robotstxt.org/orig.html> documents a set of standards for allowing or excluding robot/spider crawling of different areas of site content. Tools are provided which wrap the 'rep-cpp' <https://github.com/seomoz/rep-cpp> C++ library for processing these 'robots.txt' files.
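To make the operations listed above concrete (parsing a robots.txt file, testing paths, reading crawl delays, listing sitemaps), here is a minimal protocol-level sketch. It does not use this package or 'rep-cpp'. It substitutes Python's standard-library `urllib.robotparser`, and the robots.txt content, the `slowbot` agent name, and the `example.com` URLs are invented purely for illustration. Matching rules may differ in detail between parsers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5

User-agent: slowbot
Crawl-delay: 10
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

# Parse the file from in-memory lines (no network access needed).
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Test URL paths against the parsed rules for a given user agent.
print(parser.can_fetch("*", "/public/index.html"))
print(parser.can_fetch("*", "/private/data.html"))
print(parser.can_fetch("slowbot", "/public/index.html"))

# Retrieve the crawl delay that applies to each agent.
print(parser.crawl_delay("*"))
print(parser.crawl_delay("slowbot"))

# Retrieve the declared sitemaps (site_maps() requires Python 3.8+).
print(parser.site_maps())
```

The same four steps correspond to the index entries above: build a parsed object once, then query it repeatedly for path permissions, crawl delays, and sitemaps.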