boilerpipeR 1.3.2 package

Interface to the Boilerpipe Java Library

Generic extraction of the main text content from HTML files, removing ads, sidebars, and headers by means of the boilerpipe <https://github.com/kohlschutter/boilerpipe> Java library. Boilerpipe's extraction heuristics perform robustly across a wide range of website templates.

  • Maintainer: Mario Annau
  • License: Apache License (== 2.0)
  • Last published: 2021-05-19