cleanNLP: A Tidy Data Model for Natural Language Processing
Run the annotation pipeline on a set of documents
Download model files needed for spacy
Interface for initializing the spacy backend
Interface for initializing the standard R backend
Interface for initializing the udpipe backend
Compute Principal Components and store as a Data Frame
Construct the TF-IDF Matrix from Annotation or Data Frame
Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may use the 'udpipe' back end, which has no external dependencies, or a Python back end built on 'spaCy' <https://spacy.io>. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, and dependency parsing.
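
To make the idea of "normalized tables" concrete, the following is a minimal base R sketch, not the package's own implementation. It builds a one-row-per-token table from two made-up documents and derives a simple TF-IDF matrix from it. The column names (`doc_id`, `tid`, `token`), the whitespace tokenizer, and the `tf * log(N / df)` weighting are all assumptions chosen for illustration; the package's back ends and its TF-IDF function may use different columns, tokenization, and weighting.

```r
# Two toy documents; the texts are invented for illustration only.
docs <- data.frame(
  doc_id = c("d1", "d2"),
  text = c("the cat sat", "the dog sat down"),
  stringsAsFactors = FALSE
)

# Naive whitespace tokenization (an assumption; a real back end such as
# udpipe or spaCy also tags, lemmatizes, and parses).
pieces <- strsplit(docs$text, " ", fixed = TRUE)

# Normalized token table: one row per token, keyed by document id and
# by the token's position within its document.
token <- data.frame(
  doc_id = rep(docs$doc_id, lengths(pieces)),
  tid = sequence(lengths(pieces)),
  token = unlist(pieces),
  stringsAsFactors = FALSE
)

# Document-by-term count matrix built from the token table.
tf <- unclass(table(token$doc_id, token$token))

# Inverse document frequency: log(number of documents / document frequency).
idf <- log(nrow(tf) / colSums(tf > 0))

# Scale each term column by its idf weight.
tfidf <- sweep(tf, 2, idf, `*`)
```

Because the token table is keyed by `doc_id`, it can be joined back to document-level metadata with ordinary data frame tools, which is the main appeal of keeping annotations in a tidy layout. Terms that occur in every document (here "the" and "sat") receive a weight of zero under this particular weighting.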