Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
Bind the term frequency and inverse document frequency of a tidy text ...
Create a sparse matrix from row names, column names, and values in a t...
Tidiers for a corpus object from the quanteda package
Tidy dictionary objects from the quanteda package
Casting a data frame to a DocumentTermMatrix, TermDocumentMatrix, or d...
Get a tidy data frame of a single sentiment lexicon
Get a tidy data frame of a single stopword lexicon
Tidiers for LDA and CTM objects from the topicmodels package
Tidiers for Latent Dirichlet Allocation models from the mallet package
Objects exported from other packages
Reorder an x or y axis within facets
Tidiers for Structural Topic Models from the stm package
Tidy DocumentTermMatrix, TermDocumentMatrix, and related objects from ...
Tidy a Corpus object from the tm package
Utility function to tidy a simple triplet matrix
tidytext: Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools
Wrapper around unnest_tokens for characters and character shingles
Wrapper around unnest_tokens for n-grams
Wrapper around unnest_tokens for Penn Treebank Tokenizer
Wrapper around unnest_tokens for regular expressions
Wrapper around unnest_tokens for sentences, lines, and paragraphs
Split a column into tokens
Wrapper around unnest_tokens for tweets
Using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. Much of the infrastructure needed for text mining with tidy data frames already exists in packages like 'dplyr', 'broom', 'tidyr', and 'ggplot2'. In this package, we provide functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.
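As a rough illustration of the round trip described above, the sketch below tokenizes a small corpus into a tidy data frame, adds tf-idf weights, and casts the result to a sparse matrix. The two-document corpus and the variable names (docs, tidy_words, word_counts, m) are invented for this example. It assumes the 'dplyr' and 'tidytext' packages are installed, and it is one possible workflow, not the only one.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-document corpus, invented for illustration
docs <- tibble(
  doc  = c("a", "b"),
  text = c("Tidy tools make text mining easier.",
           "Text mining with tidy data frames.")
)

# Text -> tidy format: one row per document-token pair.
# By default unnest_tokens() lowercases tokens and strips punctuation.
tidy_words <- docs %>%
  unnest_tokens(word, text)

# Count each word within each document, then bind tf, idf, and tf_idf columns
word_counts <- tidy_words %>%
  count(doc, word) %>%
  bind_tf_idf(word, doc, n)

# Tidy format -> sparse matrix (documents as rows, words as columns)
m <- cast_sparse(word_counts, doc, word, n)
```

Keeping the intermediate result as an ordinary data frame means the usual 'dplyr' verbs apply at every step, and the cast happens only at the point where a matrix-based tool needs it.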