Modern Text Mining Framework for R
Converts a sparse document-term matrix to 'lda_c' format
BNS
Checks accuracy of word embeddings on the analogy task
Coherence metrics for topic models
Collocations model
Combines multiple vocabularies into one
Document-term matrix construction
Term-co-occurrence matrix construction
Creates a vocabulary of unique terms
Pairwise Distance Matrix Computation
Re-export of rsparse::GloVe
Creates iterator over text files from the disk
Iterators (and parallel iterators) over input objects
(numerically robust) Dimension reduction via Jensen-Shannon Divergence...
Creates Latent Dirichlet Allocation model
Latent Semantic Analysis model
Matrix normalization
Perplexity of a topic model
Prepares list of analogy questions
Printing Vocabulary
Prune vocabulary
Objects exported from other packages
Creates Relaxed Word Movers Distance (RWMD) model
Pairwise Similarity Matrix Computation
Split a vector for parallel processing
text2vec
TfIdf
Simple tokenization functions for string splitting
Vocabulary and hash vectorizers
Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), and similarity computation. The package provides a source-agnostic streaming API, so collections of documents larger than available RAM can be analyzed. All core functions are parallelized to benefit from multicore machines.
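
As a rough illustration of how the topics listed above fit together, the following minimal sketch builds a vocabulary, a document-term matrix, and a TF-IDF weighted matrix from an iterator over tokens. The three-sentence corpus and the variable names (docs, it, vocab, vectorizer, dtm, dtm_tfidf) are assumptions made for this example only; a real workload would typically iterate over files or a larger in-memory collection.

```r
library(text2vec)

# Toy corpus (an assumption for illustration, not data from the package).
docs <- c(
  "text mining with R is fast",
  "topic modeling and word embeddings",
  "fast text vectorization for large collections"
)

# Iterator over the documents: lowercases each one and splits it into words.
it <- itoken(docs,
             preprocessor = tolower,
             tokenizer = word_tokenizer,
             progressbar = FALSE)

# Collect the unique terms, then create a vectorizer that maps
# each term to a column of the document-term matrix.
vocab <- create_vocabulary(it)
vectorizer <- vocab_vectorizer(vocab)

# Sparse document-term matrix: one row per document, one column per term.
dtm <- create_dtm(it, vectorizer)

# Reweight the raw counts with TF-IDF.
tfidf <- TfIdf$new()
dtm_tfidf <- fit_transform(dtm, tfidf)

print(dim(dtm))
```

The same iterator is passed to both create_vocabulary() and create_dtm(), so the documents are tokenized chunk by chunk in each pass instead of being stored as one large token list.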