R Tools for Text Matrices, Embeddings, and Networks
Calculate Concept Mover's Distance
Performs Concept Class Analysis (CoCA)
Find a specified document centrality metric
Find similarities between documents
A fast unigram DTM builder
Melt a DTM into a triplet data frame
Resamples an input DTM to generate new DTMs
Gets DTM summary statistics
Removes terms from a DTM based on rules
Find the 'projection matrix' to a semantic vector
Find the 'rejection matrix' from a semantic vector
Find a specified matrix transformation
Gets anchor terms from precompiled anchor lists
Word embedding semantic centroid extractor
Word embedding semantic direction extractor
Word embedding semantic region extractor
Gets stoplist from precompiled lists
Import Matrix
Monte Carlo Permutation Tests for Model P-Values
Plot CoCA
Prints CoCA class information
Build a Random Corpus
Build Multiple Random Corpora
Represent Documents as Token-Integer Sequences
Evaluate anchor sets in defining semantic directions
Text2Map
A very tiny "gender" tagger
A fast unigram vocabulary builder
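Several of the helpers listed above (the unigram DTM builder and the triplet "melt") rest on a simple data structure. The following is a minimal, language-neutral sketch in Python, not the package's R API. It assumes lowercase whitespace tokenization and stores each document row as a dict of column index to count, so zero cells are implicit as in a sparse matrix. The names `build_dtm` and `melt_dtm` are hypothetical.

```python
from collections import Counter


def build_dtm(docs):
    """Build a unigram DTM as (vocab, rows).

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace. Each row maps a column index to a count; absent
    columns are implicit zeros, mimicking a sparse matrix.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Sorted vocabulary gives a stable term-to-column mapping.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    rows = [{index[t]: n for t, n in Counter(toks).items()} for toks in tokenized]
    return vocab, rows


def melt_dtm(vocab, rows):
    """Melt a sparse DTM into (doc_id, term, count) triplets, skipping zeros."""
    return [
        (i, vocab[j], n)
        for i, row in enumerate(rows)
        for j, n in sorted(row.items())
    ]


docs = ["the cat sat", "the dog sat on the mat"]
vocab, rows = build_dtm(docs)
triplets = melt_dtm(vocab, rows)
print(vocab)     # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(triplets)  # first entry: (0, 'cat', 1); second doc has (1, 'the', 2)
```

Storing only non-zero cells is what makes the triplet form cheap: melting just walks the stored entries rather than every cell of the documents-by-vocabulary grid.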
This is a collection of functions optimized for working with various kinds of text matrices. Focusing on the text matrix as the primary object, represented either as a base R dense matrix or a 'Matrix' package sparse matrix, allows for a consistent and intuitive interface that stays close to the underlying mathematical foundation of computational text analysis. In particular, the package includes functions for working with word embeddings, text networks, and document-term matrices. It implements methods developed in Stoltz and Taylor (2019) <doi:10.1007/s42001-019-00048-6>, Taylor and Stoltz (2020) <doi:10.1007/s42001-020-00075-8>, Taylor and Stoltz (2020) <doi:10.15195/v7.a23>, and Stoltz and Taylor (2021) <doi:10.1016/j.poetic.2021.101567>.
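The word-embedding entries (semantic direction extractor, projection and rejection matrices) can be illustrated with a few lines of linear algebra. Below is a minimal Python sketch, not the package's implementation. It assumes a semantic direction is the mean of difference vectors between (positive, negative) anchor pairs, which is one common construction. It also assumes that "projection" means each row's component along that direction and "rejection" means the remainder. The helper names and the toy 3-dimensional vectors are hypothetical.

```python
import math


def direction(pairs, emb):
    """Mean of (positive - negative) difference vectors over anchor pairs.

    Assumption: mean-of-differences is used here as one common way to
    define a semantic direction; other constructions exist.
    """
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for pos, neg in pairs:
        for k in range(dim):
            total[k] += emb[pos][k] - emb[neg][k]
    return [v / len(pairs) for v in total]


def _unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(rows, v):
    """Project each row onto the line spanned by v: (x . u) u, with u = v/|v|."""
    u = _unit(v)
    out = []
    for x in rows:
        s = sum(a * b for a, b in zip(x, u))
        out.append([s * b for b in u])
    return out


def reject(rows, v):
    """Remove the v component from each row (row minus its projection)."""
    return [[a - p for a, p in zip(x, px)] for x, px in zip(rows, project(rows, v))]


# Toy embeddings (hypothetical values, chosen for easy arithmetic).
emb = {
    "man": [1.0, 0.0, 1.0],
    "woman": [-1.0, 0.0, 1.0],
    "king": [1.0, 1.0, 0.0],
    "queen": [-1.0, 1.0, 0.0],
}
d = direction([("man", "woman"), ("king", "queen")], emb)
proj = project([emb["king"]], d)
rej = reject([emb["king"]], d)
print(d, proj, rej)  # [2.0, 0.0, 0.0] [[1.0, 0.0, 0.0]] [[0.0, 1.0, 0.0]]
```

By construction, each rejected row is orthogonal to the direction, which is why rejection is useful for "removing" a semantic dimension from embeddings before further analysis.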