text2vec R package [Documentation]

as.lda_c

Converts a sparse document-term matrix to 'lda_c' format

BNS

Bi-Normal Separation (BNS) model

check_analogy_accuracy

Checks accuracy of word embeddings on the analogy task

coherence

Coherence metrics for topic models

Collocations

Collocations model

combine_vocabularies

Combines multiple vocabularies into one

create_dtm

Document-term matrix construction

create_tcm

Term-co-occurrence matrix construction

create_vocabulary

Creates a vocabulary of unique terms

distances

Pairwise Distance Matrix Computation

GloVe

Re-export of rsparse::GloVe

ifiles

Creates an iterator over text files on disk

itoken

Iterators (and parallel iterators) over input objects

jsPCA_robust

(numerically robust) Dimension reduction via Jensen-Shannon Divergence...

LatentDirichletAllocation

Creates a Latent Dirichlet Allocation model

LatentSemanticAnalysis

Latent Semantic Analysis model

normalize

Matrix normalization

perplexity

Perplexity of a topic model

prepare_analogy_questions

Prepares a list of analogy questions

print.text2vec_vocabulary

Printing Vocabulary

prune_vocabulary

Prune vocabulary

reexports

Objects exported from other packages

RelaxedWordMoversDistance

Creates a Relaxed Word Mover's Distance (RWMD) model

similarities

Pairwise Similarity Matrix Computation

split_into

Split a vector for parallel processing

text2vec

TfIdf

TF-IDF (term frequency-inverse document frequency) model

tokenizers

Simple tokenization functions for string splitting

vectorizers

Vocabulary and hash vectorizers

Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), and similarity computation. The package provides a source-agnostic streaming API that lets researchers analyze document collections larger than available RAM. All core functions are parallelized to take advantage of multicore machines.
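As a rough illustration of how the functions listed above fit together, the sketch below builds a document-term matrix from a tiny in-memory corpus. The corpus `docs` is made-up data for illustration only, and the choice of `space_tokenizer` and `n_chunks = 1` is an assumption that suits a toy example. For text stored on disk, the `ifiles` iterator listed above would presumably take the place of the character vector.

```r
library(text2vec)

# Toy corpus (hypothetical data, for illustration only)
docs <- c("The quick brown fox", "the lazy dog", "the quick dog jumps")

# Token iterator: lowercase each document, then split on spaces
it <- itoken(docs,
             preprocessor = tolower,
             tokenizer = space_tokenizer,
             n_chunks = 1,
             progressbar = FALSE)

# First pass over the iterator: collect the unique terms
vocab <- create_vocabulary(it)

# Map terms to column indices using the vocabulary
vectorizer <- vocab_vectorizer(vocab)

# Second pass: build the sparse document-term matrix
dtm <- create_dtm(it, vectorizer)
dim(dtm)
```

The iterator is consumed once to build the vocabulary and again to build the matrix, so the input is only ever processed chunk by chunk rather than held as one tokenized object.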

Maintainer: Dmitriy Selivanov
License: GPL (>= 2) | file LICENSE
Last published: 2023-11-09
Version: 0.6.4