Natural Language Processing - CRAN Task View

boilerpipeR

Interface to the Boilerpipe Java Library

Version 1.3.2

BTM

Biterm Topic Models for Short Text

Version 0.3.7

corpora

Statistics and Data Sets for Corpus Frequency Data

Version 0.6

crfsuite

Conditional Random Fields for Labelling Sequential Data in Natural Lan...

Version 0.4.2

keyperm

Keyword Analysis Using Permutation Tests

Version 0.1.1

gsubfn

Utilities for Strings and Function Arguments

Version 0.7

jiebaR

Chinese Text Segmentation

Version 0.11

koRpus

Text Analysis with Emphasis on POS Tagging, Readability, and Lexical D...

Version 0.13-8

languageR

Analyzing Linguistic Data: A Practical Introduction to Statistics

Version 1.5.0

lda

Collapsed Gibbs Sampling Methods for Topic Models

Version 1.5.2

lsa

Latent Semantic Analysis

Version 0.73.3

movMF

Mixtures of von Mises-Fisher Distributions

Version 0.2-8

mscstexta4r

R Client for the Microsoft Cognitive Services Text Analytics REST API

Version 0.1.2

openNLP

Apache OpenNLP Tools Interface

Version 0.2-7

tesseract

Open Source OCR Engine

Version 5.2.1

tm.plugin.alceste

Import Texts from Files in the 'Alceste' Format Using the 'tm' Text Mi...

Version 1.1.1

ruimtehol

Learn Text 'Embeddings' with 'Starspace'

Version 0.3.2

stringdist

Approximate String Matching, Fuzzy Text Search, and String Distance Fu...

Version 0.9.12

topicmodels

Topic Models

Version 0.2-17

stm

Estimation of the Structural Topic Model

Version 1.3.7

tau

Text Analysis Utilities

Version 0.0-25

text2vec

Modern Text Mining Framework for R

Version 0.6.4

word2vec

Distributed Representations of Words

Version 0.4.0

skmeans

Spherical k-Means Clustering

Version 0.2-17

mscsweblm4r

R Client for the Microsoft Cognitive Services Web Language Model REST ...

Version 0.1.2

SnowballC

Snowball Stemmers Based on the C 'libstemmer' UTF-8 Library

Version 0.7.1

RWeka

R/Weka Interface

Version 0.4-46

textcat

N-Gram Based Text Categorization

Version 1.0-8

udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Pa...

Version 0.8.11

tokenizers

Fast, Consistent Tokenization of Natural Language Text

Version 0.3.0

textplot

Text Plots

Version 0.2.2

topicdoc

Topic-Specific Diagnostics for LDA and CTM Topic Models

Version 0.1.1

sentometrics

An Integrated Framework for Textual Sentiment Time Series Aggregation ...

Version 1.0.0

tm.plugin.dc

Text Mining Distributed Corpus Plug-in

Version 0.2-10

textrank

Summarize Text by Ranking Sentences and Finding Keywords

Version 0.3.1

zipfR

Statistical Models for Word Frequency Distributions

Version 0.6-70

textreuse

Detect Text Reuse and Document Similarity

Version 0.1.5

tm.plugin.factiva

Import Articles from 'Factiva' Using the 'tm' Text Mining Framework

Version 1.8

tm.plugin.lexisnexis

Import Articles from 'LexisNexis' Using the 'tm' Text Mining Framework

Version 1.4.1

wordcloud

Word Clouds

Version 2.6

RcmdrPlugin.temis

Graphical Integrated Text Mining Solution

Version 0.7.10

textir

Inverse Regression for Text Analysis

Version 2.0-5

tm.plugin.europresse

Import Articles from 'Europresse' Using the 'tm' Text Mining Framework

Version 1.4

RKEA

R/KEA Interface

Version 0.0-6

tidytext

Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

Version 0.4.2

ore

An R Interface to the Onigmo Regular Expression Library

Version 1.7.4.1

tokenizers.bpe

Byte Pair Encoding Text Tokenization

Version 0.1.3

qdap

Bridging the Gap Between Qualitative Data and Quantitative Analysis

Version 2.4.6

sentencepiece

Text Tokenization using Byte Pair Encoding and Unigram Modelling

Version 0.2.3

sentiment.ai

Simple Sentiment Analysis Using Deep Learning

Version 0.1.1

phonics

Phonetic Spelling Algorithms

Version 1.3.10

wordnet

WordNet Interface

Version 0.1-17

stringi

Fast and Portable Character String Processing Facilities

Version 1.8.4

svs

Tools for Semantic Vector Spaces

Version 3.1.1

kernlab

Kernel-Based Machine Learning Lab

Version 0.9-33

tm

Text Mining Package

Version 0.7-14

tm.plugin.mail

Text Mining E-Mail Plug-in

Version 0.3-1

hunspell

High-Performance Stemmer, Tokenizer, and Spell Checker

Version 3.0.4

corporaexplorer

A 'Shiny' App for Exploration of Text Collections

Version 0.9.0

quanteda

Quantitative Analysis of Textual Data

Version 4.1.0