quanteda R package [Documentation]

apply_if

Modify only documents matching a logical condition

as.character.corpus

Coercion and checking methods for corpus objects

as.data.frame.dfm

Convert a dfm to a data.frame

as.dfm

Coercion and checking functions for dfm objects

as.dictionary

Coercion and checking functions for dictionary objects

as.fcm

Coercion and checking functions for fcm objects

as.matrix.dfm

Coerce a dfm to a matrix or data.frame

as.tokens

Coercion, checking, and combining functions for tokens objects

as.yaml

Convert quanteda dictionary objects to the YAML format

attributes-set

Function extending base::attributes()

bootstrap_dfm

Bootstrap a dfm

cbind.dfm

Combine dfm objects by rows or columns

char_select

Select or remove elements from a character vector

char_tolower

Convert the case of character objects

check_class

Check object class for functions

check_dots

Check arguments passed to other functions via ...

check_integer

Validate input vectors

concat

Return the concatenator character from an object

convert-wrappers

Convenience wrappers for dfm convert

convert

Convert quanteda objects to non-quanteda formats

corpus_group

Combine documents in corpus by a grouping variable

corpus_reshape

Recast the document units of a corpus

corpus_sample

Randomly sample documents from a corpus

corpus_segment

Segment texts on a pattern match

corpus_subset

Extract a subset of a corpus

corpus_trim

Remove sentences based on their token lengths or a pattern match

corpus-class

Base method extensions for corpus objects

corpus

Construct a corpus object

data-internal

Internal data sets

data-relocated

Formerly included data objects

dfm_compress

Recombine a dfm or fcm by combining identical dimension elements

dfm_group

Combine documents in a dfm by a grouping variable

dfm_lookup

Apply a dictionary to a dfm

dfm_match

Match the feature set of a dfm to given feature names

dfm_replace

Replace features in dfm

dfm_sample

Randomly sample documents from a dfm

dfm_select

Select features from a dfm or fcm

dfm_sort

Sort a dfm by frequency of one or more margins

dfm_subset

Extract a subset of a dfm

dfm_tfidf

Weight a dfm by tf-idf

dfm_tolower

Convert the case of the features of a dfm and combine

dfm_trim

Trim a dfm using frequency threshold-based feature selection

dfm_weight

Weight the feature frequencies in a dfm

dfm-class

Virtual class "dfm" for a document-feature matrix

dfm-internal

Internal functions for dfm objects

dfm

Create a document-feature matrix

dfm2lsa

Convert a dfm to an lsa "textmatrix"

dictionary-class

dictionary class objects and functions

dictionary

Create a dictionary

docfreq

Compute the (weighted) document frequency of a feature

docnames

Get or set document names

docvars

Get or set document-level variables

escape_regex

Internal function for select_types() to escape regular expressions

expand

Simpler and faster version of expand.grid() in base package

fcm_sort

Sort an fcm in alphabetical order of the features

fcm-class

Virtual class "fcm" for a feature co-occurrence matrix

fcm

Create a feature co-occurrence matrix

featfreq

Compute the frequencies of features

featnames

Get the feature labels from a dfm

field_system

Shortcut functions to access or assign metadata

flatten_dictionary

Flatten a hierarchical dictionary into a list of character vectors

flatten_list

Internal function to flatten a nested list

format_sparsity

Format a sparsity value for printing

get_docvars

Internal function to extract docvars

get_object_version

Get the package version that created an object

groups

Grouping variable(s) for various functions

head.dfm

Return the first or last part of a dfm

index

Locate a pattern in a tokens object

info_tbb

Get information on TBB library

is_glob

Check if patterns contain glob wildcards

is_indexed

Check if a glob pattern is indexed by index_types

is_regex

Check if a string is a regular expression

is.collocations

Check if an object is collocations

kwic

Locate keywords-in-context

list2dictionary

Internal function to convert a list to a dictionary

lowercase_dictionary_values

Internal function to lowercase dictionary values

make_docvars

Internal function to make new system-level docvars

make_meta

Internal functions to create a list of the meta fields

matrix2dfm

Converts a Matrix to a dfm

matrix2fcm

Converts a Matrix to an fcm

merge_dictionary_values

Internal function to merge values of duplicated keys

message_dfm

Print messages in dfm methods

message_error

Return an error message

message_tokens

Print messages in tokens methods

messages

Message parameter documentation

meta_system

Internal function to get, set or initialize system metadata

msg

Conditionally format messages

names-quanteda

Special handling for names of quanteda objects

ndoc

Count the number of documents or features

nest_dictionary

Utility function to generate a nested list

nsentence

Count the number of sentences

ntoken

Count the number of tokens or types

object-builders

Object builders

object2id

Match quanteda objects against token types

pattern

Pattern for feature, token and keyword matching

pattern2id

Match patterns against token types

phrase

Declare a pattern to be a sequence of separate patterns

pipe

Pipe operator

print-methods

Print methods for quanteda core objects

print.phrases

Print a phrase object

quanteda_options

Get or set package options for quanteda

quanteda-package

An R package for the quantitative analysis of textual data

read_dict_functions

Internal functions to import dictionary files

reexports

Objects exported from other packages

remove_empty_keys

Utility function to remove empty keys

replace_dictionary_values

Internal function to replace dictionary values

resample

Sample a vector

reshape_docvars

Internal function to subset or duplicate docvar rows

search_glob

Select types without performing slow regex search

search_index

Internal function for select_types to search the index using fastmat...

serialize_tokens

Function to serialize list-of-character tokens

set_dfm_dimnames

Internal functions to set dimnames

spacyr-methods

Extensions for and from spacy_parse objects

sparsity

Compute the sparsity of a document-feature matrix

split_values

Internal function for special handling of multi-word dictionary values

summary_metadata

Functions to add or retrieve corpus summary metadata

summary.corpus

Summarize a corpus

textmodels

Models for scaling and classification of textual data

textplots

Plots for textual data

texts

Get or assign corpus texts [deprecated]

textstats

Statistics for textual data

tokenize_custom

Customizable tokenizer

tokenize_internal

quanteda tokenizers

tokens_chunk

Segment tokens object by chunks of a given size

tokens_compound

Convert token sequences into compound tokens

tokens_group

Combine documents in a tokens object by a grouping variable

tokens_lookup

Apply a dictionary to a tokens object

tokens_ngrams

Create n-grams and skip-grams from tokens

tokens_recompile

Recompile a serialized tokens object

tokens_replace

Replace tokens in a tokens object

tokens_restore

Restore special tokens

tokens_sample

Randomly sample documents from a tokens object

tokens_segment

Segment tokens object by patterns

tokens_select

Select or remove tokens from a tokens object

tokens_split

Split tokens by a separator pattern

tokens_subset

Extract a subset of a tokens object

tokens_tolower

Convert the case of tokens

tokens_trim

Trim tokens using frequency threshold-based feature selection

tokens_wordstem

Stem the terms in an object

tokens_xptr

Methods for tokens_xptr objects

tokens-class

Base method extensions for tokens objects

tokens

Construct a tokens object

topfeatures

Identify the most frequent features in a dfm

types

Get word types from a tokens object

unlist_character

Unlist a list of character vectors safely

unlist_integer

Unlist a list of integer vectors safely

valuetype

Pattern matching using valuetype


A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
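
As a rough illustration of the workflow described above, the following minimal sketch goes from raw text to a corpus, tokens, and a document-feature matrix, then inspects the result. It uses only functions listed in this index; the two example texts and the document names `doc1` and `doc2` are made up for illustration and do not come from the package.

```r
library(quanteda)

# Two toy documents; the texts and the names "doc1"/"doc2" are
# illustrative assumptions, not data shipped with the package.
txt <- c(doc1 = "Text analysis in R is fast and flexible.",
         doc2 = "Sparse matrices make text analysis scalable.")

corp  <- corpus(txt)                        # build a corpus object
toks  <- tokens(corp, remove_punct = TRUE)  # tokenize, dropping punctuation
toks  <- tokens_tolower(toks)               # normalize case
dfmat <- dfm(toks)                          # sparse document-feature matrix

ndoc(dfmat)                                   # number of documents
topfeatures(dfmat, n = 3)                     # most frequent features
kwic(toks, pattern = "analysis", window = 2)  # keywords in context
```

Keeping the tokens object separate from the dfm is a deliberate choice here: functions such as `kwic()` need token order, which a document-feature matrix no longer carries.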

Maintainer: Kenneth Benoit
License: GPL-3
Last published: 2025-01-08

Useful links

quanteda 4.2.0 package
