Managing, Querying and Analyzing Tokenized Text
Choose and add multitoken strings based on multitoken categories
Helper function for aggregate_rsyntax
Aggregate the tokens data
Aggregate rsyntax annotations
Force an object to be a tCorpus class
Extract the backbone of a network
View hits in a browser
Create and view a full text browser
Vectorized computation of the chi^2 statistic for a 2x2 crosstab
Compare tCorpus vocabulary to that of another (reference) tCorpus
Calculate the similarity of documents
Compare vocabulary of a subset of a tCorpus to the rest of the tCorpus
Count results of search hits, or of a given feature in tokens
Create a tCorpus
Support function for subset method
Compare two document term matrices
Plot a word cloud from a dtm
Create an ego network
Export span annotations
Get common nearby features given a query or query hits
Feature statistics
Fold rsyntax annotations
Support function for subset method
Create a document term matrix
Compute global feature positions
Get keyword-in-context (KWIC) strings
Get a character vector of stopwords
Laplace (i.e. add constant) smoothing
Convert a quanteda dictionary to a long data.table format
Merge tCorpus objects
Visualize a semnet network
Plot a wordcloud with words ordered and coloured according to a dimension
S3 plot for contextHits class
Visualize feature associations
S3 plot for featureHits class
Visualize a vocabularyComparison
Preprocess tokens in a character vector
S3 print for contextHits class
S3 print for featureHits class
S3 print for tCorpus class
Refresh a tCorpus object using the current version of corpustools
Check if a package with a given version exists
Search for documents or sentences using Boolean queries
Dictionary lookup
Find tokens using a Lucene-like search query
Create a semantic network based on the co-occurrence of tokens in token windows
Create a semantic network based on the co-occurrence of tokens in documents
Set some default network attributes for pretty plotting
Simple Good Turing smoothing
Show the names of udpipe models
Subset tCorpus token data using a query
S3 subset for tCorpus class
S3 summary for contextHits class
S3 summary for featureHits class
Summary of a tCorpus object
Visualize a dependency tree
Corpus comparison
Creating a tCorpus
Methods and functions for viewing, modifying and subsetting tCorpus data
Document similarity
Preprocessing, subsetting and analyzing features
Modify tCorpus by reference
Use Boolean queries to analyze the tCorpus
Feature co-occurrence based semantic network analysis
Topic modeling
Annotate tokens based on rsyntax queries
Dictionary lookup
Code features in a tCorpus based on a search string
Get a context vector
Deduplicate documents
Delete column from the data and meta data
Cast the "feats" column in udpipe tokens to columns
Filter features
Fold rsyntax annotations
Access the data from a tCorpus
Estimate a LDA topic model
Merge the token and meta data.tables of a tCorpus with another data.frame
Preprocess feature
Replace tokens with dictionary match
Recode features in a tCorpus based on a search string
Change levels of factor columns
Change column names of data and meta data
Modify the token and meta data.tables of a tCorpus
Subset tCorpus token data using a query
Subset a tCorpus
Add columns indicating who did what
Add columns indicating who said what
tCorpus: a corpus class for tokenized texts
Create a tcorpus based on tokens (i.e. preprocessed texts)
Gives the window in which a term occurred in a matrix
Show top features
Apply rsyntax transformations
Get a list of tqueries for extracting who did what
Get a list of tqueries for extracting quotes
Simplify tokenIndex created with the udpipe parser
Get a list of tqueries for finding candidates for span quotes
Create a tCorpus using udpipe
Reconstruct original texts
Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained and each token can be annotated (e.g., with part-of-speech tags or dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, export to the document term matrix (DTM) format for compatibility with many text analysis packages, and the ability to reconstruct the original text from tokens to facilitate interpretation.
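As a minimal sketch of the workflow described above, the following assumes the corpustools package is installed and uses its built-in sotu_texts demo data; the query string 'terror*' is only an illustrative example.

```r
library(corpustools)

## Create a tCorpus from raw text; tokenization is handled internally
## and token positions are preserved
tc <- create_tcorpus(sotu_texts, doc_column = 'id')

## Lucene-like query search for specific tokens
hits <- search_features(tc, query = 'terror*')
summary(hits)

## View matches in context as keyword-in-context (KWIC) strings
head(get_kwic(tc, query = 'terror*'))

## Export to a document term matrix for use with other
## text analysis packages
dtm <- get_dtm(tc, feature = 'token')
```

The same tCorpus object can then be passed to the querying, comparison and semantic-network functions listed above.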