tosca 0.3-4 package

Tools for Statistical Content Analysis

as.corpus.textmeta

Transform textmeta to corpus

as.meta

"meta" Component of "textmeta"-Objects

as.textmeta.corpus

Transform corpus to textmeta

cleanTexts

Data Preprocessing

clusterTopics

Cluster Analysis

deleteAndRenameDuplicates

Deletes and Renames Articles with the same ID

duplist

Creating List of Duplicates

filterCount

Subcorpus With Count Filter

filterDate

Subcorpus With Date Filter

filterID

Subcorpus With ID Filter

filterWord

Subcorpus With Word Filter

intruderTopics

Function to validate the fit of the LDA model

intruderWords

Function to validate the fit of the LDA model

LDAgen

Function to fit LDA model

LDAprep

Create LDA-ready Dataset

makeWordlist

Counts Words in Text Corpora

mergeLDA

Preparation of Different LDAs For Clustering

mergeTextmeta

Merge Textmeta Objects

plotArea

Plotting Topics over Time as Stacked Areas below Plotted Lines

plotFreq

Plotting Counts of specified Wordgroups over Time (relative to Corpus)

plotHeat

Plotting Topics over Time relative to Corpus

plotScot

Plots Counts of Documents or Words over Time (relative to Corpus)

plotTopic

Plotting Counts of Topics over Time (Relative to Corpus)

plotTopicWord

Plotting Counts of Topics-Words-Combination over Time (Relative to Wor...

plotWordpt

Plots Counts of Topics-Words-Combination over Time (Relative to Topics...

plotWordSub

Plotting Counts/Proportion of Words/Docs in LDA-generated Topic-Subcor...

precisionRecall

Precision and Recall

readTextmeta

Read Corpora as CSV

readWhatsApp

Read WhatsApp files

readWiki

Read Pages from Wikipedia

readWikinews

Read files from Wikinews

removeXML

Removes XML/HTML Tags and Umlauts

sampling

Sample Texts

showMeta

Export Readable Meta-Data of Articles.

showTexts

Exports Readable Text Lists

textmeta

"textmeta"-Objects

tidy.textmeta

Transform textmeta to an object with tidy text data

topicCoherence

Calculating Topic Coherence

topicsInText

Coloring the words of a text corresponding to topic allocation

topTexts

Get The IDs Of The Most Representative Texts

topWords

Top Words per Topic

A framework for statistical analysis in content analysis. It provides a pipeline for preprocessing text corpora and linking them to the latent Dirichlet allocation from the 'lda' package, along with plots for the descriptive analysis of text corpora and topic models. An implementation of Chang's intruder words and intruder topics is also provided. Sample data for the vignette is included in the toscaData package, which is available on GitHub: <https://github.com/Docma-TU/toscaData>.

  • Maintainer: Lars Koppers
  • License: GPL (>= 2)
  • Last published: 2025-04-22