Tools for Statistical Content Analysis
Transform textmeta to corpus
"meta" Component of "textmeta"-Objects
Transform corpus to textmeta
Data Preprocessing
Cluster Analysis
Deletes and Renames Articles with the same ID
Creating List of Duplicates
Subcorpus With Count Filter
Subcorpus With Date Filter
Subcorpus With ID Filter
Subcorpus With Word Filter
Function to validate the fit of the LDA model
Function to validate the fit of the LDA model
Function to fit LDA model
Create LDA-ready Dataset
Counts Words in Text Corpora
Preparation of Different LDAs For Clustering
Merge Textmeta Objects
Plotting Topics over Time as Stacked Areas below Plotted Lines
Plotting Counts of specified Wordgroups over Time (relative to Corpus)
Plotting Topics over Time relative to Corpus
Plots Counts of Documents or Words over Time (relative to Corpus)
Plotting Counts of Topics over Time (Relative to Corpus)
Plotting Counts of Topics-Words-Combination over Time (Relative to Wor...
Plots Counts of Topics-Words-Combination over Time (Relative to Topics...
Plotting Counts/Proportion of Words/Docs in LDA-generated Topic-Subcor...
Precision and Recall
Read Corpora as CSV
Read WhatsApp files
Read Pages from Wikipedia
Read files from Wikinews
Removes XML/HTML Tags and Umlauts
Sample Texts
Export Readable Meta-Data of Articles
Exports Readable Text Lists
"textmeta"-Objects
Transform textmeta to an object with tidy text data
Calculating Topic Coherence
Coloring the words of a text corresponding to topic allocation
Get The IDs Of The Most Representative Texts
Top Words per Topic
A framework for statistical analysis in content analysis. It provides a pipeline for preprocessing text corpora and linking them to the latent Dirichlet allocation from the 'lda' package, together with plots for the descriptive analysis of text corpora and topic models. An implementation of Chang's intruder words and intruder topics is also provided. Sample data for the vignette is included in the 'toscaData' package, which is available on GitHub: <https://github.com/Docma-TU/toscaData>.
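
The preprocessing pipeline mentioned above can be illustrated with a minimal, language-neutral sketch. It is written in Python for illustration only and is not the package's R API. The function names (`clean_text`, `build_vocab`, `encode`), the toy stopword list, and the ASCII-only cleaning rule are all assumptions made for this example. It shows the general idea of turning raw texts into an indexed, count-based representation of the kind topic-model samplers typically consume.

```python
import re
from collections import Counter

# Hypothetical toy stopword list, for illustration only.
STOPWORDS = {"the", "a", "of", "and", "is", "in"}

def clean_text(text):
    """Lowercase, replace anything that is not a-z or whitespace
    (punctuation, digits, and in this ASCII-only simplification also
    accented letters) with a space, split, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]

def build_vocab(tokenized_docs):
    """Return the sorted list of all distinct tokens in the corpus."""
    return sorted({tok for doc in tokenized_docs for tok in doc})

def encode(tokenized_docs, vocab):
    """Encode each document as a sorted list of (word_index, count) pairs,
    where word_index is the 0-based position of the word in vocab."""
    index = {word: i for i, word in enumerate(vocab)}
    return [sorted(Counter(index[tok] for tok in doc).items())
            for doc in tokenized_docs]

docs = ["The cat sat in the hat.", "A hat is a hat, and 2 cats sat."]
tokens = [clean_text(d) for d in docs]
vocab = build_vocab(tokens)
encoded = encode(tokens, vocab)

print(vocab)
print(encoded)
```

Each encoded document keeps only vocabulary indices and their frequencies, so word order is discarded. This matches the bag-of-words assumption that latent Dirichlet allocation makes.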