koRpus R package [Documentation]

ARI

Readability: Automated Readability Index (ARI)

available.koRpus.lang

List available language packages

bormuth

Readability: Bormuth's Mean Cloze and Grade Placement

C.ld

Lexical diversity: Herdan's C

clozeDelete-methods

Transform text into cloze test format

coleman.liau

Readability: Coleman-Liau Index

coleman

Readability: Coleman's Formulas

correct-methods

Methods to correct koRpus objects

cTest-methods

Transform text into C-Test-like format

CTTR

Lexical diversity: Carroll's corrected TTR (CTTR)

dale.chall

Readability: Dale-Chall Readability Formula

danielson.bryan

Readability: Danielson-Bryan

dickes.steiwer

Readability: Dickes-Steiwer Handformel

docTermMatrix

Generate a document-term matrix

DRP

Readability: Degrees of Reading Power (DRP)

ELF

Readability: Fang's Easy Listening Formula (ELF)

farr.jenkins.paterson

Readability: Farr-Jenkins-Paterson Index

filterByClass-methods

Remove word classes

flesch.kincaid

Readability: Flesch-Kincaid Grade Level

flesch

Readability: Flesch Readability Ease

FOG

Readability: Gunning FOG Index

FORCAST

Readability: FORCAST Index

freq.analysis-methods

Analyze word frequencies

fucks

Readability: Fucks' Stilcharakteristik

get.kRp.env

Get koRpus session settings

guess.lang

Guess language a text is written in

gutierrez

Readability: Gutiérrez Fórmula de comprensibilidad

harris.jacobson

Readability: Harris-Jacobson indices

HDD

Lexical diversity: HD-D (vocd-d)

hyphen-methods

Automatic hyphenation

install.koRpus.lang

Install language support packages

jumbleWords-methods

Produce jumbled words

K.ld

Lexical diversity: Yule's K

koRpus-deprecated

Deprecated object classes

koRpus-package

Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity

kRp.cluster

Work in (early) progress. Probably don't even look at it. Consider it ...

kRp.corp.freq-class

S4 Class kRp.corp.freq

kRp.lang-class

S4 Class kRp.lang

kRp.POS.tags

Get elaborated word tag definitions

kRp.readability-class

S4 Class kRp.readability

kRp.text_get-methods

Getter/setter methods for koRpus objects

kRp.text-class

S4 Class kRp.text

kRp.TTR-class

S4 Class kRp.TTR

lex.div-methods

Analyze lexical diversity

lex.div.num

Calculate lexical diversity

linsear.write

Readability: Linsear Write Index

LIX

Readability: Björnsson's Läsbarhetsindex (LIX)

maas

Lexical diversity: Maas' indices

MATTR

Lexical diversity: Moving-Average Type-Token Ratio (MATTR)

MSTTR

Lexical diversity: Mean Segmental Type-Token Ratio (MSTTR)

MTLD

Lexical diversity: Measure of Textual Lexical Diversity (MTLD)

nWS

Readability: Neue Wiener Sachtextformeln

pasteText-methods

Paste koRpus objects

plot-methods

Plot method for objects of class kRp.text

query-methods

A method to get information out of koRpus objects

R.ld

Lexical diversity: Guiraud's R

read.BAWL

Import BAWL-R data

read.corp.celex

Import Celex data

read.corp.custom-methods

Import custom corpus data

read.corp.LCC

Import LCC data

readability-methods

Measure readability

readability.num

Calculate readability

readTagged-methods

Import already tagged texts

RIX

Readability: Anderson's Readability Index (RIX)

S.ld

Lexical diversity: Summer's S

segment.optimizer

A function to optimize MSTTR segment sizes

set.kRp.env

A function to set information on your koRpus environment

set.lang.support

Add support for new languages

show-methods

Show methods for koRpus objects

SMOG

Readability: Simple Measure of Gobbledygook (SMOG)

spache

Readability: Spache Formula

split_by_doc_id

Turn a multi-document kRp.text object into a list of kRp.text objects

strain

Readability: Strain Index

summary-methods

Summary methods for koRpus objects

textFeatures

Extract text features for authorship analysis

textTransform-methods

Letter case transformation

tokenize-methods

A simple tokenizer

traenkle.bailer

Readability: Traenkle-Bailer Formeln

treetag-methods

A method to call TreeTagger

TRI

Readability: Kuntzsch's Text-Redundanz-Index

TTR

Lexical diversity: Type-Token Ratio

tuldava

Readability: Tuldava's Text Difficulty Formula

types.tokens-methods

Get types and tokens of a given text

U.ld

Lexical diversity: Uber Index (U)

wheeler.smith

Readability: Wheeler-Smith Score
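Several of the readability entries above, for example flesch, ARI and LIX, reduce to closed-form formulas over simple text counts. The following Python sketch is not koRpus code and does not use its API. It assumes the counts of words, sentences, syllables, letters and long words are already known, whereas koRpus derives them from tokenized or tagged text. The coefficients are the commonly published English-language ones, and koRpus's own defaults and language-specific variants may differ. All function names are hypothetical.

```python
"""Hypothetical sketch of three readability formulas computed from raw counts.

Not koRpus code: koRpus obtains these counts by tokenizing/tagging and
hyphenating a text, whereas here they are passed in directly.
"""


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease with the usual English coefficients; higher is easier."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def automated_readability_index(letters: int, words: int, sentences: int) -> float:
    """ARI: approximate US school grade from letters per word and words per sentence."""
    return 4.71 * (letters / words) + 0.5 * (words / sentences) - 21.43


def lix(words: int, sentences: int, long_words: int) -> float:
    """LIX: mean sentence length plus the percentage of long words (more than 6 letters)."""
    return words / sentences + 100.0 * long_words / words


if __name__ == "__main__":
    # Made-up counts for illustration only: 100 words in 5 sentences.
    print(flesch_reading_ease(words=100, sentences=5, syllables=150))
    print(automated_readability_index(letters=450, words=100, sentences=5))
    print(lix(words=100, sentences=5, long_words=20))
```

Flesch Reading Ease rises as texts get easier. ARI and LIX rise as texts get harder, so their scales should not be compared directly.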


A set of tools to analyze texts. It includes, amongst others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type-token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided, to enable frequency analyses (supporting the Celex and Leipzig Corpora Collection file formats) and measures like tf-idf. Note: for full functionality, a local installation of TreeTagger is recommended. It is also recommended not to load this package directly, but to load one of the available language support packages from the 'l10n' repository <https://undocumeantit.github.io/repos/l10n/>. 'koRpus' also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package 'rkward' cannot be installed directly from a repository, as it is a part of RKWard. To make full use of this feature, please install RKWard from <https://rkward.kde.org> (plugins are detected automatically). Due to some restrictions on CRAN, the full package sources are only available from the project homepage. To ask for help, report bugs, request features, or discuss the development of the package, please subscribe to the koRpus-dev mailing list (<https://korpusml.reaktanz.de>).
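The simpler lexical diversity measures mentioned above can likewise be sketched from token and type counts. The Python below is a hypothetical, self-contained illustration and not the koRpus implementation. It computes TTR, Herdan's C, Guiraud's R and Carroll's CTTR, plus a moving-average TTR (MATTR) over a sliding window. Treating tokens case-insensitively and the window size of 100 tokens are assumptions of this sketch, not statements about koRpus defaults.

```python
import math
from collections import Counter


def lexical_diversity(tokens: list[str]) -> dict[str, float]:
    """Classic type/token indices (hypothetical helper, not part of koRpus).

    Tokens are compared case-insensitively, which is an assumption of this sketch.
    """
    n = len(tokens)                               # N: number of tokens
    if n < 2:
        raise ValueError("need at least two tokens")
    v = len({t.lower() for t in tokens})          # V: number of distinct types
    return {
        "TTR": v / n,                             # type-token ratio
        "C": math.log(v) / math.log(n),           # Herdan's C
        "R": v / math.sqrt(n),                    # Guiraud's R
        "CTTR": v / math.sqrt(2 * n),             # Carroll's corrected TTR
    }


def mattr(tokens: list[str], window: int = 100) -> float:
    """Moving-average TTR: mean TTR over every run of `window` consecutive tokens."""
    words = [t.lower() for t in tokens]
    if window < 1 or len(words) < window:
        raise ValueError("text must contain at least `window` tokens")
    counts = Counter(words[:window])              # type counts in the first window
    ttrs = [len(counts) / window]
    for i in range(window, len(words)):
        counts[words[i]] += 1                     # token entering the window
        leaving = words[i - window]               # token leaving the window
        counts[leaving] -= 1
        if counts[leaving] == 0:
            del counts[leaving]                   # type no longer present
        ttrs.append(len(counts) / window)
    return sum(ttrs) / len(ttrs)


if __name__ == "__main__":
    sample = "the cat sat on the mat".split()
    print(lexical_diversity(sample))
    print(mattr(sample, window=3))
```

Plain TTR tends to fall as a text gets longer, because repeated words accumulate. Averaging over fixed-size windows, as MATTR does, is one way to reduce that dependence on text length.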

Maintainer: Meik Michalke
License: GPL (>= 3)
Last published: 2026-02-03

koRpus 0.13-9 package