desc.stat: Logical, whether an updated descriptive statistical analysis should be conducted.
corp.rm.class: A character vector of word classes to be ignored in the frequency analysis. The default value "nonpunct" has a special meaning: it is replaced by the result of kRp.POS.tags(lang, tags=c("punct","sentc"), list.classes=TRUE), so that the punctuation and sentence-ending word classes of the given language are ignored.
corp.rm.tag: A character vector of POS tags to be ignored in the frequency analysis.
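To check which word classes the default corp.rm.class="nonpunct" expands to, you can call kRp.POS.tags() yourself with the arguments named in the description above. This is a small sketch that assumes the English language package koRpus.lang.en is installed. The variable name rm.classes is chosen for this illustration only.

```r
# only runs if the English language package can be loaded
if(require("koRpus.lang.en", quietly=TRUE)){
  # word classes that corp.rm.class="nonpunct" is replaced with for English:
  # the classes of all punctuation and sentence-ending tags
  rm.classes <- kRp.POS.tags("en", tags=c("punct", "sentc"), list.classes=TRUE)
  print(rm.classes)
}
```

Passing this vector explicitly as corp.rm.class should behave like the default. If you supply your own vector instead, the punctuation classes are no longer removed unless you include them.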
Returns
An updated object of class kRp.text with the added feature freq, which is a list with information on the word frequencies of the analyzed text. Use corpusFreq to get that slot.
Details
freq.analysis() adds new columns with frequency information to the tokens data frame of the input object. These columns describe how often each token occurs in the corpus frequency object supplied via corp.freq.
To get the results, you can use taggedText to get the tokens slot, describe to get the raw descriptive statistics (only updated if desc.stat=TRUE), and corpusFreq to get the data from the added freq feature.
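The sketch below shows the three accessors side by side. It reuses the sample text and the self-referencing corpus from the package example, and it assumes koRpus.lang.en is installed. The object names freq.obj, tokens, stats and freqs are illustrative only.

```r
if(require("koRpus.lang.en", quietly=TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(txt=sample_file, lang="en")
  # desc.stat=TRUE also refreshes the descriptive statistics
  freq.obj <- freq.analysis(
    tokenized.obj,
    corp.freq=read.corp.custom(tokenized.obj),
    desc.stat=TRUE
  )
  tokens <- taggedText(freq.obj)  # tokens data frame incl. new frequency columns
  stats  <- describe(freq.obj)    # raw descriptive statistics
  freqs  <- corpusFreq(freq.obj)  # contents of the added freq feature
  # list the columns that the frequency analysis added
  print(setdiff(colnames(tokens), colnames(taggedText(tokenized.obj))))
}
```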
If corp.freq provides appropriate idf values for the types in txt.file, the term frequency-inverse document frequency statistic (tf-idf) is computed as well. Types without an idf value get NA as their tf-idf.
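The following generic base-R sketch illustrates what the tf-idf value expresses. It is not koRpus code. It multiplies each type's term frequency by its idf value and yields NA where no idf is available, which mirrors the behaviour described above. The sketch assumes that tf is the raw count of a type in the text and that idf = log(N / df) over a hypothetical reference corpus of N documents. The exact tf and idf definitions koRpus uses may differ, and all data here is made up for illustration.

```r
# hypothetical token sequence of one text
tokens <- c("the", "cat", "saw", "the", "dog")
# term frequency: raw count of each type in the text
tf <- table(tokens)
# hypothetical document frequencies from a reference corpus of N documents;
# "saw" is deliberately missing to show the NA case
N <- 1000
doc.freq <- c(the=990, cat=40, dog=55)
idf <- log(N / doc.freq)
# look up the idf of each type; types without an idf value yield NA
tfidf <- as.numeric(tf) * unname(idf[names(tf)])
names(tfidf) <- names(tf)
print(tfidf)
```

A frequent type like "the" gets a tf-idf close to zero despite its high count, because it appears in almost every document of the reference corpus.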
Examples
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly=TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  # call freq.analysis() on a tokenized text
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  # the token slot before frequency analysis
  head(taggedText(tokenized.obj))
  # instead of data from a larger corpus, we'll
  # use the token frequencies of the text itself
  tokenized.obj <- freq.analysis(
    tokenized.obj,
    corp.freq=read.corp.custom(tokenized.obj)
  )
  # compare the columns after the analysis
  head(taggedText(tokenized.obj))
  # the object now has further statistics in a
  # new feature slot called freq
  hasFeature(tokenized.obj)
  corpusFreq(tokenized.obj)
} else {}