filterByClass-methods function

Remove word classes

This method removes the specified word classes from tagged text objects.

    filterByClass(txt, ...)

    ## S4 method for signature 'kRp.text'
    filterByClass(
      txt,
      corp.rm.class = "nonpunct",
      corp.rm.tag = c(),
      as.vector = FALSE,
      update.desc = TRUE
    )

Arguments

  • txt: An object of class kRp.text.
  • ...: Additional options, currently unused.
  • corp.rm.class: A character vector of word classes to be removed. The default value "nonpunct" has a special meaning: it causes the result of kRp.POS.tags(lang, tags=c("punct","sentc"), list.classes=TRUE) to be used. Another valid value is "stopword", which removes all detected stopwords.
  • corp.rm.tag: A character vector with valid POS tags which should be removed.
  • as.vector: Logical. If TRUE, results will be returned as a character vector containing only the text parts which survived the filtering.
  • update.desc: Logical. If TRUE, the desc slot of the tagged object will be fully recalculated using the filtered text. If FALSE, the desc slot will be copied from the original object. Finally, if NULL, the desc slot remains empty.

Returns

An object of the input class. If as.vector=TRUE, returns only a character vector.
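
The two return forms described above can be compared with a minimal sketch like the following. It reuses the sample file and the `require()` guard from the package's own example; the variable names `filtered.obj`, `filtered.words`, and `filtered.keep.desc` are made up for illustration, and the calls use only the arguments documented on this page.

```r
# Sketch only: runs when the English language package can be loaded,
# mirroring the guard used in the documented example.
if (require("koRpus.lang.en", quietly = TRUE)) {
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(txt = sample_file, lang = "en")

  # Default call: returns an object of the input class (kRp.text),
  # with the desc slot recalculated from the filtered text.
  filtered.obj <- filterByClass(tokenized.obj)

  # as.vector = TRUE: returns only the surviving text parts
  # as a character vector.
  filtered.words <- filterByClass(tokenized.obj, as.vector = TRUE)
  print(head(filtered.words))

  # update.desc = FALSE: the desc slot is copied from the original
  # object instead of being recalculated.
  filtered.keep.desc <- filterByClass(tokenized.obj, update.desc = FALSE)
}
```

The character vector form is convenient when the filtered tokens are passed on to code that does not know about kRp.text objects; the default form keeps the result usable with other koRpus methods.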

Examples

    # code is only run when the english language package can be loaded
    if(require("koRpus.lang.en", quietly = TRUE)){
      sample_file <- file.path(
        path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
      )
      tokenized.obj <- tokenize(
        txt=sample_file,
        lang="en"
      )
      filterByClass(tokenized.obj)
    } else {}

See Also

kRp.POS.tags

  • Maintainer: Meik Michalke
  • License: GPL (>= 3)
  • Last published: 2021-05-17