preprocess function

Preprocess text corpus

A simple text preprocessing utility.

preprocess(input, erase = "[^.?!:;'\\w\\s]", lower_case = TRUE)

Arguments

  • input: a character vector.
  • erase: a length one character vector. Regular expression matching the parts of text to be erased from input. The default erases every character that is not a word character (alphanumeric or underscore), white space, an apostrophe, or one of the punctuation characters ".?!:;".
  • lower_case: a length one logical vector. If TRUE, converts the input to lower case.
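
As a rough illustration of what the default arguments amount to, the following base R sketch erases the characters matched by erase and then lower-cases the result. The helper name preprocess_sketch is hypothetical, and both the order of the two steps and the use of perl = TRUE are assumptions. This is not the package's actual implementation, which may rely on a different regex engine.

```r
# Hypothetical base R approximation of preprocess(); not the sbo implementation.
preprocess_sketch <- function(input,
                              erase = "[^.?!:;'\\w\\s]",
                              lower_case = TRUE) {
  stopifnot(is.character(input),
            is.character(erase), length(erase) == 1,
            is.logical(lower_case), length(lower_case) == 1)
  # Assumption: erasing happens before lower-casing. perl = TRUE is used so
  # that \w and \s are recognised inside the bracket expression.
  output <- gsub(erase, "", input, perl = TRUE)
  if (lower_case) output <- tolower(output)
  output
}

preprocess_sketch("Hi @ there! I'm using `sbo`.")
# Under this sketch: "hi  there! i'm using sbo."
# (the "@" and the backticks are erased; the two spaces around "@" remain)
```

Note that erasing a character does not collapse the white space around it, so the sketch leaves a double space where "@" used to be.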

Returns

a character vector containing the processed output.

Examples

preprocess("Hi @ there! I'm using `sbo`.")
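
A few further calls, sketched under the assumption that the sbo package is installed and attached. The input strings and the alternative erase pattern are made up for illustration, and no output is shown because it has not been verified here.

```r
library(sbo)

# Keep the original case; only the default erase pattern is applied.
preprocess("Hi @ there! I'm using `sbo`.", lower_case = FALSE)

# Illustrative custom pattern: erase everything except word characters
# and white space, so punctuation is removed as well.
preprocess("Hi @ there! I'm using `sbo`.", erase = "[^\\w\\s]")

# input is a character vector, so several strings can be passed at once.
preprocess(c("First sentence.", "Second #sentence!"))
```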

Author(s)

Valerio Gherardi