preprocText function

Preprocess text data such as names and addresses.

preprocText(text, convert_text = TRUE, tolower = TRUE, soundex = FALSE, usps_address = FALSE, remove_whitespace = TRUE, remove_punctuation = TRUE, convert_text_to = "Latin-ASCII")

Arguments

  • text: A character vector of text data to preprocess.
  • convert_text: Whether to convert the text using the transformation specified in the 'convert_text_to' argument. Default is TRUE.
  • tolower: Whether to normalize the text to be all lowercase. Default is TRUE.
  • soundex: Whether to convert the text to its Soundex code, the phonetic encoding used by the U.S. Census. Default is FALSE.
  • usps_address: Whether to use USPS address standardization rules to clean address fields. Default is FALSE.
  • remove_whitespace: Whether to remove leading and trailing whitespace, and to convert multiple spaces to a single space. Default is TRUE.
  • remove_punctuation: Whether to remove punctuation from the text. Default is TRUE.
  • convert_text_to: Which stringi transliterator to use when converting text. Default is 'Latin-ASCII', which maps accented Latin characters to their ASCII equivalents. The full list of available identifiers is returned by the stri_trans_list() function in the stringi package.
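The default cleaning steps (lowercasing, punctuation removal, and whitespace normalization) can be approximated in base R as below. This is a minimal sketch, not preprocText()'s actual implementation: the helper clean_text_sketch() is hypothetical, it omits encoding conversion, Soundex, and USPS standardization, and both the step order and the choice to delete punctuation outright (rather than replace it with a space) are assumptions.

```r
# Hypothetical helper: a base-R approximation of three of preprocText()'s
# default steps (tolower, remove_punctuation, remove_whitespace).
# NOT the package's implementation; encoding conversion, soundex and
# USPS address standardization are deliberately left out.
clean_text_sketch <- function(text, tolower = TRUE,
                              remove_punctuation = TRUE,
                              remove_whitespace = TRUE) {
  if (tolower) {
    text <- tolower(text)
  }
  if (remove_punctuation) {
    # Delete every POSIX punctuation character, e.g. "." "," "'" "-"
    text <- gsub("[[:punct:]]", "", text)
  }
  if (remove_whitespace) {
    # Trim leading/trailing whitespace, then collapse internal runs to one space
    text <- trimws(text)
    text <- gsub("[[:space:]]+", " ", text)
  }
  text
}

names_raw <- c("  O'Neil,  Mary-Ann ", "JOHN   Q. PUBLIC")
clean_text_sketch(names_raw)
#> [1] "oneil maryann" "john q public"
```

Deleting punctuation joins hyphenated parts ("maryann"), while replacing it with a space would keep them as separate tokens; which behavior suits a linkage task depends on how the other dataset writes such names.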

Returns

preprocText() returns the preprocessed vector of text.

Author(s)

Ben Fifield <benfifield@gmail.com>

  • Maintainer: Ted Enamorado
  • License: GPL (>= 3)
  • Last published: 2023-11-17
