Text Cleaning Tools
Filter Elements in a Vector
Replace Kerned (Spaced) with No Space Version
Replace Money With Words
Replace First/Last Names
Strip Text
Find Tokens that Match a Regex
Ensure Space After Comma
Add Missing Endmarks
Check Text For Potential Problems
Hold the Place of Characters Prior to Subbing
Swap Two Patterns Simultaneously
Filter Rows That Contain Markers
Replace a Regex with a Functional Operation on the Regex Match
Remove Elements in a Vector
Remove Rows That Contain Markers
Test for Incomplete Sentences
Make Plural (or Verb to Singular) Versions of Words
Multiple gsub
Prints a check_text Object
Prints a sub_holder Object
Prints a which_are_locs Object
Objects exported from other packages
Replace Contractions
Replace Dates With Words
Replace Email Addresses
Replace Internet Slang
Replace Emojis With Words/Identifier
Replace Emoticons With Words
Replace Grades With Words
Replace Hashes
Replace HTML Markup
Denote Incomplete End Marks With "|"
Replace Common Non-ASCII Characters
Replace Numbers With Text Representation
Replace Mixed Ordinal Numbers With Text Representation
Replace Ratings With Words
Replace Symbols With Word Equivalents
Replace Handle Tags
Replace Time Stamps With Words
Grab Begin/End of String to/from Character
Replace Tokens
Replace URLs
Remove Escaped Characters
Replace Word Elongations
Detect/Locate Potential Non-Normalized Text
Tools to clean and process text. The tools check for substrings that are not optimal for analysis and either replace or remove them (normalization) in favor of more analysis-friendly substrings (see Sproat, Black, Chen, Kumar, Ostendorf, & Richards (2001) <doi:10.1006/csla.2001.0169>), or extract them into new variables. For example, emoticons are often used in text but are not always easily handled by analysis algorithms. The replace_emoticon() function replaces emoticons with word equivalents.
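
A minimal usage sketch of this replace/normalize workflow is shown below. Only replace_emoticon() is named in the description above; replace_number() and mgsub() are assumed to correspond to the "Replace Numbers With Text Representation" and "Multiple gsub" entries listed earlier, so treat the exact calls as illustrative rather than definitive.

    ## Assumes the textclean package is installed and loadable.
    library(textclean)

    x <- c("I really like it :-)", "Version 2 costs 10 dollars")

    ## Replace emoticons with word equivalents (function named in the description)
    replace_emoticon(x)
    #> e.g. "I really like it smiley"

    ## Replace numbers with a text representation (assumed function name)
    replace_number(x)
    #> e.g. "Version two costs ten dollars"

    ## Apply several substitutions in one pass (assumed function name)
    mgsub(x, pattern = c("really", "dollars"), replacement = c("truly", "USD"))

Each function takes a character vector and returns a normalized character vector, so calls can be chained to clean text step by step before analysis.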