Uncleaned text may cause errors, warnings, and incorrect results in subsequent analysis. check_text checks text for potential problems and suggests possible fixes. Detected anomalies include: factors, missing ending punctuation, empty cells, double punctuation, non-space after comma, no alphabetic characters, non-ASCII characters, missing values, and potentially misspelled words.
check_text(text.var, file = NULL)
Arguments
text.var: The text variable.
file: A connection, or a character string naming the file to print to. If NULL, the report prints to the console. Note that this is assigned as an attribute and passed to print.
Returns
Returns a list with the following potential text fault reports:
non_character- Text that is non-character.
missing_ending_punctuation- Text with no endmark at the end of the string.
empty- Text that contains an empty element (i.e., "").
double_punctuation- Text that contains two qdap punctuation marks in the same string.
non_space_after_comma- Text that contains commas with no space after them.
no_alpha- Text that contains string elements with no alphabetic characters.
non_ascii- Text that contains non-ASCII characters.
missing_value- Text that contains missing values (i.e., NA).
containing_escaped- Text that contains escaped characters (see ?Quotes).
containing_digits- Text that contains digits.
indicating_incomplete- Text that contains endmarks that are indicative of incomplete/trailing sentences (e.g., ...).
potentially_misspelled- Text that contains potentially misspelled words.
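To make a few of the reports above concrete, here is a minimal base-R sketch of how some of these faults could be detected. It is not qdap's implementation: the helper name simple_text_checks is hypothetical, the regular expressions are illustrative, and the set of endmarks (., ?, !, |) is an assumption. Each element of the result holds the indices of the offending strings.

```r
# Illustrative base-R sketch; NOT how qdap::check_text is implemented.
# `simple_text_checks` is a hypothetical helper name.
simple_text_checks <- function(text.var) {
  x <- as.character(text.var)
  ok <- !is.na(x)  # NA elements are reported only under missing_value
  list(
    # Elements that are NA
    missing_value = which(is.na(x)),
    # Elements that are the empty string ""
    empty = which(ok & x == ""),
    # A comma followed directly by a character other than a space
    non_space_after_comma = which(ok & grepl(",[^ ]", x)),
    # Non-empty elements with no alphabetic characters
    no_alpha = which(ok & x != "" & !grepl("[[:alpha:]]", x)),
    # Non-empty elements not ending in an (assumed) endmark: . ? ! |
    missing_ending_punctuation =
      which(ok & x != "" & !grepl("[.?!|]\\s*$", x))
  )
}

x <- c("i like", "", NA, "they,were there", "3;", "A valid sentence.")
res <- simple_text_checks(x)
str(res)
```

The real function covers more cases (for example spelling and escaped characters) and attaches suggested fixes, which this sketch does not attempt.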
Note
The output is a list, but it prints as pretty-formatted output showing the potential problem elements, the accompanying text, and possible suggestions to fix the text.
Examples
## Not run:
x <- c("i like", "i want. thet them .", "I am ! that|", "", NA,
    "they,were there", ".", " ", "?", "3;", "I like goud eggs!",
    "i 4like...", "\\tgreat", "She said \"yes\"")
check_text(x)
print(check_text(x), include.text = FALSE)

y <- c("A valid sentence.", "yet another!")
check_text(y)
## End(Not run)