trim: logical. If TRUE removes leading and trailing white spaces.
clean: logical. If TRUE extra white spaces and escaped characters will be removed.
pattern: A character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. By default, @rm_non_ascii uses the rm_non_ascii regex from the regular expression dictionary supplied in the dictionary argument. If extract = FALSE, gsub is not used as it is in the other rm_XXX functions; rather, iconv with the sub argument set performs the substitution.
replacement: Replacement for matched pattern.
extract: logical. If TRUE all non-ASCII strings are extracted into a list of vectors.
dictionary: A dictionary of canned regular expressions to search within if pattern begins with "@rm_".
ascii.out: logical. If TRUE output is given in non-ASCII format, otherwise "byte" is used.
...: ignored.
Returns
Returns a character string with "all non-ascii" removed.
Note
macOS 14 (Sonoma), and likely all later versions, has a different implementation of iconv that may not produce the expected results.
Warning
iconv is used within rm_non_ascii. iconv's behavior across operating systems may not be consistent.
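To make the platform dependence concrete, the following is a minimal sketch of how base R's iconv behaves when its sub argument is set. It is an illustration only, not the internal code of rm_non_ascii; the variable names s, dropped, and flagged are hypothetical, and the exact output may vary with the iconv implementation on your system.

```r
# Illustrative sketch using base R only; this is NOT rm_non_ascii's internal code.
s <- c("Hello World", "Ekstr\xf8m", "J\xf6reskog")
Encoding(s) <- "latin1"

# With sub set, bytes that cannot be converted to ASCII are replaced by the
# given string instead of turning the whole element into NA; "" drops them.
dropped <- iconv(s, from = "latin1", to = "ASCII", sub = "")

# sub = "byte" substitutes a hex code of the form <xx> for each such byte.
flagged <- iconv(s, from = "latin1", to = "ASCII", sub = "byte")

print(dropped)
print(flagged)
```

Comparing the printed values across machines is a quick way to check whether a given platform's iconv substitutes, drops, or transliterates the non-ASCII bytes.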
Examples
x <- c("Hello World", "Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x) <- "latin1"
x

rm_non_ascii(x)
rm_non_ascii(x, replacement="<<FLAG>>")
ex_non_ascii(x)
ex_non_ascii(x, ascii.out=FALSE)

## simple regex to remove non-ascii
rm_default(x, pattern="[^ -~]")
ex_default(x, pattern="[^ -~]")