Remove, replace, or extract substrings from a string. rm_ and ex_ are function generators: they create regex functions that behave like the other qdapRegex
rm_XXX functions. Use rm_ for removal and ex_ for extraction.
rm_(...)
ex_(...)
Arguments
...: Arguments passed to rm_default. Generally, pattern and extract are the most useful parameters to change. Arguments that can be set include:
text.var: The text variable.
trim: logical. If TRUE removes leading and trailing white spaces.
clean: logical. If TRUE, extra white spaces and escaped characters will be removed.
pattern: A character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector.
replacement: Replacement for matched pattern.
extract: logical. If TRUE strings are extracted into a list of vectors.
dictionary: A dictionary of canned regular expressions to search within if pattern begins with "@rm_".
...: Other arguments passed to gsub.
Returns
Returns a function that behaves like the other qdapRegex
rm_XXX functions but with user-defined defaults.
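To illustrate what "a function with user-defined defaults" means, here is a minimal base-R sketch of the function-generator idea. It is not the qdapRegex implementation: make_rm is a hypothetical helper invented for this illustration, and it ignores trim, clean, and dictionary entirely. It only shows how a closure can capture pattern, replacement, and extract and return a ready-to-use function.

```r
# Hypothetical helper (NOT part of qdapRegex): returns a function
# whose pattern, replacement, and extract settings are fixed up front.
make_rm <- function(pattern, replacement = "", extract = FALSE) {
  function(text.var) {
    if (extract) {
      # One character vector of matches per input element, returned as a list
      regmatches(text.var, gregexpr(pattern, text.var, perl = TRUE))
    } else {
      # Replace every match of the pattern with the replacement
      gsub(pattern, replacement, text.var, perl = TRUE)
    }
  }
}

# Removal: the generated function strips every digit
strip_digits <- make_rm("[0-9]")
strip_digits("li34ke ice56cream78")

# Extraction: the generated function returns runs of consecutive digits
grab_numbers <- make_rm("[0-9]+", extract = TRUE)
grab_numbers("li34ke ice56cream78")
```

The actual rm_ and ex_ generators pass their arguments on to rm_default, so the functions they return also apply the trim, clean, and dictionary behavior described under Arguments.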
Examples
rm_digit <- rm_(pattern="[0-9]")
rm_digit(" I 12 li34ke ice56cream78. ")

rm_lead <- rm_(pattern="^\\s+", trim = FALSE, clean = FALSE)
rm_lead(" I 12 li34ke ice56cream78. ")

rm_all_except_letters <- rm_(pattern="[^ a-zA-Z]")
rm_all_except_letters(" I 12 li34ke ice56cream78. ")

extract_consec_num <- rm_(pattern="[0-9]+", extract = TRUE)
extract_consec_num(" I 12 li34ke ice56cream78. ")

## Using the supplemental dictionary dataset:
x <- "A man lives there! The dog likes it. I want the map. I want an apple."

extract_word_after_the <- rm_(extract=TRUE, pattern="@after_the")
extract_word_after_a <- rm_(extract=TRUE, pattern="@after_a")
extract_word_after_the(x)
extract_word_after_a(x)

f <- rm_(pattern="@time_12_hours")
f("I will go at 12:35 pm")

x <- c("test@aol.fg.com", "test@hotmail.com", "test@xyzrr.lk.edu", "test@abc.xx.zz.vv.net")
file_ext2 <- rm_(pattern="(?<=\\.)[a-z]*$", extract=TRUE)
tools::file_ext(x)
file_ext2(x)