trim: logical. If TRUE, removes leading and trailing white spaces.
clean: logical. If TRUE, extra white spaces and escaped characters will be removed.
pattern: A character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. By default, rm_url uses the "@rm_url" regex from the regular expression dictionary supplied via the dictionary argument.
replacement: Replacement for matched pattern.
extract: logical. If TRUE, the URLs are extracted into a list of vectors.
dictionary: A dictionary of canned regular expressions to search within if pattern begins with "@rm_".
...: Other arguments passed to gsub.
Returns
Returns a character string with URLs removed.
Details
The default regex pattern "(http[^ ]*)|(www\.[^ ]*)" is fairly liberal. More constrained versions can be accessed via pattern = "@rm_url2" and pattern = "@rm_url3" (see Examples).
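As a sketch of what the default pattern matches, the same regex can be applied with base R functions (rm_url itself additionally trims and cleans the result):

```r
## The default pattern used by rm_url (illustration only)
pat <- "(http[^ ]*)|(www\\.[^ ]*)"

x <- "I like www.talkstats.com and http://stackoverflow.com"

## Removal, as rm_url performs before trimming/cleaning
gsub(pat, "", x)
#> [1] "I like  and "

## Extraction, as ex_url performs
regmatches(x, gregexpr(pat, x))[[1]]
#> [1] "www.talkstats.com"        "http://stackoverflow.com"
```

Note that "www\\." requires a literal dot after "www", while "http[^ ]*" greedily consumes everything up to the next space, which is why the pattern is described as liberal.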
Examples
x <- " I like www.talkstats.com and http://stackoverflow.com"
rm_url(x)
rm_url(x, replacement = '<a href="\\1" target="_blank">\\1</a>')
ex_url(x)
ex_url(x, pattern = "@rm_url2")
ex_url(x, pattern = "@rm_url3")

## Remove Twitter Short URL
x <- c(
    "download file from http://example.com",
    "this is the link to my website http://example.com",
    "go to http://example.com from more info.",
    "Another url ftp://www.example.com",
    "And https://www.example.net",
    "twitter type: t.co/N1kq0F26tG",
    "still another one https://t.co/N1kq0F26tG :-)"
)
rm_twitter_url(x)
ex_twitter_url(x)

## Combine removing Twitter URLs and standard URLs
rm_twitter_n_url <- rm_(pattern = pastex("@rm_twitter_url", "@rm_url"))
rm_twitter_n_url(x)
rm_twitter_n_url(x, extract = TRUE)
References
The more constrained URL regular expressions ("@rm_url2" and "@rm_url3")