rm_stopwords function

Remove Stop Words

Removal of stop words in a variety of contexts.

%sw% - Binary operator version of rm_stopwords that defaults to separate = FALSE.

rm_stopwords(
  text.var,
  stopwords = qdapDictionaries::Top25Words,
  unlist = FALSE,
  separate = TRUE,
  strip = FALSE,
  unique = FALSE,
  char.keep = NULL,
  names = FALSE,
  ignore.case = TRUE,
  apostrophe.remove = FALSE,
  ...
)

rm_stop(
  text.var,
  stopwords = qdapDictionaries::Top25Words,
  unlist = FALSE,
  separate = TRUE,
  strip = FALSE,
  unique = FALSE,
  char.keep = NULL,
  names = FALSE,
  ignore.case = TRUE,
  apostrophe.remove = FALSE,
  ...
)

text.var %sw% stopwords

Arguments

  • text.var: A character string of text or a vector of character strings.
  • stopwords: A character vector of words to remove from the text. qdap has a number of data sets that can be used as stop words including: Top200Words, Top100Words, Top25Words. For the tm package's traditional English stop words use tm::stopwords("english").
  • unlist: logical. If TRUE unlists into one vector. General use intended for when separate is FALSE.
  • separate: logical. If TRUE separates sentences into words. If FALSE retains sentences.
  • strip: logical. If TRUE strips the text of all punctuation except apostrophes.
  • unique: logical. If TRUE keeps only unique words (if unlist is TRUE) or sentences (if unlist is FALSE). General use intended for when unlist is TRUE.
  • char.keep: If strip is TRUE this argument provides a means of retaining supplied character(s).
  • names: logical. If TRUE will name the elements of the vector or list with the original text.var.
  • ignore.case: logical. If TRUE stopwords will be removed regardless of case. Additionally, case will be stripped from the text. If FALSE stop word removal is contingent upon case. Additionally, case is not stripped.
  • apostrophe.remove: logical. If TRUE removes apostrophes from the output.
  • ...: further arguments passed to the strip function.

Returns

Returns a vector of sentences, vector of words, or (default) a list of vectors of words with stop words removed. Output depends on supplied arguments.
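
The three output shapes described above can be illustrated with a small base-R sketch. This is not qdap's implementation: remove_stop_sketch is a hypothetical helper that only approximates the separate and unlist switches (lowercasing everything, in the spirit of ignore.case = TRUE), and the sample text and stop word vector are made up for illustration.

```r
# Hypothetical helper, NOT qdap's code: a rough base-R approximation of
# stop word removal that mimics the `separate` and `unlist` switches.
remove_stop_sketch <- function(text, stopwords, separate = TRUE, unlist = FALSE) {
  # Lowercase and split each element on whitespace
  words <- strsplit(tolower(trimws(text)), "[[:space:]]+")
  # Drop every token that appears in the stop word vector
  kept <- lapply(words, function(w) w[!w %in% tolower(stopwords)])
  # Optionally flatten the list into a single vector of words
  if (unlist) kept <- unlist(kept)
  # Optionally paste the surviving words back into one string per element
  if (!separate) {
    kept <- vapply(kept, paste, character(1), collapse = " ", USE.NAMES = FALSE)
  }
  kept
}

txt <- c("I like it a lot", "The dog is in the yard")  # assumed sample text
sw  <- c("i", "it", "a", "the", "is", "in")            # assumed stop words

as_list      <- remove_stop_sketch(txt, sw)                    # list of word vectors
as_sentences <- remove_stop_sketch(txt, sw, separate = FALSE)  # one string per element
as_words     <- remove_stop_sketch(txt, sw, unlist = TRUE)     # single vector of words
```

The real function additionally handles punctuation, strip, unique, names, and case-sensitive matching, so its results on real text may differ from this sketch.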

Examples

## Not run:
rm_stopwords(DATA$state)
rm_stopwords(DATA$state, tm::stopwords("english"))
rm_stopwords(DATA$state, Top200Words)
rm_stopwords(DATA$state, Top200Words, strip = TRUE)
rm_stopwords(DATA$state, Top200Words, separate = FALSE)
rm_stopwords(DATA$state, Top200Words, separate = FALSE, ignore.case = FALSE)
rm_stopwords(DATA$state, Top200Words, unlist = TRUE)
rm_stopwords(DATA$state, Top200Words, unlist = TRUE, strip = TRUE)
rm_stop(DATA$state, Top200Words, unlist = TRUE, unique = TRUE)
c("I like it alot", "I like it too") %sw% qdapDictionaries::Top25Words
## End(Not run)

See Also

strip, bag_o_words, stopwords

  • Maintainer: Tyler Rinker
  • License: GPL-2
  • Last published: 2023-05-11