freq_terms function

Find Frequent Terms

Find the most frequently occurring terms in a text vector.

freq_terms(text.var, top = 20, at.least = 1, stopwords = NULL, extend = TRUE, ...)

Arguments

  • text.var: The text variable.
  • top: Top number of terms to show.
  • at.least: An integer indicating the minimum number of letters a word must have to be included in the output.
  • stopwords: A character vector of words to remove from the text. qdap has a number of data sets that can be used as stop words, including Top200Words, Top100Words, and Top25Words. For the tm package's traditional English stop words, use tm::stopwords("english").
  • extend: logical. If TRUE, the top argument is extended to include any word that has the same frequency as the top word.
  • ...: Other arguments passed to all_words.

Returns

Returns a data frame with the top occurring words.
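
Below is a minimal sketch of a typical call that combines the arguments above and inspects the result. It assumes qdap is installed and uses its bundled DATA data set; the WORD/FREQ column names mentioned in the comments reflect the usual output but may vary by version.

library(qdap)

## Top 10 words of at least 3 letters, excluding a small stop word list
out <- freq_terms(DATA$state, top = 10, at.least = 3, stopwords = Top25Words)

## Inspect the returned data frame (typically a WORD and a FREQ column)
str(out)
head(out)

## The result also has a plot method
plot(out)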

Examples

## Not run: 
freq_terms(DATA$state, 5)
freq_terms(DATA$state)
freq_terms(DATA$state, extend = FALSE)
freq_terms(DATA$state, at.least = 4)
(out <- freq_terms(pres_debates2012$dialogue, stopwords = Top200Words))
plot(out)

## All words by sentence (row)
library(qdapTools)
x <- raj$dialogue
list_df2df(setNames(lapply(x, freq_terms, top=Inf), seq_along(x)), "row")
list_df2df(setNames(lapply(x, freq_terms, top=10, stopwords = Dolch), 
    seq_along(x)), "Title")

## All words by person
FUN <- function(x, n=Inf) freq_terms(paste(x, collapse=" "), top=n)
list_df2df(lapply(split(x, raj$person), FUN), "person")

## Plot it
out <- lapply(split(x, raj$person), FUN, n=10)

pdf("Freq Terms by Person.pdf", width=13) 
lapply(seq_along(out), function(i) {
    ## dev.new()
    plot(out[[i]], plot=FALSE) + ggtitle(names(out)[i])
})
dev.off()

## Keep spaces
freq_terms(space_fill(DATA$state, "are you"), 500, char.keep="~~")

## End(Not run)

See Also

word_list, all_words

  • Maintainer: Tyler Rinker
  • License: GPL-2
  • Last published: 2023-05-11