ngrams function

Generate ngrams

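Transcript apply ngrams: generates ngrams from the text of a transcript, optionally split by one or more grouping variables.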

ngrams(text.var, grouping.var = NULL, n = 2, ...)

Arguments

  • text.var: The text variable
  • grouping.var: The grouping variables. Default NULL generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.
  • n: The maximum number of grams to calculate.
  • ...: Further arguments passed to the strip function.
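
To make the role of n concrete, here is a minimal base-R sketch. It is not qdap's implementation: ngrams_up_to is a hypothetical helper, the cleaning step is only a crude stand-in for strip, and it assumes that "maximum number of grams" means every gram size from 1 through n is produced.

```r
# Hypothetical helper, not part of qdap: build every gram of size 1..n
# from a single string.
ngrams_up_to <- function(text, n = 2) {
  # Crude stand-in for qdap's strip(): lowercase and drop punctuation.
  clean <- gsub("[^a-z ]", "", tolower(text))
  words <- strsplit(trimws(clean), " +")[[1]]
  words <- words[nzchar(words)]

  grams <- lapply(seq_len(n), function(k) {
    # No grams of size k exist when the text has fewer than k words.
    if (length(words) < k) return(character(0))
    starts <- seq_len(length(words) - k + 1)
    # Paste each window of k consecutive words into one string.
    vapply(starts, function(i) paste(words[i:(i + k - 1)], collapse = " "),
           character(1))
  })
  unlist(grams)
}

ngrams_up_to("Computer is fun.", n = 2)
```

Under these assumptions the call returns the three single words followed by the two bigrams; the actual ngrams function additionally organizes its results by row and by grouping.var.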

Returns

Returns a list of:

  • raw: A list of pasted single vectors of the ngrams per row.

  • group: A list of pasted vectors of ngrams grouped by grouping.var.

  • unlist1: A list of a single vector of pasted ngrams per grouping.var in the order used.

  • unlist2: A list of a single vector of pasted ngrams per grouping.var in alphabetical order.

  • group_n: A list of a list of vectors of ngrams per grouping.var & n (not pasted).

  • all: A single vector of pasted ngrams sorted alphabetically.

  • all_n: A list of lists of single vectors of ngrams sorted alphabetically (not pasted).
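
The list above distinguishes pasted from not pasted ngrams, and order used from alphabetical order. The base-R sketch below illustrates those distinctions on made-up data; it is not output from ngrams, and it assumes "pasted" means the words of each gram are joined into a single string.

```r
# Illustrative data only; not produced by ngrams().
# Not pasted: each gram is a vector of its individual words.
not_pasted <- list(c("the", "cat"), c("cat", "sat"), c("a", "cat"))

# Pasted: each gram collapsed into one space-separated string.
pasted <- vapply(not_pasted, paste, character(1), collapse = " ")

# Order used: grams kept in the order they occur.
in_order <- pasted

# Alphabetical: the same grams sorted.
alphabetical <- sort(pasted)
```

Here in_order keeps "the cat" first, while alphabetical starts with "a cat".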

Examples

## Not run:
ngrams(DATA$state, DATA$person, 2)
ngrams(DATA$state, DATA$person, 3)
ngrams(DATA$state, , 3)
with(mraja1, ngrams(dialogue, list(sex, fam.aff), 3))

## Alternative ngram analysis:
n_gram <- function(x, n = 2, sep = " "){
    m <- qdap::bag_o_words(x)
    if (length(m) < n) return(character(0))
    starts <- 1:(length(m) - (n - 1))
    ends <- n:length(m)
    Map(function(x, y){
            paste(m[x:y], collapse=sep)
        }, starts, ends
    )
}

dat <- sentSplit(DATA, "state")
dat[["grams"]] <- sapply(dat[["state"]], function(x) {
    unbag(n_gram(x, sep = "~~"))
})

m <- with(dat, as.tdm(grams, person))
rownames(m) <- gsub("~~", " ", rownames(m))
as.matrix(m)
rowSums(as.matrix(m))

dat2 <- sentSplit(raj, "dialogue")
dat2[["grams"]] <- sapply(dat2[["dialogue"]], function(x) {
    unbag(n_gram(x, sep = "~~"))
})

m2 <- with(dat2, as.tdm(grams, person))
rownames(m2) <- gsub("~~", " ", rownames(m2))
qheat(t(as.matrix(tm:::weightTfIdf(tm::removeSparseTerms(m2, .7)))),
    high="red")

sort(rowSums(as.matrix(m2)))
## End(Not run)

  • Maintainer: Tyler Rinker
  • License: GPL-2
  • Last published: 2023-05-11