tfidf function

tf-idf

Term frequency, inverse document frequency (tf-idf) weighting of a count matrix.

tfidf(x,normalize=TRUE)

Arguments

  • x: A dgCMatrix or matrix of counts.
  • normalize: Whether to normalize term frequency by document totals.

Returns

A matrix of the same type as x, with values replaced by the tf-idf

  f_{ij} * \log[n/(d_j+1)],

where f_{ij} is x_{ij}/m_i or x_{ij}, depending on normalize, n is the number of documents, m_i is the total count for document i, and d_j is the number of documents containing token j.
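
The weighting above can be sketched in a few lines of base R. This is only an illustration, not the package's implementation: tfidf_sketch is a hypothetical name, the input is assumed to be a dense matrix with documents in rows and tokens in columns, and n is taken to be the number of rows.

```r
# Illustrative re-implementation of the formula f_ij * log[n/(d_j+1)].
# tfidf_sketch is a hypothetical helper, not part of the package.
tfidf_sketch <- function(x, normalize = TRUE) {
  n <- nrow(x)               # n: number of documents (assumed to be rows)
  d <- colSums(x > 0)        # d_j: number of documents containing token j
  # f_ij: counts divided by document totals m_i, or the raw counts
  f <- if (normalize) x / rowSums(x) else x
  # multiply column j by its inverse document frequency weight
  sweep(f, 2, log(n / (d + 1)), `*`)
}

# Toy counts: 4 documents, 3 tokens
x <- matrix(c(2, 0, 1,
              0, 3, 1,
              1, 1, 0,
              0, 0, 4),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), c("a", "b", "c")))

w <- tfidf_sketch(x)
w
```

In this toy matrix token "c" appears in 3 of the 4 documents, so its weight is log[4/(3+1)] = 0 and its whole column is zero. Because of the +1 in the denominator, a token that appears in every document would receive a negative weight under this formula. The sketch does not handle documents with a zero total, for which the normalized frequency is undefined.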

Author(s)

Matt Taddy taddy@chicagobooth.edu

Examples

data(we8there)
## 20 high-variance tf-idf terms
colnames(we8thereCounts)[
    order(-sdev(tfidf(we8thereCounts)))[1:20]]

See Also

pls, we8there