tf-idf
term frequency, inverse document frequency
tfidf(x,normalize=TRUE)
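x: A dgCMatrix or matrix of counts.
normalize: Whether to normalize term frequency by document totals.
Returns a matrix of the same type as x, with values replaced by the tf-idf
$$ f_{ij} \log\left[ n / (d_j + 1) \right], $$
where $f_{ij}$ is $x_{ij}/m_i$ or $x_{ij}$, depending on normalize, $m_i$ is the total count for document $i$, $n$ is the number of documents, and $d_j$ is the number of documents containing token $j$.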
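As a rough illustration of the weighting, the sketch below computes tf-idf for a small dense count matrix in base R. It assumes the form $f_{ij} \log[n/(d_j+1)]$, with rows as documents and columns as tokens. The helper `tfidf_sketch` and the toy matrix are hypothetical, made up for this example; this is not the package's own implementation, and it does not handle sparse dgCMatrix input or documents with zero total count.

```r
# Toy document-term counts: 4 documents (rows), 3 tokens (columns).
x <- matrix(c(2, 0, 1,
              0, 3, 0,
              1, 0, 0,
              0, 0, 4),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), c("good", "bad", "food")))

# Hypothetical helper (not part of any package): dense tf-idf sketch.
tfidf_sketch <- function(x, normalize = TRUE) {
  n <- nrow(x)                 # number of documents
  d <- colSums(x > 0)          # number of documents containing each token
  idf <- log(n / (d + 1))      # assumed inverse document frequency
  # Term frequency: counts divided by document totals, or raw counts.
  f <- if (normalize) x / rowSums(x) else x
  sweep(f, 2, idf, `*`)        # multiply column j by idf[j]
}

w <- tfidf_sketch(x)                        # normalized term frequencies
w_raw <- tfidf_sketch(x, normalize = FALSE) # raw counts as term frequencies
```

Under this assumed form, a token that appears in all but one document gets weight zero, and a token that appears in every document gets a slightly negative weight, because of the +1 in the denominator.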
Matt Taddy taddy@chicagobooth.edu
data(we8there)
## 20 high-variance tf-idf terms
colnames(we8thereCounts)[
  order(-sdev(tfidf(we8thereCounts)))[1:20]]
pls, we8there