Given a TextReuseCorpus containing documents of class TextReuseTextDocument, this function applies a comparison function to every pairing of documents, and returns a matrix with the comparison scores.
directional: Some comparison functions are commutative, so that f(a, b) == f(b, a) (e.g., jaccard_similarity). Others are directional: f(a, b) measures a's borrowing from b, which may differ from f(b, a) (e.g., ratio_of_matches). If directional is FALSE, only the minimum number of comparisons is made, i.e., the upper triangle of the matrix. If directional is TRUE, both f(a, b) and f(b, a) are computed for every pair. In neither case are documents compared to themselves, so the diagonal of the matrix is never filled in.
progress: Display a progress bar while comparing documents.
Returns
A square matrix with dimensions equal to the length of the corpus, and with row and column names taken from the names of the documents in the corpus. A value of NA in the matrix indicates that a comparison was not made. For directional comparisons, the value reported is f(row, column).
Examples
dir <- system.file("extdata/legal", package = "textreuse")
corpus <- TextReuseCorpus(dir = dir)
names(corpus) <- filenames(names(corpus))

# A non-directional comparison
pairwise_compare(corpus, jaccard_similarity)

# A directional comparison
pairwise_compare(corpus, ratio_of_matches, directional = TRUE)
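
The effect of directional on which cells get filled can be illustrated with a minimal base R sketch. This is not the package's implementation: pairwise_sketch, jaccard, and share_of_a_in_b are hypothetical stand-ins written for illustration, and the "documents" are plain character vectors of tokens rather than TextReuseTextDocument objects.

```r
# Hypothetical stand-in for pairwise_compare(); an illustration only,
# not the code used by the textreuse package.
pairwise_sketch <- function(docs, f, directional = FALSE) {
  n <- length(docs)
  m <- matrix(NA_real_, n, n, dimnames = list(names(docs), names(docs)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) next                  # never compare a document to itself
      if (!directional && i > j) next   # non-directional: upper triangle only
      m[i, j] <- f(docs[[i]], docs[[j]])  # reported value is f(row, column)
    }
  }
  m
}

# Toy "documents": character vectors of tokens (an assumption for this sketch)
docs <- list(a = c("x", "y"), b = c("y", "z"), c = c("x", "y", "z"))

# A commutative toy measure and a directional toy measure (both hypothetical)
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))
share_of_a_in_b <- function(a, b) mean(a %in% b)

# Upper triangle filled; lower triangle and diagonal stay NA
pairwise_sketch(docs, jaccard)

# Every off-diagonal cell filled; only the diagonal stays NA
pairwise_sketch(docs, share_of_a_in_b, directional = TRUE)
```

In the non-directional result, the cell for row "b", column "a" is NA because the same score already appears at row "a", column "b". In the directional result the two cells can differ, since each holds f(row, column).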
See Also
See the document comparison functions jaccard_similarity and ratio_of_matches.