textreuse R package [Documentation]

align_local

Local alignment of natural language texts

as.matrix.textreuse_candidates

Convert candidates data frames to other formats

filenames

Filenames from paths

hash_string

Hash a string to an integer

lsh

Locality sensitive hashing for minhash

lsh_candidates

Candidate pairs from LSH comparisons

lsh_compare

Compare candidates identified by LSH

lsh_probability

Probability that a candidate pair will be detected with LSH

lsh_query

Query an LSH cache for matches to a single document

lsh_subset

List of all candidates in a corpus

minhash_generator

Generate a minhash function

pairwise_candidates

Candidate pairs from pairwise comparisons

pairwise_compare

Pairwise comparisons among documents in a corpus

reexports

Objects exported from other packages

rehash

Recompute the hashes for a document or corpus

similarity-functions

Measure similarity/dissimilarity in documents

textreuse-package

textreuse: Detect Text Reuse and Document Similarity

TextReuseCorpus

TextReuseTextDocument-accessors

Accessors for TextReuse objects

TextReuseTextDocument

tokenize

Recompute the tokens for a document or corpus

tokenizers

Split texts into tokens

wordcount

Count words

Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
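
As a rough illustration of how the pieces named above fit together, the following is a minimal sketch rather than a definitive workflow. It assumes the package is installed and uses only functions listed in the index (the tokenizers, similarity functions, lsh_probability, align_local). The two sample sentences and the parameter values (n = 3, h = 200, b = 50) are made up for the example.

```r
library(textreuse)

# Two short, made-up documents that share most of their wording.
doc_a <- "The quick brown fox jumps over the lazy dog near the river bank"
doc_b <- "A quick brown fox jumps over the lazy dog by the river bank"

# Shingled n-grams: overlapping three-word tokens for each text.
tokens_a <- tokenize_ngrams(doc_a, n = 3)
tokens_b <- tokenize_ngrams(doc_b, n = 3)

# Jaccard similarity: shared tokens divided by all distinct tokens.
sim <- jaccard_similarity(tokens_a, tokens_b)

# Probability that a pair with Jaccard similarity s = 0.5 becomes an
# LSH candidate when h = 200 minhashes are split into b = 50 bands.
p <- lsh_probability(h = 200, b = 50, s = 0.5)

# Local alignment of the two texts (Smith-Waterman adapted to words).
alignment <- align_local(doc_a, doc_b)

print(sim)
print(p)
print(alignment)
```

For a whole corpus, the pairwise_* and lsh_* functions in the index apply the same ideas to many documents at once. The choice of h and b trades recall against the number of candidate pairs, so the values here should be treated as placeholders.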

Maintainer: Lincoln Mullen
License: MIT + file LICENSE
Last published: 2020-05-15

Version: 0.1.5