lsh_candidates function

Candidate pairs from LSH comparisons

Given a data frame of LSH buckets returned from lsh, this function returns the candidate pairs: pairs of documents that share at least one bucket and are therefore potential matches.

lsh_candidates(buckets)

Arguments

  • buckets: A data frame returned from lsh.

Returns

A data frame of candidate pairs.
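
The idea behind candidate pairs can be sketched in base R without the package. This is only an illustration under stated assumptions, not the package's implementation: the toy table below is hand-built rather than produced by lsh, and the column names doc, buckets, a, and b are assumed for the sketch. Two documents become a candidate pair as soon as they share one bucket.

```r
# Toy bucket table: one row per (document, bucket) combination.
# Hand-built for illustration; column names are assumptions of this sketch.
buckets <- data.frame(
  doc     = c("a", "b", "a", "b", "c", "d"),
  buckets = c("h1", "h1", "h2", "h2", "h2", "h3"),
  stringsAsFactors = FALSE
)

# Self-join on the bucket value to pair up documents that share a bucket.
pairs <- merge(buckets, buckets, by = "buckets", suffixes = c(".x", ".y"))

# Drop self-matches and keep each unordered pair in one orientation only.
pairs <- pairs[pairs$doc.x < pairs$doc.y, c("doc.x", "doc.y")]

# A pair that shares several buckets should still be listed once.
candidates <- unique(pairs)
names(candidates) <- c("a", "b")
rownames(candidates) <- NULL
candidates
```

In this toy data, documents a and b share two buckets but appear as a single pair, c is paired with both a and b through bucket h2, and d shares no bucket with any other document and so appears in no pair.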

Examples

dir <- system.file("extdata/legal", package = "textreuse")
minhash <- minhash_generator(200, seed = 234)
corpus <- TextReuseCorpus(dir = dir,
                          tokenizer = tokenize_ngrams, n = 5,
                          minhash_func = minhash)
buckets <- lsh(corpus, bands = 50)
lsh_candidates(buckets)