Superlatively Fast Fuzzy Joins
Fit a Probabilistic Matching Model using Naive Bayes + E.M.
Plot S-Curve for an LSH with given hyperparameters
Find Probability of Match Based on Similarity
Fuzzy joins for Euclidean distance using Locality Sensitive Hashing
Perform a Fuzzy-Join With an Arbitrary Distance Metric
Calculate Hamming distance of two character vectors
Find Probability of Match Based on Similarity
Fuzzy joins for Hamming distance using Locality Sensitive Hashing
Plot S-Curve for an LSH with given hyperparameters
Help Choose the Appropriate LSH Hyperparameters
Find Probability of Match Based on Similarity
Calculate Jaccard Similarity of two character vectors
Fuzzy String Grouping Using Minhashing
Fuzzy joins for Jaccard distance using MinHash
Empowers users to fuzzily merge data frames with millions or tens of millions of rows in minutes with low memory usage. The package uses the locality sensitive hashing algorithms developed by Datar, Immorlica, Indyk and Mirrokni (2004) <doi:10.1145/997817.997857>, and Broder (1998) <doi:10.1109/SEQUEN.1997.666900> to avoid comparing every pair of records across the two datasets, resulting in fuzzy merges that finish in linear time.
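To illustrate how locality sensitive hashing avoids comparing every pair of records, here is a minimal Python sketch of a MinHash-based fuzzy join for Jaccard similarity. It is not the package's implementation or API: the function names (`shingles`, `minhash_signature`, `fuzzy_inner_join`, `match_probability`), the default hyperparameters, and the choice of BLAKE2 as the hash family are all assumptions made for illustration. Only records that share at least one band of their MinHash signature are compared exactly.

```python
import hashlib
from collections import defaultdict


def shingles(text, width=2):
    """Set of character n-grams of the given width (lower-cased)."""
    text = text.lower()
    return {text[i:i + width] for i in range(max(len(text) - width + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed (illustrative choice)."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens, n_hashes):
    """MinHash signature: the minimum hash value under each seeded hash function."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(n_hashes))


def fuzzy_inner_join(left, right, n_bands=20, band_width=2, threshold=0.6):
    """Return (left, right, similarity) triples whose Jaccard similarity meets the threshold."""
    n_hashes = n_bands * band_width

    # Index the right-hand records: one bucket per (band, band contents).
    right_sets = [shingles(s) for s in right]
    buckets = defaultdict(set)
    for j, tokens in enumerate(right_sets):
        sig = minhash_signature(tokens, n_hashes)
        for band in range(n_bands):
            key = (band, sig[band * band_width:(band + 1) * band_width])
            buckets[key].add(j)

    # Probe with the left-hand records; verify only colliding candidates.
    matches = []
    for s in left:
        tokens = shingles(s)
        sig = minhash_signature(tokens, n_hashes)
        candidates = set()
        for band in range(n_bands):
            key = (band, sig[band * band_width:(band + 1) * band_width])
            candidates |= buckets.get(key, set())
        for j in sorted(candidates):
            sim = jaccard(tokens, right_sets[j])
            if sim >= threshold:
                matches.append((s, right[j], sim))
    return matches


def match_probability(similarity, n_bands, band_width):
    """Chance that a pair with this Jaccard similarity shares at least one band."""
    return 1 - (1 - similarity ** band_width) ** n_bands
```

The last function is the standard banding formula, 1 - (1 - s^r)^b for b bands of width r. Plotted against s it gives an S-shaped curve: more bands make candidate pairs more likely at every similarity level, while wider bands make collisions rarer for dissimilar pairs.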