Approximate String Matching, Fuzzy Text Search, and String Distance Functions
Stringdist-based fuzzy text search
Approximate string matching
Phonetic algorithms
Detect the presence of non-printable or non-ASCII characters
Get a table of qgram counts from one or more character vectors
Approximate matching for integer sequences
Compute distance metrics between integer sequences
Get a table of qgram counts for integer sequences
Compute similarity scores between sequences of integers
Calling stringdist from C or C++
String metrics in stringdist
A package for string distance calculation and approximate string match...
Multithreading and parallelization in stringdist
Compute distance metrics between strings
Compute similarity scores between strings
Implements an approximate string matching version of R's native 'match' function. Also offers fuzzy text search based on various string distance measures. Can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-grams (q-gram, cosine, Jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of Soundex is provided as well. Distances can be computed between character vectors while taking proper care of encoding, or between integer vectors representing generic sequences. This package is built for speed and runs in parallel using 'OpenMP'. An API for C or C++ is exposed as well. Reference: MPJ van der Loo (2014) <doi:10.32614/RJ-2014-011>.
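
The idea of approximate matching on top of an edit distance can be sketched in a few lines. The following is a minimal pure-Python illustration of the concept, not the package's implementation or API: the names `levenshtein` and `approx_match` and the `max_dist` parameter are hypothetical and chosen only for this sketch. Indices are 0-based here, whereas R's 'match' returns 1-based positions.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (or keep)
            ))
        previous = current
    return previous[-1]


def approx_match(x, table, max_dist=1):
    """Index of the closest entry in `table` within `max_dist`, else None.

    Hypothetical helper: ties are resolved in favour of the first entry.
    """
    best_index, best_dist = None, max_dist + 1
    for index, candidate in enumerate(table):
        d = levenshtein(x, candidate)
        if d < best_dist:
            best_index, best_dist = index, d
    return best_index
```

With these definitions, `levenshtein("kitten", "sitting")` evaluates to 3, and `approx_match("leia", ["uhura", "leela"], max_dist=5)` selects the second entry, because "leela" is two edits away from "leia" while "uhura" is four. Lowering `max_dist` to 1 yields no match at all.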