gammaCKpar
Field comparisons for string variables. Three agreement patterns are considered: 0 (total disagreement), 1 (partial agreement), and 2 (agreement). The similarity between strings is computed with the measure selected by method, which is Jaro-Winkler by default.
gammaCKpar(matAp, matBp, n.cores, cut.a, cut.p, method, w)
matAp
: Vector storing the comparison field in data set 1.
matBp
: Vector storing the comparison field in data set 2.
n.cores
: Number of cores to parallelize over. Default is NULL.
cut.a
: Lower bound for a full match, ranging between 0 and 1. Default is 0.92.
cut.p
: Lower bound for a partial match, ranging between 0 and 1. Default is 0.88.
method
: String distance method. Options are "jw" Jaro-Winkler (default), "dl" Damerau-Levenshtein, "jaro" Jaro, and "lv" Edit (Levenshtein).
w
: Parameter that describes the importance of the first characters of a string (only needed if method = "jw"). Default is 0.10.
gammaCKpar returns a list with the indices corresponding to each matching pattern, which can be fed directly into tableCounts and matchesLink.
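The following sketch illustrates how the two cut-offs could partition similarity scores into the three agreement patterns. It is not the package's implementation: classify_agreement is a hypothetical helper, the similarity scores are made up for illustration, and the cut-offs are assumed to be inclusive lower bounds, as the descriptions of cut.a and cut.p suggest.

```r
# Hypothetical helper, not part of fastLink: map similarity scores in [0, 1]
# to agreement levels using the documented default cut-offs.
classify_agreement <- function(sim, cut.a = 0.92, cut.p = 0.88) {
  stopifnot(cut.p <= cut.a)
  # 2 = agreement, 1 = partial agreement, 0 = total disagreement
  # (assumption: both cut-offs are inclusive lower bounds)
  ifelse(sim >= cut.a, 2L, ifelse(sim >= cut.p, 1L, 0L))
}

# Made-up similarity scores, for illustration only
sim <- c(1.00, 0.95, 0.90, 0.88, 0.50)
classify_agreement(sim)
# 2 2 1 1 0
```

Raising cut.a in this sketch moves borderline pairs from agreement to partial agreement, while raising cut.p moves them from partial agreement to disagreement.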
## Not run:
# Compare the same field (first names) across the two data sets
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
## End(Not run)
Ted Enamorado ted.enamorado@gmail.com, Ben Fifield benfifield@gmail.com, and Kosuke Imai