gammaCKpar function


Field comparisons for string variables. Three agreement patterns are considered: 0 (total disagreement), 1 (partial agreement), and 2 (agreement). By default, the distance between strings is calculated using the Jaro-Winkler distance; other string distance measures can be selected through the method argument.

gammaCKpar(matAp, matBp, n.cores, cut.a, cut.p, method, w)

Arguments

  • matAp: vector storing the comparison field in data set 1
  • matBp: vector storing the comparison field in data set 2
  • n.cores: Number of cores to parallelize over. Default is NULL.
  • cut.a: Lower bound for full match, ranging between 0 and 1. Default is 0.92
  • cut.p: Lower bound for partial match, ranging between 0 and 1. Default is 0.88
  • method: String distance method. Options are "jw" Jaro-Winkler (default), "dl" Damerau-Levenshtein, "jaro" Jaro, and "lv" Levenshtein (edit distance)
  • w: Parameter that controls how much weight is given to the first characters of a string (only used if method = "jw"). Default is 0.10
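
To illustrate how cut.a and cut.p split string similarities into the three agreement patterns, here is a minimal base-R sketch. It is not the fastLink implementation: classify_agreement is a hypothetical helper, it uses a length-normalized Levenshtein similarity from utils::adist rather than Jaro-Winkler so that it runs without extra packages, and it assumes the cutoffs are inclusive lower bounds.

```r
# Hypothetical helper (not part of fastLink): assign every pair (a[i], b[j])
# an agreement pattern of 0, 1, or 2 from a similarity score in [0, 1].
classify_agreement <- function(a, b, cut.a = 0.92, cut.p = 0.88) {
  # Levenshtein distance between each element of a and each element of b
  d <- utils::adist(a, b)
  # Length of the longer string in each pair; floor at 1 to avoid 0 / 0
  max_len <- pmax(outer(nchar(a), nchar(b), pmax), 1)
  # Convert the distance to a similarity (1 means identical strings)
  sim <- 1 - d / max_len
  # Start at 0 (disagreement), then upgrade pairs that clear each cutoff
  pattern <- matrix(0L, nrow = length(a), ncol = length(b))
  pattern[sim >= cut.p] <- 1L  # partial agreement (assumed inclusive bound)
  pattern[sim >= cut.a] <- 2L  # agreement (assumed inclusive bound)
  pattern
}

a <- c("jonathan", "maria")
b <- c("jonathan", "jonathon", "mario")
classify_agreement(a, b, cut.a = 0.92, cut.p = 0.85)
```

In this sketch, rows index the elements of a and columns the elements of b. The cutoffs in the usage line are chosen for the Levenshtein-based similarity and are not recommendations for gammaCKpar. The function itself returns indices per pattern rather than a dense matrix.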

Returns

gammaCKpar returns a list with the indices corresponding to each matching pattern, which can be fed directly into tableCounts and matchesLink.

Examples

## Not run:
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
## End(Not run)

Author(s)

Ted Enamorado <ted.enamorado@gmail.com>, Ben Fifield <benfifield@gmail.com>, and Kosuke Imai

  • Maintainer: Ted Enamorado
  • License: GPL (>= 3)
  • Last published: 2023-11-17
