matchesLink function

matchesLink identifies all pairs of observations whose agreement pattern has a posterior match probability (zeta) within a specified interval of the Fellegi-Sunter weights, and returns the row indices of those pairs in both datasets.
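
The selection described above can be illustrated with a brute-force sketch in base R. It assumes each comparison field has already been reduced to an nobs.a x nobs.b matrix of agreement levels (0 = disagree, 1 = partial, 2 = agree) and that a posterior zeta is available for every observed agreement pattern. The names `select_pairs`, `agree`, and `zeta_by_pattern` are hypothetical, and inclusive bounds are an assumption. This only shows the logic; fastLink uses its own representation of the gamma objects and does not necessarily work this way.

```r
# Hypothetical sketch (not fastLink's implementation): keep every pair (i, j)
# whose agreement pattern has a posterior zeta in [lower, upper].
select_pairs <- function(agree, zeta_by_pattern, lower, upper) {
  # agree: list of nobs.a x nobs.b integer matrices of agreement levels,
  # one matrix per comparison field (assumed input format)
  nobs.a <- nrow(agree[[1]])
  nobs.b <- ncol(agree[[1]])
  # Enumerate all pairs; inds.a indexes dataset A, inds.b indexes dataset B
  pairs <- expand.grid(inds.a = seq_len(nobs.a), inds.b = seq_len(nobs.b))
  idx <- cbind(pairs$inds.a, pairs$inds.b)
  # Encode each pair's agreement pattern as a key such as "2-0"
  levels_by_field <- lapply(agree, function(m) m[idx])
  pattern <- do.call(paste, c(levels_by_field, sep = "-"))
  # Look up the posterior for each pattern (NA if the pattern is unknown)
  zeta <- unname(zeta_by_pattern[pattern])
  # Inclusive bounds are an assumption of this sketch
  keep <- !is.na(zeta) & zeta >= lower & zeta <= upper
  out <- as.matrix(pairs[keep, , drop = FALSE])
  rownames(out) <- NULL
  out
}

# Toy input: 2 records in A, 2 in B, two fields
first_name <- matrix(c(2L, 0L, 0L, 2L), nrow = 2)
last_name  <- matrix(c(2L, 0L, 1L, 2L), nrow = 2)
zeta_by_pattern <- c("2-2" = 0.99, "0-1" = 0.40, "0-0" = 0.01)
select_pairs(list(first_name, last_name), zeta_by_pattern, lower = 0.95, upper = 1)
# keeps pairs (1, 1) and (2, 2)
```

Enumerating the full cross product is only practical for toy inputs; it is used here because it makes the pattern-to-posterior lookup explicit.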

matchesLink(gammalist, nobs.a, nobs.b, em, thresh, n.cores = NULL)

Arguments

  • gammalist: A list of objects produced by either gammaKpar or gammaCKpar.
  • nobs.a: Number of observations in dataset A (dataset 1).
  • nobs.b: Number of observations in dataset B (dataset 2).
  • em: Parameters obtained from the Expectation-Maximization (EM) algorithm under the missing-at-random (MAR) assumption, as produced by emlinkMAR or emlinkMARmov.
  • thresh: The interval of posterior match probabilities (zeta) for the agreement patterns to examine more closely. Values range between 0 and 1. Can be a vector of length 1 (from the specified value to 1) or of length 2 (from the first specified value to the second).
  • n.cores: Number of cores to parallelize over. Default is NULL.
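
The rule for thresh can be made concrete with a small validating helper. `thresh_bounds` is a hypothetical name, not part of fastLink, and requiring a length-2 interval to be given low-to-high is an assumption of this sketch.

```r
# Hypothetical helper: turn a thresh argument into c(lower, upper)
# following the documented rule (length 1: value to 1; length 2: as given).
thresh_bounds <- function(thresh) {
  stopifnot(is.numeric(thresh), length(thresh) %in% 1:2,
            all(thresh >= 0 & thresh <= 1))
  bounds <- if (length(thresh) == 1) c(thresh, 1) else thresh
  stopifnot(bounds[1] <= bounds[2])  # assumed: interval given low-to-high
  bounds
}

thresh_bounds(0.95)          # lower = 0.95, upper = 1
thresh_bounds(c(0.75, 0.95)) # lower = 0.75, upper = 0.95
```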

Returns

matchesLink returns an nmatches x 2 matrix containing the indices of the matched rows in dataset A and dataset B, respectively.
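
The returned indices can be used to place each matched pair of records side by side. The sketch below uses toy data frames and a hand-written index matrix `ml` standing in for matchesLink output (all hypothetical), and assumes column 1 holds dataset A row indices and column 2 holds dataset B row indices, as described above.

```r
# Toy stand-ins for the linked datasets (hypothetical data)
dfA <- data.frame(firstname = c("ann", "bob", "cy"),
                  birthyear = c(1980, 1975, 1990), stringsAsFactors = FALSE)
dfB <- data.frame(firstname = c("bob", "ann"),
                  birthyear = c(1975, 1980), stringsAsFactors = FALSE)
# Stand-in for matchesLink output: column 1 indexes dfA, column 2 indexes dfB
ml <- matrix(c(1L, 2L, 2L, 1L), ncol = 2)

# Pull the matched rows from each dataset and tag their columns
a <- dfA[ml[, 1], , drop = FALSE]
b <- dfB[ml[, 2], , drop = FALSE]
names(a) <- paste0(names(a), ".A")
names(b) <- paste0(names(b), ".B")
linked <- cbind(a, b)
rownames(linked) <- NULL
linked
```

Indexing with `ml[, 1]` and `ml[, 2]` works the same way whether the result is stored as a matrix or as a data frame.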

Examples

## Not run:
## Calculate gammas
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
g2 <- gammaCKpar(dfA$middlename, dfB$middlename)
g3 <- gammaCKpar(dfA$lastname, dfB$lastname)
g4 <- gammaKpar(dfA$birthyear, dfB$birthyear)

## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA), nobs.b = nrow(dfB))

## Run EM
em <- emlinkMAR(tc)

## Get matches
ml <- matchesLink(list(g1, g2, g3, g4), nobs.a = nrow(dfA), nobs.b = nrow(dfB),
                  em = em, thresh = .95)
## End(Not run)

Author(s)

Ted Enamorado <ted.enamorado@gmail.com>, Ben Fifield <benfifield@gmail.com>, and Kosuke Imai

  • Maintainer: Ted Enamorado
  • License: GPL (>= 3)
  • Last published: 2023-11-17
