emlinklog function

Expectation-Maximization algorithm for Record Linkage allowing for dependencies across linkage fields

emlinklog(patterns, nobs.a, nobs.b, p.m, p.gamma.j.m, p.gamma.j.u, iter.max, tol, varnames)

Arguments

  • patterns: Table that holds the counts for each unique agreement pattern. This object is produced by the function tableCounts.
  • nobs.a: Number of observations in dataset A
  • nobs.b: Number of observations in dataset B
  • p.m: Probability of finding a match, used as the starting value for the EM algorithm. Default is 0.1.
  • p.gamma.j.m: Probability of observing a specific agreement pattern, conditional on being in the matched set.
  • p.gamma.j.u: Probability of observing a specific agreement pattern, conditional on being in the non-matched set.
  • iter.max: Maximum number of iterations. Default is 5000.
  • tol: Convergence tolerance. Default is 1e-05.
  • varnames: The vector of variable names used for matching. Automatically provided if using the fastLink() wrapper. Used for clean visualization of EM results in summary functions.
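
To make the roles of p.m, iter.max, and tol concrete, the following is a minimal sketch of an EM loop over binary agreement patterns. It is not the implementation behind emlinklog: it assumes the linkage fields are conditionally independent given match status, which is exactly the assumption emlinklog relaxes. The function name em_sketch, the starting values for m and u, and the toy counts are all hypothetical.

```r
# Simplified EM for a two-class (match / non-match) mixture over binary
# agreement patterns. ASSUMPTION: fields are conditionally independent given
# match status; emlinklog allows dependencies, so this is only an illustration.
em_sketch <- function(gamma, counts, p.m = 0.1, iter.max = 5000, tol = 1e-05) {
  K <- ncol(gamma)
  m <- rep(0.9, K)  # hypothetical start: P(agree on field k | match)
  u <- rep(0.1, K)  # hypothetical start: P(agree on field k | non-match)
  for (iter in seq_len(iter.max)) {
    # E-step: likelihood of each pattern under each class, then the posterior
    lik.m <- apply(gamma, 1, function(g) prod(m^g * (1 - m)^(1 - g)))
    lik.u <- apply(gamma, 1, function(g) prod(u^g * (1 - u)^(1 - g)))
    zeta <- p.m * lik.m / (p.m * lik.m + (1 - p.m) * lik.u)
    # M-step: count-weighted updates of p.m, m, and u
    w.m <- counts * zeta
    w.u <- counts * (1 - zeta)
    updated <- c(sum(w.m) / sum(counts),
                 colSums(gamma * w.m) / sum(w.m),
                 colSums(gamma * w.u) / sum(w.u))
    # Stop once no parameter moves by more than tol
    delta <- max(abs(updated - c(p.m, m, u)))
    p.m <- updated[1]
    m <- updated[2:(K + 1)]
    u <- updated[(K + 2):(2 * K + 1)]
    if (delta < tol) break
  }
  list(p.m = p.m, m = m, u = u, zeta.j = zeta, iter.converge = iter)
}

# Toy input: all 8 patterns for three binary fields, with made-up counts
gamma <- as.matrix(expand.grid(f1 = 0:1, f2 = 0:1, f3 = 0:1))
counts <- c(6926, 774, 774, 126, 774, 126, 126, 374)
fit <- em_sketch(gamma, counts)
print(fit$p.m)
print(fit$iter.converge)
```

The loop stops either when the largest parameter change falls below tol or after iter.max iterations, whichever comes first, which is the usual meaning of these two arguments in an EM routine.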

Returns

emlinklog returns a list with the following components:

  • zeta.j: The posterior match probabilities for each unique pattern.

  • p.m: The probability of finding a match.

  • p.u: The probability of finding a non-match.

  • p.gamma.j.m: The probability of observing a particular agreement pattern conditional on being in the set of matches.

  • p.gamma.j.u: The probability of observing a particular agreement pattern conditional on being in the set of non-matches.

  • patterns.w: Counts of the agreement patterns observed, along with the Fellegi-Sunter weights.

  • iter.converge: The number of iterations it took the EM algorithm to converge.

  • nobs.a: The number of observations in dataset A.

  • nobs.b: The number of observations in dataset B.
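
The relationship between several of these components can be sketched with Bayes' rule. The snippet below uses made-up values for three agreement patterns and assumes p.u = 1 - p.m; the natural-log form of the Fellegi-Sunter weight is also an assumption here, since the log base used in patterns.w is not stated above.

```r
# Toy values (assumed for illustration): three agreement patterns
p.m <- 0.1
p.u <- 1 - p.m
p.gamma.j.m <- c(0.70, 0.25, 0.05)  # P(pattern j | match)
p.gamma.j.u <- c(0.01, 0.09, 0.90)  # P(pattern j | non-match)

# Posterior match probability for each pattern via Bayes' rule
zeta.j <- (p.m * p.gamma.j.m) / (p.m * p.gamma.j.m + p.u * p.gamma.j.u)

# Fellegi-Sunter weight: log likelihood ratio of match versus non-match
# (natural log is an assumption of this sketch)
fs.weights <- log(p.gamma.j.m / p.gamma.j.u)

print(round(zeta.j, 3))
print(round(fs.weights, 3))
```

Patterns that are much more likely among matches than among non-matches receive both a large positive weight and a posterior close to 1.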

Examples

## Not run:
## Calculate gammas
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
g2 <- gammaCKpar(dfA$middlename, dfB$middlename)
g3 <- gammaCKpar(dfA$lastname, dfB$lastname)
g4 <- gammaKpar(dfA$birthyear, dfB$birthyear)

## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA), nobs.b = nrow(dfB))

## Run EM
em.log <- emlinklog(tc, nobs.a = nrow(dfA), nobs.b = nrow(dfB))
## End(Not run)

Author(s)

Ted Enamorado <ted.enamorado@gmail.com> and Benjamin Fifield

  • Maintainer: Ted Enamorado
  • License: GPL (>= 3)
  • Last published: 2023-11-17
