emlinkRS function

Calculates Fellegi-Sunter weights and posterior zeta probabilities for matching patterns that are observed in a larger population but are not present in the sub-sample used to fit the EM model.

emlinkRS(patterns.out, em.out, nobs.a, nobs.b)

Arguments

  • patterns.out: The output from tableCounts() or emlinkMARmov() (run on full dataset), containing all observed matching patterns in the full sample and the number of times that pattern is observed.
  • em.out: The output from emlinkMARmov(), an EM object estimated on a smaller random sample, to be applied to the pattern counts from the larger sample.
  • nobs.a: Total number of observations in dataset A
  • nobs.b: Total number of observations in dataset B

Returns

emlinkRS returns a list with the following components:

  • zeta.j: The posterior match probabilities for each unique pattern.

  • p.m: The posterior probability of a pair matching.

  • p.u: The posterior probability of a pair not matching.

  • p.gamma.k.m: The posterior of the matching probability for a specific matching field.

  • p.gamma.k.u: The posterior of the non-matching probability for a specific matching field.

  • p.gamma.j.m: The posterior probability that a pair is in the matched set given a particular agreement pattern.

  • p.gamma.j.u: The posterior probability that a pair is in the unmatched set given a particular agreement pattern.

  • patterns.w: Counts of the agreement patterns observed, along with the Fellegi-Sunter weights.

  • iter.converge: The number of iterations it took the EM algorithm to converge.

  • nobs.a: The number of observations in dataset A.

  • nobs.b: The number of observations in dataset B.
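
For orientation, the weights and posterior probabilities listed above can be written in the standard Fellegi-Sunter form. This is a sketch in textbook notation, not taken from this page: the symbols \lambda, \gamma_j, M and U are assumptions introduced here, and their correspondence to p.m, p.gamma.j.m, p.gamma.j.u and zeta.j is a plausible reading of the component names rather than something the page confirms. The package's treatment of missing values is not shown.

```latex
% Assumed notation (not from the original page):
%   \gamma_j  : the j-th agreement pattern
%   M, U      : the matched and unmatched sets of record pairs
%   \lambda   : overall probability that a pair is a match (cf. p.m)

% Fellegi-Sunter weight for pattern j (cf. the weights in patterns.w)
w_j = \log \frac{\Pr(\gamma_j \mid M)}{\Pr(\gamma_j \mid U)}

% Posterior match probability for pattern j (cf. zeta.j), by Bayes' rule
\zeta_j = \frac{\lambda \, \Pr(\gamma_j \mid M)}
               {\lambda \, \Pr(\gamma_j \mid M) + (1 - \lambda) \, \Pr(\gamma_j \mid U)}
```

Under this reading, a pattern that appears only in the full data can still be scored, because both quantities depend only on parameters estimated from the sub-sample.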

Examples

## Not run:
## -------------
## Run on subset
## -------------
dfA.s <- dfA[sample(1:nrow(dfA), 50), ]
dfB.s <- dfB[sample(1:nrow(dfB), 50), ]

## Calculate gammas
g1 <- gammaCKpar(dfA.s$firstname, dfB.s$firstname)
g2 <- gammaCKpar(dfA.s$middlename, dfB.s$middlename)
g3 <- gammaCKpar(dfA.s$lastname, dfB.s$lastname)
g4 <- gammaKpar(dfA.s$birthyear, dfB.s$birthyear)

## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA.s), nobs.b = nrow(dfB.s))

## Run EM
em <- emlinkMARmov(tc, nobs.a = nrow(dfA.s), nobs.b = nrow(dfB.s))

## ------------------
## Apply to full data
## ------------------
## Calculate gammas
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
g2 <- gammaCKpar(dfA$middlename, dfB$middlename)
g3 <- gammaCKpar(dfA$lastname, dfB$lastname)
g4 <- gammaKpar(dfA$birthyear, dfB$birthyear)

## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA), nobs.b = nrow(dfB))

## Apply the sub-sample EM estimates to the full-data pattern counts
em.full <- emlinkRS(tc, em, nrow(dfA), nrow(dfB))
## End(Not run)

Author(s)

Ted Enamorado ted.enamorado@gmail.com and Ben Fifield benfifield@gmail.com

  • Maintainer: Ted Enamorado
  • License: GPL (>= 3)
  • Last published: 2023-11-17
