emlinkRS() R function from [fastLink]

emlinkRS

Calculates Fellegi-Sunter weights and posterior zeta probabilities for matching patterns that are observed in a larger population but are not present in the sub-sample used to fit the EM model.
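As a rough illustration of the two quantities named above, the following base R sketch computes a Fellegi-Sunter weight and a posterior match probability for every possible agreement pattern, given a set of already-estimated parameters. It assumes binary agreement fields and conditional independence across fields. The parameter values and the helper pattern_lik() are hypothetical, and the code does not call fastLink, so it should not be read as the package's internal implementation.

```r
# Hypothetical parameters for 3 binary fields (1 = agree, 0 = disagree).
# These values are made up for illustration; they are not fastLink output.
p.m     <- 0.1                   # assumed overall probability of a match
m.agree <- c(0.95, 0.90, 0.85)   # P(field agrees | pair is a match)
u.agree <- c(0.05, 0.10, 0.20)   # P(field agrees | pair is a non-match)

# Every possible agreement pattern, including ones a small sample may never show
patterns <- as.matrix(expand.grid(f1 = 0:1, f2 = 0:1, f3 = 0:1))

# Likelihood of one pattern, assuming fields are conditionally independent
pattern_lik <- function(pat, p.agree) {
  prod(ifelse(pat == 1, p.agree, 1 - p.agree))
}
p.gamma.j.m <- apply(patterns, 1, pattern_lik, p.agree = m.agree)
p.gamma.j.u <- apply(patterns, 1, pattern_lik, p.agree = u.agree)

# Fellegi-Sunter weight: log of the likelihood ratio for each pattern
weights <- log(p.gamma.j.m / p.gamma.j.u)

# Posterior match probability for each pattern, by Bayes' rule
zeta.j <- p.m * p.gamma.j.m /
  (p.m * p.gamma.j.m + (1 - p.m) * p.gamma.j.u)

print(cbind(patterns, weights = round(weights, 3), zeta.j = round(zeta.j, 3)))
```

Because the parameters alone determine both quantities, a pattern that never occurred in the estimation sample can still be scored once it shows up in the full data.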


emlinkRS(patterns.out, em.out, nobs.a, nobs.b)

Arguments

patterns.out: The output from tableCounts() or emlinkMARmov() run on the full dataset, containing every matching pattern observed in the full sample and the number of times each pattern is observed.
em.out: The output from emlinkMARmov(), an EM object estimated on a smaller random sample, to be applied to the counts from the larger sample.
nobs.a: Total number of observations in dataset A
nobs.b: Total number of observations in dataset B
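To make the idea of "patterns and their counts" concrete, here is a minimal base R sketch that collapses pairwise agreement indicators into one row per unique pattern. The data frame agree and the column name counts are hypothetical. The object actually returned by tableCounts() is structured differently, so this only illustrates the concept behind patterns.out.

```r
# Hypothetical agreement indicators for 6 record pairs on 2 fields
# (1 = agree, 0 = disagree); not produced by fastLink.
agree <- data.frame(
  firstname = c(1, 1, 0, 1, 0, 0),
  lastname  = c(1, 1, 0, 0, 0, 0)
)

# Collapse to one row per unique pattern with the number of pairs showing it
pattern.counts <- aggregate(
  counts ~ firstname + lastname,
  data = cbind(agree, counts = 1),
  FUN  = sum
)

print(pattern.counts)
```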

Returns

emlinkRS returns a list with the following components:

zeta.j: The posterior match probabilities for each unique pattern.
p.m: The posterior probability of a pair matching.
p.u: The posterior probability of a pair not matching.
p.gamma.k.m: The posterior of the matching probability for a specific matching field.
p.gamma.k.u: The posterior of the non-matching probability for a specific matching field.
p.gamma.j.m: The posterior probability that a pair is in the matched set given a particular agreement pattern.
p.gamma.j.u: The posterior probability that a pair is in the unmatched set given a particular agreement pattern.
patterns.w: Counts of the agreement patterns observed, along with the Fellegi-Sunter weights.
iter.converge: The number of iterations it took the EM algorithm to converge.
nobs.a: The number of observations in dataset A.
nobs.b: The number of observations in dataset B.
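One way the returned components might be used is sketched below: keep the patterns whose posterior match probability clears a cutoff, and sum the posterior-weighted counts to get an expected number of matches. The list res is a mock with made-up numbers, not real emlinkRS() output. The 0.85 cutoff, the "counts" column name, and the assumption that zeta.j is ordered like the rows of patterns.w are all illustrative assumptions.

```r
# Mock result with the general shape described above (NOT real output)
res <- list(
  zeta.j     = c(0.001, 0.20, 0.92, 0.999),
  patterns.w = cbind(counts = c(9000, 500, 40, 60))
)

threshold <- 0.85                      # assumed cutoff for declaring a match
keep <- res$zeta.j >= threshold        # patterns treated as matches

# Number of record pairs falling in the retained patterns
n.pairs.kept <- sum(res$patterns.w[keep, "counts"])

# Expected number of true matches across all patterns
expected.matches <- sum(res$zeta.j * res$patterns.w[, "counts"])

print(c(n.pairs.kept = n.pairs.kept, expected.matches = expected.matches))
```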

Examples


## Not run:

## -------------
## Run on subset
## -------------
dfA.s <- dfA[sample(1:nrow(dfA), 50), ]
dfB.s <- dfB[sample(1:nrow(dfB), 50), ]

## Calculate gammas
g1 <- gammaCKpar(dfA.s$firstname, dfB.s$firstname)
g2 <- gammaCKpar(dfA.s$middlename, dfB.s$middlename)
g3 <- gammaCKpar(dfA.s$lastname, dfB.s$lastname)
g4 <- gammaKpar(dfA.s$birthyear, dfB.s$birthyear)

## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA.s), nobs.b = nrow(dfB.s))

## Run EM
em <- emlinkMARmov(tc, nobs.a = nrow(dfA.s), nobs.b = nrow(dfB.s))

## ------------------
## Apply to full data
## ------------------

## Calculate gammas
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
g2 <- gammaCKpar(dfA$middlename, dfB$middlename)
g3 <- gammaCKpar(dfA$lastname, dfB$lastname)
g4 <- gammaKpar(dfA$birthyear, dfB$birthyear)

## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA), nobs.b = nrow(dfB))

## Apply the EM estimates from the subset to the full-data pattern counts
em.full <- emlinkRS(tc, em, nrow(dfA), nrow(dfB))
## End(Not run)

Author(s)

Ted Enamorado <ted.enamorado@gmail.com> and Ben Fifield <benfifield@gmail.com>

fastLink package

Maintainer: Ted Enamorado
License: GPL (>= 3)
Last published: 2023-11-17
