problink_em function

Calculate EM-estimates of m- and u-probabilities

problink_em( formula, data, patterns, mprobs0 = list(0.95), uprobs0 = list(0.02), p0 = 0.05, tol = 1e-05, mprob_max = 0.999, uprob_min = 1e-04 )

Arguments

  • formula: a formula object with the variables for which to calculate the m- and u-probabilities. Should be of the form ~ var1 + var2.

  • data: the data set of compared pairs (for example, the output of compare_pairs) on which to estimate the model. Alternatively, use the patterns argument.

  • patterns: table of patterns (as output by tabulate_patterns).

  • mprobs0, uprobs0: initial values of the m- and u-probabilities. These should be lists with numeric values. The names of the elements in the list should correspond to the names in by_x in compare_pairs.

  • p0: the initial estimate of the probability that a pair is a match.

  • tol: the algorithm stops when the change in the m- and u-probabilities is smaller than tol.

  • mprob_max: the maximum value of the estimated m-probabilities. Values equal to one can lead to numerical instabilities.

  • uprob_min: the minimum value of the estimated u-probabilities. Values equal to zero can lead to numerical instabilities.
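The EM iteration that these arguments control can be sketched in plain base R. This is a minimal illustration, not the reclin2 source. It assumes binary agreement patterns (1 = agree, 0 = disagree) and conditional independence of the comparison variables given match status. It also assumes that the bounds mprob_max and uprob_min are applied after each M-step and that the stopping rule is the largest absolute change in any m- or u-probability falling below tol. The helpers pattern_lik and em_sketch and the extra max_iter argument are hypothetical. The pattern counts are generated from known parameters so that the fit can be checked.

```r
# Hypothetical helper: likelihood of each row of the 0/1 pattern matrix G
# under independent Bernoulli agreement probabilities `probs`.
pattern_lik <- function(G, probs) {
  apply(G, 1, function(g) prod(probs^g * (1 - probs)^(1 - g)))
}

# Minimal EM sketch for the m- and u-probabilities (not the reclin2 source).
# G: 0/1 matrix of agreement patterns; n: number of pairs with each pattern.
em_sketch <- function(G, n, mprobs0, uprobs0, p0 = 0.05, tol = 1e-5,
                      mprob_max = 0.999, uprob_min = 1e-4, max_iter = 1000) {
  m <- mprobs0
  u <- uprobs0
  p <- p0
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each pattern belongs to a match
    lm <- p * pattern_lik(G, m)
    lu <- (1 - p) * pattern_lik(G, u)
    w <- lm / (lm + lu)
    # M-step: count-weighted proportions (n * w * G scales row i by n[i] * w[i]),
    # bounded by mprob_max and uprob_min (assumed placement of the bounds)
    p <- sum(n * w) / sum(n)
    m_new <- pmin(colSums(n * w * G) / sum(n * w), mprob_max)
    u_new <- pmax(colSums(n * (1 - w) * G) / sum(n * (1 - w)), uprob_min)
    # Stop when no m- or u-probability changed by tol or more
    converged <- max(abs(c(m_new - m, u_new - u))) < tol
    m <- m_new
    u <- u_new
    if (converged) break
  }
  list(mprobs = m, uprobs = u, p = p)
}

# Expected pattern counts for 10000 pairs, generated from known parameters
G <- as.matrix(expand.grid(rep(list(0:1), 4)))
colnames(G) <- c("lastname", "firstname", "address", "sex")
m_true <- c(0.9, 0.85, 0.95, 0.8)
u_true <- c(0.05, 0.1, 0.02, 0.3)
n <- 10000 * (0.1 * pattern_lik(G, m_true) + 0.9 * pattern_lik(G, u_true))

fit <- em_sketch(G, n, mprobs0 = rep(0.95, 4), uprobs0 = rep(0.02, 4))
fit
```

Starting values close to the defaults of problink_em (high m, low u, small p) steer EM towards the solution in which "match" is the class with high agreement rates. Poor starting values can leave it at a label-swapped or otherwise poor local optimum.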

Returns

An object of class problink_em: a list containing the estimated m-probabilities (mprobs), the estimated u-probabilities (uprobs) and the overall linkage probability p, together with the table of comparison patterns.
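In the standard Fellegi-Sunter model with conditionally independent binary comparisons, the estimated m-probabilities, u-probabilities and p combine into the posterior probability that a pair with a given agreement pattern is a match. The following is a minimal sketch under that assumption. posterior_match is a hypothetical helper, not a reclin2 function, and the parameter values are made up rather than taken from a fitted model.

```r
# Hypothetical helper (not part of reclin2): posterior match probability
# for one binary agreement pattern g, given m-/u-probabilities and p.
posterior_match <- function(g, mprobs, uprobs, p) {
  lm <- p * prod(mprobs^g * (1 - mprobs)^(1 - g))        # P(pattern, match)
  lu <- (1 - p) * prod(uprobs^g * (1 - uprobs)^(1 - g))  # P(pattern, non-match)
  lm / (lm + lu)
}

# Made-up values standing in for the components of a fitted model
mprobs <- c(lastname = 0.9, firstname = 0.85)
uprobs <- c(lastname = 0.05, firstname = 0.1)
p <- 0.1

posterior_match(c(1, 1), mprobs, uprobs, p)  # approx. 0.944
posterior_match(c(0, 0), mprobs, uprobs, p)  # approx. 0.0019
```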

Examples

```r
library(reclin2)
data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
summary(model)
```

References

Fellegi, I. and A. Sunter (1969). "A Theory for Record Linkage". Journal of the American Statistical Association, 64(328), pp. 1183-1210. doi:10.2307/2286061.

Herzog, T.N., F.J. Scheuren and W.E. Winkler (2007). Data Quality and Record Linkage Techniques, Springer.

  • Maintainer: Jan van der Laan
  • License: GPL-3
  • Last published: 2024-02-09