formula: a formula object with the variables for which to calculate the m- and u-probabilities. Should be of the form ~ var1 + var2.
data: data set with pairs on which to estimate the model. Alternatively one can use the patterns argument.
patterns: table of patterns (as output by tabulate_patterns).
mprobs0, uprobs0: initial values of the m- and u-probabilities. These should be lists with numeric values. The names of the elements in the list should correspond to the names in by_x in compare_pairs.
p0: the initial estimate of the probability that a pair is a match.
tol: the algorithm stops when the change in the m- and u-probabilities is smaller than tol.
mprob_max: maximum values of the estimated m-probabilities. Values equal to one can lead to numerical instabilities.
uprob_min: minimum values of the estimated u-probabilities. Values equal to zero can lead to numerical instabilities.
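The roles of p0, tol, mprob_max and uprob_min can be illustrated with a minimal base-R sketch of an EM loop for a two-class model on 0/1 comparison vectors. This is not the package's implementation: em_sketch, its max_iter argument and the simulated data are hypothetical, and the initial values are passed as plain numeric vectors rather than lists to keep the example short.

```r
# Hypothetical helper (not part of any package): EM for a match / non-match
# mixture where the 0/1 comparisons are assumed independent within each class.
em_sketch <- function(gamma, mprobs0, uprobs0, p0 = 0.05, tol = 1e-5,
                      mprob_max = 0.999, uprob_min = 1e-4, max_iter = 1000) {
  m <- mprobs0
  u <- uprobs0
  p <- p0
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each pair is a match
    lik_m <- apply(gamma, 1, function(g) prod(m^g * (1 - m)^(1 - g)))
    lik_u <- apply(gamma, 1, function(g) prod(u^g * (1 - u)^(1 - g)))
    w <- p * lik_m / (p * lik_m + (1 - p) * lik_u)
    # M-step: weighted agreement rates, bounded away from 1 and 0
    m_new <- pmin(colSums(w * gamma) / sum(w), mprob_max)
    u_new <- pmax(colSums((1 - w) * gamma) / sum(1 - w), uprob_min)
    p <- mean(w)
    # Stop when the m- and u-probabilities change by less than tol
    change <- max(abs(c(m_new - m, u_new - u)))
    m <- m_new
    u <- u_new
    if (change < tol) break
  }
  list(mprobs = m, uprobs = u, p = p, iterations = iter)
}

# Simulated comparison vectors (made-up agreement rates, for illustration only)
set.seed(1)
n <- 2000
is_match <- runif(n) < 0.1
gamma <- cbind(
  var1 = rbinom(n, 1, ifelse(is_match, 0.90, 0.05)),
  var2 = rbinom(n, 1, ifelse(is_match, 0.95, 0.10)),
  var3 = rbinom(n, 1, ifelse(is_match, 0.80, 0.20))
)
fit <- em_sketch(gamma, mprobs0 = rep(0.95, 3), uprobs0 = rep(0.02, 3))
```

In this sketch the bounds are applied after every M-step, so an estimate can never reach exactly one or zero, which is the situation the mprob_max and uprob_min arguments are described as guarding against.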
Returns
Returns an object of type problink_em. This is a list containing the estimated mprobs, uprobs and overall linkage probability p. It also contains the table of comparison patterns.
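As a rough illustration of how the three estimated quantities fit together, the sketch below combines m-probabilities, u-probabilities and p into a Fellegi-Sunter style weight and a posterior match probability for each comparison pattern. The numbers are invented for the example and the column names weight and mpost are assumptions; a fitted problink_em object would supply the real estimates.

```r
# Illustrative values only; not the output of a real fit.
mprobs <- c(var1 = 0.9, var2 = 0.8)
uprobs <- c(var1 = 0.1, var2 = 0.2)
p <- 0.05

# All 0/1 comparison patterns for two variables
patterns <- expand.grid(var1 = 0:1, var2 = 0:1)
g <- as.matrix(patterns)

# Log likelihood ratio contributed by an agreement or a disagreement
agree <- log(mprobs / uprobs)
disagree <- log((1 - mprobs) / (1 - uprobs))
patterns$weight <- drop(g %*% agree + (1 - g) %*% disagree)

# Posterior probability that a pair with this pattern is a match
lr <- exp(patterns$weight)
patterns$mpost <- p * lr / (p * lr + (1 - p))
```

Patterns with more agreements get a higher weight, and the prior p scales how much evidence is needed before the posterior favours a match.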
Fellegi, I. and A. Sunter (1969). "A Theory for Record Linkage", Journal of the American Statistical Association. 64 (328): pp. 1183-1210. doi:10.2307/2286061.
Herzog, T.N., F.J. Scheuren and W.E. Winkler (2007). Data Quality and Record Linkage Techniques, Springer.