reclin2 0.5.0 package

Record Linkage Toolkit

add_from_x

Add a variable from one of the data sets to pairs

cluster_call

Call a function on each of the worker nodes and pass it the pairs

cluster_collect

Collect pairs from cluster nodes

cluster_modify_pairs

Call a function on each of the worker nodes to modify the pairs on the...

cluster_pair

Generate all possible pairs using multiple processes

cluster_pair_blocking

Generate pairs using simple blocking using multiple processes

cluster_pair_minsim

Generate pairs with a minimal similarity using multiple processes

comparators

Comparison functions

compare_pairs

Compare pairs on a set of variables common in both data sets

compare_vars

Compare pairs on given variables

deduplicate_equivalence

Deduplication using equivalence groups

get_inspect_pairs

Get a subset of pairs to inspect

greedy

Greedy one-to-one matching of pairs

link

Use the selected pairs to generate a linked data set

linkexample

Tiny example dataset for probabilistic linkage

match_n_to_m

Force n to m matching on a set of pairs

merge_pairs

Merge two sets of pairs into one

pair

Generate all possible pairs

pair_blocking

Generate pairs using simple blocking

pair_minsim

Generate pairs with a minimal similarity

predict.problink_em

Calculate weights and probabilities for pairs

problink_em

Calculate EM-estimates of m- and u-probabilities

score_simple

Score pairs based on a number of comparison vectors

select_n_to_m

Select matching pairs enforcing one-to-one linkage

select_threshold

Select matching pairs with a score above or equal to a threshold

select_unique

Deselect pairs that are linked to multiple records

summary.problink_em

Summarise the results from problink_em

tabulate_patterns

Create a table of comparison patterns

town_names

Spelling variations of a set of town names

Functions to assist in performing probabilistic record linkage and deduplication: generating pairs, comparing records, estimating m- and u-probabilities with the EM algorithm (I. Fellegi & A. Sunter (1969) <doi:10.1080/01621459.1969.10501049>; T.N. Herzog, F.J. Scheuren, & W.E. Winkler (2007), "Data Quality and Record Linkage Techniques", ISBN:978-0-387-69502-0), and forcing one-to-one matching. Can also be used for pre- and post-processing for machine learning methods for record linkage. The focus is on memory use, CPU performance, and flexibility.
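
To illustrate what the m- and u-probabilities mentioned above are used for, here is a minimal sketch of a Fellegi-Sunter style match weight. It is not the package's implementation: it is written in Python rather than R, and the function name `match_weight`, the field names, and the probability values are all hypothetical, chosen only for illustration. For each compared field, m is the probability that the field agrees when two records are a true match and u is the probability that it agrees when they are not. The base of the logarithm is a convention; this sketch assumes the natural log.

```python
import math

def match_weight(agreements, m, u):
    """Sum the log-likelihood ratios over all compared fields.

    agreements: dict mapping field name -> True if the pair agrees on it
    m, u: dicts mapping field name -> m- and u-probability (assumed values)
    """
    total = 0.0
    for field, agrees in agreements.items():
        if agrees:
            # Agreement is evidence for a match when m > u.
            total += math.log(m[field] / u[field])
        else:
            # Disagreement is evidence against a match when m > u.
            total += math.log((1 - m[field]) / (1 - u[field]))
    return total

# Hypothetical probabilities, for illustration only.
m = {"lastname": 0.9, "sex": 0.95}
u = {"lastname": 0.01, "sex": 0.5}

w_agree = match_weight({"lastname": True, "sex": True}, m, u)
w_disagree = match_weight({"lastname": False, "sex": False}, m, u)
print(w_agree, w_disagree)
```

With these assumed values, a pair that agrees on both fields gets a positive weight and a pair that disagrees on both gets a negative one. A threshold on such a weight is one way to decide which pairs to select as matches.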

  • Maintainer: Jan van der Laan
  • License: GPL-3
  • Last published: 2024-02-09