Generates all combinations of records from x and y where the blocking variables are equal.
pair_blocking(x, y, on, deduplication = FALSE, add_xy = TRUE)
Arguments
x: first data.frame
y: second data.frame. Ignored when deduplication = TRUE.
on: the variables defining the blocks or strata for which all pairs of x and y will be generated.
deduplication: generate pairs from x only and ignore y. This is useful for deduplication of x.
add_xy: add x and y as attributes to the returned pairs. This makes it easier to call subsequent operations that need x and y (such as compare_pairs).
Returns
A data.table with two columns, .x and .y, is returned. Columns .x and .y are row numbers from data.frames x and y respectively.
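A minimal usage sketch follows. It assumes the function comes from the reclin2 package (the package name is not stated on this page), and the data frames and their postcode column are hypothetical example data.

```r
# Assumption: pair_blocking() is provided by the reclin2 package.
library(reclin2)

# Hypothetical example data; only the postcode column is used for blocking.
x <- data.frame(name = c("Ann", "Bob", "Cid", "Dee"),
                postcode = c("1234", "1234", "5678", "9999"))
y <- data.frame(name = c("Anne", "Sid", "Cyd"),
                postcode = c("1234", "5678", "5678"))

# Generate only those pairs of rows that agree on postcode.
pairs <- pair_blocking(x, y, on = "postcode")

# Each row of pairs holds a row number of x (.x) and a row number of y (.y).
print(pairs)
```

Row 4 of x has a postcode that does not occur in y, so it does not appear in any pair.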
Details
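Generating all pairs of records from two data sets is usually the first step when linking them. However, the number of pairs is the product of the numbers of records in the two data sets, which is often too large to handle. Therefore, blocking is usually applied: pairs are generated only for records that agree on the blocking variables.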
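The effect of blocking can be illustrated with base R alone. The sketch below is not the implementation of pair_blocking; it only mimics the idea with expand.grid and merge on hypothetical data, to show how blocking reduces the number of pairs.

```r
# Hypothetical example data; names and values are illustrative only.
x <- data.frame(id = 1:4, postcode = c("1234", "1234", "5678", "9999"))
y <- data.frame(id = 1:3, postcode = c("1234", "5678", "5678"))

# All pairs: every row of x combined with every row of y.
all_pairs <- expand.grid(.x = seq_len(nrow(x)), .y = seq_len(nrow(y)))

# Blocked pairs: keep only combinations that agree on the blocking variable.
xi <- data.frame(.x = seq_len(nrow(x)), postcode = x$postcode)
yi <- data.frame(.y = seq_len(nrow(y)), postcode = y$postcode)
blocked_pairs <- merge(xi, yi, by = "postcode")[, c(".x", ".y")]

nrow(all_pairs)      # 12 (4 rows in x times 3 rows in y)
nrow(blocked_pairs)  # 4
```

The trade-off is that true matches whose blocking variable differs (for example because of a typing error in the postcode) are never compared, so the blocking variables should be chosen to be reliable.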