Compare pairs on a set of variables common in both data sets
## S3 method for class 'cluster_pairs'
compare_pairs(
  pairs,
  on,
  comparators = list(default_comparator),
  default_comparator = cmp_identical(),
  new_name = NULL,
  ...
)

compare_pairs(
  pairs,
  on,
  comparators = list(default_comparator),
  default_comparator = cmp_identical(),
  ...
)

## S3 method for class 'pairs'
compare_pairs(
  pairs,
  on,
  comparators = list(default_comparator),
  default_comparator = cmp_identical(),
  x = attr(pairs, "x"),
  y = attr(pairs, "y"),
  inplace = FALSE,
  ...
)
Arguments
pairs: data.table with pairs. Should contain the columns .x and .y.
on: character vector of variables that should be compared.
comparators: named list of functions with which the variables are compared. Each function should accept two vectors and return either a vector or a data.table with multiple columns.
default_comparator: variables for which no comparison function is defined in comparators are compared with the function default_comparator.
new_name: name of new object to assign the pairs to on the cluster nodes.
...: Ignored for now
x: data.table with one half of the pairs.
y: data.table with the other half of the pairs.
inplace: logical indicating whether pairs should be modified in place. When pairs is large this can be more efficient.
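As an illustration of how on, comparators and default_comparator interact, the following is a minimal sketch. It assumes the reclin2 and data.table packages are installed and that pairs are generated with pair_blocking() from reclin2; the two small tables and the comparator same_initial are hypothetical and serve only as an example.

```r
library(data.table)
library(reclin2)

# Two small, made-up data sets sharing the variables lastname and postcode
x <- data.table(
  id = 1:3,
  lastname = c("Smith", "Jones", "Brown"),
  postcode = c("1234", "1234", "5678")
)
y <- data.table(
  id = 1:2,
  lastname = c("Smyth", "Brown"),
  postcode = c("1234", "5678")
)

# Generate candidate pairs: only records with equal postcode are paired
pairs <- pair_blocking(x, y, on = "postcode")

# Hypothetical comparator: accepts two vectors and returns a logical vector
same_initial <- function(a, b) {
  substr(as.character(a), 1, 1) == substr(as.character(b), 1, 1)
}

# lastname uses the custom comparator; postcode has no entry in comparators
# and therefore falls back to default_comparator (cmp_identical())
pairs <- compare_pairs(
  pairs,
  on = c("lastname", "postcode"),
  comparators = list(lastname = same_initial)
)
print(pairs)
```

The result should keep the columns .x and .y and gain one column per compared variable.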
Returns
In case of compare_pairs.pairs, returns the data.table pairs with one or more columns added.
In case of compare_pairs.cluster_pairs, compare_pairs.pairs is called on each cluster node and the resulting pairs are assigned to new_name in the environment reclin_env. When new_name is not given (or is equal to NULL), the original pairs on the nodes are overwritten.
Details
It is assumed that the variables in on are present in both x and y. Variables with the same names are added to pairs. When the comparator returns a data.table, multiple columns are added to pairs. The names of these columns are the variable name pasted together with the column names of the data.table returned by the comparator (separated by "_").
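The naming rule for comparators that return a data.table can be sketched as follows. This again assumes reclin2 and data.table are installed; the input tables and the comparator initial_and_length are hypothetical, and the expected column names are derived from the description above rather than from captured output.

```r
library(data.table)
library(reclin2)

# Made-up data sets with a shared variable lastname
x <- data.table(
  lastname = c("Smith", "Jones", "Brown"),
  postcode = c("1234", "1234", "5678")
)
y <- data.table(
  lastname = c("Smyth", "Brown"),
  postcode = c("1234", "5678")
)
pairs <- pair_blocking(x, y, on = "postcode")

# Hypothetical comparator returning a data.table with two columns
initial_and_length <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  data.table(
    initial = substr(a, 1, 1) == substr(b, 1, 1),  # same first letter
    length = nchar(a) == nchar(b)                  # same number of characters
  )
}

pairs <- compare_pairs(
  pairs,
  on = "lastname",
  comparators = list(lastname = initial_and_length)
)

# Following the rule above, the added columns should be named
# lastname_initial and lastname_length
print(names(pairs))
```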