Select matching pairs with a score above or equal to a threshold
## S3 method for class 'cluster_pairs'
select_threshold(pairs, variable, score, threshold, new_name = NULL, ...)

select_threshold(pairs, variable, score, threshold, ...)

## S3 method for class 'pairs'
select_threshold(pairs, variable, score, threshold, inplace = FALSE, ...)
Arguments
pairs: a pairs object, such as generated by pair_blocking
variable: the name of the new variable to create in pairs. This will be a logical variable with a value of TRUE for the selected pairs.
score: name of the score/weight variable of the pairs. When not given and attr(pairs, "score") is defined, that is used.
threshold: the threshold to apply. Pairs with a score above or equal to the threshold are selected.
new_name: name of new object to assign the pairs to on the cluster nodes.
...: ignored
inplace: logical indicating whether pairs should be modified in place. When pairs is large this can be more efficient.
Returns
Returns the pairs with the variable given by variable added. This is a logical variable indicating which pairs are selected as matches.
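The selection rule can be illustrated with a minimal base R sketch. It only mirrors the documented behaviour (a logical column that is TRUE where the score is above or equal to the threshold) and is not the package's implementation; `toy_pairs` and its values are hypothetical stand-ins for a scored pairs object.

```r
# Hypothetical stand-in for a scored pairs table (base R only; not a real
# pairs object as produced by pair_blocking)
toy_pairs <- data.frame(
  .x    = c(1L, 1L, 2L, 3L),
  .y    = c(1L, 2L, 2L, 3L),
  mpost = c(0.92, 0.10, 0.50, 0.49)
)

# Pairs with a score above or equal to the threshold are selected
threshold <- 0.5
toy_pairs$selected <- toy_pairs$mpost >= threshold

print(toy_pairs)
```

Note that the pair with a score of exactly 0.50 is selected, because the comparison includes the threshold itself.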
Examples
data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
pairs <- predict(model, pairs, type = "mpost", add = TRUE, binary = TRUE)
# Select pairs with an mpost >= 0.5
select_threshold(pairs, "selected", "mpost", 0.5, inplace = TRUE)

# Example using a cluster.
# In general the syntax is exactly the same except for the first call,
# which is to cluster_pair. Note that in general `inplace = TRUE` is implied
# when working with a cluster; therefore the assignment back to pairs can be
# omitted (it is also not a problem if it is kept).
library(parallel)
data("linkexample1", "linkexample2")
cl <- makeCluster(2)
pairs <- cluster_pair(cl, linkexample1, linkexample2)
compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
predict(model, pairs, type = "mpost", add = TRUE, binary = TRUE)
# Select pairs with an mpost >= 0.5
# Unlike the regular pairs: inplace = TRUE is implied here
select_threshold(pairs, "selected", "mpost", 0.5)
stopCluster(cl)