select_threshold function

Select matching pairs with a score above or equal to a threshold

select_threshold(pairs, variable, score, threshold, ...)

## S3 method for class 'cluster_pairs'
select_threshold(pairs, variable, score, threshold, new_name = NULL, ...)

## S3 method for class 'pairs'
select_threshold(pairs, variable, score, threshold, inplace = FALSE, ...)

Arguments

  • pairs: a pairs object, such as generated by pair_blocking
  • variable: the name of the new variable to create in pairs. This will be a logical variable with a value of TRUE for the selected pairs.
  • score: name of the score/weight variable of the pairs. When not given and attr(pairs, "score") is defined, that is used.
  • threshold: the threshold to apply. Pairs with a score above or equal to the threshold are selected.
  • new_name: name of new object to assign the pairs to on the cluster nodes.
  • ...: ignored
  • inplace: logical indicating whether pairs should be modified in place. When pairs is large this can be more efficient.
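
The selection rule described by the arguments above can be sketched in base R. This is only an illustration of the "greater than or equal to" semantics and of the fallback to attr(pairs, "score"), not the package's implementation: the data frame toy_pairs, its columns, and the helper flag_at_or_above are hypothetical names invented for this sketch.

```r
# Hypothetical stand-in for a pairs object: a plain data.frame with a score column.
toy_pairs <- data.frame(
  x_row = c(1L, 1L, 2L, 3L),
  y_row = c(1L, 2L, 2L, 4L),
  mpost = c(0.92, 0.10, 0.50, 0.49)
)

# Illustrative helper (not part of any package): add a logical column that is
# TRUE where the score is greater than or equal to the threshold.
flag_at_or_above <- function(df, variable, score, threshold) {
  # Assumed fallback, mirroring the documented behaviour: when no score name
  # is given, use the name stored in the "score" attribute.
  if (missing(score)) score <- attr(df, "score")
  if (is.null(score)) stop("no score variable given")
  df[[variable]] <- df[[score]] >= threshold
  df
}

# Explicit score name; the pair with mpost exactly 0.5 is selected.
toy_selected <- flag_at_or_above(toy_pairs, "selected", "mpost", 0.5)

# Score name taken from the attribute instead.
attr(toy_pairs, "score") <- "mpost"
toy_default <- flag_at_or_above(toy_pairs, "selected", threshold = 0.5)
```

Both calls flag the same rows. A pair sitting exactly on the threshold is included, which is the main point to keep in mind when choosing a cut-off.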

Returns

Returns the pairs with the variable named by variable added. This is a logical variable indicating which pairs are selected as matches.
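
Because the added variable is logical, it can be used directly to subset or count the selected pairs. The sketch below uses a hypothetical plain data.frame (toy_result, with made-up columns x_row, y_row and mpost) as a stand-in for the returned pairs, so it shows the idea rather than the exact structure of a real pairs object.

```r
# Hypothetical stand-in for the returned pairs, with the logical column
# "selected" already added.
toy_result <- data.frame(
  x_row = c(1L, 1L, 2L, 3L),
  y_row = c(1L, 2L, 2L, 4L),
  mpost = c(0.92, 0.10, 0.50, 0.49),
  selected = c(TRUE, FALSE, TRUE, FALSE)
)

# Keep only the rows flagged as matches, with the row identifiers of both sides.
toy_matches <- toy_result[toy_result$selected, c("x_row", "y_row")]

# Count how many pairs were selected.
n_selected <- sum(toy_result$selected)
```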

Examples

data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
pairs <- predict(model, pairs, type = "mpost", add = TRUE, binary = TRUE)
# Select pairs with mpost >= 0.5
select_threshold(pairs, "selected", "mpost", 0.5, inplace = TRUE)

# Example using a cluster.
# In general the syntax is exactly the same except for the first call, which
# is to cluster_pair. Note that in general `inplace = TRUE` is implied when
# working with a cluster; therefore the assignment back to pairs can be
# omitted (it is also not a problem if it is kept).
library(parallel)
data("linkexample1", "linkexample2")
cl <- makeCluster(2)
pairs <- cluster_pair(cl, linkexample1, linkexample2)
compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
predict(model, pairs, type = "mpost", add = TRUE, binary = TRUE)
# Select pairs with mpost >= 0.5
# Unlike with regular pairs, inplace = TRUE is implied here
select_threshold(pairs, "selected", "mpost", 0.5)
stopCluster(cl)
  • Maintainer: Jan van der Laan
  • License: GPL-3
  • Last published: 2024-02-09