cluster_modify_pairs function

Call a function on each of the worker nodes to modify the pairs on the node

cluster_modify_pairs(pairs, fun, ..., new_name = NULL)

Arguments

  • pairs: an object of type cluster_pairs, as created, for example, by cluster_pair.
  • fun: a function to call on each of the worker nodes. See Details for the arguments this function must accept.
  • ...: additional arguments are passed on to fun.
  • new_name: name of the new object to which the pairs are assigned on the cluster nodes. When omitted, the existing pairs are replaced.

Returns

Returns a cluster_pairs object. When new_name is not given, the input pairs are returned invisibly; otherwise a new cluster_pairs object is returned.

Details

fun must accept the following as its first three arguments:

  • pairs: the data.table with the pairs of the worker node.
  • x: a data.table with the portion of x present on the worker node.
  • y: a data.table with y.

fun should return either a data.table with the new pairs or NULL. When a data.table is returned, it replaces the existing pairs if new_name is missing; otherwise it is stored as a new set of pairs under the name new_name. When fun returns NULL, it is assumed that fun modified the pairs by reference (e.g. using pairs[, new_var := new_val]). In that case new_name is ignored.
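The two return conventions can be tried out locally, without a cluster, by calling candidate functions directly on small data.tables. This is only a sketch: the toy tables and the function names sample_pairs and flag_pairs are hypothetical, and the .x/.y index columns are assumed to mimic how pairs refer to rows of x and y. On a real cluster, cluster_modify_pairs supplies pairs, x and y itself, and an extra argument such as frac would arrive through its ... argument.

```r
library(data.table)

# Hypothetical toy data standing in for what one worker node would hold.
x <- data.table(lastname = c("Smith", "Jones", "Brown"))
y <- data.table(lastname = c("Smith", "Brown"))
# All 6 candidate pairs; .x and .y index rows of x and y (assumed layout).
pairs <- data.table(.x = rep(1:3, times = 2), .y = rep(1:2, each = 3))

# Convention 1: return a new data.table. It would replace the pairs,
# or be stored under new_name when that argument is given.
sample_pairs <- function(pairs, x, y, frac = 0.5) {
  sel <- sample(nrow(pairs), round(nrow(pairs) * frac))
  pairs[sel, ]
}

# Convention 2: modify pairs by reference with := and return NULL.
# With this convention new_name would be ignored.
flag_pairs <- function(pairs, x, y) {
  pairs[, same_lastname := x$lastname[.x] == y$lastname[.y]]
  NULL
}

# frac plays the role of an extra argument passed through ...
smaller <- sample_pairs(pairs, x, y, frac = 0.5)
# After this call the caller's pairs has a new same_lastname column.
res <- flag_pairs(pairs, x, y)
```

On a cluster, either function would be passed as fun, for instance cluster_modify_pairs(pairs, sample_pairs, frac = 0.1, new_name = "sample") for the first convention, or cluster_modify_pairs(pairs, flag_pairs) for the second.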

Examples

# Generate some pairs
library(reclin2)
library(parallel)
data("linkexample1", "linkexample2")
cl <- makeCluster(2)
pairs <- cluster_pair(cl, linkexample1, linkexample2)
compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))

# Create a new set of pairs containing a random sample of the original
# pairs.
pairs_sample <- cluster_modify_pairs(pairs, function(pairs, ...) {
  sel <- sample(nrow(pairs), round(nrow(pairs)*0.1))
  pairs[sel, ]
}, new_name = "sample")

# Cleanup
stopCluster(cl)
  • Maintainer: Jan van der Laan
  • License: GPL-3
  • Last published: 2024-02-09