pairs: an object of type cluster_pairs, as created for example by cluster_pair.
fun: a function to call on each of the worker nodes. See Details for the arguments this function must accept.
...: additional arguments are passed on to fun.
new_name: name of new object to assign the pairs to on the cluster nodes.
Returns
Returns a cluster_pairs object. When new_name is not given, the input pairs are returned invisibly. Otherwise a new cluster_pairs object is returned.
Details
The function will have to accept the following arguments as its first three arguments:
pairs: the data.table with the pairs of the worker node.
x: a data.table with the portion of x present on the worker node.
y: a data.table with y.
The function should return either a data.table with the new pairs, or NULL. When a data.table is returned, its values replace the pairs when new_name is missing, or create new pairs in the environment new_name. When the function returns NULL, it is assumed that the function modified the pairs by reference (e.g. using pairs[, new_var := new_val]). Note that in this case new_name is ignored.
Examples
# Generate some pairs
library(parallel)
data("linkexample1", "linkexample2")
cl <- makeCluster(2)
pairs <- cluster_pair(cl, linkexample1, linkexample2)
compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))

# Create a new set of pairs containing a random sample of the original
# pairs.
sample <- cluster_call(pairs, new_name = "sample", function(pairs, ...) {
  sel <- sample(nrow(pairs), round(nrow(pairs)*0.1))
  pairs[sel, ]
})

# Cleanup
stopCluster(cl)
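The by-reference case described under Details (the function returns NULL after modifying the pairs in place) might look like the sketch below. It assumes the functions documented here come from the reclin2 package, and that compare_pairs with its default comparison has added logical columns named lastname and firstname to the pairs. The column name both_names is made up for illustration.

```r
library(parallel)
library(reclin2)  # assumption: the package that provides cluster_pair and cluster_call

data("linkexample1", "linkexample2")
cl <- makeCluster(2)
pairs <- cluster_pair(cl, linkexample1, linkexample2)
compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))

# Add a column to the pairs on each worker node by reference.
# Returning NULL tells cluster_call that the pairs were modified in place,
# so no new_name is given (it would be ignored anyway).
res <- cluster_call(pairs, function(pairs, x, y, ...) {
  # both_names is a hypothetical column name; lastname and firstname are
  # assumed to be the logical comparison columns created by compare_pairs
  pairs[, both_names := lastname & firstname]
  NULL
})

# Cleanup
stopCluster(cl)
```

Because new_name is not given, res should be the input pairs object returned invisibly, as described under Returns.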