netsampler function

Network Sampling Routine

netsampler(network_in, key_nodes_sampler = c("random", "lognormal", "Fisher log series", "exponential", "degree", "module"), neighbors_sampler = c("random", "exponential"), n_key_nodes = 10, n_neighbors = 0.5, hidden_modules = NULL, module_sizes = NULL, cluster_fn = igraph::cluster_edge_betweenness)

Arguments

  • network_in: input network (as igraph object)
  • key_nodes_sampler: sampling criteria for key nodes. See details.
  • neighbors_sampler: sampling criteria for neighbors. See details.
  • n_key_nodes: number of key nodes to sample.
  • n_neighbors: number of first neighbors or fraction of first neighbors. See details.
  • hidden_modules: list of the modules to exclude (max 10 modules; only the first numb_hidden are used)
  • module_sizes: integer vector giving the size of each module. See details.
  • cluster_fn: a clustering function, from igraph::cluster_*. Default is igraph::cluster_edge_betweenness. Only used to compute module sizes if module_sizes is not provided.

Returns

The original input network (as an igraph object), with the attribute label added to the edges and vertices, indicating whether each edge or vertex was sampled or unsampled.

Details

The algorithm first samples n_key_nodes according to the requested key_nodes_sampler criterion. For each key node, the requested number or fraction of neighbors is then sampled according to the neighbors_sampler criterion. Optionally, a list of modules can be designated as "hidden" and will be excluded from sampling.
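The two-stage procedure described above can be sketched in base R. This is only an illustration under simplifying assumptions, not the package's implementation: the network is a toy adjacency list, both stages draw uniformly at random, hidden nodes are given directly rather than as modules, and the helper name two_stage_sample and its arguments are hypothetical.

```r
# Toy undirected network as an adjacency list: names are node ids,
# values are the ids of each node's first neighbors.
adj <- list(
  a = c("b", "c", "d"),
  b = c("a", "c"),
  c = c("a", "b", "e"),
  d = c("a"),
  e = c("c", "f"),
  f = c("e")
)

# Hypothetical helper, not part of netsampler: two-stage uniform sampling.
two_stage_sample <- function(adj, n_key_nodes, frac_neighbors,
                             hidden = character(0)) {
  # Hidden nodes are never eligible, as key nodes or as neighbors.
  eligible <- setdiff(names(adj), hidden)

  # Stage 1: draw key nodes uniformly at random.
  n_keys <- min(n_key_nodes, length(eligible))
  key_nodes <- eligible[sample.int(length(eligible), n_keys)]

  # Stage 2: for each key node, draw a fraction of its eligible neighbors.
  # Rounding up with ceiling() is an assumption of this sketch.
  picked <- lapply(key_nodes, function(v) {
    nb <- setdiff(adj[[v]], hidden)
    k <- ceiling(frac_neighbors * length(nb))
    nb[sample.int(length(nb), k)]
  })

  # Label every node as sampled or unsampled.
  sampled <- union(key_nodes, unlist(picked))
  label <- ifelse(names(adj) %in% sampled, "sampled", "unsampled")
  stats::setNames(label, names(adj))
}

set.seed(1)
labels <- two_stage_sample(adj, n_key_nodes = 2, frac_neighbors = 0.5,
                           hidden = "f")
print(labels)
```

Indexing with sample.int() rather than calling sample() on the vector itself avoids the base R surprise where sample(x) on a single number draws from 1:x.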

If n_neighbors is greater than 1, it is taken as the number of neighbors to sample. If n_neighbors is between 0 and 1, it is taken as the fraction of neighbors to sample. (To sample 1 neighbor, use an explicit integer, 1L (or as.integer(1)); to sample 100
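The count-versus-fraction rule can be made concrete with a small base R sketch. The helper resolve_n_neighbors is hypothetical and is not a function of the package; capping the count at the node's degree and rounding fractions up are assumptions made here for illustration. A plain numeric 1 is deliberately left unhandled, since only the integer form 1L is described above.

```r
# Hypothetical helper, not part of netsampler: turn n_neighbors into a count
# for a key node that has `degree` first neighbors.
resolve_n_neighbors <- function(n_neighbors, degree) {
  if (is.integer(n_neighbors) || n_neighbors > 1) {
    # Explicit integer, or any value above 1: treat as a count.
    # Capping at the degree is an assumption of this sketch.
    return(min(as.integer(n_neighbors), degree))
  }
  if (n_neighbors > 0 && n_neighbors < 1) {
    # Strictly between 0 and 1: treat as a fraction of the neighbors.
    # Rounding up is an assumption of this sketch.
    return(as.integer(ceiling(n_neighbors * degree)))
  }
  stop("n_neighbors value not covered by this sketch")
}

count_three <- resolve_n_neighbors(3, 10L)    # a count of neighbors
count_half  <- resolve_n_neighbors(0.5, 10L)  # a fraction of neighbors
count_one   <- resolve_n_neighbors(1L, 10L)   # explicit integer: one neighbor
```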

Provide a module_sizes vector to improve performance. If it is not provided, module sizes are calculated with cluster_fn (by default igraph::cluster_edge_betweenness). Be sure to provide module_sizes whenever calling netsampler repeatedly on the same network, to avoid the unnecessary cost of recalculating modules on every call. See examples.
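As a minimal base R illustration of what such a vector looks like, the sketch below derives module sizes from a made-up membership vector (one module id per node). The membership values are invented for the example; with a real igraph clustering result, the membership would come from the clustering object rather than being typed in by hand.

```r
# Hypothetical membership vector: element i is the module id of node i.
membership <- c(1, 1, 2, 2, 2, 3, 3, 3, 3)

# Group node indices by module, then count the nodes in each group.
# This mirrors the vapply(..., length, integer(1)) pattern used with
# igraph::groups() in the package examples.
node_groups <- split(seq_along(membership), membership)
module_sizes <- vapply(node_groups, length, integer(1))
print(module_sizes)
```

The result is a named integer vector with one entry per module, which can be computed once and reused across repeated sampling runs on the same network.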

Examples

set.seed(12345)
net <- netgen()
sample <- netsampler(net)

## Precompute `module_sizes` for replicate sampling of the same network:
library(igraph)
modules <- cluster_edge_betweenness(as.undirected(net))
module_sizes <- vapply(igraph::groups(modules), length, integer(1))
sample <- netsampler(net, module_sizes = module_sizes)
  • Maintainer: Carl Boettiger
  • License: GPL-3
  • Last published: 2023-08-27