Function for performing simple or Dirichlet resampling
The function can be used for standard bootstrapping or for subsampling, see [1]. Samples can be drawn with or without replacement, by group, and with or without Dirichlet weights, see [2]. This gives researchers a range of options for correcting sample biases, estimating empirical confidence intervals, and/or subsampling large data sets.
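The difference between a simple and a Dirichlet-weighted draw can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the function's implementation. It assumes that "Dirichlet" means drawing the selection probabilities from a flat Dirichlet(1, ..., 1) distribution, which this description does not confirm. The names dirichlet_weights and draw_sample are hypothetical.

```python
import random

def dirichlet_weights(n, rng):
    """Draw one weight vector from a flat Dirichlet(1, ..., 1) distribution."""
    # Normalised Gamma(1, 1) draws follow a Dirichlet(1, ..., 1) distribution.
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def draw_sample(rows, size, replace=True, option="Simple", rng=None):
    """Draw `size` rows with equal ("Simple") or Dirichlet selection weights."""
    if rng is None:
        rng = random.Random()
    if option == "Dirichlet":
        weights = dirichlet_weights(len(rows), rng)  # assumed weighting scheme
    else:
        weights = [1.0 / len(rows)] * len(rows)
    if replace:
        return rng.choices(rows, weights=weights, k=size)
    if size > len(rows):
        raise ValueError("cannot draw more rows than available without replacement")
    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # give each row the key u ** (1 / w) and keep the `size` largest keys.
    keyed = [(rng.random() ** (1.0 / w), row) for w, row in zip(weights, rows)]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in keyed[:size]]
```

With option="Simple" every row is equally likely to be selected. With option="Dirichlet" the selection probabilities are themselves random, so repeated calls emphasise different rows.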
in_data: The initial data frame to be resampled. It must contain:
an ID variable
the variables of interest
a grouping variable
grp_vector: The grouping variable of the data frame, for example one named 'group'
grp_matrix: A matrix that contains:
the variable 'Group_ID', whose entries are all the available values of the grouping variable
the variable 'Resample_Size', giving the size of the sample to be created for each grouping value
replace: A logical input: TRUE to sample with replacement, FALSE to sample without replacement
option: A character input with the following possible values:
"Simple", to perform simple resampling
"Dirichlet", to perform Dirichlet-weighted resampling
number_samples: The number of samples to be created. If it is greater than one, then parallel processing is used.
nworkers: The number of logical processors to use for parallel computing (usually twice the number of available physical cores)
rseed: The random seed that will be used for sampling. Useful for reproducible results
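How the grouping, size, replacement, and seed arguments interact can be sketched as follows. This is a Python illustration using only the standard library, not the function's own code. The name resample_by_group is hypothetical, a plain dictionary stands in for the 'Group_ID' and 'Resample_Size' columns of grp_matrix, and the samples are drawn sequentially rather than in parallel.

```python
import random

def resample_by_group(rows, group_key, sizes, replace=True,
                      number_samples=1, rseed=0):
    """Return `number_samples` resamples, each with sizes[g] rows per group g."""
    # Bucket the input rows by their group value.
    by_group = {}
    for row in rows:
        by_group.setdefault(row[group_key], []).append(row)

    rng = random.Random(rseed)  # a fixed seed makes the draws reproducible
    samples = []
    for _ in range(number_samples):
        sample = []
        for group_id, size in sizes.items():
            pool = by_group[group_id]
            if replace:
                sample.extend(rng.choices(pool, k=size))
            else:
                # Raises ValueError if size exceeds the rows in the group.
                sample.extend(rng.sample(pool, size))
        samples.append(sample)
    return samples

# Toy data: ten rows with an ID and a grouping variable.
data = [{"id": i, "group": "a" if i < 6 else "b"} for i in range(10)]
result = resample_by_group(data, "group", {"a": 3, "b": 2},
                           replace=False, number_samples=4, rseed=42)
```

Calling the sketch twice with the same rseed yields identical samples, which is the reproducibility that the seed argument is meant to provide.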
Returns
It returns a list of number_samples data frames with exactly the same variables as the initial one, except that the grouping variable now contains only the given values from the input data frame.
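A list of resampled data sets like the one returned here is typically reduced to one estimate per sample, from which an empirical confidence interval can be read off. The following Python sketch shows a simple percentile interval under that assumption. The name percentile_interval is hypothetical, the resamples are generated inline with a plain bootstrap rather than by the function documented here, and the data are simulated for illustration only.

```python
import math
import random
import statistics

def percentile_interval(samples, statistic, level=0.95):
    """Empirical percentile interval of `statistic` over a list of resamples."""
    estimates = sorted(statistic(s) for s in samples)
    alpha = (1.0 - level) / 2.0
    last = len(estimates) - 1
    # Round outwards so the interval is slightly conservative.
    lower = estimates[math.floor(alpha * last)]
    upper = estimates[math.ceil((1.0 - alpha) * last)]
    return lower, upper

# Illustrative data and resamples (simple bootstrap with replacement).
rng = random.Random(7)
values = [rng.gauss(10.0, 2.0) for _ in range(200)]
resamples = [rng.choices(values, k=len(values)) for _ in range(500)]

low, high = percentile_interval(resamples, statistics.mean)
```

Here low and high bound the central 95% of the resampled means. Other statistics, such as the median, can be passed in the same way.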
References
[1] D. N. Politis, J. P. Romano, M. Wolf, Subsampling (Springer-Verlag, New York, 1999).