subsample function

subsample

Internal function to execute the subsampling component of the stochastic stagewise approach. If a user provides a stochastic value between 0 and 1, it is assumed that some proportion of subsampling is desired. The samplingDistCalculation function calculates the distribution of the clusters, and the subsample function uses that distribution to draw the actual subsample.

subsample(sampleDist, sampleSize, withReplacement, clusterIDs, clusterID)

Arguments

  • sampleDist: A vector, with length equal to the number of clusters, giving the probability of sampling each cluster
  • sampleSize: A scalar value indicating how large a subsample is being drawn
  • withReplacement: A logical value indicating whether the subsampling is being done with or without replacement
  • clusterIDs: A vector of all of the UNIQUE cluster IDs
  • clusterID: A vector of length equal to the number of observations indicating which cluster each observation is in

Returns

A list with two variables: subSampleIndicator, which indicates which observations are in the current subsample, and clusterIDCurr, which indicates the clusterID for the subsample.

Note

Internal function.

While most of the subsample can be determined from subSampleIndicator, the clusterIDCurr value has to be constructed inside the subsample function, because the cluster IDs are handled differently depending on whether we are sampling with or without replacement.
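The distinction between the two sampling modes can be illustrated with a minimal sketch. This is not the package's internal code: the function name subsampleSketch is hypothetical, and the sketch assumes that sampleSize counts clusters (not observations) and that subSampleIndicator holds row indices. Neither assumption is stated on this page. It shows one plausible reason clusterIDCurr must be built separately: when a cluster is drawn more than once, each copy needs its own ID so that it is treated as a distinct cluster.

```r
# Hypothetical illustration only; not the package's implementation.
# Assumes sampleSize is a number of clusters and that
# subSampleIndicator is a vector of row indices.
subsampleSketch <- function(sampleDist, sampleSize, withReplacement,
                            clusterIDs, clusterID) {
  # Draw positions in clusterIDs using the per-cluster probabilities.
  drawn <- sample(seq_along(clusterIDs), size = sampleSize,
                  replace = withReplacement, prob = sampleDist)

  if (withReplacement) {
    # A cluster may be drawn several times, so collect the rows of
    # each draw separately and relabel every draw with a fresh ID.
    idxList <- lapply(drawn, function(k) which(clusterID == clusterIDs[k]))
    subSampleIndicator <- unlist(idxList)
    clusterIDCurr <- rep(seq_along(idxList), times = lengths(idxList))
  } else {
    # Each cluster appears at most once, so the original IDs can be reused.
    subSampleIndicator <- which(clusterID %in% clusterIDs[drawn])
    clusterIDCurr <- clusterID[subSampleIndicator]
  }

  list(subSampleIndicator = subSampleIndicator,
       clusterIDCurr = clusterIDCurr)
}

# Example call: six observations in three clusters, two clusters drawn.
clusterID <- c(1, 1, 2, 2, 2, 3)
clusterIDs <- unique(clusterID)
res <- subsampleSketch(rep(1 / 3, 3), sampleSize = 2,
                       withReplacement = TRUE, clusterIDs, clusterID)
```

Under these assumptions, sampling without replacement only needs the original clusterID values for the selected rows, whereas sampling with replacement has to relabel the draws, which is why the two cases are handled in separate branches.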

Author(s)

Gregory Vaughan

  • Maintainer: Gregory Vaughan
  • License: GPL (>= 3)
  • Last published: 2018-01-08
