Internal function that executes the subsampling component of the stochastic stagewise approach. If a user provides a stochastic value between 0 and 1, it is assumed that subsampling of some proportion is desired.
The samplingDistCalculation function calculates the sampling distribution over the clusters, and the subsample function uses that distribution to draw the actual subsample.
sampleDist: A vector of length equal to the number of clusters, giving the probability of sampling each cluster
sampleSize: A scalar value indicating how large a subsample is being drawn
withReplacement: A logical value indicating whether the subsampling is being done with or without replacement
clusterIDs: A vector of all of the UNIQUE cluster IDs
clusterID: A vector of length equal to the number of observations indicating which cluster each observation is in
Returns
A list with two elements: subSampleIndicator, which indicates which observations are in the current subsample, and clusterIDCurr, which gives the cluster ID of each observation in the subsample.
Note
Internal function.
While most of the subsample can be determined from the subSampleIndicator, the clusterIDCurr value has to be constructed inside the subsample function, because the cluster IDs are handled differently depending on whether we are sampling with or without replacement.
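
The with/without replacement distinction can be illustrated with a minimal Python sketch. This is not the package's R implementation; it rests on several assumptions that the documentation does not confirm: sampleSize is taken to be the number of clusters drawn, subSampleIndicator is represented as a list of observation indices, and, when sampling with replacement, each draw of a cluster is given a fresh label (its draw number) so that repeated draws are treated as separate clusters. The snake_case names are hypothetical stand-ins for the documented arguments.

```python
import random


def subsample(sample_dist, sample_size, with_replacement,
              cluster_ids, cluster_id, rng=None):
    """Illustrative cluster-level subsampling (assumptions noted above)."""
    if rng is None:
        rng = random.Random()

    if with_replacement:
        # The same cluster may be drawn more than once.
        drawn = rng.choices(cluster_ids, weights=sample_dist, k=sample_size)
    else:
        if sample_size > len(cluster_ids):
            raise ValueError("sample_size exceeds the number of clusters")
        # Sequential weighted draws, removing each chosen cluster from the pool.
        pool = list(cluster_ids)
        weights = list(sample_dist)
        drawn = []
        for _ in range(sample_size):
            pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
            drawn.append(pool.pop(pick))
            weights.pop(pick)

    sub_sample_indicator = []  # observation indices in the subsample
    cluster_id_curr = []       # cluster label for each selected observation
    for draw_number, cid in enumerate(drawn):
        # Assumption: with replacement, relabel by draw number so that a
        # cluster drawn twice appears as two distinct clusters.
        label = draw_number if with_replacement else cid
        for obs_index, obs_cluster in enumerate(cluster_id):
            if obs_cluster == cid:
                sub_sample_indicator.append(obs_index)
                cluster_id_curr.append(label)

    return {"subSampleIndicator": sub_sample_indicator,
            "clusterIDCurr": cluster_id_curr}
```

Under these assumptions, the cluster labels of the subsample cannot be recovered from the selected observation indices alone when sampling with replacement, since two copies of the same cluster share the same original ID. This is one plausible reading of why clusterIDCurr is built inside the function.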