Perform stratified random sampling to balance outcomes
This function performs stratified random sampling so that the outcome is balanced across the shards.
Usage
stratrs(y, C=5, P=0)
Arguments
y: The binary/categorical/continuous outcome.
C: The number of shards to break the data set into.
P: For continuous data, we break the range of y into P segments via its quantiles. Specifying P=20 seems to work reasonably well.
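The quantile segmentation described for P can be sketched in base R as follows. This is an illustrative assumption about how such segments might be formed, not the package's own code; the helper name quantile_strata is hypothetical.

```r
# Hypothetical helper: map a continuous outcome to (at most) P
# quantile-based segments, returned as integer codes.
quantile_strata <- function(y, P = 20) {
  # P + 1 cut points at the 0, 1/P, ..., 1 quantiles of y
  breaks <- quantile(y, probs = seq(0, 1, length.out = P + 1), names = FALSE)
  # unique() drops tied cut points (giving fewer than P segments);
  # include.lowest = TRUE keeps min(y) in the first segment
  cut(y, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

set.seed(1)
y <- rnorm(1000)
s <- quantile_strata(y, P = 20)
table(s)  # roughly equal counts per segment
```

Because the cut points are quantiles, each segment holds about the same number of observations, which is what makes the segments usable as strata.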
Details
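To perform BART with large data sets, random sampling is employed to break the data into C shards, and each shard should be balanced with respect to the outcome. This function achieves that balance with stratified random sampling: for binary/categorical outcomes, the strata are the outcome categories; for continuous outcomes, they are the P quantile-based segments of the range of y.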
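The stratified allocation described above can be sketched in base R. The helper strat_shards is hypothetical and assumes the strata are already given as a vector of labels (for example, the outcome categories); it spreads the members of each stratum as evenly as possible over the C shards, in random order. It illustrates the technique and is not necessarily how stratrs itself is implemented.

```r
# Hypothetical sketch (not the BART package's code): assign each
# observation to one of C shards, stratified by the labels in `strata`.
strat_shards <- function(strata, C = 5) {
  shard <- integer(length(strata))
  for (g in unique(strata)) {
    idx <- which(strata == g)
    # Cycle through the shards in a random order so that leftover
    # members do not always fall on shard 1.
    labs <- rep_len(sample.int(C), length(idx))
    # Shuffle the labels; indexing avoids sample()'s length-1 pitfall.
    shard[idx] <- labs[sample.int(length(labs))]
  }
  shard
}

set.seed(2)
y <- rbinom(1003, 1, 0.1)  # rare binary outcome
shard <- strat_shards(y, C = 5)
table(y, shard)  # within each row, counts differ by at most 1
```

Within each stratum the shard counts differ by at most one, so every shard receives about the same share of, for instance, the rare events; plain random sampling gives no such guarantee.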
Returns
A vector with one element per observation in y, giving the shard to which that observation is assigned.
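A minimal sketch of how the returned vector might be used to split a data set into shards with base R's split(). In practice the vector would come from a call such as stratrs(y, C=4); here a deterministic toy assignment stands in for it so the example is self-contained, and the objects x, y, and shard are hypothetical.

```r
# Toy data: 20 observations, 2 covariates, binary outcome
set.seed(3)
n <- 20
x <- matrix(rnorm(n * 2), ncol = 2)
y <- rep(c(0, 1), times = c(12, 8))
# Toy stand-in for the returned vector: within each outcome level,
# deal observations to shards 1..4 in turn (not random)
shard <- ave(seq_len(n), y, FUN = function(i) rep_len(1:4, length(i)))
# Row indices belonging to each shard
rows <- split(seq_len(n), shard)
# Subset the data for shard 1, e.g. to fit one model per shard
x1 <- x[rows[[1]], , drop = FALSE]
y1 <- y[rows[[1]]]
```

Each element of rows can then be used to subset the covariates and outcome for one shard.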