pilot_fraction: numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1)
pilot_size: alternative to pilot_fraction. Approximate number of observations to be used in the pilot set. If group_by_covariates is specified, the actual size of the returned pilot set may differ from pilot_size, because balancing by covariates can cause deviations from the desired size. If pilot_size is specified, pilot_fraction is ignored.
group_by_covariates: character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical.
data: data.frame with observations as rows, features as columns
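
The way these arguments interact can be illustrated with a minimal sketch. It is written in plain Python as a language-neutral illustration and is not the package's implementation. The function name sample_pilot_indices, the treat_key argument, the 0/1 coding of the treatment column, and the per-stratum rounding rule are all assumptions made for the example; only pilot_fraction, pilot_size, and group_by_covariates come from the descriptions above.

```python
import random
from collections import defaultdict


def sample_pilot_indices(rows, treat_key="treat", pilot_fraction=0.1,
                         pilot_size=None, group_by_covariates=None, seed=0):
    """Return sorted row indices of a pilot set drawn from the controls.

    Hypothetical helper: rows is a list of dicts (one per observation) and
    controls are assumed to be the rows where rows[i][treat_key] == 0.
    """
    rng = random.Random(seed)
    controls = [i for i, row in enumerate(rows) if row[treat_key] == 0]
    if not controls:
        raise ValueError("no control observations to sample from")

    # pilot_size, when given, takes precedence over pilot_fraction.
    if pilot_size is not None:
        fraction = min(1.0, pilot_size / len(controls))
    else:
        fraction = pilot_fraction

    if not group_by_covariates:
        # Simple random sample of the controls.
        return sorted(rng.sample(controls, round(fraction * len(controls))))

    # Stratified sample: group the controls by their covariate values, then
    # draw the same fraction from every group. Rounding within each group
    # (an assumed rule) is why the total can deviate from the requested size.
    strata = defaultdict(list)
    for i in controls:
        key = tuple(rows[i][name] for name in group_by_covariates)
        strata[key].append(i)

    pilot = []
    for members in strata.values():
        pilot.extend(rng.sample(members, round(fraction * len(members))))
    return sorted(pilot)


# Example: 100 controls split 33/33/34 across three levels of "site".
rows = [{"treat": 0, "site": site}
        for site, count in (("a", 33), ("b", 33), ("c", 34))
        for _ in range(count)]

plain = sample_pilot_indices(rows, pilot_size=10)
stratified = sample_pilot_indices(rows, pilot_size=10,
                                  group_by_covariates=["site"])
print(len(plain), len(stratified))
```

In this example the unstratified call returns 10 observations, while the stratified call returns 9 (3 from each site), which shows how a stratified pilot set can end up slightly smaller or larger than pilot_size.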