split_pilot_set function

Split data into pilot and analysis sets

Given a data set and parameters describing how to split it, this function partitions the data accordingly and returns a list containing the analysis_set and the pilot_set.

split_pilot_set(
  data,
  treat,
  pilot_fraction = 0.1,
  pilot_size = NULL,
  group_by_covariates = NULL
)

Arguments

  • data: data.frame with observations as rows, features as columns

  • treat: string giving the name of the column designating treatment assignment

  • pilot_fraction: numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1)

  • pilot_size: alternative to pilot_fraction. Approximate number of observations to be used in pilot set. Note that the actual pilot set size returned may not be exactly pilot_size if group_by_covariates is specified because balancing by covariates may result in deviations from desired size. If pilot_size is specified, pilot_fraction is ignored.

  • group_by_covariates: character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical.
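The stratified sampling described for group_by_covariates can be illustrated with a minimal base R sketch. This is not the package's implementation, only an approximation of the idea under stated assumptions: the data frame toy, its categorical column C1, and the result name toy_split are hypothetical, and the pilot set is drawn from control rows only (treat == 0), following the description of pilot_fraction above.

```r
# Illustrative sketch only: stratified sampling of controls in base R.
# `toy`, `C1`, and `toy_split` are hypothetical names, not package objects.
set.seed(42)
n <- 200
toy <- data.frame(
  treat = rbinom(n, 1, 0.3),                          # 1 = treated, 0 = control
  C1 = sample(c("a", "b", "c"), n, replace = TRUE),   # categorical covariate
  X1 = rnorm(n)
)

pilot_fraction <- 0.2

# Only control rows are eligible for the pilot set.
control_rows <- which(toy$treat == 0)

# Group the control row indices by the level of the categorical covariate.
strata <- split(control_rows, toy$C1[control_rows])

# Within each stratum, sample roughly pilot_fraction of the rows.
pilot_rows <- unlist(lapply(strata, function(rows) {
  n_take <- round(length(rows) * pilot_fraction)
  rows[sample.int(length(rows), n_take)]
}), use.names = FALSE)

# Everything not sampled (remaining controls plus all treated) is analysis data.
analysis_rows <- setdiff(seq_len(n), pilot_rows)

toy_split <- list(
  analysis_set = toy[analysis_rows, ],
  pilot_set = toy[pilot_rows, ]
)
```

Because the number of rows taken is rounded within each stratum, the total pilot size can differ slightly from the requested fraction of controls, which mirrors the note under pilot_size about deviations from the desired size.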

Returns

A list with analysis_set and pilot_set

Examples

dat <- make_sample_data()
splt <- split_pilot_set(dat, "treat", 0.2)
# can be passed into auto_stratify if desired
a.strat <- auto_stratify(splt$analysis_set, "treat", outcome ~ X1,
  pilot_sample = splt$pilot_set
)

  • Maintainer: Rachael C. Aikens
  • License: GPL-3
  • Last published: 2022-03-31