load_expanded_data function

Method to read, subset and sample expanded data

Usage

load_expanded_data(
  object,
  p_control = NULL,
  period = NULL,
  subset_condition = NULL,
  seed = NULL
)

## S4 method for signature 'trial_sequence'
load_expanded_data(
  object,
  p_control = NULL,
  period = NULL,
  subset_condition = NULL,
  seed = NULL
)

Arguments

  • object: An object of class trial_sequence.

  • p_control: Probability of selecting a control, or NULL (default) for no sampling.

  • period: An integerish vector of non-zero length to select trial period(s) or NULL (default) to select all trial periods.

  • subset_condition: A string or NULL (default). The subset_condition is translated to an R call (if the expanded data is saved as a data.table or in CSV format) or to an SQL query (if the expanded data is saved as a DuckDB file).

    The operators "==", "!=", ">", ">=", "<", "<=", "%in%", "&" and "|" are supported. Numeric vectors can be written as c(1, 2, 3) or 1:3. Variables are not supported.

    Note: Numeric ranges written as 1:3 must be surrounded by spaces, e.g. a %in% c( 1:4 , 6:9 ); otherwise the code will fail.

  • seed: An integer seed or NULL (default).

    Note: The same seed will return a different result depending on the class of the te_datastore object contained in the trial_sequence object.
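The idea of translating subset_condition into a call can be illustrated in base R. The sketch below is an analogy only: it assumes a toy data frame with hypothetical columns followup_time and x2, parses the condition string with str2lang() and evaluates it with the data columns as variables. The package's own parser, its spacing rule for ranges such as 1:3, and its SQL translation for DuckDB files are not reproduced here and may behave differently.

```r
# Hypothetical toy data standing in for expanded trial data (not package output)
expanded <- data.frame(
  followup_time = c(0, 5, 20, 25, 3),
  x2 = c(0.5, 1.5, -0.2, 0.1, 0.9)
)

# A subset_condition string, written with spaces around the 1:20 range
subset_condition <- "followup_time %in% 1:20 & x2 < 1"

# Parse the string into an unevaluated R call (base R analogy, not the package parser)
condition_call <- str2lang(subset_condition)

# Evaluate the call, looking up followup_time and x2 as columns of the data
keep <- eval(condition_call, envir = expanded)
subset_rows <- expanded[keep, , drop = FALSE]
print(subset_rows)
```

Only the rows with followup_time 20 and 3 satisfy both parts of the condition, since 0 and 25 lie outside 1:20 and x2 = 1.5 is not below 1.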

Returns

An updated trial_sequence object; the data is stored in slot @outcome_data as a te_outcome_data object.

Details

This method is used on trial_sequence objects to read, subset and sample expanded data.
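The subsetting and sampling steps can be pictured with a simplified base-R sketch. It assumes that a control is a row whose outcome is 0, that every row with outcome 1 is kept, and that each control is kept independently with probability p_control. The helper sample_controls_sketch() and the toy data are hypothetical; they do not reproduce the package's implementation, which works on the te_datastore backends and may sample differently.

```r
# Hypothetical toy expanded data: 3 trial periods, 1 case and 9 controls each
expanded <- data.frame(
  trial_period = rep(0:2, each = 10),
  outcome = rep(c(1, rep(0, 9)), times = 3)
)

# Hypothetical helper: filter trial periods, then keep all cases and
# keep each control independently with probability p_control
sample_controls_sketch <- function(data, p_control = NULL, period = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # make the sampling reproducible
  if (!is.null(period)) {
    data <- data[data$trial_period %in% period, , drop = FALSE]
  }
  if (is.null(p_control)) return(data)  # NULL means no sampling
  is_case <- data$outcome == 1
  keep_control <- !is_case & stats::runif(nrow(data)) < p_control
  data[is_case | keep_control, , drop = FALSE]
}

# Default behaviour: all periods, no sampling
all_rows <- sample_controls_sketch(expanded)

# Subset to periods 1 and 2, then sample controls with probability 0.2
sampled <- sample_controls_sketch(expanded, p_control = 0.2, period = 1:2, seed = 42)
table(sampled$trial_period, sampled$outcome)
```

Filtering by period before sampling follows the order described in the example below, where the data is subset before sampling; the number of retained controls varies with the seed.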

Examples

library(TrialEmulation)

# create a trial_sequence-class object
trial_itt_dir <- file.path(tempdir(), "trial_itt")
dir.create(trial_itt_dir)
trial_itt <- trial_sequence(estimand = "ITT") |>
  set_data(data = data_censored) |>
  set_outcome_model(adjustment_terms = ~ x1 + x2)

trial_itt_csv <- set_expansion_options(
  trial_itt,
  output = save_to_csv(file.path(trial_itt_dir, "trial_csvs")),
  chunk_size = 500
) |>
  expand_trials()

# load_expanded_data default behaviour returns all trial_periods and doesn't sample
load_expanded_data(trial_itt_csv)

# load_expanded_data can subset the data before sampling
load_expanded_data(
  trial_itt_csv,
  p_control = 0.2,
  period = 1:20,
  subset_condition = "followup_time %in% 1:20 & x2 < 1"
)

# delete after use
unlink(trial_itt_dir, recursive = TRUE)