generate_syn_data function

Generate synthetic data for the CausalGPS package

Generates a synthetic data set based on different GPS models and covariate specifications.

generate_syn_data(
  sample_size = 1000,
  outcome_sd = 10,
  gps_spec = 1,
  cova_spec = 1,
  vectorized_y = FALSE
)

Arguments

  • sample_size: A positive integer giving the number of data samples to generate.

  • outcome_sd: A positive double giving the standard deviation used to generate the outcome in the synthetic data set.

  • gps_spec: An integer from 1 to 7. It determines the complexity and form of the relationship between the covariates and the treatment variable. Each value is defined briefly below:

    • gps_spec: 1: The treatment is generated using a normal distribution (stats::rnorm) and a linear function of covariates (cf1 to cf6).
    • gps_spec: 2: The treatment is generated using a Student's t-distribution (stats::rt) and a linear function of covariates, but is also truncated to be within a specific range (-5 to 25).
    • gps_spec: 3: The treatment includes a quadratic term for the third covariate.
    • gps_spec: 4: The treatment is calculated using an exponential function within a fraction, creating a logistic-like model.
    • gps_spec: 5: The treatment also uses a logistic-like model, but with different parameters.
    • gps_spec: 6: The treatment is calculated using the natural logarithm of the absolute value of a linear combination of the covariates.
    • gps_spec: 7: The treatment is generated similarly to gps_spec = 2, but without truncation.
  • cova_spec: A numerical value (1 or 2) that determines how the covariates in the synthetic data set are transformed. If cova_spec equals 2, the function applies a non-linear transformation to the covariates, which can add complexity to the relationships between covariates and outcomes in the synthetic data. See the code for more details.

  • vectorized_y: A Boolean value that indicates how Y is generated internally (default: FALSE). This parameter is kept for backward compatibility; vectorized_y = TRUE performs better.
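The difference between gps_spec = 2 (truncated) and gps_spec = 7 (not truncated) can be illustrated with a minimal base-R sketch. This is not the package's code: the covariate distribution, the coefficients in `beta`, the intercept, the scale, and the degrees of freedom are all hypothetical, and clamping with `pmin`/`pmax` is only one way to keep values inside a range. Only the use of stats::rt and the range -5 to 25 come from the descriptions above.

```r
set.seed(298)
n <- 1000

# Hypothetical covariates (six columns, standing in for cf1 to cf6).
cf <- matrix(stats::rnorm(n * 6), nrow = n, ncol = 6)

# Hypothetical coefficients and intercept; not the package's values.
beta <- c(0.1, 0.1, -0.1, 0.2, 0.1, 0.1)
linear_part <- 10 + as.vector(cf %*% beta)

# Heavy-tailed treatment: linear function of covariates plus t-distributed noise.
# The scale (2) and degrees of freedom (2) are assumptions.
w_untruncated <- linear_part + 2 * stats::rt(n, df = 2)

# Clamp the treatment into [-5, 25], the range quoted for gps_spec = 2.
w_truncated <- pmin(pmax(w_untruncated, -5), 25)

# Compare the two ranges.
range(w_untruncated)
range(w_truncated)
```

Because a t-distribution with few degrees of freedom has heavy tails, the unclamped treatment can contain extreme values, which is what the truncated variant avoids.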

Returns

synthetic_data: The function returns a data.frame containing the constructed synthetic data.

Examples

set.seed(298)
s_data <- generate_syn_data(sample_size = 100,
                            outcome_sd = 10,
                            gps_spec = 1,
                            cova_spec = 1)
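To see how the gps_spec setting affects the generated data, one option is to loop over all seven values and inspect each result. The sketch below assumes the CausalGPS package is installed and skips the loop otherwise. The list name `dims` and the sample size of 200 are illustrative choices; the sketch reports only the dimensions of each data.frame because column names are not documented above.

```r
# Collect the dimensions of the data set generated for each gps_spec value.
dims <- list()

if (requireNamespace("CausalGPS", quietly = TRUE)) {
  set.seed(298)
  for (spec in 1:7) {
    d <- CausalGPS::generate_syn_data(sample_size = 200,
                                      outcome_sd = 10,
                                      gps_spec = spec,
                                      cova_spec = 1)
    dims[[as.character(spec)]] <- dim(d)
    cat("gps_spec =", spec, ":", nrow(d), "rows,", ncol(d), "columns\n")
  }
}
```

Calling `summary()` on each returned data.frame is a quick way to compare the treatment distributions across specifications.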