generate_data function

Data generation function

Generates a data set with n observations of a primary outcome Y, a secondary outcome K, an exposure X, a measured confounder L, and an unmeasured confounder U. The primary outcome is either a quantitative, normally distributed variable (setting = "GLM") or a censored time-to-event outcome under an accelerated failure time (AFT) model (setting = "AFT"). Under the AFT setting, the observed time-to-event variable T=exp(Y) and the censoring indicator C are also computed. X is generated as a genetic exposure variable in the form of a single nucleotide variant (SNV) in 0-1-2 additive coding with minor allele frequency maf. X can be generated independently of U (X_orth_U = TRUE) or dependent on U (X_orth_U = FALSE). For more details regarding the underlying model, see the vignette.
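To illustrate the 0-1-2 additive coding, a SNV with minor allele frequency maf can be drawn as the number of minor alleles among two independent allele draws. The following base-R sketch assumes Hardy-Weinberg equilibrium and covers only the case where X is independent of U. It is not taken from the package source, and the variable names are illustrative.

```r
set.seed(1)

n   <- 1000   # sample size
maf <- 0.2    # minor allele frequency

# Assumption: Hardy-Weinberg equilibrium, so the genotype count
# (number of minor alleles) follows a Binomial(2, maf) distribution.
X <- rbinom(n, size = 2, prob = maf)

# Genotype counts for the codes 0, 1, 2
table(X)

# Empirical minor allele frequency; should be close to maf
maf_hat <- mean(X) / 2
maf_hat
```

Dividing the mean genotype by 2 recovers the allele frequency because each individual carries two alleles.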

generate_data(setting = "GLM", n = 1000, maf = 0.2, cens = 0.3,
  a = NULL, b = NULL,
  aXK = 0.2, aXY = 0.1, aXL = 0, aKY = 0.3,
  aLK = 0, aLY = 0, aUY = 0, aUL = 0,
  mu_X = NULL, sd_X = NULL, X_orth_U = TRUE,
  mu_U = 0, sd_U = 1, mu_K = 0, sd_K = 1,
  mu_L = 0, sd_L = 1, mu_Y = 0, sd_Y = 1)

Arguments

  • setting: String with value "GLM" or "AFT" indicating whether the primary outcome is generated as a normally-distributed quantitative outcome ("GLM") or censored time-to-event outcome ("AFT").

  • n: Numeric. Sample size.

  • maf: Numeric. Minor allele frequency of the genetic exposure variable.

  • cens: Numeric. Desired proportion of censored individuals (default 0.3); has to be specified under the AFT setting. The actual censoring rate is determined by the parameters a and b. cens mainly serves as a check that a and b produce the desired censoring rate (otherwise, a warning is issued).

  • a: Numeric. Parameter for generating the desired censoring rate under the AFT setting. Has to be specified under the AFT setting.

  • b: Numeric. Parameter for generating the desired censoring rate under the AFT setting. Has to be specified under the AFT setting.

  • aXK: Numeric. Size of the effect of X on K.

  • aXY: Numeric. Size of the effect of X on Y.

  • aXL: Numeric. Size of the effect of X on L.

  • aKY: Numeric. Size of the effect of K on Y.

  • aLK: Numeric. Size of the effect of L on K.

  • aLY: Numeric. Size of the effect of L on Y.

  • aUY: Numeric. Size of the effect of U on Y.

  • aUL: Numeric. Size of the effect of U on L.

  • mu_X: Numeric. Expected value of X.

  • sd_X: Numeric. Standard deviation of X.

  • X_orth_U: Logical. Indicator whether X should be generated independently of U (X_orth_U = TRUE) or dependent on U (X_orth_U = FALSE).

  • mu_U: Numeric. Expected value of U.

  • sd_U: Numeric. Standard deviation of U.

  • mu_K: Numeric. Expected value of K.

  • sd_K: Numeric. Standard deviation of K.

  • mu_L: Numeric. Expected value of L.

  • sd_L: Numeric. Standard deviation of L.

  • mu_Y: Numeric. Expected value of Y.

  • sd_Y: Numeric. Standard deviation of Y.
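The effect-size arguments can be read as coefficients in a set of linear equations. The sketch below is one plausible reading of the argument names, not the package's implementation; the exact model is described in the vignette. It assumes X is independent of U, standard normal error terms, and zero intercepts, all of which are assumptions made for illustration.

```r
set.seed(2)

n   <- 1000
maf <- 0.2

# Effect sizes, set to the documented defaults
aXK <- 0.2; aXY <- 0.1; aXL <- 0; aKY <- 0.3
aLK <- 0;   aLY <- 0;   aUY <- 0; aUL <- 0

# Assumed structure: U and X are exogenous, then L, K, Y follow in order
U <- rnorm(n, mean = 0, sd = 1)                  # unmeasured confounder
X <- rbinom(n, size = 2, prob = maf)             # SNV in 0-1-2 coding
L <- aXL * X + aUL * U + rnorm(n)                # measured confounder
K <- aXK * X + aLK * L + rnorm(n)                # secondary outcome
Y <- aXY * X + aKY * K + aLY * L + aUY * U + rnorm(n)  # primary outcome

dat <- data.frame(Y, K, X, L, U)

# Regressing Y on X and K should roughly recover aXY and aKY
fit <- coef(lm(Y ~ X + K, data = dat))
fit
```

With the default values, L and U have no effect on K or Y, so the regression of Y on X and K is not confounded in this sketch.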

Returns

A data frame containing n observations of the variables Y, K, X, L, and U. Under the AFT setting, it also contains T=exp(Y) and the censoring indicator C (0 = censored, 1 = uncensored).
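The realized censoring rate can be read directly off the C column and compared with cens. The sketch below builds a stand-in data frame by hand rather than calling generate_data. It assumes censoring times drawn uniformly between a and b, which is only a guess at how the two arguments might be used; only the relation T=exp(Y) and the 0/1 coding of C come from the description above.

```r
set.seed(3)

n    <- 1000
a    <- 0.2     # assumed lower bound of the censoring time
b    <- 4.75    # assumed upper bound of the censoring time
cens <- 0.3     # desired proportion of censored individuals

Y_latent   <- rnorm(n)                    # stand-in for the log event time
event_time <- exp(Y_latent)               # event time on the original scale
cens_time  <- runif(n, min = a, max = b)  # assumption: uniform censoring

dat <- data.frame(
  T = pmin(event_time, cens_time),          # observed time
  C = as.integer(event_time <= cens_time)   # 1 = uncensored, 0 = censored
)

# Proportion of censored individuals and its distance from the target
observed_rate <- mean(dat$C == 0)
observed_rate
abs(observed_rate - cens)
```

If the observed rate is far from cens, a and b would need to be adjusted; the function description notes that a warning is issued in that case.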

Examples

# Generate data under the GLM setting with default values
dat_GLM <- generate_data()
head(dat_GLM)

# Generate data under the AFT setting with default values
dat_AFT <- generate_data(setting = "AFT", a = 0.2, b = 4.75)
head(dat_AFT)

  • Maintainer: Stefan Konigorski
  • License: GPL-2
  • Last published: 2018-03-19