tmleMSM function

Targeted Maximum Likelihood Estimation of Parameter of MSM

Targeted Maximum Likelihood Estimation of Parameter of MSM

Targeted maximum likelihood estimation of the parameter of a marginal structural model (MSM) for binary point treatment effects. The tmleMSM function is minimally called with arguments (Y,A,W, MSM), where Y is a continuous or binary outcome variable, A is a binary treatment variable, (A=1 for treatment, A=0 for control), and W is a matrix or dataframe of baseline covariates. MSM is a valid regression formula for regressing Y on any combination of A, V, W, T, where V defines strata and T represents the time at which repeated measures on subjects are made. Missingness in the outcome is accounted for in the estimation procedure if missingness indicator Delta is 0 for some observations. Repeated measures can be identified using the id argument. Observation weigths (sampling weights) may optionally be provided

tmleMSM(Y, A, W, V, T = rep(1,length(Y)), Delta = rep(1, length(Y)), MSM, v = NULL, Q = NULL, Qform = NULL, Qbounds = c(-Inf, Inf), Q.SL.library = c("SL.glm", "tmle.SL.dbarts2", "SL.glmnet"), cvQinit = TRUE, hAV = NULL, hAVform = NULL, g1W = NULL, gform = NULL, pDelta1 = NULL, g.Deltaform = NULL, g.SL.library = c("SL.glm", "tmle.SL.dbarts.k.5", "SL.gam"), g.Delta.SL.library = c("SL.glm", "tmle.SL.dbarts.k.5", "SL.gam"), ub = sqrt(sum(Delta))* log(sum(Delta)) / 5, family = "gaussian", fluctuation = "logistic", alpha = 0.995, id = 1:length(Y), V.Q = 10, V.g = 10, V.Delta = 10, inference = TRUE, verbose = FALSE, Q.discreteSL = FALSE, g.discreteSL = FALSE, alpha.sig = 0.05, obsWeights = NULL)

Arguments

  • Y: continuous or binary outcome variable
  • A: binary treatment indicator, 1 - treatment, 0 - control
  • W: vector, matrix, or dataframe containing baseline covariates. Factors are not currently allowed.
  • V: vector, matrix, or dataframe of covariates used to define strata
  • T: optional time for repeated measures data
  • Delta: indicator of missing outcome or treatment assignment. 1 - observed, 0 - missing
  • MSM: MSM of interest, specified as valid right hand side of a regression formula (see examples)
  • v: optional value defining the strata of interest (V=vV=v) for stratified estimation of MSM parameter
  • Q: optional n×2n \times 2 matrix of initial values for QQ portion of the likelihood, (E(YA=0,W),E(YA=1,W))(E(Y|A=0,W), E(Y|A=1,W))
  • Qform: optional regression formula for estimation of E(YA,W)E(Y|A, W), suitable for call to glm
  • Qbounds: vector of upper and lower bounds on Y and predicted values for initial Q
  • Q.SL.library: optional vector of prediction algorithms to use for SuperLearner estimation of initial Q
  • cvQinit: logical, if TRUE, estimates cross-validated predicted values using discrete super learning, default=TRUE
  • hAV: optional n×2n \times 2 matrix used in numerator of weights for updating covariate and the influence curve. If unspecified, defaults to conditional probabilities P(A=1V)P(A=1|V) or P(A=1T)P(A=1|T), for repeated measures data. For unstabilized weights, pass in an n×2n \times 2 matrix of all 1s
  • hAVform: optionalregression formula of the form A~V+T, if specified this overrides the call to SuperLearner
  • g1W: optional vector of conditional treatment assingment probabilities, P(A=1W)P(A=1|W)
  • gform: optional regression formula of the form A~W, if specified this overrides the call to SuperLearner
  • pDelta1: optional n×2n \times 2 matrix of conditional probabilities for missingness mechanism,P(Delta=1A=0,V,W,T),P(Delta=1A=1,V,W,T)P(Delta=1|A=0,V,W,T), P(Delta=1|A=1,V,W,T).
  • g.Deltaform: optional regression formula of the form Delta~A+W, if specified this overrides the call to SuperLearner
  • g.SL.library: optional vector of prediction algorithms to use for SuperLearner estimation of g1W
  • g.Delta.SL.library: optional vector of prediction algorithms to use for SuperLearner estimation ofpDelta1
  • ub: upper bound on inverse probability weights. See Details section for more information
  • family: family specification for working regression models, generally gaussian for continuous outcomes (default), binomial for binary outcomes
  • fluctuation: logistic (default), or linear
  • alpha: used to keep predicted initial values bounded away from (0,1) for logistic fluctuation
  • id: optional subject identifier
  • V.Q: number of cross-validation folds for Super Learner estimation of Q
  • V.g: number of cross-validation folds for Super Learner estimation of g
  • V.Delta: number of cross-validation folds for Super Learner estimation of g_Delta
  • inference: if TRUE, variance-covariance matrix, standard errors, pvalues, and 95% confidence intervals are calculated. Setting to FALSE saves a little time when bootstrapping.
  • verbose: status messages printed if set to TRUE (default=FALSE)
  • Q.discreteSL: If true, use discrete SL to estimate Q, otherwise ensembleSL by default. Ignored when SL is not used.
  • g.discreteSL: If true, use discrete SL to estimate each component of g, otherwise ensembleSL by default. Ignored when SL is not used.
  • alpha.sig: significance level for constructing 1-alpha.sig confidence intervals
  • obsWeights: optional weights for biased sampling and two-stage designs.

Details

ub bounds the IC by bounding the factor h(A,V)/[g(A,V,W)P(Delta=1A,V,W)]h(A,V)/[g(A,V,W)P(Delta=1|A,V,W)] between 0 and ub, default value based on sample size.

Returns

  • psi: MSM parameter estimate

  • sigma: variance covariance matrix

  • se: standard errors extracted from sigma

  • pvalue: two-sided p-value

  • lb: lower bound on 95% confidence interval

  • ub: upper bound on 95% confidence interval

  • epsilon: fitted value of epsilon used to target initial Q

  • psi.Qinit: MSM parameter estimate based on untargeted initial Q

  • Qstar: targeted estimate of Q, an n×2n \times 2 matrix with predicted values for Q(0,W), Q(1,W) using the updated fit

  • Qinit: initial estimate of Q. Qinit$coef are the coefficients for a glm model for Q, if applicable. Qinit$Q is an n×2n \times 2 matrix, where n is the number of observations. Columns contain predicted values for Q(0,W),Q(1,W) using the initial fit. Qinit$type is method for estimating Q

  • g: treatment mechanism estimate. A list with three items: g$g1W contains estimates of P(A=1W)P(A=1|W) for each observation, g$coef the coefficients for the model for gg when glm used, g$type estimation procedure

  • g.AV: estimate for h(A,V) or h(A,T). A list with three items: g.AV$g1W an n×2n \times 2 matrix containing values of P(A=0V,T),P(A=1V,T)P(A=0|V,T), P(A=1|V,T) for each observation, g.AV$coef the coefficients for the model for gg when glm used, g.AV$type estimation procedure

  • g_Delta: missingness mechanism estimate. A list with three items: g_Delta$g1W an n×2n \times 2 matrix containing values of P(Delta=1A,V,W,T)P(Delta=1|A,V,W,T) for each observation, g_Delta$coef the coefficients for the model for gg when glm used, g_Delta$type estimation procedure

References

  1. Gruber, S. and van der Laan, M.J. (2012), tmle: An R Package for Targeted Maximum Likelihood Estimation. Journal of Statistical Software, 51(13), 1-35. https://www.jstatsoft.org/v51/i13/

  2. Rosenblum, M. and van der Laan, M.J. (2010), Targeted Maximum Likelihood Estimation of the Parameter of a Marginal Structural Model. The International Journal of Biostatistics,6(2), 2010.

  3. Gruber, S., Phillips, R.V., Lee, H., van der Laan, M.J. Data-Adaptive Selection of the Propensity Score Truncation Level for Inverse Probability Weighted and Targeted Maximum Likelihood Estimators of Marginal Point Treatment Effects. American Journal of Epidemiology 2022; 191(9), 1640-1651.

Author(s)

Susan Gruber sgruber@cal.berkeley.edu , in collaboration with Mark van der Laan.

See Also

summary.tmleMSM, estimateQ, estimateG, calcSigma, tmle

Examples

library(tmle) # Example 1. Estimating MSM parameter with correctly specified regression formulas # MSM: psi0 + psi1*A + psi2*V + psi3*A*V (saturated) # true parameter value: psi = (0, 1, -2, 0.5) # generate data set.seed(100) n <- 1000 W <- matrix(rnorm(n*3), ncol = 3) colnames(W) <- c("W1", "W2", "W3") V <- rbinom(n, 1, 0.5) A <- rbinom(n, 1, 0.5) Y <- rbinom(n, 1, plogis(A - 2*V + 0.5*A*V)) result.ex1 <- tmleMSM(Y, A, W, V, MSM = "A*V", Qform = "Y~.", gform = "A~1", hAVform = "A~1", family = "binomial") print(result.ex1) ## Not run: # Example 2. Biased sampling from example 1 population # (observations having V = 1 twice as likely to be included in the dataset retain.ex2 <- sample(1:n, size = n/2, p = c(1/3 + 1/3*V)) wt.ex2 <- 1/(1/3 + 1/3*V) result.ex2 <- tmleMSM(Y[retain.ex2], A[retain.ex2], W[retain.ex2,], V[retain.ex2], MSM = "A*V", Qform = "Y~.", gform = "A~1", hAVform = "A~1", family = "binomial", obsWeight = wt.ex2[retain.ex2]) print(result.ex2) # Example 3. Repeated measures data, two observations per id # (e.g., crossover study design) # MSM: psi0 + psi1*A + psi2*V + psi3*V^2 + psi4*T # true parameter value: psi = (-2, 1, 0, -2, 0 ) # generate data in wide format (id, W1, Y(t), W2(t), V(t), A(t)) set.seed(10) n <- 250 id <- rep(1:n) W1 <- rbinom(n, 1, 0.5) W2.1 <- rnorm(n) W2.2 <- rnorm(n) V.1 <- rnorm(n) V.2 <- rnorm(n) A.1 <- rbinom(n, 1, plogis(0.5 + 0.3 * W2.1)) A.2 <- 1-A.1 Y.1 <- -2 + A.1 - 2*V.1^2 + W2.1 + rnorm(n) Y.2 <- -2 + A.2 - 2*V.2^2 + W2.2 + rnorm(n) d <- data.frame(id, W1, W2=W2.1, W2.2, V=V.1, V.2, A=A.1, A.2, Y=Y.1, Y.2) # change dataset from wide to long format longd <- reshape(d, varying = cbind(c(3, 5, 7, 9), c(4, 6, 8, 10)), idvar = "id", direction = "long", timevar = "T", new.row.names = NULL, sep = "") # misspecified model for initial Q, partial misspecification for g. # V set to 2 for Q and g to save time, not recommended at this sample size result.ex3 <- tmleMSM(Y = longd$Y, A = longd$A, W = longd[,c("W1", "W2")], V = longd$V, T = longd$T, MSM = "A + V + I(V^2) + T", Qform = "Y ~ A + V", gform = "A ~ W2", id = longd$id, V.Q=2, V.g=2) print(result.ex3) # Example 4: Introduce 20 # V set to 2 for Q and g to save time, not recommended at this sample size Delta <- rbinom(nrow(longd), 1, 0.8) result.ex4 <- tmleMSM(Y = longd$Y, A = longd$A, W = longd[,c("W1", "W2")], V = longd$V, T=longd$T, Delta = Delta, MSM = "A + V + I(V^2) + T", Qform = "Y ~ A + V", gform = "A ~ W2", g.Deltaform = "Delta ~ 1", id=longd$id, verbose = TRUE, V.Q=2, V.g=2) print(result.ex4) ## End(Not run)