formula: an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted.
id: a vector which identifies the clusters. The length of "id" should be the same as the number of observations. Data are assumed to be sorted so that observations on a cluster are contiguous rows for all entities in the formula.
data: an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which CRTgeeDR is called.
family: a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for details of family functions.)
corstr: a character string specifying the correlation structure. The following are permitted: '"independence"', '"exchangeable"', '"ar1"', '"unstructured"' and '"userdefined"'
Mv: for "m-dependent", the value for m
weights: A vector of weights for each observation. If an observation has weight 0, it is excluded from the calculations of any parameters. Observations with a NA anywhere (even in variables not included in the model) will be assigned a weight of 0.
aug: A list of vector (one for A=1 treated, one for A=0 control) for each observation representing E(Y|X,A=a).
pi.a: A number, Probability of treatment attribution P(A=1)
corr.mat: The correlation matrix for "fixed". Matrix should be symmetric with dimensions >= the maximum cluster size. If the correlation structure is "userdefined", then this is a matrix describing which correlations are the same.
init.beta: an optional vector with the initial values of beta. If not specified, then the intercept will be set to InvLink(mean(response)). init.beta must be specified if not using an intercept.
init.alpha: an optional scalar or vector giving the initial values for the correlation. If provided along with Mv>1 or unstructured correlation, then the user must ensure that the vector is of the appropriate length.
init.phi: an optional initial overdispersion parameter. If not supplied, initialized to 1.
scale.fix: if set to TRUE, then the scale parameter is fixed at the value of init.phi.
sandwich: if set to TRUE, the sandwich variance is provided together with the naive estimator of variance.
maxit: maximum number of iterations.
tol: tolerance in calculation of coefficients.
print.log: if set to TRUE, a report is printed.
typeweights: a character string specifying the weights implementation. The following are permitted: "GENMOD" for W1/2V−1W1/2, "WV" for V−1W
nameTRT: Name of the variable containing information for the treatment
model.weights: an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted for the propensity score. Must model the probability of being observed.
model.augmentation.trt: an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted for the ouctome model for the treated group (A=1).
model.augmentation.ctrl: an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted for the ouctome model for the control group (A=0).
stepwise.augmentation: if set to TRUE, a stepwise for the augmentation model is performed during the fit of the augmentation model for the OM
stepwise.weights: if set to TRUE, a stepwise for the propensity score is performed during the fit of the augmentation model for the OM
nameMISS: Name of the variable containing information for the Missing indicator
nameY: Name of the variable containing information for the outcome
sandwich.nuisance: if set to TRUE, the nuisance adjusted sandwich variance is provided.
fay.adjustment: if set to TRUE, the small-sample nuisance adjusted sandwich variance with Fay's adjustement is provided.
fay.bound: if set to 0.75 by default, bound value used for Fay's adjustement.
Returns
An object of type 'CRTgeeDR'
$beta Final values for regressors estimates
$phi scale parameter estimate
$alpha Final values for association parameters in the working correlation structure when exchangeable
$coefnames Name of the regressors in the main regression
$niter Number of iteration done by the algorithm before convergence
$converged convergence status
$var.naiv Variance of the estimates model based (naive)
$var Variance of the estimates sandwich
$var.nuisance Variance of the estimates nuisance adjusted sandwich
$var.fay Variance of the estimates nuisance adjusted sandwich with Fay correction for small samples
$call Call function
$corr Correlation structure used
$clusz Number of unit in each cluster
$FunList List of function associated with the family
$X design matrix for the main regression
$offset Offset specified in the regression
$eta predicted values
$weights Weights vector used in the diagonal term for the IPW
$ps.model Summary of the regression fitted for the PS if computed internally
$om.model.trt Summary of the regression fitted for the OM for treated if computed internally
$om.model.ctrl Summary of the regression fitted for the OM for control if computed internally
where Di=∂βT∂μi(β,Ai) is the design matrix, Vi is the covariance matrix equal to Ui1/2C(α)Ui1/2 with Ui a diagonal matrix with elements var(yij) and C(α) is the working correlation structure with non-diagonal terms α. Parameters α are estimated using simple moment estimators from the Pearson residuals. The matrix of weights Wi(Xi,Ai,ηW)=diag[Rij/πij(Xi,Ai,ηW)]j=1,…,ni, where πij(Xi,Ai,ηW)=P(Rij∣Xi,Ai) is the Propensity score (PS). The function B(Xi,Ai=a,ηB), which is called the Outcome Model (OM), is a function linking Yij with Xi and Ai. The ηB are nuisance parameters that are estimated. The estimator is most efficient if the OM is equal to E(Yi∣Xi,Ai=a)
The estimator denoted β^aug is found by solving the estimating equation. Although analytic solutions sometimes exist, coefficient estimates are generally obtained using an iterative procedure such as the Newton-Raphson method. Automatic implementation is such that, η^W in Wi(Xi,Ai,η^W) are obtained using a logistic regression and η^B in B(Xi,Ai,η^B) are obtained using a linear regression.
The variance of β^aug is estimated by the sandwich variance estimator. There are two external sources of variability that need to be accounted for: estimation of ηW for the PS and of ηB for the OM. We denote Ω=(β,ηW,ηB) the estimated parameters of interest and nuisance parameters. We can stack estimating functions and score functions for Ω:
where SiW and SiB represent the score equations for patients in cluster i for the estimation of ηW and ηB in the PS and the OM. A standard Taylor expansion paired with Slutzky's theorem and the central limit theorem give the sandwich estimator adjusted for nuisance parameters estimation in the OM and PS:
data(data.sim)## Not run:#### STANDARD GEE geeresults<-geeDREstimation(formula=OUTCOME~TRT, id="CLUSTER", data = data.sim, family ="binomial", corstr ="independence") summary(geeresults)#### IPW GEE ipwresults<-geeDREstimation(formula=OUTCOME~TRT, id="CLUSTER", data = data.sim, family ="binomial", corstr ="independence", model.weights=I(MISSING==0)~TRT*AGE) summary(ipwresults)#### AUGMENTED GEE augresults<-geeDREstimation(formula=OUTCOME~TRT, id="CLUSTER", data = data.sim, family ="binomial", corstr ="independence", model.augmentation.trt=OUTCOME~AGE, model.augmentation.ctrl=OUTCOME~AGE, stepwise.augmentation=FALSE) summary(augresults)## End(Not run)#### DOUBLY ROBUST drresults<-geeDREstimation(formula=OUTCOME~TRT, id="CLUSTER", data = data.sim, family ="binomial", corstr ="independence", model.weights=I(MISSING==0)~TRT*AGE, model.augmentation.trt=OUTCOME~AGE, model.augmentation.ctrl=OUTCOME~AGE, stepwise.augmentation=FALSE) summary(drresults)
Author(s)
Melanie Prague [based on R packages 'geeM' L. S. McDaniel, N. C. Henderson, and P. J. Rathouz. Fast Pure R Implementation of GEE: Application of the Matrix Package. The R Journal, 5(1):181-188, June 2013.]
References
Details regarding implementation can be found in
'Augmented GEE for improving efficiency and validity of estimation in cluster randomized trials by leveraging cluster-and individual-level covariates' - 2012 - Stephens A., Tchetgen Tchetgen E. and De Gruttola V. : Stat Med 31(10) - 915-930.
'Accounting for interactions and complex inter-subject dependency for estimating treatment effect in cluster randomized trials with missing at random outcomes' - 2015 - Prague M., Wang R., Stephens A., Tchetgen Tchetgen E. and De Gruttola V. : in revision.
'Fast Pure R Implementation of GEE: Application of the Matrix Package' - 2013 - McDaniel, Lee S and Henderson, Nicholas C and Rathouz, Paul J : The R Journal 5(1) - 181-197.
'Small-Sample Adjustments for Wald-Type Tests Using Sandwich Estimators' - 2001 - Fay, Michael P and Graubard, Barry I : Biometrics 57(4) - 1198-1206.