stdGlm function

Regression standardization in generalized linear models

Regression standardization in generalized linear models

stdGlm performs regression standardization in generalized linear models, at specified values of the exposure, over the sample covariate distribution. Let YY, XX, and ZZ be the outcome, the exposure, and a vector of covariates, respectively. stdGlm uses a fitted generalized linear model to estimate the standardized mean θ(x)=E{E(YX=x,Z)}\theta(x)=E\{E(Y|X=x,Z)\}, where xx is a specific value of XX, and the outer expectation is over the marginal distribution of ZZ.

stdGlm(fit, data, X, x, clusterid, case.control = FALSE, subsetnew)

Arguments

  • fit: an object of class "glm", as returned by the glm function in the stats package. If arguments weights and/or subset

    are used when fitting the model, then the same weights and subset are used in stdGlm.

  • data: a data frame containing the variables in the model. This should be the same data frame as was used to fit the model in fit.

  • X: a string containing the name of the exposure variable XX in data.

  • x: an optional vector containing the specific values of XX at which to estimate the standardized mean. If XX is binary (0/1) or a factor, then x defaults to all values of XX. If XX is numeric, then x defaults to the mean of XX. If x is set to NA, then XX is not altered. This produces an estimate of the marginal mean E(Y)=E{E(YX,Z)}E(Y)=E\{E(Y|X,Z)\}.

  • clusterid: an optional string containing the name of a cluster identification variable when data are clustered.

  • case.control: logical. Do data come from a case-control study? Defaults to FALSE.

  • subsetnew: an optional logical statement specifying a subset of observations to be used in the standardization. This set is assumed to be a subset of the subset (if any) that was used to fit the regression model.

Details

stdGlm assumes that a generalized linear model

η{E(YX,Z)}=h(X,Z;β) \eta\{E(Y|X,Z)\}=h(X,Z;\beta)

has been fitted. The maximum likelihood estimate of β\beta is used to obtain estimates of the mean E(YX=x,Z)E(Y|X=x,Z):

E^(YX=x,Z)=η1{h(X=x,Z;β^)}. \hat{E}(Y|X=x,Z)=\eta^{-1}\{h(X=x,Z;\hat{\beta})\}.

For each xx in the x argument, these estimates are averaged across all subjects (i.e. all observed values of ZZ) to produce estimates

θ^(x)=i=1nE^(YX=x,Zi)/n, \hat{\theta}(x)=\sum_{i=1}^n \hat{E}(Y|X=x,Z_i)/n,

where ZiZ_i is the value of ZZ for subject ii, i=1,...,ni=1,...,n. The variance for θ^(x)\hat{\theta}(x) is obtained by the sandwich formula.

Returns

An object of class "stdGlm" is a list containing - call: the matched call.

  • input: input is a list containing all input arguments.

  • est: a vector with length equal to length(x), where element j is equal to θ^\hat{\theta}(x[j]).

  • vcov: a square matrix with length(x) rows, where the element on row i and column j is the (estimated) covariance of θ^\hat{\theta}(x[i]) and θ^\hat{\theta}(x[j]).

References

Rothman K.J., Greenland S., Lash T.L. (2008). Modern Epidemiology, 3rd edition. Lippincott, Williams & Wilkins.

Sjolander A. (2016). Regression standardization with the R-package stdReg. European Journal of Epidemiology 31 (6), 563-574.

Sjolander A. (2016). Estimation of causal effect measures with the R-package stdReg. European Journal of Epidemiology 33 (9), 847-858.

Author(s)

Arvid Sjolander.

Note

The variance calculation performed by stdGlm does not condition on the observed covariates Zˉ=(Z1,...,Zn)\bar{Z}=(Z_1,...,Z_n). To see how this matters, note that

var{θ^(x)}=E[var{θ^(x)Zˉ}]+var[E{θ^(x)Zˉ}]. var\{\hat{\theta}(x)\}=E[var\{\hat{\theta}(x)|\bar{Z}\}]+var[E\{\hat{\theta}(x)|\bar{Z}\}].

The usual parameter β\beta in a generalized linear model does not depend on Zˉ\bar{Z}. Thus, E(β^Zˉ)E(\hat{\beta}|\bar{Z}) is independent of Zˉ\bar{Z} as well (since E(β^Zˉ)=βE(\hat{\beta}|\bar{Z})=\beta), so that the term var[E{β^Zˉ}]var[E\{\hat{\beta}|\bar{Z}\}] in the corresponding variance decomposition for var(β^)var(\hat{\beta}) becomes equal to 0. However, θ(x)\theta(x) depends on Zˉ\bar{Z} through the average over the sample distribution for ZZ, and thus the term var[E{θ^(x)Zˉ}]var[E\{\hat{\theta}(x)|\bar{Z}\}] is not 0, unless one conditions on Zˉ\bar{Z}.

Examples

##Example 1: continuous outcome n <- 1000 Z <- rnorm(n) X <- rnorm(n, mean=Z) Y <- rnorm(n, mean=X+Z+0.1*X^2) dd <- data.frame(Z, X, Y) fit <- glm(formula=Y~X+Z+I(X^2), data=dd) fit.std <- stdGlm(fit=fit, data=dd, X="X", x=seq(-3,3,0.5)) print(summary(fit.std)) plot(fit.std) ##Example 2: binary outcome n <- 1000 Z <- rnorm(n) X <- rnorm(n, mean=Z) Y <- rbinom(n, 1, prob=(1+exp(X+Z))^(-1)) dd <- data.frame(Z, X, Y) fit <- glm(formula=Y~X+Z+X*Z, family="binomial", data=dd) fit.std <- stdGlm(fit=fit, data=dd, X="X", x=seq(-3,3,0.5)) print(summary(fit.std)) plot(fit.std)
  • Maintainer: Arvid Sjolander
  • License: LGPL (>= 3)
  • Last published: 2021-05-17

Useful links