stdGlm() R function from [stdReg]

Regression standardization in generalized linear models

stdGlm performs regression standardization in generalized linear models, at specified values of the exposure, over the sample covariate distribution. Let $Y$ , $X$ , and $Z$ be the outcome, the exposure, and a vector of covariates, respectively. stdGlm uses a fitted generalized linear model to estimate the standardized mean $\theta(x)=E\{E(Y|X=x,Z)\}$ , where $x$ is a specific value of $X$ , and the outer expectation is over the marginal distribution of $Z$ .


stdGlm(fit, data, X, x, clusterid, case.control = FALSE, subsetnew)

Arguments

fit: an object of class "glm", as returned by the glm function in the stats package. If arguments weights and/or subset

are used when fitting the model, then the same weights and subset are used in stdGlm.
data: a data frame containing the variables in the model. This should be the same data frame as was used to fit the model in fit.
X: a string containing the name of the exposure variable $X$ in data.
x: an optional vector containing the specific values of $X$ at which to estimate the standardized mean. If $X$ is binary (0/1) or a factor, then x defaults to all values of $X$ . If $X$ is numeric, then x defaults to the mean of $X$ . If x is set to NA, then $X$ is not altered. This produces an estimate of the marginal mean $E(Y)=E\{E(Y|X,Z)\}$ .
clusterid: an optional string containing the name of a cluster identification variable when data are clustered.
case.control: logical. Do data come from a case-control study? Defaults to FALSE.
subsetnew: an optional logical statement specifying a subset of observations to be used in the standardization. This set is assumed to be a subset of the subset (if any) that was used to fit the regression model.

Details

stdGlm assumes that a generalized linear model

\eta\{E(Y|X,Z)\}=h(X,Z;\beta)

has been fitted. The maximum likelihood estimate of $\beta$ is used to obtain estimates of the mean $E(Y|X=x,Z)$ :

\hat{E}(Y|X=x,Z)=\eta^{-1}\{h(X=x,Z;\hat{\beta})\}.

For each $x$ in the x argument, these estimates are averaged across all subjects (i.e. all observed values of $Z$ ) to produce estimates

\hat{\theta}(x)=\sum_{i=1}^n \hat{E}(Y|X=x,Z_i)/n,

where $Z_i$ is the value of $Z$ for subject $i$ , $i=1,...,n$ . The variance for $\hat{\theta}(x)$ is obtained by the sandwich formula.

Returns

An object of class "stdGlm" is a list containing - call: the matched call.

input: input is a list containing all input arguments.
est: a vector with length equal to length(x), where element j is equal to $\hat{\theta}$ (x[j]).
vcov: a square matrix with length(x) rows, where the element on row i and column j is the (estimated) covariance of $\hat{\theta}$ (x[i]) and $\hat{\theta}$ (x[j]).

References

Rothman K.J., Greenland S., Lash T.L. (2008). Modern Epidemiology, 3rd edition. Lippincott, Williams & Wilkins.

Sjolander A. (2016). Regression standardization with the R-package stdReg. European Journal of Epidemiology 31 (6), 563-574.

Sjolander A. (2016). Estimation of causal effect measures with the R-package stdReg. European Journal of Epidemiology 33 (9), 847-858.

Author(s)

Arvid Sjolander.

Note

The variance calculation performed by stdGlm does not condition on the observed covariates $\bar{Z}=(Z_1,...,Z_n)$ . To see how this matters, note that

var\{\hat{\theta}(x)\}=E[var\{\hat{\theta}(x)|\bar{Z}\}]+var[E\{\hat{\theta}(x)|\bar{Z}\}].

The usual parameter $\beta$ in a generalized linear model does not depend on $\bar{Z}$ . Thus, $E(\hat{\beta}|\bar{Z})$ is independent of $\bar{Z}$ as well (since $E(\hat{\beta}|\bar{Z})=\beta$ ), so that the term $var[E\{\hat{\beta}|\bar{Z}\}]$ in the corresponding variance decomposition for $var(\hat{\beta})$ becomes equal to 0. However, $\theta(x)$ depends on $\bar{Z}$ through the average over the sample distribution for $Z$ , and thus the term $var[E\{\hat{\theta}(x)|\bar{Z}\}]$ is not 0, unless one conditions on $\bar{Z}$ .

Examples


##Example 1: continuous outcome
n <- 1000
Z <- rnorm(n)
X <- rnorm(n, mean=Z)
Y <- rnorm(n, mean=X+Z+0.1*X^2)
dd <- data.frame(Z, X, Y)
fit <- glm(formula=Y~X+Z+I(X^2), data=dd)
fit.std <- stdGlm(fit=fit, data=dd, X="X", x=seq(-3,3,0.5))
print(summary(fit.std))
plot(fit.std)

##Example 2: binary outcome
n <- 1000
Z <- rnorm(n)
X <- rnorm(n, mean=Z)
Y <- rbinom(n, 1, prob=(1+exp(X+Z))^(-1))
dd <- data.frame(Z, X, Y)
fit <- glm(formula=Y~X+Z+X*Z, family="binomial", data=dd)
fit.std <- stdGlm(fit=fit, data=dd, X="X", x=seq(-3,3,0.5))
print(summary(fit.std))
plot(fit.std)

stdReg package Read PDF manual

Maintainer: Arvid Sjolander
License: LGPL (>= 3)
Last published: 2021-05-17

Useful links

stdGlm function