AFglm function

Attributable fraction estimation based on a logistic regression model from a glm object (commonly used for cross-sectional or case-control sampling designs).

AFglm estimates the model-based adjusted attributable fraction for data from a logistic regression model in the form of a glm object. This model is commonly used for data from a cross-sectional or non-matched case-control sampling design.

AFglm(object, data, exposure, clusterid, case.control = FALSE)

Arguments

  • object: a fitted logistic regression model object of class "glm".
  • data: an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called.
  • exposure: the name of the exposure variable as a string. The exposure must be binary (0/1) where unexposed is coded as 0.
  • clusterid: the name of the cluster identifier variable as a string, if data are clustered. Cluster robust standard errors will be calculated.
  • case.control: set to TRUE if the data are from a non-matched case-control study. Defaults to FALSE, which is appropriate for cross-sectional sampling designs.
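Because exposure must be coded 0/1 with 0 meaning unexposed, a variable stored in another form needs recoding before the model is fitted. The following is a minimal sketch in base R; the variable name smoker and its levels are hypothetical and chosen only for illustration.

```r
# Hypothetical exposure recorded as a two-level factor
smoker <- factor(c("no", "yes", "yes", "no", "yes"), levels = c("no", "yes"))

# Recode so that the unexposed level ("no") becomes 0 and the exposed level 1
X <- as.integer(smoker == "yes")

# Check the coding before fitting the model
stopifnot(all(X %in% c(0L, 1L)))
print(table(smoker, X))
```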

Returns

  • AF.est: estimated attributable fraction.

  • AF.var: estimated variance of AF.est. The variance is obtained by combining the delta method with the sandwich formula.

  • P.est: estimated factual proportion of cases, $\Pr(Y=1)$. Returned by default when case.control = FALSE.

  • P.var: estimated variance of P.est. The variance is obtained by the sandwich formula. Returned by default when case.control = FALSE.

  • P0.est: estimated counterfactual proportion of cases if the exposure were eliminated, $\Pr(Y_0=1)$. Returned by default when case.control = FALSE.

  • P0.var: estimated variance of P0.est. The variance is obtained by the sandwich formula. Returned by default when case.control = FALSE.

  • log.or: a vector of the estimated log odds ratio for every individual. log.or contains the estimated coefficient for the exposure variable X for every level of the confounder Z as specified by the user in the formula. If the model to be estimated is

$$\mathrm{logit}\{\Pr(Y=1 \mid X,Z)\} = \alpha + \beta X + \gamma Z$$

then `log.or` is the estimate of $\beta$. If the model to be estimated is

$$\mathrm{logit}\{\Pr(Y=1 \mid X,Z)\} = \alpha + \beta X + \gamma Z + \psi XZ$$

then `log.or` is the estimate of $\beta + \psi Z$. Only returned if argument `case.control` is set to `TRUE`.
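As an illustration of how AF.est and AF.var might be combined, the sketch below forms a Wald-type 95% confidence interval on the untransformed scale. The numeric values are hypothetical placeholders, not output from AFglm, and the interval type is an assumption made for the example; summary(), as used in the Examples, remains the intended way to report results.

```r
# Hypothetical values standing in for the AF.est and AF.var components
AF.est <- 0.25
AF.var <- 0.0016

# Standard error is the square root of the estimated variance
se <- sqrt(AF.var)

# Wald-type 95% interval: estimate plus/minus the normal quantile times the SE
z <- qnorm(0.975)
ci <- c(lower = AF.est - z * se, upper = AF.est + z * se)
print(round(ci, 3))
```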

Details

AFglm estimates the attributable fraction for a binary outcome Y under the hypothetical scenario where a binary exposure X is eliminated from the population. The estimate is adjusted for confounders Z by logistic regression using the glm function. The estimation strategy differs between cross-sectional and case-control sampling designs even if the underlying logistic regression model is the same. For cross-sectional sampling designs the AF can be defined as

$$AF = 1 - \frac{\Pr(Y_0=1)}{\Pr(Y=1)}$$

where $\Pr(Y_0=1)$ denotes the counterfactual probability of the outcome if the exposure had been eliminated from the population and $\Pr(Y=1)$ denotes the factual probability of the outcome. If Z is sufficient for confounding control, then $\Pr(Y_0=1)$ can be expressed as $E_Z\{\Pr(Y=1 \mid X=0,Z)\}$.
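The cross-sectional definition above can be sketched by hand with base R. This is an illustration of the standardization idea under simulated data, not the package's internal code: the data-generating model, the seed, and the names dat, dat0, P, P0 and AF are all assumptions made for the example, and no variance is computed.

```r
# Simulate a cross-sectional sample (assumed data-generating model)
set.seed(1)
expit <- function(x) 1 / (1 + exp(-x))
n <- 2000
Z <- rnorm(n)
X <- rbinom(n, size = 1, prob = expit(Z))
Y <- rbinom(n, size = 1, prob = expit(Z + X))
dat <- data.frame(Y, X, Z)

# Logistic regression of the outcome on exposure and confounder
fit <- glm(Y ~ X + Z, family = binomial, data = dat)

# Factual proportion of cases: the sample mean of Y
P <- mean(dat$Y)

# Counterfactual proportion: predict with X set to 0 for everyone,
# then average over the sample distribution of Z
dat0 <- transform(dat, X = 0)
P0 <- mean(predict(fit, newdata = dat0, type = "response"))

# Attributable fraction: 1 - Pr(Y0 = 1) / Pr(Y = 1)
AF <- 1 - P0 / P
print(c(P = P, P0 = P0, AF = AF))
```

Averaging the predictions over the observed rows is what replaces the outer expectation over Z with its sample analogue.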

The function uses logistic regression to estimate $\Pr(Y=1 \mid X=0,Z)$, and the marginal sample distribution of Z to approximate the outer expectation (Sjölander and Vansteelandt, 2011).

For case-control sampling designs the outcome prevalence is fixed by the sampling design and absolute probabilities (P.est and P0.est) cannot be estimated. Instead, adjusted log odds ratios (log.or) are estimated for each individual. This is done by setting case.control to TRUE. It is then assumed that the outcome is rare so that the risk ratio can be approximated by the odds ratio. For case-control sampling designs the AF can be defined as (Bruzzi et al., 1985)

$$AF = 1 - \frac{\Pr(Y_0=1)}{\Pr(Y=1)}$$

where $\Pr(Y_0=1)$ denotes the counterfactual probability of the outcome if the exposure had been eliminated from the population. If Z is sufficient for confounding control, then the probability $\Pr(Y_0=1)$ can be expressed as

$$\Pr(Y_0=1) = E_Z\{\Pr(Y=1 \mid X=0,Z)\}.$$

Using Bayes' theorem this implies that the AF can be expressed as

$$AF = 1 - \frac{E_Z\{\Pr(Y=1 \mid X=0,Z)\}}{\Pr(Y=1)} = 1 - E_Z\{RR^{-X}(Z) \mid Y=1\}$$

where $RR(Z)$ is the risk ratio

$$\frac{\Pr(Y=1 \mid X=1,Z)}{\Pr(Y=1 \mid X=0,Z)}.$$
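The Bayes' theorem step can be spelled out as follows. This is a sketch of the standard argument; $f(x,z)$ is notation introduced here for the joint distribution of $(X,Z)$ and does not appear elsewhere in this page. Because X is binary, $\Pr(Y=1 \mid X=0,Z) = RR^{-X}(Z)\Pr(Y=1 \mid X,Z)$ holds for both X = 0 and X = 1, so

$$E_Z\{\Pr(Y=1 \mid X=0,Z)\} = E_{X,Z}\{RR^{-X}(Z)\Pr(Y=1 \mid X,Z)\}.$$

Bayes' theorem gives the distribution of $(X,Z)$ among the cases as

$$f(x,z \mid Y=1) = \frac{\Pr(Y=1 \mid x,z)\, f(x,z)}{\Pr(Y=1)},$$

and dividing the first display by $\Pr(Y=1)$ therefore yields

$$\frac{E_Z\{\Pr(Y=1 \mid X=0,Z)\}}{\Pr(Y=1)} = E\{RR^{-X}(Z) \mid Y=1\},$$

where the expectation on the right is taken over $(X,Z)$ among the cases.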

Moreover, the risk ratio can be approximated by the odds ratio if the outcome is rare. Thus,

$$AF \approx 1 - E_Z\{OR^{-X}(Z) \mid Y=1\}.$$
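The approximation above can be sketched by hand with base R: fit the logistic model to the case-control sample, then average $OR^{-X}$ over the cases only. This illustrates the formula under simulated data and is not the package's internal code; the population size, sample size, seed and object names are assumptions made for the example, and no variance is computed.

```r
# Simulate a population with a rare outcome (intercept -6), then draw
# a non-matched case-control sample from it
set.seed(2)
expit <- function(x) 1 / (1 + exp(-x))
NN <- 200000
Z <- rnorm(NN)
X <- rbinom(NN, size = 1, prob = expit(Z))
Y <- rbinom(NN, size = 1, prob = expit(-6 + X + Z))
population <- data.frame(Z, X, Y)

n <- 300
cases <- sample(which(population$Y == 1), n)
controls <- sample(which(population$Y == 0), n)
dat <- population[c(cases, controls), ]

# Logistic regression on the case-control sample
fit <- glm(Y ~ X + Z, family = binomial, data = dat)

# Log odds ratio for the exposure; it does not vary with Z here
# because the model has no X:Z interaction term
log.or <- coef(fit)[["X"]]

# Average OR^(-X) over the cases only, then subtract from 1
is.case <- dat$Y == 1
AF <- 1 - mean(exp(-log.or * dat$X[is.case]))
print(AF)
```

With an interaction term in the model, the log odds ratio would differ between individuals and would have to be evaluated at each case's value of Z before averaging.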

If clusterid is supplied, then a clustered sandwich formula is used in all variance calculations.

Examples

    # Simulate a cross-sectional sample
    expit <- function(x) 1 / (1 + exp(-x))
    n <- 1000
    Z <- rnorm(n = n)
    X <- rbinom(n = n, size = 1, prob = expit(Z))
    Y <- rbinom(n = n, size = 1, prob = expit(Z + X))

    # Example 1: non clustered data from a cross-sectional sampling design
    data <- data.frame(Y, X, Z)

    # Fit a glm object
    fit <- glm(formula = Y ~ X + Z + X * Z, family = binomial, data = data)

    # Estimate the attributable fraction from the fitted logistic regression
    AFglm_est <- AFglm(object = fit, data = data, exposure = "X")
    summary(AFglm_est)

    # Example 2: clustered data from a cross-sectional sampling design
    # Duplicate observations in order to create clustered data
    id <- rep(1:n, 2)
    data <- data.frame(id = id, Y = c(Y, Y), X = c(X, X), Z = c(Z, Z))

    # Fit a glm object
    fit <- glm(formula = Y ~ X + Z + X * Z, family = binomial, data = data)

    # Estimate the attributable fraction from the fitted logistic regression
    AFglm_clust <- AFglm(object = fit, data = data,
                         exposure = "X", clusterid = "id")
    summary(AFglm_clust)

    # Example 3: non matched case-control
    # Simulate a sample from a non matched case-control sampling design
    # Make the outcome a rare event by setting the intercept to -6
    expit <- function(x) 1 / (1 + exp(-x))
    NN <- 1000000
    n <- 500
    intercept <- -6
    Z <- rnorm(n = NN)
    X <- rbinom(n = NN, size = 1, prob = expit(Z))
    Y <- rbinom(n = NN, size = 1, prob = expit(intercept + X + Z))
    population <- data.frame(Z, X, Y)
    Case <- which(population$Y == 1)
    Control <- which(population$Y == 0)

    # Sample cases and controls from the population
    case <- sample(Case, n)
    control <- sample(Control, n)
    data <- population[c(case, control), ]

    # Fit a glm object
    fit <- glm(formula = Y ~ X + Z + X * Z, family = binomial, data = data)

    # Estimate the attributable fraction from the fitted logistic regression
    AFglm_est_cc <- AFglm(object = fit, data = data,
                          exposure = "X", case.control = TRUE)
    summary(AFglm_est_cc)

References

Bruzzi, P., Green, S. B., Byar, D., Brinton, L. A., and Schairer, C. (1985). Estimating the population attributable risk for multiple risk factors using case-control data. American Journal of Epidemiology 122, 904-914.

Greenland, S. and Drescher, K. (1993). Maximum Likelihood Estimation of the Attributable Fraction from logistic Models. Biometrics 49, 865-872.

Sjölander, A. and Vansteelandt, S. (2011). Doubly robust estimation of attributable fractions. Biostatistics 12, 112-121.

See Also

glm used for fitting the logistic regression model. For conditional logistic regression (commonly for data from a matched case-control sampling design) see AFclogit.

Author(s)

Elisabeth Dahlqwist, Arvid Sjölander

  • Maintainer: Elisabeth Dahlqwist
  • License: GPL-2 | GPL-3
  • Last published: 2019-05-20
