gMAP function

Meta-Analytic-Predictive Analysis for Generalized Linear Models

Meta-Analytic-Predictive (MAP) analysis for generalized linear models suitable for normal, binary, or Poisson data. Model specification and overall syntax mainly follow glm conventions.

gMAP(
  formula,
  family = gaussian,
  data,
  weights,
  offset,
  tau.strata,
  tau.dist = c("HalfNormal", "TruncNormal", "Uniform", "Gamma", "InvGamma",
               "LogNormal", "TruncCauchy", "Exp", "Fixed"),
  tau.prior,
  tau.strata.pred = 1,
  beta.prior,
  prior_PD = FALSE,
  REdist = c("normal", "t"),
  t.df = 5,
  contrasts = NULL,
  iter = getOption("RBesT.MC.iter", 6000),
  warmup = getOption("RBesT.MC.warmup", 2000),
  thin = getOption("RBesT.MC.thin", 4),
  init = getOption("RBesT.MC.init", 1),
  chains = getOption("RBesT.MC.chains", 4),
  cores = getOption("mc.cores", 1L)
)

## S3 method for class 'gMAP'
print(x, digits = 3, probs = c(0.025, 0.5, 0.975), ...)

## S3 method for class 'gMAP'
fitted(object, type = c("response", "link"), probs = c(0.025, 0.5, 0.975), ...)

## S3 method for class 'gMAP'
coef(object, probs = c(0.025, 0.5, 0.975), ...)

## S3 method for class 'gMAP'
as.matrix(x, ...)

## S3 method for class 'gMAP'
summary(
  object,
  type = c("response", "link"),
  probs = c(0.025, 0.5, 0.975),
  ...
)

Arguments

  • formula: the model formula describing the linear predictor and encoding the grouping; see details

  • family: defines data likelihood and link function (binomial, gaussian, or poisson)

  • data: optional data frame containing the variables of the model. If not found in data, the variables are taken from environment(formula).

  • weights: optional weight vector; see details below.

  • offset: offset term in statistical model used for Poisson data

  • tau.strata: sets the exchangeability stratum per study. That is, it is expected that each study belongs to a single stratum. Default is to assign all studies to stratum 1. See section Differential Discounting below.

  • tau.dist: type of prior distribution for tau; supported priors are HalfNormal (default), TruncNormal, Uniform, Gamma, InvGamma, LogNormal, TruncCauchy, Exp and Fixed.

  • tau.prior: parameters of prior distribution for tau; see section prior specification below.

  • tau.strata.pred: the index for the prediction stratum; default is 1.

  • beta.prior: mean and standard deviation for normal priors of regression coefficients, see section prior specification below.

  • prior_PD: logical to indicate if the prior predictive distribution should be sampled (no conditioning on the data). Defaults to FALSE.

  • REdist: type of random-effects distribution: normal (default) or t.

  • t.df: degrees of freedom if random-effects distribution is t.

  • contrasts: an optional list; see contrasts.arg from model.matrix.default.

  • iter: number of iterations (including warmup).

  • warmup: number of warmup iterations.

  • thin: period of saving samples.

  • init: positive number to specify uniform range on unconstrained space for random initialization. See stan.

  • chains: number of Markov chains.

  • cores: number of cores for parallel sampling of chains.

  • x, object: gMAP analysis object created by the gMAP function.

  • digits: number of displayed significant digits.

  • probs: defines quantiles to be reported.

  • ...: optional arguments are ignored

  • type: sets reported scale (response (default) or link).

Returns

The function returns an S3 object of type gMAP. See the Methods section below for applicable functions to query the object.

Details

The meta-analytic-predictive (MAP) approach derives a prior from historical data using a hierarchical model. The statistical model is formulated as a generalized linear mixed model for binary, normal (with fixed $\sigma$) and Poisson endpoints:

$$y_{ih} \mid \theta_{ih} \sim f(y_{ih} \mid \theta_{ih})$$

Here, $i=1,\ldots,N$ is the index for observations, and $h=1,\ldots,H$ is the index for the grouping (usually studies). The model assumes the linear predictor for a transformed mean as

$$g(\theta_{ih}; x_{ih}, \beta) = x_{ih} \, \beta + \epsilon_h$$

with $x_{ih}$ being the row vector of $k$ covariates for observation $i$. The variance component is by default assumed to be normal

$$\epsilon_h \sim N(0,\tau^2), \qquad h=1,\ldots,H$$

Lastly, the Bayesian implementation assumes independent normal priors for the $k$ regression coefficients and a prior for the between-group standard deviation $\tau$ (see tau.dist for available distributions).

The MAP prior will then be derived from the above model as the conditional distribution of $\theta_{\star}$ given the available data and the vector of covariates $x_{\star}$ defining the overall intercept

$$\theta_{\star} \mid x_{\star}, y.$$

A simple and common case arises for one observation (summary statistic) per trial. For a normal endpoint, the model then simplifies to the standard normal-normal hierarchical model. In the above notation, $i=h=1,\ldots,H$ and

$$y_h \mid \theta_h \sim N(\theta_h, s_h^2)$$
$$\theta_h = \mu + \epsilon_h$$
$$\epsilon_h \sim N(0,\tau^2),$$

where the more common $\mu$ is used for the only (intercept) parameter $\beta_1$. Since there are no covariates, the MAP prior is simply $Pr(\theta_{\star} \mid y_1,\ldots,y_H)$.
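To make the data-generating side of the normal-normal model concrete, the following base R sketch simulates one summary statistic per trial. All numeric values (number of trials, overall mean, between-trial standard deviation, standard errors) and the object names are hypothetical and chosen for illustration only; the sketch does not fit the model.

```r
# Simulate from the normal-normal hierarchical model (illustrative values only)
set.seed(1)
H   <- 8            # number of historical trials (assumed)
mu  <- 0.5          # overall mean, i.e. the intercept beta_1 (assumed)
tau <- 0.2          # between-trial standard deviation (assumed)
s_h <- rep(0.3, H)  # standard errors of the trial means (assumed known)

eps_h   <- rnorm(H, mean = 0, sd = tau)        # epsilon_h ~ N(0, tau^2)
theta_h <- mu + eps_h                          # theta_h = mu + epsilon_h
y_h     <- rnorm(H, mean = theta_h, sd = s_h)  # y_h | theta_h ~ N(theta_h, s_h^2)

# One row per trial: observed mean and its standard error
sim <- data.frame(study = seq_len(H), theta = theta_h, y = y_h, y.se = s_h)
sim
```

A data frame of this shape (observed mean plus standard error per study) matches the two-column response described for the normal endpoint.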

The hierarchical model is a compromise between the two extreme cases of full pooling ($\tau=0$, full borrowing, no discounting) and no pooling ($\tau=\infty$, no borrowing, stratification). The information content of the historical data grows with $H$ (number of historical data items) indefinitely for full pooling whereas no information is gained in a stratified analysis. For a fixed $\tau$, the maximum effective sample size of the MAP prior is $n_\infty$ ($H \to \infty$), which for a normal endpoint with fixed $\sigma$ is

$$n_\infty = \left(\frac{\tau^2}{\sigma^2}\right)^{-1}$$

(Neuenschwander et al., 2010). Hence, the ratio $\tau/\sigma$ limits the amount of information a MAP prior is equivalent to. This allows for a classification of $\tau$ values in relation to $\sigma$, which is crucial to define a prior $P_\tau$. The following classification is useful in a clinical trial setting:

Heterogeneity   $\tau/\sigma$   $n_\infty$
small           0.0625          256
moderate        0.125           64
substantial     0.25            16
large           0.5             4
very large      1.0             1
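The table entries follow directly from the formula for the maximum effective sample size. As a quick check, this base R sketch recomputes them from the ratio $\tau/\sigma$; the object names are illustrative only.

```r
# Ratios tau/sigma for the heterogeneity categories
ratio <- c(small = 0.0625, moderate = 0.125, substantial = 0.25,
           large = 0.5, "very large" = 1)

# n_infinity = (tau^2 / sigma^2)^(-1)
n_inf <- (ratio^2)^(-1)

data.frame(heterogeneity = names(ratio), tau_over_sigma = unname(ratio),
           n_inf = unname(n_inf))
```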

The above formula for $n_\infty$ assumes a known $\tau$. This is unrealistic as the between-trial heterogeneity parameter is often not well estimable, in particular if the number of trials is small ($H$ small). The above table helps to specify a prior distribution for $\tau$ appropriate for the given context which defines the crucial parameter $\sigma$. For binary and Poisson endpoints, normal approximations can be used to determine $\sigma$. See examples below for concrete cases.
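As a rough sketch of such normal approximations, the base R code below derives an approximate sampling standard deviation on the link scale. The binary case uses the same expression as the Examples section, 1 / sqrt(p * (1 - p)) on the logit scale. The Poisson expression, 1 / sqrt(lambda) per unit of exposure on the log scale, is a standard delta-method approximation assumed here and not taken from this page. The rates p_bar and lambda_bar are hypothetical.

```r
# Binary endpoint, logit scale: sigma is approximately 1 / sqrt(p * (1 - p))
p_bar <- 0.25                                  # assumed mean response rate
sigma_binary <- 1 / sqrt(p_bar * (1 - p_bar))

# Poisson endpoint, log scale (assumption): sigma is approximately
# 1 / sqrt(lambda) per unit of exposure
lambda_bar <- 2                                # assumed mean event rate
sigma_poisson <- 1 / sqrt(lambda_bar)

# Translate the "moderate" category (tau / sigma = 0.125) into tau values
tau_moderate <- 0.125 * c(binary = sigma_binary, poisson = sigma_poisson)
tau_moderate
```

Values obtained this way can guide the scale of the prior for $\tau$; they are approximations and depend on the assumed rates.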

The design matrix $X$ is defined by the formula for the linear predictor and is always of the form response ~ predictor | grouping, which follows glm conventions. The syntax has been extended to include a specification of the grouping (for example study) factor of the data with a vertical bar, |. The bar separates the optionally specified grouping level, e.g. in the binary endpoint case cbind(r, n-r) ~ 1 | study. By default it is assumed that each row corresponds to an individual group (for which an individual parameter is estimated). Specifics for the different endpoints are:

  • normal: family=gaussian assumes an identity link function. The response should be given as a matrix with two columns, the first column being the observed mean value $y_{ih}$ and the second column the standard error $se_{ih}$ (of the mean). Additionally, it is recommended to specify with the weights argument the number of units which contributed to the (mean) measurement $y_{ih}$. This information is used to estimate $\sigma$.

  • binary: family=binomial assumes a logit link function. The response must be given as a two-column matrix with the number of responders $r$ (first column) and non-responders $n-r$ (second column).

  • Poisson: family=poisson assumes a log link function. The response is a vector of counts. The total exposure times can be specified by an offset, which will be linearly added to the linear predictor. The offset can be given as part of the formula, y ~ 1 + offset(log(exposure)), or as the offset argument to gMAP. Note that the exposure must be given as a log-offset.
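The three response layouts can be sketched in base R as follows. The data frames and their values are hypothetical. The formulas are only constructed, not fitted, and the final gMAP call is kept unevaluated with quote(), so the snippet runs without sampling. The prior settings in that call are placeholders, not recommendations.

```r
# Hypothetical summary data, one row per study
dat_bin  <- data.frame(study = c("A", "B", "C"), r = c(10, 12, 8), n = c(40, 50, 35))
dat_norm <- data.frame(study = c("A", "B", "C"), y = c(1.2, 0.9, 1.5),
                       y.se = c(0.30, 0.25, 0.40), n = c(40, 50, 35))
dat_pois <- data.frame(study = c("A", "B", "C"), y = c(5, 9, 3),
                       exposure = c(20, 35, 15))

# Formulas of the form response ~ predictor | grouping
f_bin  <- cbind(r, n - r) ~ 1 | study                # responders, non-responders
f_norm <- cbind(y, y.se) ~ 1 | study                 # mean and its standard error
f_pois <- y ~ 1 + offset(log(exposure)) | study      # counts with log-exposure offset

# The two-column binary response adds up to the sample size per study
resp_bin <- with(dat_bin, cbind(r, n - r))

# Unevaluated sketch of a Poisson fit; eval(call_pois) would need RBesT attached
call_pois <- quote(
  gMAP(y ~ 1 + offset(log(exposure)) | study,
       family = poisson, data = dat_pois,
       tau.dist = "HalfNormal", tau.prior = 1, beta.prior = 1)
)
```

For the normal case, the number of units per study (column n above) would be passed through the weights argument.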
    

Methods (by generic)

  • print(gMAP): displays a summary of the gMAP analysis.

  • fitted(gMAP): returns the quantiles of the posterior shrinkage estimates for each data item used during the analysis of the given gMAP object.

  • coef(gMAP): returns the quantiles of the predictive distribution. The user can choose with type whether the result is on the response or the link scale.

  • as.matrix(gMAP): extracts the posterior sample of the model.

  • summary(gMAP): returns the summaries of a gMAP analysis. Output is a gMAPsummary object, which is a list containing

    • tau: posterior summary of the heterogeneity standard deviation
    • beta: posterior summary of the regression coefficients
    • theta.pred: summary of the predictive distribution (on the response or link scale, depending on the type argument)
    • theta: posterior summary of the mean estimate (also depends on the type argument)

Differential Discounting

The above model assumes the same between-group standard deviation $\tau$, which implies that the data are equally relevant. This assumption can be relaxed to more than one $\tau$. That is,

$$\epsilon_h \sim N(0,\tau_{s(h)}^2)$$

where $s(h)$ assigns group $h$ to one of $S$ between-group heterogeneity strata.

For example, in a situation with two randomized and four observational studies, one may want to assume $\tau_1$ (for trials 1 and 2) and $\tau_2$ (for trials 3-6) for the between-trial standard deviations of the control means. More heterogeneity (less relevance) for the observational studies can then be expressed by appropriate priors for $\tau_1$ and $\tau_2$. In this case, $S=2$ and the strata assignments (see tau.strata argument) would be $s(1)=s(2)=1$ and $s(3)=\ldots=s(6)=2$.
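The strata assignment for this example can be written down in base R as below. The data frame, its column names and the study labels are hypothetical; the resulting stratum vector is the kind of per-study assignment the tau.strata argument expects.

```r
# Two randomized and four observational studies (hypothetical labels)
studies <- data.frame(
  study  = paste0("trial", 1:6),
  design = c("randomized", "randomized", rep("observational", 4))
)

# s(1) = s(2) = 1 and s(3) = ... = s(6) = 2
studies$stratum <- ifelse(studies$design == "randomized", 1L, 2L)

S <- length(unique(studies$stratum))  # number of heterogeneity strata
studies
```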

Prior Specification

The prior distribution for the regression coefficients $\beta$ is normal.

  • If a single number is given, then this is used as the standard deviation and the default mean of 0 is used.
  • If a vector is given, it must be of the same length as the number of covariates defined and is used as standard deviation.
  • If a matrix with a single row is given, its first column will be used as mean and its second column as standard deviation for all regression coefficients.
  • Lastly, a two-column matrix (mean and standard deviation columns) with as many rows as regression coefficients can be given.
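The four accepted shapes can be sketched as plain R objects. This assumes the matrix forms hold the mean in the first column and the standard deviation in the second, with one row per regression coefficient; the number of coefficients and all values are hypothetical.

```r
k <- 2  # hypothetical number of regression coefficients

bp_single <- 2                         # sd = 2 for all coefficients, mean 0
bp_vector <- c(2, 1)                   # one sd per coefficient, mean 0
bp_row    <- matrix(c(0, 2), nrow = 1, # one (mean, sd) pair used for all coefficients
                    dimnames = list(NULL, c("mean", "sd")))
bp_matrix <- cbind(mean = c(0, 0.5),   # one (mean, sd) row per coefficient
                   sd   = c(2, 1))

dim(bp_row)     # 1 row, 2 columns
dim(bp_matrix)  # k rows, 2 columns
```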

It is recommended to always specify a beta.prior. By default a mean of 0 is set. The standard deviation is set to 2 for the binary case, to 100 * sd(y) for the normal case and to sd(log(y + 0.5 + offset)) for the Poisson case.

For the between-trial heterogeneity $\tau$ prior, a dispersion parameter must always be given for each exchangeability stratum. For the different tau.prior distributions, two parameters are needed out of which one is set to a default value if applicable:

Prior         a              b                default
HalfNormal    $\mu = 0$      $\sigma$
TruncNormal   $\mu$          $\sigma$         $\mu = 0$
Uniform       a              b                a = 0
Gamma         $\alpha$       $\beta$
InvGamma      $\alpha$       $\beta$
LogNormal     $\mu_{\log}$   $\sigma_{\log}$
TruncCauchy   $\mu$          $\sigma$         $\mu = 0$
Exp           $\beta$        0
Fixed         a              0

For a prior distribution with a default location parameter, a vector of length equal to the number of exchangeability strata can be given. Otherwise, a two-column matrix with as many rows as exchangeability strata must be given, except for a single $\tau$ stratum, for which a vector of length two defines the parameters a and b.
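As a sketch of these shapes, the base R objects below show possible tau.prior inputs for two strata and for a single stratum. The choice of distributions and all numeric values are hypothetical and only illustrate the layout.

```r
# HalfNormal has a default location (mu = 0): one scale per stratum suffices
tp_halfnormal <- c(0.5, 1)                     # two strata

# LogNormal has no default: two-column matrix (a, b), one row per stratum
tp_lognormal <- cbind(mu_log = c(-2, -1), sigma_log = c(1, 1))

# Single stratum without a default: vector of length two gives a and b
tp_single <- c(-2, 1)

dim(tp_lognormal)  # rows = strata, columns = parameters a and b
```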

Random seed

The MAP analysis is performed using Markov chain Monte Carlo (MCMC) in rstan. MCMC is a stochastic algorithm. To obtain exactly reproducible results you must use the set.seed function before calling gMAP. See the RBesT overview page for global options on setting further MCMC simulation parameters.

Examples

library(RBesT)

## Setting up dummy sampling for fast execution of example
## Please use 4 chains and 20x more warmup & iter in practice
.user_mc_options <- options(
  RBesT.MC.warmup = 50, RBesT.MC.iter = 100,
  RBesT.MC.chains = 2, RBesT.MC.thin = 1
)

# Binary data example 1

# Mean response rate is ~0.25. For binary endpoints
# a conservative choice for tau is a HalfNormal(0,1) as long as
# the mean response rate is in the range of 0.2 to 0.8. For
# very small or large rates consider the n_infinity approach
# illustrated below.
# for exactly reproducible results, the seed must be set
set.seed(34563)
map_AS <- gMAP(cbind(r, n - r) ~ 1 | study,
  family = binomial,
  data = AS,
  tau.dist = "HalfNormal", tau.prior = 1,
  beta.prior = 2
)
print(map_AS)

# obtain numerical summaries
map_sum <- summary(map_AS)
print(map_sum)
names(map_sum)
# [1] "tau"        "beta"       "theta.pred" "theta"
map_sum$theta.pred

# graphical model checks (returns list of ggplot2 plots)
map_checks <- plot(map_AS)
# forest plot with shrinkage estimates
map_checks$forest_model
# density of MAP prior on response scale
map_checks$densityThetaStar
# density of MAP prior on link scale
map_checks$densityThetaStarLink

# obtain shrinkage estimates
fitted(map_AS)

# regression coefficients
coef(map_AS)

# finally fit MAP prior with parametric mixture
map_mix <- mixfit(map_AS, Nc = 2)
plot(map_mix)$mix

# optionally select number of components automatically via AIC
map_automix <- automixfit(map_AS)
plot(map_automix)$mix

# Normal example 2, see normal vignette

# Prior considerations

# The general principle to derive a prior for tau can be based on the
# n_infinity concept as discussed in Neuenschwander et al., 2010.
# This assumes a normal approximation which applies for the colitis
# data set as:
p_bar <- mean(with(colitis, r / n))
s <- round(1 / sqrt(p_bar * (1 - p_bar)), 1)
# s is the approximate sampling standard deviation and a
# conservative prior is tau ~ HalfNormal(0,s/2)
tau_prior_sd <- s / 2

# Evaluate HalfNormal prior for tau
tau_cat <- c(
  pooling = 0,
  small = 0.0625,
  moderate = 0.125,
  substantial = 0.25,
  large = 0.5,
  veryLarge = 1,
  stratified = Inf
)
# Interval probabilities (basically saying we are assuming
# heterogeneity to be smaller than very large)
diff(2 * pnorm(tau_cat * s, 0, tau_prior_sd))
# Cumulative probabilities as 1-F
1 - 2 * (pnorm(tau_cat * s, 0, tau_prior_sd) - 0.5)

## Recover user set sampling defaults
options(.user_mc_options)

References

Neuenschwander B, Capkun-Niggli G, Branson M, Spiegelhalter DJ. Summarizing historical information on controls in clinical trials. Clin Trials. 2010;7(1):5-18.

Schmidli H, Gsteiger S, Roychoudhury S, O'Hagan A, Spiegelhalter D, Neuenschwander B. Robust meta-analytic-predictive priors in clinical trials with historical control information. Biometrics. 2014;70(4):1023-1032.

Weber S, Li Y, Seaman JW III, Kakizume T, Schmidli H. Applying Meta-Analytic Predictive Priors with the R Bayesian evidence synthesis tools. JSS. 2021;100(19):1-32.

See Also

plot.gMAP, forest_plot, automixfit, predict.gMAP

  • Maintainer: Sebastian Weber
  • License: GPL (>= 3)
  • Last published: 2025-01-21