Bayesian Discount Prior: Two-Arm Linear Regression
Bayesian Discount Prior: Two-Arm Linear Regression
bdplm is used to estimate the treatment effect in the presence of covariates using the regression analysis implementation of the Bayesian discount prior. summary and print methods are supported. Currently, the function only supports a two-arm clinical trial where all of current treatment, current control, historical treatment, and historical control data are present
formula: an object of class "formula." See "Details" for more information, including specification of treatment data indicators.
data: a data frame containing the current data variables in the model. A column named treatment must be present; treatment must be binary and indicate treatment group vs. control group.
data0: a data frame containing the historical data variables in the model. The column labels of data and data0 must match.
prior_treatment_effect: scalar. The historical adjusted treatment effect. If left NULL, value is estimated from the historical data.
prior_control_effect: scalar. The historical adjusted control effect. If left NULL, value is estimated from the historical data.
prior_treatment_sd: scalar. The standard deviation of the historical adjusted treatment effect. If left NULL, value is estimated from the historical data.
prior_control_sd: scalar. The standard deviation of the historical adjusted control effect. If left NULL, value is estimated from the historical data.
prior_covariate_effect: vector. The prior mean(s) of the covariate effect(s). Default value is zero. If a single value is input, the the scalar is repeated to the length of the input covariates. Otherwise, care must be taken to ensure the length of the input matches the number of covariates.
prior_covariate_sd: vector. The prior standard deviation(s) of the covariate effect(s). Default value is 1e4. If a single value is input, the the scalar is repeated to the length of the input covariates. Otherwise, care must be taken to ensure the length of the input matches the number of covariates.
number_mcmc_alpha: scalar. Number of Monte Carlo simulations for estimating the historical data weight. Default is 5000.
number_mcmc_sigmagrid: scalar. Grid size for computing sigma. Default is 5000. See "Details" for more information.
number_mcmc_sigma: scalar. Number of Monte Carlo simulations for estimating sigma. Default is 1000. See "Details" for more information.
number_mcmc_beta: scalar. Number of Monte Carlo simulations for estimating beta, the vector of regression coefficients. Default is 10000.
discount_function: character. Specify the discount function to use. Currently supports weibull, scaledweibull, and identity. The discount function scaledweibull scales the output of the Weibull CDF to have a max value of 1. The identity
discount function uses the posterior probability directly as the discount weight. Default value is "identity".
alpha_max: scalar. Maximum weight the discount function can apply. Default is 1. Users may specify a vector of two values where the first value is used to weight the historical treatment group and the second value is used to weight the historical control group.
fix_alpha: logical. Fix alpha at alpha_max? Default value is FALSE.
weibull_scale: scalar. Scale parameter of the Weibull discount function used to compute alpha, the weight parameter of the historical data. Default value is 0.135. Users may specify a vector of two values where the first value is used to estimate the weight of the historical treatment group and the second value is used to estimate the weight of the historical control group. Not used when discount_function = "identity".
weibull_shape: scalar. Shape parameter of the Weibull discount function used to compute alpha, the weight parameter of the historical data. Default value is 3. Users may specify a vector of two values where the first value is used to estimate the weight of the historical treatment group and the second value is used to estimate the weight of the historical control group. Not used when discount_function = "identity".
method: character. Analysis method with respect to estimation of the weight paramter alpha. Default method "mc" estimates alpha for each Monte Carlo iteration. Alternate value "fixed" estimates alpha once and holds it fixed throughout the analysis. See the the bdplm
vignette
vignette("bdplm-vignette", package="bayesDP") for more details.
Returns
bdplm returns an object of class "bdplm".
An object of class "bdplm" is a list containing at least the following components:
posterior: data frame. The posterior draws of the covariates, the intercept, and the treatment effect. The grid of sigma values are included.
alpha_discount: vector. The posterior probability of the stochastic comparison between the current and historical data for each of the treatment and control arms. If method="mc", the result is a matrix of estimates, otherwise for method="fixed", the result is a vector of estimates.
estimates: list. The posterior means and standard errors of the intercept, treatment effect, covariate effect(s) and error variance.
Details
bdplm uses a two-stage approach for determining the strength of historical data in estimation of an adjusted mean or covariate effect. In the first stage, a discount function is used that that defines the maximum strength of the historical data and discounts based on disagreement with the current data. Disagreement between current and historical data is determined by stochastically comparing the respective posterior distributions under noninformative priors. Here with a two-arm regression analysis, the comparison is the proability (p) that the covariate effect of an historical data indicator is significantly different from zero. The comparison metric p is then input into the discount function and the final strength of the historical data is returned (alpha).
In the second stage, posterior estimation is performed where the discount function parameter, alpha, is used to weight the historical data effects.
The formula must include an intercept (i.e., do not use -1 in the formula) and both data and data0 must be present. The column names of data and data0 must match. See examples below for example usage.
The underlying model results in a marginal posterior distribution for the error variance sigma2 that does not have a known distribution. Thus, we use a grid search to draw samples of sigma2. First, the marginal posterior is evaluated at a grid of number_mcmc_sigmagrid
sigma2 values. The bounds of the grid are taken as approximately six standard deviations from an estimate of sigma2 using the lm
function. Next, number_mcmc_sigma posterior draws of sigma2
are made by sampling with replacement from the grid with each value having the corresponding marginal posterior probability of being selected. With posterior draws of sigma2 in hand, we can make posterior draws of the regression coefficients.
Examples
# Set sample sizesn_t <-30# Current treatment sample sizen_c <-30# Current control sample sizen_t0 <-80# Historical treatment sample sizen_c0 <-80# Historical control sample size# Treatment group vectors for current and historical datatreatment <- c(rep(1, n_t), rep(0, n_c))treatment0 <- c(rep(1, n_t0), rep(0, n_c0))# Simulate a covariate effect for current and historical datax <- rnorm(n_t + n_c,1,5)x0 <- rnorm(n_t0 + n_c0,1,5)# Simulate outcome:# - Intercept of 10 for current and historical data# - Treatment effect of 31 for current data# - Treatment effect of 30 for historical data# - Covariate effect of 3 for current and historical dataY <-10+31* treatment + x *3+ rnorm(n_t + n_c,0,5)Y0 <-10+30* treatment0 + x0 *3+ rnorm(n_t0 + n_c0,0,5)# Place data into separate treatment and control data frames and# assign historical = 0 (current) or historical = 1 (historical)df_ <- data.frame(Y = Y, treatment = treatment, x = x)df0 <- data.frame(Y = Y0, treatment = treatment0, x = x0)# Fit model using default settingsfit <- bdplm( formula = Y ~ treatment + x, data = df_, data0 = df0, method ="fixed")# Look at estimates and discount weightsummary(fit)print(fit)