bayesCox() R function from [dynsurv]

Fit Bayesian Cox Model for Interval Censored Survival Data

Fit Bayesian Cox model with time-independent, time-varying or dynamic covariate coefficient. The fit is done within a Gibbs sampling framework. The reversible jump algorithm is employed for the dynamic coefficient model. The baseline hazards are allowed to be either time-varying or dynamic.


bayesCox(
  formula,
  data,
  grid = NULL,
  out = NULL,
  model = c("TimeIndep", "TimeVarying", "Dynamic"),
  base.prior = list(),
  coef.prior = list(),
  gibbs = list(),
  control = list(),
  ...
)

Arguments

formula: A formula object, with the response on the left of a '~' operator, and the terms on the right. The response must be a survival object as returned by the function Surv with type = "interval2". help(Surv) for details.
data: A data.frame in which to interpret the variables named in the formula.
grid: Vector of pre-specified time grid points for model fitting. It will be automatically set up from data if it is left unspecified in the function call. By default, it consists of all the unique finite endpoints (rounded to two significant numbers) of the censoring intervals after time zero. The grid specified in the function call determines the location of possible jumps. It should be sorted increasingly and cover all the finite non-zero endpoints of the censoring intervals. Inappropriate grid specified will be taken care by the function internally.
out: An optional character string specifying the name of Markov chain Monte Carlo (MCMC) samples output file. By default, the MCMC samples will be output to a temporary directory set by tempdir and saved in the returned bayesCox object after burning and thinning. If out is specified, the MCMC samples will be preserved in the specified text file.
model: Model type to fit. Available options are "TimeIndep", "TimeVarying", and "Dynamic". Partial matching on the name is allowed.
base.prior: List of options for prior of baseline lambda. Use list(type = "Gamma", shape = 0.1, rate = 0.1) for all models; list(type = "Const", value = 1) for Dynamic model when intercept = TRUE.
coef.prior: List of options for prior of coefficient beta. Use list(type = "Normal", mean = 0, sd = 1) for TimeIndep

model; list(type = "AR1", sd = 1) for TimeVarying and Dynamic models; list(type = "HAR1", shape = 2, scale = 1)

for TimeVarying and Dynamic models.
gibbs: List of options for Gibbs sampler.
control: List of general control options.
...: Other arguments that are for futher extension.

Returns

An object of S3 class bayesCox representing the fit.

Details

To use default hyper parameters in the specification of either base.prior or coef.prior, one only has to supply the name of the prior, e.g., list(type = "Gamma"), list(type = "HAR1").

The gibbs argument is a list of components:

iter:: Number of iterations, default 3000;
burn:: Number of burning, default 500;
thin:: Number of thinning, default 1;
verbose:: A logical value, default TRUE. If TRUE, print the iteration;
nReport:: Print frequency, default 100.

The control argument is a list of components:

intercept:: A logical value, default FALSE. If TRUE, the model will estimate the intercept, which is the log of baseline hazards. If TRUE, please remember to turn off the direct estimation of baseline hazards, i.e., base.prior = list(type = "Const")
a0:: Multiplier for initial variance in time-varying or dynamic models, default 100;
eps0:: Size of auxiliary uniform latent variable in dynamic model, default 1.

For users interested in extracting MCMC sampling information from the output files, the detail of the output files is presented as follows: Let $k$ denote the number of time points (excluding time zero) specified in grid, $ck$ equal $1$ for model with time-invariant coefficients; $ck$ equal $k$ otherwise, and $p$ denote the number of covariates. Then the each sample saved in each row consists of the following possible parts.

Part 1:: The first $k$ numbers represent the jump size of baseline hazard function at each time grid. If we take the column mean of the first $k$ columns of the output file, we will get the same numbers with obj$est$lambda, where obj is the bayesCox
```
 object returned by the function.
```

Part 2:: The sequence from $(k + 1) to (k + ck * p)$

 represent the coefficients of covariates at the time grid. The first $k$ numbers in the sequence are the coefficients for the first covariate at the time grid; The second $k$ numbers' sub-sequence are the coefficients for the second covariate and so on.

Part 3:: The sequence from $(k + ck * p + 1)$ to $(k + ck * p + p)$ represents the sampled latent variance of coefficients.
Part 4:: The sequence from $(k + ck * p + p + 1)$ to $(k + 2 * ck * p + p)$ represents the indicator of whether there is a jump of the covariate coefficients at the time grid. Similar with Part 2, the first k numbers' sub-sequence is for the first covariate, the second $k$ numbers' sub-sequence is for the second covariate, and so on.

For the model with time-independent coefficients, the output file only has Part 1 and Part 2 in each row; For time-varying coefficient model, the output file has Part 1, 2, and 3; The output file for the dynamic model has all the four parts. Note that the dynamic baseline hazard will be taken as one covariate. So $p$ needs being replaced with $(p + 1)$ for model with dynamic baseline hazard rate. No function in the package actually needs the Part 1 from the output file now; The Part 2 is used by function coef and survCurve; The Part 3 is needed by function nu; Function jump extracts the Part 4.

Examples


## Not run:

library(dynsurv)
set.seed(1216)

## breast cancer data
data(bcos)
mydata <- bcos
myformula <- Surv(left, right, type = "interval2") ~ trt

### Fit time-independent coefficient model
fit0 <- bayesCox(myformula, mydata, model = "TimeIndep",
                 base.prior = list(type = "Gamma", shape = 0.1, rate = 0.1),
                 coef.prior = list(type = "Normal", mean = 0, sd = 1),
                 gibbs = list(iter = 100, burn = 20, thin = 1, verbose = FALSE))

## plot coefficient estimates
plotCoef(coef(fit0, level = 0.9))

## Plot the estimated survival function for given new data
newDat <- data.frame(trt = c("Rad", "RadChem"))
row.names(newDat) <- unique(newDat$trt)
plotSurv(survCurve(fit0, newDat), conf.int = TRUE)

### Fit time-varying coefficient model
fit1 <- bayesCox(myformula, mydata, model = "TimeVary",
                 base.prior = list(type = "Gamma", shape = 0.1, rate = 0.1),
                 coef.prior = list(type = "AR1", sd = 1),
                 gibbs = list(iter = 100, burn = 20, thin = 1,
                              verbose = TRUE, nReport = 20))

plotCoef(coef(fit1))
plotSurv(survCurve(fit1), conf.int = TRUE)

### Fit dynamic coefficient model with time-varying baseline hazards
fit2 <- bayesCox(myformula, mydata, model = "Dynamic",
                 base.prior = list(type = "Gamma", shape = 0.1, rate = 0.1),
                 coef.prior = list(type = "HAR1", shape = 2, scale = 1),
                 gibbs = list(iter = 100, burn = 20, thin = 1,
                              verbose = TRUE, nReport = 20))

plotCoef(coef(fit2))
plotJumpTrace(jump(fit2))
plotJumpHist(jump(fit2))
plotNu(nu(fit2))
plotSurv(survCurve(fit2), conf.int = TRUE)

## Plot the coefficient estimates from three models together
plotCoef(rbind(coef(fit0), coef(fit1), coef(fit2)))

### Fit dynamic coefficient model with dynamic hazards (in log scales)
fit3 <- bayesCox(myformula, mydata, model = "Dynamic",
                 base.prior = list(type = "Const"),
                 coef.prior = list(type = "HAR1", shape = 2, scale = 1),
                 gibbs = list(iter = 100, burn = 20, thin = 1,
                              verbose = TRUE, nReport = 20),
                 control = list(intercept = TRUE))

plotCoef(coef(fit3))
plotJumpTrace(jump(fit3))
plotJumpHist(jump(fit3))
plotNu(nu(fit3))
plotSurv(survCurve(fit3), conf.int = TRUE)

## Plot the estimated survival function and the difference
plotSurv(survCurve(fit3, newdata = newDat, type = "survival"),
         legendName = "Treatment", conf.int = TRUE)
plotSurv(survDiff(fit3, newdata = newDat, type = "survival"),
         legendName = "Treatment", conf.int = TRUE, smooth = TRUE)

## extract MCMC samples
mcmc_list <- bayesCoxMcmc(fit3, part = "coef")
posterior_coef <- mcmc_list$coef
## posterior probabilities of hazard ratio of RadChem (vs. Rad)
## greater than 1 at time 10
posterior_coef[covariate == "trtRadChem" & time == 10, mean(exp(coef) > 1)]
## End(Not run)

References

X. Wang, M.-H. Chen, and J. Yan (2013). Bayesian dynamic regression models for interval censored survival data with application to children dental health. Lifetime data analysis, 19(3), 297--316.

X. Wang, X. Sinha, J. Yan, and M.-H. Chen (2014). Bayesian inference of interval-censored survival data. In: D. Chen, J. Sun, and K. Peace, Interval-censored time-to-event data: Methods and applications, 167--195.

X. Wang, M.-H. Chen, and J. Yan (2011). Bayesian dynamic regression models for interval censored survival data. Technical Report 13, Department of Statistics, University of Connecticut.

D. Sinha, M.-H. Chen, and S.K. Ghosh (1999). Bayesian analysis and model selection for interval-censored survival data. Biometrics 55(2), 585--590.

bayesCox function