Survival CART with time to event response via binary partitioning
Recursive partitioning for survival data per the SurvCART algorithm, based on baseline partitioning variables (Kundu, 2020).
data: name of the dataset. It must contain the variable specified for patid (indicating subject id), all the variables specified in the formula, and the baseline partitioning variables.
patid: name of the subject id variable.
timevar: name of the variable with follow-up times.
censorvar: name of the variable with censoring status.
gvars: list of partitioning variables of interest. The values of these variables should not change over time. Categorical variables must be numerically coded; for nominal categorical variables or factors, first create the corresponding dummy variable(s) and then pass them through gvars.
tgvars: types (categorical or continuous) of the partitioning variables specified in gvars. Specify 1 for each continuous partitioning variable and 0 for each categorical partitioning variable. The length of tgvars must match the length of gvars.
time.dist: name of time-to-event distribution. It can be one of the following distributions: "exponential", "weibull", "lognormal" or "normal".
cens.dist: name of the censoring distribution. It can be one of the following distributions: "exponential", "weibull", "lognormal", "normal" or "NA". If "NA" is specified, the parameter instability test for the censoring distribution is not performed.
event.ind: value of the censoring variable indicating event.
alpha: alpha (i.e., nominal type I error) level for parameter instability test
minsplit: the minimum number of observations that must exist in a node in order for a split to be attempted.
minbucket: the minimum number of observations in any terminal node.
quantile: the quantile to be displayed in the visualization of the tree through plot.SurvCART() or plot().
print: if TRUE, a summary (number of subjects at risk, number of events, median event time and median censoring time) is printed for each node.
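The requirements on gvars and tgvars can be illustrated with a minimal sketch. The data frame dat and its columns (age, grade, treat) are hypothetical and exist only for this illustration; the dummy coding uses base R's model.matrix(), which is one possible approach and not necessarily the one the package authors had in mind.

```r
# Hypothetical toy data: one row per subject, baseline covariates only.
dat <- data.frame(
  subjid = 1:6,
  age    = c(45, 52, 61, 38, 70, 57),
  grade  = factor(c("I", "II", "III", "II", "I", "III")),  # nominal factor
  treat  = factor(c("no", "yes", "yes", "no", "yes", "no"))
)

# Expand the nominal factor into 0/1 dummy columns.
# The intercept column is dropped, so level "I" is the reference.
dummies <- model.matrix(~ grade, data = dat)[, -1, drop = FALSE]
dat <- cbind(dat, dummies)

# A two-level factor can be coded numerically in one step.
dat$treat1 <- as.numeric(dat$treat == "yes")

# Partitioning variables and their types: 1 = continuous, 0 = categorical.
gvars  <- c("age", "gradeII", "gradeIII", "treat1")
tgvars <- c(1, 0, 0, 0)

# Sanity checks before passing gvars/tgvars to the fitting function.
stopifnot(length(gvars) == length(tgvars), all(gvars %in% names(dat)))
```

The final stopifnot() call catches the two most likely mistakes early: a type vector whose length differs from gvars, and a variable name that is missing from the dataset.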
Details
Construct survival tree based on heterogeneity in time-to-event and censoring distributions.
Normal distribution: $f(t) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right]$
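As a quick numerical check, the normal density with the squared deviation (t - mu)^2 in the exponent can be compared against base R's dnorm(). The helper normal_density() and the evaluation grid are assumptions made for this sketch; they are not part of the package.

```r
# Normal density written out term by term (hypothetical helper).
normal_density <- function(t, mu, sigma) {
  (1 / sqrt(2 * pi * sigma^2)) * exp(-(t - mu)^2 / (2 * sigma^2))
}

# Evaluate on an arbitrary grid with mu = 1 and sigma = 2.
t_grid  <- c(-1, 0, 0.5, 2, 5)
manual  <- normal_density(t_grid, mu = 1, sigma = 2)
builtin <- dnorm(t_grid, mean = 1, sd = 2)

# The hand-written formula and dnorm() agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(manual, builtin)))
```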
Returns
Treeout: contains summary information of the tree fit for each terminal and non-terminal node. Columns of Treeout include ID, the (unique) node numbers that follow a binary ordering indexed by node depth; n, the number of subjects reaching the node; D, the number of events reaching the node; median.T, the median survival time at the node; median.C, the median censoring time at the node; var, the splitting variable; index, the cut-off value of the splitting variable for binary partitioning; p (Instability), the p-value of the parameter instability test for the splitting variable; loglik, the log-likelihood at the node; AIC, the AIC at the node; improve, the improvement in deviance given by this split; and Terminal, an indicator (True or False) of a terminal node.
logLik.tree: log-likelihood of the tree-structured model, based on Cox model including sub-groups as covariates
logLik.root: log-likelihood at the root node (i.e., without tree structure), based on Cox model without any covariate
AIC.tree: AIC of the tree-structured model, based on Cox model including sub-groups as covariates
AIC.root: AIC at the root node (i.e., without tree structure), based on Cox model without any covariate
nodelab: List of subgroups or terminal nodes with their description
varnam: List of splitting variables
ds: the dataset originally supplied
event.ind: value of the censoring variable indicating event.
timevar: name of the variable with follow-up times
censorvar: name of the variable with censoring status
References
Kundu, M. G., and Ghosh, S. (2021). Survival trees based on heterogeneity in time-to-event and censoring distributions using parameter instability test. Statistical Analysis and Data Mining: The ASA Data Science Journal, 14(5), 466-483.
See Also
plot, KMPlot, text, StabCat.surv, StabCont.surv
Examples
#--- Get the data
data(GBSG2)

#numeric coding of character variables
GBSG2$horTh1 <- as.numeric(GBSG2$horTh)
GBSG2$tgrade1 <- as.numeric(GBSG2$tgrade)
GBSG2$menostat1 <- as.numeric(GBSG2$menostat)

#Add subject id
GBSG2$subjid <- 1:nrow(GBSG2)

#--- Run SurvCART() with time-to-event distribution: exponential, censoring distribution: None
out <- SurvCART(data=GBSG2, patid="subjid", censorvar="cens", timevar="time",
        gvars=c('horTh1','age','menostat1','tsize','tgrade1','pnodes','progrec','estrec'),
        tgvars=c(0,1,0,1,0,1,1,1),
        event.ind=1, alpha=0.05, minsplit=80, minbucket=40, print=TRUE)

#--- Plot tree
par(xpd = TRUE)
plot(out, compress = TRUE)
text(out, use.n = TRUE)

#Plot KM plot for sub-groups identified by tree
KMPlot(out, xscale=365.25, type=1)
KMPlot(out, xscale=365.25, type=2, overlay=FALSE, mfrow=c(2,2), xlab="Year", ylab="Survival prob.")

#--- Run SurvCART() with time-to-event distribution: weibull, censoring distribution: None
out2 <- SurvCART(data=GBSG2, patid="subjid", censorvar="cens", timevar="time",
        gvars=c('horTh1','age','menostat1','tsize','tgrade1','pnodes','progrec','estrec'),
        tgvars=c(0,1,0,1,0,1,1,1),
        time.dist="weibull",
        event.ind=1, alpha=0.05, minsplit=80, minbucket=40, print=TRUE)

#--- Run SurvCART() with time-to-event distribution: weibull, censoring distribution: exponential
out <- SurvCART(data=GBSG2, patid="subjid", censorvar="cens", timevar="time",
        gvars=c('horTh1','age','menostat1','tsize','tgrade1','pnodes','progrec','estrec'),
        tgvars=c(0,1,0,1,0,1,1,1),
        time.dist="weibull", cens.dist="exponential",
        event.ind=1, alpha=0.05, minsplit=80, minbucket=40, print=TRUE)