validate function

Resampling Validation of a Fitted Model's Indexes of Fit

When applied to a fit object created by one of the rms fitting functions, validate performs resampling validation of a regression model, with or without backward step-down variable deletion. The print method calls the latex or html method if options(prType=) is set to "latex" or "html". For "latex" printing through print(), the LaTeX table environment is turned off. When using html with Quarto or RMarkdown, results='asis' need not be written in the chunk header.

# fit <- fitting.function(formula=response ~ terms, x=TRUE, y=TRUE)
validate(fit, method="boot", B=40,
         bw=FALSE, rule="aic", type="residual", sls=0.05, aics=0,
         force=NULL, estimates=TRUE, pr=FALSE, ...)

## S3 method for class 'validate'
print(x, digits=4, B=Inf, ...)

## S3 method for class 'validate'
latex(object, digits=4, B=Inf, file='', append=FALSE,
      title=first.word(deparse(substitute(x))),
      caption=NULL, table.env=FALSE,
      size='normalsize', extracolsize=size, ...)

## S3 method for class 'validate'
html(object, digits=4, B=Inf, caption=NULL, ...)
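As a rough illustration of the usage above, the following sketch fits a small logistic model on simulated data and validates it with two of the resampling methods. The data, the variable names (x1, x2, y), and the small B values are assumptions chosen only to keep the run short; the code is skipped if rms is not installed.

```r
# Minimal sketch: validate an lrm fit with the bootstrap and with
# cross-validation. Simulated data; all names here are illustrative.
if (requireNamespace("rms", quietly = TRUE)) {
  library(rms)
  set.seed(1)
  n  <- 300
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  y  <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))

  # x=TRUE, y=TRUE are required so the design matrix and response
  # are stored in the fit for resampling
  f <- lrm(y ~ x1 + x2, x = TRUE, y = TRUE)

  # Bootstrap validation (the default method); B = number of resamples
  v_boot <- validate(f, method = "boot", B = 20)

  # Cross-validation; here B is the number of groups of omitted observations
  v_cv <- validate(f, method = "crossvalidation", B = 10)

  print(v_boot)
  print(v_cv)
}
```

In practice B would normally be much larger for the bootstrap than the value used here.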

Arguments

  • fit: a fit derived by e.g. lrm, cph, psm, ols. The options x=TRUE and y=TRUE must have been specified.

  • method: may be "crossvalidation", "boot" (the default), ".632", or "randomization". See predab.resample for details. Can abbreviate, e.g. "cross", "b", ".6".

  • B: number of repetitions. For method="crossvalidation", B is the number of groups of omitted observations. For print.validate, latex.validate, and html.validate, B is an upper limit on the number of resamples for which information is printed about which variables were selected in each model re-fit. Specify zero to suppress printing. The default is to print all resamples.

  • bw: TRUE to do fast step-down using the fastbw function, for both the overall model and for each repetition. fastbw keeps parameters together that represent the same factor.

  • rule: Applies if bw=TRUE. "aic" to use Akaike's information criterion as a stopping rule (i.e., a factor is deleted if the chi-square falls below twice its degrees of freedom), or "p" to use P-values.

  • type: "residual" or "individual". The stopping rule is based on the residual chi-square for all variables deleted ("residual") or on individual factors ("individual").

  • sls: significance level for a factor to be kept in a model, or for judging the residual chi-square.

  • aics: cutoff on AIC when rule="aic".

  • force: see fastbw

  • estimates: see print.fastbw

  • pr: TRUE to print results of each repetition

  • ...: parameters for each specific validate function, and parameters to pass to predab.resample (note especially the group, cluster, and subset parameters). For latex, optional arguments to latex.default. Ignored for html.validate.

    For psm, you can pass the maxiter parameter here (passed to survreg.control, default is 15 iterations) as well as a tol parameter for judging matrix singularity in solvet (default is 1e-12) and a rel.tolerance parameter that is passed to survreg.control (default is 1e-5).

    Ignored for print.validate.

  • x,object: an object produced by one of the validate functions

  • digits: number of decimal places to print

  • file: file to write LaTeX output. Default is standard output.

  • append: set to TRUE to append LaTeX output to an existing file

  • title, caption, table.env, extracolsize: see latex.default. If table.env is FALSE and caption is given, the character string contained in caption will be placed before the table, centered.

  • size: size of LaTeX output. Default is 'normalsize'. Must be a defined LaTeX size when prepended by double slash.
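The stopping rules described under rule, sls, and aics can be sketched for a single factor as follows. This is only an illustration of the rules as worded above, not fastbw's implementation: drop_factor is a hypothetical helper, and treating aics as a cutoff on (chi-square minus twice the degrees of freedom) is an assumption that reduces to the stated rule when aics=0.

```r
# Hypothetical helper (not part of rms): would a factor with the given
# chi-square and degrees of freedom be deleted under each rule?
drop_factor <- function(chisq, df, rule = c("aic", "p"),
                        sls = 0.05, aics = 0) {
  rule <- match.arg(rule)
  if (rule == "aic") {
    # Assumed form: delete when chi-square minus 2 * d.f. is below the
    # cutoff; with aics = 0 this is "chi-square below twice its d.f."
    (chisq - 2 * df) < aics
  } else {
    # Delete when the P-value exceeds the significance level to stay
    pchisq(chisq, df, lower.tail = FALSE) > sls
  }
}

drop_factor(1.5, df = 1)                          # AIC rule: 1.5 < 2
drop_factor(5.0, df = 1)                          # AIC rule: 5.0 > 2
drop_factor(5.0, df = 2, rule = "p")              # P = exp(-2.5), about 0.082
drop_factor(5.0, df = 2, rule = "p", sls = 0.10)  # same P, looser level
```

With type="residual" the same comparisons would be applied to the residual chi-square for all deleted variables rather than to one factor at a time.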

Details

validate provides bias-corrected indexes that are specific to each type of model. For validate.cph and validate.psm, see validate.lrm, which is similar.

For validate.cph and validate.psm, there is an extra argument dxy, which if TRUE causes the dxy.cens function to be invoked to compute the Somers' Dxy rank correlation at each resample. The values in the row Dxy are equal to 2 * (C - 0.5), where C is the C-index or concordance probability.
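The relationship between Dxy and the concordance probability C can be checked with a small base R sketch. For simplicity it uses an uncensored binary outcome rather than censored survival times, so it does not reproduce what dxy.cens computes; c_index is a hypothetical helper and the four data points are made up.

```r
# Hypothetical helper: concordance probability for a binary outcome.
# Among all (event, non-event) pairs, count the fraction where the event
# has the higher predicted value; ties count one half.
c_index <- function(pred, y) {
  pos <- pred[y == 1]
  neg <- pred[y == 0]
  d <- outer(pos, neg, "-")          # all pairwise differences
  (sum(d > 0) + 0.5 * sum(d == 0)) / length(d)
}

pred <- c(0.10, 0.40, 0.35, 0.80)    # made-up predictions
y    <- c(0,    0,    1,    1)       # made-up outcomes

C   <- c_index(pred, y)              # 3 of 4 pairs concordant: 0.75
Dxy <- 2 * (C - 0.5)                 # 0.5
c(C = C, Dxy = Dxy)
```

A C of 0.5 (random predictions) maps to Dxy = 0, and perfect concordance (C = 1) maps to Dxy = 1.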

For validate.cph with dxy=TRUE, you must specify an argument u if the model is stratified, since survival curves can then cross and X beta is not 1-1 with predicted survival.

There is also a validate method for tree, which only does cross-validation and which has a different list of arguments.

Returns

a matrix with rows corresponding to the statistical indexes and columns for the original index, resample estimates, indexes applied to the whole or omitted sample using the model derived from the resample, average optimism, corrected index, and number of successful re-samples.
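To show how the columns relate to one another, here is a minimal base R sketch of a bootstrap optimism correction for a single index (R-squared from lm). It is a simplified illustration under assumed simulated data, not the code used by validate or predab.resample; the helper r2 and the labels in the final vector are chosen for this example.

```r
# Sketch of bootstrap optimism correction for one index (R-squared).
set.seed(42)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 + rnorm(n)     # x2 is pure noise

# Illustrative helper: R-squared of a fitted model evaluated on `data`
r2 <- function(fit, data) {
  pred <- predict(fit, newdata = data)
  1 - sum((data$y - pred)^2) / sum((data$y - mean(data$y))^2)
}

full       <- lm(y ~ x1 + x2, data = d)
index_orig <- r2(full, d)            # apparent (original) index

B <- 50
training <- test <- numeric(B)
for (b in seq_len(B)) {
  boot  <- d[sample(n, replace = TRUE), ]   # bootstrap resample
  fit_b <- lm(y ~ x1 + x2, data = boot)     # refit on the resample
  training[b] <- r2(fit_b, boot)            # index in the resample
  test[b]     <- r2(fit_b, d)               # same model on the whole sample
}

optimism        <- mean(training - test)    # average optimism
index_corrected <- index_orig - optimism    # bias-corrected index

round(c(index.orig = index_orig, training = mean(training),
        test = mean(test), optimism = optimism,
        index.corrected = index_corrected, n = B), 4)
```

The corrected index is the original index minus the average amount by which resample fits look better on their own data than on the whole sample.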

Side Effects

prints a summary, and optionally statistics for each re-fit

Author(s)

Frank Harrell

Department of Biostatistics, Vanderbilt University

fh@fharrell.com

See Also

validate.ols, validate.cph, validate.lrm, validate.rpart, predab.resample, fastbw, rms, rms.trans, calibrate, dxy.cens, concordancefit

Examples

# See examples for validate.cph, validate.lrm, validate.ols
# Example of validating a parametric survival model:

require(rms)        # psm and validate; rms depends on Hmisc (label, units)
require(survival)
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, TRUE))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
S <- Surv(dt,e)
f <- psm(S ~ age*sex, x=TRUE, y=TRUE)   # Weibull model

# Validate full model fit
validate(f, B=10)               # usually B=150

# Validate stepwise model with typical (not so good) stopping rule
# bw=TRUE does not preserve hierarchy of terms at present
validate(f, B=10, bw=TRUE, rule="p", sls=.1, type="individual")
  • Maintainer: Frank E Harrell Jr
  • License: GPL (>= 2)
  • Last published: 2025-01-17
