The validate function, when used on an object created by one of the Design
series, does resampling validation of a regression model, with or without
backward step-down variable deletion. It provides bias-corrected indexes
that are specific to each type of model. For validate.cph and validate.psm,
see validate.lrm, which is similar. validate.cph and validate.psm accept an
extra argument dxy, which if TRUE causes the rcorr.cens function to be
invoked to compute the Somers' D_{xy} rank correlation at each resample
(this takes a bit longer than the likelihood-based statistics). For
validate.cph with dxy=TRUE, you must specify an argument u if the model is
stratified, since survival curves can then cross and X beta is not 1-1 with
predicted survival. There is also a validate method for tree, which only
does cross-validation and which has a different list of arguments.
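To make the Somers' D_{xy} index concrete, here is a minimal base-R sketch of a rank correlation between a predictor and a right-censored response. It is not the rcorr.cens implementation: the function name somers_dxy is hypothetical, the handling of tied times is simplified, and it assumes that larger values of x go with longer survival.

```r
# Brute-force Somers' Dxy for right-censored data (illustrative sketch only).
# x:     predictor; larger x is assumed to go with longer survival
# time:  follow-up time
# event: 1 = event observed, 0 = censored
somers_dxy <- function(x, time, event) {
  concordant <- 0; discordant <- 0; tied_x <- 0
  n <- length(x)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # Order the pair so that 'a' has the shorter follow-up time
      if (time[i] <= time[j]) { a <- i; b <- j } else { a <- j; b <- i }
      # A pair is usable only if the shorter time is an observed event
      # and the two times differ (simplified treatment of ties in time)
      if (event[a] != 1 || time[a] == time[b]) next
      if (x[a] < x[b]) {
        concordant <- concordant + 1
      } else if (x[a] > x[b]) {
        discordant <- discordant + 1
      } else {
        tied_x <- tied_x + 1
      }
    }
  }
  usable <- concordant + discordant + tied_x
  C <- (concordant + 0.5 * tied_x) / usable   # concordance probability
  list(C = C, Dxy = 2 * C - 1, usable = usable)
}

# Small made-up data set: subject 2 is censored at time 5
somers_dxy(x = c(1, 2, 3, 4), time = c(2, 5, 3, 8), event = c(1, 0, 1, 1))
```

The double loop visits every pair of observations, which is why computing this index at each resample costs more than the likelihood-based statistics.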
# fit <- fitting.function(formula=response ~ terms, x=TRUE, y=TRUE)
validate(fit, method="boot", B=40,
bw=FALSE, rule="aic", type="residual", sls=0.05, aics=0,
pr=FALSE, ...)
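The idea behind method="boot" with B repetitions can be pictured with a base-R sketch of bootstrap optimism correction. This is an illustration under stated assumptions, not the code used by validate or predab.resample: the simulated data, the helper rsq, and the use of lm with R^2 as the index are all choices made here for the example.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)                      # pure noise predictor
y  <- 1 + 0.5 * x1 + rnorm(n)
d  <- data.frame(y, x1, x2)

# R^2 of a fitted model evaluated on an arbitrary data set (hypothetical helper)
rsq <- function(fit, data) {
  pred <- predict(fit, newdata = data)
  1 - sum((data$y - pred)^2) / sum((data$y - mean(data$y))^2)
}

full     <- lm(y ~ x1 + x2, data = d)
apparent <- rsq(full, d)            # index on the data used for fitting

B <- 40
optimism <- numeric(B)
for (b in seq_len(B)) {
  boot  <- d[sample(n, replace = TRUE), ]   # resample rows with replacement
  fit_b <- lm(y ~ x1 + x2, data = boot)     # refit on the bootstrap sample
  # training performance minus performance on the original data
  optimism[b] <- rsq(fit_b, boot) - rsq(fit_b, d)
}

corrected <- apparent - mean(optimism)      # bias-corrected index
c(apparent = apparent, optimism = mean(optimism), corrected = corrected)
```

With bw=TRUE the variable selection would be repeated inside the loop for every bootstrap sample, so that the optimism estimate also reflects the selection step.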
fit: a fit derived by, e.g., lrm, cph, psm, ols. The options x=TRUE and
y=TRUE must have been specified.
method: may be "crossvalidation", "boot" (the default), ".632", or
"randomization". See predab.resample for details. Can abbreviate, e.g.
"cross", "b", ".6".
B: number of repetitions. For method="crossvalidation", B is the number of
groups of omitted observations.
bw: TRUE to do fast step-down using the fastbw function, for both the
overall model and for each repetition. fastbw keeps parameters together
that represent the same factor.
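rule: applies if bw=TRUE. "aic" to use Akaike's information criterion as a
stopping rule (i.e., a factor is deleted if the chi-square falls below
twice its degrees of freedom), or "p" to use P-values.
type: "residual" or "individual". The stopping rule is for the residual
chi-square for all variables deleted, or for individual factors,
respectively.
sls: significance level for a factor to be kept in a model, or for judging
the residual chi-square.
aics: cutoff on AIC when rule="aic".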
pr: TRUE to print results of each repetition.
...: parameters for each specific validate function, and parameters to pass
to predab.resample (note especially the group, cluster, and subset
parameters). For psm, you can pass the maxiter parameter here (passed to
survreg.control, default is 15 iterations) as well as a tol parameter for
judging matrix singularity in solvet (default is 1e-12) and a rel.tolerance
parameter that is passed to survreg.control (default is 1e-5).
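The two stopping rules described for bw=TRUE can be compared with a short base-R sketch. The function names delete_by_aic and delete_by_p are hypothetical and are not part of fastbw. The sketch only restates the rules given above (delete when the chi-square falls below twice its degrees of freedom, or when the P-value exceeds sls), and the treatment of exact boundary values is an assumption.

```r
# TRUE when a factor with the given chi-square and d.f. would be deleted
# under the AIC rule: chi-square minus 2 * d.f. falls below the cutoff aics
delete_by_aic <- function(chisq, df, aics = 0) {
  (chisq - 2 * df) < aics
}

# TRUE when the factor would be deleted under the P-value rule:
# its P-value exceeds the significance level required to stay in the model
delete_by_p <- function(chisq, df, sls = 0.05) {
  pchisq(chisq, df, lower.tail = FALSE) > sls
}

# A factor with chi-square 3 on 1 d.f. is kept by the AIC rule
# but deleted at sls = 0.05
delete_by_aic(3, 1)
delete_by_p(3, 1)

# Significance level implied by the AIC rule for 1, 2 and 3 d.f.
pchisq(2 * (1:3), df = 1:3, lower.tail = FALSE)
```

For a single-parameter factor the AIC rule behaves like a P-value rule with a significance level of roughly 0.157, which is more lenient than the conventional 0.05.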
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
# See examples for validate.cph, validate.lrm, validate.ols
# Example of validating a parametric survival model:
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, TRUE))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
S <- Surv(dt,e)
f <- psm(S ~ age*sex, x=TRUE, y=TRUE) # Weibull model
# Validate full model fit
validate(f, B=10) # usually B=150
# Validate stepwise model with typical (not so good) stopping rule
# bw=TRUE does not preserve hierarchy of terms at present
validate(f, B=10, bw=TRUE, rule="p", sls=.1, type="individual")