The validate function, when used on an object created by one of the Design series, does resampling validation of a regression model, with or without backward step-down variable deletion. It provides bias-corrected indexes that are specific to each type of model. validate.cph and validate.psm are similar to validate.lrm; see its documentation for details. Both take an extra argument, dxy, which if TRUE causes the rcorr.cens function to be invoked to compute the Somers' Dxy rank correlation at each resample (this takes a bit longer than the likelihood-based statistics). For validate.cph with dxy=TRUE, you must specify an argument u if the model is stratified, since survival curves can then cross and X beta is not one-to-one with predicted survival. There is also a validate method for tree, which only does cross-validation and has a different list of arguments.
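As a rough illustration of what "bias-corrected" means here, the following base R sketch applies the optimism bootstrap to the R^2 of an ordinary linear model. It does not call validate or any Design function. The simulated data, the helper rsq, and all variable names are assumptions made for illustration; the resampling actually performed is described in predab.resample.

```r
# Optimism-corrected R^2 via the bootstrap, base R only.
# Illustrative sketch; not the Design package's implementation.
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 + rnorm(n)   # x2 is pure noise

# R^2 of the predictions from `fit`, evaluated on `data`
rsq <- function(fit, data) {
  pred <- predict(fit, newdata = data)
  1 - sum((data$y - pred)^2) / sum((data$y - mean(data$y))^2)
}

full <- lm(y ~ x1 + x2, data = d)
apparent <- rsq(full, d)           # index on the data used for fitting

B <- 40                            # number of bootstrap repetitions
optimism <- numeric(B)
for (b in seq_len(B)) {
  boot <- d[sample(n, replace = TRUE), ]
  fb <- lm(y ~ x1 + x2, data = boot)
  # training performance minus performance on the original sample
  optimism[b] <- rsq(fb, boot) - rsq(fb, d)
}

corrected <- apparent - mean(optimism)
c(apparent = apparent, optimism = mean(optimism), corrected = corrected)
```

The corrected index is the apparent index minus the average optimism, which is why a model refit in every resample (including any variable selection) is needed to get an honest estimate.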
# fit <- fitting.function(formula=response ~ terms, x=TRUE, y=TRUE)
validate(fit, method="boot", B=40, bw=FALSE, rule="aic", type="residual",
         sls=0.05, aics=0, pr=FALSE, ...)
fit: a fit derived by lrm, cph, psm, or ols. The options x=TRUE and y=TRUE must have been specified.
method: may be "crossvalidation", "boot" (the default), ".632", or "randomization". See predab.resample for details. Can be abbreviated, e.g. "cross", "b", ".6".
B: number of repetitions. For method="crossvalidation", B is the number of groups of omitted observations.
bw: TRUE to do fast step-down using the fastbw function, for both the overall model and for each repetition. fastbw keeps parameters together that represent the same factor.
rule: applies if bw=TRUE. "aic" to use Akaike's information criterion as a stopping rule (i.e., a factor is deleted if the chi-square falls below twice its degrees of freedom), or "p" to use P-values.
type: "residual" or "individual"; the stopping rule is for the residual chi-square for all variables deleted or for individual factors, respectively.
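sls: significance level for a factor to be kept in a model, or for judging the residual chi-square.
aics: cutoff on AIC when rule="aic".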
pr: TRUE to print results of each repetition.
...: further arguments passed to predab.resample (note especially the group, cluster, and subset parameters). For psm, you can pass the maxiter parameter here (passed to survreg.control, default is 15 iterations) as well as a tol parameter for judging matrix singularity in solvet (default is 1e-12) and a rel.tolerance parameter that is passed to survreg.control (default is 1e-5).
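The two stopping rules used with bw=TRUE can be pictured with a small base R helper. This is a sketch of the rules as worded above, not the code in fastbw. The function name delete_factor is hypothetical, and treating sls as the P-value cutoff above which a factor is dropped is an assumption.

```r
# TRUE when a factor would be deleted under the given stopping rule.
# chisq: chi-square statistic for the factor; df: its degrees of freedom.
# Hypothetical helper for illustration; not part of Design or fastbw.
delete_factor <- function(chisq, df, rule = c("aic", "p"), sls = 0.05) {
  rule <- match.arg(rule)
  if (rule == "aic") {
    # AIC rule: delete if the chi-square falls below twice its d.f.
    chisq < 2 * df
  } else {
    # P-value rule (assumed): delete if the P-value exceeds sls
    pchisq(chisq, df, lower.tail = FALSE) > sls
  }
}

delete_factor(3, 2)              # TRUE:  3 < 2 * 2
delete_factor(5, 2)              # FALSE: 5 >= 2 * 2
delete_factor(3, 1, rule = "p")  # TRUE:  P is about 0.083, above 0.05
```

For a factor with one degree of freedom, the AIC rule corresponds to a chi-square cutoff of 2 (P of roughly 0.157), so it is more lenient than a P-value rule with sls=0.05.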
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
# See examples for validate.cph, validate.lrm, validate.ols
# Example of validating a parametric survival model:

n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n, TRUE))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
S <- Surv(dt,e)
f <- psm(S ~ age*sex, x=TRUE, y=TRUE)   # Weibull model

# Validate full model fit
validate(f, B=10)                       # usually B=150

# Validate stepwise model with typical (not so good) stopping rule
# bw=TRUE does not preserve hierarchy of terms at present
validate(f, B=10, bw=TRUE, rule="p", sls=.1, type="individual")
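The example above uses the default bootstrap. For method="crossvalidation", B is the number of groups of omitted observations; the base R sketch below shows one way such a scheme can work. How predab.resample actually forms its groups is not stated here, so the random, equal-sized assignment is an assumption, as are the simulated data and the choice of mean squared error as the index.

```r
# B-group cross-validation of a linear model, base R only.
# Illustrative sketch; not the Design package's implementation.
set.seed(2)
n <- 100
B <- 10                                 # number of omitted groups
x <- rnorm(n)
y <- 2 * x + rnorm(n)

# Randomly assign each observation to one of B equal-sized groups
group <- sample(rep(seq_len(B), length.out = n))

mse <- numeric(B)
for (g in seq_len(B)) {
  omit <- group == g
  fit <- lm(y ~ x, subset = !omit)      # fit without group g
  pred <- predict(fit, newdata = data.frame(x = x[omit]))
  mse[g] <- mean((y[omit] - pred)^2)    # error on the omitted group
}

mean(mse)                               # cross-validated mean squared error
```

Each observation is predicted exactly once by a model that never saw it, so with B=10 each fit uses 90% of the data.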