Uses xval-fold cross-validation of a sequence of trees to derive estimates of the mean squared error and Somers' Dxy rank correlation between predicted and observed responses. In the case of a binary response variable, the mean squared error is the Brier accuracy score. This function is a modification of cv.tree, which should be consulted for details. There are print and plot methods for objects created by validate.tree.
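As a rough illustration of the two accuracy measures named above, the following base R sketch computes the Brier score and Somers' Dxy for a binary response. The helper names brier and somers_dxy and the toy data are hypothetical, and Dxy is obtained here from the concordance probability C as 2 * (C - 0.5); validate.tree may compute these quantities differently internally.

```r
# Brier score: mean squared error between predicted probabilities and 0/1 outcomes
brier <- function(p, y) mean((p - y)^2)

# Somers' Dxy for a binary outcome via the concordance probability C,
# counting tied predictions as 1/2 (hypothetical helper, not part of any package)
somers_dxy <- function(p, y) {
  p1 <- p[y == 1]                       # predictions for events
  p0 <- p[y == 0]                       # predictions for non-events
  d  <- outer(p1, p0, "-")              # all (event, non-event) pairs
  C  <- mean((d > 0) + 0.5 * (d == 0))  # proportion of concordant pairs
  2 * (C - 0.5)
}

# Toy data, made up for illustration
y <- c(0, 0, 1, 1)
p <- c(0.1, 0.4, 0.35, 0.8)

brier(p, y)
somers_dxy(p, y)
```

Three of the four event/non-event pairs in the toy data are ordered correctly, so C is 0.75 and Dxy is 0.5.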
# f <- tree(formula=y ~ x1 + x2 + ...) # or rpart

## S3 method for class 'tree':
validate(fit, method, B, bw, rule, type, sls, aics,
         pr=TRUE, k, rand, xval=10, FUN, ...)

## S3 method for class 'rpart':
validate(fit, ...)

## S3 method for class 'validate.tree':
print(x, ...)

## S3 method for class 'validate.tree':
plot(x, what=c("mse","dxy"), legendloc=locator, ...)
fit: an object created by tree or rpart, or having the same attributes as one created by tree. If it was created by rpart, you must have specified the model=TRUE argument to rpart.
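The model=TRUE requirement can be checked before validating. This is a minimal sketch on simulated data (the data frame d and its variables are made up for illustration); it relies only on the rpart package, which stores the model frame in the model component of the fit when model=TRUE is given.

```r
library(rpart)

set.seed(1)
n <- 100
d <- data.frame(x1 = runif(n), x2 = runif(n))   # simulated predictors
d$y <- 1 * (d$x1 + d$x2 + rnorm(n) > 1)         # simulated 0/1 response

# Keep the model frame in the fit so that it can be reused later
f <- rpart(y ~ x1 + x2, data = d, model = TRUE)

# Check that the model frame was actually stored
has_model <- !is.null(f$model)
has_model
```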
method, B, bw, rule, type, sls, aics: present only for consistency with the generic validate function; these are ignored.
x: the result of validate.tree.
k: a sequence of cost/complexity values. By default these are obtained from calling FUN with no optional arguments (if the fit is from tree) or from the rpart cptable object in the original fit object. You may also specify a scalar or vector.
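For an rpart fit, the cptable can be inspected directly to see which cost/complexity values are available. The sketch below uses simulated data and assumes the "CP" column of cptable is the one of interest; the vector k_user is a hypothetical user-supplied alternative.

```r
library(rpart)

set.seed(1)
n <- 100
d <- data.frame(x1 = runif(n), x2 = runif(n))   # simulated predictors
d$y <- 1 * (d$x1 + d$x2 + rnorm(n) > 1)         # simulated 0/1 response

f <- rpart(y ~ x1 + x2, data = d, model = TRUE)

# Complexity table stored in the fit: one row per candidate subtree
f$cptable

# Cost/complexity values from the table (assumed to be the "CP" column)
k_default <- f$cptable[, "CP"]

# A user-supplied alternative (hypothetical values), to be passed as k
k_user <- c(0.01, 0.05, 0.1)
```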
rand: see cv.tree.
FUN: the name of a function that produces a sequence of trees, such as prune.tree, shrink.tree, or prune.rpart. Default is prune.tree for fits from tree and prune.rpart for fits from rpart.
...: additional arguments to FUN (ignored by print and plot). For validate.rpart, ... can be the same arguments used in validate.tree.
pr: set to FALSE to prevent intermediate results for each k from being printed.
what: a vector of things to plot. By default, two plots are drawn, one for mse and one for Dxy.
legendloc: a function that is evaluated with a single argument equal to 1 to generate a list with components x, y specifying coordinates of the upper left corner of a legend, or a 2-vector. For the latter, legendloc specifies the relative fraction of the plot at which to center the legend.

The result is a list of class "validate.tree" with components named k, size, dxy.app, dxy.val, mse.app, mse.val, binary, xval. size is the number of nodes, dxy refers to Somers' D, mse refers to mean squared error of prediction, app means apparent accuracy on training samples, val means validated accuracy on test samples, and binary is a logical variable indicating whether or not the response variable was binary (a logical or 0/1 variable is binary). size will not be present if the user specifies k.
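To make the distinction between the app and val components concrete, here is a simplified sketch of xval-fold cross-validation over a sequence of pruned rpart trees. It is not the implementation used by validate.tree: the data are simulated, the k values are hypothetical, and only the mean squared error is computed.

```r
library(rpart)

set.seed(1)
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- 1 * (d$x1 + d$x2 + rnorm(n) > 1)              # simulated 0/1 response

xval <- 10
k    <- c(0.01, 0.03, 0.1)                           # hypothetical cost/complexity values
fold <- sample(rep(seq_len(xval), length.out = n))   # random fold assignment

# Apparent MSE: prune the full-data fit and predict the same data
full    <- rpart(y ~ x1 + x2 + x3, data = d)
mse.app <- sapply(k, function(cp)
  mean((predict(prune(full, cp = cp), d) - d$y)^2))

# Validated MSE: fit on the other folds, prune, predict the held-out fold
sq.err <- matrix(NA_real_, nrow = n, ncol = length(k))
for (i in seq_len(xval)) {
  test <- fold == i
  fit  <- rpart(y ~ x1 + x2 + x3, data = d[!test, ])
  for (j in seq_along(k)) {
    pred <- predict(prune(fit, cp = k[j]), d[test, ])
    sq.err[test, j] <- (pred - d$y[test])^2
  }
}
mse.val <- colMeans(sq.err)

data.frame(k, mse.app, mse.val)
```

Because y is numeric 0/1, rpart fits a regression tree here and the predictions are estimated probabilities, so both columns are Brier scores. The gap between mse.app and mse.val gives a sense of the optimism that validation is meant to expose.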
Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu
## Not run:
n <- 100
set.seed(1)
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- 1*(x1+x2+rnorm(n) > 1)
table(y)
library(rpart)
f <- rpart(y ~ x1 + x2 + x3, model=TRUE)
v <- validate(f)
v    # note the poor validation
par(mfrow=c(1,2))
plot(v, legendloc=c(.2,.5))
par(mfrow=c(1,1))
## End(Not run)