Uses xval-fold cross-validation of a sequence of trees to derive estimates of the mean squared error and Somers' Dxy rank correlation between predicted and observed responses. In the case of a binary response variable, the mean squared error is the Brier accuracy score. This function is a modification of cv.tree, which should be consulted for details. There are print and plot methods for objects created by validate.tree.
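The sketch below is a minimal base-R illustration of the two accuracy measures for a binary (0/1) response. The helper names brier_score and somers_dxy are hypothetical. They are not the routines validate.tree calls internally. The Dxy computation assumes a binary response and uses the identity Dxy = 2(C - 0.5), where C is the concordance probability over all pairs with different outcomes.

```r
# Hypothetical helpers, for illustration only (binary response assumed).

brier_score <- function(p, y) {
  # Mean squared error of predicted probabilities, i.e. the Brier score
  mean((p - y)^2)
}

somers_dxy <- function(p, y) {
  # Concordance probability C over all (y = 1, y = 0) pairs;
  # tied predictions count as one half. Then Dxy = 2 * (C - 0.5).
  pos <- p[y == 1]
  neg <- p[y == 0]
  cmp <- outer(pos, neg, "-")   # prediction for the 1 minus prediction for the 0
  c_index <- mean((cmp > 0) + 0.5 * (cmp == 0))
  2 * (c_index - 0.5)
}

p <- c(0.9, 0.7, 0.4, 0.2)   # predicted probabilities
y <- c(1,   0,   1,   0)     # observed 0/1 responses
brier_score(p, y)            # 0.225
somers_dxy(p, y)             # 0.5 (3 of 4 pairs concordant, so C = 0.75)
```

For a continuous response, Dxy is a more general rank correlation. This two-group shortcut does not cover that case.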
# f <- tree(formula=y ~ x1 + x2 + ...) # or rpart
## S3 method for class 'tree':
validate(fit, method, B, bw, rule, type, sls, aics, pr=TRUE,
k, rand, xval=10, FUN, ...)
## S3 method for class 'rpart':
validate(fit, ...)
## S3 method for class 'validate.tree':
print(x, ...)
## S3 method for class 'validate.tree':
plot(x, what=c("mse","dxy"), legendloc=locator, ...)
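The xval=10 default in the usage above requests ten-fold splitting. The sketch below shows the general idea of xval-fold cross-validation for a single model. cv_mse, fit_fun and pred_fun are hypothetical names, and lm() stands in for a tree fit so the code runs with base R only. Judging by its returned components, validate.tree applies the same idea to each tree in the sequence indexed by k and also records Dxy.

```r
# Sketch of xval-fold cross-validation of mean squared error.
# fit_fun/pred_fun are hypothetical hooks; lm() is only a stand-in model.
cv_mse <- function(data, formula, xval = 10,
                   fit_fun = function(f, d) lm(f, data = d),
                   pred_fun = function(m, d) predict(m, newdata = d)) {
  n <- nrow(data)
  fold <- sample(rep(seq_len(xval), length.out = n))   # random fold labels
  yname <- all.vars(formula)[1]                        # response variable name
  pred <- numeric(n)
  for (i in seq_len(xval)) {
    test <- fold == i
    m <- fit_fun(formula, data[!test, , drop = FALSE])    # fit on the other folds
    pred[test] <- pred_fun(m, data[test, , drop = FALSE]) # predict the held-out fold
  }
  full <- fit_fun(formula, data)                       # refit on all data
  c(mse.app = mean((pred_fun(full, data) - data[[yname]])^2),  # apparent
    mse.val = mean((pred - data[[yname]])^2))                  # cross-validated
}

set.seed(1)
n <- 100
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y <- d$x1 + d$x2 + rnorm(n)
cv_mse(d, y ~ x1 + x2)
```

The apparent error reuses the training data for evaluation, so it is typically optimistic relative to the cross-validated error.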
fit: an object created by tree or rpart, or having the same attributes as one created by tree. If it was created by rpart, you must have specified the model=TRUE argument to rpart.
method, B, bw, rule, type, sls, aics: present only for consistency with the generic validate function; these are ignored.
x: the result of validate.tree.
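k: a sequence of cost/complexity values. By default these are obtained from calling FUN with no optional arguments (if the fit was created by tree) or from the cptable object in the original rpart fit object. You may also specify a scalar or vector.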
rand: see cv.tree.
xval: number of cross-validation splits (folds); the default is 10.
FUN: the name of a function that produces a sequence of trees, such as prune.tree, shrink.tree, or prune.rpart. The default is prune.tree for fits from tree and prune.rpart for fits from rpart.
...: additional arguments to FUN (ignored by print and plot). For validate.rpart, ... can be the same arguments used in validate.tree.
pr: set to FALSE to prevent intermediate results for each k from being printed.
what: a vector of things to plot. By default, two plots are drawn, one for mse and one for Dxy.
legendloc: a function that is evaluated with a single argument equal to 1 to generate a list with components x and y specifying the coordinates of the upper left corner of a legend, or a 2-vector. For the latter, legendloc specifies the relative fraction of the plot at which to center the legend.
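The k argument indexes a cost/complexity penalty. Under the usual cost-complexity criterion, a subtree T is scored as R(T) + k|T|, where R(T) is its training deviance (or risk) and |T| is its number of terminal nodes, so larger k favors smaller trees. The toy sketch below picks, for each k, the subtree minimizing that criterion. The subtrees table and the pick_subtree helper are hypothetical, the numbers are invented, and the actual scale of k depends on FUN.

```r
# Hypothetical table of nested subtrees: terminal-node count and training
# deviance. Values are made up for illustration, not output from tree/rpart.
subtrees <- data.frame(size = c(1, 2, 4, 7),
                       deviance = c(25, 18, 15, 14))

pick_subtree <- function(tab, k) {
  # Penalized deviance R(T) + k * |T|; return the size that minimizes it
  crit <- tab$deviance + k * tab$size
  tab$size[which.min(crit)]
}

sapply(c(0, 1, 2, 8), function(k) pick_subtree(subtrees, k))  # 7 4 2 1
```

As k grows the selected tree shrinks, from the largest subtree at k = 0 down to the root-only tree.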
The value is a list of class "validate.tree" with components named k, size, dxy.app, dxy.val, mse.app, mse.val, binary, and xval. Here size is the number of nodes, dxy refers to Somers' Dxy, mse refers to the mean squared error of prediction, app means apparent accuracy on the training samples, val means validated accuracy on the test samples, and binary is a logical variable indicating whether the response variable was binary (a logical or 0/1 variable counts as binary). size will not be present if the user specifies k.
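The sketch below shows how the returned components relate to each other. It builds a mock list with the documented component names but invented values (not output from validate.tree). It then computes the optimism in Dxy (apparent minus validated) and the k with the smallest validated mean squared error. Choosing k this way is one reasonable convention. It is not necessarily what the plot method or the author recommends.

```r
# Mock object: documented component names, made-up numbers (not real output).
v <- list(k = c(0.5, 1, 2, 4), size = c(9, 6, 3, 1),
          dxy.app = c(0.62, 0.55, 0.41, 0),
          dxy.val = c(0.18, 0.22, 0.25, 0),
          mse.app = c(0.16, 0.18, 0.21, 0.25),
          mse.val = c(0.27, 0.25, 0.24, 0.25),
          binary = TRUE, xval = 10)

# Optimism: how much apparent Dxy overstates cross-validated Dxy, per k
optimism_dxy <- v$dxy.app - v$dxy.val

# k with the smallest cross-validated mean squared error
best_k <- v$k[which.min(v$mse.val)]
best_k  # 2
```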
Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu
## Not run:
n <- 100
set.seed(1)
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
y <- 1*(x1+x2+rnorm(n) > 1)
table(y)
library(rpart)
f <- rpart(y ~ x1 + x2 + x3, model=TRUE)
v <- validate(f)
v    # note the poor validation
par(mfrow=c(1,2))
plot(v, legendloc=c(.2,.5))
par(mfrow=c(1,1))
## End(Not run)