The bootstrapValidation function is generic;
method functions can be written to handle specific classes of
data. Classes that already have methods for this function include:
formula
bootstrapValidation(x, <<y or data>>,
modelFit, B, group = NULL, subject = NULL,
args.modelFit = NULL,
predFun = <<see below>>, args.predFun = NULL,
passOldData.predFun = F,
errFun = <<see below>>, args.errFun = NULL,
seed = .Random.seed,
label,
trace = resampleOptions()$trace, assign.frame1 = F,
save.indices = F,
save.group = <<see below>>, save.subject = <<see below>>,
save.errors = F)
bootstrapValidation.default(x, y,
<<modelFit and subsequent arguments>>)
bootstrapValidation.formula(x, data,
<<modelFit and subsequent arguments>>)
x: For bootstrapValidation.default, a data frame or matrix containing
the explanatory variables.
For bootstrapValidation.formula, a formula object that specifies the
model, with the response on the left of a ~ operator and the
explanatory terms, separated by + operators, on the right.
modelFit: For bootstrapValidation.formula, the function must accept a
formula as its first argument and have a data argument,
e.g. modelFit(x, data=data).
For bootstrapValidation.default, the function must take arguments
x and y, not necessarily in that order.
group: a vector with one entry per observation in data, for
stratified sampling or multiple-sample problems.
Sampling is done separately for each group
(determined by unique values of this vector).
If data is a data frame, this may be a variable in the data frame,
or an expression involving such variables.
subject: a vector with one entry per observation in data;
if present then subjects
(determined by unique values of this vector) are resampled rather than
individual observations.
If data is a data frame, this may be a variable in the data frame,
or an expression involving such variables.
If group is also present, subject must be nested within group
(each subject must be in only one group).
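args.modelFit: list of arguments to pass to modelFit when fitting the model.
predFun: function used to compute predicted values; the default is predict.
args.predFun: list of arguments to pass to predFun
when calculating predicted values.
args.errFun: list of arguments to pass to errFun
when calculating the prediction error.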
set.seed.
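assign.frame1: logical; if TRUE, the data is assigned to frame 1.
Try assign.frame1=T if all estimates are identical (this is slower).
save.indices: logical; if TRUE, the resampling indices are saved in the
returned object; see indices below.
save.group, save.subject: logical; if TRUE then the
group and subject vectors, respectively,
are saved in the returned object. Both defaults are TRUE if n<=10000.
save.errors: logical; if TRUE then the matrix of errors is
saved in the returned object.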
An object of class bootstrapValidation, with the following components:
call: the call to bootstrapValidation, but with all the arguments explicitly named.
.Random.seed.
.Random.seed.
bootstrapValidation.
group: the group vector.
subject: the subject vector.
indices: matrix with n rows and B columns, indicating which
observations were assigned to each bootstrap sample.
errors: matrix with n rows and B columns, containing the
errors (as measured by errFun) for each observation and bootstrap sample.
modelFit. If
passOldData.predFun=T, the prediction
algorithm assigns the training data to frame 1 using the name
oldData.
(Note that
passOldData.predFun gets set to
T automatically when
gam
is used.) If
assign.frame1=T, the data is assigned to frame 1 using
the name of the data frame or the name
data. You must be sure that
these assignments to frame 1 do not overwrite some quantity of interest
stored in frame 1.
Computes bootstrap estimates of prediction error for a wide range of models.
The algorithm samples by selecting certain rows of a data frame,
so this function is not generally applicable to grouped-data problems that
use modeling functions like lme and nlme, unless you use the
subject variable.
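As a rough illustration of what resampling subjects rather than rows means, the following sketch draws subjects with replacement and then keeps every row that belongs to each drawn subject. It is written in plain R and is not the package's internal code; the data and the names subject, drawn, and rows are made up for illustration.

```r
# Hypothetical data: six observations from three subjects.
subject <- c("a", "a", "b", "c", "c", "c")

set.seed(1)  # only to make the sketch reproducible
ids <- unique(subject)

# Draw as many subjects as there are, with replacement.
drawn <- sample(ids, length(ids), replace = TRUE)

# Collect all rows of each drawn subject; a subject that is drawn
# twice contributes its rows twice.
rows <- unlist(lapply(drawn, function(s) which(subject == s)))
```

Every row of a drawn subject enters the resample together, so within-subject dependence is preserved; sampling individual rows would break it.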
Normally the first two arguments to predFun are the model object
and new data. Most methods for predict (the default predFun)
satisfy this.
However, predict.censorReg currently has object, p, q, newdata
as its first four arguments.
To use it, you could either write your own predFun that calls
predict.censorReg with the arguments in a different order, or supply
args.predFun = list(p=c(.1,.5,.9),q=NULL);
this results in internal calls of the form
predict(model object, new data, p=c(.1,.5,.9), q=NULL).
Because named arguments (p and q) are matched first, the new
data ends up as the fourth argument to
predict.censorReg, as desired.
Similarly, the first two arguments to errFun are normally the
actual and fitted values of the response variable, but these may be
displaced to later positions by named arguments in args.errFun.
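The argument matching described above for predict.censorReg can be checked with a small stand-in function. This is only a sketch in plain R: predictLike is a hypothetical function that merely shares the argument order object, p, q, newdata, and the strings stand in for a real model object and data.

```r
# Hypothetical stand-in with the same leading arguments as
# predict.censorReg: object, p, q, newdata.
predictLike <- function(object, p, q, newdata) {
  list(object = object, p = p, q = q, newdata = newdata)
}

# Two unnamed arguments plus the named p and q, mirroring the
# form of the internal call.
res <- predictLike("model object", "new data",
                   p = c(0.1, 0.5, 0.9), q = NULL)

res$object   # the first unnamed argument, "model object"
res$newdata  # the second unnamed argument, "new data"
```

Named arguments are matched first, and the remaining unnamed arguments fill the unmatched formal arguments in order, which is why the second unnamed argument lands in newdata.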
The combination of predFun and errFun, and their arguments,
should be appropriate for your model.
For example, in a logistic regression (glm with family=binomial),
args.predFun=list(type="response") puts predictions on the
probability scale, and errFun could compute a weighted sum of squares.
The defaults are appropriate for the usual linear least-squares regression.
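To make the logistic-regression case concrete, here is a minimal sketch in plain R of a prediction step on the probability scale paired with a squared-error function. The function weightedSqErr and its weight argument w are hypothetical, and the built-in mtcars data is used only for illustration; nothing here calls bootstrapValidation itself.

```r
# Hypothetical error function: squared error on the probability
# scale, optionally weighted (w is an assumed extra argument).
weightedSqErr <- function(y, fitted, w = 1) w * (y - fitted)^2

# Built-in mtcars data: model the 0/1 variable am from weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# type = "response" returns fitted probabilities rather than log-odds.
p <- predict(fit, newdata = mtcars, type = "response")

# One error value per observation.
err <- weightedSqErr(mtcars$am, p)
mean(err)  # in-sample error, which tends to be optimistic
```

A function of this shape takes the actual and fitted values as its first two arguments, so an extra argument such as w could be supplied through a list like args.errFun.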
Efron, B. and Tibshirani, R.J. (1995), "Cross-Validation and the Bootstrap: Estimating the Error Rate of a Prediction Rule," Technical Report (see http://www-stat.stanford.edu/~tibs/research.html)
Efron, B. and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, New York: Chapman & Hall.
bootstrapValidation(ozone ~ ., air, lm, B = 40)
bootstrapValidation(skips ~ ., data = solder.balance, glm,
    B = 30, args.modelFit = list(family = poisson))
# stratified sampling
bootstrapValidation(skips ~ ., data = solder.balance, glm,
    B = 30, group = Solder, args.modelFit = list(family = poisson))
# bootstrapValidation.default method
bootstrapValidation(air$wind, air$ozone, smooth.spline, B = 30,
    predFun = function(object, newdata) predict(object, x = newdata)$y)
# model selection with smooth.spline
attach(air)
plot(ozone,temperature)
tempErr <- rep(NA, 11)
for(i in 1:11){
    cat("model", i, "\n")
    res <- bootstrapValidation(ozone, temperature, smooth.spline,
        args.modelFit = list(df = i+1),
        predFun = function(object, newdata){predict(object, x = newdata)$y},
        B = 30)
    tempErr[i] <- res$err632plus
}
# model i was fit with df = i+1, so add 1 to the index of the minimum
argminErr <- which(tempErr == min(tempErr))[1] + 1
lines(smooth.spline(ozone, temperature, df = argminErr))
# note: this simple example ignores the variability
# in the bootstrapValidation estimates, and just picks the
# minimum error as the "winner"
# local regression model
bootstrapValidation(NOx ~ C * E, data = ethanol, loess, B = 30,
    args.modelFit = list(span = 1/2, degree = 2,
        parametric = "C", drop.square = "C",
        control = loess.control("direct")))
# Test if match:
# 1. supply the prediction function
bootp1 <- bootstrapValidation(ozone ~ ., air, lm, B = 40,
    predFun = function(object, newdata, se.fit)
        predict.lm(object, newdata, se.fit = T)$fit)
# 2. supply the error function and args.errFun
# while still doing the same model
bootp2 <- bootstrapValidation(ozone ~ ., air, lm, B = 40,
    errFun = function(y, fitted, dim) ((y - fitted)^dim),
    args.errFun = list(dim = 2), seed = bootp1$seed.start)
all.equal(bootp1[-1], bootp2[-1])
# match except for calls