bootstrapValidation is a generic function; methods can be written to handle specific classes of data. Classes that already have methods for this function include: formula.
```
bootstrapValidation(x, <<y or data>>, modelFit, B, group = NULL, subject = NULL,
    args.modelFit = NULL, predFun = <<see below>>, args.predFun = NULL,
    passOldData.predFun = F, errFun = <<see below>>, args.errFun = NULL,
    seed = .Random.seed, label, trace = resampleOptions()$trace,
    assign.frame1 = F, save.indices = F, save.group = <<see below>>,
    save.subject = <<see below>>, save.errors = F)
bootstrapValidation.default(x, y, <<modelFit and subsequent arguments>>)
bootstrapValidation.formula(x, data, <<modelFit and subsequent arguments>>)
```
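x: for bootstrapValidation.default, a data frame or matrix containing the explanatory variables. For bootstrapValidation.formula, a formula object that specifies the model, with the response on the left of a ~ operator and the explanatory terms, separated by + operators, on the right.
y: for bootstrapValidation.default, the response variable.
data: for bootstrapValidation.formula, the data (typically a data frame) containing the variables in the formula.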
modelFit: function used to fit the model. For bootstrapValidation.formula, the function must accept a formula as its first argument and have a data argument, e.g. modelFit(x, data=data). For bootstrapValidation.default, the function must take arguments x and y, not necessarily in that order.
B: number of bootstrap samples.
group: vector with one value per observation in data, for stratified sampling or multiple-sample problems. Sampling is done separately for each group (determined by the unique values of this vector). If data is a data frame, this may be a variable in the data frame, or an expression involving such variables.
subject: vector with one value per observation in data; if present, subjects (determined by the unique values of this vector) are resampled rather than individual observations. If data is a data frame, this may be a variable in the data frame, or an expression involving such variables. If group is also present, subject must be nested within group (each subject must be in only one group).
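args.modelFit: list of additional arguments passed to modelFit when fitting the model.
predFun: function used to calculate predicted values; the default is predict.
args.predFun: list of additional arguments passed to predFun when calculating predicted values.
errFun: function used to calculate the prediction error; the default computes squared errors, appropriate for linear least-squares regression.
args.errFun: list of additional arguments passed to errFun when calculating the prediction error.
seed: seed for the random number generator; see set.seed.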
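assign.frame1: logical flag; use assign.frame1=T if all estimates are identical (this is slower).
save.indices: logical flag; if TRUE, the matrix of bootstrap indices is saved in the returned object; see indices below.
save.group, save.subject: logical flags; if TRUE then the group and subject vectors, respectively, are saved in the returned object. Both defaults are TRUE if n<=10000.
save.errors: logical flag; if TRUE then the matrix of errors is saved in the returned object.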
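The two resampling schemes controlled by group and subject can be sketched in base R. This is an illustration under stated assumptions, not the library's code: stratifiedIndices and subjectIndices are hypothetical names, each returns the row indices of a single bootstrap sample, and the case where both group and subject are supplied is not handled.

```r
# Hypothetical helpers, not part of the resampling library: each returns
# the row indices of ONE bootstrap sample.

# group: resample separately within each group, so group sizes stay fixed
stratifiedIndices <- function(group) {
  rows <- split(seq_along(group), group)
  # i[sample.int(...)] avoids sample()'s special case for length-1 input
  unlist(lapply(rows, function(i) i[sample.int(length(i), replace = TRUE)]),
         use.names = FALSE)
}

# subject: draw subjects with replacement, keeping every row of each one
subjectIndices <- function(subject) {
  ids <- unique(subject)
  drawn <- ids[sample.int(length(ids), replace = TRUE)]
  unlist(lapply(drawn, function(s) which(subject == s)), use.names = FALSE)
}

set.seed(1)
g <- c("a", "a", "a", "b", "b", "c")
stratifiedIndices(g)  # three indices from rows 1-3, two from rows 4-5, then 6
subj <- c(1, 1, 2, 2, 2, 3)
subjectIndices(subj)  # rows arrive in whole-subject blocks
```

Under stratified sampling every bootstrap sample has the same group composition as the original data. Under subject resampling the sample size can vary, because subjects may contribute different numbers of rows.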
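Returns an object of class bootstrapValidation with the following components:
call: the call to bootstrapValidation, with all the arguments explicitly named.
seed.start: the value of .Random.seed at the start of sampling.
seed.end: the value of .Random.seed at the end of sampling.
group: the group vector (if saved; see save.group).
subject: the subject vector (if saved; see save.subject).
indices: matrix with n rows and B columns, indicating which observations were assigned to each bootstrap sample (if save.indices=T).
errors: matrix with n rows and B columns, containing the errors (as measured by errFun) for each observation and bootstrap sample (if save.errors=T).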
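The seed.start component makes a run reproducible: restoring that generator state reproduces the same resamples, which is what passing seed = bootp1$seed.start does in the examples. The following base-R sketch shows this. bootIndices is a hypothetical helper, and storing each sample's row numbers in a column is only one plausible layout for an n by B indices matrix.

```r
# Hypothetical layout for an n x B indices matrix: column b holds the
# row numbers drawn (with replacement) for bootstrap sample b.
bootIndices <- function(n, B) {
  matrix(sample.int(n, n * B, replace = TRUE), nrow = n, ncol = B)
}

set.seed(20)
seedStart <- .Random.seed                    # generator state before sampling
ind1 <- bootIndices(10, 5)
# restore the saved state in the global environment, where R looks for it
assign(".Random.seed", seedStart, envir = globalenv())
ind2 <- bootIndices(10, 5)
identical(ind1, ind2)                        # TRUE: same state, same samples

# observations left out of sample 1 (usable for scoring that sample's fit)
setdiff(1:10, ind1[, 1])
```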
If passOldData.predFun=T, the prediction algorithm assigns the training data to frame 1 under the name oldData. (Note that passOldData.predFun is set to T automatically when gam is used.) If assign.frame1=T, the data are assigned to frame 1 under the name of the data frame or under the name data. Make sure that these assignments to frame 1 do not overwrite any object of interest stored in frame 1.
Computes bootstrap estimates of prediction error for a wide range of models. The algorithm resamples rows of a data frame, so this function is not generally applicable to grouped-data problems that use modeling functions like lme and nlme, unless you use the subject argument.
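The general idea behind a bootstrap estimate of prediction error can be sketched in base R for an lm() fit with squared error. This is an illustration, not this function's implementation. boot632 is a hypothetical name. The sketch computes three quantities:

- the apparent (training) error;
- the leave-one-out bootstrap error, in which each observation is scored only by models fitted to samples that did not contain it;
- their .632 weighted combination.

The examples read an err632plus component, presumably the .632+ refinement, which is not implemented here.

```r
# Sketch only: squared prediction error of an lm() fit, estimated from
# B bootstrap samples. Assumes complete data (no NAs), so the response
# extracted by model.frame() lines up with the rows of `data`.
boot632 <- function(formula, data, B = 40) {
  n <- nrow(data)
  y <- model.response(model.frame(formula, data))
  apparent <- mean(residuals(lm(formula, data = data))^2)  # training error

  errSum <- numeric(n)    # summed out-of-sample errors per observation
  errCount <- numeric(n)  # number of samples that left each observation out
  for (b in seq_len(B)) {
    ind <- sample.int(n, replace = TRUE)
    out <- setdiff(seq_len(n), ind)   # rows not drawn into sample b
    if (length(out) == 0) next
    fit <- lm(formula, data = data[ind, , drop = FALSE])
    pred <- predict(fit, newdata = data[out, , drop = FALSE])
    errSum[out] <- errSum[out] + (y[out] - pred)^2
    errCount[out] <- errCount[out] + 1
  }
  used <- errCount > 0
  err1 <- mean(errSum[used] / errCount[used])  # leave-one-out bootstrap error
  c(apparent = apparent, err1 = err1,
    err632 = 0.368 * apparent + 0.632 * err1)
}

set.seed(1)
d <- data.frame(x = 1:30)
d$y <- 2 * d$x + rnorm(30)
boot632(y ~ x, d, B = 20)
```

The apparent error is optimistic because the same data both fit and score the model. The leave-one-out bootstrap error tends to be pessimistic because each model sees only about 63% of the distinct observations. The .632 weighting balances the two.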
Normally the first two arguments to predFun are the model object and the new data. Most methods for predict (the default predFun) satisfy this. However, predict.censorReg currently has first four arguments object, p, q, newdata. To use it, you could either write your own predFun that calls predict.censorReg with its arguments in a different order, or supply args.predFun = list(p=c(.1,.5,.9), q=NULL); this results in internal calls of the form predict(model object, new data, p=c(.1,.5,.9), q=NULL). Because named arguments (p and q) are matched first, the new data ends up as the fourth argument to predict.censorReg, as desired.
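The args.predFun approach works because R, like S, matches named arguments before it fills the remaining formals by position. The following base-R sketch shows this with showArgs, a hypothetical stand-in that has the same first four formals as predict.censorReg and returns what each formal received.

```r
# Hypothetical stand-in with the same first four formals as
# predict.censorReg; it just reports what each formal received.
showArgs <- function(object, p, q, newdata) {
  list(object = object, p = p, q = q, newdata = newdata)
}

# Same shape as the internal call predict(model object, new data, p=..., q=NULL)
res <- showArgs("model", "new data", p = c(.1, .5, .9), q = NULL)
res$newdata  # "new data": p and q were matched by name first
```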
Similarly, the first two arguments to errFun are normally the actual and fitted values of the response variable, but these may be displaced to later positions by named arguments in args.errFun.
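The following base-R sketch shows the same displacement for errFun. It assumes the internal call has the form errFun(actual, fitted, <args.errFun>), by analogy with predFun; the page states only that the actual and fitted values normally come first. powerErr is a hypothetical error function whose first formal is a tuning parameter.

```r
# Hypothetical errFun whose first formal is a tuning parameter, so the
# actual and fitted values must land in positions 2 and 3.
powerErr <- function(power, y, fitted) abs(y - fitted)^power

# With args.errFun = list(power = 1), a call of the form
# errFun(actual, fitted, power = 1) binds power by name first,
# then fills y and fitted positionally.
powerErr(c(1, 2, 3), c(1.5, 2, 2), power = 1)   # c(0.5, 0, 1)
```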
The combination of predFun and errFun, and their arguments, should be appropriate for your model. For example, in a logistic regression (glm with family=binomial), args.predFun=list(type="response") puts predictions on the probability scale, and errFun could compute a weighted sum of squares. The defaults are appropriate for ordinary linear least-squares regression.
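For the logistic case, the following base-R sketch shows what the prediction and error steps amount to on simulated data. brierErr is a hypothetical errFun that computes the plain unweighted squared error on the probability scale, one value per observation. This is a simpler choice than a weighted sum of squares.

```r
set.seed(3)
n <- 100
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(d$x))       # simulated 0/1 response
fit <- glm(y ~ x, family = binomial, data = d)

# What args.predFun = list(type = "response") amounts to: probabilities,
# not log-odds (predict.glm's default type is "link")
p <- predict(fit, newdata = d, type = "response")

# Hypothetical errFun: squared error on the probability scale, one value
# per observation (unweighted; a weighted version would multiply by weights)
brierErr <- function(y, fitted) (y - fitted)^2
e <- brierErr(d$y, p)
mean(e)
```

Without type = "response", the fitted values would be log-odds, and squared differences from 0/1 responses would not be a meaningful error measure.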
For an annotated list of functions in the package, including other high-level resampling functions, see: .
```
bootstrapValidation(ozone ~ ., air, lm, B = 40)
bootstrapValidation(skips ~ ., data = solder.balance, glm, B = 30,
    args.modelFit = list(family = poisson))

# stratified sampling
bootstrapValidation(skips ~ ., data = solder.balance, glm, B = 30,
    group = Solder, args.modelFit = list(family = poisson))

# bootstrapValidation.default method
bootstrapValidation(air$wind, air$ozone, smooth.spline, B = 30,
    predFun = function(object, newdata) predict(object, x = newdata)$y)

# model selection with smooth.spline
attach(air)
plot(ozone, temperature)
tempErr <- rep(NA, 11)
for(i in 1:11) {
    cat("model", i, "\n")
    res <- bootstrapValidation(ozone, temperature, smooth.spline,
        args.modelFit = list(df = i + 1),
        predFun = function(object, newdata) {
            predict(object, x = newdata)$y
        },
        B = 30)
    tempErr[i] <- res$err632plus
}
argminErr <- which(tempErr == min(tempErr))[1] + 1
lines(smooth.spline(ozone, temperature, df = argminErr))
detach("air")
# note: this simple example ignores the variability
# in the bootstrapValidation estimates, and just picks the
# minimum error as the "winner"

# local regression model
bootstrapValidation(NOx ~ C * E, data = ethanol, loess, B = 30,
    args.modelFit = list(span = 1/2, degree = 2, parametric = "C",
        drop.square = "C", control = loess.control("direct")))

# Test if match:
# 1. supply the prediction function
bootp1 <- bootstrapValidation(ozone ~ ., air, lm, B = 40,
    predFun = function(object, newdata, se.fit)
        predict.lm(object, newdata, se.fit = T)$fit)
# 2. supply the error function and args.errFun
#    while still doing the same model
bootp2 <- bootstrapValidation(ozone ~ ., air, lm, B = 40,
    errFun = function(y, fitted, dim) ((y - fitted)^dim),
    args.errFun = list(dim = 2), seed = bootp1$seed.start)
all.equal(bootp1[-1], bootp2[-1])  # match except for calls
```