The bootstrap function is generic (see Methods); method functions can
be written to handle specific classes of data. Classes which already
have methods for this function include:
bootstrap(data, statistic, B = 1000, args.stat, group, subject,
          sampler = samp.bootstrap, seed = .Random.seed, sampler.prob,
          sampler.args, sampler.args.group, resampleColumns, label,
          statisticNames, block.size = min(100, B),
          trace = resampleOptions()$trace, assign.frame1 = F,
          save.indices, save.group, save.subject, statistic.is.random,
          group.order.matters = T, order.matters, seed.statistic = 500,
          L, model.mat, argumentList, observed.indices = 1:n, ...)

See for further details of arguments marked with "*" (including
important capabilities not described here), and for a description of
arguments not described below.
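Conceptually, with the default sampler, bootstrap(data, statistic, B)
applies statistic to B with-replacement resamples of data, each the
same size as the original. A minimal Python sketch of that idea (the
name bootstrap_replicates is illustrative only, not part of the
package):

```python
import random
import statistics

def bootstrap_replicates(data, statistic, B=1000, seed=0):
    # Illustrative sketch only, not the S-PLUS implementation:
    # draw B with-replacement resamples the same size as `data`
    # and apply `statistic` to each.
    rng = random.Random(seed)
    n = len(data)
    return [statistic([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(B)]

reps = bootstrap_replicates([1, 2, 3, 4, 5], statistics.mean, B=200)
```

The distribution of these replicates is what the summary, plot, and
confidence-limit functions shown in the examples below operate on.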
ARGUMENTS:

statistic:
the statistic to be bootstrapped: a function (other arguments may be
passed using args.stat), or an expression such as mean(x, trim = .2).
If data is given by name (e.g. data = x) then use that name in the
expression; otherwise (e.g. data = air[,4]) use the name data in the
expression. If data is a data frame, the expression may involve
variables in the data frame.

args.stat:
if statistic is a function, a list of other arguments, if any, to
pass to statistic when calculating the statistic on the resamples,
e.g. list(trim = .2). If statistic is an expression, then a list of
objects to include in the frame where the expression is evaluated.
group:
vector of the same length as data, for stratified sampling or
multiple-sample problems. Sampling is done separately for each group
(determined by unique values of this vector). If data is a data
frame, this may be a variable in the data frame, or an expression
involving such variables.
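The effect of stratified sampling can be sketched as follows (a
hypothetical helper, not the package's samp.bootstrap): indices are
drawn with replacement separately within each group, so every group
keeps its original size in each resample.

```python
import random

def stratified_indices(group, rng):
    # Sketch of what the group argument does: resample indices with
    # replacement separately within each group, so group sizes are
    # constant across resamples. Hypothetical name, not package code.
    members = {}
    for i, g in enumerate(group):
        members.setdefault(g, []).append(i)
    idx = []
    for rows in members.values():
        idx.extend(rng.choice(rows) for _ in rows)
    return idx

rng = random.Random(1)
group = ["W", "W", "E", "E", "E"]
idx = stratified_indices(group, rng)
```

Every resample produced this way contains exactly two "W" rows and
three "E" rows, unlike ordinary resampling where group sizes vary.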
subject:
vector of the same length as data; if present then subjects
(determined by unique values of this vector) are resampled rather
than individual observations. If data is a data frame, this may be a
variable in the data frame, or an expression involving such
variables. If group is also present, subject must be nested within
group (each subject must be in only one group). bootstrap makes
resampled subjects unique before calling the statistic.
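Resampling by subject can be sketched like this (a hypothetical
helper for illustration, not the package's sampler): whole subjects
are drawn with replacement, and each draw contributes all of that
subject's rows, so the number of rows can vary across resamples.

```python
import random

def resample_by_subject(rows, subject, rng):
    # Sketch of what the subject argument does: draw unique subject
    # IDs with replacement; each draw contributes all of that
    # subject's rows. Hypothetical name, not package code.
    ids = list(dict.fromkeys(subject))  # unique subject IDs, in order
    by_id = {s: [r for r, t in zip(rows, subject) if t == s]
             for s in ids}
    out = []
    for _ in ids:  # one draw per original subject
        out.extend(by_id[rng.choice(ids)])
    return out

rng = random.Random(2)
rows = [1, 2, 3, 4, 5, 6]
subject = [101, 101, 102, 102, 102, 103]
res = resample_by_subject(rows, subject, rng)
```

With subjects of sizes 2, 3, and 1 as here, a resample may contain
anywhere from 3 to 9 rows, which is why n in the returned object
counts subjects rather than observations.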
sampler:
function that generates resampling indices; samp.bootstrap by
default. Use e.g. sampler = samp.bootstrap(size = 100) for setting
optional arguments to the sampler. See also argument sampler.args,
described in .
save.indices:
whether to save the matrix of resampling indices (see the indices
component below), or 2 indicating to return compressed indices; by
default this is chosen based on the sample size and B.
argumentList:
data, statistic, group, and subject may be specified in this list,
and their values override the values set by their regular placement
in the argument list of bootstrap. See for examples.
VALUE:
an object of class bootstrap, which inherits from resamp. This has
components call, observed, replicates, estimate, B, n (the number of
observations or subjects), dim.obs, seed.start, and seed.end.
Components which may be present include B.missing, weights (see
sampler.prob), group, subject, label, defaultLabel, parent.frame
(the frame of the caller of bootstrap), indices, compressedIndices,
L, Lstar, and others.
The data frame estimate has three columns containing the bootstrap
estimates of Bias, Mean, and SE. See or for further details.
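As the bootstrap summaries are conventionally defined, the estimate
columns can be reproduced from the replicates themselves: Mean is the
average of the replicates, Bias is Mean minus the observed statistic,
and SE is the sample standard deviation of the replicates. A sketch
(the helper name is illustrative, not part of the package):

```python
import statistics

def bootstrap_estimate(replicates, observed):
    # Sketch of the estimate summaries: Mean = average of replicates,
    # Bias = Mean - observed, SE = sample std. dev. of replicates.
    mean = sum(replicates) / len(replicates)
    return {"Bias": mean - observed,
            "Mean": mean,
            "SE": statistics.stdev(replicates)}

est = bootstrap_estimate([1.0, 2.0, 3.0], observed=2.5)
```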
If the function is interrupted it saves current results (all complete
sets of block.size replicates) to .bootstrap.partial.results. This
object is nearly the same as if bootstrap were called with a smaller
value of B, so many functions that expect an object of class
bootstrap will operate correctly. An exception is ; see the help file
for a work-around.
SIDE EFFECTS:
The function bootstrap causes creation of the dataset .Random.seed if
it does not already exist; otherwise its value is updated. See other
help files for details.
REFERENCES:
Davison, A.C. and Hinkley, D.V. (1997), Bootstrap Methods and Their
Application, Cambridge University Press.
Efron, B. and Tibshirani, R.J. (1993), An Introduction to the
Bootstrap, San Francisco: Chapman & Hall.
A number of technical reports on aspects of the resampling code are
found at www.insightful.com/Hesterberg/bootstrap.
See .
Bootstrap and other objects: , , .
Print, summarize, plot: , , , ,
Description of a "bootstrap" object, extract parts: , , , , .
Diagnostics: , .
Confidence intervals: , , , , .
Modify a "bootstrap" object: , , , .
For an annotated list of functions in the package, including other high-level resampling functions, see: .
EXAMPLES:
# Bootstrap a mean; demonstrate summary(), plot(), qqnorm()
bootstrap(stack.loss, mean)
temp <- bootstrap(stack.loss, mean)
temp
summary(temp)
plot(temp)
qqnorm(temp)

# Percentiles
limits.percentile(temp)

# Confidence intervals
limits.tilt(temp)
limits.bca(temp)
limits.bca(temp, detail=T)

# Here the "statistic" argument is an expression, not a function.
stack <- cbind(stack.loss, stack.x)
bootstrap(stack, l1fit(stack[,-1], stack[,1])$coef, seed=0)

# Again, but if the data is created on the fly, then
# use the name "data" in the statistic expression:
bootstrap(cbind(stack.loss, stack.x),
          l1fit(data[,-1], data[,1])$coef, seed=0)

temp <- bootstrap(stack, var)  # Here "statistic" is a function.
parallel(~ temp$replicates)    # Interesting trellis plot.

# Demonstrate the args.stat argument
# without args.stat:
bootstrap(stack.loss, mean(stack.loss, trim=.2))
# statistic is a function:
bootstrap(stack.loss, mean, args.stat = list(trim=.2))
# statistic is an expression, object "h" defined in args.stat
bootstrap(stack.loss, mean(stack.loss, trim=h), args.stat = list(h=.2))

# Bootstrap regression coefficients (in 3 equivalent ways).
fit.lm <- lm(Mileage ~ Weight, fuel.frame)
bootstrap(fuel.frame, coef(lm(Mileage ~ Weight, fuel.frame)),
          B = 250, seed = 0)
bootstrap(fuel.frame, coef(eval(fit.lm$call)), B = 250, seed = 0)
bootstrap(fit.lm, coef, B = 250, seed = 0)

# Bootstrap a nonlinear least squares analysis
fit.nls <- nls(vel ~ (Vm * conc)/(K + conc), Puromycin,
               start = list(Vm = 200, K = 0.1))
temp.nls <- bootstrap(Puromycin, coef(eval(fit.nls$call)))
pairs(temp.nls$rep)
plot(temp.nls$rep[,1], temp.nls$rep[,2])
contour(hist2d(temp.nls$rep[,1], temp.nls$rep[,2]))
image(hist2d(temp.nls$rep[,1], temp.nls$rep[,2]))

# Jackknife after bootstrap
jackknifeAfterBootstrap(temp.nls)
jackknifeAfterBootstrap(temp.nls, stdev)

# Bootstrap the calculation of a covariance matrix
my.x <- runif(2000)
my.dat <- cbind(x=my.x, y=my.x+0.5*rnorm(2000))
bootstrap(my.dat, var)

# Perform a jackknife analysis.
jackknife(stack.loss, mean)

## Two-sample problems
# Bootstrap the distribution of the difference of two group means
# (group sizes vary across bootstrap samples)
West <- (as.character(state.region) == "West")
Income <- state.x77[,"Income"]
bootstrap(data.frame(Income, West),
          mean(data[ data[,"West"],"Income"]) -
          mean(data[!data[,"West"],"Income"]))

# Stratified bootstrapping for difference of group means
# (resampling is done separately within "West" and "not West", so
# group sizes are constant across bootstrap samples)
bootstrap(Income, mean(Income[West])-mean(Income[!West]), group = West)

# Different sampling mechanisms
# Permutation distribution for the difference in two group means,
# under the hypothesis of one population.
# Note that either the group or response variable is permuted, not
# both.
bootObj <- bootstrap(Income, sampler = samp.permute,
                     mean(Income[West])-mean(Income[!West]))
1 - mean(bootObj$replicates < bootObj$observed)  # one-sided p-value

# Balanced bootstrap
bootstrap(stack.loss, mean, sampler=samp.boot.bal)

# Bootstrapping unadjusted residuals in lm (2 equivalent ways)
fit.lm <- lm(Mileage~Weight, fuel.frame)
resids <- resid(fit.lm)
preds <- predict(fit.lm)
bootstrap(resids, lm(resids+preds~fuel.frame$Weight)$coef,
          B=250, seed=0)
bootstrap(fit.lm, coef, lmsampler="resid", B=250, seed=0)

# Bootstrapping other fitted models: gam
fit.gam <- gam(Kyphosis ~ s(Age,4) + Number, family = binomial,
               data = kyphosis)
bootstrap(fit.gam, coef, B=100)

# Bootstrap when patients have varying number of cases:
# sampling by subject
DF <- data.frame(ID=rep(101:103, c(4,5,6)), x=1:15)
DF  # Patient 101 has 4 cases, 102 has 5, 103 has 6.
bootstrap(DF, mean(x), subject=ID)

## Bootstrap bagging: a classification tree
# The first column of data set kyphosis is the
# response variable Kyphosis, with values "present" or "absent"
kyph.pred <- predict(tree(kyphosis, minsize = 5))
# The apparent misclassification rate
n <- numRows(kyphosis)
mean(kyph.pred[cbind(1:n, kyphosis$Kyphosis)] < .5)  # 0.02469136

# bootstrap to get an averaged tree and predict on the original data
my.kyphosis <- kyphosis
kyph.pred.boot <- bootstrap(kyphosis,
                            predict(tree(kyphosis, minsize = 5),
                                    newdata = my.kyphosis),
                            B = 100, seed = 10)
# The row names for the replicates are made using the row names of
# the original data and the abbreviated response values.
rows <- dimnames(kyphosis)[[1]]
kyph.names <- paste(rows, abbreviate(kyphosis$Kyphosis,5), sep = ".")
# The apparent misclassification rate for the averaged tree is
# higher, but more realistic as a measure of predictive error.
mean(kyph.pred.boot$estimate[kyph.names, "Mean"] < .5)  # 0.03703704

## Run in background
For(1, temp <- bootstrap(stack.loss, mean, B=1000), wait=F)