The bootstrap function is generic (see Methods); method functions can be written to handle specific classes of data. Classes which already have methods for this function include:
bootstrap(data, statistic, B = 1000, args.stat,
          group, subject,
          sampler = samp.bootstrap, seed = .Random.seed,
          sampler.prob, sampler.args, sampler.args.group,
          resampleColumns, label, statisticNames,
          block.size = min(100, B),
          trace = resampleOptions()$trace, assign.frame1 = F,
          save.indices, save.group, save.subject,
          statistic.is.random, group.order.matters = T,
          order.matters, seed.statistic = 500,
          L, model.mat, argumentList,
          observed.indices = 1:n, ...)
See for further details of arguments marked with "*" (including important capabilities not described here), and for a description of arguments not described below.
statistic: statistic to be bootstrapped; a function, or an expression that operates on the data, e.g. mean or mean(x, trim=.2). Additional arguments to a function may be passed using args.stat. If data is given by name (e.g. data=x) then use that name in the expression; otherwise (e.g. data=air[,4]) use the name data in the expression. If data is a data frame, the expression may involve variables in the data frame.
args.stat: if statistic is a function, a list of other arguments, if any, to pass to statistic when calculating the statistic on the resamples, e.g. list(trim=.2). If statistic is an expression, a list of objects to include in the frame where the expression is evaluated.
group: vector of length equal to the number of observations in data, for stratified sampling or multiple-sample problems. Sampling is done separately for each group (determined by unique values of this vector). If data is a data frame, this may be a variable in the data frame, or an expression involving such variables.
subject: vector of length equal to the number of observations in data; if present then subjects (determined by unique values of this vector) are resampled rather than individual observations. If data is a data frame, this may be a variable in the data frame, or an expression involving such variables. If group is also present, subject must be nested within group (each subject must be in only one group). bootstrap makes resampled subjects unique before calling the statistic.
sampler: function which generates resampling indices; samp.bootstrap by default. The sampler may be called with arguments, e.g. samp.bootstrap(size = 100), for setting optional arguments to the sampler. See also the argument sampler.args.
save.indices: logical flag indicating whether to save the matrix of resampling indices, or 2 indicating to return compressed indices; by default the choice is made based on the sample size and B.
argumentList: optional list of arguments. The arguments data, statistic, group, and subject may be specified in this list, and their values override the values set by their regular placement in the argument list.
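For instance, a minimal sketch of argumentList (the specific call here is illustrative; a statistic given in the list overrides the one placed in the regular position):

```s
# "mean" in the regular position is overridden by the statistic
# supplied in argumentList, so the median is bootstrapped instead
bootstrap(stack.loss, mean, B = 200, seed = 0,
          argumentList = list(statistic = median))
```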
an object of class bootstrap which inherits from resamp. This has components call, observed, replicates, estimate, B, n (the number of observations or subjects), dim.obs, seed.start, and seed.end.
Components which may be present include B.missing, weights (see sampler.prob), group, subject, label, defaultLabel, parent.frame (the frame of the caller of bootstrap), indices, compressedIndices, L, Lstar, and others.
The data frame estimate has three columns containing the bootstrap estimates of Bias, Mean, and SE.
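For example, these components can be extracted directly from the returned object (a sketch using the component names listed above):

```s
boot <- bootstrap(stack.loss, mean, seed = 0)
boot$observed            # statistic computed on the original data
boot$estimate            # data frame with columns Bias, Mean, and SE
boot$estimate[, "SE"]    # the bootstrap standard error alone
```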
If the function is interrupted it saves current results (all complete sets of block.size replicates) to .bootstrap.partial.results. This object is nearly the same as if bootstrap were called with a smaller value of B, so many functions that expect an object of class bootstrap will operate correctly. There is one exception; see its help file for a work-around.
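Because the saved object behaves like a bootstrap result with a smaller B, routine inspection functions can be applied to it (a sketch; the object name is as described above):

```s
# After interrupting a long run, examine the partial results:
summary(.bootstrap.partial.results)
plot(.bootstrap.partial.results)
```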
The function bootstrap causes creation of the dataset .Random.seed if it does not already exist; otherwise its value is updated. See other help files for details.
Davison, A.C. and Hinkley, D.V. (1997), Bootstrap Methods and Their Application, Cambridge University Press.
Efron, B. and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, New York: Chapman & Hall.
A number of technical reports on aspects of the resampling code may be found at www.insightful.com/Hesterberg/bootstrap
Bootstrap and other objects: , , .
Print, summarize, plot: , , , ,
Description of a "bootstrap" object, extract parts: , , , , .
Diagnostics: , .
Confidence intervals: , , , , .
Modify a "bootstrap" object: , , , .
For an annotated list of functions in the package, including other high-level resampling functions, see: .
# Bootstrap a mean; demonstrate summary(), plot(), qqnorm()
bootstrap(stack.loss, mean)
temp <- bootstrap(stack.loss, mean)
temp
summary(temp)
plot(temp)
qqnorm(temp)
# Percentiles
limits.percentile(temp)
# Confidence intervals
limits.tilt(temp)
limits.bca(temp)
limits.bca(temp,detail=T)
# Here the "statistic" argument is an expression, not a function.
stack <- cbind(stack.loss, stack.x)
bootstrap(stack, l1fit(stack[,-1], stack[,1])$coef, seed=0)
# Again, but if the data is created on the fly, then
# use the name "data" in the statistic expression:
bootstrap(cbind(stack.loss, stack.x),
l1fit(data[,-1], data[,1])$coef, seed=0)
temp <- bootstrap(stack, var) # Here "statistic" is a function.
parallel(~ temp$replicates) # Interesting trellis plot.
# Demonstrate the args.stat argument
# without args.stat:
bootstrap(stack.loss, mean(stack.loss, trim=.2))
# statistic is a function:
bootstrap(stack.loss, mean, args.stat = list(trim=.2))
# statistic is an expression, object "h" defined in args.stat
bootstrap(stack.loss, mean(stack.loss, trim=h),
args.stat = list(h=.2))
# Bootstrap regression coefficients (in 3 equivalent ways).
fit.lm <- lm(Mileage ~ Weight, fuel.frame)
bootstrap(fuel.frame, coef(lm(Mileage ~ Weight, fuel.frame)), B = 250,
seed = 0)
bootstrap(fuel.frame, coef(eval(fit.lm$call)), B = 250, seed = 0)
bootstrap(fit.lm, coef, B = 250, seed = 0)
# Bootstrap a nonlinear least squares analysis
fit.nls <- nls(vel ~ (Vm * conc)/(K + conc), Puromycin,
start = list(Vm = 200, K = 0.1))
temp.nls <- bootstrap(Puromycin, coef(eval(fit.nls$call)))
pairs(temp.nls$rep)
plot(temp.nls$rep[,1], temp.nls$rep[,2])
contour(hist2d(temp.nls$rep[,1], temp.nls$rep[,2]))
image(hist2d(temp.nls$rep[,1], temp.nls$rep[,2]))
# Jackknife after bootstrap
jackknifeAfterBootstrap(temp.nls)
jackknifeAfterBootstrap(temp.nls, stdev)
# Bootstrap the calculation of a covariance matrix
my.x <- runif(2000)
my.dat <- cbind(x=my.x, y=my.x+0.5*rnorm(2000))
bootstrap(my.dat, var)
# Perform a jackknife analysis.
jackknife(stack.loss, mean)
## Two-sample problems
# Bootstrap the distribution of the difference of two group means
# (group sizes vary across bootstrap samples)
West <- (as.character(state.region) == "West")
Income <- state.x77[,"Income"]
bootstrap(data.frame(Income, West),
mean(data[ data[,"West"],"Income"]) -
mean(data[!data[,"West"],"Income"]))
# Stratified bootstrapping for difference of group means
# (resampling is done separately within "West" and "not West", so
# group sizes are constant across bootstrap samples)
bootstrap(Income, mean(Income[West])-mean(Income[!West]), group = West)
# Different sampling mechanisms
# Permutation distribution for the difference in two group means,
# under the hypothesis of one population.
# Note that either the group or response variable is permuted, not
# both.
bootObj <- bootstrap(Income, sampler = samp.permute,
mean(Income[West])-mean(Income[!West]))
1 - mean(bootObj$replicates < bootObj$observed) # one-sided p-value
# Balanced bootstrap
bootstrap(stack.loss, mean, sampler=samp.boot.bal)
# Bootstrapping unadjusted residuals in lm (2 equivalent ways)
fit.lm <- lm(Mileage~Weight, fuel.frame)
resids <- resid(fit.lm)
preds <- predict(fit.lm)
bootstrap(resids, lm(resids+preds~fuel.frame$Weight)$coef, B=250, seed=0)
bootstrap(fit.lm, coef, lmsampler="resid", B=250, seed=0)
# Bootstrapping other fitted models: gam
fit.gam <- gam(Kyphosis ~ s(Age, 4) + Number, family = binomial,
               data = kyphosis)
bootstrap(fit.gam, coef, B=100)
# Bootstrap when patients have varying number of cases:
# sampling by subject
DF <- data.frame(ID=rep(101:103, c(4,5,6)), x=1:15)
DF # Patient 101 has 4 cases, 102 has 5, 103 has 6.
bootstrap(DF, mean(x), subject=ID)
## Bootstrap bagging: a classification tree
# The first column of data set kyphosis is the
# response variable Kyphosis, with values "present" or "absent"
kyph.pred <- predict(tree(kyphosis, minsize = 5))
# The apparent misclassification rate
n <- numRows(kyphosis)
mean(kyph.pred[cbind(1:n, kyphosis$Kyphosis)] < .5) # 0.02469136
# bootstrap to get an averaged tree and predict on the original data
my.kyphosis <- kyphosis
kyph.pred.boot <- bootstrap(kyphosis, predict(tree(kyphosis,
minsize = 5), newdata = my.kyphosis), B = 100, seed = 10)
# The row names for the replicates are made using the row names of the
# original data and the abbreviated response values.
rows <- dimnames(kyphosis)[[1]]
kyph.names <- paste(rows, abbreviate(kyphosis$Kyphosis,5), sep = ".")
# The apparent misclassification rate for the averaged tree is
# higher, but more realistic as a measure of predictive error.
mean(kyph.pred.boot$estimate[kyph.names, "Mean"] < .5) # 0.03703704
## Run in background
For(1, temp <- bootstrap(stack.loss, mean, B=1000), wait=F)