bootstrap(data, statistic, B=1000, args.stat=NULL, group=NULL, sampler=samp.boot.mc, seed=.Random.seed, sampler.setup, sampler.wrapup, block.size=min(100,B), trace=T, assign.frame1=F, save.indices=F, statistic.is.random, seed.statistic=500)
The statistic may be a function, in which case other arguments may be passed to the function through args.stat. Or it may be an expression such as mean(x, trim=.2). If the data object has a name (e.g. data=x), use that name in the expression; otherwise (e.g. data=df$y, or data constructed within the call to bootstrap), use the name data in the expression, e.g. mean(data, trim=.2). See examples below.
Arguments in args.stat are passed to statistic when calculating the statistic on the resamples.
The samp.boot.mc function generates simple Monte Carlo resamples. The samp.boot.bal function performs balanced bootstrapping. The user may write additional functions.
set.seed
.
bootstrap uses an lapply() within a for() loop (within two nested for() loops if B is a vector). For small sample sizes, a single lapply() is reasonable, while for large sample sizes, a series of separate lapply()s is more efficient.
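The blocked evaluation described above can be sketched as follows. This is an illustration in Python, not the S-PLUS implementation: the function name bootstrap_in_blocks, its arguments, and the toy data are hypothetical, and it assumes that each block holds block_size resamples (the last block may be smaller).

```python
import random
import statistics


def bootstrap_in_blocks(data, statistic, B=1000, block_size=100, seed=0):
    """Compute B bootstrap replicates, at most block_size at a time."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    # Outer loop over blocks (the role played by the for() loop).
    for start in range(0, B, block_size):
        size = min(block_size, B - start)
        # Draw the resampling indices for this block, with replacement.
        indices = [[rng.randrange(n) for _ in range(n)] for _ in range(size)]
        # Apply the statistic to every resample in the block
        # (the role played by a single lapply() call).
        replicates.extend(statistic([data[i] for i in idx]) for idx in indices)
    return replicates


toy_data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # made-up values for illustration
reps = bootstrap_in_blocks(toy_data, statistics.fmean, B=250, block_size=100)
```

A smaller block size keeps less resampled data in memory at once; a larger one does more work per call. Which is faster depends on the sample size, as the text notes.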
If all bootstrap estimates are identical, try setting assign.frame1=T. Note that this will slow down the algorithm.
An object of class bootstrap, which inherits from resamp. This has components call, observed, replicates, estimate, B, n, dim.obs, group, seed.start, and seed.end. The data frame estimate has three columns containing the bootstrap estimates of Bias, Mean, and SE.
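As a rough illustration of what the three columns of estimate contain, the Python sketch below uses the usual bootstrap definitions: Mean is the average of the replicates, Bias is Mean minus the observed statistic, and SE is the standard deviation of the replicates. The helper name summarize and the input values are hypothetical, and the n-1 divisor in the standard deviation is an assumption about the S-PLUS code.

```python
import statistics


def summarize(observed, replicates):
    """Bootstrap estimates of bias, mean, and standard error."""
    mean = statistics.fmean(replicates)
    return {
        "Bias": mean - observed,               # average replicate minus observed
        "Mean": mean,                          # average of the replicates
        "SE": statistics.stdev(replicates),    # sample sd (n-1 divisor, assumed)
    }


# Made-up observed value and replicates, for illustration only.
est = summarize(5.0, [4.0, 5.0, 6.0, 7.0])
```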
If assign.frame1=T, the user must be sure that this assignment does not overwrite some quantity of interest stored in frame 1.
If the function is interrupted, it will save current results (all complete sets of block.size replicates) to .bootstrap.partial.results. This object is nearly the same as if bootstrap were called with a smaller value of B, so many functions that expect an object of class bootstrap will operate correctly. An exception is update; see the help file for update.bootstrap for a work-around.
The function bootstrap causes creation of the dataset .Random.seed if it does not already exist; otherwise its value is updated.
Performs nonparametric bootstrapping of observations for a wide scope of statistics and expressions. Multisample bootstrapping is supported through the group argument.
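Multisample (stratified) resampling can be sketched as drawing with replacement within each group separately, so that every resample keeps the original group sizes. The Python below is an illustration of that idea only; the function name stratified_indices and the toy group labels are hypothetical, and the actual handling of group inside bootstrap may differ in detail.

```python
import random


def stratified_indices(group, rng):
    """Draw one resample of row indices, sampling within each group."""
    by_group = {}
    for i, g in enumerate(group):
        by_group.setdefault(g, []).append(i)
    resample = []
    for members in by_group.values():
        # Sample len(members) indices with replacement from this group only.
        resample.extend(rng.choice(members) for _ in members)
    return resample


rng = random.Random(0)
group = ["a", "a", "a", "b", "b"]  # made-up group labels
idx = stratified_indices(group, rng)
```

Without stratification, the number of observations drawn from each group varies from resample to resample, as noted in the two-sample examples.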
With balanced bootstrapping (sampler=samp.boot.bal), balancing is done separately within each group of resamples. This procedure is biased, of order O(1/block.size). It is useful for estimating the bias of a statistic, but should be avoided for estimating standard errors or confidence limits.
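The usual construction of a balanced set of resamples is to repeat every index the same number of times, shuffle the pooled indices, and cut the result into resamples. The Python sketch below applies this to a single block; the function name balanced_block is hypothetical, and whether samp.boot.bal is implemented this way is an assumption.

```python
import random


def balanced_block(n, block_size, rng):
    """Return block_size resamples of size n that are balanced as a set."""
    # Every index 0..n-1 appears exactly block_size times in the pool.
    pool = list(range(n)) * block_size
    rng.shuffle(pool)
    # Cut the shuffled pool into block_size resamples of n indices each.
    return [pool[k * n:(k + 1) * n] for k in range(block_size)]


rng = random.Random(0)
block = balanced_block(n=4, block_size=3, rng=rng)
# Count how often each observation is used across the whole block.
counts = [sum(idx.count(i) for idx in block) for i in range(4)]
```

Because the balancing constraint holds within each block rather than across all B resamples, the resamples in a block are not independent of one another, which is why a smaller block.size has a larger effect.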
Davison, A.C. and Hinkley, D.V. (1997). Bootstrap Methods and Their Application. Cambridge: Cambridge University Press.
Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.
Shao, J. and Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag.
# Bootstrap a mean; demonstrate summary(), plot(), qqnorm()
bootstrap(stack.loss, mean)
temp <- bootstrap(stack.loss, mean)
temp
summary(temp)
plot(temp)
qqnorm(temp)

# Confidence intervals
limits.emp(temp)
limits.bca(temp)
limits.bca(temp, detail=T)

# Here statistic argument is a call, not a function.
stack <- cbind(stack.loss, stack.x)
bootstrap(stack, l1fit(stack[,-1], stack[,1])$coef, seed=0)

# Again, but construct the data in the call
bootstrap(cbind(stack.loss, stack.x),
          l1fit(data[,-1], data[,1])$coef, seed=0)

temp <- bootstrap(stack, var)  # Here statistic argument is a function.
parallel(~ temp$rep)           # Interesting trellis plot.

# Bootstrap regression coefficients (in 2 different ways).
fit.lm <- lm(Mileage~Weight, fuel.frame)
bootstrap(fuel.frame, coef(lm(Mileage~Weight, fuel.frame)))
bootstrap(fuel.frame, coef(eval(fit.lm$call)))

# Bootstrap a nonlinear least squares analysis
fit.nls <- nls(vel ~ (Vm * conc)/(K + conc), Puromycin,
               start = list(Vm = 200, K = 0.1))
temp.nls <- bootstrap(Puromycin, coef(eval(fit.nls$call)), B=1000)
pairs(temp.nls$rep)
plot(temp.nls$rep[,1], temp.nls$rep[,2])
contour(hist2d(temp.nls$rep[,1], temp.nls$rep[,2]))
image(hist2d(temp.nls$rep[,1], temp.nls$rep[,2]))

# Jackknife after bootstrap
jack.after.bootstrap(temp.nls)
jack.after.bootstrap(temp.nls, stdev)

# Bootstrap the calculation of a covariance matrix
my.x <- runif(2000)
my.dat <- cbind(x=my.x, y=my.x+0.5*rnorm(2000))
bootstrap(my.dat, var, B=1000)

# Perform a jackknife analysis.
jackknife(stack.loss, mean)

## Two-sample problems
# Bootstrap the distribution of the difference of two group means
# (group sizes will vary across bootstrap samples)
West <- (as.character(state.region) == "West")
Income <- state.x77[,"Income"]
# cbind() stores West as 0/1, so compare explicitly to get a logical index.
bootstrap(cbind(Income, West),
          mean(data[data[,"West"] == 1, "Income"]) -
          mean(data[data[,"West"] == 0, "Income"]))

# Stratified bootstrapping for difference of group means
bootstrap(Income, mean(Income[West])-mean(Income[!West]), group = West)

## Different sampling mechanisms
# Permutation distribution for the difference in two group means,
# under the hypothesis of one population.
# Note that either the group or response variable is permuted, not both.
bootObj <- bootstrap(Income, sampler = samp.permute,
                     mean(Income[West])-mean(Income[!West]))
1 - mean(bootObj$replicates < bootObj$observed)  # one-sided p-value

# Balanced bootstrap
bootstrap(stack.loss, mean, sampler=samp.boot.bal)

# Bootstrapping unadjusted residuals in lm.
fit.lm <- lm(Mileage~Weight, fuel.frame)
resids <- resid(fit.lm)
preds <- predict(fit.lm)
bootstrap(resids, lm(resids+preds~fuel.frame$Weight)$coef)

# Bootstrap when patients have varying number of cases.
DF <- data.frame(ID=rep(101:103, c(4,5,6)), x=1:15)
DF  # Patient 101 has 4 cases, 102 has 5, 103 has 6.
index.list <- split(1:nrow(DF), DF$ID)
# The "data" argument to bootstrap is index.list; each element
# of this list corresponds to one patient.
#
# If statistic is a function, it must take (a resampled version of)
# index.list as its first argument, then extract the corresponding
# rows of the "real" data DF:
stat <- function(index.list) myRealFunction(DF[unlist(index.list),])
bootstrap(index.list, stat)
#
# If statistic is an expression, it should use index.list
bootstrap(index.list, myRealFunction(DF[unlist(index.list),]))

## Run in background
For(1, temp <- bootstrap(stack.loss, mean, B=1000), wait=F)