General Nonparametric Bootstrapping for Fitted Models

DESCRIPTION:

Performs bootstrap resampling of observations from fitted models of class lm and glm for specified statistics, and summarizes the bootstrap distribution.

USAGE:

bootstrap(data, statistic, ..., 
          lmsampler="observations") 
bootstrap.lm(data, statistic, ...,
          lmsampler="observations") 
bootstrap.glm(data, statistic, ...) 

REQUIRED ARGUMENTS:

data
an lm or glm object to be bootstrapped.
statistic
statistic to be bootstrapped; a function or expression that operates on lm or glm objects and returns a vector or matrix. It may be a function (e.g. coef) which accepts data as the first argument; other arguments may be passed using args.stat.
Alternatively, it may be an expression such as predict(fit, newdata=orig.frame). If the data object is given by name (e.g. data=fit), use that name in the expression; otherwise (e.g. data=glm(formula, dataframe)), refer to the object as data in the expression, e.g. predict(data, newdata=orig.frame).

OPTIONAL ARGUMENTS:

...
The remaining arguments have the same usage as in the default version of bootstrap, except:
lmsampler
character string, one of "observations", "residuals", "wild" or "wild-as" (may be abbreviated). When bootstrapping observations, the data from the data argument to the call that generated the lm object are resampled. When bootstrapping residuals, the (unadjusted) residuals and predicted values from the fit to the original data are computed; the residuals are then resampled, and the statistic is evaluated on the fit with the response variable replaced by the original predicted values plus the resampled residuals. The wild bootstraps are variations on resampling residuals. For the simple wild bootstrap, each residual is either added to or subtracted from the predicted value for that observation. For the asymmetric wild bootstrap, the residual times Q is added to the prediction, where Q is a discrete random variable with mean 0, variance 1, and E(Q^3) = 1.
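
The three residual-based samplers described above can be sketched outside S-PLUS. The following is a minimal illustration in Python (standard library only), not the code used by bootstrap.lm. The helper names fit_line, mammen_q and resample_response are hypothetical, and the data are made up. Two assumptions are worth flagging: the simple wild bootstrap is taken to choose the sign with equal probability, and Q is drawn from Mammen's two-point distribution, which satisfies the stated moment conditions but is not confirmed by this page to be the distribution actually used.

```python
import math
import random

def fit_line(x, y):
    """Least-squares (intercept, slope) for a simple regression of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Assumption: Mammen's two-point distribution for Q
# (mean 0, variance 1, third moment 1).
S5 = math.sqrt(5.0)
Q_VALUES = ((1 - S5) / 2, (1 + S5) / 2)
Q_PROBS = ((S5 + 1) / (2 * S5), (S5 - 1) / (2 * S5))

def mammen_q(rng):
    """Draw one value of Q."""
    return Q_VALUES[0] if rng.random() < Q_PROBS[0] else Q_VALUES[1]

def resample_response(preds, resids, sampler, rng):
    """Build one bootstrap response vector from fitted values and residuals."""
    n = len(resids)
    if sampler == "residuals":
        # residuals drawn with replacement, added to the original fitted values
        return [p + resids[rng.randrange(n)] for p in preds]
    if sampler == "wild":
        # each observation keeps its own residual, with a random sign
        # (assumption: both signs equally likely)
        return [p + rng.choice((-1.0, 1.0)) * e for p, e in zip(preds, resids)]
    if sampler == "wild-as":
        # each observation keeps its own residual, multiplied by Q
        return [p + mammen_q(rng) * e for p, e in zip(preds, resids)]
    raise ValueError("unknown sampler: " + sampler)

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.3, 2.8, 4.4, 4.9, 6.2]
intercept, slope = fit_line(x, y)
preds = [intercept + slope * xi for xi in x]
resids = [yi - pi for yi, pi in zip(y, preds)]

rng = random.Random(0)
for sampler in ("residuals", "wild", "wild-as"):
    # refit on each bootstrap response; the predictor values stay fixed
    slopes = [fit_line(x, resample_response(preds, resids, sampler, rng))[1]
              for _ in range(200)]
    print(sampler, "mean bootstrap slope:", sum(slopes) / len(slopes))
```

In all three cases the predictor values are held fixed and only the response changes, which is why the order of observations matters for these samplers.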

VALUE:

an object of class bootstrap which inherits from resamp. See help for .

When resampling residuals, the result has a component order.matters set to "resampling residuals"; this disables functions that are only appropriate for the ordinary bootstrap.

DETAILS:

These functions are designed to speed up bootstrap computations when the statistic of interest requires fitting a model. Typically one has, for example,

#
bootstrap(data=data.frame, statistic(lm(formula, data),...),...)
#

In this case lm is called once per iteration, and a new object of class lm is created each time. Faster (but equivalent) results are obtained by using

#
bootstrap(lm(formula, data.frame), statistic(...), ...)
#

which dispatches to bootstrap.lm. The savings come from the reduction of the overhead required to create fitted models. The methods described here do this work just once, save the result in an object of class model.list, and then resample the model.list.

Thus the following are equivalent:
#
# Slow
bootstrap(data=data.frame, stat(lm(formula, data),...),...)
#
# Fast
fit <- lm(formula, data.frame) # returns lm object
bootstrap(fit, stat(fit,...),...) # uses bootstrap.lm
#
# Fast
modlst <- lm(data,...,method="model.list") # returns model.list object
bootstrap(modlst, stat(lm(modlst),...),...) # uses bootstrap.default
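
The idea of doing the model setup once and then resampling the stored pieces can be sketched as follows. This is a Python illustration under stated assumptions, not the bootstrap.lm implementation: build_model_rows is a hypothetical stand-in for the formula processing that produces a model.list, the statistic is the slope of a simple regression, and the records are made up.

```python
import random

def build_model_rows(records, response, predictor):
    """Hypothetical stand-in for formula processing: extract (x, y) rows once."""
    return [(rec[predictor], rec[response]) for rec in records]

def slope(rows):
    """Least-squares slope computed directly from stored (x, y) rows."""
    n = len(rows)
    xbar = sum(x for x, _ in rows) / n
    ybar = sum(y for _, y in rows) / n
    sxx = sum((x - xbar) ** 2 for x, _ in rows)
    sxy = sum((x - xbar) * (y - ybar) for x, y in rows)
    return sxy / sxx

def bootstrap_observations(rows, statistic, B, seed):
    """Resample stored rows with replacement; no setup work is repeated."""
    rng = random.Random(seed)
    n = len(rows)
    replicates = []
    for _ in range(B):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(sample))
    return replicates

# Made-up records, for illustration only.
records = [{"x": float(i), "y": 1.0 + 2.0 * i + 0.3 * ((-1) ** i)}
           for i in range(10)]

rows = build_model_rows(records, response="y", predictor="x")  # done once
replicates = bootstrap_observations(rows, slope, B=100, seed=0)
print("observed slope:", slope(rows))
print("mean of bootstrap slopes:", sum(replicates) / len(replicates))
```

The slow form corresponds to calling build_model_rows inside the loop on a resampled copy of records; the result is the same, but the setup cost is paid B times instead of once.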

BUGS:

See .

EXAMPLES:

 
# bootstrap and lm 
bootstrap(fuel.frame, coef(lm(Fuel~Weight+Disp.)), seed=10) 
 
# the same thing but faster, using bootstrap.lm 
fit <- lm(Fuel~Weight+Disp., data=fuel.frame) 
bootstrap(fit, coef) 
 
# Bootstrapping unadjusted residuals in lm (2 equivalent ways) 
fit.lm <- lm(Mileage~Weight, fuel.frame) 
resids <- resid(fit.lm) 
preds  <- predict(fit.lm) 
bootstrap(resids, lm(resids+preds~fuel.frame$Weight)$coef, B=500, seed=0) 
bootstrap(fit.lm, coef, lmsampler="resid", B=500, seed=0) 
 
# Other statistics 
bootstrap(fit, coef(fit)[1]-coef(fit)[2]) 
bootstrap(fit, predict, args.stat=list(newdata=fuel.frame)) 
bootstrap(fit, function(x) predict(x,newdata=fuel.frame)) 
 
# bootstrap and glm 
mform <- Kyphosis ~ Age + (Number > 5)*Start 
fit   <- glm(mform, family = binomial, data = kyphosis, 
             control=glm.control(maxit=20)) 
bootstrap(fit, coef, B=50, seed=8)