Smoothed Bootstrapping

DESCRIPTION:

Performs smoothed bootstrap resampling of observations from specified data, for specified statistics, and summarizes the bootstrap distribution.

USAGE:

smoothedBootstrap(data, statistic, B = 1000,  
           args.stat, sampler = samp.bootstrap,  
           seed = .Random.seed, smoother = rmvnorm, 
           args.smoother = list(cov = as.matrix(var(data))/numRows(data),  
                                d = numCols(data)), 
           label, statisticNames, 
           block.size = min(100, B), trace = resampleOptions()$trace, 
           assign.frame1 = F, save.samples = F, 
           statistic.is.random, seed.statistic = 500) 

REQUIRED ARGUMENTS:

data
data to be bootstrapped. May be a vector, matrix, or data frame.
statistic
statistic to be bootstrapped; a function or expression that returns a vector or matrix. It may be a function which accepts data as the first argument; other arguments may be passed using args.stat.
Or it may be an expression such as mean(x,trim=.2). If data is given by name (e.g. data=x) then use that name in the expression, otherwise (e.g. data=air[,4]) use the name data in the expression. If data is a data frame, the expression may involve variables in the data frame.

OPTIONAL ARGUMENTS:

B
number of bootstrap resamples to be drawn. This may be a vector, whose sum is the total number of resamples.
args.stat
list of other arguments, if any, passed to statistic when calculating the statistic on the resamples.
sampler
function which generates resampling indices. The default, samp.bootstrap, generates simple bootstrap resamples. See for other existing samplers, and for details on writing your own sampler.
seed
seed for generating resamples. May be a legal random number seed or an integer between 0 and 1023 which is passed to set.seed.
smoother
function which generates random variates from a (continuous, symmetric, mean zero) multivariate distribution; these variates are added to the resampled original data values (chosen using sampler), in effect sampling from a smoothed empirical distribution.
args.smoother
list of other arguments, if any, passed to smoother when performing the smoothing of the resamples.
label
character string; if supplied, it is used when printing and as the main title for plots.
statisticNames
character vector of length equal to the number of statistics calculated; if supplied, it provides the statistic names used for printing and plotting.
block.size
control variable specifying the number of resamples to calculate at once. smoothedBootstrap calls , which uses nested for() loops; generally, the inner loop runs block.size times.
trace
logical flag indicating whether the algorithm should print a message indicating which set of replicates is currently being drawn. The default is determined by resampleOptions()$trace.
assign.frame1
logical flag indicating whether the resampled data should be assigned to frame 1 before evaluating the statistic. Try assign.frame1=T if all estimates are identical (this is slower).
save.samples
logical flag indicating whether to save the resampled data. Note that this may require a large quantity of memory.
statistic.is.random
logical flag indicating whether the statistic itself performs randomization, in which case we need to keep track of two parallel seeds, one for the sampling and one for the statistic. If this argument is missing, the algorithm attempts to determine if the statistic involves randomization by evaluating it and checking whether the random seed has changed.
seed.statistic
random number seed to be used for the statistic if it uses randomization.
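
The automatic check described under statistic.is.random can be illustrated outside S-PLUS. The following is a minimal Python sketch, not the package's implementation: the names statistic_is_random, plain_mean, and jittered_mean are hypothetical, and restoring the generator state after the check is a choice made for this sketch only.

```python
import random

def statistic_is_random(statistic, data):
    """Hypothetical check: does evaluating `statistic` advance the global RNG?"""
    before = random.getstate()
    statistic(data)
    changed = random.getstate() != before
    # Restoring the state is a choice of this sketch, so the check has no side effect.
    random.setstate(before)
    return changed

def plain_mean(x):
    # Deterministic statistic: never touches the random number generator.
    return sum(x) / len(x)

def jittered_mean(x):
    # Randomized statistic: draws one Gaussian variate on every call.
    return sum(x) / len(x) + random.gauss(0.0, 1.0)

print(statistic_is_random(plain_mean, [1.0, 2.0, 3.0]))     # prints False
print(statistic_is_random(jittered_mean, [1.0, 2.0, 3.0]))  # prints True
```

A statistic flagged this way would need its own seed stream, which is the role seed.statistic plays in the argument list above.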

VALUE:

an object of class smoothedBootstrap which inherits from resamp. This has components call, observed, replicates, estimate, B, n, dim.obs, seed.start, seed.end, parent.frame, defaultLabel, and possibly label. The data frame estimate has three columns containing the bootstrap estimates of Bias, Mean, and SE. See for a description of many components.

SIDE EFFECTS:

If assign.frame1=T, you must be sure that this assignment does not overwrite some quantity of interest stored in frame 1.

If the function is interrupted, it saves the current results (all complete sets of block.size replicates) to .smoothedBootstrap.partial.results (and also overwrites .parametricBootstrap.partial.results). This object is nearly the same as if smoothedBootstrap had been called with a smaller value of B.
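
The block-wise behavior on interruption can be sketched in Python as follows. This is an illustration under stated assumptions, not the package's code: blocked_bootstrap, Interrupted, and flaky_mean are hypothetical names, the data values are made up, and the smoothing step is omitted so that only the blocking logic is shown.

```python
import random
import statistics

class Interrupted(Exception):
    """Hypothetical stand-in for a user interrupt."""

def blocked_bootstrap(data, statistic, B=1000, block_size=100, seed=0, partial=None):
    """Draw replicates in blocks; only complete blocks are committed to `partial`."""
    rng = random.Random(seed)
    n = len(data)
    done = [] if partial is None else partial
    remaining = B
    while remaining > 0:                      # outer loop: one pass per block
        size = min(block_size, remaining)
        block = []
        for _ in range(size):                 # inner loop: up to block_size resamples
            resample = [data[rng.randrange(n)] for _ in range(n)]
            block.append(statistic(resample))
        done.extend(block)                    # commit the block only once it is complete
        remaining -= size
    return done

data = [4.1, 5.3, 2.2, 7.8, 6.0, 3.9, 5.5, 4.7]  # illustrative values
calls = {"count": 0}

def flaky_mean(x):
    # Simulates an interrupt arriving during the 251st evaluation.
    calls["count"] += 1
    if calls["count"] > 250:
        raise Interrupted
    return statistics.mean(x)

saved = []
try:
    blocked_bootstrap(data, flaky_mean, B=1000, block_size=100, partial=saved)
except Interrupted:
    pass
print(len(saved))  # 200
```

The third block is discarded because it was incomplete, so the saved object looks like the result of a call with B = 200.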

DETAILS:

Performs smoothed bootstrapping for a wide range of statistics and expressions. Observations are first selected, with replacement, from the original data. This is done using the sampler function, in the same way as is done in bootstrap. Observations are then smoothed by adding a random variable generated by the smoother. This corresponds to convolving the empirical distribution function with a kernel corresponding to smoother. This is implemented by creating a composite sampling function which is passed to .
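
The two steps above (resample, then add noise) can be sketched for univariate data in Python. This is a minimal illustration, not the package's implementation: the function name smoothed_bootstrap and the data values are hypothetical, Gaussian noise is assumed, and the noise variance var(data)/n mirrors the default cov shown in args.smoother.

```python
import random
import statistics

def smoothed_bootstrap(data, statistic, B=1000, seed=0):
    """Hypothetical univariate sketch: resample with replacement, then add noise."""
    rng = random.Random(seed)
    n = len(data)
    # Noise scale assumed from the documented default cov = var(data)/n.
    noise_sd = (statistics.variance(data) / n) ** 0.5
    replicates = []
    for _ in range(B):
        # Step 1: ordinary bootstrap resample (the role of `sampler`).
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # Step 2: add mean-zero, symmetric noise (the role of `smoother`).
        smoothed = [x + rng.gauss(0.0, noise_sd) for x in resample]
        replicates.append(statistic(smoothed))
    return replicates

data = [4.1, 5.3, 2.2, 7.8, 6.0, 3.9, 5.5, 4.7, 6.4]  # illustrative values
reps = smoothed_bootstrap(data, statistics.median, B=200, seed=1)
print(len(reps), len(set(reps)))
```

Because continuous noise is added to every resampled value, the replicate medians are spread over a continuum rather than being confined to the handful of values that an unsmoothed bootstrap median can take.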

This version uses a lot of memory and time. It may at some point be replaced by a version which calls bootstrap, with a front end that modifies the statistic to add the right amount of noise to the data before calculating the real statistic. However, many of the functions that currently accept "bootstrap" objects as input should not be used in that case, because they require that the statistic be (nearly) deterministic.

Data sets (arrays) of dimension higher than 2 may not be passed to smoothedBootstrap.

REFERENCES:

Davison, A.C. and Hinkley, D.V. (1997), Bootstrap Methods and Their Application, Cambridge University Press.

Hesterberg, T.C. (2004), "Unbiasing the Bootstrap - Bootknife Sampling vs. Smoothing", Proceedings of the Section on Statistics and the Environment, American Statistical Association, pp. 2924-2930.

SEE ALSO:

, , .

For more details on some arguments, see .

Print, summarize, plot: , , , , ,

Description of a "smoothedBootstrap" object, extract parts: , .

Confidence intervals: , .

Modify a "smoothedBootstrap" object: , .

For an annotated list of functions in the package, including other high-level resampling functions, see: .

EXAMPLES:

# compare the cdf of the smoothedBootstrap replicates to that of 
# the bootstrap replicates -- smoothedBootstrap's is smooth 
# by comparison 
boot.stack <- bootstrap(stack.loss, median) 
sboot.stack <- smoothedBootstrap(stack.loss, median) 
cdf.compare(boot.stack$replicates, sboot.stack$replicates)