samp.bootstrap(n, B, size = n - reduceSize, reduceSize = 0, prob = NULL)
samp.boot.bal(n, B, size = n - reduceSize, reduceSize = 0, method = "biased")
samp.bootknife(n, B, size = n - reduceSize, reduceSize = 0, njack = 1)
samp.finite(n, B, size = n - reduceSize, reduceSize = 0, N, bootknife = F)
samp.permute(n, B, size = n - reduceSize, reduceSize = 0, prob = NULL, full.partition = "none")
samp.permute.old(n, B)
samp.combinations(n, B, k, both = T)
samp.half(n, B, size = n/2 - reduceSize, reduceSize = 0)
samp.blockBootstrap(n, B, size = n - reduceSize, reduceSize = 0, blockLength)
blockBootstrap(blockLength)

samp.boot.mc is deprecated; it is the same as samp.bootstrap.
samp.MonteCarlo is deprecated; it is the same as samp.bootstrap.
All samplers have arguments n, the number of observations in the original data, and B, the number of resamples to generate; B samples of the given size are generated from the sequence 1:n. (For some samplers size is not required.) The remaining arguments are specific to individual samplers:

k: number of elements chosen in each combination; samp.combinations generates combinations of k elements out of n.

size: number of indices in each sample; the default is size = n - reduceSize.

reduceSize: amount by which to reduce the sample size from n. Setting reduceSize = 1 is useful for avoiding bias, see below.
prob: vector of length n of sampling probabilities; index i is chosen from 1:n with probability prob[i]. The vector is normalized internally to sum to one. It is an error if length(prob) is not equal to n. A value of NULL implies equal probabilities for each index. A sampler that has this argument may be used for importance sampling.
method: one of "biased", "unbiased", or "semi"; see below.
njack: number of observations to omit when forming each jackknife sample; a bootstrap sample is then drawn from that jackknife sample.
full.partition: one of "first", "last", or "none". Return, for each sample, the initial (if "first") or final (if "last") size elements of a full sample of size n. If "none", do not generate full samples. Valid only if size < n; ignored otherwise. See below.
bootknife: if TRUE then a variation of bootknife sampling is used; one observation is omitted from the sample before forming the superpopulation. This is useful for avoiding bias, see below.
both: if TRUE (the default), return a matrix with n rows, in which the first k rows are all combinations of k elements out of n. If FALSE, return only the first k rows.

N: size of the superpopulation used by samp.finite; see below.

blockLength: length of each block of consecutive indices used by samp.blockBootstrap.
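To illustrate what the prob argument does, here is a Python sketch (not part of S+Resample; indices are 0-based) of probability-weighted bootstrap sampling, in which each column is drawn independently with the given probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)

n, B = 6, 1000
prob = np.arange(1, n + 1, dtype=float)
prob = prob / prob.sum()              # normalized to sum to one, as the samplers do

# Weighted bootstrap sampling: an n x B matrix of indices, each drawn
# i.i.d. from 0..n-1 with the given probabilities.
idx = rng.choice(n, size=(n, B), p=prob)

# The relative frequency of each index approaches prob as B grows.
freq = np.bincount(idx.ravel(), minlength=n) / idx.size
```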
The value is a matrix with size or n rows and B columns, in which each column is one resample, containing indices from 1:n for subscripting the original data. As a side effect, .Random.seed is created if it does not already exist; otherwise its value is updated.
These samplers are typically called multiple times by bootstrap and other resampling functions, to generate indices for a block of, say, B = 100 replications at a time (the value of B here corresponds to the block.size argument to bootstrap).
You may write your own sampler.
A sampler must have arguments
n and
B.
If a sampler has a
prob argument then it may be used for
importance sampling.
Additional arguments may be passed in three ways: (1) using the sampler.args argument to bootstrap; (2) by passing an expression such as samp.bootstrap(size = 100) as the sampler argument (arguments set in this way override those set by sampler.args); or (3) using a "constructor" function such as blockBootstrap to create a copy of a sampler function (here samp.blockBootstrap) which has default values for the additional arguments.
If importance sampling is not used, then the
prob argument may
be used as an additional argument, resulting in sampling from a weighted
empirical distribution, but the
observed statistic will not
be consistent with that weighted empirical distribution; instead
consider using importance sampling, then calling reweight.
Some functions that operate on a bootstrap object assume that simple random sampling with equal probabilities and size = n (or approximately n, see below) was used, and may give incorrect results if that is not the case. In other words, they expect samp.bootstrap or the similar samp.boot.bal and samp.bootknife.
Bootstrapping typically gives standard error estimates which are
biased downward; e.g. the ordinary bootstrap standard error
for a mean is
sqrt((n-1)/n) s/sqrt(n)
(plus random error when
B < infinity), where
s = stdev(x)
is the usual sample standard deviation. This is too small by a factor
sqrt((n-1)/n). When stratified sampling is used, the corresponding downward bias depends on stratum sizes, and may be substantial. There are two easy remedies for this: use samp.bootknife, or samp.bootstrap(reduceSize = 1).
The latter sets the sampling size for each stratum to
1 less than
the stratum size.
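The downward bias described above is easy to verify numerically. A Python check (with illustrative data) of the sqrt((n-1)/n) factor for the mean:

```python
import numpy as np

# The ideal (B = infinity) bootstrap standard error of a sample mean uses
# the divisor n, while the usual standard error s/sqrt(n) uses n - 1.
x = np.array([2.0, 4.0, 7.0, 8.0, 11.0])
n = len(x)

usual_se = x.std(ddof=1) / np.sqrt(n)       # s / sqrt(n)
ideal_boot_se = x.std(ddof=0) / np.sqrt(n)  # divisor n instead of n - 1

ratio = ideal_boot_se / usual_se
# ratio equals sqrt((n-1)/n) = sqrt(4/5), about 0.8944
```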
For stratified sampling (the group argument to bootstrap), the sampler is called once for each stratum. If size or reduceSize is used, then you must set group.order.matters = FALSE when calling bootstrap (otherwise size mismatches will occur, as the code attempts to place resampled strata in the same positions as the original data).
For samp.permute: if max(prob) <= 1/size, the indices in each sample are drawn without replacement. Thus, the default values size = n and prob = NULL generate simple permutations of 1:n. Otherwise there are floor(size*prob[i]) or ceiling(size*prob[i]) copies of index i in each sample. The algorithm ensures that the selection probabilities prob apply to the rows of the returned matrix. That is, the relative frequency of index i per row approaches prob[i] as B increases. See the references below for details of the algorithm.
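The actual algorithm is described in the references; the following Python sketch (helper name hypothetical, 0-based indices) shows the stochastic-rounding idea for a single column: each index gets floor(size*prob[i]) or ceiling(size*prob[i]) copies, with expected count size*prob[i], and the copies are then permuted.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_permute_column(n, size, prob):
    """One resample column in which index i appears floor(size*prob[i])
    or ceil(size*prob[i]) times, via systematic (stochastic) rounding."""
    prob = np.asarray(prob, float)
    prob = prob / prob.sum()                 # normalized internally
    targets = size * prob                    # desired copies of each index
    edges = np.concatenate(([0.0], np.cumsum(targets)))
    # `size` unit-spaced points with a common random offset: the number of
    # points in interval i is floor(targets[i]) or ceil(targets[i]).
    points = rng.uniform(0, 1) + np.arange(size)
    counts, _ = np.histogram(points, bins=edges)
    return rng.permutation(np.repeat(np.arange(n), counts))

col = weighted_permute_column(6, 12, prob=np.arange(1, 7))
```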
Calling
samp.permute with
full.partition = "first", size = m and
then (after re-setting the seed)
full.partition = "last", size = n-m
produces complementary index samples which, when
rbind-ed together,
produce an equivalent set of indices with
size = n. For example, if prob is not provided, the rbind-ed results form permutations of 1:n. Note, however, that this will not give the same results for multiple samples as calling samp.permute with size = n, because the algorithm for size equal to n differs from that for size not equal to n.
samp.permute.old is the version of samp.permute from S-PLUS 6.0 and earlier. It is slower and less flexible, and may be removed in future versions of Spotfire S+.
For samp.combinations, B must equal choose(n, k); for samp.permutations, B must equal factorial(n).
samp.bootknife draws samples of the given size with replacement from jackknife samples (obtained by omitting one of the values 1:n). This produces bootstrap estimates of squared standard error which are unbiased for a sample mean, with expected value s^2/n, where s^2 is the sample variance calculated with the usual denominator of (n-1).
In a block of B (block.size) replications, each observation is omitted B/n times (rounded up or down if n does not divide B).
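The scheme just described can be sketched in Python (hypothetical helper, 0-based indices), cycling through the omitted observation so that each is omitted about B/n times:

```python
import numpy as np

rng = np.random.default_rng(0)

def samp_bootknife_py(n, B, size=None):
    """Bootknife sketch: each column omits one of the indices 0..n-1
    (cycling through them), then draws `size` indices with replacement
    from the remaining n-1."""
    if size is None:
        size = n
    cols = []
    for b in range(B):
        omit = b % n                            # each index omitted ~B/n times
        keep = np.delete(np.arange(n), omit)
        cols.append(rng.choice(keep, size=size, replace=True))
    return np.column_stack(cols)                # size x B matrix of indices

idx = samp_bootknife_py(6, 8)
```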
For samp.finite, if N is a multiple of n (or of n-1, if bootknife = TRUE), then a superpopulation is created by repeating each observation N/m times (where m = n or m = n-1), and samples of the given size are drawn without replacement. If N is not a multiple of m, then the superpopulations vary in size between samples, with r copies of each original observation, where r = ceiling(N/m) or trunc(N/m), with probabilities chosen to give approximately the correct bootstrap variance for linear statistics.
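A Python sketch of the simple case (N a multiple of n, bootknife = FALSE; helper name hypothetical, 0-based indices):

```python
import numpy as np

rng = np.random.default_rng(0)

def samp_finite_py(n, B, N, size=None):
    """Finite-population bootstrap sketch: build a superpopulation with
    N/n copies of each index, then draw each sample without replacement."""
    if size is None:
        size = n
    assert N % n == 0, "this sketch covers only N a multiple of n"
    superpop = np.repeat(np.arange(n), N // n)
    cols = [rng.choice(superpop, size=size, replace=False) for _ in range(B)]
    return np.column_stack(cols)                # size x B matrix of indices

idx = samp_finite_py(5, 4, N=20)
```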
For samp.half, size is n/2 by default. size may be a half-integer; if so then alternate samples contain a zero (i.e. a smaller sample). This is a quick alternative to the ordinary bootstrap, with approximately the same standard error.
blockBootstrap is a constructor: given blockLength, it returns a copy of samp.blockBootstrap with that block length as a default argument; see the example at the bottom.
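The details of samp.blockBootstrap's algorithm are not spelled out here; the following Python sketch shows one common form of block bootstrapping (overlapping blocks of consecutive indices, trimmed to n), together with the constructor pattern used by blockBootstrap. All names are hypothetical and indices are 0-based.

```python
import numpy as np

rng = np.random.default_rng(0)

def samp_block_bootstrap_py(n, B, block_length):
    """Block bootstrap sketch: fill each column with randomly chosen
    blocks of `block_length` consecutive indices, preserving short-range
    dependence in the data."""
    n_blocks = -(-n // block_length)            # ceil(n / block_length)
    cols = []
    for _ in range(B):
        starts = rng.integers(0, n - block_length + 1, size=n_blocks)
        col = np.concatenate([np.arange(s, s + block_length) for s in starts])
        cols.append(col[:n])                    # trim to exactly n indices
    return np.column_stack(cols)

def block_bootstrap_py(block_length):
    """Constructor pattern: bind a default block length to the sampler."""
    def sampler(n, B):
        return samp_block_bootstrap_py(n, B, block_length)
    return sampler

idx = block_bootstrap_py(5)(25, 3)
```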
The default "biased" method is balanced -- each observation appears exactly size*B/n times in the result (B times when size = n). In this case size*B must be a multiple of n.
It is biased because rows in its result are not independent. The bias is of order O(1/B) (where B is the block.size used in calling bootstrap), and tends to underestimate bootstrap standard errors and produce confidence intervals which are too narrow. Variances are too small by a factor of about (1 - 1/B).
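The "biased" balanced method can be sketched in Python (hypothetical helper, 0-based indices): permute a pool containing each index equally often, then reshape into a size x B matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def samp_boot_bal_biased_py(n, B, size=None):
    """Balanced ('biased') bootstrap sketch: permute size*B indices
    containing each of 0..n-1 exactly size*B/n times, then reshape.
    Requires size*B to be a multiple of n."""
    if size is None:
        size = n
    total = size * B
    assert total % n == 0, "size*B must be a multiple of n"
    pool = np.repeat(np.arange(n), total // n)
    return rng.permutation(pool).reshape(size, B)

idx = samp_boot_bal_biased_py(6, 8)
# Balanced: every index appears exactly size*B/n = 8 times overall.
counts = np.bincount(idx.ravel(), minlength=6)
```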
For the
"unbiased" method, each row is generated independently.
If
n divides
B then there are exactly
B/n copies of
1:n in each row, and the result is balanced.
Otherwise there are either
floor(B/n) or
ceiling(B/n) copies
in each row, and the result is not exactly balanced.
For the
"semi" method, if
n divides
B then results
are exactly as for the
"unbiased" method.
If n divides size*B, the results are balanced, but there is bias, with variances biased downward by a factor of approximately (1 - (B%%n)/B^2).
Arguments
n and
B should be in that order. The number and order of
other arguments may change; e.g. a
prob argument may be added
to additional samplers to support importance sampling.
Davison, A.C. and Hinkley, D.V. (1997), Bootstrap Methods and Their Application, Cambridge University Press.
Efron, B. and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, New York: Chapman & Hall.
Hesterberg, T.C. (1999), "Smoothed bootstrap and jackboot sampling," Technical Report No. 87, http://www.insightful.com/Hesterberg. (Note: the name "jackboot" has since been changed to "bootknife".)
Hesterberg, T.C. (2004), "Unbiasing the Bootstrap - Bootknife Sampling vs. Smoothing", Proceedings of the Section on Statistics and the Environment, American Statistical Association, pp. 2924-2930.
For an annotated list of functions in the S+Resample package, see the package overview.
samp.bootstrap(6, 8)
samp.bootstrap(6, 8, size=12)
samp.boot.bal(6, 8) # method = "biased"
samp.boot.bal(6, 8, method = "unbiased")
samp.boot.bal(6, 8, method = "semi")
samp.permute(6, 8)
samp.permute(6, 8, prob=(1:6))
samp.permute(6, 8, size=12, prob=(1:6))
samp.combinations(6, choose(6,4), 4)
samp.combinations(6, choose(6,4), 4, both=F)
samp.permutations(4, factorial(4))
samp.bootknife(6, 8)
samp.bootknife(6, 8, size=12)
samp.half(6, 8)
samp.half(5, 8)
# Block bootstrapping
bootstrap(1:25, mean)
bootstrap(1:25, mean, sampler = blockBootstrap(5), seed=0)
# Previous line is equivalent to next two:
bootstrap(1:25, mean, sampler = samp.blockBootstrap,
sampler.args = list(blockLength = 5), seed=0)
# The data are positively correlated, so block versions give
# larger standard errors.
# Compare versions of balanced bootstrapping
set.seed(0)
tabulate(samp.boot.bal(6, 8)) # balanced
tabulate(samp.boot.bal(6, 8, method = "unbiased")) # not balanced
tabulate(samp.boot.bal(6, 8, method = "semi")) # balanced
temp <- bootstrap(1:5, mean, block.size=9, seed=0)
temp$estimate$SE
update(temp, sampler = samp.boot.bal)$estimate$SE # smaller
update(temp, sampler = samp.boot.bal,
sampler.args = list(method = "semi"))$estimate$SE