samp.bootstrap(n, B, size = n - reduceSize, reduceSize = 0, prob = NULL)
samp.boot.bal(n, B, size = n - reduceSize, reduceSize = 0, method = "biased")
samp.bootknife(n, B, size = n - reduceSize, reduceSize = 0, njack = 1)
samp.finite(n, B, size = n - reduceSize, reduceSize = 0, N, bootknife = F)
samp.permute(n, B, size = n - reduceSize, reduceSize = 0, prob = NULL, full.partition = "none")
samp.permute.old(n, B)
samp.combinations(n, B, k, both = T)
samp.half(n, B, size = n/2 - reduceSize, reduceSize = 0)
samp.blockBootstrap(n, B, size = n - reduceSize, reduceSize = 0, blockLength)
blockBootstrap(blockLength)

samp.boot.mc is deprecated; it is the same as samp.bootstrap.
samp.MonteCarlo is deprecated; it is the same as samp.bootstrap.
ARGUMENTS:

n: sample size; samples of size `size` are generated from the sequence 1:n.

B: number of samples (columns of the result) to generate.

size: size of each sample. (size is not required by every sampler.)

The remaining arguments are specific to individual samplers.

k: number of elements chosen; combinations are of k elements out of n.

N: superpopulation size (samp.finite); see below.

reduceSize: reduces the sample size, size = n - reduceSize. Setting reduceSize = 1 is useful for avoiding bias, see below.

prob: probabilities; index i is chosen from 1:n with probability prob[i]. The vector is normalized internally to sum to one. It is an error if length(prob) is not equal to n. A value of NULL implies equal probabilities for each index. A sampler that has this argument may be used for importance sampling.

method: one of "biased", "unbiased", and "semi"; see below.

njack: form a jackknife sample with njack observations omitted, then draw a bootstrap sample from that.

full.partition: one of "first", "last", or "none". Return, for each sample, the initial (if "first") or final (if "last") size elements of a full sample of size n. If "none", do not generate full samples. Valid only if size < n; ignored otherwise. See below.

bootknife: if TRUE, then a variation of bootknife sampling is used; one observation is omitted from the sample before forming the superpopulation. This is useful for avoiding bias, see below.

both: if TRUE (the default), then return a matrix with n rows, in which the first k rows are all combinations of k elements out of n. If FALSE, then return only the first k rows.

blockLength: block length for samp.blockBootstrap.
VALUE:
matrix with `size` or `n` rows and B columns, in which each column is one resample, containing indices from 1:n for subscripting the original data.

SIDE EFFECTS:
creates .Random.seed if it does not already exist; otherwise its value is updated.
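As an illustration of the returned value, here is a Python sketch (not the S+Resample code): each of the B columns is one resample of `size` indices drawn with replacement from 1:n, as samp.bootstrap produces.

```python
import random

def samp_bootstrap(n, B, size=None, seed=None):
    # Sketch of ordinary bootstrap index generation: each of the B
    # columns holds `size` indices drawn from 1..n with replacement.
    if size is None:
        size = n
    rng = random.Random(seed)
    return [[rng.randint(1, n) for _ in range(size)] for _ in range(B)]

cols = samp_bootstrap(6, 8, seed=0)
```

Subscripting the original data by one column yields one bootstrap sample.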
DETAILS:
These samplers are typically called multiple times by bootstrap, to generate indices for a block of say B=100 replications at a time (the value of B here corresponds to the block.size argument to bootstrap).

You may write your own sampler. A sampler must have arguments n and B. If a sampler has a prob argument then it may be used for importance sampling. Additional arguments may be passed in three ways: (1) using the sampler.args argument to bootstrap; (2) by passing an expression such as samp.bootstrap(size = 100) (arguments set in this way override those set by sampler.args); or (3) using a "constructor" function such as blockBootstrap to create a copy of a sampler function (samp.blockBootstrap) which has default values for additional arguments.
If importance sampling is not used, the prob argument may still be used as an additional argument, resulting in sampling from a weighted empirical distribution; but the observed statistic will not be consistent with that weighted empirical distribution, so consider using importance sampling instead. Some functions that operate on a "bootstrap" object assume that simple random sampling with equal probabilities and size=n (or approximately n, see below) was used, and may give incorrect results if that is not the case. In other words, they expect samp.bootstrap or the similar samp.boot.bal and samp.bootknife.
Bootstrapping typically gives standard error estimates which are biased downward; e.g. the ordinary bootstrap standard error for a mean is sqrt((n-1)/n) * s/sqrt(n) (plus random error when B < infinity), where s = stdev(x) is the usual sample standard deviation. This is too small by a factor of sqrt((n-1)/n). When stratified sampling is used, the corresponding downward bias depends on stratum sizes, and may be substantial. There are two easy remedies for this: use samp.bootknife, or samp.bootstrap(reduceSize = 1). The latter sets the sampling size for each stratum to 1 less than the stratum size.
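The sqrt((n-1)/n) factor can be checked directly: the ideal (B = infinity) bootstrap variance of a mean is the plug-in variance (denominator n) divided by n. A small Python check (an illustration, not the S-PLUS code):

```python
# Ideal bootstrap variance of the mean uses the plug-in variance
# (denominator n), so it is (n-1)/n times the usual s^2/n.
def bootstrap_var_of_mean(x):
    n = len(x)
    m = sum(x) / n
    plug_in_var = sum((xi - m) ** 2 for xi in x) / n   # denominator n
    return plug_in_var / n

def usual_var_of_mean(x):
    n = len(x)
    m = sum(x) / n
    s2 = sum((xi - m) ** 2 for xi in x) / (n - 1)      # denominator n-1
    return s2 / n

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]           # n = 8
ratio = bootstrap_var_of_mean(x) / usual_var_of_mean(x)
# ratio is exactly (n-1)/n = 7/8, the squared downward-bias factor
```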
For stratified sampling (the group argument to bootstrap), the sampler is called once for each stratum. If size or reduceSize is used, then you must set group.order.matters = FALSE when calling bootstrap (otherwise size mismatches will occur, as the code attempts to place resampled strata in the same positions as the original data).
If max(prob) <= 1/size, the indices in each sample are drawn without replacement. Thus the default values size=n, prob=NULL generate simple permutations of 1:n. Otherwise there are floor(size*prob[i]) or ceiling(size*prob[i]) copies of index i in each sample. The algorithm ensures that the selection probabilities prob apply to the rows of the returned matrix; that is, the relative frequency of index i per row approaches prob[i] as B increases. See the references for details of the algorithm.
Calling samp.permute with full.partition = "first", size = m and then (after re-setting the seed) full.partition = "last", size = n-m produces complementary index samples which, when rbind-ed together, produce an equivalent set of indices with size = n. For example, if prob is not provided, the rbind-ed results form permutations of 1:n. Note, however, that this will not give the same results for multiple samples as calling samp.permute with size=n, because the algorithm for size equal to n differs from that for size not equal to n.
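The complementary behaviour can be sketched in Python (an illustration only; the actual samp.permute algorithm differs): slice one full permutation per column, reusing the seed so that the "first" and "last" slices come from the same permutations.

```python
import random

def samp_permute(n, B, size, full_partition, seed):
    # Sketch: each column is a slice of one full permutation of 1..n.
    # Reusing the same seed makes "first" and "last" slices complementary.
    rng = random.Random(seed)
    cols = []
    for _ in range(B):
        perm = rng.sample(range(1, n + 1), n)          # one full permutation
        cols.append(perm[:size] if full_partition == "first" else perm[-size:])
    return cols

n, m = 6, 2
first = samp_permute(n, 4, m, "first", seed=0)
last = samp_permute(n, 4, n - m, "last", seed=0)       # same seed, re-set
# Joining the two results column-wise reconstitutes full permutations of 1..n
combined = [a + b for a, b in zip(first, last)]
```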
samp.permute.old is the version of samp.permute from S-PLUS 6.0 and earlier. It is slower and less flexible, and may be removed in future versions of Spotfire S+.

samp.combinations generates combinations of k elements out of n; use B==choose(n,k). samp.permutations generates permutations; use B==factorial(n).
samp.bootknife draws each bootstrap sample of size `size` with replacement from a jackknife sample (obtained by omitting one of the values 1:n). This produces bootstrap estimates of squared standard error which are unbiased for a sample mean, with expected value s^2/n, where s^2 is the sample variance calculated with the usual denominator of (n-1). In a block of B (block.size) samples, each observation is omitted B/n times (rounded up or down if n does not divide B).
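A minimal Python sketch of the idea (the actual S-PLUS implementation may order the omissions differently): cycle the omitted observation through 1:n, so each is omitted about B/n times, then resample with replacement from the remainder.

```python
import random

def samp_bootknife(n, B, size=None, seed=None):
    # Sketch of bootknife sampling: for each column, omit one observation
    # (cycling through 1..n over the block of B columns), then draw
    # `size` indices with replacement from the remaining n-1 values.
    if size is None:
        size = n
    rng = random.Random(seed)
    cols = []
    for b in range(B):
        omit = (b % n) + 1                   # each index omitted about B/n times
        keep = [i for i in range(1, n + 1) if i != omit]
        cols.append([rng.choice(keep) for _ in range(size)])
    return cols

cols = samp_bootknife(6, 8, seed=1)
```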
For samp.finite, if N is a multiple of n (or of n-1, if bootknife=TRUE), then a superpopulation is created by repeating each observation N/m times (where m=n or m=n-1), and samples of size `size` are drawn without replacement. If N is not a multiple of m, then superpopulations vary in size between samples, with r copies of each original observation, where r=ceiling(N/m) or trunc(N/m), with probabilities chosen to give approximately the correct bootstrap variance for linear statistics.
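For the simple case where N is a multiple of n (and bootknife=FALSE), the superpopulation idea can be sketched in Python (an illustration, not the S-PLUS code):

```python
import random

def samp_finite(n, B, N, size=None, seed=None):
    # Sketch of finite-population sampling when N is a multiple of n:
    # build a superpopulation with N/n copies of each index, then draw
    # each column without replacement from it.
    if size is None:
        size = n
    if N % n:
        raise ValueError("this sketch covers only N a multiple of n")
    rng = random.Random(seed)
    superpop = [i for i in range(1, n + 1) for _ in range(N // n)]
    return [rng.sample(superpop, size) for _ in range(B)]

cols = samp_finite(4, 6, N=12, seed=2)   # at most N/n = 3 copies of any index
```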
For samp.half, size is n/2 by default, and may be a half-integer; if so then alternate samples contain a zero (i.e. a smaller sample). This is a quick alternative to the ordinary bootstrap, with approximately the same standard error.

blockBootstrap is a constructor which returns a copy of samp.blockBootstrap with the given blockLength as its default; see example at bottom.
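A Python sketch of half-sampling for odd n (an illustration, not the S-PLUS code; the zero padding mirrors the "alternate samples contain a zero" behaviour described above):

```python
import random

def samp_half(n, B, seed=None):
    # Sketch of half-sampling: each column is about n/2 indices drawn
    # without replacement; for odd n, alternate columns are one element
    # shorter, marked here by a trailing 0.
    rng = random.Random(seed)
    lo, hi = n // 2, (n + 1) // 2
    cols = []
    for b in range(B):
        k = hi if (n % 2 and b % 2 == 0) else lo
        col = rng.sample(range(1, n + 1), k)
        col += [0] * (hi - k)                 # zero marks the smaller sample
        cols.append(col)
    return cols

cols = samp_half(5, 4, seed=3)   # columns alternate between 3 and 2 real indices
```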
The default "biased" method is balanced -- each observation appears exactly B times in the result. In this case size*B must be a multiple of n. It is biased because rows in its result are not independent. The bias is of order O(1/B) (where B is the block.size used in calling bootstrap), and tends to underestimate bootstrap standard errors and produce confidence intervals which are too narrow. Variances are too small by a factor of about (1 - 1/B).

For the "unbiased" method, each row is generated independently. If n divides B then there are exactly B/n copies of 1:n in each row, and the result is balanced. Otherwise there are either floor(B/n) or ceiling(B/n) copies in each row, and the result is not exactly balanced.

For the "semi" method, if n divides B then results are exactly as for the "unbiased" method. If n divides size*B, results are balanced, but there is bias, with variances biased downward by a factor of approximately (1 - (B%%n)/B^2).
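The balanced ("biased") method, for the default size = n, amounts to permuting B copies of 1:n and cutting the result into B columns; a Python sketch (not the S-PLUS code):

```python
import random

def samp_boot_bal(n, B, seed=None):
    # Sketch of the balanced ("biased") method with size = n: permute
    # B copies of 1..n and split into B columns, so each observation
    # appears exactly B times in the whole result.
    rng = random.Random(seed)
    pool = list(range(1, n + 1)) * B
    rng.shuffle(pool)
    return [pool[b * n:(b + 1) * n] for b in range(B)]

cols = samp_boot_bal(6, 8, seed=4)
flat = [i for col in cols for i in col]
# every observation appears exactly B = 8 times across the result
```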
Arguments
n
and
B
should be in that order. The number and order of
other arguments may change; e.g. a
prob
argument may be added
to additional samplers to support importance sampling.
REFERENCES:
Davison, A.C. and Hinkley, D.V. (1997), Bootstrap Methods and Their Application, Cambridge University Press.
Efron, B. and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, San Francisco: Chapman & Hall.
Hesterberg, T.C. (1999), "Smoothed bootstrap and jackboot sampling," Technical Report No. 87, http://www.insightful.com/Hesterberg . Note: the name "jackboot" has been changed to "bootknife".
Hesterberg, T.C. (2004), "Unbiasing the Bootstrap - Bootknife Sampling vs. Smoothing," Proceedings of the Section on Statistics and the Environment, American Statistical Association, pp. 2924-2930.

SEE ALSO:
For an annotated list of functions in the S+Resample package, see the package overview help file.
EXAMPLES:
samp.bootstrap(6, 8)
samp.bootstrap(6, 8, size=12)
samp.boot.bal(6, 8)  # method = "biased"
samp.boot.bal(6, 8, method = "unbiased")
samp.boot.bal(6, 8, method = "semi")
samp.permute(6, 8)
samp.permute(6, 8, prob=(1:6))
samp.permute(6, 8, size=12, prob=(1:6))
samp.combinations(6, choose(6,4), 4)
samp.combinations(6, choose(6,4), 4, both=F)
samp.permutations(4, factorial(4))
samp.bootknife(6, 8)
samp.bootknife(6, 8, size=12)
samp.half(6, 8)
samp.half(5, 8)

# Block bootstrapping
bootstrap(1:25, mean)
bootstrap(1:25, mean, sampler = blockBootstrap(5), seed=0)
# Previous line is equivalent to:
bootstrap(1:25, mean, sampler = samp.blockBootstrap,
          sampler.args = list(blockLength = 5), seed=0)
# The data are positively correlated, so block versions give
# larger standard errors.

# Compare versions of balanced bootstrapping
set.seed(0)
tabulate(samp.boot.bal(6, 8))                       # balanced
tabulate(samp.boot.bal(6, 8, method = "unbiased"))  # not balanced
tabulate(samp.boot.bal(6, 8, method = "semi"))      # balanced
temp <- bootstrap(1:5, mean, block.size=9, seed=0)
temp$estimate$SE
update(temp, sampler = samp.boot.bal)$estimate$SE   # smaller
update(temp, sampler = samp.boot.bal,
       sampler.args = list(method = "semi"))$estimate$SE