jackknifeAfterBootstrap(boot.obj, functional = NULL, graphical = NULL,
    passObserved = FALSE, jack.obj, threshold = 2, subset.statistic = 1:p,
    control = "choose", moments = 2, crossCorr = FALSE, ..., frame.eval)
ARGUMENTS

boot.obj: an object of class bootstrap.

functional: one of "Quantiles", "Centered Quantiles", "Standardized Quantiles",
"Mean", "Bias", "SE", or "Bias&SE". Or it may be a function that takes the
matrix of bootstrap replicates as its first argument, such as colMeans; see
DETAILS below for additional requirements.
graphical: if TRUE then this function focuses on graphical diagnostics, and
functional defaults to "Quantiles". If FALSE then functional defaults to
"Bias&SE". The default is TRUE, unless functional is specified as one of
"Mean", "Bias", "SE", or "Bias&SE".
passObserved: TRUE if functional accepts the observed value of the bootstrap
statistic as an argument; e.g. "Bias" is the mean of the bootstrap distribution
minus the observed value. This is set automatically if functional is one of the
character strings above: to TRUE for "Bias", "Bias&SE", "Centered Quantiles",
and "Standardized Quantiles".
jack.obj: an object of class jackknife corresponding to boot.obj. This will be
created if not supplied, unless you set this to FALSE.
"none"
,
"controlVariates"
,
"concomitants"
,
or
"choose"
(choose one of the others). These techniques may be
used to reduce Monte Carlo sampling variability; see below.
"controlVariates"
.
crossCorr: whether to compute the cross correlations returned as the cross.corr
component (see DETAILS below). This is TRUE if passObserved = TRUE or
functional is "Bias" or "Bias&SE".
...: additional arguments to pass to functional. For example, for quantiles you
may pass a probs argument; the default is c(0.025, 0.16, 0.5, 0.84, 0.975).
frame.eval: frame where boot.obj can be found. You need to specify this if
objects can't be found by their original names, or have changed.
VALUE

an object of class jackknifeAfterBootstrap with components call, Func,
estimate, replicates, jabB, graphical, control, rel.influence,
large.rel.influence, threshold, n, B, L, dim.Func, dimnames.Func, quantiles,
jack.obj, and cross.corr. Some of the components may be missing, or NULL.
Others are always computed, but not normally used (in printing or plotting) if
graphical=TRUE.
Func: the value of the functional for the original bootstrap distribution; let
k denote the length of this.

estimate: a matrix with k rows and three columns, Mean.Func, Bias.Func, and
SE.Func (the mean of the jackknife replicates of the functional, and analytical
jackknife estimates of the bias and standard error of the functional). If
graphical=FALSE then printing the result shows these estimates.
replicates: a matrix with n rows (the original sample size) and k columns.
When graphical=TRUE, by default this contains (centered) quantiles of the
leave-one-out bootstrap distributions. Plotting the result shows these
quantiles plotted against influence function values L.
jack.obj: an object of class jackknife, containing the leave-one-out
statistics; these correspond to the "observed" values for the leave-one-out
bootstraps.

jabB: the number of bootstrap samples used for each of the n leave-one-out
bootstrap distributions. Typically about B/e.
rel.influence: a matrix with n rows and k columns, containing standardized
estimates of the influence of each observation on the functional; these are
the replicates minus their column means and divided by their standard
deviations, then rescaled.

large.rel.influence: a list of length k, with one component for each dimension
of the functional (e.g. a component for each quantile), containing the large
values of relative influence for that dimension (or NULL if not used).
dim.Func: the dimension of Func (the value of the functional, before it is
converted to a vector). For example, for a multivariate statistic when the
functional returns quantiles, the functional returns one row for each quantile
and one column for each dimension of the statistic.

quantiles: TRUE if the functional was one of the quantile choices.

cross.corr: a matrix of size k x p containing correlations between jackknife
replicates of the functional and jackknife replicates of the original
statistic, where p is the length of boot.obj$observed (using subset.statistic
does not affect this).
DETAILS

Consider first the graphical approach to jackknife-after-bootstrap.
We begin by bootstrapping some statistic, such as a sample mean
or regression coefficient. Then, imagine leaving out one
observation at a time from the original sample, and repeating
the bootstrap process. This gives
n
bootstrap distributions.
The differences between those distributions, and between them and the original
bootstrap distribution, indicate the influence that each observation
has on the bootstrap distributions.
In particular,
we compute the quantiles, or some other functional, for each
leave-one-out distribution, and
plot the quantiles against
L
, the empirical influence
of each observation on the original statistic (or against
a column of the original data, the observation number, or the
jackknife statistic values).
For example, when considering use of an inference procedure, such as t-tests or confidence intervals, that assumes standard errors are independent of the statistic, it is useful to check that assumption by plotting either "Centered Quantiles" or "SE" (standard error) against the jackknife statistics; a minimal sketch follows.
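The following sketch mirrors the calls in the EXAMPLES section; boot.obj and jack.obj stand for a bootstrap object and the corresponding jackknife object:

    ## Check whether the spread of the leave-one-out bootstrap distributions
    ## varies with the statistic; roughly constant centered quantiles (or SE
    ## values) support the assumption of a constant standard error.
    jabCheck <- jackknifeAfterBootstrap(boot.obj, functional = "Centered Quantiles",
                                        jack.obj = jack.obj)
    plot(jabCheck)   # the xaxis argument to plot changes the x variable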
The implementation is clever: instead of drawing new bootstrap distributions after leaving out each observation, it uses the subset of the original bootstrap samples that do not contain that observation.
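For intuition, here is a minimal sketch of that reuse for a single observation number i, assuming the n x B matrix of resampling indices is available as a hypothetical object inds (one column per bootstrap sample); the replicates component of the bootstrap object holds the corresponding statistic values:

    ## Bootstrap samples that never drew observation i form the
    ## leave-one-out bootstrap distribution for observation i.
    omits.i <- apply(inds != i, 2, all)
    loo.replicates <- boot.obj$replicates[omits.i, , drop = F]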
Now consider the analytical jackknife-after-bootstrap. This is a technique for approximating the standard error and bias of bootstrap functionals (such as a bootstrap standard error estimate, which is the standard deviation of a bootstrap distribution). Conceptually we leave out an observation at a time, generate a bootstrap distribution, calculate the bootstrap estimate of bias, standard error, or some other quantity based on this bootstrap distribution, repeat for each observation, and combine all the quantities using the jackknife formulae for standard error and bias.
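A brute-force sketch of this idea, with the functional being the bootstrap standard error of a sample mean of a numeric vector x (for clarity it redraws the leave-one-out bootstrap distributions, which jackknifeAfterBootstrap avoids):

    n <- length(x)
    se.loo <- numeric(n)
    for(i in 1:n)   # leave out one observation at a time and bootstrap
      se.loo[i] <- sqrt(var(as.vector(bootstrap(x[-i], mean)$replicates)))
    se.full <- sqrt(var(as.vector(bootstrap(x, mean)$replicates)))
    bias.jack <- (n - 1) * (mean(se.loo) - se.full)               # jackknife bias
    se.jack <- sqrt((n - 1)/n * sum((se.loo - mean(se.loo))^2))   # jackknife SE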
The basic principle of the bootstrap is to estimate something about the sampling distribution (which depends on the unknown underlying distribution) using the bootstrap distribution (which depends on the empirical distribution). This is accurate only for functional summaries that are approximately pivotal (do not depend on the underlying distribution), such as bias or standard error; e.g. a bootstrap estimate of bias is usually a reasonable approximation to the true bias. In contrast, consider a non-pivotal quantity such as the mean; the mean of the bootstrap distribution should never be used to estimate the mean of the true sampling distribution.
The correlation between jackknife replicates of the original statistic and jackknife replicates of the functional gives an indication of whether or not the functional is pivotal; a high correlation indicates a strong dependence of the functional on the underlying distribution, in particular that aspect of the underlying distribution which is measured by the original statistic.
Jackknife-after-bootstrap estimates for SE and Bias should be interpreted with caution for non-pivotal statistics.
Conversely, sometimes one is interested in the influence of individual observations on functionals which need not be pivotal, such as the mean of the bootstrap distribution.
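As a hedged sketch (names as in the VALUE section; boot.obj and jack.obj are placeholders), these correlations are returned in the cross.corr component when crossCorr = TRUE:

    jabSE <- jackknifeAfterBootstrap(boot.obj, "SE", jack.obj = jack.obj,
                                     crossCorr = T)
    jabSE$cross.corr   # values near +1 or -1 suggest a non-pivotal functional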
Some functionals require both a bootstrap distribution and an observed value,
e.g. functional = "Bias". In this case, when computing the functional for the
leave-one-out bootstrap distributions, the observed value passed is computed
from the corresponding jackknife sample (these values are stored as the
"replicates" component of a jackknife object).
There is one nearly-fatal flaw with the analytical jackknife-after-bootstrap.
Jackknife estimates for bias and standard error multiply
small quantities by
n
. When those small quantities are subject
to Monte Carlo variability, multiplying by
n
can make the estimates
explode.
To reduce that Monte Carlo variability, one can make the bootstrap
sample size
B
huge. This function also provides two variance-reduction
techniques,
concomitants
(particularly useful for quantiles)
and
controlVariates
(particularly useful for moments, including
mean, bias, and standard error). See examples below.
We conclude this section with additional notes on one of the input arguments.
functional
may be a function that takes a matrix
of bootstrap replicates as its first argument, e.g.
colMeans
.
If passObserved=TRUE the function should accept the original observed statistic, or the jackknife leave-one-out statistic, as its second argument.
It must accept an argument
weights
(for weighted bootstrap distributions);
the value
NULL
signifies no weights.
Finally, there may be additional arguments passed using
...
.
You may use the functionals stored in
resampFunctionalList
as models.
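For illustration, here is a minimal sketch of a user-supplied functional (the name colMedians.fun is made up); it takes the replicate matrix first, accepts a weights argument (this sketch handles only the unweighted case, weights = NULL), and allows further arguments via the ... argument:

    colMedians.fun <- function(x, weights = NULL, ...) {
      if(!is.null(weights))
        stop("this sketch does not handle weighted bootstrap distributions")
      apply(x, 2, median)
    }
    jackknifeAfterBootstrap(boot.obj, functional = colMedians.fun,
                            jack.obj = jack.obj)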
The concomitants adjustment is based on saddlepoint approximations
for continuous distributions. For very small n the bootstrap distribution is
not approximately continuous, so the approximation may be poor and numerical
problems may occur.
REFERENCES

Davison, A.C. and Hinkley, D.V. (1997), Bootstrap Methods and Their
Application, Cambridge: Cambridge University Press.

Efron, B. and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, New
York: Chapman & Hall.
EXAMPLES

# Our goal here is to see how leaving out individual observations
# affects the bootstrap distribution. We can see this most easily
# by looking at how the quantiles of the bootstrap distribution change.
#
# However, there is substantial random variation - we can reduce this
# using variance reduction techniques, either control variates
# or concomitants. The latter is best for quantiles, the former for
# moments, like the mean, bias, or standard error (standard deviation
# of the bootstrap distribution).
#
# Those variance reduction techniques are most effective when statistics
# have a good linear approximation.
#
# We can also use analytical calculations, based on jackknife formulas,
# to estimate the bias and standard error of any functional of the
# bootstrap distribution (e.g. the bias and standard error of a quantile,
# or of a bootstrap bias estimate). However, here the random variation
# is catastrophic, unless variance reduction is very effective or
# the bootstrap sample sizes are immense.

x <- qgamma(ppoints(19), shape = 0.5)   # artificial skewed data
boot <- bootstrap(x, mean)
plot(boot)   # slightly skewed
jo <- jackknife(x, mean)   # Supplying this can speed calculations

jab1 <- jackknifeAfterBootstrap(boot, jack.obj = jo, control="none")
plot(jab1)
# The larger data points have positive influence, and are at the right.
# Note that leaving them out shifts the quantiles of the bootstrap
# distributions downward.
# The distribution is narrower, and has a smaller median,
# when large observations are omitted.
# Also note that the answers are noisy.

# Try two variance reduction techniques to reduce the noise.
# First concomitants:
jab2 <- jackknifeAfterBootstrap(boot, jack.obj = jo, control="concomitants")
plot(jab2)
# Very smooth. It is easier to see the effect of leaving out observations.
# Leaving out the largest observation (far left) affects the
# upper percentiles in particular.

jab3 <- jackknifeAfterBootstrap(boot, jack.obj = jo, control="controlVariates")
plot(jab3)
# Using control variates had little effect on the noise.

# Some additional options
plot(jab2, xaxis = "data")   # Original data - works best for univariate data
plot(jab2, xaxis = "L")   # empirical influence function
plot(jab2, xaxis = "Observation")
plot(jab2, type = "l")
plot(jab1, graphical = F)   # standardized influence (much noise)
plot(jab2, graphical = F)   # standardized influence (little noise)
plot(jab1, graphical = F, subset.plots = c(1,5))
plot(jab1, graphical = F, absolute = T)

### Try using "Centered Quantiles"
jab4 <- update(jab1, functional = "Centered Quantiles")
plot(jab4)
jab5 <- update(jab2, functional = "Centered Quantiles")
plot(jab5)
jab6 <- update(jab3, functional = "Centered Quantiles")
plot(jab6)
# The jab5 case shows a subtle effect; leaving out the smallest
# observations (on the left) gives bootstrap distributions which are
# slightly narrower. This is more pronounced with more symmetric data.

# Repeat, but this time letting the functional that is analyzed by
# jackknifeAfterBootstrap be the mean of the bootstrap distribution
# (rather than quantiles).
#
# This sets graphical=FALSE, but we can override this
# when printing or plotting
jab1 <- jackknifeAfterBootstrap(boot, "mean", jack.obj = jo, control="none")
jab1
# That gives a non-zero estimate for bias due to random variation
# (noise) -- the true bias for the mean is zero
plot(jab1)   # standardized influence
plot(jab1, graphical=T)   # y = means of leave-one-out bootstrap distns
# The title "Mean.mean" indicates that the functional is the mean of
# the bootstrap distribution, and the bootstrap distribution is for
# the mean of the data.

# Try variance reduction, for more accurate estimates
jab2 <- update(jab1, control = "concomitants")
jab2   # Bias estimate is closer to zero
plot(jab2)
plot(jab2, graphical=T)   # Much less random variability than for jab1

jab3 <- update(jab1, control = "controlVariates")
jab3   # Bias is zero - eliminating noise results in the right answer
plot(jab3)
plot(jab3, graphical=T)

# For "mean" we can also use just one moment (for "SE" it is best to use two)
jab4 <- update(jab3, moments = 1)
jab4
plot(jab4)
plot(jab4, graphical=T)

### Repeat, but this time letting the functional be standard error
# (standard deviation of the bootstrap distribution).
jab1 <- jackknifeAfterBootstrap(boot, "se", jack.obj = jo, control="none")
jab1   # That SE.Func exploded due to noise
plot(jab1)   # standardized influence
plot(jab1, absolute=T)
plot(jab1, graphical=T)   # means of leave-one-out bootstrap distns

jab2 <- update(jab1, control = "concomitants")
jab2   # Note that SE.Func is much smaller -- less noise
plot(jab2)   # The largest observation has a big effect on the standard error
plot(jab2, graphical=T)
# Omitting that observation makes the estimated SE much smaller.

jab3 <- update(jab1, control = "controlVariates")
jab3
plot(jab3)
plot(jab3, graphical=T)

# For "se" it is best to use two moments to reduce variance.
jab4 <- update(jab3, moments = 1)
jab4
plot(jab4)   # worse than jab3
plot(jab4, graphical=T)   # worse than jab3

# The gains from the variance reduction techniques are especially
# good in the preceding example, where the statistic is the mean.
# For nonlinear statistics the gains are smaller, especially for small
# samples.
boot.obj <- bootstrap(stack.loss, var)
jack.obj <- jackknife(stack.loss, var)
jab1 <- jackknifeAfterBootstrap(boot.obj, jack.obj = jack.obj, control="none")
plot(jab1)
jab2 <- update(jab1, control = "choose")   # "choose" picks one of the other control methods
plot(jab2)
# Variance reduction helped for the middle quantile, less for others
plot(jab2, graphical=FALSE, xaxis = "data")   # a lot of noise, not too useful

# You can supply your own functional to summarize a bootstrap distribution;
# this may depend on the observed value in addition to the replicates.
# The following is equivalent to functional = "Bias".
bias.fun <- function(x, observed, weights, ...)
  colMeans(x, weights = weights, na.rm = T) - observed
jackknifeAfterBootstrap(boot.obj, functional = bias.fun, passObserved = T,
                        jack.obj = jack.obj, graphical = F,
                        control = "controlVariates")
jackknifeAfterBootstrap(boot.obj, "Bias", jack.obj=jack.obj)