General Nonparametric Jackknife

DESCRIPTION:

Performs delete-one jackknifing of observations from specified data. Calculates jackknife statistics for parameters of interest and produces an object of class jackknife. The jackknife function is generic (see Methods); method functions can be written to handle specific classes of data. Classes which already have methods for this function include:
.

USAGE:

jackknife(data, statistic, args.stat, 
          group, subject, 
          label, statisticNames, 
          seed = .Random.seed, 
          group.size = 1, assign.frame1 = F, 
          save.group, save.subject, ...) 

See for further details of arguments marked with "*" (including important capabilities not described here).

REQUIRED ARGUMENTS:

data*
data to be jackknifed. May be a vector, matrix, data frame, or output from a modeling function like .

OPTIONAL ARGUMENTS:

statistic*
statistic to be jackknifed: a function or expression that returns a vector or matrix. It may be a function which accepts data as the first argument; other arguments may be passed using args.stat.
Or it may be an expression such as mean(x,trim=.2). If data is given by name (e.g. data=x) then use that name in the expression, otherwise (e.g. data=air[,4]) use the name data in the expression. If data is a data frame, the expression may involve variables in the data frame.
args.stat*
list of other arguments, if any, passed to statistic when calculating the statistic on the resamples.
group*
vector of length equal to the number of observations in data, indicating that the data is stratified, for multiple-sample problems. Unique values of this vector determine the groups. This does not affect resampling, as it does with bootstrap, but it does affect the calculation of jackknife statistics. If data is a data frame, this may be a variable in the data frame, or an expression involving such variables.
subject*
vector of length equal to the number of observations in data; if present then subjects (determined by unique values of this vector) are resampled rather than individual observations. If data is a data frame, this may be a variable in the data frame, or an expression involving such variables. If group is also present, subject must be nested within group (each subject must be in only one group).
label
character string; if supplied, it is used when printing and as the main title for plotting.
statisticNames
character vector of length equal to the number of statistics calculated; if supplied, it is used as the statistic names for printing and plotting.
seed
seed for randomization done by statistic, and for random assignment of observations to groups if group.size is not equal to one. May be a legal random number seed or an integer between 0 and 1023 which is passed to set.seed.
group.size
integer giving the number of observations to remove in each resample. If group.size=1, the standard delete-1 jackknife is performed. Otherwise, the observations are divided into B = floor(n/group.size) groups of equal size and these groups are jackknifed. Although this is similar to delete-d jackknifing, not all possible subsets of the specified size are used, and the jackknife statistics treat the replicates as a standard jackknife sample of size B. This is provided primarily to allow grouped jackknifing when calculating acceleration for BCa confidence intervals. The value is restricted to 1 if the group argument is present.
assign.frame1
logical flag indicating whether the resampled data should be assigned to frame 1 before evaluating the statistic. Try assign.frame1=T if all estimates are identical (this is slower).
save.group, save.subject
logical flags; if TRUE, the group and subject vectors, respectively, are saved in the returned object. Both default to TRUE if n<=10000.
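
The grouping described under group.size can be sketched as follows. This is an illustration in Python, not the S-PLUS implementation: the names grouped_deletion_blocks and grouped_jackknife are hypothetical, and the handling of the n - B*group.size leftover observations (here they are never deleted) is an assumption, since it is not specified above.

```python
import random

def grouped_deletion_blocks(n, group_size=1, seed=0):
    """Return the index blocks that a grouped jackknife deletes in turn."""
    b = n // group_size                      # B = floor(n / group.size)
    order = list(range(n))
    if group_size > 1:
        # Random assignment of observations to blocks, controlled by the seed.
        random.Random(seed).shuffle(order)
    # Assumption: leftover observations belong to no block, so they are
    # kept in every replicate.
    return [sorted(order[k * group_size:(k + 1) * group_size]) for k in range(b)]

def grouped_jackknife(data, statistic, group_size=1, seed=0):
    """Recompute the statistic once per block, with that block removed."""
    blocks = grouped_deletion_blocks(len(data), group_size, seed)
    replicates = []
    for block in blocks:
        dropped = set(block)
        kept = [x for i, x in enumerate(data) if i not in dropped]
        replicates.append(statistic(kept))
    return blocks, replicates

def mean(xs):
    return sum(xs) / len(xs)

data = [float(v) for v in range(1, 11)]      # n = 10
blocks, replicates = grouped_jackknife(data, mean, group_size=3, seed=1)
print(len(blocks), blocks, replicates)       # B = 3 blocks, 3 replicates
```

With n = 10 and group_size = 3 this yields B = 3 disjoint blocks, so only 3 of the 120 possible size-3 subsets are deleted, which is why the replicates are treated as an ordinary jackknife sample of size B rather than as a delete-d jackknife.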

VALUE:

an object of class jackknife which inherits from resamp. This has components call, observed, replicates, estimate, B, n (the number of observations, or subjects), dim.obs, seed.start, defaultLabel, n.groups, and parent.frame (the frame of the caller of jackknife), and possibly label, group and subject. The data frame estimate has three columns containing the jackknife estimates of Bias, Mean, and SE. See for a description of many components.
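
As a rough illustration of what the observed, replicates and estimate components contain, here is a minimal delete-one jackknife in Python. It assumes the standard textbook formulas (bias = (n-1) * (mean of replicates - observed), SE = sqrt((n-1)/n * sum of squared deviations of the replicates)) and a scalar statistic with no group or subject; the function name and the dictionary layout are hypothetical, and the S-PLUS function may differ in detail.

```python
import math

def delete_one_jackknife(data, statistic):
    """Delete-one jackknife of a scalar statistic (illustrative sketch)."""
    n = len(data)
    observed = statistic(data)
    # One replicate per observation: the statistic with that observation removed.
    replicates = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_rep = sum(replicates) / n
    bias = (n - 1) * (mean_rep - observed)
    se = math.sqrt((n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates))
    return {
        "observed": observed,
        "replicates": replicates,
        "B": n,                      # number of replicates
        "n": n,                      # number of observations
        "estimate": {"Mean": mean_rep, "Bias": bias, "SE": se},
    }

def mean(xs):
    return sum(xs) / len(xs)

result = delete_one_jackknife([2.0, 4.0, 6.0, 8.0], mean)
print(result["estimate"])
```

For the sample mean, the jackknife bias is zero and the jackknife SE equals the usual s/sqrt(n), which makes this a convenient check of the formulas.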

SIDE EFFECTS:

If assign.frame1=T, you must be sure that this assignment does not overwrite some quantity of interest stored in frame 1.

DETAILS:

Performs nonparametric jackknifing of observations for a wide range of statistics and expressions.

If group is present, then group.size must be 1, and one observation is removed at a time. In some settings this gives samples that are not representative of the original sampling plan, e.g. in stratified sampling where the original sampling plan drew exactly equal numbers of observations from each stratum. If the statistic is sensitive to the number of observations present in each group, then results may be incorrect; e.g. if the statistic is the difference between means of two samples (groups), then results are probably fine, but not if it is an overall average. Hence use caution in interpreting bias and standard error estimates produced by this function.
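
The imbalance described above can be seen in a small Python sketch. It only forms the delete-one replicates for two strata and records the stratum sizes in each; it does not reproduce how this function adjusts the jackknife statistics for group. All names and data are made up for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)

def diff_of_means(values, groups):
    """Difference between the means of strata "a" and "b"."""
    a = [v for v, g in zip(values, groups) if g == "a"]
    b = [v for v, g in zip(values, groups) if g == "b"]
    return mean(a) - mean(b)

def pooled_mean(values, groups):
    """Overall average, ignoring the strata."""
    return mean(values)

def delete_one_by_group(values, groups, statistic):
    """Delete-one replicates, each paired with the stratum sizes it was computed on."""
    out = []
    for i in range(len(values)):
        v = values[:i] + values[i + 1:]
        g = groups[:i] + groups[i + 1:]
        sizes = {label: g.count(label) for label in sorted(set(g))}
        out.append((statistic(v, g), sizes))
    return out

values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
groups = ["a", "a", "a", "b", "b", "b"]      # balanced design: 3 per stratum

diff_reps = delete_one_by_group(values, groups, diff_of_means)
pooled_reps = delete_one_by_group(values, groups, pooled_mean)
for (d, sizes), (p, _) in zip(diff_reps, pooled_reps):
    print(sizes, d, p)
```

Every replicate has unequal stratum sizes (2 and 3) even though the original design was balanced. The difference of means still uses a proper mean within each stratum, whereas the pooled mean changes its implicit stratum weights from 1/2 and 1/2 to 2/5 and 3/5 in each replicate.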

REFERENCES:

Davison, A.C. and Hinkley, D.V. (1997), Bootstrap Methods and Their Application, Cambridge University Press.

Efron, B. and Tibshirani, R.J. (1993), An Introduction to the Bootstrap, San Francisco: Chapman & Hall.

BUGS:

See

SEE ALSO:

and do similar calculations.

More details on many arguments, see .

Jackknife and other objects: , , . Other model objects are handled without special methods.

Print, summarize, plot: , , , .

Description of a "jackknife" object, extract parts: , , .

Confidence intervals: , .

Modify a "jackknife" object: .

For an annotated list of functions in the package, including other high-level resampling functions, see: .

EXAMPLES:

jackknife(stack.loss, var) 
# See help(bootstrap) and help(bootstrap.args) for more examples of syntax. 
 
# The jackknife can be used to approximate the empirical influence
# function
x <- longley.x[,2:3]
jfit <- jackknife(x, cor(x[,1], x[,2]))

# by hand
influence1 <- subtractMeans(-(jfit$n - 1) * (jfit$replicates - jfit$observed))

# resampGetL does the same calculations, then standardizes
influence2 <- resampGetL(jfit)

# using a finite-delta calculation
influence3 <- resampGetL(jfit, method="influence")

cor(cbind(influence1, influence2, influence3)) # nearly equivalent
print(influence1) 
plot(x[,1], x[,2]) 
text(x[,1], x[,2] + 10, as.character(round(influence1, 2))) 
# Note that points to the upper right and lower left have positive 
# influence values -- they contributed to positive correlation.  Points 
# to the upper left and lower right have negative influence.
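
The "by hand" influence calculation in the example above can be checked outside S-PLUS. The Python sketch below applies the same formula, -(n-1) * (replicates - observed) followed by centering, to the sample mean; the function name is hypothetical, and subtractMeans is assumed to do nothing more than subtract the mean.

```python
def jackknife_influence(data, statistic):
    """Approximate empirical influence values from delete-one replicates."""
    n = len(data)
    observed = statistic(data)
    replicates = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    raw = [-(n - 1) * (r - observed) for r in replicates]
    center = sum(raw) / n                    # assumed role of subtractMeans
    return [v - center for v in raw]

def mean(xs):
    return sum(xs) / len(xs)

data = [2.0, 4.0, 6.0, 8.0]
influence = jackknife_influence(data, mean)
print(influence)
```

For the sample mean the result is x[i] - mean(x) for each observation, up to rounding, which is the exact empirical influence function of the mean.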