Bootstrap for comparing two samples

DESCRIPTION:

Bootstrap the difference or ratio of a statistic computed on two samples.

USAGE:

bootstrap2(data, statistic, treatment, data2, ratio = F, B = 1000, 
           group, subject, seed = .Random.seed,  
           trace = resampleOptions()$trace, 
           save.group, save.subject, save.treatment, L = NULL, 
           twoSample.args = NULL, ...)

REQUIRED ARGUMENTS:

data
numerical vector or matrix, or data frame. Each column is treated as a separate variable. The name "data" must not be used for any column name.
statistic
statistic to be computed; a function or expression that returns a vector or matrix. It may be a function which accepts data as the first argument.

Alternatively it may be an expression such as mean(x,trim=.2). If data is given by name (e.g. data=x), use that name in the expression; otherwise (e.g. data=air[,4]), use the name "data" in the expression (e.g. mean(data,trim=.2)). The exception is when argument data2 is used: in that case you must use the name "data" in the expression, regardless of whether argument data or data2 is given by name. In any case, the name "data" is reserved to refer to the data being bootstrapped, and should not be used in statistic to refer to any other object.

If data is a data frame, the expression may involve variables in the data frame. For examples see .

OPTIONAL ARGUMENTS:

treatment
vector of length equal to the number of observations in data. This must have two unique values, which determine the two samples to be compared. If data is a data frame, this may be a variable in the data frame, or an expression involving such variables. One of treatment or data2 (but not both) must be used.
data2
numerical vector or matrix, or data frame, like data. Observations in data are taken to be one sample, and those in data2 are taken to be the other. If data2 is a matrix or data frame, it must have the same number of columns (and column names, if any), as data. One of treatment or data2 (but not both) must be used.
ratio
logical value, if FALSE (the default) then bootstrap the difference in statistics between the two samples; if TRUE then bootstrap the ratio.
B
integer, number of random bootstrap samples to use.
group
vector of length equal to the total number of observations (in data if treatment supplied, or in data and data2), for stratified sampling. Within each of the two samples defined by treatment or data2, sampling is done separately for each stratum (determined by unique values of the group vector). If data is a data frame and treatment is used, this may be a variable in the data frame, or an expression involving such variables.
subject
vector of length equal to the total number of observations. If present, subjects (determined by unique values of this vector) are resampled rather than individual observations. If data is a data frame and treatment is used, this may be a variable in the data frame, or an expression involving such variables. This must be nested within treatment, and within group if group is used (all observations for a subject must be in the same treatment sample and the same group stratum).
Under certain conditions bootstrap makes resampled subjects unique before calling the statistic.
seed
seed for generating resampling indices; a legal seed, e.g. an integer between 0 and 1023. See .
trace
logical flag indicating whether to print messages indicating progress. The default is determined by .
save.group
save.subject
save.treatment
logical flags indicating whether to return the group, subject and treatment vectors. Default is TRUE if number of observations is <= 10000 or if data2 supplied, FALSE otherwise. If not saved these can generally be recreated by when needed if treatment was supplied, but not if data2 was supplied.
L
empirical influence values. This may be a string indicating which method to use for the calculation: one of "jackknife", "influence", "regression", "ace", or "choose". See for further information and references. Or it may be a matrix with n (number of observations or subjects in data, or in data and data2) rows and p (length of the returned statistic) columns; in this case the L values for the second treatment group (or data2) should be -1 times the values they would have for e.g. bootstrap(data2, statistic).
...
additional arguments to . These arguments are used for both samples.
twoSample.args
a list of length two, each component itself a list containing additional args to .
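
The resampling implied by the group and subject arguments can be sketched in base R. This is only an illustration of the idea, not the code used by bootstrap2; the function names stratified_indices and subject_indices are hypothetical, and the sketch assumes a single sample (bootstrap2 would do this within each of the two samples).

```r
# Hypothetical base-R sketch; not part of bootstrap2.

# Stratified resampling: draw with replacement inside each stratum,
# so every resample keeps the original stratum sizes.
stratified_indices <- function(group) {
  idx <- seq_along(group)
  unlist(lapply(split(idx, group),
                function(i) i[sample.int(length(i), replace = TRUE)]),
         use.names = FALSE)
}

# Subject resampling: draw whole subjects with replacement, then keep
# every observation belonging to each drawn subject.
subject_indices <- function(subject) {
  ids <- unique(subject)
  drawn <- ids[sample.int(length(ids), replace = TRUE)]
  unlist(lapply(drawn, function(s) which(subject == s)), use.names = FALSE)
}

set.seed(2)
g <- rep(1:2, c(10, 20))
i <- stratified_indices(g)
table(g[i])   # same stratum sizes as table(g)

subj <- rep(1:5, each = 3)
j <- subject_indices(subj)
table(subj[j])   # every count is a multiple of 3: subjects stay intact
```

Indexing with sample.int() rather than sample() avoids the special case where sample() treats a length-one vector as a range.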

VALUE:

An object of class bootstrap2 which inherits from bootstrap and resamp. This has components call, observed, replicates, estimate, B, n, dim.obs, treatment, parent.frame, seed.start, seed.end, and bootstrap.objects. It may have components ratio, group, subject, L, Lstar, indices, compressedIndices, and others. See for a description of most components. Components particularly relevant are:
observed
vector of length p (the number of variables in data), containing the difference (or the ratio, if ratio=TRUE) in the statistic computed on each of the two samples, for the original data.
replicates
matrix of dimension B by p, containing the difference (or the ratio, if ratio=TRUE) in the statistic computed on each of the two samples, for each resample.
estimate
data.frame with p rows and columns "Mean", "Bias" and "SE".
bootstrap.objects
list containing the two bootstrap objects used to generate the above results. See below.
ratio
this is present only when bootstrapping the ratio between samples; in that case this is the logical value TRUE.
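
As a rough guide to how observed, replicates and estimate relate, the following base-R sketch builds a table with the same column names. It assumes the usual bootstrap definitions (bias as the mean of the replicates minus the observed value, SE as the standard deviation of the replicates); the function summarize_replicates is hypothetical, and the exact formulas used by the package may differ, for example in the divisor of the standard deviation.

```r
# Hypothetical base-R sketch; not part of bootstrap2.
summarize_replicates <- function(observed, replicates) {
  replicates <- as.matrix(replicates)   # B rows, p columns
  boot_mean <- colMeans(replicates)
  data.frame(
    Mean = boot_mean,
    Bias = boot_mean - observed,        # assumed definition of bias
    SE   = apply(replicates, 2, sd)     # sd() uses the B - 1 divisor
  )
}

# Toy input: p = 2 statistics, B = 4 replicates.
observed <- c(0.5, -0.2)
replicates <- matrix(c(0.4, 0.6, 0.5, 0.7,
                       -0.1, -0.3, -0.2, -0.4), ncol = 2)
est <- summarize_replicates(observed, replicates)
est
```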

SIDE EFFECTS:

The function bootstrap2 creates the dataset .Random.seed if it does not already exist; otherwise it updates its value.

DETAILS:

This function resamples within each of the two treatment samples separately. The results are logically equivalent to

bootstrap(data, statistic(data[treatment1,]) - statistic(data[treatment2,]), 
          group = treatment, ...) 

although different random sampling is used, and only bootstrap2 supports stratified sampling.
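
The idea of resampling within each sample separately can be sketched in base R as follows. This is a minimal illustration for a numeric vector and a scalar statistic, not the implementation of bootstrap2; the function boot_diff and its arguments are hypothetical.

```r
# Hypothetical base-R sketch; not the bootstrap2 implementation.
boot_diff <- function(x, treatment, statistic = mean, B = 1000, ratio = FALSE) {
  values <- unique(treatment)
  stopifnot(length(values) == 2)
  x1 <- x[treatment == values[1]]
  x2 <- x[treatment == values[2]]
  combine <- if (ratio) `/` else `-`
  observed <- combine(statistic(x1), statistic(x2))
  replicates <- numeric(B)
  for (b in seq_len(B)) {
    # Resample with replacement within each sample separately.
    s1 <- x1[sample.int(length(x1), replace = TRUE)]
    s2 <- x2[sample.int(length(x2), replace = TRUE)]
    replicates[b] <- combine(statistic(s1), statistic(s2))
  }
  list(observed = observed, replicates = replicates)
}

set.seed(1)
x <- c(rnorm(20, mean = 1), rnorm(15, mean = 0))
treatment <- rep(c("a", "b"), c(20, 15))
res <- boot_diff(x, treatment, B = 500)
res$observed
sd(res$replicates)   # bootstrap standard error of the difference
```

Because each sample is resampled on its own, the two sample sizes are the same in every resample.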

Internally, bootstrap2 calls bootstrap twice, once for each treatment value.

For comparison, and permute the entire data set and then divide it into two samples before computing the statistic on each sample.
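
The permutation approach described above can be contrasted with a base-R sketch. It pools the data, permutes it without replacement, and splits it into two samples of the original sizes. The function perm_diff is hypothetical and is not one of the package's functions.

```r
# Hypothetical base-R sketch of permuting the pooled data.
perm_diff <- function(x, n1, statistic = mean, B = 1000) {
  n <- length(x)
  replicate(B, {
    p <- sample.int(n)   # permutation of the pooled data, no replacement
    statistic(x[p[seq_len(n1)]]) - statistic(x[p[(n1 + 1):n]])
  })
}

set.seed(3)
x <- c(rnorm(20, mean = 1), rnorm(15))
perm <- perm_diff(x, n1 = 20, B = 500)
mean(perm)   # near zero: permuting removes any difference between samples
```

Permuting mixes the two samples, so the replicates describe the statistic under the hypothesis of no difference, whereas resampling within each sample keeps the observed difference in place.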

SEE ALSO:

, .

For more details on many arguments see: .

Confidence intervals: , , , , .

For a hypothesis test comparing two samples, see: , .

For an annotated list of functions in the package, including other high-level resampling functions, see: .

EXAMPLES:

# difference in column means between the two treatment groups
set.seed(0)
x <- matrix(rnorm(15*3), 15)
treatment <- rep(c(T,F), length=15)
bootstrap2(x, statistic = colMeans, treatment = treatment, seed = 1)
 
# data2 and group arguments 
set.seed(10) 
data1 <- data.frame(x = runif(30), g = rep(1:2, c(10, 20))) 
data2 <- data.frame(x = runif(20), g = rep(1:2, 10)) 
boot <- bootstrap2(data = data1, statistic = mean(x), data2 = data2, 
          group = g, L="regression") 
boot 
 
# twoSample.args 
boot <- update(boot, 
          twoSample.args = list( list(seed=5), list(seed=6))) 
boot 
boot$bootstrap.objects[[1]]