Permutation test for comparing two samples

DESCRIPTION:

Performs a permutation test for the difference or ratio of a statistic computed on two samples.

USAGE:

permutationTest2(data, statistic, treatment, data2, 
                 B = 999, alternative = "two.sided", ratio = F, 
                 paired = F, group = NULL, 
                 combine = NULL, 
                 combinationFunction = combinePValues.Fisher,  
                 seed = .Random.seed, trace = resampleOptions()$trace,  
                 save.group, save.treatment, ...) 

REQUIRED ARGUMENTS:

data
numerical vector or matrix, or data frame. Each column is treated as a separate variable.
statistic
statistic to be computed; a function or expression that returns a vector or matrix. It may be a function which accepts data as the first argument. Or it may be an expression such as mean(x,trim=.2). If data is given by name (e.g. data=x) then use that name in the expression, otherwise (e.g. data=air[,4]) use the name data in the expression. If data is a data frame, the expression may involve variables in the data frame. For examples see .

OPTIONAL ARGUMENTS:

treatment
vector of length equal to the number of observations in data. This must have two unique values, which determine the two samples to be compared. If data is a data frame, this may be a variable in the data frame, or an expression involving such variables. One of treatment or data2 (but not both) must be used.
data2
numerical vector or matrix, or data frame, like data. Observations in data are taken to be one sample, and those in data2 are taken to be the other. If data2 is a matrix or data frame, it must have the same number of columns, and column names, if any, as data. One of treatment or data2 (but not both) must be used.
B
integer, number of random permutations to use. With the default value of B=999, p-values are multiples of 1/1000.
alternative
character, one of "two.sided", "greater", or "less" (may be abbreviated), indicating the type of hypothesis test to perform.
ratio
logical value; if FALSE (the default) then the test is based on the difference in statistics between the two samples; if TRUE then it is based on the ratio.
paired
logical, if TRUE then observations are paired, and observations within each pair are randomly permuted. This is equivalent to supplying group as a vector with a different value for each pair of observations. If paired is supplied then argument group is ignored.
group
vector of length equal to the number of observations in data (or in data and data2), for further stratified sampling. Within each of the two permutation samples defined by treatment or data2, sampling is done separately for each group (determined by unique values of this vector). If data is a data frame, this may be a variable in the data frame, or an expression involving such variables.
combine
numerical, logical, or character vector, indicating which variables to use for computing combined p-values. Or this may be a list, each of whose elements indicates a set of variables to use.
combinationFunction
a function which combines p-values; see for specifications.
seed
seed for generating resampling indices; a legal seed, e.g. an integer between 0 and 1023. See .
trace
logical flag indicating whether to print messages indicating progress. The default is determined by .
save.group
save.treatment
logical flags indicating whether to return the group and treatment vectors. Default is TRUE if number of observations is <= 10000 or if data2 supplied, FALSE otherwise. If not saved these can generally be recreated by when needed if treatment was supplied, but not if data2 was supplied.
...
additional arguments to . Sampler arguments sampler, sampler.prob, and sampler.args.group are not supported.
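
The effect of paired and group on the permutations can be illustrated with a minimal sketch in base R syntax. It does not use the resampling library: permuteWithin is a hypothetical helper, and the sketch assumes that stratified permutation amounts to shuffling the treatment labels separately within each group. With one group per pair, this reduces to randomly swapping the two members of each pair.

```r
# Hypothetical helper (not part of the library): permute treatment labels
# separately within each level of `group`.
permuteWithin <- function(treatment, group) {
  out <- treatment
  for (g in unique(group)) {
    idx <- which(group == g)
    # sample.int() avoids the sample() pitfall when a group has one member
    out[idx] <- treatment[idx[sample.int(length(idx))]]
  }
  out
}

set.seed(2)
treatment <- rep(c(TRUE, FALSE), each = 5)  # first 5 vs. last 5 observations
pairId    <- rep(1:5, 2)                    # observation i is paired with i + 5

perm <- permuteWithin(treatment, pairId)

# Each pair still contributes exactly one observation to each sample;
# only the assignment within a pair can flip.
pairCounts <- tapply(perm, pairId, sum)
```

Because the labels never leave their group, any between-group differences are held fixed across permutations, which is the purpose of stratifying.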

VALUE:

An object of class permutationTest2 which inherits from permutationTest and resamp. This has components call, observed, replicates, estimate, B, n, dim.obs, parent.frame, label (only if supplied), defaultLabel, p-value, combined-p-value (only if p-values are combined), seed.start, seed.end, ratio (if ratio=TRUE), and bootstrap.objects. See for a description of most components. Components particularly relevant are:
observed
vector of length p (the number of variables in data), containing the difference in the statistic computed on each of the two samples, for the original data.
replicates
matrix of dimension B by p, containing the difference in the statistic computed on each of the two samples, for each permutation.
estimate
data.frame with p rows and columns "alternative" and "p-value".
combined-p-value
vector of combined p-values, of length equal to the number of combinations requested by argument combine.
ratio
this is present only when the test is based on the ratio between samples (ratio=TRUE); in that case this is the logical value TRUE.
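
As an illustration of how a combined p-value can be obtained from a B by p matrix of replicates, the following is a minimal sketch of nonparametric combination with Fisher's combining function, in base R syntax. It is an assumption that this resembles what combinePValues.Fisher does; the data and the names reps, obs, pObs, pRep, and combinedP are made up for the example.

```r
set.seed(1)
B <- 199; p <- 3
reps <- matrix(rnorm(B * p), B, p)  # stand-in for the B by p replicates matrix
obs  <- c(2.5, 1.8, 0.3)            # stand-in for the observed statistics

# Upper-tail p-value of each observed statistic against its own column.
pObs <- sapply(seq_len(p),
               function(j) (1 + sum(reps[, j] >= obs[j])) / (B + 1))

# Partial p-value of every replicate, ranked within its own column.
pRep <- apply(reps, 2, function(col) sapply(col, function(v) mean(col >= v)))

# Fisher's combining function: -2 times the sum of log p-values.
tObs <- -2 * sum(log(pObs))
tRep <- -2 * rowSums(log(pRep))

# Combined p-value: how often a permutation gives a combined statistic
# at least as large as the observed one.
combinedP <- (1 + sum(tRep >= tObs)) / (B + 1)
```

Using the same rows of the replicates matrix for every variable keeps the dependence between variables intact, so no independence assumption is needed for the combined p-value.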

SIDE EFFECTS:

The function permutationTest2 causes creation of the dataset .Random.seed if it does not already exist, otherwise its value is updated.

DETAILS:

The replicates generated by permutationTest2 are conceptually equivalent to those for a call to using statistic equal to statistic(data[treatment 1]) - statistic(data[treatment 2]) . If the statistic is mean, results are equivalent to calling . Neither of these will duplicate the results from permutationTest2 exactly, however, since internally all three functions use different algorithms. If statistic is or , use , which is much faster. The results for permutationTest2 are achieved by two calls to , one for each sample, using the sampler with argument full.partition to synchronize the permutations between samples.
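
The conceptual equivalence described above can be sketched in base R syntax without the resampling library. permDiff is a hypothetical function, not the library's algorithm. The p-value formula (1 + count)/(B + 1) is an assumption chosen to agree with the statement that p-values are multiples of 1/(B + 1), and taking the two-sided p-value as twice the smaller one-sided value is also an assumption.

```r
# Hypothetical sketch of a two-sample permutation test for a single variable.
permDiff <- function(x, treatment, statistic, B = 999, ratio = FALSE) {
  vals <- unique(treatment)            # the two treatment values
  stat2 <- function(tr) {
    s1 <- statistic(x[tr == vals[1]])
    s2 <- statistic(x[tr == vals[2]])
    if (ratio) s1 / s2 else s1 - s2
  }
  observed <- stat2(treatment)
  replicates <- numeric(B)
  for (b in seq_len(B)) {
    # Shuffle the labels; the sizes of the two samples are preserved.
    replicates[b] <- stat2(sample(treatment))
  }
  pGreater <- (1 + sum(replicates >= observed)) / (B + 1)
  pLess    <- (1 + sum(replicates <= observed)) / (B + 1)
  list(observed = observed, replicates = replicates,
       p.value = c(greater = pGreater, less = pLess,
                   two.sided = min(1, 2 * min(pGreater, pLess))))
}

set.seed(3)
x <- c(rnorm(10, mean = 3), rnorm(10))
treatment <- rep(c(TRUE, FALSE), each = 10)
res <- permDiff(x, treatment, mean, B = 199)
```

Which sample is treated as the first one (here, the first treatment value encountered) is a choice made for this sketch; it determines the sign of the difference and the meaning of "greater" and "less".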

REFERENCES:

Pesarin, F. (2001), Multivariate Permutation Tests with Applications to Biostatistics: Nonparametric Combination Methodology, Wiley, Chichester, UK. (Describes nonparametric combination methodology.)

SEE ALSO:

, , . The latter is for comparing means of two groups.

For more details on arguments, see . Note that calls , so many of the arguments are common.

Combination of p-values for multivariate statistics: , , , .

Print, summarize, plot: , , , ,

Description of a "permutationTest2" object, extract parts: , , , .

Modify a "permutationTest2" object: .

For an annotated list of functions in the package, including other high-level resampling functions, see: .

EXAMPLES:

# Three ways of doing the same thing. 
set.seed(0) 
x <- matrix(rnorm(15*3), 15) 
treatment <- rep(c(T,F), length=15) 
p2 <- permutationTest2(x, statistic = colMeans, treatment = treatment,  
                       seed = 1) 
p2 
permutationTestMeans(x, treatment = treatment, seed = 1) 
permutationTest(x, statistic = colMeans(x[tr,]) - colMeans(x[!tr,]),  
                seed = 1, args.stat = list(tr = treatment)) 
 
summary(p2) 
plot(p2) 
# two combinations 
update(p2, combine = list(1:3, 1:2)) 
 
# Paired permutation test 
x1 <- rnorm(30); x2 <- rnorm(30) 
permutationTest2(x1, data2=x2, median, paired=T) 
# Another way to do a paired permutation test, using the group argument: 
permutationTest2(x1, data2=x2, median, group = rep(1:30, 2)) 
 
# data2, group arguments 
set.seed(10) 
data1 <- data.frame(x = runif(30), g = rep(1:2, c(10, 20))) 
data2 <- data.frame(x = runif(20), g = rep(1:2, 10)) 
permutationTest2(data = data1, statistic = mean(x), data2 = data2,  
                 group = g)