Fisher's Exact Test for Count Data

DESCRIPTION:

Performs Fisher's exact test on a two-dimensional contingency table.

USAGE:

fisher.test(x, y=NULL, node.stack.dim=1001, value.stack.dim=10000,
            hybrid=F) 

REQUIRED ARGUMENTS:

x
either a factor or category object, or a two-dimensional contingency table in matrix form. If x is a matrix, each dimension must be at least 2 and at most 10, all elements must be non-negative, and NAs and Infs are not allowed. The elements of x should be whole numbers, since the test is based on counts; the storage mode of x is coerced to "integer". For restrictions on x when it is a factor or category object, see argument y.

OPTIONAL ARGUMENTS:

y
factor or category object. If x is a matrix, y is ignored. If x is a factor or category object, y is required and must have the same length as x. Each object must have at least 2 and at most 10 levels. NAs in the category index vectors are allowed, but pairs (x[i],y[i]) containing them are removed. Each element of the index vectors of x and y should give the membership of that observation in one of the groups present in the levels attribute; an NA in an index vector means that the observation is not in any of the groups listed for that factor or category object. Infs have no meaning as indices and should not be present.

If x and y are both supplied, x is not a matrix, and either x or y is not a factor or category object, that argument is coerced to one implicitly. In this case pairs (x[i],y[i]) containing NAs are removed, but pairs containing Infs are not. Coercion of this kind is intended for datasets of mode numeric whose elements are typically small integers; data in the form of character vectors should first be made into factor or category objects.
node.stack.dim
dimension of a stack for storing nodes corresponding to possible subtables.
value.stack.dim
dimension of a stack for storing different function values corresponding to nodes.
hybrid
logical flag: if TRUE, a hybrid algorithm is used, which involves an approximation. See the Mehta and Patel (1986) paper on the hybrid algorithm listed under REFERENCES.
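
The treatment of NAs and of non-factor input described for y can be made explicit before calling fisher.test. The following is only a sketch of an equivalent manual preparation, not the function's internal code; the vectors x.raw and y.raw and their values are made up for illustration.

```r
# Made-up numeric codes with missing values (illustration only)
x.raw <- c(1, 2, NA, 1, 2, 2)
y.raw <- c(1, 1, 2, NA, 2, 1)

# Keep only the pairs in which neither member is NA
ok <- !is.na(x.raw) & !is.na(y.raw)

# Convert explicitly to factors rather than relying on implicit coercion
xf <- factor(x.raw[ok])
yf <- factor(y.raw[ok])

# Cross-classification of the four complete pairs
tab <- table(xf, yf)
tab
```

Character data can be prepared the same way, with factor() applied to the character vector after the incomplete pairs have been dropped.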

VALUE:

a list of class "htest", containing the following components:

p.value
the p-value for the test.
alternative
always "two.sided".
method
character string giving the name of the method used.
data.name
a character string (vector of length 1) containing the actual name of the input argument x, and of y if both are factor or category objects.
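
Because the result is a list, its components can presumably be extracted with the usual $ operator. A minimal sketch, assuming the table is supplied as a matrix; the object names tab and res are arbitrary.

```r
# The 2 x 2 table used in the EXAMPLES section, entered as a matrix
tab <- matrix(c(2, 4, 3, 0), nrow = 2,
              dimnames = list(c("A", "Abar"), c("B", "Bbar")))

res <- fisher.test(tab)

res$p.value       # p-value for the test
res$alternative   # always "two.sided"
res$method        # name of the method used
res$data.name     # name of the input argument
```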

NULL HYPOTHESIS:

Fisher's exact test is typically used to test the null hypothesis of independence between the row and column variables of the table. Certain types of homogeneity, for example homogeneity of proportions in a k by 2 table, are equivalent to the independence hypothesis. See the literature references for examples.

TEST ASSUMPTIONS:

Unlike many tests for categorical data whose test statistics have a known asymptotic distribution, Fisher's exact test does not require the cell counts to be large. Since the test proceeds by conditioning on the marginal totals, however, it is important that this conditioning have a meaningful interpretation relative to the sampling scheme governing data collection.
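
To make the conditioning concrete: in a 2 by 2 table with fixed margins, the (1,1) cell follows a hypergeometric distribution under the null hypothesis, and under the usual definition the two-sided p-value is the total probability of all tables that are no more probable than the observed one. The sketch below computes this by direct enumeration. It is an illustration only, not the network algorithm used by fisher.test; the helper name fisher.2x2.sketch and the small tolerance for ties are assumptions.

```r
# Hypothetical helper (not part of fisher.test): two-sided exact p-value
# for a 2 x 2 table by enumerating the hypergeometric distribution.
fisher.2x2.sketch <- function(tab) {
    tab <- as.matrix(tab)
    if (!all(dim(tab) == 2)) stop("this sketch handles 2 x 2 tables only")
    r1 <- sum(tab[1, ])   # first row total
    c1 <- sum(tab[, 1])   # first column total
    c2 <- sum(tab[, 2])   # second column total
    # Values of the (1,1) cell that are consistent with the fixed margins
    lo <- max(0, r1 - c2)
    hi <- min(r1, c1)
    probs <- dhyper(lo:hi, c1, c2, r1)
    p.obs <- dhyper(tab[1, 1], c1, c2, r1)
    # Add up the tables that are no more probable than the observed one;
    # the relative tolerance guards against floating-point ties
    sum(probs[probs <= p.obs * (1 + 1e-7)])
}

tab <- matrix(c(2, 4, 3, 0), nrow = 2)   # rows A/Abar, columns B/Bbar
p <- fisher.2x2.sketch(tab)
p
```

For this table the admissible values of the (1,1) cell are 2 through 5, with probabilities 10/84, 40/84, 30/84 and 4/84; the observed value is 2, so the sum is 14/84.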

DETAILS:



The algorithm used in fisher.test is based on theory from Mehta and Patel (1983, 1986) and Joe (1985, 1988). It involves a network algorithm together with matrix majorization results to find the maximum and minimum of a certain objective function at each node in the network that is processed. See Joe (1988).

WARNING:

The total number of counts in the cross-classification table cannot be greater than 200.
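
The restrictions on a matrix argument (dimensions between 2 and 10, non-negative whole-number cells, no NAs or Infs, total count at most 200) can be checked before the call. The helper below is a hypothetical convenience, not part of fisher.test, and its messages are made up; it only restates the limits documented here.

```r
# Hypothetical pre-check of the documented limits on a matrix argument
check.fisher.table <- function(tab, max.total = 200) {
    tab <- as.matrix(tab)
    if (any(is.na(tab)) || any(abs(tab) == Inf))
        return("NAs and Infs are not allowed")
    if (any(dim(tab) < 2) || any(dim(tab) > 10))
        return("each dimension must be between 2 and 10")
    if (any(tab < 0) || any(tab != round(tab)))
        return("cells must be non-negative whole numbers")
    if (sum(tab) > max.total)
        return("total count exceeds the limit")
    "ok"
}

check.fisher.table(matrix(c(2, 4, 3, 0), nrow = 2))       # within the limits
check.fisher.table(matrix(c(150, 20, 30, 40), nrow = 2))  # total of 240
```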

REFERENCES:

(a) Statistical Theory

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1980). Discrete Multivariate Analysis: Theory and Practice, Cambridge, Mass.: The MIT Press.

Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions, 2nd ed. New York: Wiley.

Zar, J. H. (1984). Biostatistical Analysis, 2nd ed. Englewood Cliffs: Prentice-Hall.

(b) Computer Algorithm

Joe, H. (1985). An Ordering of Dependence for Contingency Tables. Linear Algebra and its Applications 70, 89-103.

Joe, H. (1988). Extreme probabilities for contingency tables under row and column independence with application to Fisher's exact test. Communications in Statistics A, Theory and Methods 17, 3677-3685.

Mehta, C. R. and Patel, N. R. (1983). A network algorithm for performing Fisher's exact test in r*c contingency tables. Journal of the American Statistical Association 78, 427-434.

Mehta, C. R. and Patel, N. R. (1986). Algorithm 643. FEXACT: A Fortran subroutine for Fisher's exact test on unordered r*c contingency tables. ACM Transactions on Mathematical Software 12, 154-161.

Mehta, C. R. and Patel, N. R. (1986). A hybrid algorithm for Fisher's exact test in unordered r*c contingency tables. Communications in Statistics A, Theory and Methods 15, 387-404.

EXAMPLES:

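# Two binary classifications of the same nine observations
x <- factor(c(1,1,2,1,2,1,1,2,2), labels=c("A", "Abar"))
y <- factor(c(1,1,1,2,1,2,2,1,1), labels=c("B", "Bbar"))

table(x,y)    # table from Fleiss, p. 25

fisher.test(x,y)              # test computed from the two factors

fisher.test(table(x,y))       # same test computed from the contingency table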