Create a Contingency Table from Factor Data

DESCRIPTION:

Create a multiway contingency table (a cross-tabulation) from a collection of factors.

USAGE:

crosstabs(formula, data=sys.parent(), margin=<<see below>>, 
    subset, na.action=na.fail, drop.unused.levels=T, yates=F) 

REQUIRED ARGUMENTS:

formula
A formula object with the terms, separated by + operators, on the right of the ~. Each term on the right-hand side should be a factor, and will be converted to one if it is not. If there is a term to the left of the ~ it should be a vector of counts; this is useful for data that has already been tabulated. If the formula is omitted or is ~ . and the data argument is a data frame, then all the variables in data will be cross-tabulated.
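
The count-on-the-left form can be sketched as follows. The data frame pet.counts and its column Count are hypothetical names introduced only for illustration; they hold an already tabulated version of the pet/food data used in the EXAMPLES section, with the NA row left out.

```s
# Hypothetical pre-tabulated data: one row per non-empty cell.
pet.counts <- data.frame(Pet=c("Cat", "Cat", "Dog"),
                         Food=c("Dry", "Wet", "Wet"),
                         Count=c(1, 1, 2))

# Count on the left of the ~ supplies the cell counts;
# Pet and Food on the right are the factors to cross-classify.
crosstabs(Count ~ Pet + Food, data=pet.counts)
```

Going by the description above, the cell counts would be expected to match those of the second example, where the same data are supplied one observation per row.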

OPTIONAL ARGUMENTS:

data
A data frame or frame number telling where the variables named in the formula (and in the subset argument) may be found. If a variable is not found by searching in the data frame or frame given by data, it is expected to be on the search list.
subset
Expression telling which subset of the rows of the data should be used in the table. It can be an expression that evaluates to a logical vector, or a vector of logical values, or a vector of row numbers or row names; in short, anything you would normally use to subscript the rows of a data frame. The variable names in the expression should be names in the same place supplied by the data argument, otherwise they will be looked for on the search list. All observations are included by default.
margin
A list of (possibly empty) vectors of integers describing which marginal proportions to calculate (and print). The integers must be in the range 1 to the number of variables to be cross-tabulated, and repeated values within a vector are not allowed. The names of the list are the labels to put in the legend printed with the table.

Each element of the list gives the dimension numbers that are not summed over when computing the denominators for the proportions of cell count to marginal total. For example, 1 means to use the row sums as denominators and integer(0) means to use the grand sum. The default for a two-way cross-tabulation is list("Row%"=1, "Col%"=2, "Total%"=integer(0)) and that for a one-way table is list("Total%"=integer(0)). For higher-dimensional cross-tabulations, the default results in printing the row and column proportions for each layer: list("N/RowTotal" = setdiff(i, 2), "N/ColTotal" = setdiff(i, 1), "N/Total" = integer(0)), where i is 1:number.of.factors. The margin argument here is similar to that in loglin.
na.action
A function for handling missing values. If there are any missing values in the data to be cross-tabulated, the data will be put into a data frame and passed to the function given by na.action. The default is na.fail, which issues a fatal error message describing the problem. A common alternative is na.exclude, which deletes cases with NAs in any of the variables to be cross-tabulated. na.include will add the level NA to each factor before cross-tabulating them (the formula may also include terms like na.include(x) to do this only for certain variables).
drop.unused.levels
If TRUE (the default) then any unused levels in factors will be omitted from the table. If FALSE, they will not be dropped and the table will contain rows or columns of zeros for those unused levels. This will cause the marginal proportions for those levels and the overall chi-squared statistic to be NA's, but may be useful for making parallel tables of similar data sets.
yates
If FALSE (the default), do not apply Yates' continuity correction when computing the chi-squared statistic. If TRUE, apply the correction. This is passed on to the printing function for crosstabs, print.crosstabs, and has no effect if no printing is done.
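
The arithmetic behind the margin argument described above can be sketched with base functions alone. This is an illustration of the denominators, not the code crosstabs itself runs; the counts are copied from the printed table in the first example, and the names counts, row.prop, col.prop and tot.prop are introduced here for illustration.

```s
# Cell counts copied from the first example's printed table
# (rows: Solder, columns: Opening).
counts <- matrix(c(99, 24, 15, 11, 9, 0), nrow=2,
                 dimnames=list(c("Thin", "Thick"), c("S", "M", "L")))

# margin element 1: keep dimension 1, sum over the rest,
# so each cell is divided by its row total.
row.prop <- sweep(counts, 1, apply(counts, 1, sum), "/")

# margin element 2: keep dimension 2, so each cell is divided
# by its column total.
col.prop <- sweep(counts, 2, apply(counts, 2, sum), "/")

# margin element integer(0): keep nothing, divide by the grand total.
tot.prop <- counts / sum(counts)

round(row.prop, 3)
```

The rounded values correspond to the N/RowTotal, N/ColTotal and N/Total lines of the first example's output. By the description of margin, a call such as crosstabs(~Solder+Opening, data=solder, margin=list("N/RowTotal"=1)) would request only the row proportions.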

VALUE:

An object of class crosstabs. This is an array of counts, suitable for use in functions like loglin. It also has an attribute marginals, a list of arrays of the marginal proportions specified by the margin argument. (These arrays are stacked by the print method for crosstabs so that corresponding entries lie near each other.) It also may have an attribute na.message, giving a message that the na.action function sometimes gives when it deals with missing values in the data (e.g., na.exclude will supply a na.message telling how many cases were ignored).
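
A minimal sketch of inspecting the returned object, using only the attributes named above. The variable names tab and marg are illustrative, and the exact contents of the attributes are not shown because they depend on the margin and na.action settings.

```s
tab <- crosstabs(~Solder+Opening, data=solder, subset=skips>10)

dim(tab)                        # the counts form an ordinary array
marg <- attr(tab, "marginals")  # list of arrays of marginal proportions
names(marg)                     # labels taken from the margin list, if named
attr(tab, "na.message")         # present only if na.action left a message
```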

DETAILS:

This function provides a convenient interface to the table and tapply functions, for tabulation (counting the number of observations that fall in each cell in a contingency table). If you want to do other calculations, say, compute means or sums for observations in cells, try tapply.
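
As a sketch of the tapply alternative mentioned above, the following computes a mean per cell rather than a count. The data frame pets and its Weight column are made-up illustration data, not part of any supplied data set.

```s
# Hypothetical data: one row per animal, with a numeric measurement.
pets <- data.frame(Pet=c("Dog", "Dog", "Cat", "Cat"),
                   Food=c("Wet", "Dry", "Wet", "Wet"),
                   Weight=c(20, 24, 4, 5))

# Mean Weight within each Pet-by-Food cell; cells with no
# observations come back as NA.
mean.weight <- tapply(pets$Weight, list(pets$Pet, pets$Food), mean)
mean.weight
```

Replacing mean with sum gives cell totals instead.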

NOTE:

The printing method, print.crosstabs, will generally add row and column totals for each two-dimensional layer of the table and will compute an overall chi-squared statistic to test independence of all the variables in the table. If you want to omit them, you may do so by calling print.crosstabs directly.
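
For a two-way table, the statistic mentioned above can be reproduced by hand. The sketch below assumes the usual Pearson chi-squared statistic without Yates' correction, with expected counts formed from the row and column totals; it is an illustration, not the code in print.crosstabs. The counts are copied from the first example's printed table.

```s
# Cell counts copied from the first example's printed table.
counts <- matrix(c(99, 24, 15, 11, 9, 0), nrow=2,
                 dimnames=list(c("Thin", "Thick"), c("S", "M", "L")))

n <- sum(counts)
row.tot <- apply(counts, 1, sum)
col.tot <- apply(counts, 2, sum)

# Expected counts under independence: (row total * column total) / n.
expected <- outer(row.tot, col.tot) / n

# Pearson chi-squared statistic, degrees of freedom, and p-value.
chi2 <- sum((counts - expected)^2 / expected)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p <- 1 - pchisq(chi2, df)

# Smallest expected count; values below 5 trigger the printed warning.
min(expected)
```

The values of chi2, df and p agree with the Chi^2 = 9.18309, d.f.= 2 and p=0.01013719 printed in the first example.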

BUGS:

The formula could be used to describe the marginal proportions and tests to perform but does not yet. Hence all terms should be addends in the formula.

EXAMPLES:

crosstabs(~Solder+Opening, data=solder, subset=skips>10) 
# Produces the following output: 
# Call: 
# crosstabs( ~ Solder + Opening, data = solder, subset = skips > 10) 
# 158 cases in table 
# +----------+ 
# |N         | 
# |N/RowTotal| 
# |N/ColTotal| 
# |N/Total   | 
# +----------+ 
# Solder |Opening 
#        |S      |M      |L      |RowTotl| 
# -------+-------+-------+-------+-------+ 
# Thin   |99     |15     | 9     |123    | 
#        |0.805  |0.122  |0.073  |0.78   | 
#        |0.805  |0.577  |1.000  |       | 
#        |0.627  |0.095  |0.057  |       | 
# -------+-------+-------+-------+-------+ 
# Thick  |24     |11     | 0     |35     | 
#        |0.686  |0.314  |0.000  |0.22   | 
#        |0.195  |0.423  |0.000  |       | 
#        |0.152  |0.070  |0.000  |       | 
# -------+-------+-------+-------+-------+ 
# ColTotl|123    |26     |9      |158    | 
#        |0.778  |0.165  |0.057  |       | 
# -------+-------+-------+-------+-------+ 
# Test for independence of all factors 
#         Chi^2 = 9.18309 d.f.= 2 (p=0.01013719) 
#         Yates' correction not used 
#         Some expected values are less than 5, don't trust stated p-value 

# Example 2 
petfood <- data.frame(Pet=c("Dog","Dog","Cat","Cat","Cat"), 
                    Food=c("Wet","Wet","Dry","Wet",NA)) 
crosstabs(data=petfood, na.action=na.exclude) 
# Produces the following output: 
# Call: 
# crosstabs(data = petfood, na.action = na.exclude) 
# 4 cases in table 
# Dropping 1 cases because of missing values 
# +----------+ 
# |N         | 
# |N/RowTotal| 
# |N/ColTotal| 
# |N/Total   | 
# +----------+ 
# Pet    |Food 
#        |Dry    |Wet    |RowTotl| 
# -------+-------+-------+-------+ 
# Cat    |1      |1      |2      | 
#        |0.50   |0.50   |0.5    | 
#        |1.00   |0.33   |       | 
#        |0.25   |0.25   |       | 
# -------+-------+-------+-------+ 
# Dog    |0      |2      |2      | 
#        |0.00   |1.00   |0.5    | 
#        |0.00   |0.67   |       | 
#        |0.00   |0.50   |       | 
# -------+-------+-------+-------+ 
# ColTotl|1      |3      |4      | 
#        |0.25   |0.75   |       | 
# -------+-------+-------+-------+ 
# Test for independence of all factors 
#         Chi^2 = 1.333333 d.f.= 1 (p=0.2482131) 
#         Yates' correction not used 
#         Some expected values are less than 5, don't trust stated p-value