Split a Dataset by Factors and Apply a Function to the Parts

DESCRIPTION:

by.data.frame takes a data frame and a factor or list of factors (the indices), each of which must have one entry for each row (observation) of the data frame. For each unique combination of factor values, it extracts the rows whose indices have that combination and calls the function you supply with those rows of the data frame as its first argument.

USAGE:

by(x, INDICES, FUN, ...) 

REQUIRED ARGUMENTS:

x
A data frame or a bdFrame. Currently any x will be converted to a data.frame or a bdFrame, but in the future there may be special methods for various classes of data.
INDICES
A factor or list of several factors. The length of each factor should be the same as the number of rows of x. The elements of the categories define the position in a multi-way array corresponding to each x observation. Missing values (NAs) are allowed. The names of INDICES are used as the names of the dimnames of the result. If a vector is given, it will be treated as a list with one unnamed component.
FUN
A function whose first argument is a data frame. FUN will be called once for each row subset of x determined by INDICES.

OPTIONAL ARGUMENTS:

...
All other arguments will be passed to FUN each time it is called.
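
As a minimal sketch of how extra arguments reach FUN, the following uses a small made-up data frame (the names scores, group, score and group_mean are hypothetical) and forwards na.rm through "...". It is written against the R implementation of by(); other S dialects may differ in detail.

```r
# Hypothetical data: six observations in two groups, one missing value.
scores <- data.frame(
  group = factor(c("a", "a", "a", "b", "b", "b")),
  score = c(1, 2, NA, 4, 5, 6)
)

# FUN receives each row subset as a data frame; na.rm arrives via '...'.
group_mean <- function(d, na.rm = FALSE) mean(d$score, na.rm = na.rm)

# na.rm = TRUE is not an argument of by() itself, so it is passed on to
# group_mean each time it is called.
res <- by(scores, list(Group = scores$group), group_mean, na.rm = TRUE)
res
```

Without na.rm = TRUE the mean for group "a" would be NA, because the missing score falls in that group.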

VALUE:

An object of class "by" is returned. This consists of an array of mode "list" with one dimension for each index in INDICES, the extent of each dimension being the number of levels in that index. The dimnames of the object give the levels of the indices, and the names of the dimnames give the names of the indices. If the list given as INDICES has no names, by() will try to construct reasonable names. If there are no observations corresponding to some elements of the array, those elements will have the value NULL (FUN will not be called for those empty cells).

This object is intended to be printed by print.by, the print method for objects of class "by". For each cell in the array, it prints the value of each index and then the value of the cell. It prints a separator line, a series of dashes by default, between the cells.
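
The structure described above can be inspected directly. The sketch below assumes the R implementation of by() and uses a made-up data frame (trial, sex, dose and resp are hypothetical names) in which one combination of levels never occurs. The simplify = FALSE argument belongs to R's by() and is not listed in USAGE; it is used here so that the result stays an array of mode "list".

```r
# Hypothetical data: the combination (sex = "m", dose = "high") never occurs.
trial <- data.frame(
  sex  = factor(c("f", "f", "m", "f", "m")),
  dose = factor(c("low", "high", "low", "high", "low")),
  resp = c(1.0, 2.0, 3.0, 4.0, 5.0)
)

# Count the rows in each cell; keep the result as a list array.
cells <- by(trial, list(Sex = trial$sex, Dose = trial$dose),
            function(d) nrow(d), simplify = FALSE)

dim(cells)             # one dimension per index, one slot per level
dimnames(cells)        # levels of each index, named Sex and Dose
cells[["m", "high"]]   # NULL: FUN was not called for the empty cell
cells[["f", "high"]]   # 2: two rows fell into this cell
```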

DETAILS:

by() is a convenient, object-oriented version of tapply().
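
To illustrate the relationship, the sketch below computes the same group sums both ways on a made-up data frame (d, g and v are hypothetical names). It assumes the R versions of tapply() and by(): tapply() splits a single vector, whereas by() splits the rows of the whole data frame, so FUN can use several columns at once.

```r
# Hypothetical data frame with one grouping factor.
d <- data.frame(g = factor(c("x", "x", "y")), v = c(10, 20, 5))

# tapply() splits the vector d$v by the factor d$g.
t_res <- tapply(d$v, d$g, sum)

# by() hands FUN a data frame containing the rows of each group.
b_res <- by(d, d$g, function(part) sum(part$v))

# Both give 30 for group "x" and 5 for group "y".
as.vector(t_res)
as.vector(b_res)
```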

EXAMPLES:

by(kyphosis, kyphosis$Kyphosis, summary) 
# Gives the following output: 
# kyphosis$Kyphosis:absent 
#    Kyphosis         Age             Number         Start 
#  absent :64   Min.   :  1.00   Min.   :2.00   Min.   : 1.00 
#  present: 0   1st Qu.: 18.00   1st Qu.:3.00   1st Qu.:11.00 
#               Median : 79.00   Median :4.00   Median :14.00 
#               Mean   : 79.89   Mean   :3.75   Mean   :12.61 
#               3rd Qu.:131.00   3rd Qu.:5.00   3rd Qu.:16.00 
#               Max.   :206.00   Max.   :9.00   Max.   :18.00 
# ------------------------------------------------------------ 
# kyphosis$Kyphosis:present 
#    Kyphosis         Age             Number           Start 
#  absent : 0   Min.   : 15.00   Min.   : 3.000   Min.   : 1.000 
#  present:17   1st Qu.: 73.00   1st Qu.: 4.000   1st Qu.: 5.000 
#               Median :105.00   Median : 5.000   Median : 6.000 
#               Mean   : 97.82   Mean   : 5.176   Mean   : 7.294 
#               3rd Qu.:128.00   3rd Qu.: 6.000   3rd Qu.:12.000 
#               Max.   :157.00   Max.   :10.000   Max.   :14.000 
by(kyphosis, list(Kyphosis = kyphosis$Kyphosis, Older = kyphosis$Age > 105),
   function(data) lm(Number ~ Start, data = data))
# Gives the following output: 
# Kyphosis:absent 
# Older:FALSE 
# Call: 
# lm(formula = Number ~ Start, data = data) 
# Coefficients: 
#  (Intercept)       Start 
#     4.885736 -0.08764492 
# Degrees of freedom: 39 total; 37 residual 
# Residual standard error: 1.261852 
# ------------------------------------------------------------ 
# Kyphosis:present 
# Older:FALSE 
# Call: 
# lm(formula = Number ~ Start, data = data) 
# Coefficients: 
#  (Intercept)      Start 
#     6.371257 -0.1191617 
# Degrees of freedom: 9 total; 7 residual 
# Residual standard error: 1.170313 
# ------------------------------------------------------------ 
# Kyphosis:absent 
# Older:TRUE 
#  ...