by.data.frame takes a data frame and a list of indices, each of
which should have one entry for each row (observation) in the data frame.
For each unique combination of index values, it extracts the rows of the
data frame whose indices have that combination and calls the function of
your choice (FUN) with that subset of rows as its argument.
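The split-then-apply behaviour described above can be sketched in GNU R, whose base package also provides by(). The data frame df is a hypothetical example. The split()/sapply() version only approximates the idea and is not how by() is actually implemented.

```r
# Hypothetical toy data: one grouping factor g and one measurement v
df <- data.frame(
  g = factor(c("a", "b", "a", "b")),
  v = c(1, 2, 3, 4)
)

# Manual version: group the row numbers by g, then call the function
# on the corresponding subset of rows of df.
row_groups <- split(seq_len(nrow(df)), df$g)
manual <- sapply(row_groups, function(rows) mean(df[rows, , drop = FALSE]$v))

# The same computation with by(): FUN receives each data-frame subset.
res <- by(df, df$g, function(d) mean(d$v))

print(manual)  # a = 2, b = 3
print(res)
```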
by(x, INDICES, FUN, ...)
x is a data frame or bdFrame. Currently any x will be converted to a
data.frame or a bdFrame, but in the future there may be special methods
for various classes of data.
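The conversion of x can be illustrated in GNU R, where the default by() method coerces its input with as.data.frame() before splitting. This is a sketch under that assumption: bdFrame is specific to S-PLUS and is not available in R, and the matrix m and grouping vector grp are hypothetical.

```r
# Hypothetical numeric matrix with two named columns
m <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("p", "q")))
grp <- c("u", "u", "w")

# by() is called on a matrix, but each subset passed to FUN
# has been converted to a data frame.
res <- by(m, grp, function(d) class(d))
print(res)
```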
INDICES is a list of factors (categories), each with one entry for each
row of x. Together, the elements of the categories define the position in
a multi-way array corresponding to each observation of x. Missing values
(NAs) are allowed. The names of INDICES are used as the names of the
dimnames of the result. If a vector is given, it is treated as a list with
one unnamed component.
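A short GNU R sketch of the two ways of supplying INDICES (the data frame df and the component name group are hypothetical). A bare vector gives the same cell values as a one-component list; only the names of the dimnames differ. In R, the unnamed case gets a name made up from the INDICES expression.

```r
# Hypothetical toy data
df <- data.frame(g = c("a", "b", "a"), v = c(10, 20, 30))

# A bare vector as INDICES ...
r_vec <- by(df, df$g, function(d) sum(d$v))
# ... gives the same cells as a list with one component.
r_list <- by(df, list(group = df$g), function(d) sum(d$v))

print(names(dimnames(r_vec)))   # a name made up from the INDICES expression
print(names(dimnames(r_list)))  # "group", taken from the list
```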
FUN is the function to apply. It is called once for each row subset of x
determined by INDICES. Any further arguments (...) are passed to FUN each
time it is called.
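Forwarding of extra arguments through ... can be seen in GNU R with a small hypothetical data frame containing a missing value. Here na.rm is not an argument of by() itself; it is simply passed on to colMeans() for every subset.

```r
# Hypothetical data with a missing value in group "a"
df <- data.frame(g = c("a", "a", "b", "b"), v = c(1, NA, 3, 4))

# na.rm = TRUE is forwarded through ... to colMeans() for each subset
with_narm <- by(df["v"], df$g, colMeans, na.rm = TRUE)
without   <- by(df["v"], df$g, colMeans)

print(with_narm)  # means 1 (a) and 3.5 (b)
print(without)    # NA for a, because the missing value is kept
```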
The result is an object of class by: an array with one dimension for each
component of INDICES, the extent of each dimension being the number of
levels in that index. The dimnames of the object give the levels of the
indices, and the names of the dimnames give the names of the indices.
If the list given as INDICES has no names, by() will try to make up
reasonable names.
If there are no observations corresponding to some elements of the array,
those elements will have the value NULL (FUN will not be called for those
empty cells).
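The shape of the result can be checked in GNU R. In this hypothetical two-factor example the combination g = "b", h = "y" has no rows, so that cell is NULL. FUN returns two numbers per cell, so R keeps the result as a list-array rather than simplifying it.

```r
# Hypothetical data: the combination g = "b", h = "y" never occurs
df <- data.frame(
  g = factor(c("a", "a", "b", "b")),
  h = factor(c("x", "y", "x", "x")),
  v = c(1, 2, 3, 4)
)

# range() returns two values per cell, so the cells stay list elements
res <- by(df, list(g = df$g, h = df$h), function(d) range(d$v))

print(dim(res))             # one dimension per index: 2 levels x 2 levels
print(dimnames(res))        # levels of g and h, named "g" and "h"
print(res[["b", "x"]])      # 3 4
print(is.null(res[["b", "y"]]))  # TRUE: empty cell, FUN was not called
```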
The print method for objects of class by is print.by. For each cell in the
array, it prints the value of each index and then the value of the cell,
with a separator line (a series of dashes by default) between cells.
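The printed layout can be captured in GNU R for inspection. The data and the component name group are hypothetical, and R's exact header format (for example the spacing after the colon) may differ slightly from the S-PLUS output shown in the examples.

```r
# Hypothetical two-group data
df <- data.frame(g = c("a", "b"), v = c(1, 2))
res <- by(df, list(group = df$g), function(d) d$v * 10)

# print() dispatches to print.by: one header line per index, then the
# cell value, with a dashed separator line between cells.
out <- capture.output(print(res))
writeLines(out)
```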
by() is a convenient, object-oriented version of tapply().
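The relationship to tapply() can be sketched in GNU R with a hypothetical weighted mean, which needs two columns at once. tapply() works on a single vector, so here it is applied to row numbers instead; by() hands FUN the whole data-frame subset directly.

```r
# Hypothetical data: values v with weights w, grouped by g
df <- data.frame(g = c("a", "b", "a"), v = c(2, 5, 4), w = c(1, 1, 3))

# tapply() over row numbers, indexing both columns by hand ...
t_res <- tapply(seq_len(nrow(df)), df$g,
                function(i) weighted.mean(df$v[i], df$w[i]))
# ... versus by(), where FUN receives each data-frame subset.
b_res <- by(df, df$g, function(d) weighted.mean(d$v, d$w))

print(t_res)  # a = 3.5, b = 5
print(b_res)
```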
by(kyphosis, kyphosis$Kyphosis, summary)
# Gives the following output:
# kyphosis$Kyphosis:absent
#  Kyphosis       Age              Number          Start
#  absent :64   Min.   :  1.00   Min.   :2.00   Min.   : 1.00
#  present: 0   1st Qu.: 18.00   1st Qu.:3.00   1st Qu.:11.00
#               Median : 79.00   Median :4.00   Median :14.00
#               Mean   : 79.89   Mean   :3.75   Mean   :12.61
#               3rd Qu.:131.00   3rd Qu.:5.00   3rd Qu.:16.00
#               Max.   :206.00   Max.   :9.00   Max.   :18.00
# ------------------------------------------------------------
# kyphosis$Kyphosis:present
#  Kyphosis       Age              Number           Start
#  absent : 0   Min.   : 15.00   Min.   : 3.000   Min.   : 1.000
#  present:17   1st Qu.: 73.00   1st Qu.: 4.000   1st Qu.: 5.000
#               Median :105.00   Median : 5.000   Median : 6.000
#               Mean   : 97.82   Mean   : 5.176   Mean   : 7.294
#               3rd Qu.:128.00   3rd Qu.: 6.000   3rd Qu.:12.000
#               Max.   :157.00   Max.   :10.000   Max.   :14.000

by(kyphosis, list(Kyphosis=kyphosis$Kyphosis, Older=kyphosis$Age>105),
   function(data) lm(Number ~ Start, data=data))
# Gives the following output:
# Kyphosis:absent
# Older:FALSE
# Call:
# lm(formula = Number ~ Start, data = data)
# Coefficients:
#  (Intercept)       Start
#     4.885736 -0.08764492
# Degrees of freedom: 39 total; 37 residual
# Residual standard error: 1.261852
# ------------------------------------------------------------
# Kyphosis:present
# Older:FALSE
# Call:
# lm(formula = Number ~ Start, data = data)
# Coefficients:
#  (Intercept)      Start
#     6.371257 -0.1191617
# Degrees of freedom: 9 total; 7 residual
# Residual standard error: 1.170313
# ------------------------------------------------------------
# Kyphosis:absent
# Older:TRUE
# ...