This function requires the bigdata library section to be loaded.
bd.aggregate(data, columns=NULL, by.columns, methods="mean", names.=NULL, sort=T)
data: a bdFrame or data.frame.
columns
.
methods: If this is shorter than columns, then the values are repeated to produce an equal length vector.
column
and
methods
.
sort: If FALSE, do not sort the input data by by.columns first.
A bdFrame or data.frame of the same type as data. This contains one row for each block in the input defined by by.columns. The result contains all of the columns in by.columns, as well as all of the columns defined by the names. argument.
Use this function to apply any of a fixed set of aggregation functions to one or more columns. The aggregation functions are applied to multiple data blocks within the input data, as defined by by.columns. Each unique combination of values in the by.columns columns that appears in the data defines one data block. Normally, these columns contain strings or factors with a limited number of unique values, but this function works with any column type. For example, if one of the columns in by.columns contains numeric data with different values for each row, then the input data is divided into blocks with one row each.
If sort is TRUE, then the input data is first sorted by the columns in by.columns, so each block is guaranteed to have a unique combination of values for these columns.
If sort is FALSE, then the input data is not sorted, and the blocks are determined by scanning through the rows in order. When any of the by.columns values changes, this signals the beginning of another block. If the data is already sorted, specify sort as FALSE to avoid an unnecessary sort.
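The effect of sort can be sketched with a small hypothetical data frame. The name blockDemo and its columns g and x are assumptions made for illustration and do not come from this help file. Its g values are deliberately not grouped together. The comments state what the description above implies. No output is shown because the calls were not run here.

```s
## Hypothetical data: the "a" rows are not contiguous.
blockDemo <- data.frame(g=c("a", "a", "b", "a"), x=c(1, 2, 3, 4))

## sort=TRUE (the default): rows are sorted by g first, so there
## should be one block per unique value of g ("a" and "b").
bd.aggregate(blockDemo, columns="x", by.columns="g",
    methods="sum", sort=TRUE)

## sort=FALSE: blocks are found by scanning rows in order, so the
## description above implies three blocks: "a", "b", and "a" again.
bd.aggregate(blockDemo, columns="x", by.columns="g",
    methods="sum", sort=FALSE)
```

If the description holds, the second call returns more than one row with the same g value, which is why sort=FALSE is best reserved for data that is already sorted by by.columns.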
Within each data block defined by by.columns and sort, apply aggregation functions to particular data columns, as specified by columns, methods, and names..
The argument columns specifies a set of input columns to be processed. A given column can appear more than once in this argument, to calculate multiple aggregation functions on it.
The argument methods specifies, for each element of columns, the aggregation function that should be calculated for that column. There is a fixed set of possible aggregation functions, described below.
The argument names. specifies the output column names used for each of the computed aggregate function values. If you do not specify names., then default names are created by concatenating the input column name with the aggregation function. For example, the column name x.mean results from an input column name x and an aggregation function "mean".
The possible aggregation functions that can appear in the value of the methods argument are as follows:
"sum": Compute the sum of the column values.
The numeric aggregation functions ("sum", "mean", and so on) are only well-defined if the input column is numeric. If the column is non-numeric, then the computed value is undefined.
The numeric functions also handle missing values specially: for example, "mean" computes the mean of the non-missing values only. It computes NA only if all of the column values in a block are NA.
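The missing-value rule can be sketched as follows. The data frame naDemo and its columns g and x are hypothetical names introduced for this illustration. The comments describe what the rule above implies rather than output from an actual run.

```s
## Hypothetical data: block "a" has one missing value,
## block "b" has only missing values.
naDemo <- data.frame(g=c("a", "a", "b"), x=c(1, NA, NA))

## Per the description above, x.mean should be 1 for block "a"
## (the NA is skipped) and NA for block "b" (all values missing).
bd.aggregate(naDemo, columns="x", by.columns="g", methods="mean")
```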
## Divide fuel.frame into blocks defined by the Type column,
## and for each block compute minWeight (the minimum value
## of the Weight column) and blockSize (the number of rows
## in the block).
bd.aggregate(fuel.frame,
    columns=c("Weight", "Weight"),
    by.columns="Type",
    methods=c("min", "count"),
    names=c("minWeight", "blockSize"))
## Compute the min, max, and mean of each of the first four
## columns of fuel.frame, within the blocks defined by
## the Type column. The output column names default
## to "Weight.min", "Weight.max", etc.
bd.aggregate(fuel.frame,
    columns=rep(1:4, each=3),
    by.columns="Type",
    methods=c("min", "max", "mean"))