This function requires the bigdata library section to be loaded.
bd.split.by.group(data, by.columns, sort=T, bigdata=is(x,"bdFrame"))
The argument data is a bdFrame or data.frame.
If sort is FALSE, do not sort the input data by by.columns first.
If bigdata is TRUE, returns a list of bdFrame objects. If FALSE, returns a list of data.frame objects. The default uses the type of x to determine which type of objects to return.
Returns a list of the data blocks defined by by.columns.
If the argument bigdata is TRUE, the list elements will be bdFrame objects; otherwise, they will be data.frame objects. The returned list has element names constructed from the contents of the by.columns values for each block.
This function divides the input data into blocks defined by the columns by.columns, and returns a list of all of these blocks. If bigdata is FALSE, the output list elements will be data.frame objects. In this case, if all of the data is too large to fit in memory, an error will occur.
Each unique combination of values in the columns by.columns that appears in the data defines one data block. Normally, these columns contain strings or factors with a limited number of unique values, but this function works with any column type. For example, if one of the columns in by.columns contains numeric data with a different value for each row, then the input data will be divided into blocks with one row each.
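The block definition above can be illustrated with the base function split, which is not the bigdata function and handles only in-memory data.frame objects. This is a minimal sketch written for R; the toy data and the names toy, blocks, and by.weight are made up for illustration and do not come from the bigdata library.

```r
# Hypothetical toy data, invented for this sketch.
toy <- data.frame(
  Type   = c("Small", "Van", "Small", "Van", "Sporty"),
  Weight = c(2560, 3690, 2345, 3735, 2885)
)

# One block per unique value of Type.
blocks <- split(toy, toy$Type)
print(names(blocks))
print(nrow(blocks[["Small"]]))

# Grouping on a numeric column whose values all differ
# produces one single-row block per input row.
by.weight <- split(toy, toy$Weight)
print(length(by.weight))
```

Here split names the blocks after the grouping values; the exact naming scheme used by bd.split.by.group may differ.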
If sort is TRUE, the input data is first sorted by the columns in by.columns, so each of the blocks is guaranteed to have a unique combination of values for these columns.
If sort is FALSE, the input data is not sorted, and the blocks are determined by scanning through the rows in order. When any of the by.columns values changes, this signals the beginning of another block.
Specify sort as FALSE when the data is already sorted to avoid an unnecessary sort.
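The difference between the two sort settings can be sketched by emulating the row scan in base R. The helper runBlocks below is hypothetical and is not part of the bigdata library; it only mimics the described behavior for a single grouping column on an in-memory data.frame, and the toy data is invented.

```r
# Hypothetical helper: start a new block whenever the value in
# column `col` differs from the value in the previous row.
runBlocks <- function(df, col) {
  key <- as.character(df[[col]])
  n <- length(key)
  if (n == 0) return(list())
  # TRUE for the first row and for every row whose key differs
  # from the row before it
  changed <- c(TRUE, key[-1] != key[-n])
  # Running count of changes gives a block number for each row
  split(df, cumsum(changed))
}

# Hypothetical toy data, invented for this sketch.
toy <- data.frame(
  Type   = c("Small", "Van", "Small", "Small", "Van"),
  Weight = c(2560, 3690, 2345, 2260, 3735)
)

# Scanning unsorted rows: the same Type value can start several blocks.
unsorted <- runBlocks(toy, "Type")
print(length(unsorted))

# Sorting first leaves exactly one block per unique Type value.
sorted <- runBlocks(toy[order(toy$Type), ], "Type")
print(length(sorted))
```

On the unsorted rows the scan yields four blocks, because "Small" and "Van" each appear in two separate runs; after sorting there are two blocks, one per value.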
## Divide the data into a list of blocks,
## divided by the values of "Type"
bd.split.by.group(fuel.frame, "Type")