This function requires the bigdata library section to be loaded.
bd.by.window(data, window, FUN, args=NULL, offset=0, drop.incomplete=F, output=T)
The data argument is a bdFrame or data.frame.
The FUN argument is an S-PLUS function that is called to process a data frame. This function itself cannot perform any big data operations; if it does, an error is generated.
If args is NULL, then the FUN function should have only one argument, the input data block. If args is a list, then its elements are passed as additional arguments to the FUN function. If the list elements have names, these must match argument names for the FUN function.
If offset is 0 (the default), blocks are processed as if offset=window, so each block directly follows the previous one. If offset is greater than window, some rows will be skipped between blocks.
If drop.incomplete is TRUE, this will only process blocks with window rows. If this is FALSE, blocks at the end of the data set will be processed, even if they have fewer than window rows.
If output is TRUE, the result is built from the data frames returned by the FUN function. This could be set to FALSE to execute a function with side effects.
If the output argument is TRUE, this function returns a bdFrame or data.frame, of the same type as data, appending the data frames output by the FUN function. If the output argument is FALSE, this function returns NULL.
This function applies the S-PLUS function (FUN) to multiple data blocks within the input data, as defined by a moving window over the data rows. Each data block is converted to a data.frame and passed to the FUN function. If one of the data blocks is too large to fit in memory, an error will occur.
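The way window, offset, and drop.incomplete divide the rows into blocks can be sketched in plain S code. The helper window.blocks is hypothetical and is not part of the bigdata library. It only lists the first and last row of each block, and it assumes that offset=0 is treated as offset=window.

```r
## Hypothetical helper (not part of the bigdata library): list the
## first and last row of each block for a data set with n.rows rows.
window.blocks <- function(n.rows, window, offset = 0, drop.incomplete = F)
{
    # Assumption: offset=0 means each block directly follows the last.
    if(offset == 0) offset <- window
    # Each block starts offset rows after the previous block's start.
    first <- seq(1, n.rows, by = offset)
    # A block holds at most window rows and cannot run past the data.
    last <- pmin(first + window - 1, n.rows)
    if(drop.incomplete) {
        # Keep only the blocks that contain exactly window rows.
        full <- (last - first + 1) == window
        first <- first[full]
        last <- last[full]
    }
    data.frame(first = first, last = last)
}
```

Under these assumptions, window.blocks(12, 5) lists the blocks 1-5, 6-10, and 11-12, and window.blocks(12, 5, drop.incomplete=T) drops the short block 11-12.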
## For each distinct block of five rows in fuel.frame,
## calculate the mean of the Weight column.
bd.by.window(fuel.frame, 5,
    function(df) data.frame(meanWeight=mean(df$Weight)))

## For a moving window of five rows in fuel.frame,
## with each block adjusted by one row, including the
## short blocks at the end of the dataset, print
## the mean of the Weight column and the number of rows
## in the block.
bd.by.window(fuel.frame, 5,
    function(df)
        cat("mean=",mean(df$Weight),"nrow=",nrow(df),"\n"),
    offset=1, output=F)

## For a moving window of five rows in fuel.frame,
## if the mean of the Weight column is greater than
## the min.mean value passed in via the args argument,
## print the mean of the Weight column in the block.
bd.by.window(fuel.frame, 5,
    function(df, min.mean)
        if (mean(df$Weight)>min.mean)
            cat("mean=",mean(df$Weight),"\n"),
    output=F, args=list(min.mean=3000))