Apply Function to Data Blocks Defined by a Moving Window

DESCRIPTION:

Apply an arbitrary S-PLUS function to multiple data blocks defined by a moving window over the input dataset.

This function requires the bigdata library section to be loaded.

USAGE:

bd.by.window(data, window, FUN, args=NULL, offset=0,
             drop.incomplete=F, output=T)

REQUIRED ARGUMENTS:

data
input data set, a bdFrame or data.frame.
window
number of rows in each block to be processed.
FUN
an S-PLUS function that takes a data frame as its argument, plus any additional arguments supplied through args. Each input data block is converted to a data frame and passed to this function.

FUN itself cannot perform any big data operations; attempting to do so generates an error.

OPTIONAL ARGUMENTS:

args
list of additional arguments passed to the function. If this is NULL, then the FUN function should have only one argument, the input data block. If this is a list, then the elements are passed as additional arguments to the FUN function. If the list elements have names, these must match argument names for the FUN function.
offset
an integer that controls where each successive block begins. If this is greater than zero, it is the number of rows between the beginning of one block and the beginning of the next, so blocks overlap when offset is less than window. If this is less than or equal to zero, it is the same as specifying offset=window, so each block directly follows the previous one. If offset is greater than window, some rows are skipped between blocks.
drop.incomplete
if TRUE, only blocks with exactly window rows are processed. If FALSE, blocks at the end of the data set are processed even if they have fewer than window rows.
output
a logical value that determines whether this function collects the values computed by FUN. Set this to FALSE to execute a function only for its side effects.
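
The way window, offset, and drop.incomplete interact can be sketched in plain S code. The helper block.ranges below is hypothetical (it is not part of the bigdata library) and only lists which rows would fall in each block under the rules described above; it is an illustration, not the library's implementation.

```r
## Hypothetical helper, not part of the bigdata library:
## list the first and last row of each block for n input rows.
## Assumes n >= 1 and window >= 1.
block.ranges <- function(n, window, offset=0, drop.incomplete=F)
{
    ## offset <= 0 is treated the same as offset=window
    if(offset <= 0) offset <- window
    first <- seq(1, n, by=offset)
    ## blocks at the end of the data may be cut short
    last <- pmin(first + window - 1, n)
    if(drop.incomplete) {
        ## keep only blocks with exactly window rows
        full <- (last - first + 1) == window
        first <- first[full]
        last <- last[full]
    }
    data.frame(first=first, last=last)
}

block.ranges(12, 5)                      # rows 1-5, 6-10, 11-12
block.ranges(12, 5, drop.incomplete=T)   # rows 1-5, 6-10
block.ranges(12, 5, offset=7)            # rows 1-5, 8-12; rows 6-7 skipped
```

With offset=1 and drop.incomplete=F, a block starts at every row, so the last window-1 blocks are short.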

VALUE:

If the output argument is TRUE, this function returns a bdFrame or data.frame, of the same type as data, formed by appending the data frames returned by FUN for each block. If the output argument is FALSE, this function returns NULL.

DETAILS:

This function applies the S-PLUS function (FUN) to multiple data blocks within the input data, as defined by a moving window over the data rows. Each data block is converted to a data.frame and passed to FUN, so each block must fit in memory. If a data block is too large to fit in memory, an error occurs.
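
The processing described above can be imitated for an ordinary in-memory data frame. The function by.window.sketch below is a hypothetical illustration, not the bigdata implementation: it omits the args argument, assumes the input has at least one row, and assumes the per-block results are combined row-wise with rbind.

```r
## Hypothetical in-memory sketch, not the bigdata implementation:
## apply FUN to successive row blocks of an ordinary data frame.
by.window.sketch <- function(data, window, FUN, offset=0,
                             drop.incomplete=F, output=T)
{
    n <- nrow(data)
    ## offset <= 0 is treated the same as offset=window
    if(offset <= 0) offset <- window
    pieces <- list()
    for(first in seq(1, n, by=offset)) {
        last <- min(first + window - 1, n)
        ## optionally skip short blocks at the end of the data
        if(drop.incomplete && (last - first + 1) < window) next
        block <- data[first:last, , drop=F]
        value <- FUN(block)
        if(output) pieces[[length(pieces) + 1]] <- value
    }
    ## append the per-block data frames row-wise (assumed behavior)
    if(output) do.call("rbind", pieces) else invisible(NULL)
}

## Illustrative data: twelve rows in a single column x.
toy <- data.frame(x=1:12)
res <- by.window.sketch(toy, 5,
                        function(df)
                            data.frame(meanX=mean(df$x), rows=nrow(df)))
res$meanX   # 3.0 8.0 11.5
res$rows    # 5 5 2
```

Unlike this sketch, bd.by.window also accepts a bdFrame and holds only one block in memory at a time, as described above.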

EXAMPLES:

## For each distinct block of five rows in fuel.frame,
## calculate the mean of the Weight column.
bd.by.window(fuel.frame, 5,
             function(df)
                 data.frame(meanWeight=mean(df$Weight)))
## For a moving window of five rows in fuel.frame,
## with each block starting one row after the previous
## one, including the short blocks at the end of the
## dataset, print the mean of the Weight column and the
## number of rows in the block.
bd.by.window(fuel.frame, 5,
             function(df)
               cat("mean=",mean(df$Weight),"nrow=",nrow(df),"\n"),
             offset=1, output=F)
## For each distinct block of five rows in fuel.frame
## (offset is left at its default), if the mean of the
## Weight column is greater than the min.mean value
## passed in via the args argument, print the mean of
## the Weight column in the block.
bd.by.window(fuel.frame, 5,
             function(df, min.mean)
               if (mean(df$Weight)>min.mean)
                 cat("mean=",mean(df$Weight),"\n"),
             output=F, args=list(min.mean=3000))