Split Rows

DESCRIPTION:

Split a data set into two data sets according to whether each row satisfies an expression.

This function requires the bigdata library section to be loaded.

USAGE:

bd.split(data, expr, row.language=T)

REQUIRED ARGUMENTS:

data
the input data set, or a list of data sets. Each data set is a bdFrame or a data.frame.
expr
an expression, evaluated for each row, that determines whether the row is selected.

If the row.language argument is F, expr is evaluated as an S-PLUS expression. In this case, the expression must not perform any big data operations; if it does, an error is generated.

OPTIONAL ARGUMENTS:

row.language
if TRUE, evaluate expr using the row-oriented expression language. If FALSE, evaluate expr as a general S-PLUS expression.

VALUE:

A list of two data sets. The first contains all of the input rows that satisfy expr, and the second contains the remaining rows. Both data sets are bdFrame or data.frame objects, of the same type as the first input data set.

DETAILS:

This function decides whether each row goes to the first or the second output data set by evaluating expr either in S-PLUS or in the row-oriented expression language. Expressions in the expression language are written in terms of the input column names, combined with operators. An S-PLUS expression must work on vectors, because the columns referenced by name are passed to it as vectors. In both cases, the result should be logical: TRUE for rows that belong in the first output data set and FALSE for rows that belong in the second.
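The splitting rule can be modeled outside S-PLUS. The Python sketch below is not the bigdata implementation and does not call any S-PLUS API. It assumes a data set is a dict that maps column names to equal-length lists, and it mimics the row.language=F case, where the predicate receives whole columns as vectors and returns one logical value per row. The function name split_rows and the toy column values are hypothetical; they are not taken from fuel.frame.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a data set is a mapping from column name to a column vector.
DataSet = Dict[str, list]


def split_rows(data: DataSet,
               predicate: Callable[[DataSet], List[bool]]) -> Tuple[DataSet, DataSet]:
    """Hypothetical model of bd.split for a single input data set."""
    # Evaluate the predicate once on the column vectors, the way a
    # vectorized S-PLUS expression is applied when row.language=F.
    keep = predicate(data)
    n_rows = len(next(iter(data.values()), []))
    if len(keep) != n_rows:
        raise ValueError("predicate must return one logical value per row")
    # First output: rows where the predicate is True; second: all other rows.
    selected = {name: [v for v, k in zip(col, keep) if k] for name, col in data.items()}
    remaining = {name: [v for v, k in zip(col, keep) if not k] for name, col in data.items()}
    return selected, remaining


# Toy data, analogous to splitting on "Weight>3000".
cars = {"Weight": [2560, 3480, 2905, 3185], "Mileage": [33, 20, 26, 23]}
heavy, light = split_rows(cars, lambda d: [w > 3000 for w in d["Weight"]])
# heavy == {"Weight": [3480, 3185], "Mileage": [20, 23]}
# light == {"Weight": [2560, 2905], "Mileage": [33, 26]}
```

Every input row lands in exactly one of the two outputs, and all columns are carried along, which matches the VALUE description.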

If data specifies more than one input data set, the columns of the input data sets are referenced by adding "inN$" to the beginning of the column name. For example, the column named "Weight" in the first input data set is referenced as "in1$Weight", and the column "X" in the third input data set is referenced as "in3$X". The second and subsequent inputs are used only to evaluate expr; their rows are not copied to the output.
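The "inN$" naming rule and the fact that only the first input is split can also be sketched in Python. This is a hypothetical model, not the bigdata implementation: qualify_columns and split_rows_multi are illustrative names, and the help text does not say how rows of the extra inputs are matched to rows of the first, so the sketch assumes they are matched by position and have the same number of rows.

```python
from typing import Callable, Dict, List, Sequence, Tuple

DataSet = Dict[str, list]


def qualify_columns(inputs: Sequence[DataSet]) -> Dict[str, list]:
    """Name each column "inN$<column>", where N is the 1-based input position."""
    return {f"in{i}${name}": col
            for i, data in enumerate(inputs, start=1)
            for name, col in data.items()}


def split_rows_multi(inputs: Sequence[DataSet],
                     predicate: Callable[[Dict[str, list]], List[bool]]
                     ) -> Tuple[DataSet, DataSet]:
    """Hypothetical model: evaluate on all inputs, but split only the first."""
    n_rows = len(next(iter(inputs[0].values()), []))
    # ASSUMPTION: extra inputs are matched to the first input by row position.
    for data in inputs[1:]:
        for col in data.values():
            if len(col) != n_rows:
                raise ValueError("this sketch assumes all inputs have the same number of rows")
    keep = predicate(qualify_columns(inputs))
    first = inputs[0]
    # Only columns of the first input appear in either output.
    selected = {n: [v for v, k in zip(c, keep) if k] for n, c in first.items()}
    remaining = {n: [v for v, k in zip(c, keep) if not k] for n, c in first.items()}
    return selected, remaining


# Toy data: compare each car's weight with a per-row limit from a second input.
cars = {"Weight": [2560, 3480, 2905]}
limits = {"MaxWeight": [3000, 3500, 2800]}
over, within = split_rows_multi(
    [cars, limits],
    lambda ns: [w > m for w, m in zip(ns["in1$Weight"], ns["in2$MaxWeight"])])
# over == {"Weight": [2905]}, within == {"Weight": [2560, 3480]}
```

The MaxWeight column is visible to the predicate as "in2$MaxWeight" but does not appear in either output, mirroring the rule that rows of the second and later inputs are not copied.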

EXAMPLES:

## Return one data set with the rows where Weight>3000,
## and a second data set with the remaining rows.
bd.split(fuel.frame, "Weight>3000")