Filter Rows

DESCRIPTION:

Selects or excludes the rows of a data set according to whether they satisfy an expression.

This function requires the bigdata library section to be loaded.

USAGE:

bd.filter.rows(data, expr, columns=NULL,
               include=T, row.language=T)

REQUIRED ARGUMENTS:

data
The input data set or list of data sets, bdFrame(s) or data.frame(s).
expr
An expression, which is evaluated to determine whether each row should be selected.

The expr argument is evaluated as an S-PLUS expression if the row.language argument is F. In this case, the S-PLUS expression cannot perform any big data operations; if it does, an error is generated.

OPTIONAL ARGUMENTS:

columns
Names or numbers of the input data set columns to be output for the selected rows. If NULL, all of the data set columns are output. If there are multiple input data sets, this refers only to the first one.
include
If TRUE, then only the selected rows are included in the output. If FALSE, then the selected rows are excluded from the output.
row.language
If TRUE, evaluate expr using the row-oriented expression language. If FALSE, evaluate expr as a general S-PLUS expression.

VALUE:

A bdFrame or data.frame of the same type as data[[1]], with rows selected by expr included or excluded.

DETAILS:

This function determines whether each row should be selected by evaluating expr either with the row-oriented expression language or as a general S-PLUS expression. The expression language allows expressions and operators in terms of the input column names. When you use an S-PLUS expression, it should be one that works on vectors, because the columns referenced by the column names are passed to the S-PLUS expression as vectors. The result in both cases should be a logical value.

If data specifies more than one input data set, then the columns from the input data sets are referenced by adding "inN$" to the beginning of the column name. For example, the column named "Weight" in the first input data set would be referenced as "in1$Weight", and the column "X" in the third input data set would be referenced as "in3$X". If there are multiple input data sets, then the second, third, and so on inputs are used to evaluate expr, but their rows are not copied to the output.

SEE ALSO:

EXAMPLES:

## Get only those rows where Weight>3000.
bd.filter.rows(fuel.frame, "Weight>3000")
## Remove rows where Type=="Van".
bd.filter.rows(fuel.frame, "Type=='Van'", include=F)
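## The remaining examples are illustrative sketches of the optional
## arguments described above; they are not part of the original examples.
## Output only the Weight and Mileage columns for rows where Weight>3000.
bd.filter.rows(fuel.frame, "Weight>3000", columns=c("Weight", "Mileage"))
## Evaluate the same filter as a general S-PLUS expression on column vectors.
## Assumption: expr is still supplied as a character string when row.language=F.
bd.filter.rows(fuel.frame, "Weight>3000", row.language=F)
## Reference a column of a second input with the "inN$" prefix.
## fuel.frame is passed twice purely to illustrate the prefix; only rows
## of the first input are copied to the output.
bd.filter.rows(list(fuel.frame, fuel.frame), "in2$Mileage>20")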