Create New Columns

DESCRIPTION:

Create one or more columns with contents calculated from the input columns.

This function requires the bigdata library section to be loaded.

USAGE:

bd.create.columns(data, exprs, names.=character(0), types="numeric",
                  string.column.width=integer(0), row.language=T,
                  copy=T)

REQUIRED ARGUMENTS:

data
input data set or list of data sets: bdFrame(s) or data.frame(s).
exprs
strings giving expressions for calculating the values of the new columns.

If the row.language argument is F, each exprs string is evaluated as an S-PLUS expression. In this case, the S-PLUS expression must not perform any big data operations; if it does, an error is generated.

OPTIONAL ARGUMENTS:

names.
column names for the new columns. If this vector is shorter than exprs, the missing column names are generated in the form Col1, Col2, and so on. If a column name is the same as an input column name, that input column is replaced with the new calculated values.
types
a character vector indicating the types of the new columns. This can contain any of the values "numeric" (double values), "factor", "character", "date" (timeDate objects), or "logical". This argument defaults to "numeric". If this argument is shorter than exprs, its values are repeated.
string.column.width
maximum number of characters that can be stored in new character columns. Storing a longer string truncates the string and generates a warning. This value is ignored for non-character columns. If this argument is not specified, the default value is taken from bd.options("string.column.width"). If this argument is shorter than exprs, its values are repeated.
row.language
if TRUE, evaluate exprs using the row-oriented expression language.
If FALSE, evaluate exprs as a general S-PLUS expression.
copy
if TRUE, all of the columns from the first input are copied to the output. If a column name in the first input is the same as one of the calculated column names in names., this input column is replaced with the new calculated values. If FALSE, the columns of the first input are not copied, and the output contains only the calculated columns.

VALUE:

A bdFrame or data.frame of the same type as data[[1]], with additional columns whose values are computed with exprs.

DETAILS:

This function calculates the values of one or more new columns by evaluating expressions in S-PLUS or in the row-oriented expression language. The expression language allows expressions and operators in terms of the input column names. When you use an S-PLUS expression, use one that works on vectors, because the columns referenced by the column names are passed to the S-PLUS expression as vectors.

If data specifies more than one input data set, the columns from the input data sets are referenced by adding "inN$" to the beginning of the column name. For example, the column named "Weight" in the first input data set would be referenced as "in1$Weight", and the column "X" in the third input data set would be referenced as "in3$X". If there are multiple input data sets, the output will have the same number of rows as the first input data set.

The case where types contains "logical" is a special case. A big data object cannot represent logical values directly: they are represented as numeric values where numeric NA is treated as a logical NA, zero is treated as FALSE and any other value is TRUE. If a column is created with type "logical", the numeric value is converted to a logical value. This can also be used to store values computed as logical expressions in the expression language.

bd.create.columns does not allow one calculated column to be used in the formula for another calculated column.

SEE ALSO:

EXAMPLES:

# Create a new column MinusWeight.
bd.create.columns(fuel.frame, "-Weight", "MinusWeight")
# Replace the column "Weight" with the negative of the weight.
bd.create.columns(fuel.frame, "-Weight", "Weight")
# Create a logical column with T values in heavy rows.
bd.create.columns(fuel.frame, "Weight>3000", "Heavy", "logical")
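# The examples below are sketches based only on the argument
# descriptions on this page; they have not been checked against a
# running S-PLUS session. New column names such as WeightK are
# made up for illustration.
# Create two columns in one call, giving a type for each.
bd.create.columns(fuel.frame, c("Weight/1000", "Weight>3000"),
                  c("WeightK", "Heavy"), c("numeric", "logical"))
# Evaluate a vectorized S-PLUS expression instead of the
# row-oriented expression language.
bd.create.columns(fuel.frame, "log(Weight)", "LogWeight", row.language=F)
# Reference columns from two inputs with the in1$ and in2$ prefixes.
# fuel.frame is passed twice only to show the syntax.
bd.create.columns(list(fuel.frame, fuel.frame),
                  "in1$Weight/in2$Mileage", "WtPerMile")
# One calculated column cannot be used in the formula for another
# in the same call. A possible workaround is to make two calls,
# so that the second call sees WeightK as an ordinary input column.
tmp <- bd.create.columns(fuel.frame, "Weight/1000", "WeightK")
bd.create.columns(tmp, "WeightK*Mileage", "WtMiles")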