This function requires the bigdata library section to be loaded.
bd.create.columns(data, exprs, names.=character(0), types="numeric", string.column.width=integer(0), row.language=T, copy=T)
bdFrame(s) or data.frame(s).
The exprs argument is evaluated as an S-PLUS expression if the row.language argument is F. In this case, the S-PLUS expression cannot perform any big data operations, or an error is generated.
If this argument is shorter than exprs, generate column names of the form Col1, Col2, and so on. If a column name is the same as an input column name, this input column is replaced with the new calculated values.
"numeric"
(double values), "factor", "character", "date" (timeDate objects), or "logical". This argument defaults to "numeric". If this argument is shorter than exprs, its values are repeated.
bd.options("string.column.width"). If this argument is shorter than exprs, its values are repeated.
If TRUE, evaluate exprs using the row-oriented expression language. If FALSE, evaluate exprs as a general S-PLUS expression.
If TRUE, all of the columns from the first input are copied to the output. If a column name in the first input is the same as one of the calculated column names in names, this input column is replaced with the new calculated values. If this is FALSE, the first input columns are not copied.
bdFrame or data.frame of the same type as data[[1]], with additional columns whose values are computed with exprs.
This function calculates the values of one or more new columns by evaluating expressions in S-PLUS or in the row-oriented expression language. The expression language allows expressions and operators in terms of the input column names. When you use an S-PLUS expression, use one that works on vectors, because the columns referenced by the column names are passed to the S-PLUS expression as vectors.
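As a minimal sketch of the two evaluation modes, the calls below add one column with the expression language and one with a vectorized S-PLUS expression. The output name SqrtWeight is chosen only for illustration; fuel.frame and its Weight column are taken from the examples on this page.

```S
# Expression language (row.language=T, the default):
# the expression is written in terms of the input column names.
bd.create.columns(fuel.frame, "-Weight", "MinusWeight")

# General S-PLUS expression (row.language=F): Weight is passed in as a
# vector, so the expression should use a vectorized function such as sqrt().
bd.create.columns(fuel.frame, "sqrt(Weight)", "SqrtWeight", row.language=F)
```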
If data specifies more than one input data set, the columns from the input data sets are referenced by adding "inN$" to the beginning of the column name. For example, the column named "Weight" in the first input data set would be referenced as "in1$Weight", and the column "X" in the third input data set would be referenced as "in3$X".
If there are multiple input data sets, the output will have the same number of rows as the first input data set.
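The "inN$" prefix can be illustrated with a hedged sketch. It assumes that several inputs are supplied as a list (suggested by the reference to data[[1]] in the description of the return value); the object other and the output name WeightDiff are hypothetical and exist only for this example.

```S
# Hypothetical second input: a plain copy of fuel.frame, for illustration only.
other <- fuel.frame

# Assumption: multiple input data sets are passed as a list.
# in1$Weight refers to Weight in the first input, in2$Weight to the second.
bd.create.columns(list(fuel.frame, other), "in1$Weight - in2$Weight", "WeightDiff")
```

The result has as many rows as fuel.frame, the first input.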
The case where types contains "logical" is a special case. A big data object cannot represent logical values directly: they are represented as numeric values where numeric NA is treated as a logical NA, zero is treated as FALSE, and any other value is TRUE. If a column is created with type "logical", the numeric value is converted to a logical value. This can also be used to store values computed as logical expressions in the expression language.
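A short sketch of the numeric-to-logical conversion described above, using a numeric expression rather than a comparison. The output name Not3000 is made up for this example.

```S
# "Weight-3000" yields a numeric value for each row. With type "logical",
# zero is treated as F, any other value as T, and a numeric NA as a logical NA.
bd.create.columns(fuel.frame, "Weight-3000", "Not3000", "logical")
```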
bd.create.columns does not allow one calculated column to be used in the formula for another calculated column.
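One way to work around this restriction, sketched here under the assumption that the default copy=T keeps the earlier columns in the result, is to call the function twice and feed the first result into the second call. The names WeightK and WeightKSq are hypothetical.

```S
# Step 1: add a calculated column WeightK to a copy of fuel.frame.
step1 <- bd.create.columns(fuel.frame, "Weight/1000", "WeightK")

# Step 2: WeightK is now an ordinary input column of step1,
# so it can be used in the expression for a further calculated column.
step2 <- bd.create.columns(step1, "WeightK*WeightK", "WeightKSq")
```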
# Create a new column MinusWeight.
bd.create.columns(fuel.frame, "-Weight", "MinusWeight")
# Replace the column "Weight" with the negative of the weight.
bd.create.columns(fuel.frame, "-Weight", "Weight")
# Create a logical column with T values in heavy rows.
bd.create.columns(fuel.frame, "Weight>3000", "Heavy", "logical")