This function requires the bigdata library section to be loaded.
bd.remove.missing(data, columns, methods="drop", replacement.values=0, key.columns=character(0))
bdFrame, or a data.frame.
all is used.
none (no change),
dropRows or drop (drop rows where this column contains a missing value),
generateFromDistribution or distribution (replace NA with a value selected from distribution),
replaceWithMean or mean (replace NA with mean),
replaceWithConstant or constant (replace NA with a value from replacement.values),
lastObservation or last (replace NA with the last value from the row with the same value in the column given by key.columns).
replaceWithConstant method.
lastObservation method. These should be factor columns.
bdFrame or data.frame, of the same type as data.
The Missing Values component supports five different methods for dealing with missing values in your data set:
Drop Rows
Generate from Distribution
Replace with Mean
Replace with Constant
Last Observation Carried Forward
key.columns argument.
If the key column is not given or is an empty string, the Last Observation Carried Forward method replaces a missing value with the last non-missing value in the same column.
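The Last Observation Carried Forward behavior described above can be illustrated with a minimal base R sketch. This is not the bigdata implementation: the helper name locf and the sample vectors are hypothetical, and the sketch assumes that a leading missing value, which has no earlier observation to copy, is left as NA.

```r
# Hypothetical helper (not part of the bigdata library): carry the last
# non-missing value forward within a single vector.
locf <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# No key column: fill from the last non-missing value in the same column.
no.key <- locf(c(1, NA, 3, NA, NA))

# With a key column: fill only from earlier rows that share the same key.
key <- factor(c("a", "b", "a", "b", "a"))
value <- c(1, 2, NA, NA, 5)
with.key <- ave(value, key, FUN = locf)
```

Here ave() applies locf separately within each level of key, so a missing value in a row with key "a" is filled only from an earlier "a" row.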
## Drop Rows
bd.remove.missing(data.frame(c(1:10, NA)), methods="dropRows")
bd.remove.missing(data.frame(c(1:10, NA)), methods="drop")

## Replace with constant
bd.remove.missing(data.frame(c(1:10, NA)), methods="replaceWithConstant", replacement.values="2")
bd.remove.missing(data.frame(c("A","B", NA)), methods="constant", replacement.values="MissingData")

## Replace with generated value
bd.remove.missing(data.frame(c(1:10, NA)), methods="generateFromDistribution")
bd.remove.missing(data.frame(c("A","B", NA)), methods="dist")
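For comparison, the drop, mean, and constant methods can be approximated on an ordinary data.frame in base R. This is only a rough sketch of the ideas, not how bd.remove.missing is implemented; the column name x and the variable names are assumptions made for illustration.

```r
# Illustrative data: one numeric column with a single missing value.
df <- data.frame(x = c(1:10, NA))

# Drop rows: keep only rows with no missing values.
dropped <- df[complete.cases(df), , drop = FALSE]

# Replace with mean: fill NA with the mean of the non-missing values.
mean.filled <- df
mean.filled$x[is.na(mean.filled$x)] <- mean(df$x, na.rm = TRUE)

# Replace with constant: fill NA with a fixed value.
const.filled <- df
const.filled$x[is.na(const.filled$x)] <- 2
```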