This function requires the bigdata library section to be loaded.
bd.remove.missing(data, columns, methods="drop",
replacement.values=0,
key.columns=character(0))
data: a bdFrame or a data.frame.
columns: the columns to process. If not given, all columns are used.
methods: the method used to handle missing values, one of:
  none (no change)
  dropRows or drop (drop rows where this column contains a missing value)
  generateFromDistribution or distribution (replace NA with a value selected from the distribution)
  replaceWithMean or mean (replace NA with the mean)
  replaceWithConstant or constant (replace NA with a value from replacement.values)
  lastObservation or last (replace NA with the last value from a row with the same value in the column given by key.columns)
replacement.values: values used by the replaceWithConstant method.
key.columns: key columns used by the lastObservation method. These should be factor columns.
Returns a bdFrame or data.frame, of the same type as data.
The Missing Values component supports five different methods for dealing with missing values in your data set:
  Drop Rows
  Generate from Distribution
  Replace with Mean
  Replace with Constant
  Last Observation Carried Forward (uses the key.columns argument)
If the key column is not given or is an empty string, then this option replaces a
missing value with the last non-missing value in the same column.
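The last-observation behaviour described above can be sketched in base R. This is an illustration of the semantics only, not the bigdata implementation: the helpers locf and locfByKey are hypothetical names, and the source does not say what happens to a missing value that has no earlier observation, so the sketch assumes it is left as NA.

```r
# Hypothetical helper (not part of the bigdata library): carry the last
# non-missing value forward within a single vector.
locf <- function(x) {
  for (i in seq_along(x)) {
    if (i > 1 && is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# Hypothetical helper: carry values forward only among rows that share
# the same key value.
locfByKey <- function(x, key) {
  key <- as.character(key)
  for (k in unique(key)) {
    idx <- which(key == k)
    x[idx] <- locf(x[idx])
  }
  x
}

# No key column: fill from the previous non-missing value in the same column.
noKey <- locf(c(1, NA, NA, 4, NA))

# With a key column: row 5 (key "a") is filled from row 3 (key "a");
# rows 2 and 4 (key "b") have no earlier observation and stay NA (assumption).
value <- c(10, NA, 20, NA, NA)
key <- factor(c("a", "b", "a", "b", "a"))
withKey <- locfByKey(value, key)
```

Grouping by key keeps a value from one group from leaking into another, which is why a key column matters when rows from several groups are interleaved.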
## Drop Rows
bd.remove.missing(data.frame(c(1:10, NA)), methods="dropRows")
bd.remove.missing(data.frame(c(1:10, NA)), methods="drop")
## Replace with constant
bd.remove.missing(data.frame(c(1:10, NA)), methods="replaceWithConstant", replacement.values=2)
bd.remove.missing(data.frame(c("A","B", NA)), methods="constant", replacement.values="MissingData")
## Replace with generated value
bd.remove.missing(data.frame(c(1:10, NA)), methods="generateFromDistribution")
bd.remove.missing(data.frame(c("A","B", NA)), methods="distribution")
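
For readers without the bigdata library loaded, the intended effect of the drop, mean and constant methods can be approximated in base R. This is a sketch on an ordinary data.frame with a hypothetical column name v. It is not the bd.remove.missing implementation and does not handle bdFrame objects.

```r
x <- data.frame(v = c(1:10, NA))

# drop: keep only rows where v is not missing
dropped <- x[!is.na(x$v), , drop = FALSE]

# mean: replace NA with the mean of the non-missing values
meanFilled <- x
meanFilled$v[is.na(meanFilled$v)] <- mean(x$v, na.rm = TRUE)

# constant: replace NA with a fixed value (0, the documented default
# of replacement.values)
constFilled <- x
constFilled$v[is.na(constFilled$v)] <- 0
```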