This function requires the bigdata library section to be loaded.
bd.bin(data, columns=NULL, nbins=10, replace=T, suffix="bin", methods="range", ranges=NULL, k=5000)
data: a bdFrame or data.frame.
nbins: if an integer, sets the number of bins. If a character string, uses one of these methods: sturges, freedman, or scott; alternatively, enter the number of bins as a string.
replace: if TRUE, the new bin columns replace the existing columns. If FALSE, the new bin columns are appended to the dataset.
suffix: if replace is FALSE, this string is appended to the input columns' names.
methods: either "range" (bins defined by equal ranges) or "count" (bins defined by equal counts).
A bdFrame or data.frame of the same type as data, containing the specified bins.
This function changes continuous columns into categorical columns.
If no columns are specified, all continuous columns are binned.
The number of bins created can be specified directly or calculated by one of several methods (the Sturges, Freedman-Diaconis, and Scott methods are available).
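To make the three bin-count rules concrete, here is a minimal sketch in Python using the textbook formulas. It is not the bd.bin implementation: the function names are hypothetical, and the constants and quantile convention that bd.bin actually uses are assumptions here.

```python
import math
import statistics


def sturges_bins(values):
    """Sturges' rule: ceil(log2(n) + 1) bins."""
    return math.ceil(math.log2(len(values)) + 1)


def _bins_from_width(values, width):
    """Convert a bin width into a bin count over the data range."""
    if width <= 0:
        return 1  # degenerate data (no spread): fall back to a single bin
    return max(1, math.ceil((max(values) - min(values)) / width))


def scott_bins(values):
    """Scott's rule: bin width 3.49 * sd * n^(-1/3) (textbook constant)."""
    n = len(values)
    width = 3.49 * statistics.stdev(values) * n ** (-1 / 3)
    return _bins_from_width(values, width)


def freedman_diaconis_bins(values):
    """Freedman-Diaconis rule: bin width 2 * IQR * n^(-1/3)."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    width = 2 * (q3 - q1) * n ** (-1 / 3)
    return _bins_from_width(values, width)
```

Sturges' rule depends only on the number of observations, while the Scott and Freedman-Diaconis rules also depend on the spread of the data, so different columns of the same dataset can end up with different bin counts.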
The user can also control where the bin boundaries fall by setting the methods argument to create bins with equal ranges or equal counts.
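The difference between equal-range and equal-count bins can be sketched as follows. This is an illustration in Python, not the bd.bin code: the helper names are hypothetical, and the interval-closure and quantile-interpolation conventions are assumptions that may differ from what bd.bin does.

```python
def equal_range_edges(values, nbins):
    """Edges that split [min, max] into nbins intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / nbins
    return [lo + i * step for i in range(nbins + 1)]


def equal_count_edges(values, nbins):
    """Edges at evenly spaced quantiles, so each bin holds about the same count."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for i in range(nbins + 1):
        pos = i * (n - 1) / nbins          # fractional index into the sorted data
        lower = int(pos)
        upper = min(lower + 1, n - 1)
        frac = pos - lower
        # Linear interpolation between neighbouring order statistics
        edges.append(ordered[lower] + frac * (ordered[upper] - ordered[lower]))
    return edges


def assign_bin(value, edges):
    """Return the 0-based bin index; the last bin includes its upper edge."""
    for i in range(len(edges) - 2):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2
```

With skewed data such as 1, 2, 3, 4, 5, 6, 7, 100 and two bins, equal ranges put seven values in the first bin and one in the second, whereas equal counts put four values in each.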
Each argument can be either a scalar, in which case it is applied to all the columns, or a vector of values, which lets you create different bins for different columns.
# Create 7 bins for Weight column
bd.bin(fuel.frame, "Weight", nbins=7)

# Create bins using all methods for setting the bin count
# (notice that nbins value is matched if possible)
bd.bin(fuel.frame, nbins=c("scott", "free", "11", "stur"))

# Create bins for all columns using equal counts and ranges alternatively
# (notice that the methods value is matched if possible)
bd.bin(fuel.frame, methods=c("cou", "ran"))

# Create bins for a specified set of ranges for one column
bd.bin(fuel.frame, 1, ranges=list(c(1500, 3000, 3500, 5000)))

# Create bins for a specified set of ranges for two columns
bd.bin(fuel.frame, 1:2, ranges=list(c(1500, 3000, 3500, 5000), c(70, 100, 400)))