Create Categories

DESCRIPTION:

Create new categorical variables from continuous variables by splitting the numeric values into a number of bins. For example, rather than including an age column as a continuous variable in your models, it may be more useful to split it into demographically meaningful ranges (<18, 18-24, 25-35, etc.).

This function requires the bigdata library section to be loaded.

USAGE:

bd.bin(data, columns=NULL, nbins=10, replace=T,
    suffix="bin", methods="range", ranges=NULL, k=5000)

REQUIRED ARGUMENTS:

data
input data set: a bdFrame or data.frame.

OPTIONAL ARGUMENTS:

columns
names or numbers of the columns to be binned. If NULL, all continuous columns are binned.
nbins
if an integer, the number of bins.
if a character string, the method used to calculate the number of bins: "sturges", "freedman", or "scott". The number of bins can also be given as a string (for example, "11").
replace
if TRUE, the new bin columns replace the existing columns.
if FALSE, the new bin columns are appended to the data set.
suffix
if replace is FALSE, this string is appended to each input column's name to name the new bin column.
methods
the method for setting bin boundaries: either "range" (bins of equal width) or "count" (bins containing equal numbers of observations).
ranges
a list of numeric vectors giving user-specified bin boundaries, one vector for each column to be binned.
k
an estimation coefficient used for calculating quantiles.

VALUE:

An object of the same type as data (a bdFrame or data.frame) containing the binned columns.

DETAILS:

This function changes continuous columns into categorical columns. If no columns are specified, all continuous columns are binned. The number of bins can be specified directly or calculated by one of several methods (Sturges, Freedman-Diaconis, and Scott are available). The bin boundaries are controlled by the methods argument, which creates bins with either equal ranges or equal counts, or they can be given explicitly with the ranges argument.
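
The bin-count rules and the two boundary methods named above can be illustrated with a short sketch in base R. It uses the textbook forms of the Sturges, Scott, and Freedman-Diaconis rules and exact quantiles; whether bd.bin uses exactly these formulas is an assumption (in particular, bd.bin estimates quantiles using the k argument, which this sketch does not model). The data vector and all variable names are hypothetical.

```r
# Illustrative sketch only: textbook rules in base R, not bd.bin internals.
x <- c(2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.8, 7.1, 8.5, 9.9, 11.3, 14.0)
n <- length(x)

# Sturges: the bin count depends only on the sample size.
k.sturges <- ceiling(log2(n) + 1)

# Scott and Freedman-Diaconis: compute a bin width first,
# then divide the data range by that width to get a bin count.
h.scott <- 3.5 * sqrt(var(x)) * n^(-1/3)
h.fd <- 2 * IQR(x) * n^(-1/3)
k.scott <- ceiling(diff(range(x)) / h.scott)
k.fd <- ceiling(diff(range(x)) / h.fd)

nb <- 4

# "range"-style boundaries: nb bins of equal width.
breaks.range <- seq(min(x), max(x), length.out = nb + 1)

# "count"-style boundaries: nb bins holding (roughly) equal numbers of values.
breaks.count <- quantile(x, probs = seq(0, 1, length.out = nb + 1))

# Convert the continuous values into categorical bins.
bins.range <- cut(x, breaks.range, include.lowest = TRUE)
bins.count <- cut(x, breaks.count, include.lowest = TRUE)

print(table(bins.range))
print(table(bins.count))
```

With skewed data such as this, equal-range bins hold unequal numbers of observations, while equal-count bins have unequal widths.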

Each argument can be either a scalar, in which case it is applied to all of the columns, or a vector of values, one per column. This allows you to create different bins for different columns.
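
The scalar-or-vector convention can be sketched in base R as follows. This is only an illustration of the idea, assuming that a setting shorter than the number of columns is recycled across them; the helper per.column, the data frame df, and the loop are hypothetical and are not part of bd.bin.

```r
# Illustrative sketch only: mimics the scalar-or-vector convention in base R.
df <- data.frame(a = c(1, 2, 3, 4, 5, 6, 7, 8),
                 b = c(10, 20, 30, 40, 50, 60, 70, 80))

# Hypothetical helper: expand a scalar or vector setting to one value per column.
per.column <- function(value, ncols) {
    rep(value, length.out = ncols)
}

nbins <- per.column(c(2, 4), ncol(df))    # vector: a different bin count per column
method <- per.column("range", ncol(df))   # scalar: the same method for every column

binned <- df
for (j in seq_len(ncol(df))) {
    x <- df[[j]]
    probs <- seq(0, 1, length.out = nbins[j] + 1)
    if (method[j] == "range") {
        breaks <- seq(min(x), max(x), length.out = nbins[j] + 1)
    } else {
        breaks <- quantile(x, probs = probs)
    }
    # Replace the continuous column with its categorical version.
    binned[[j]] <- cut(x, breaks, include.lowest = TRUE)
}

print(sapply(binned, nlevels))
```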

EXAMPLES:

# Create 7 bins for Weight column
bd.bin(fuel.frame, "Weight", nbins=7)

# Create bins using all methods for setting the bin count
# (note that abbreviated nbins values are partially matched where possible)
bd.bin(fuel.frame, nbins=c("scott", "free", "11", "stur"))

# Create bins for all columns, alternating between equal counts and equal ranges
# (note that abbreviated methods values are partially matched where possible)
bd.bin(fuel.frame, methods=c("cou", "ran"))

# Create bins for a specified set of ranges for one column
bd.bin(fuel.frame, 1, ranges=list(c(1500, 3000, 3500, 5000)))

# Create bins for a specified set of ranges for two columns
bd.bin(fuel.frame, 1:2, ranges=list(c(1500, 3000, 3500, 5000),
    c(70, 100, 400)))