Normalize Data

DESCRIPTION:

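Center and scale continuous variables. Typically, variables are standardized to have a mean of 0 and a standard deviation of 1, the same mean and standard deviation as a standard Gaussian distribution. Normalizing does not, however, change the shape of a variable's distribution. To do this, bd.normalize subtracts the mean or median (argument center) and then divides by either the range or the standard deviation (argument scale).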
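
The arithmetic described above can be sketched in plain S for a single numeric vector. This is a minimal illustration under stated assumptions, not the bd.normalize implementation. The helper name normalizeVector is hypothetical. It computes an exact median, whereas bd.normalize estimates quantiles using the k argument. It treats "range" as max minus min. It computes the standard deviation with var(), which uses the n - 1 denominator, and whether bd.normalize uses the same convention is an assumption.

```r
# Hypothetical helper illustrating the center/scale options;
# not part of the bigdata library.
normalizeVector <- function(x, center = "none", scale = "none") {
  # Step 1: the location statistic to subtract
  loc <- switch(center,
                none   = 0,
                mean   = mean(x),
                median = median(x),
                stop("center must be 'none', 'mean', or 'median'"))
  # Step 2: the spread statistic to divide by
  # (assumed: range = max - min, stdDev uses the n - 1 denominator)
  spread <- switch(scale,
                   none   = 1,
                   range  = max(x) - min(x),
                   stdDev = sqrt(var(x)),
                   stop("scale must be 'none', 'range', or 'stdDev'"))
  (x - loc) / spread
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
z <- normalizeVector(x, center = "mean", scale = "stdDev")
mean(z)       # 0, up to floating-point rounding
sqrt(var(z))  # 1, up to floating-point rounding
normalizeVector(x, center = "median", scale = "range")  # (x - 4.5) / 7
```

With both options left at "none", the vector is returned unchanged.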

This function requires the bigdata library section to be loaded.

USAGE:

bd.normalize(data, columns=NULL, center="none",
              scale="none", k=5000)

REQUIRED ARGUMENTS:

data
input data set, a bdFrame or data.frame.

OPTIONAL ARGUMENTS:

columns
names or numbers of columns to be normalized. Defaults to all numeric columns.
center
method for centering the data. Choose from "none", "mean", or "median".
scale
method for scaling the data. Choose from "none", "range", or "stdDev".
k
an estimation coefficient used when calculating quantiles, such as the median.
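
To illustrate how the columns argument selects what is transformed, here is a hypothetical sketch for an ordinary data.frame. The function normalizeFrame and the example data are made up for illustration. The sketch always uses mean centering and standard-deviation scaling. It assumes that columns which are not selected are returned unchanged, which this help page does not state.

```r
# Hypothetical sketch of column selection; not the bigdata implementation.
normalizeFrame <- function(data, columns = NULL) {
  if (is.null(columns)) {
    # Default: every numeric column
    columns <- names(data)[sapply(data, is.numeric)]
  } else if (is.numeric(columns)) {
    # Column numbers are converted to column names
    columns <- names(data)[columns]
  }
  for (col in columns) {
    x <- data[[col]]
    data[[col]] <- (x - mean(x)) / sqrt(var(x))  # mean / stdDev
  }
  data  # unselected columns pass through unchanged (an assumption)
}

df <- data.frame(Type = c("a", "b", "c"),
                 Weight = c(2000, 2500, 3000),
                 Mileage = c(30, 25, 20))
normalizeFrame(df)               # Weight and Mileage normalized; Type untouched
normalizeFrame(df, columns = 2)  # Weight only
```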

VALUE:

an object of the same class as data: a "bdFrame" or a "data.frame".

DETAILS:

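With center="mean" and scale="stdDev", the Normalize component transforms a variable so that it has mean 0 and unit standard deviation. It does this by subtracting the variable's mean and dividing by its standard deviation. Other combinations behave analogously. For example, center="median" with scale="range" produces a variable with median 0 and range 1.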

Use the Normalize component to put all of your variables, or a selected group of them, on the same scale. This matters in clustering, for example, where Euclidean distances are computed between p-dimensional points. If some columns have values in the thousands and others lie between 0 and 1, the columns in the thousands will dominate the distance calculations.
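
The following S sketch illustrates the distance problem with made-up data. It standardizes with plain arithmetic (subtract the mean, divide by the standard deviation) rather than calling bd.normalize, so it does not require the bigdata library. The data frame autos and its values are hypothetical.

```r
# Made-up data: Weight is in the thousands, Score lies between 0 and 1
autos <- data.frame(Weight = c(2500, 2600, 3500),
                    Score  = c(0.1, 0.9, 0.2))

# Raw Euclidean distances: Weight differences swamp Score differences.
# The distance between rows 1 and 3 is about 1000, almost all from Weight.
dist(autos)

# Standardize each column so that both contribute on a comparable scale
scaled <- sapply(autos, function(x) (x - mean(x)) / sqrt(var(x)))
dist(scaled)
```

With these numbers, row 1's nearest neighbor is row 2 on the raw scale (the Score difference of 0.8 is negligible next to a Weight difference of 100), but row 3 after standardizing.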

EXAMPLES:

# Normalize Weight and Mileage so that they are of similar magnitudes
bd.normalize(fuel.frame, c("Weight", "Mileage"), center="mean", scale="stdDev")