bd.normalize centers and scales columns of a data set: it subtracts the mean or median, and then divides by either the range or the standard deviation.
This function requires the bigdata library section to be loaded.
bd.normalize(data, columns=NULL, center="none",
scale="none", k=5000)
data: a bdFrame or data.frame.
center: one of "none", "mean", or "median".
scale: one of "none", "range", or "stdDev".
Returns an object of class "bdFrame" or "data.frame" (the same class as data).
The Normalize component centers and scales a variable by subtracting the variable's mean (or median) and dividing by its standard deviation (or range). When the mean and the standard deviation are used, the transformed variable has mean 0 and unit standard deviation.
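The arithmetic described above can be sketched for a single numeric vector. This is an illustration only, not the bd.normalize source: normalize.sketch is a hypothetical helper name, and it assumes that "range" means max minus min and that "stdDev" is the usual sample standard deviation (n - 1 denominator).

```r
# Illustrative sketch only; NOT the bd.normalize implementation.
# normalize.sketch is a hypothetical helper for one numeric vector.
normalize.sketch <- function(x, center = "none", scale = "none") {
  # Value subtracted from every element
  ctr <- switch(center,
                none = 0,
                mean = mean(x),
                median = median(x))
  # Value every element is divided by
  # (assumption: "range" is max - min, "stdDev" is sqrt of the sample variance)
  scl <- switch(scale,
                none = 1,
                range = diff(range(x)),
                stdDev = sqrt(var(x)))
  (x - ctr) / scl
}

x <- c(2, 4, 6, 8)
z <- normalize.sketch(x, center = "mean", scale = "stdDev")
mean(z)       # 0, up to rounding
sqrt(var(z))  # 1, up to rounding
```

With center="median" or scale="range" the result is still centered and rescaled, but it no longer has exactly mean 0 and unit standard deviation.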
You use the Normalize component to put all (or a group of) your variables on the same scale. This is important in clustering, for example, where Euclidean distance is computed between p-dimensional points. If some of your columns have values in the 1000s and others are between 0 and 1, the columns in the 1000s will dominate any distance calculation.
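A small made-up example shows the effect on Euclidean distance. The data frame d, its column names, and the helper euclid are hypothetical; the standardization is done by hand here (mean and sample standard deviation) rather than by calling bd.normalize.

```r
# Hypothetical data: one column in the 1000s, one between 0 and 1
d <- data.frame(weight = c(2000, 2010, 3000),
                ratio  = c(0.1, 0.9, 0.2))

# Euclidean distance between two numeric vectors
euclid <- function(u, v) sqrt(sum((u - v)^2))

# Raw distances: driven almost entirely by the weight column
raw12 <- euclid(unlist(d[1, ]), unlist(d[2, ]))
raw13 <- euclid(unlist(d[1, ]), unlist(d[3, ]))

# Standardize each column by hand: subtract the mean, divide by the
# sample standard deviation
s <- as.data.frame(lapply(d, function(x) (x - mean(x)) / sqrt(var(x))))

# Scaled distances: both columns now contribute
scaled12 <- euclid(unlist(s[1, ]), unlist(s[2, ]))
scaled13 <- euclid(unlist(s[1, ]), unlist(s[3, ]))
```

On the raw data, rows 1 and 2 look close (about 10 apart) and rows 1 and 3 look far apart (about 1000), even though rows 1 and 2 differ greatly in ratio. After standardizing, the large difference in ratio between rows 1 and 2 counts about as much as the large difference in weight between rows 1 and 3.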
# Normalize Weight and Mileage so that they are of similar magnitudes
# (scale is spelled out as "stdDev", the value listed for the scale argument)
bd.normalize(fuel.frame, c("Weight", "Mileage"), center="mean", scale="stdDev")