bd.normalize
Subtracts the mean or median, and then divides by either the range or the standard deviation.
This function requires the bigdata library section to be loaded.
bd.normalize(data, columns=NULL, center="none", scale="none", k=5000)
data: a bdFrame or data.frame.
center: "none", "mean", or "median".
scale: "none", "range", or "stdDev".
Returns a "bdFrame" or "data.frame" (the same class as data).
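With center="mean" and scale="stdDev", the Normalize component transforms a variable so that it has mean 0 and unit standard deviation: it subtracts the variable's mean and divides by its standard deviation. Centering on the median or scaling by the range instead still puts variables on comparable scales, but the result does not in general have mean exactly 0 or standard deviation exactly 1.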
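As a rough illustration of the arithmetic described above, the following R sketch centers and scales a single numeric vector. The helper name normalize.sketch and the sample values are hypothetical and are not part of the bigdata library. The sketch assumes that "range" means the maximum minus the minimum and that "stdDev" means the sample standard deviation; the actual bd.normalize implementation may differ.

```r
# Hypothetical helper for illustration only; not the bigdata implementation.
normalize.sketch <- function(x, center = "none", scale = "none") {
  # Value to subtract: 0 leaves the data uncentered.
  ctr <- switch(center,
                none = 0,
                mean = mean(x),
                median = median(x),
                stop("unknown center: ", center))
  # Value to divide by: 1 leaves the data unscaled.
  # Assumption: "range" is max - min, "stdDev" is the sample standard deviation.
  scl <- switch(scale,
                none = 1,
                range = diff(range(x)),
                stdDev = sd(x),
                stop("unknown scale: ", scale))
  (x - ctr) / scl
}

# Made-up values on the scale of a vehicle weight column.
weight <- c(2560, 2345, 1845, 2260, 2440)
z <- normalize.sketch(weight, center = "mean", scale = "stdDev")
mean(z)  # approximately 0
sd(z)    # approximately 1
```

With the defaults (center="none", scale="none") the sketch returns its input unchanged, which mirrors the defaults shown in the usage line.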
Use the Normalize component to put all of your variables, or a group of them, on the same scale. This matters in clustering, for example, where Euclidean distances are computed between p-dimensional points. If some of your columns have values in the 1000s and others lie between 0 and 1, the columns in the 1000s will dominate any distance calculation.
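The effect on distances can be seen with a small made-up example. The R sketch below does not call bd.normalize; it uses the base R functions scale() and dist() as stand-ins for mean/standard-deviation normalization and Euclidean distance, and the three data rows are invented for illustration.

```r
# Two made-up columns: one in the 1000s, one between 0 and 1.
x <- cbind(big = c(1000, 3000, 3100), small = c(0.1, 0.9, 0.1))

# Raw Euclidean distances are driven almost entirely by the "big" column.
raw <- as.matrix(dist(x))

# scale() subtracts each column's mean and divides by its standard deviation,
# so both columns contribute to the distances on comparable scales.
std <- as.matrix(dist(scale(x)))

# Rows 2 and 3 differ across the full span of "small" but only slightly in "big";
# rows 1 and 3 differ only in "big".
raw[2, 3] / raw[1, 3]  # small ratio: the difference in "small" is ignored
std[2, 3] / std[1, 3]  # ratio near 1: both differences now count
```

Before normalization, rows 2 and 3 appear far closer than rows 1 and 3 even though they sit at opposite ends of the small column; after normalization the two distances are of similar size.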
# Normalize Weight and Mileage so that they are of similar magnitudes
bd.normalize(fuel.frame, c("Weight", "Mileage"), center="mean", scale="stdDev")