bdPrincomp
Returns an object containing the standard deviations of the principal
components, the loadings, and, optionally, the scores.
This function requires the bigdata library section to be loaded.
bdPrincomp(x, data=NULL, covlist=NULL, scores=T, cor=F, na.action, subset)
Either x or data must be given.
x: a bdFrame or formula. If a bdFrame, the columns should correspond to
variables and the rows to observations. If a formula, no variables may
appear on the left (response) side.
data: a bdFrame. Usually, this is used only when x is a formula, although
it might be used instead of x.
covlist: not supported by bdPrincomp. It is in the function signature for
consistency with princomp, but the function will stop with an error message
if it is not NULL.
scores: if scores is TRUE, then a bdFrame of the scores for all of the
components is returned. If scores is FALSE, then no scores are computed.
cor: if TRUE, then the principal components are based on the correlation
matrix rather than the covariance matrix. That is, the variables are scaled
to have unit variance.
An object of class "bdPrincomp", which is a list with components:
loadings: an object of class "loadings" giving the loadings. The first
column is the linear combination of columns of x defining the first
principal component, etc.
scales: if cor is FALSE, these are all 1. If cor is TRUE, scales contains
the standard deviations of the input data variables.
bdPrincomp
.
a bdModel object used by predict.bdPrincomp to compute predictions on new
data.
scores: a bdFrame containing principal component scores for the data.
The results of princomp and bdPrincomp agree to double-precision accuracy
with one exception: the signs of the loadings are not determined uniquely
in principal components analysis; therefore, they might differ.
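The sign indeterminacy can be seen from the eigenvector form of the problem.
The following is a sketch in notation assumed here, not taken from the
bdPrincomp documentation: S is the covariance (or correlation) matrix of the
variables, a_k is the k-th column of the loadings, and lambda_k is the
corresponding eigenvalue.

```latex
S a_k = \lambda_k a_k
\;\Longrightarrow\;
S(-a_k) = \lambda_k(-a_k),
\qquad
\|{-a_k}\| = \|a_k\| = 1,
\qquad
\operatorname{sd}\bigl(x(-a_k)\bigr) = \operatorname{sd}(x a_k).
```

Both a_k and -a_k therefore satisfy the same conditions and give the same
standard deviation, so loadings from the two functions are best compared
column by column up to a change of sign. Flipping the sign of a loading
column flips the sign of the corresponding scores column as well.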
Principal component analysis defines a rotation of the variables of x. The
first derived direction (a linear combination of the variables) is chosen
to maximize the standard deviation of the derived variable, the second to
maximize the standard deviation among directions uncorrelated with the
first, and so on.
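In matrix terms, the description above can be sketched as follows. The
symbols are notation assumed here for illustration: S is the sample
covariance matrix of the columns of x (or the correlation matrix when cor is
TRUE), p is the number of variables, a_k is the k-th column of the loadings,
and lambda_k is the k-th largest eigenvalue of S. Maximizing the standard
deviation is equivalent to maximizing the variance a'Sa.

```latex
\begin{aligned}
a_1 &= \arg\max_{\|a\| = 1} \; a^{\top} S a, \\
a_k &= \arg\max_{\substack{\|a\| = 1 \\ a^{\top} S a_j = 0,\; j < k}} \; a^{\top} S a,
      \qquad k = 2, \dots, p, \\
S a_k &= \lambda_k a_k,
      \qquad \operatorname{sd}(x a_k) = \sqrt{\lambda_k},
      \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 .
\end{aligned}
```

The last line states the standard result that the maximizers are the unit
eigenvectors of S, so the standard deviations of the components are the
square roots of the eigenvalues, in decreasing order.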
Principal component analysis is often used as a data reduction technique,
sometimes in conjunction with regression.
If the variables are not all in the same units,
you should scale the columns of the input before
performing the principal component analysis because a variable with large
variance relative to the others will dominate the first principal component.
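The effect of scaling can be written out as a short sketch. The notation is
assumed here, not taken from the function itself: s_j is the standard
deviation of the j-th variable, D is the diagonal matrix of the s_j, S is
the covariance matrix, R is the correlation matrix, and lambda_k are the
eigenvalues of S.

```latex
\begin{aligned}
D &= \operatorname{diag}(s_1, \dots, s_p),
  \qquad z_j = \frac{x_j - \bar{x}_j}{s_j},
  \qquad \operatorname{Var}(z_j) = 1, \\
R &= D^{-1} S D^{-1} = \operatorname{Cov}(z), \\
\operatorname{tr}(S) &= \sum_{j=1}^{p} s_j^{2} = \sum_{k=1}^{p} \lambda_k,
  \qquad \operatorname{tr}(R) = p .
\end{aligned}
```

With S, a variable whose variance is much larger than the others accounts
for most of the total variance and so pulls the first component toward
itself. With R, every variable contributes exactly 1 to the total. Setting
cor to TRUE corresponds to working with R in place of S.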
Many multivariate statistics books (and some regression texts) include a
discussion of principal components. Below are a few examples:
Dillon, W. R. and Goldstein, M. (1984). Multivariate Analysis, Methods and Applications. Wiley, New York.
Johnson, R. A. and Wichern, D. W. (1982). Applied Multivariate Statistical Analysis. Prentice-Hall, Englewood Cliffs, New Jersey.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, London.
x <- princomp(as.bdFrame(state.x77))