prcomp(x, retx=T)
retx: if TRUE, the rotated version of the data matrix is returned. Using retx=FALSE saves space in the returned data structure.
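The effect of retx can be checked directly. The sketch below assumes R's stats::prcomp and the built-in USArrests data set, neither of which comes from this page; the function documented here may differ in detail.

```r
# Assumption: R's stats::prcomp and the USArrests data set (illustrative only).
with_x    <- prcomp(USArrests, retx = TRUE)
without_x <- prcomp(USArrests, retx = FALSE)

# The rotated data is stored only when retx = TRUE.
has_x_with    <- "x" %in% names(with_x)
has_x_without <- "x" %in% names(without_x)
print(c(with = has_x_with, without = has_x_without))

# Dropping the rotated data makes the returned object smaller.
print(object.size(with_x))
print(object.size(without_x))
```

For large data matrices the rotated data is usually the bulk of the result, so retx=FALSE is worth considering when only the standard deviations and loadings are needed.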
rotation: the matrix whose first column gives the linear combination of the columns of x defining the first principal component, etc. This may have fewer columns than x. This is commonly called the "loadings"; it is not a rotation in the sense often used in factor analysis.
x: the rotated version of x; i.e., the first column is the nrow(x) values for the first derived variable, etc. This may have fewer columns than x. This is returned only when retx=TRUE.
The analysis will work even if nrow(x) < ncol(x). In that case only nrow(x) variables will be derived, and the returned x will have only nrow(x) columns. In general, if any of the derived variables has zero standard deviation, that variable is dropped from the returned result.
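A small wide matrix illustrates the case of fewer observations than variables. This is a sketch assuming R's stats::prcomp; the matrix `wide` is made up for illustration. Note that R keeps components with near-zero standard deviation unless its tol argument is set, which may differ from the dropping behavior described above.

```r
# Assumption: R's stats::prcomp; `wide` is a hypothetical example matrix.
set.seed(1)
wide <- matrix(rnorm(3 * 5), nrow = 3, ncol = 5)  # 3 observations, 5 variables

fit <- prcomp(wide)
print(dim(fit$rotation))  # one row per variable, at most nrow(wide) columns
print(dim(fit$x))         # at most nrow(wide) derived variables
print(fit$sdev)           # the last value is numerically zero after centering

# In R, components with negligible standard deviation are omitted via tol.
dropped <- prcomp(wide, tol = 1e-8)
print(dim(dropped$x))
```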
The estimates are made via the singular value decomposition of the input x. The standard deviations are the singular values divided by the square root of one less than the number of observations.
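The relation between singular values and standard deviations can be verified numerically. The sketch below assumes R's stats::prcomp, which centers the columns before the decomposition, and uses the built-in USArrests data purely for illustration.

```r
# Assumption: R's stats::prcomp, which centers columns by default.
dat <- as.matrix(USArrests)
n   <- nrow(dat)

# Singular values of the column-centered data matrix.
centered <- scale(dat, center = TRUE, scale = FALSE)
sv <- svd(centered)$d

# Dividing by sqrt(n - 1) gives the component standard deviations.
sdev_from_svd <- sv / sqrt(n - 1)

fit <- prcomp(dat)
print(all.equal(fit$sdev, sdev_from_svd))

# Equivalently, the squared standard deviations are the eigenvalues
# of the sample covariance matrix.
print(all.equal(fit$sdev^2, eigen(cov(dat))$values))
```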
If ret <- prcomp(dat), then ret$x == dat %*% ret$rotation up to numerical precision.
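A numerical check of this identity might look like the following. It assumes R's stats::prcomp, where the columns are centered by default, so the comparison is made against the centered data; whether the function documented here centers in the same way is not stated on this page.

```r
# Assumption: R's stats::prcomp; USArrests is used only as example data.
dat <- as.matrix(USArrests)
ret <- prcomp(dat)

# R stores the column means it subtracted in ret$center.
centered <- sweep(dat, 2, ret$center)

# The rotated data equals the (centered) data times the rotation matrix.
same_scores <- all.equal(ret$x, centered %*% ret$rotation,
                         check.attributes = FALSE)
print(same_scores)

# The rotation matrix has orthonormal columns.
gram <- crossprod(ret$rotation)
print(round(gram, 10))
```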
Principal component analysis defines a rotation of the variables (columns) of x. The first derived direction is chosen to maximize the standard deviation of the derived variable, the second to maximize the standard deviation among directions uncorrelated with the first, etc.
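These defining properties can be illustrated on a small example. The sketch assumes R's stats::prcomp and the USArrests data; the random direction `v` is a hypothetical comparison direction, not part of the method.

```r
# Assumption: R's stats::prcomp; USArrests is illustrative data.
dat    <- as.matrix(USArrests)
fit    <- prcomp(dat)
scores <- fit$x

# Derived variables are mutually uncorrelated (off-diagonals near zero).
score_cor <- cor(scores)
print(round(score_cor, 10))

# Their standard deviations match fit$sdev and are non-increasing.
score_sd <- unname(apply(scores, 2, sd))
print(score_sd)

# No other unit-length direction gives a larger standard deviation
# than the first principal component.
set.seed(42)
v <- rnorm(ncol(dat))
v <- v / sqrt(sum(v^2))
sd_random <- sd(as.vector(dat %*% v))
print(c(first_pc = fit$sdev[1], random_direction = sd_random))
```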
Principal component analysis is often used as a data reduction technique,
sometimes in conjunction with regression.
Typically it is advisable to scale the columns of the input before
performing the principal component analysis since a variable with large
variance relative to the others will dominate the first principal component.
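The effect of scaling can be seen by comparing loadings with and without it. This sketch assumes R's stats::prcomp and scale, and uses the USArrests data, in which the Assault column has a much larger variance than the others.

```r
# Assumption: R's stats::prcomp and base::scale; USArrests is illustrative.
raw    <- prcomp(USArrests)          # columns on their original scales
scaled <- prcomp(scale(USArrests))   # columns standardized to unit variance

# Without scaling, the high-variance column dominates the first component.
print(abs(raw$rotation[, 1]))

# After scaling, the first component spreads over several variables.
print(abs(scaled$rotation[, 1]))

# Fraction of total variance carried by the first component in each case.
share_raw    <- raw$sdev[1]^2 / sum(raw$sdev^2)
share_scaled <- scaled$sdev[1]^2 / sum(scaled$sdev^2)
print(c(raw = share_raw, scaled = share_scaled))
```

Scaling is a modeling choice: it treats every variable as equally important, which may not be appropriate when the original units are directly comparable.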
Many multivariate statistics books (and some regression texts) include a
discussion of principal components. Below are a few examples:
Dillon, W. R. and Goldstein, M. (1984).
Multivariate Analysis, Methods and Applications.
Wiley, New York.
Johnson, R. A. and Wichern, D. W. (1982).
Applied Multivariate Statistical Analysis.
Prentice-Hall, Englewood Cliffs, New Jersey.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979).
Multivariate Analysis.
Academic Press, London.
# principal components of the prim4 data
prim.pr <- prcomp(prim4)
# plot of first and second principal components
plot(prim.pr$x[,1], prim.pr$x[,2])
# variance explained by first k principal components
cumsum(prim.pr$sdev^2/sum(prim.pr$sdev^2))
# scree plot
barplot(prim.pr$sdev^2/sum(prim.pr$sdev^2), density=20, ylim=c(0, .8),
    ylab="fraction of variance explained", xlab="principal component",
    names=as.character(1:4))