Principal Components

For investigations involving a large number of observed variables, it is often useful to simplify the analysis by considering a smaller number of linear combinations of the original variables. For example, scholastic achievement tests typically consist of a number of examinations in different subject areas. In attempting to rate students applying for admission, college administrators frequently attempt to reduce the scores from all subject areas to a single, overall score. Principal components is a standard technique for finding optimal linear combinations of the variables.
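As a brief sketch of what "optimal" means here, in standard textbook notation rather than anything taken from the dialog: assume x is the vector of p observed variables with covariance matrix Σ. The first principal component is then the unit-length linear combination of x with the largest variance, and the coefficient vectors are eigenvectors of Σ.

```latex
\[
a_1 = \arg\max_{\lVert a \rVert = 1} \operatorname{Var}(a^{\top} x)
    = \arg\max_{\lVert a \rVert = 1} a^{\top} \Sigma\, a,
\qquad
\Sigma\, a_k = \lambda_k a_k, \quad
\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0 .
\]
```

Each later component uses the coefficient vector a_k, is uncorrelated with the earlier components, and has variance λ_k.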

To perform principal components analysis:

Choose Statistics > Multivariate > Principal Components. The dialog shown below appears.

Model page

__image\princom1.gif

In the Principal Components Analysis dialog, the Model page has the following options:

Data

Data (Principal Components)

In the Data group you have the choice of using a data set and then specifying a formula, or using a covariance list to perform the principal components analysis. The appropriate fields become enabled depending upon your choice.

Subset Rows

Enter an S-PLUS expression that identifies the rows to use in the analysis. To use all the rows in the data set, leave this field blank.

Omit Rows with Missing Values

Select this box to omit from the analysis any rows in the data set that contain missing values for any of the variables in the model.

Use Covariance List as Input

Select this to use a covariance list as model input, instead of a data set. Selecting this enables the Covariance List field and makes the other Data fields and the Formula fields unavailable.

Covariance List

Enter the name of a covariance list to be used as alternative model input. This list must have the form of a list returned by cov.wt or cov.mve. It must include the components center and cov. A cor component is not used; however, an n.obs component is used if present.
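To make the shape of such a list concrete, the following is a minimal sketch in Python (used here only for illustration; it is not S-PLUS code). The function name covariance_list and the sample rows are hypothetical, and the n - 1 divisor is an assumption about how the covariances are estimated.

```python
from statistics import mean

def covariance_list(rows):
    """Return a dict with 'center', 'cov', and 'n.obs' for a list of numeric rows."""
    n = len(rows)
    p = len(rows[0])
    # Column means: the 'center' component.
    center = [mean(row[j] for row in rows) for j in range(p)]
    # Sample covariance matrix (assumed divisor: n - 1): the 'cov' component.
    cov = [
        [
            sum((row[i] - center[i]) * (row[j] - center[j]) for row in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]
    return {"center": center, "cov": cov, "n.obs": n}

# Hypothetical data: three observations of two variables.
rows = [[2.0, 1.0], [4.0, 3.0], [6.0, 8.0]]
cl = covariance_list(rows)
```

The three keys mirror the center, cov, and n.obs components named above.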

Formula

Formula (Principal Components)

Variables

Select the variables to include in the principal components analysis.

Formula

The Formula edit field is filled in automatically from the variables selected in the Variables dropdown list. There is no response variable for principal components analysis; the formula shows the selected variables additively, following a tilde (~). The formula field may also be edited directly.

Model Scaling

Select either Covariance (unscaled) or Correlation (scaled to have unit variance) to define the scaling on which the computation of principal components is based. The default is Covariance.
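The effect of this choice can be sketched for two variables, where the eigenvalues have a closed form. The example below is in Python purely for illustration (it is not what S-PLUS runs internally); the helper names and the covariance matrix are hypothetical.

```python
import math

def eigenvalues_2x2(m):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    half_trace = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return [half_trace + radius, half_trace - radius]

def to_correlation(cov):
    """Rescale a covariance matrix so that every variable has unit variance."""
    p = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]

# Hypothetical covariance matrix: the second variable is on a much larger scale.
cov = [[1.0, 6.0], [6.0, 100.0]]
cor = to_correlation(cov)

cov_eigs = eigenvalues_2x2(cov)
cor_eigs = eigenvalues_2x2(cor)

# Share of total variance carried by the first component under each scaling.
share_cov = cov_eigs[0] / sum(cov_eigs)
share_cor = cor_eigs[0] / sum(cor_eigs)
```

With the unscaled covariance matrix, the first component is dominated by the large-variance variable and carries more than 99% of the total variance. After rescaling, the first component carries 80%, which reflects only the correlation of 0.6 between the two variables.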

Results page

__image\princom2.gif

In the Principal Components Analysis dialog, the Results page has the following options:

Short Output for Principal Components Analysis

Select this to print a summary of the model results in the designated output window. Printed results include sums of squares of the component loadings, the size of the data, the names of the components in the fitted model object, and the call that created the model object.

Component Importance

Select this to include the importance of each principal component in the printed results.
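Importance is commonly summarized by each component's standard deviation, its proportion of the total variance, and the cumulative proportion. The Python sketch below illustrates that arithmetic under the assumption that the eigenvalues (component variances) are already known; the function name and the eigenvalues are hypothetical, and the exact layout printed by the dialog may differ.

```python
import math

def importance(eigenvalues):
    """Return per-component standard deviation, proportion, and cumulative proportion."""
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        proportion = ev / total
        cumulative += proportion
        rows.append({
            "component": k,
            "sd": math.sqrt(ev),        # standard deviation of the component
            "proportion": proportion,   # share of total variance
            "cumulative": cumulative,   # running total of the shares
        })
    return rows

# Hypothetical eigenvalues for three components, largest first.
table = importance([6.0, 3.0, 1.0])
```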

Loadings

Select this to include the loadings matrix with the printed results.

Loading Options

In the Cutoff Loading Value field enter a number giving the cutoff for printing the loadings. Elements of the loadings matrix whose absolute value is smaller than the cutoff value appear as blanks. This field is only enabled when Loadings is selected.
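The blanking rule can be sketched as follows. This is an illustrative Python example, not the S-PLUS print method; the function name, the loadings matrix, and the cutoff of 0.1 are all hypothetical choices.

```python
def format_loadings(loadings, cutoff, digits=3):
    """Format a loadings matrix as text rows, blanking entries with |value| < cutoff."""
    width = digits + 4
    lines = []
    for row in loadings:
        cells = [
            f"{v:{width}.{digits}f}" if abs(v) >= cutoff else " " * width
            for v in row
        ]
        lines.append(" ".join(cells))
    return lines

# Hypothetical loadings: three variables (rows) by two components (columns).
loadings = [
    [0.707, 0.05],
    [0.707, -0.08],
    [-0.02, 0.995],
]
lines = format_loadings(loadings, cutoff=0.1)
for line in lines:
    print(line)
```

Small entries are replaced by spaces of the same width, so the columns stay aligned and the dominant loadings stand out.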

Plot page

__image\princom3.gif

In the Principal Components Analysis dialog, the Plot page has the following options:

Screeplot

Select this to produce a barplot of eigenvalues for each principal component.

Biplot

Select this to produce a biplot of two factors of the fitted model (Factor Analysis) or of the component loadings (Principal Components Analysis). The biplot shows the relation of the factors or components to both the original variables and the original data. This field is enabled only when the number of factors to be fitted is greater than one.

Biplot Options

Biplot Which Scores

Enter the two factors or components to be plotted in the form c(factor1, factor2). By default, a biplot of the first two factors is created. This field is enabled only when Biplot is selected.

Predict page

__image\princom4.gif

In the Principal Components Analysis dialog, the Predict page has the following options:

New Data

Enter the name of a matrix or data set to use for computing predictions. It must contain the same names as the terms in the right side of the formula for the model. If omitted, the original data are used for computing predictions.
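For principal components, a prediction is usually the set of component scores for each new row: the row is centered and then projected onto the loadings. The Python sketch below illustrates that calculation under the covariance (unscaled) setting; with correlation scaling, each centered variable would also be divided by its standard deviation. The function name, center, and loadings are hypothetical, and this is not the S-PLUS implementation.

```python
def predict_scores(new_rows, center, loadings):
    """Project rows onto components: score[k] = sum_j (x[j] - center[j]) * loadings[j][k]."""
    n_vars = len(center)
    n_comp = len(loadings[0])
    return [
        [
            sum((row[j] - center[j]) * loadings[j][k] for j in range(n_vars))
            for k in range(n_comp)
        ]
        for row in new_rows
    ]

# Hypothetical fitted quantities: two variables, two orthonormal components.
center = [4.0, 4.0]
loadings = [
    [0.6, -0.8],
    [0.8, 0.6],
]
new_rows = [[5.0, 4.0], [4.0, 9.0]]
scores = predict_scores(new_rows, center, loadings)
```

The lookup by position here assumes the new rows list the variables in the same order as the fit; matching by name, as the dialog requires, avoids that assumption.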

Save In

Enter the name of a data set in which a part of the analysis, such as fitted values and residuals, predictions, confidence intervals, or standard errors, is saved.

Predictions

Select this to save predictions to the data set specified in Save In.

Related S-PLUS language functions for Principal Components Analysis

princomp, princomp.object, loadings, biplot.princomp, screeplot, plot.loadings

Other related S-PLUS language functions

svd, cancor, factanal