Discriminant Analysis

Discriminant analysis is a multivariate technique for classifying observations into groups based on a set of feature data. The feature vectors are assumed to follow a Gaussian distribution.

Choose Statistics > Multivariate > Discriminant Analysis. The dialog shown below appears.

Model page

[Image: discrim1.gif, the Model page of the Discriminant Analysis dialog]

In the Discriminant Analysis dialog, the Model page has the following options:

Data Set

Choose the data set (training data) containing the feature vectors and the factor identifying group membership.

Weights

Select the column in the data set containing the weights, or enter the name of an existing weight vector or matrix. An expression can also be used (e.g. rep(1,100)). All weights must be positive.

Frequencies

Select the column in the data set containing the frequencies or enter the name of an existing frequencies vector.

Subset Rows

Enter the row indices of the data set to use in the analysis. Separate individual row indices with commas and specify a range with a colon. For example, 1:10 selects rows 1 through 10, and 1, 2, 3, 4:20 selects rows 1, 2, and 3 followed by rows 4 through 20.
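
The index notation can be illustrated outside the dialog. The following is a minimal Python sketch of how such an entry expands into row numbers. It is not the S-Plus implementation; the function name parse_row_indices is hypothetical, and only ascending colon ranges are handled.

```python
def parse_row_indices(spec):
    """Expand an entry such as "1, 2, 3, 4:20" into a list of 1-based row indices.

    Hypothetical helper for illustration only; assumes ascending ranges.
    """
    rows = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if ":" in part:
            start, stop = (int(p) for p in part.split(":"))
            rows.extend(range(start, stop + 1))  # a colon range includes both ends
        else:
            rows.append(int(part))
    return rows


rows = parse_row_indices("1, 2, 3, 4:20")
print(len(rows), rows[:5], rows[-1])
```

Under these assumptions, the entry 1, 2, 3, 4:20 selects the same rows as 1:20.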

Omit Rows with Missing Values

Select this box to omit from the analysis any rows in the data set that contain missing values for any of the variables in the model.

Dependent Variables

Select a column in the Data Set that is the factor that identifies group membership.

Independent Variables

Select the columns in the Data Set that are the feature variables.

Discriminant Analysis Formula

Enter a formula specifying the group variable and the feature variables, with the group variable on the left of a ~ operator and the feature variables, separated by + operators, on the right (for example, group ~ x1 + x2, where group, x1, and x2 stand for hypothetical column names). The formula is filled in automatically if the Dependent and Independent fields are completed first. If a Data Set is given, all names used in the formula should be defined as variables in that data set.

Model

Family Select the family constructor. The choices are classical, common principal component, and canonical.

Covariance Struct. Select the covariance structure of the feature vectors. The choice of covariance structures is dependent on the chosen Family.

Group Prior Select the type of prior for group membership. The choices are proportional, uniform, or none. Alternatively, enter comma-delimited numeric values; if supplied, the values must be positive and sum to one.
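
To make the prior choices concrete, here is a minimal Python sketch of how proportional, uniform, and user-supplied priors could be computed and validated. It is an illustration, not the S-Plus code: the function name group_prior is hypothetical, numeric values are assumed to be given in sorted group order, and the "none" choice is left out because its behavior is not described here.

```python
from collections import Counter


def group_prior(labels, prior="proportional"):
    """Return {group: prior probability} for the training labels.

    Hypothetical helper for illustration only.
    """
    counts = Counter(labels)
    groups = sorted(counts)
    if prior == "proportional":
        # Prior equals the share of training observations in each group.
        return {g: counts[g] / len(labels) for g in groups}
    if prior == "uniform":
        # Every group gets the same prior probability.
        return {g: 1.0 / len(groups) for g in groups}
    if isinstance(prior, str):
        raise ValueError("unsupported prior type: " + prior)
    # Otherwise treat `prior` as explicit numeric values, one per group
    # (assumed here to follow the sorted order of the group labels).
    values = [float(v) for v in prior]
    if len(values) != len(groups):
        raise ValueError("need one prior value per group")
    if any(v <= 0 for v in values):
        raise ValueError("prior values must be positive")
    if abs(sum(values) - 1.0) > 1e-8:
        raise ValueError("prior values must sum to one")
    return dict(zip(groups, values))


labels = ["a", "a", "a", "b"]
print(group_prior(labels, "proportional"))
print(group_prior(labels, "uniform"))
print(group_prior(labels, [0.3, 0.7]))
```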

Save Model Object

In the Save As field, enter the name for the object in which to save the results of the analysis. If an object with this name already exists, its contents are overwritten. The model object can be used in later functions such as plotting.

Results page

__image\discrim2.gif

In the Discriminant Analysis dialog, the Results page has the following options:

Printed/Graphic Results

Short Output for Discriminant Analysis

Display the estimates for the group means, covariance matrices, and model coefficients.

Long Output for Discriminant Analysis

Display the estimates for the group means, covariance matrices, model coefficients, and classification error. Compute multivariate tests for the equivalence of the means and tests for normality of the training data.

Plot for Discriminant Analysis

Generate scatter plots of the training data. For the canonical discriminant function, the canonical variates are plotted.

Saved Results for Discriminant Analysis

Save In Enter the name of the data set in which to store the predicted results, or select an existing data set from the dropdown list. If an existing data set is selected, the new columns are appended after its last column. For each observation in the training data, a posterior probability of membership in each group is estimated. A factor column is also generated that assigns each observation to the group with the highest posterior probability.

Plug-in Select to use the plug-in estimates.

Predictive Select to use the Bayesian approach to discriminant analysis. This option is not available for the canonical discriminant function.

Debiased Select to use a bias correction for the plug-in estimates. This option is not available for the canonical discriminant function.

Crossvalidate Select to use the leave-one-out cross-validated estimates. This option is not available for the proportional, equal correlation, or common principal component covariance models.
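
The saved posterior probabilities and group assignments can be illustrated with a small Python sketch. It is deliberately simplified and is not the S-Plus implementation: it assumes a single feature, a variance shared by all groups, and proportional priors, and all function names are hypothetical. It shows the plug-in rule (estimated means and variance substituted into Bayes' rule) and a leave-one-out error estimate. The predictive and debiased variants are not shown.

```python
import math
from collections import defaultdict


def fit(xs, labels):
    """Plug-in estimates: group means, pooled variance, proportional priors."""
    by_group = defaultdict(list)
    for x, g in zip(xs, labels):
        by_group[g].append(x)
    means = {g: sum(v) / len(v) for g, v in by_group.items()}
    ss = sum((x - means[g]) ** 2 for x, g in zip(xs, labels))
    var = ss / (len(xs) - len(by_group))  # pooled within-group variance
    priors = {g: len(v) / len(xs) for g, v in by_group.items()}
    return means, var, priors


def posterior(x, model):
    """Posterior probability of each group for one observation (Bayes' rule)."""
    means, var, priors = model
    # Log prior plus log Gaussian density; terms shared by all groups cancel.
    log_score = {g: math.log(priors[g]) - 0.5 * (x - means[g]) ** 2 / var
                 for g in means}
    top = max(log_score.values())  # subtract the maximum to avoid underflow
    weight = {g: math.exp(s - top) for g, s in log_score.items()}
    total = sum(weight.values())
    return {g: w / total for g, w in weight.items()}


def classify(x, model):
    """Assign the observation to the group with the highest posterior."""
    post = posterior(x, model)
    return max(post, key=post.get)


def loo_error(xs, labels):
    """Leave-one-out error: refit without row i, then classify row i."""
    wrong = 0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], labels[:i] + labels[i + 1:])
        wrong += classify(xs[i], model) != labels[i]
    return wrong / len(xs)


# Toy training data (made up for illustration): two well-separated groups.
xs = [1.0, 1.2, 0.8, 1.1, 3.0, 3.3, 2.9, 3.1]
labels = ["a"] * 4 + ["b"] * 4
model = fit(xs, labels)
post = posterior(1.0, model)
print(classify(1.0, model), loo_error(xs, labels))
```

The leave-one-out loop refits the model once per observation, so each observation is classified by a model that never saw it. This is why the cross-validated error is typically less optimistic than the error computed on the full training data.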

Related S-Plus language functions

discrim, summary.discrim, plot.discrim, anova.discrim, multicomp.discrim