Estimate a Discriminant Function

DESCRIPTION:

Fit a normal (Gaussian) linear or quadratic discriminant function to a set of feature data.

USAGE:

discrim(formula, data=sys.parent(), family=Classical("homoscedastic"),
        weights, frequencies,  na.action=na.exclude, subset,
        prior=c("proportional", "uniform", "none"),
        method=c("svd", "qr", "choleski"),
        singular.tol=sqrt(.Machine$double.eps), ...)

REQUIRED ARGUMENTS:

formula
a formula object, specifying the group variable and feature variables, with the group variable on the left of a ~ operator, and the feature variables, separated by + operators, on the right. If data is given, all names used in the formula should be defined as variables in the data frame.

OPTIONAL ARGUMENTS:

data
a data frame in which to interpret the variables named in formula. Default is the calling frame. This is commonly referred to as the training data for the discriminant function.
family
a family.discrim object. Currently, there are three family constructors, Classical, CPC, and Canonical, each with one argument, cov.structure, specifying the covariance structure. For the Classical family, cov.structure can be "homoscedastic" (the default), "spherical", "group spherical", "proportional" (PCM), "equal correlation" (ECM), or "heteroscedastic". For the common principal component family, CPC, the covariance structures are "proportional" and "common principal-component" (the default). The Canonical family can have only the homoscedastic covariance structure. Synonyms for the covariance structures are given below.
weights
vector of observation weights. If supplied, the weighted mean and covariances are computed for the feature variables specified in the formula. The weights must be positive.
frequencies
a vector of observation frequencies.
na.action
a function to filter missing data. This is applied to the model frame after any subset argument has been used. The default (with na.exclude) is to delete the observation if any missing values are found. A possible alternative is na.fail, which generates an error if any missing values are found.
subset
expression specifying a row subset of the data to be used in the fit (training data). This can be a logical vector, or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default.
prior
a character string or numerical vector specifying the prior knowledge of the mixing proportions of each group. The acceptable strings are as follows: "proportional", group proportions are the number of observations from each group divided by the total number of observations; "uniform", group proportions are one over the number of groups; "none", exclude the mixing proportion from the discriminant function. If prior is a numerical vector, it must have a length equal to the number of groups and its elements must be positive and sum to one.
method
numerical method used to decompose the feature matrix or covariances. The choices are "svd", singular value decomposition (the default), "qr", QR decomposition, or "choleski", Choleski decomposition.
singular.tol
tolerance for detecting linear dependencies among the feature vectors.
...
additional arguments to pass on to the fitting functions defined in the family.discrim object. These include:
max.iter
an integer specifying the maximum number of iterations in searching for the MLE covariance estimates for the CPC, ECM, and PCM covariance models.
tol
a floating-point value specifying the tolerance for the MLE covariance estimates for the CPC, ECM, and PCM covariance models. For the canonical discriminant function, tol is the tolerance that determines the dimension of the canonical variates.
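
The string options for prior described above can be sketched with base functions only. This is an illustration under assumed data, not the code discrim itself runs: the group labels are made up, and check.prior is a hypothetical helper that restates the rules for a numeric prior.

```r
# Hypothetical training labels: 3 groups with 5, 3 and 2 observations.
group <- factor(c(rep("a", 5), rep("b", 3), rep("c", 2)))
counts <- table(group)
g <- length(counts)

# "proportional": group counts divided by the total number of observations.
prior.proportional <- as.vector(counts) / sum(counts)

# "uniform": one over the number of groups.
prior.uniform <- rep(1 / g, g)

# Hypothetical helper: a numeric prior must have length g,
# positive elements, and sum to one.
check.prior <- function(prior, g) {
    length(prior) == g && all(prior > 0) && abs(sum(prior) - 1) < 1e-8
}
```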

VALUE:

an object of class discrim representing the discriminant function.

METHODS:

Objects of this class have the following methods:

anova
returns an anova.discrim object. If the anova method is applied to a single discrim object that is a linear discriminant function, it produces multivariate statistics testing the equality of the group means: Wilks' lambda, Pillai's trace, the Hotelling-Lawley trace, and Roy's greatest root. If multiple discrim objects are given, a likelihood ratio test is computed for each adjacent pair of objects supplied.
coef
returns a coef.discrim object: a list containing elements constants, a vector of length g, linear.coefficients, a p x g matrix, and, if type is "quadratic", quadratic.coefficients, a sub-list of g matrices of dimension p x p. Here, g is the number of groups and p is the dimension of the feature vectors.
Cov
returns a Cov.discrim object: if type is "linear", a p x p matrix of variance-covariances, where p is the dimension of the feature vectors, otherwise, a list of covariance matrices, one for each group.
crossvalidate
performs cross-validation on the discriminant function. A data.frame is returned containing the posterior probability of group membership for each observation in the training data. The group factor of the returned data.frame assigns each individual to the group of highest probability. For the Classical homoscedastic and heteroscedastic models, formulas exist for evaluating the leave-one-out posterior probability of group membership without recomputing the discriminant function (McLachlan, 1992, pp. 341-344; Ripley, 1996, p. 100), so the error rate can be evaluated quickly. The S code that evaluates these formulas is based on Venables and Ripley's (1998) lda and qda functions with CV=T. The spherical and group spherical covariance structure models can also be evaluated without refitting the discriminant function. For the remaining covariance structures, the discriminant function must be recomputed for each observation left out, which can take a considerable amount of time.
family
returns the family.discrim object. This object contains information specific to a discriminant family, such as the covariance structure of the groups ("homoscedastic", "spherical", "group spherical", "proportional", "equal correlation", "heteroscedastic", or "common principal-component"), the type of discriminant function ("linear" or "quadratic"), and the call to the fitting function that was used.
multicomp
returns a multicomp.discrim object. This object contains Hotelling's T squared statistics testing the pairwise differences between group means. For each significant T squared statistic, p confidence intervals for the mean differences between the two groups are computed, where p is the dimension of the feature vectors.
parameters
extracts the parameters of the discriminant function, returning a parameters.discrim object.
plot
the plot method.
predict
the prediction method for the discrim object. A data.frame is returned containing the posterior probability of group membership for each observation in the input newdata argument. If newdata is not given, the training data is used. The group factor of the data.frame assigns each individual to the group of highest probability. Options for predictions are method="plug-in", "predictive", or "unbiased". The plug-in method computes the posterior probability of group membership for each observation, where the prior probabilities of group membership are given by the variable prior, and assigns the observation to the group that has the highest probability. This is the optimal allocation rule, or Bayes rule. The predictive method is a Bayesian method, where the posterior density of the mean and covariance for each group given the training data is incorporated in the posterior probability of group membership. The unbiased method uses an unbiased estimate of the log normal density. Further details of these methods can be found in both McLachlan (1992) and Ripley (1996). The predict method is based on the predict method of the lda and qda objects from the Venables and Ripley (1997) MASS library.
print
the print method.
summary
the method for generating summary statistics for the discriminant function. This method calls many of the discrim methods, placing the results in a single summary.discrim object.
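
The plug-in rule described under predict can be sketched for the homoscedastic case using base functions only. This is a minimal illustration, not the code used by predict: the means, covariance, and prior below are made-up values, and plugin.posterior is a hypothetical helper. With a common covariance matrix, the log determinant is the same for every group and cancels when the posterior is normalized.

```r
# Hypothetical fitted quantities for g = 2 groups and p = 2 features.
means <- rbind(c(0, 0), c(2, 2))   # g x p matrix of group means
S <- diag(2)                       # common covariance matrix
prior <- c(0.5, 0.5)               # mixing proportions

# Hypothetical helper: posterior probability of each group for one observation x.
plugin.posterior <- function(x, means, S, prior) {
    g <- nrow(means)
    score <- numeric(g)
    for (k in 1:g) {
        # Squared Mahalanobis distance from x to the mean of group k.
        d2 <- mahalanobis(x, means[k, ], S)
        # Log prior plus log normal density, dropping terms shared by all groups.
        score[k] <- log(prior[k]) - 0.5 * d2
    }
    # Normalize on the log scale for numerical stability.
    post <- exp(score - max(score))
    post / sum(post)
}

post.mid <- plugin.posterior(c(1, 1), means, S, prior)   # equidistant from both means
post.near <- plugin.posterior(c(0, 0), means, S, prior)  # sits on the first group mean
assigned <- order(-post.near)[1]                         # group of highest probability
```

An observation halfway between the two means receives equal posterior probabilities, while one at the first group mean is assigned to the first group.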

STRUCTURE:

The structure of a discrim object is a list with the following data members. The data members of a discrim object should be considered private and extracted by the generic methods listed in the METHODS section.

call
an object of class call that is the call to the discrim constructor.
family
the family.discrim object used to construct the discriminant function. See the family input argument and method described above.
counts
vector containing the number of observations (sum of the frequencies) from each group.
swf
vector of the sum of the weights (times the frequencies) for each group.
np
the number of parameters estimated.
means
a g x p matrix of the group means, where g is the number of groups and p is the dimension of the feature vectors.
covariances
the group covariances. See the Cov method.
coefficients
the discriminant function coefficients. See the coef method.
scaling
either a p x p matrix, if type is "linear", or a list of length g of p x p matrices, where p is the dimension of the feature vectors and g is the number of groups. They scale the feature vectors to have unit variance and zero covariance.
log.determinants
the log determinant of the covariance matrix, or matrices, depending on the value of type.
likelihood
the Wishart log likelihood of the group corrected sum of squares crossproducts matrices where the Wishart covariance parameters are equal to the fitted covariances.
aux
auxiliary variable used to cache information needed by discrim methods.
prior
vector of the mixing proportions of the groups.
variables
a list containing information on the feature variables. If any of the feature variables are factors, then this data structure contains, among other things, the factor coding information (the contrasts functions used in linear models to apply various restrictions on the fit). By default, factors are given the 0-1 coding assigned by contr.treatment. For ordered factors, an integer scaling is used. For example, the integer scaling (Huberty, 1994, pp.151-155) for a variable "socioeconomic status" would have the coding 1 for "low", 2 for "middle", and 3 for "high".

DETAILS:

The discrim function constructs an object of class discrim representing either a linear or quadratic discriminant function for a set of feature data. All discriminant functions fit by discrim assume that the feature vectors are normally distributed. For the canonical discriminant function this assumption is not necessary; however, the predict method and various summary statistics for the discrim object assume the normal model. A linear function is computed if the feature data covariances are assumed to be equal among the groups (homoscedastic normal or spherical models); otherwise a quadratic function is computed. For the Classical discriminant family, a hierarchy of models can be explored by specifying different covariance structures. The anova method for the discrim object computes a likelihood ratio test between two or more discriminant functions that are based on different covariance structures.

The most restrictive model for discrim is the "spherical" covariance structure of the Classical discriminant family. It produces a normal (Gaussian) linear discriminant function estimated from a single diagonal covariance matrix that is common to all groups. This model requires estimating only p variances, where p is the dimension of the feature vectors. The linear discriminant function for the "homoscedastic" covariance structure (synonym "linear"), or equal covariance structure, requires estimating p(p+1)/2 variance-covariances. At the other extreme, the "heteroscedastic" covariance structure (synonym "quadratic"), or unequal covariances, produces a normal quadratic discriminant function. Here, gp(p+1)/2 variance-covariances must be estimated, where g is the number of groups. The remainder of the Classical models produce normal quadratic discriminant functions that allow some degree of similarity in the group covariances.

The "group spherical" model states that each group has a different diagonal covariance and requires estimating gp variances. The "proportional" (synonym "PCM") model states that the covariances for each group differ by a multiplicative constant. Here, p(p+1)/2 variance-covariances plus g-1 proportionality constants (one is redundant) are estimated. Finally, the equal correlation model (synonym "ECM") states that the groups share a common correlation structure and differ only in their variances. This requires estimating p(p-1)/2 correlations and gp variances.
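
The parameter counts quoted above for the Classical covariance structures can be tabulated with a short helper. The function n.cov.params is hypothetical (it is not part of discrim) and counts covariance parameters only, exactly as stated in the text; group means are not included.

```r
# Hypothetical helper: number of covariance parameters for g groups
# and p feature variables, following the counts given in the text.
n.cov.params <- function(structure, g, p) {
    switch(structure,
        "spherical"         = p,
        "group spherical"   = g * p,
        "homoscedastic"     = p * (p + 1) / 2,
        "proportional"      = p * (p + 1) / 2 + (g - 1),
        "equal correlation" = p * (p - 1) / 2 + g * p,
        "heteroscedastic"   = g * p * (p + 1) / 2,
        stop("unknown covariance structure"))
}

structures <- c("spherical", "group spherical", "homoscedastic",
                "proportional", "equal correlation", "heteroscedastic")

# Counts for g = 3 groups and p = 4 feature variables.
counts <- sapply(structures, n.cov.params, g = 3, p = 4)
```

Comparing these counts for a given g and p shows how much each structure constrains the fit relative to the fully heteroscedastic model.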

The common principal component model (Flury, 1988), synonym "CPC", states that the groups share a common set of principal components (principal axes), given by a p by p matrix A, so the group covariances are t(A)%*%diag(group characteristic roots)%*%A. A special case of the CPC model is the proportional model (synonym "PCM"), so that option is available for the CPC family. The parameters method for the discrim object extracts the common principal components and the characteristic roots if the object is constructed using the CPC family.
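
The CPC covariance form above can be illustrated with base matrix operations. This is a sketch under assumed values: the rotation matrix A and the characteristic roots are made up for illustration and are not estimates produced by discrim.

```r
# Hypothetical common principal axes: a 2 x 2 orthogonal (rotation) matrix A.
theta <- pi / 6
A <- rbind(c(cos(theta), sin(theta)),
           c(-sin(theta), cos(theta)))

# Hypothetical characteristic roots, one vector per group.
roots <- list(g1 = c(4, 1), g2 = c(9, 0.5))

# Group covariances under the CPC model: t(A) %*% diag(roots) %*% A.
cpc.cov <- lapply(roots, function(r, A) t(A) %*% diag(r) %*% A, A = A)

# Because A is orthogonal, rotating a group covariance back with A
# recovers the diagonal matrix of that group's characteristic roots.
back1 <- A %*% cpc.cov$g1 %*% t(A)
back2 <- A %*% cpc.cov$g2 %*% t(A)
```

Under the proportional model, the root vectors of the groups would additionally be scalar multiples of one another.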

The canonical model is useful for dimension reduction. Only the "homoscedastic" covariance structure is permitted (synonym "linear"). Here, a set of vectors is found that maximizes the ratio of the separation of the group means to the common group variance-covariances, where the vectors have length one and inner products between each vector are zero. These vectors, the canonical variates, are the eigenvectors associated with the eigenvalues of solve(S,B), where B is the between-group sum of squares matrix divided by g-1 and S is the common variance-covariance matrix. The rank of B is d=min(g-1,p), so there will be at most d canonical variates. The optional argument ncan permits the user to limit the number of retained canonical variates. The parameters method extracts the canonical variates and the eigenvalues for a discrim object constructed using the Canonical family.
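
The eigen decomposition of solve(S,B) described above can be sketched on a small made-up data set using base functions only. The data and variable names are assumptions for illustration, and S is taken here to be the pooled within-group covariance matrix; the scaling and tolerance handling inside discrim may differ.

```r
# Hypothetical feature data: 3 groups of 3 observations, p = 2 features.
x <- rbind(c(1, 2), c(2, 1), c(3, 3),
           c(6, 5), c(7, 7), c(8, 6),
           c(1, 8), c(2, 9), c(3, 7))
group <- rep(1:3, c(3, 3, 3))
g <- 3
n <- nrow(x)
p <- ncol(x)

grand <- apply(x, 2, mean)
B <- matrix(0, p, p)   # between-group sum of squares
W <- matrix(0, p, p)   # within-group sum of squares
for (k in 1:g) {
    xk <- x[group == k, , drop = FALSE]
    mk <- apply(xk, 2, mean)
    B <- B + nrow(xk) * outer(mk - grand, mk - grand)
    W <- W + crossprod(sweep(xk, 2, mk))
}
B <- B / (g - 1)       # between-group matrix divided by g-1
S <- W / (n - g)       # assumed: pooled within-group covariance

# Eigenvectors of solve(S, B) give the canonical variates.
e <- eigen(solve(S, B))
d <- min(g - 1, p)     # at most d canonical variates
scores <- x %*% e$vectors[, 1:d, drop = FALSE]
```

With three groups and two features, d is 2, so both eigenvectors are retained and scores has one column per canonical variate.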

Much of discrim and its methods is based on the lda and qda functions and methods of the MASS library developed by Venables and Ripley (1998).

REFERENCES:

Flury, B. (1988). Common Principal Component and Related Multivariate Models. John Wiley & Sons.

Huberty, C. J. (1994). Applied Discriminant Analysis. John Wiley & Sons.

McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. John Wiley & Sons.

Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge University Press.

Seber, G. A. F. (1984). Multivariate Observations. John Wiley & Sons.

Venables, W. N. and Ripley, B. D. (1998). Modern Applied Statistics with S-PLUS. Springer.

EXAMPLES:

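# The Iris data: iris is indexed here as a 3-way array with one slice per species.
# Stack the three slices into one data frame with a Species factor.
iris.mm <- data.frame(Species=factor(c(rep(1,50), rep(2,50), rep(3,50)),
    labels=dimnames(iris)[[3]]), rbind(iris[,,1], iris[,,2], iris[,,3])*10)
names(iris.mm)
# Fit a quadratic discriminant function using the Classical family
# with the heteroscedastic covariance structure.
iris.quad <- discrim(Species ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W.,
    data=iris.mm, family=Classical("heter"))
# Print the fitted discrim object.
iris.quad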