Multiple Discriminant Analysis

DESCRIPTION:

Finds the linear discriminant function to distinguish between a number of groups.

Note: This function is deprecated; please use instead.

USAGE:

discr(x, k) 

REQUIRED ARGUMENTS:

x
matrix of data. The rows of x must be ordered by groups. The rows represent observations and the columns represent variables. Missing values are not accepted.
k
the number of groups if the groups are equal in size. Alternatively, k can be a vector of group sizes: the first k[1] rows form group 1, the next k[2] rows form group 2, etc. Missing values are not accepted.
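
The two forms of k are shown below as a minimal sketch in R, which is not part of S-PLUS. It expands k into one group label per row of x. The helper name expand.groups and its n argument (the number of rows of x) are hypothetical.

```r
# Hypothetical helper (not part of S-PLUS): expand the k argument of
# discr into one integer group label per row of x, where n = nrow(x).
expand.groups <- function(k, n) {
  if (length(k) == 1) {
    # k is the number of equal-sized groups, so n must split evenly
    stopifnot(n %% k == 0)
    rep(seq_len(k), each = n %/% k)
  } else {
    # k is a vector of group sizes, given in row order
    stopifnot(sum(k) == n)
    rep(seq_along(k), times = k)
  }
}

expand.groups(3, 6)           # 1 1 2 2 3 3
expand.groups(c(2, 1, 3), 6)  # 1 1 2 3 3 3
```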

VALUE:

a list describing the discriminant analysis, with the following components:
cor
vector of discriminant correlations (correlations between linear combinations of the variables and linear combinations of the groups).
vars
square matrix with ncol(x) rows and ncol(x) columns. Each column holds the coefficients of one linear combination of the input variables; that is, x %*% vars produces the matrix of discriminant variables.
groups
square matrix, with one row and one column per group, containing linear combinations of the groups. All columns except the last are contrasts, chosen so that the correlation between the ith contrast of groups and the ith discriminant variable is the ith element of cor. For the data, the ith contrast is represented by a vector of length nrow(x) in which each observation takes the entry of groups[,i] that belongs to its group.
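
Under the usual canonical-correlation formulation of discriminant analysis (an assumption about how discr scales its groups component), the best contrast of groups for a given discriminant variable d gives each group the mean of d within that group. The R sketch below uses a hypothetical helper, best.group.contrast, and R's built-in iris data frame. It shows that this per-observation vector reaches the largest correlation with d that any function of the groups can attain: the square root of the between-group sum of squares of d divided by its total sum of squares.

```r
# Hypothetical helper: for a variable d and a grouping g, return the
# group contrast (group mean minus overall mean) and its correlation with d.
best.group.contrast <- function(d, g) {
  g <- factor(g)
  contrast <- as.vector(tapply(d, g, mean)) - mean(d)  # one value per group
  names(contrast) <- levels(g)
  expanded <- as.vector(contrast)[as.integer(g)]       # length(d) values
  list(contrast = contrast, cor = cor(d, expanded))
}

# Example with R's built-in iris data frame
d <- iris$Petal.Length
res <- best.group.contrast(d, iris$Species)
n.j <- as.vector(table(iris$Species))
ssb <- sum(n.j * res$contrast^2)       # between-group sum of squares
sst <- sum((d - mean(d))^2)            # total sum of squares
all.equal(res$cor, sqrt(ssb / sst))    # TRUE
```

The contrast values, weighted by group size, sum to zero, which is what makes them a contrast.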

DETAILS:

The discr function implements linear discriminant analysis, a technique devised by R. A. Fisher as a "sensible" way of distinguishing between groups. The first discriminant function (represented by vars[,1]) is the linear function of the variables that maximizes the ratio of the between-group sum of squares to the within-group sum of squares. The second discriminant function is the linear combination that maximizes the same criterion subject to being uncorrelated with (but not necessarily orthogonal to) the first. The third and later discriminant functions are defined analogously, each uncorrelated with all of the previous ones. At most min(ncol(x), k-1) discriminant functions can be found, where k is the number of groups. This type of analysis is optimal when every group has a multivariate normal distribution and all groups share the same covariance matrix.
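
The criterion above amounts to the eigenproblem W^{-1} B a = lambda a. Here W and B are the within-group and between-group sums-of-squares-and-products matrices, and lambda is the between-to-within ratio attained by the direction a. The R sketch below solves this problem through a Cholesky factor of W, so that a symmetric eigen decomposition can be used. It is a textbook construction, not the code behind discr. The helper name fisher.lda is hypothetical. Its scaling, which gives each discriminant variable a within-group sum of squares of one, is an assumption that may differ from discr's vars.

```r
# Hypothetical helper: Fisher's linear discriminant analysis (a sketch,
# not the implementation behind discr).
# x: numeric matrix (rows = observations); g: group label for each row.
fisher.lda <- function(x, g) {
  x <- as.matrix(x)
  g <- factor(g)
  p <- ncol(x)
  # group centroids, one row per group in level order
  centroids <- rowsum(x, g) / as.vector(table(g))
  # within-group SSCP matrix W: deviations from each row's group centroid
  W <- crossprod(x - centroids[as.integer(g), , drop = FALSE])
  # between-group SSCP matrix B = total SSCP - W
  B <- crossprod(scale(x, center = TRUE, scale = FALSE)) - W
  # reduce W^{-1} B a = lambda a to a symmetric problem via W = t(R) %*% R
  R <- chol(W)
  Rinv <- backsolve(R, diag(p))        # R^{-1}
  M <- t(Rinv) %*% B %*% Rinv
  e <- eigen(M, symmetric = TRUE)
  r <- min(p, nlevels(g) - 1)          # number of useful discriminants
  lambda <- e$values[seq_len(r)]
  vars <- Rinv %*% e$vectors[, seq_len(r), drop = FALSE]
  list(vars = vars, lambda = lambda, cor = sqrt(lambda / (1 + lambda)))
}

# Example with R's built-in iris data frame
fit <- fisher.lda(iris[, 1:4], iris$Species)
dv <- as.matrix(iris[, 1:4]) %*% fit$vars   # discriminant variables
fit$cor
```

Unlike discr, which returns a square vars matrix, this sketch keeps only the min(ncol(x), k-1) informative columns. Because lambda is the between-to-within ratio for its discriminant variable, sqrt(lambda / (1 + lambda)) is the square root of the between-to-total ratio. That value is the canonical correlation between the variables and the groups, which is the sense in which the cor component is described above.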

An observation can be classified by projecting it and the group centroids onto the subspace spanned by some or all of the discriminant variables (canonical variates), computing the Euclidean distance from the projected observation to each projected centroid, and assigning the observation to the closest group. The cor component of the output measures how well each discriminant variable separates the groups: the larger the correlation, the better the separation.
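
The following minimal R sketch implements the classification rule just described. The helper classify.centroid is hypothetical. It takes discriminant variables that have already been computed (for example as x %*% vars) and finds each group's centroid in the first m of them. It then assigns every new observation to the group whose centroid is closest in Euclidean distance. The toy coordinates in the usage lines are illustrative only.

```r
# Hypothetical helper: nearest-centroid classification in discriminant space.
# dv:     matrix of discriminant variables for the training data
# labels: group label for each row of dv
# newdv:  matrix of discriminant variables for observations to classify
# m:      number of leading discriminant variables to use
classify.centroid <- function(dv, labels, newdv, m = ncol(dv)) {
  labels <- factor(labels)
  dv <- dv[, seq_len(m), drop = FALSE]
  newdv <- newdv[, seq_len(m), drop = FALSE]
  # one centroid per group, rows in level order
  centroids <- rowsum(dv, labels) / as.vector(table(labels))
  # squared Euclidean distance from each new row to each centroid
  d2 <- apply(centroids, 1, function(ctr) colSums((t(newdv) - ctr)^2))
  d2 <- matrix(d2, nrow = nrow(newdv))   # rows = new obs, cols = groups
  levels(labels)[apply(d2, 1, which.min)]
}

train <- matrix(c(0, 0,  1, 0,  10, 10,  11, 10), ncol = 2, byrow = TRUE)
lab <- c("a", "a", "b", "b")
newdv <- matrix(c(0.5, 0,  10, 9), ncol = 2, byrow = TRUE)
classify.centroid(train, lab, newdv)   # "a" "b"
```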

BACKGROUND:

In general, discriminant analysis is the process of dividing ncol(x)-dimensional space into k regions so that the groups are as distinct as possible. Whereas cluster analysis seeks to divide the observations into groups, discriminant analysis presumes the groups are known and seeks to understand what makes them different.

Multivariate planing can be used as an exploratory tool to see if (non-linear) discrimination is possible. See the help file for mstree for more details.

REFERENCES:

Discriminant analysis is discussed in many multivariate statistics books, including:

Dillon, W. R. and Goldstein, M. (1984). Multivariate Analysis, Methods and Applications. New York: Wiley.

Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. New York: Wiley.

Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. London: Academic Press.

EXAMPLES:

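# discriminant analysis when the groups are given by a grouping variable
# rather than by row order: sort the rows by group, then pass the group
# sizes (in the same sorted order) as k
discr.group <- function(x, group) {
      size <- table(category(sort(group)))
      discr(x[order(group),], size)
}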

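# a discriminant analysis of the iris data: stack the three 50 x 4
# species slices of the iris array into one 150 x 4 matrix
iris.var <- rbind(iris[,,1], iris[,,2], iris[,,3])
iris.dis <- discr(iris.var, 3)
# compute the discriminant variables and explore them with brush,
# adding the species number (1, 2, 3) as an extra column
iris.dv <- iris.var %*% iris.dis$vars
brush(cbind(iris.dv, rep(1:3, c(50, 50, 50))))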

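# plot the first two discriminant variables, labeling each point
# with a one-letter code for its species
iris.x <- iris.dv[,1]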
iris.y <- iris.dv[,2] 
iris.lab <- c(rep("S",50), rep("C",50), rep("V",50)) 
plot(iris.x, iris.y, type = "n", xlab = "first discriminant variable", 
     ylab = "second discriminant variable") 
text(iris.x, iris.y, iris.lab)