discr(x, k)
x
must be ordered by groups. The rows represent observations and the columns represent variables. Missing values are not accepted.
k
can be a vector of group sizes: the first
k[1]
rows form group 1, the next
k[2]
rows form group 2, etc. Missing values are not accepted.
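The mapping from group sizes to rows can be illustrated with a small helper. This is a minimal sketch intended to run in R, not part of discr; the name group.labels is hypothetical, and the treatment of a single number as a count of equal-sized groups is an assumption inferred from the iris example (discr(iris.var, 3) applied to 150 rows).

```r
# Hypothetical helper: expand k into one group label per row of x.
# n is the number of rows of x.
group.labels <- function(k, n) {
  if (length(k) == 1) {
    # Assumption: a single number means k groups of equal size.
    if (n %% k != 0) stop("n is not a multiple of the number of groups")
    k <- rep(n / k, k)
  }
  if (sum(k) != n) stop("group sizes must add up to the number of rows")
  # the first k[1] rows get label 1, the next k[2] rows get label 2, ...
  rep(seq(along = k), k)
}

group.labels(c(3, 2, 4), 9)   # sizes 3, 2 and 4 over 9 rows
group.labels(3, 150)          # three groups of 50 rows each
```

Because the labels are generated purely from row position, any rows that are out of group order are silently assigned to the wrong group, which is why x must be sorted by group first.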
ncol(x)
containing linear combinations of variables. The columns of
vars
hold the coefficients of the linear combinations of the input variables; that is,
x %*% vars
produces the matrix of discriminant variables.
groups
and the ith discriminant variable is the ith element of
cor
. The ith contrast of
groups
is represented for the data by a vector of length
nrow(x)
with values from
groups[,i]
.
The
discr
function implements linear discriminant analysis.
This technique was devised by R.A. Fisher as a "sensible" way of distinguishing between groups.
The first discriminant function (represented by
vars[,1]
) is the linear function of the variables that maximizes the ratio of the between-group sum of squares to the within-group sum of squares.
The second discriminant function is the linear combination that is uncorrelated with (but not necessarily orthogonal to) the first and maximizes the same ratio.
The third and later discriminant functions are defined analogously.
The greatest possible number of linear discriminant functions is
min(ncol(x), k-1)
where
k
is the number of groups.
This type of analysis is optimal if every group follows a multivariate normal distribution and all groups share the same covariance matrix.
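The criterion described above can be sketched directly from the between-group and within-group sums of squares matrices. The code below is an illustration intended to run in R and is not the algorithm used by discr; the name lda.sketch and its return components are hypothetical, k must be a vector of group sizes, and the within-group matrix is assumed to be positive definite. The scaling and signs of the coefficients may differ from those returned by discr.

```r
# Hypothetical sketch: discriminant directions that maximize the ratio
# of between-group to within-group sum of squares.
lda.sketch <- function(x, k) {
  if (length(k) < 2) stop("k must hold at least two group sizes")
  x <- as.matrix(x)
  p <- ncol(x)
  g <- rep(seq(along = k), k)        # group label for each row
  grand <- apply(x, 2, mean)         # overall mean of each variable
  W <- matrix(0, p, p)               # within-group SS matrix
  B <- matrix(0, p, p)               # between-group SS matrix
  for (i in seq(along = k)) {
    xi <- x[g == i, , drop = FALSE]
    mi <- apply(xi, 2, mean)
    ci <- sweep(xi, 2, mi)           # centre group i on its own mean
    W <- W + t(ci) %*% ci
    B <- B + k[i] * outer(mi - grand, mi - grand)
  }
  # With W = t(U) %*% U, the problem becomes a symmetric eigenproblem.
  U <- chol(W)
  Uinv <- solve(U)
  e <- eigen(t(Uinv) %*% B %*% Uinv, symmetric = TRUE)
  nd <- min(p, length(k) - 1)        # greatest possible number of functions
  list(ratio = e$values[1:nd],
       vars = Uinv %*% e$vectors[, 1:nd, drop = FALSE])
}

# Two groups of four points each, so only one discriminant function.
x <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1),
           c(5, 5), c(6, 5), c(5, 6), c(6, 6))
fit <- lda.sketch(x, c(4, 4))
x %*% fit$vars                       # the discriminant variable
```

Working through the Cholesky factor keeps the eigenproblem symmetric, so the eigenvalues come back real and in decreasing order, and the resulting discriminant variables are uncorrelated within groups.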
An observation can be classified by computing its Euclidean distance from the group centroids, projected onto a subspace defined by a subset of the canonical variates. The observation is then assigned to the closest group. The
cor
component of the output is a measure of the ability to discriminate between the groups.
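The classification rule just described can be sketched as follows. This is an illustration intended to run in R, not a function supplied with discr; the name classify.sketch and the argument nd (the number of discriminant variables to keep) are hypothetical, k must be a vector of group sizes, and the vars matrix in the usage lines is a hand-made stand-in for discr(x, k)$vars.

```r
# Hypothetical sketch: assign new observations to the group whose
# centroid is closest in the space of the first nd discriminant variables.
classify.sketch <- function(x, k, vars, newx, nd = ncol(vars)) {
  g <- rep(seq(along = k), k)                 # group label for each row
  v <- vars[, 1:nd, drop = FALSE]
  scores <- as.matrix(x) %*% v                # training data, projected
  cent <- matrix(0, length(k), nd)            # one centroid per row
  for (i in seq(along = k))
    cent[i, ] <- apply(scores[g == i, , drop = FALSE], 2, mean)
  # one row per new observation, projected the same way
  z <- matrix(newx, ncol = nrow(vars)) %*% v
  d2 <- matrix(0, nrow(z), length(k))         # squared Euclidean distances
  for (i in seq(along = k))
    d2[, i] <- apply(sweep(z, 2, cent[i, ])^2, 1, sum)
  apply(d2, 1, function(d) order(d)[1])       # index of the nearest centroid
}

x <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1),
           c(5, 5), c(6, 5), c(5, 6), c(6, 6))
k <- c(4, 4)
vars <- matrix(c(0.5, 0.5), 2, 1)   # stand-in for discr(x, k)$vars
pred <- classify.sketch(x, k, vars, rbind(c(0.2, 0.9), c(5.5, 4.8)))
pred
```

Keeping fewer discriminant variables (a smaller nd) restricts the distance calculation to the directions that separate the groups best.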
In general, discriminant analysis is the process of dividing up
ncol(x)
dimensional space into
k
pieces so that the groups are as distinct as possible. While cluster analysis seeks to divide the observations into groups, discriminant analysis presumes the groups are known and seeks to understand what makes them different.
Multivariate planing can be used as an exploratory tool to see if (non-linear) discrimination is possible. See the help file for
mstree
for more details.
Discriminant analysis is discussed in many multivariate statistics books,
among which are:
Dillon, W. R. and Goldstein, M. (1984). Multivariate Analysis, Methods and Applications. New York: Wiley.
Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. New York: Wiley.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. London: Academic Press.
# discrimination using a grouping variable
discr.group <- function(x, group) {
    size <- table(category(sort(group)))
    discr(x[order(group),], size)
}

# a discrimination analysis of the iris data
iris.var <- rbind(iris[,,1], iris[,,2], iris[,,3])
iris.dis <- discr(iris.var, 3)
iris.dv <- iris.var %*% iris.dis$vars
brush(cbind(iris.dv, rep(1:3, c(50, 50, 50))))
iris.x <- iris.dv[,1]
iris.y <- iris.dv[,2]
iris.lab <- c(rep("S",50), rep("C",50), rep("V",50))
plot(iris.x, iris.y, type = "n",
     xlab = "first discriminant variable",
     ylab = "second discriminant variable")
text(iris.x, iris.y, iris.lab)