discrim(formula, data=sys.parent(), family=Classical("homoscedastic"),
        weights, frequencies, na.action=na.exclude, subset,
        prior=c("proportional", "uniform", "none"),
        method=c("svd", "qr", "choleski"),
        singular.tol=sqrt(.Machine$double.eps), ...)
formula: a formula object, with the grouping factor on the left of a ~ operator and the feature variables, separated by + operators, on the right. If data is given, all names used in the formula should be defined as variables in the data frame.
data: a data frame in which to interpret the variables named in the formula. Default is the calling frame. This is commonly referred to as the training data for the discriminant function.
family: a family.discrim object. Currently, there are three family constructors, Classical, CPC, and Canonical, each with one argument, cov.structure, specifying the covariance structure. For the Classical family, cov.structure can be "homoscedastic" (the default), "spherical", "group spherical", "proportional" (PCM), "equal correlation" (ECM), or "heteroscedastic". For the common principal component family, CPC, the covariance structures are "proportional" and "common principal-component" (the default). The Canonical family can have only the homoscedastic covariance structure. Synonyms for the covariance structures are given below.
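For example (a sketch; iris.mm is the data frame constructed in the EXAMPLES section below, and the object names are illustrative):
# equal group covariances: a linear discriminant function
fit.lin <- discrim(Species ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W.,
                   data=iris.mm, family=Classical("homoscedastic"))
# unequal group covariances: a quadratic discriminant function
fit.het <- discrim(Species ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W.,
                   data=iris.mm, family=Classical("heteroscedastic"))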
weights: an optional vector of observation weights, with one weight for each observation referenced by the formula. The weights must be positive.
na.action: a function to filter missing data. The default (na.exclude) is to delete the observation if any missing values are found. A possible alternative is na.fail, which generates an error if any missing values are found.
prior: the prior probabilities of group membership: "proportional" (the default), "uniform", "none", or a numerical vector. If prior is a numerical vector, it must have a length equal to the number of groups and its elements must be positive and sum to one.
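For example, a numerical prior for three groups might be supplied as follows (a sketch using the iris.mm data again; the probabilities are illustrative):
fit <- discrim(Species ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W.,
               data=iris.mm, family=Classical("homoscedastic"),
               prior=c(0.5, 0.3, 0.2))   # one prior per group, summing to one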
...: additional arguments required by the family.discrim object. These include: an integer specifying the maximum number of iterations in searching for the MLE covariance estimates for the CPC, ECM, and PCM covariance models, and tol, the tolerance that determines the dimension of the canonical variates.
An object of class discrim representing the discriminant function. Objects of this class have the following methods:
anova: returns an anova.discrim object. If the anova method is applied to a single discrim object that is a linear discriminant function, it will produce multivariate statistics testing the equivalence of the means: Wilks' lambda, Pillai's trace, the Hotelling-Lawley trace, and Roy's greatest root. If multiple discrim objects are given, a likelihood ratio test is computed for each adjacent pair of objects supplied.
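For example (a sketch; iris.lin is an illustrative linear fit to the iris.mm data from the EXAMPLES section):
iris.lin <- discrim(Species ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W.,
                    data=iris.mm, family=Classical("homoscedastic"))
anova(iris.lin)   # Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, Roy's greatest root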
coef: returns a coef.discrim object, a list containing elements constants, a vector of length g; linear.coefficients, a p x g matrix; and, if type is "quadratic", quadratic.coefficients, a sub-list of g matrices of dimension p x p. Here, g is the number of groups and p is the dimension of the feature vectors.
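For instance (a sketch based on the quadratic fit from the EXAMPLES section, assuming the returned list elements can be extracted by name):
cf <- coef(iris.quad)
cf$constants               # vector of length g
cf$linear.coefficients     # p x g matrix
cf$quadratic.coefficients  # list of g matrices, each p x p (quadratic fits only)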
Cov: returns a Cov.discrim object. If type is "linear", this is a p x p matrix of variance-covariances, where p is the dimension of the feature vectors; otherwise, it is a list of covariance matrices, one for each group.
A data.frame is returned containing the posterior probability of group membership for each observation in the training data. The group factor of the returned data.frame assigns each individual to the group of highest probability.
Formulas exist for evaluating the posterior probability of group membership for this leave-one-out method of error rate evaluation without recomputing the discriminant function for the Classical homoscedastic and heteroscedastic models (McLachlan, 1992, pp. 341-344; Ripley, 1996, p. 100), and these can be evaluated in a timely manner. The S code that evaluates these functions is based on Venables and Ripley's (1998) lda and qda functions with CV=T. The spherical and group spherical covariance structure models can also be computed without refitting the discriminant function. Discriminant functions using the remaining covariance structures must recompute the function in order to evaluate the leave-one-out error rate, which can take a considerable amount of time.
family: returns the family.discrim object. This object contains information specific to a discriminant family, such as the groups' covariance structure ("homoscedastic", "spherical", "group spherical", "proportional", "equal correlation", "heteroscedastic", or "common principal-component"), the type of discriminant function ("linear" or "quadratic"), and the call of the fitting function that was used.
multicomp: returns a multicomp.discrim object. This object contains Hotelling's T-squared statistics testing the pairwise differences between group means. For each significant T-squared statistic, p confidence intervals for the mean differences between the two groups are computed, where p is the dimension of the feature vectors.
parameters: returns a parameters.discrim object.
plot: the plot method for discrim objects.
predict: predictions from a discrim object. A data.frame is returned containing the posterior probability of group membership for each observation in the newdata argument. If newdata is not given, the training data is used. The group factor of the data.frame assigns each individual to the group of highest probability.
Options for predictions are method="plug-in", "predictive", or "unbiased". The plug-in method computes the posterior probability of group membership for each observation, where the prior probabilities of group membership are given by the variable prior, and assigns the observation to the group that has the highest probability. This is the optimal allocation rule, or Bayes rule. The predictive method is a Bayesian method, where the posterior density of the mean and covariance for each group, given the training data, is incorporated in the posterior probability of group membership. The unbiased method uses an unbiased estimate of the log normal density.
Further details of these methods can be found in both McLachlan (1992) and Ripley (1996). The predict method is based on the predict method of the lda and qda objects from the Venables and Ripley (1998) MASS library.
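A sketch of how predictions might be obtained (iris.quad is the fit from the EXAMPLES section; new.flowers is a hypothetical data frame of new feature vectors):
# posterior probabilities and group assignments for new feature data
pred <- predict(iris.quad, newdata=new.flowers, method="plug-in")
pred$group                               # predicted group for each observation
predict(iris.quad, method="predictive")  # training data, Bayesian predictive rule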
print: the print method for discrim objects.
summary: calls the other discrim methods, placing the results in a single summary.discrim object.
The structure of a discrim object is a list with the following data members. The data members of a discrim object should be considered private and extracted by the generic methods listed in the METHODS section.
An object of class call that is the call to the discrim constructor.
The family.discrim object used to construct the discriminant function. See the family input argument and method described above.
Cov
method.
coef
method.
A p x p matrix if type is "linear", or a list of length g of p x p matrices, where p is the dimension of the feature vectors and g is the number of groups. They scale the feature vectors to have unit variance and zero covariance.
type
.
covariances
.
discrim
methods.
Unordered factors among the feature variables are coded using contr.treatment. For ordered factors, an integer scaling is used. For example, the integer scaling (Huberty, 1994, pp. 151-155) for a variable "socioeconomic status" would have the coding 1 for "low", 2 for "middle", and 3 for "high".
The discrim function constructs an object of class discrim representing either a linear or quadratic discriminant function for a set of feature data. All discriminant functions fit by discrim assume that the feature vectors are normally distributed. This is not a necessary assumption for the canonical discriminant function; however, the predict method and various summary statistics for the discrim object assume the normal model.
A linear function is computed if the feature data covariances are assumed to be equal among the groups (homoscedastic normal or spherical models); otherwise a quadratic function is computed. For the Classical discriminant family, a hierarchy of models can be entertained by the experimenter by specifying different covariance structures. The anova method for the discrim object computes a likelihood ratio test between two or more discriminant functions that are based on different covariance structures.
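As a sketch (the object names are illustrative, and iris.lin is the homoscedastic fit shown earlier), a hierarchy of Classical covariance structures might be compared as follows:
iris.sph <- discrim(Species ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W.,
                    data=iris.mm, family=Classical("spherical"))
iris.het <- discrim(Species ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W.,
                    data=iris.mm, family=Classical("heteroscedastic"))
anova(iris.sph, iris.lin, iris.het)   # likelihood ratio test for each adjacent pair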
The most restrictive model for discrim is the "spherical" covariance structure of the Classical discriminant family. It produces a normal (Gaussian) linear discriminant function estimated from a single diagonal covariance matrix that is common to all groups. This model requires estimating only p variances, where p is the dimension of the feature vectors. The linear discriminant function for the "homoscedastic" covariance structure (synonym "linear"), or equal covariance structure, requires estimating p(p+1)/2 variance-covariances. At the other extreme, the "heteroscedastic" covariance structure (synonym "quadratic"), or unequal covariances, produces a normal quadratic discriminant function. Here, gp(p+1)/2 variance-covariances must be estimated, where g is the number of groups.
The remainder of the Classical models produce normal quadratic discriminant functions that allow some degree of similarity in the group covariances. The "group spherical" model states that each group has a different diagonal covariance and requires estimating gp variances. The "proportional" (synonym "PCM") model states that the covariances for each group differ by a multiplicative constant. Here, p(p+1)/2 variance-covariances plus g-1 (one is redundant) proportionality constants are estimated. Finally, the equal correlation model (synonym "ECM") states that the groups share a common correlation structure and differ only in their variances. This requires estimating p(p-1)/2 correlations and gp variances.
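As a concrete tally, for g = 3 groups and p = 4 feature variables (the iris example), the covariance parameter counts stated above can be computed directly:
g <- 3; p <- 4
c(spherical       = p,                  # 4 variances
  group.spherical = g*p,                # 12 variances
  homoscedastic   = p*(p+1)/2,          # 10 variance-covariances
  proportional    = p*(p+1)/2 + (g-1),  # 10 plus 2 proportionality constants
  equal.corr      = p*(p-1)/2 + g*p,    # 6 correlations plus 12 variances
  heteroscedastic = g*p*(p+1)/2)        # 30 variance-covariances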
The common principal component model (Flury, 1988), synonym "CPC", states that the groups share a common set of principal components (principal axes), given by a p by p matrix A, so the group covariances are t(A) %*% diag(group characteristic roots) %*% A. A special case of the CPC model is the proportional model (synonym "PCM"), so that option is available for the CPC family. The parameters method for the discrim object extracts the common principal components and the characteristic roots if the object is constructed using the CPC family.
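A sketch of fitting the CPC family and extracting its estimates (the element names of the parameters result are not documented here, so the result is simply printed):
iris.cpc <- discrim(Species ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W.,
                    data=iris.mm, family=CPC("common principal-component"))
parameters(iris.cpc)   # common principal components and group characteristic roots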
The canonical model is useful for dimension reduction. Only the "homoscedastic" covariance structure is permitted (synonym "linear"). Here, a set of vectors is found that maximizes the ratio of the separation of the group means to the common group variance-covariances, where the vectors have length one and the inner products between each pair of vectors are zero. These vectors, the canonical variates, are the eigenvectors associated with the eigenvalues of solve(S, B), where B is the between-group sum of squares matrix divided by g-1 and S is the common variance-covariance matrix. The rank of B is d = min(g-1, p), so there will be at most d canonical variates. The optional argument ncan permits the user to limit the number of retained canonical variates. The parameters method extracts the canonical variates and the eigenvalues for a discrim object constructed using the Canonical family.
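A sketch of a canonical fit; passing ncan directly in the discrim call (through the ... argument) is an assumption here:
iris.can <- discrim(Species ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W.,
                    data=iris.mm, family=Canonical("homoscedastic"),
                    ncan=2)     # retain at most two canonical variates
parameters(iris.can)            # canonical variates and their eigenvalues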
Much of discrim and its methods are based on the lda and qda functions and methods of the MASS library developed by Venables and Ripley (1998).
Flury, B. (1988). Common Principal Components and Related Multivariate Models. John Wiley & Sons.
Huberty, C. J. (1994).
Applied Discriminant Analysis.
John Wiley & Sons.
McLachlan, G. J. (1992).
Discriminant Analysis and Statistical Pattern Recognition.
John Wiley & Sons.
Ripley, B. D. (1996).
Pattern Recognition and Neural Networks.
Cambridge University Press.
Seber, G. A. F. (1984).
Multivariate Observations.
John Wiley & Sons.
Venables, W. N. and Ripley, B. D. (1998).
Modern Applied Statistics with S-PLUS.
Springer.
# The Iris data
iris.mm <- data.frame(Species=factor(c(rep(1,50), rep(2,50), rep(3,50)),
                                     labels=dimnames(iris)[[3]]),
                      rbind(iris[,,1], iris[,,2], iris[,,3])*10)
names(iris.mm)
iris.quad <- discrim(Species ~ Sepal.L. + Sepal.W. + Petal.L. + Petal.W.,
                     data=iris.mm, family=Classical("heter"))
iris.quad
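A possible continuation (a sketch), resubstituting the training data and cross-tabulating the predicted groups against the true species:
iris.pred <- predict(iris.quad, method="plug-in")
table(iris.pred$group, iris.mm$Species)   # resubstitution classification table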