cov.mcd

Computes the minimum covariance determinant (MCD) estimator on a vector, matrix, or data frame, and returns an object of class "mcd" containing estimates of the robust multivariate location, the robust covariance matrix, and optionally the robust correlation matrix.
Specifically, the cov.mcd.default function first returns the raw minimum covariance determinant (MCD) estimator of Rousseeuw (1984, 1985). The MCD estimate is then used to assign weights to the observations, and weighted estimates of location and covariance are also returned. This is the default method for the function cov.mcd.
cov.mcd.default(data, cor=F, print=T, quan=<<see below>>, ntrial=<<see below>>)
data: a vector, matrix, or data frame. Missing values (NAs) and infinite values (Infs) are allowed. Observations (rows) with missing or infinite values are automatically excluded from the computations.
cor: if TRUE, then the estimated correlation matrix will be returned as well.
print: if TRUE, information about the method will be printed.
quan: by default, quan is floor((n+p+1)/2), where n is the number of observations and p is the number of variables. Any quan between the default and n may be specified.
ntrial: the default value of ntrial is 500.
"mcd"
representing the minimum covariance
determinant estimates.
See
for details.
print
is
TRUE
, then a message is printed.
Let n be the number of observations and p be the number of variables. The minimum covariance determinant estimate is determined by the subset of quan observations whose covariance matrix has the smallest determinant. The MCD location estimate is then the mean of those quan points, and the MCD scatter estimate is their covariance matrix. The default value of quan is floor((n+p+1)/2), but the user may choose a larger number.
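To make this definition concrete, the sketch below computes the exact MCD for a very small data set by enumerating all subsets of size quan. It is purely illustrative, not the algorithm used by cov.mcd (see the next paragraph), and it assumes an R-style session where combn, det, and var are available; the function name exact.mcd is hypothetical.

# Illustrative brute-force MCD (feasible only for very small n).
exact.mcd <- function(x, quan) {
  x <- as.matrix(x)
  subsets <- combn(nrow(x), quan)            # every subset of size quan
  best <- NULL
  best.det <- Inf
  for (j in 1:ncol(subsets)) {
    idx <- subsets[, j]
    d <- det(var(x[idx, , drop = FALSE]))    # determinant of the subset's covariance
    if (d < best.det) { best.det <- d; best <- idx }
  }
  list(center = apply(x[best, , drop = FALSE], 2, mean),  # MCD location: mean of best subset
       cov    = var(x[best, , drop = FALSE]),             # MCD scatter: covariance of best subset
       best   = best)
}

For example, with n = 20 and p = 2 the default quan is floor((20+2+1)/2) = 11, so the estimate is determined by 11 of the 20 points.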
For multivariate data sets, finding the exact estimate takes too much time, so an approximation is computed. A full description of the present algorithm can be found in Rousseeuw and Van Driessen (1997). Major advantages of this algorithm are its precision and its ability to deal with very large n.
Although the raw minimum covariance determinant estimate has a high
breakdown value, its statistical efficiency is low. A better finite-sample
efficiency can be attained while retaining the high breakdown value by
computing a weighted mean and a covariance estimate, with weights based on
the MCD estimate.
By default,
cov.mcd
returns both the raw MCD estimate
and the weighted estimate.
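One common form of this reweighting step is sketched below: observations whose squared robust distance from the raw MCD fit exceeds a chi-squared cutoff receive weight 0, and the mean and covariance are recomputed from the remaining points. This is a sketch of the usual approach, not necessarily the exact weight function used by cov.mcd; the function name reweight.mcd, its arguments raw.center and raw.cov, and the 97.5% cutoff are assumptions for illustration.

# Sketch of a one-step reweighting based on a raw MCD fit (illustrative only).
reweight.mcd <- function(x, raw.center, raw.cov) {
  x <- as.matrix(x)
  d2 <- mahalanobis(x, raw.center, raw.cov)            # squared robust distances
  w <- d2 <= qchisq(0.975, ncol(x))                     # weight 1 inside the cutoff, 0 outside
  list(center = apply(x[w, , drop = FALSE], 2, mean),   # weighted location estimate
       cov    = var(x[w, , drop = FALSE]),              # weighted covariance estimate
       wt     = as.numeric(w))
}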
Multivariate outliers can be found by means of the robust distances, as described in Rousseeuw and Leroy (1987) and in Rousseeuw and Van Zomeren (1990). These distances can be calculated by the function mahalanobis, and plotted by applying plot.mcd to a "mcd" object.
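A minimal sketch of this outlier-flagging step follows; it assumes that the fitted object stores the weighted estimates in components named center and cov (see the "mcd" object documentation) and uses the conventional 97.5% chi-squared cutoff. (In R, a cov.mcd function with a similar interface is provided by the MASS package.)

fit <- cov.mcd(freeny.x)                                 # robust location and scatter
rd <- sqrt(mahalanobis(freeny.x, fit$center, fit$cov))   # robust distances
cutoff <- sqrt(qchisq(0.975, ncol(freeny.x)))            # conventional cutoff
which(rd > cutoff)                                       # suspected multivariate outliers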
It is suggested that the number of observations be at least five times the
number of variables.
When there are fewer observations than this, there is not enough information
to determine whether outliers exist.
An important advantage of the present algorithm is that it allows for
exact fit situations, where more than
quan
observations lie on a
hyperplane. Then the program still yields the MCD location and
scatter matrix, the latter being singular (as it should be), as well
as the equation of the hyperplane.
If the classical covariance matrix of the data is already singular, all observations lie on a hyperplane. Then cov.mcd will give a message and the equation of the hyperplane, and the MCD estimates are equal to the classical estimates. In this case, you will need to modify your data before applying cov.mcd, perhaps by using princomp and deleting columns with zero variance.
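A minimal pre-check along these lines is sketched below; it is an illustration, not part of cov.mcd itself.

x <- as.matrix(freeny.x)
x <- x[, apply(x, 2, var) > 0, drop = FALSE]   # delete columns with zero variance
if (qr(var(x))$rank < ncol(x))                 # classical covariance matrix is singular
  cat("observations lie on a hyperplane; consider princomp or dropping variables\n")
fit <- cov.mcd(x)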
For univariate data sets an exact algorithm, described in Rousseeuw and Leroy (1987), is used.
The minimum covariance determinant estimator (Rousseeuw 1985) has a breakdown value of roughly (n-quan)/n, which is about 50% for the default quan. That is, the estimate cannot be made arbitrarily bad without changing about half of the data. A covariance matrix is considered arbitrarily bad if some eigenvalue goes to infinity or to zero (a singular matrix). This is analogous to a univariate scale estimate, which breaks down if the estimate goes to either infinity or zero.
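As a worked example with hypothetical n and p:

n <- 100; p <- 5
quan <- floor((n + p + 1) / 2)   # default quan = 53
(n - quan) / n                   # approximate breakdown value: 0.47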
Rousseeuw, P. J. (1984).
Least Median of Squares Regression.
Journal of the American Statistical Association, 79, 871-880.

Rousseeuw, P. J. (1985).
Multivariate estimation with high breakdown point.
In: Mathematical Statistics and Applications (W. Grossmann, G. Pflug, I. Vincze and W. Wertz, eds.), Reidel, Dordrecht, 283-297.

Rousseeuw, P. J. and Leroy, A. M. (1987).
Robust Regression and Outlier Detection.
Wiley-Interscience, New York. [Chapter 7]

Rousseeuw, P. J. and Van Zomeren, B. C. (1990).
Unmasking multivariate outliers and leverage points.
Journal of the American Statistical Association, 85, 633-639.

Rousseeuw, P. J. and Van Driessen, K. (1997).
A Fast Algorithm for the Minimum Covariance Determinant Estimator, submitted for publication.
fr.cov <- cov.mcd(freeny.x)
cov.mcd(freeny.x)
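The fitted object can then be inspected; the component names center and cov are assumed to follow the usual convention, and plot dispatches to plot.mcd for objects of class "mcd" as noted above.

fr.cov$center    # robust location estimate
fr.cov$cov       # robust covariance matrix
plot(fr.cov)     # robust distance plots via plot.mcd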