good part of the data.
cov.rob(x, cor=FALSE, quantile.used=floor((n + p + 1)/2), method=c("mve", "mcd", "classical"), nsamp="best", seed)
good points.
"best" or "exact" or "sample". If "sample" the number chosen is min(5*p, 3000), taken from Rousseeuw and Hubert (1997). If "best" exhaustive enumeration is done up to 5000 samples; if "exact" exhaustive enumeration will be attempted however many samples are needed.
.Random.seed. The current value of .Random.seed will be preserved if it is set.
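As a minimal sketch of how random subsampling interacts with the random number generator, the following fits the same estimate twice with nsamp="sample". It calls set.seed() before each fit instead of passing the seed argument, which is a substitution made here for simplicity; the data set stack.x and the seed value 42 are arbitrary choices.

```r
library(MASS)

# Random subsampling is used when nsamp = "sample", so fix the RNG state
# before each call to make the two fits comparable.
set.seed(42)
fit1 <- cov.rob(stack.x, method = "mve", nsamp = "sample")

set.seed(42)
fit2 <- cov.rob(stack.x, method = "mve", nsamp = "sample")

# Same RNG state, same random samples, same estimates.
same_fit <- identical(fit1$center, fit2$center) &&
  identical(fit1$cov, fit2$cov)
print(same_fit)
```

With nsamp="exact" no random sampling takes place, so the RNG state should not affect the result.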
cor=T) the estimate of the correlation matrix.
quantile.used.
good points.
For method "mve", an approximate search is made of a subset of size quantile.used with an enclosing ellipsoid of smallest volume; in method "mcd" it is the volume of the Gaussian confidence ellipsoid, equivalently the determinant of the classical covariance matrix, that is minimized. The mean of the subset provides a first estimate of the location, and the rescaled covariance matrix a first estimate of scatter. The Mahalanobis distances of all the points from the location estimate for this covariance matrix are calculated, and those points within the 97.5% point under Gaussian assumptions are declared to be good. The final estimates are the mean and rescaled covariance of the good points.
The rescaling is by the appropriate percentile under Gaussian data; in
addition the first covariance matrix has an ad hoc finite-sample
correction given by Marazzi.
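The reweighting step described above can be sketched in base R as follows. This is an illustration, not the code used by cov.rob: the helper reweight_once is hypothetical, the Gaussian cutoff is assumed to be the chi-squared quantile on p degrees of freedom applied to squared Mahalanobis distances, the rescaling factors and the finite-sample correction are left out, and the classical mean and covariance stand in for the first robust estimate.

```r
# Hypothetical helper: one reweighting pass from a given first estimate.
reweight_once <- function(x, center, scatter, level = 0.975) {
  x <- as.matrix(x)
  p <- ncol(x)
  # Squared Mahalanobis distance of every row from the first estimate.
  d2 <- mahalanobis(x, center, scatter)
  # Assumption: the Gaussian cutoff is the chi-squared quantile on p df.
  good <- d2 <= qchisq(level, df = p)
  xg <- x[good, , drop = FALSE]
  # Mean and (unrescaled) covariance of the points declared good.
  list(center = colMeans(xg), cov = cov(xg), good = which(good))
}

# Stand-in first estimate: the classical mean and covariance of stack.x.
first_center <- colMeans(stack.x)
first_scatter <- cov(stack.x)

result <- reweight_once(stack.x, first_center, first_scatter)
print(result$good)    # indices of the rows kept as good points
print(result$center)  # location estimate from the good points
```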
For method "mve" the search is made over ellipsoids determined by the covariance matrix of p of the data points. For method "mcd" an additional improvement step suggested by Rousseeuw and van Driessen (1997) is used, in which once a subset of size quantile.used is selected, an ellipsoid based on its covariance is tested (as this will have no larger a determinant, and may be smaller).
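One reading of the improvement step for "mcd" is the concentration step from the cited paper: take the mean and covariance of the current subset, rank all points by Mahalanobis distance from them, and keep the closest quantile.used points. The sketch below assumes that reading; the helper c_step is hypothetical, and the starting subset (the first h rows of stack.x) is an arbitrary choice rather than anything cov.rob would select.

```r
# Hypothetical helper: one concentration step from a subset of row indices.
c_step <- function(x, subset) {
  h <- length(subset)
  xs <- x[subset, , drop = FALSE]
  # Distances of ALL rows from the mean and covariance of the current subset.
  d2 <- mahalanobis(x, colMeans(xs), cov(xs))
  # New subset: the h rows with the smallest distances.
  sort(order(d2)[seq_len(h)])
}

x <- as.matrix(stack.x)
n <- nrow(x)
p <- ncol(x)
h <- floor((n + p + 1) / 2)  # default quantile.used from the usage line

start <- seq_len(h)          # arbitrary starting subset
improved <- c_step(x, start)

det_start <- det(cov(x[start, , drop = FALSE]))
det_improved <- det(cov(x[improved, , drop = FALSE]))
print(c(start = det_start, improved = det_improved))
```

The determinant for the improved subset should be no larger than for the starting subset, which is the property the parenthetical remark above relies on.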
P. J. Rousseeuw and A. M. Leroy (1987) Robust Regression and Outlier Detection. Wiley.
A. Marazzi (1993) Algorithms, Routines and S Functions for Robust Statistics. Wadsworth & Brooks/Cole.
P. J. Rousseeuw and B. C. van Zomeren (1990) Unmasking multivariate outliers and leverage points, Journal of the American Statistical Association, 85, 633-639.
P. J. Rousseeuw and K. van Driessen (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41, 212-223.
stackloss <- data.frame(stack.x, stack.loss)
cov.rob(stackloss)
cov.rob(stack.x, method="mcd", nsamp="exact")
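As a further illustration, the robust and classical location estimates for stack.x can be placed side by side. This is a small sketch that relies only on the center component of the returned list; the variable names are chosen here for illustration.

```r
library(MASS)

# Robust (MCD, exhaustive enumeration) and classical fits on the same data.
robust <- cov.rob(stack.x, method = "mcd", nsamp = "exact")
classical <- cov.rob(stack.x, method = "classical")

# Location estimates side by side, one column per method.
centers <- cbind(robust = robust$center, classical = classical$center)
print(centers)
```

Large differences between the two columns suggest that some observations are pulling the classical mean away from the bulk of the data.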