cov.mve with a formula Object
Returns an object of class mve containing robust estimates
of the covariance matrix,
the location of the data, and optionally the robust correlation matrix.
Specifically, the cov.mve.formula function returns
weighted estimates, with weights based on
the minimum volume ellipsoid estimator proposed by Rousseeuw (1985).
This is a method for the function cov.mve for formula objects.
cov.mve.formula(formula, data=<<see below>>, weights, subset=<<see below>>, na.action=na.fail, model=F, x=F, cor=F, print=T, popsize=<<see below>>, mutate.prob=c(0.15,0.2,0.2,0.2), random.n=<<see below>>, births.n=<<see below>>, stock=list(), maxslen=<<see below>>, stockprob=<<see below>>, nkeep=1, nsamp=<<see below>>)
subset argument.
If this is missing, then the variables in the formula should be on the
search list.
This may also be a single number to handle some special cases -- see
below for details.
cov.mve.formula does not allow input weights.
model.frame after any subset argument has been used.
The default (with na.fail) is to create an error
if any missing values are found.
A possible alternative is na.exclude, which deletes observations
that contain one or more missing values.
TRUE, the model frame is returned in component model.
TRUE, the model matrix is returned in component x.
TRUE, then the estimated correlation matrix will be
returned as well.
TRUE, a message about the number of samples taken and the
number of those samples that were singular will be printed.
(100*p)+(20*p^2), where p is the number of variables.
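As a quick check of the population size formula above, here is a minimal sketch in S-compatible R syntax. The helper name default.popsize is hypothetical and is not part of cov.mve; it only evaluates the stated expression.

```r
# Hypothetical helper: evaluates (100*p) + (20*p^2) for p variables.
default.popsize <- function(p) (100 * p) + (20 * p^2)

# Three variables, as in a formula such as ~wind+radiation+temperature.
popsize.3 <- default.popsize(3)   # 300 + 180
```

Because of the quadratic term, the value grows quickly with the number of variables.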
stock component of the output of a previous call to the function.
p+1 if (n-p)/2 is less than p+1, where n is the number
of observations, and it is the minimum of trunc((n-p)/2) and 5*p
otherwise.
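The two-branch rule above can be sketched as follows. This is an illustration only, written in S-compatible R syntax; the helper name default.maxslen and the sample values of n and p are assumptions, not part of cov.mve.

```r
# Hypothetical helper implementing the rule as stated:
#   p + 1                           if (n - p)/2 < p + 1
#   min(trunc((n - p)/2), 5 * p)    otherwise
default.maxslen <- function(n, p) {
    if ((n - p) / 2 < p + 1) {
        p + 1
    } else {
        min(trunc((n - p) / 2), 5 * p)
    }
}

small <- default.maxslen(n = 8, p = 3)     # (8 - 3)/2 = 2.5 < 4, so p + 1
large <- default.maxslen(n = 100, p = 3)   # min(trunc(48.5), 15)
```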
ith element corresponds to the individual with the ith lowest
objective.
The default is cumsum((2 * (popsize:1))/popsize/(popsize + 1)).
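To see what the default expression produces, the sketch below evaluates it for a deliberately tiny population. The helper name default.stockprob and the population size of 4 are assumptions made for illustration.

```r
# Hypothetical helper: evaluates the default expression given above.
default.stockprob <- function(popsize) {
    cumsum((2 * (popsize:1)) / popsize / (popsize + 1))
}

# For popsize = 4 the increments are 0.4, 0.3, 0.2, 0.1, so the
# cumulative values rise to 1 and the first element (lowest objective)
# receives the largest share.
probs <- default.stockprob(4)
```

The result is a vector of cumulative probabilities whose last element is 1.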
nsamp is always popsize + births.n + random.n - length(stock).
The default value is the result of the right hand side of the above
equation.
"mve" representing the minimum volume ellipsoid covariance estimation.
See the mve.object help file for details.
.Random.seed if it does not already exist, otherwise its value is
updated.
print is TRUE, then a message is printed.
The formula argument is passed around unevaluated;
that is, the variables mentioned in the formula will be defined when
the model frame is computed, not when cov.mve.formula is initially
called.
In particular, if data is given, all these names should generally
be defined as variables in that data frame.
The subset argument, like the terms in formula, is evaluated in the
context of the data frame, if present.
The specific action of the argument is as follows: the model frame,
including subset, is computed on all the rows,
and then the appropriate subset is extracted.
A variety of special cases make such an interpretation
desirable (e.g., the use of lag or other functions that may need
more than the data used in the computation to be fully defined).
On the other hand, if you meant the subset to avoid computing
undefined values or to escape warning messages, you may be surprised.
For example,
cov.mve(~ log(x), mydata, subset = x > 0)
will still generate warnings from log.
If this is a problem, subset the rows of the data frame directly:
cov.mve(~ log(x), mydata[mydata$x > 0, ])
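The same evaluation order can be observed with model.frame alone. The sketch below uses R's model.frame as a stand-in for the S-PLUS model frame computation, so it is an illustration under that assumption rather than a transcript of cov.mve.formula; the data frame mydata and its values are hypothetical.

```r
# Hypothetical data with one non-positive value.
mydata <- data.frame(x = c(-1, 1, 10, 100))

# Build the model frame with a subset argument, recording any warnings.
warned <- character(0)
mf.subset <- withCallingHandlers(
    model.frame(~ log(x), data = mydata, subset = x > 0),
    warning = function(w) {
        warned <<- c(warned, conditionMessage(w))
        invokeRestart("muffleWarning")
    }
)

# Subset the rows first, so log() never sees the non-positive value.
# drop = FALSE keeps a one-column data frame from collapsing to a vector.
mf.rows <- model.frame(~ log(x), data = mydata[mydata$x > 0, , drop = FALSE])
```

Both model frames contain the same rows; only the first call evaluates log on the excluded observation, which is where the warning comes from.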
cov.mve.default is called when the model frame has been computed.
See the cov.mve.default help file for details on
the computational algorithm.
NAMES.
Variables occurring in a formula are evaluated differently from
arguments to S-PLUS functions, because the formula is an object
that is passed around unevaluated from one function to another.
The functions such as cov.mve.formula that finally arrange to
evaluate the variables in the formula try to establish a context
based on the data argument.
More precisely, the function model.frame.default does the
actual evaluation, assuming that its caller behaves in
the way described here.
If the data argument to cov.mve.formula
is missing or is an object (typically, a data frame),
then the local context for
variable names is the frame of the function that called
cov.mve.formula, or the top-level expression frame if the user called
cov.mve.formula directly.
Names in the formula can refer to variables in the local
context as well as global variables or variables in the data object.
The data argument can also be a number, in which case that number
defines the local context.
This can arise, for example, if a function is written to call
cov.mve.formula, perhaps in a loop,
but the local context is definitely not that function.
In this case, the function can set data to sys.parent(), and the local
context will be the next function up the calling stack.
See the second example below.
A numeric value for data can also be supplied if a local context
is being explicitly created by a call to new.frame.
Notice that supplying data as a number implies that this is the only
local context; local variables in any other function will not be
available when the model frame is evaluated.
This is potentially subtle.
Fortunately, it is not something the ordinary user of
cov.mve.formula needs to worry about.
It is relevant for those writing functions that call cov.mve.formula.
Burns, P. J. (1992). A genetic algorithm for robust regression estimation.
(StatSci Technical Note).
Lopuhaa, H. P. and Rousseeuw, P. J. (1991).
Breakdown points of affine equivariant estimators of multivariate location and
covariance matrices. Annals of Statistics, 19, 229-248.
Rousseeuw, P. J. (1985).
Multivariate estimation with high breakdown point.
In Mathematical Statistics and Applications.
W. Grossmann, G. Pflug, I. Vincze and W. Wertz, eds.
Reidel: Dordrecht, 283-297.
Rousseeuw, P. J. (1991).
A diagnostic plot for regression outliers and leverage points.
Computational Statistics and Data Analysis, 11, 127-129.
Rousseeuw, P. J. and van Zomeren, B. C. (1990).
Unmasking multivariate outliers and leverage points (with discussion).
Journal of the American Statistical Association, 85, 633-651.
Woodruff, D. L. and Rocke, D. M. (1993).
Heuristic search algorithms for the minimum volume ellipsoid estimator.
Journal of Computational and Graphical Statistics, 2, 69-95.
cov.mve(~wind+radiation+temperature, data=air)
# mymve calls cov.mve, using the caller to mymve
# as the local context for variables in the formula
# (see aov for an actual example)
mymve <- function(formula, data = sys.parent(), ...) {
    .. ..
    mve <- cov.mve(formula, data, ...)
    .. ..
}