ltsreg with a Formula Object

ltsreg method for formula objects.
ltsreg.formula(formula, data=<<see below>>, weights, subset=<<see below>>,
na.action=na.fail, model=F, x=F, y=F, quan=<<see below>>,
singular.ok=F, qr.out=F, wt=T, mcd=T, popsize=<<see below>>,
mutate.prob=c(0.15, 0.2, 0.2, 0.2), random.n=<<see below>>,
births.n=<<see below>>, stock=list(), maxslen=<<see below>>,
stockprob=<<see below>>, nkeep=1)
formula: a formula object, with the response on the left of a ~ operator and the terms (separated by + operators) on the right.
data: an optional data frame in which to interpret the variables named in the formula, the subset argument, and the weights argument. If this is missing, then the variables in the formula should be on the search list. This may also be a single number to handle some special cases; see below for details.
na.action: a function applied to the model.frame after any subset argument has been used. The default (na.fail) is to create an error if any missing values are found. A possible alternative is na.exclude, which deletes observations that contain one or more missing values.
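The two missing-value policies just described can be sketched as plain filters over rows. This is a Python illustration, not the S-PLUS functions themselves: na_fail, na_exclude and apply_na_action are hypothetical stand-ins, and missing values are assumed to be represented as None or NaN. The sketch also shows why the order matters: because the filter runs after subsetting, rows removed by the subset never reach na.fail.

```python
import math

# Hypothetical stand-ins for S-PLUS na.fail / na.exclude.
# Missing values are assumed to be None or float("nan").

def is_missing(value):
    """Return True for None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def na_fail(rows):
    """Raise an error if any row contains a missing value (the default policy)."""
    for i, row in enumerate(rows):
        if any(is_missing(v) for v in row):
            raise ValueError(f"missing value in row {i}")
    return rows

def na_exclude(rows):
    """Drop every row that contains one or more missing values."""
    return [row for row in rows if not any(is_missing(v) for v in row)]

def apply_na_action(rows, keep, na_action=na_fail):
    """Apply the subset first, then the missing-value filter."""
    subsetted = [row for row, k in zip(rows, keep) if k]
    return na_action(subsetted)

frame = [(1.0, 2.0), (None, 3.0), (4.0, float("nan")), (5.0, 6.0)]
print(na_exclude(frame))  # [(1.0, 2.0), (5.0, 6.0)]
```

With keep = [True, False, False, True], apply_na_action succeeds even under na_fail, because the incomplete rows were already removed by the subset.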
model: if TRUE, the model frame is returned in component model.
x: if TRUE, the model matrix is returned in component x.
y: if TRUE, the response is returned in component y.
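quan: the number of smallest squared residuals that are summed in the least trimmed squares objective. The default is max(floor((n+p+1)/2), floor(0.9*n)), where n is the number of observations and p is the rank of x. In general, quan must be an integer between the default value and n.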
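As a worked check of the default, the formula can be transcribed directly. The Python sketch below does that, and it also shows the trimmed criterion that quan controls: only the quan smallest squared residuals count, so a single wild residual can be ignored. The function names are hypothetical, and this is not the genetic-algorithm search that ltsreg actually uses to minimize the criterion.

```python
def default_quan(n, p):
    """Default quan: max(floor((n+p+1)/2), floor(0.9*n)).

    Integer arithmetic keeps floor(0.9*n) exact as (9*n) // 10.
    """
    return max((n + p + 1) // 2, (9 * n) // 10)


def lts_objective(residuals, quan):
    """Least trimmed squares criterion: sum of the quan smallest squared residuals."""
    squared = sorted(r * r for r in residuals)
    return sum(squared[:quan])


print(default_quan(20, 3))   # 18
print(default_quan(10, 9))   # 10
print(lts_objective([3.0, -1.0, 0.5, 10.0], 2))  # 1.25
```

For n = 20 and p = 3 the 0.9*n term dominates. When p is close to n, the (n+p+1)/2 term takes over instead.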
singular.ok: if FALSE, then an error is created if x (plus the intercept) is found to be singular.
qr.out: if TRUE, then a list representing the QR decomposition of x is returned.
wt: if TRUE, the weights computed by ltsreg are returned. These weights can be used in lsfit or lm to obtain a weighted least squares solution.
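The following sketch shows what passing these weights to a weighted least squares fit achieves. It assumes, for illustration only, that the weights are 0/1 indicators marking the observations the robust fit keeps. Under that assumption, an observation with weight 0 drops out of the fit entirely. The Python function below is a hypothetical single-predictor weighted least squares fit, not lsfit or lm, and the data are made up.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


# Hypothetical data: the last point is a gross outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 40.0]
lts_like_weights = [1, 1, 1, 1, 0]   # assumed 0/1 weights flagging the outlier

print(weighted_line_fit(x, y, lts_like_weights))  # (0.0, 2.0)
print(weighted_line_fit(x, y, [1] * 5))           # (-12.0, 8.0)
```

The reweighted fit recovers the line through the four clean points. The ordinary fit is pulled far off by the single outlier.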
mcd: if TRUE, cov.mcd will be called on x. The results are needed in plot.lts for the diagnostic plot of robust residuals versus robust x-distances.
popsize: the population size of the genetic algorithm. The default is (50*p)+(15*p^2), where p is the rank of x (including the intercept, if any). This default allows reasonably accurate estimation for p up to at least twenty. You may consider doubling it if you want to ensure very accurate minimization of the objective.
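To make the default population size concrete, the formula above is evaluated here for a few values of p, together with the doubled size suggested for very accurate minimization. This is a plain Python transcription, and default_popsize is a hypothetical name.

```python
def default_popsize(p):
    """Default popsize: (50*p) + (15*p^2), where p is the rank of x."""
    return 50 * p + 15 * p ** 2


for p in (2, 5, 10, 20):
    # p, default population size, doubled size
    print(p, default_popsize(p), 2 * default_popsize(p))
```

The quadratic term dominates quickly: for p = 20 the default is 7000, and doubling it gives 14000.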
maxslen: the default is p if (n-p)/2 is less than p, where n is the number of observations; otherwise it is the minimum of trunc((n-p)/2) and 5*p.
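The two-branch default rule just described can be written out as follows. Judging from the usage line, this rule appears to belong to the maxslen argument, but that attribution is an inference. The Python function name is hypothetical.

```python
def default_maxslen(n, p):
    """Default rule: p if (n-p)/2 < p, else min(trunc((n-p)/2), 5*p)."""
    if (n - p) / 2 < p:
        return p
    # Here n - p >= 2*p >= 0, so floor division equals trunc().
    return min((n - p) // 2, 5 * p)


print(default_maxslen(10, 4))   # 4  ((10-4)/2 = 3 < 4)
print(default_maxslen(20, 3))   # 8  (trunc(17/2) = 8 < 15)
print(default_maxslen(100, 3))  # 15 (5*p = 15 < 48)
```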
stockprob: the default is cumsum((2*(popsize:1))/popsize/(popsize+1)).
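This default is a vector of cumulative probabilities. The increments 2k/(popsize*(popsize+1)) decrease linearly from the first population member to the last, and they sum to exactly 1. The sketch below computes it with exact fractions. It assumes, as the argument name stockprob suggests but this page does not state, that the vector is used to draw population members with earlier positions more likely. The helper pick_index is a hypothetical illustration of such a draw by inverse-CDF lookup.

```python
from bisect import bisect_left
from fractions import Fraction


def default_stockprob(popsize):
    """cumsum((2*(popsize:1))/popsize/(popsize+1)), computed with exact fractions."""
    total = Fraction(0)
    cumulative = []
    for k in range(popsize, 0, -1):          # popsize, popsize-1, ..., 1
        total += Fraction(2 * k, popsize * (popsize + 1))
        cumulative.append(total)
    return cumulative


def pick_index(cumulative, u):
    """Hypothetical helper: map a uniform draw u in (0, 1] to a 0-based index."""
    return bisect_left(cumulative, u)


probs = default_stockprob(4)
print([str(q) for q in probs])  # ['2/5', '7/10', '9/10', '1']
print(pick_index(probs, 0.5))   # 1
```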
An object of class "lts" giving the solution. See the lts.object help file for details.
The formula argument is passed around unevaluated; that is, the variables mentioned in the formula are defined when the model frame is computed, not when ltsreg is initially called. In particular, if data is given, all these names should generally be defined as variables in that data frame.
The subset argument, like the terms in formula, is evaluated in the context of the data frame, if present. Specifically, the model frame, including weights and subset, is computed on all the rows, and then the appropriate subset is extracted. A variety of special cases make this interpretation desirable (for example, the use of lag or other functions that may need more than the data used in the fit to be fully defined). On the other hand, if you meant the subset to avoid computing undefined values or to avoid warning messages, you may be surprised.
For example,

ltsreg(y ~ log(x), mydata, subset = x > 0)

will still generate warnings from log. If this is a problem, do the subsetting on the rows of the data frame directly:

ltsreg(y ~ log(x), mydata[mydata$x > 0, ])
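The order of operations described above can be reproduced in a small Python sketch. Computing the transformed column on every row and then subsetting still evaluates log on the excluded rows. Subsetting the data first never does. Python's math.log raises an error for non-positive input, so the sketch uses a hypothetical s_log helper that warns and returns NaN instead. This is a simplification of the S-PLUS behaviour, and the data are made up.

```python
import math
import warnings


def s_log(x):
    """Hypothetical log() that warns and returns NaN for x <= 0 instead of raising."""
    if x <= 0:
        warnings.warn("NaNs produced in log(x)")
        return float("nan")
    return math.log(x)


def frame_then_subset(rows, keep):
    """Model-frame order: compute log(x) on all rows, then extract the subset."""
    frame = [(y, s_log(x)) for x, y in rows]
    return [r for r, k in zip(frame, keep) if k]


def subset_then_frame(rows, keep):
    """Subset the data first, so log() never sees the excluded rows."""
    kept = [r for r, k in zip(rows, keep) if k]
    return [(y, s_log(x)) for x, y in kept]


mydata = [(1.0, 0.5), (-2.0, 1.1), (4.0, 2.0)]   # hypothetical (x, y) rows
keep = [x > 0 for x, _ in mydata]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    a = frame_then_subset(mydata, keep)
print(len(caught))   # 1

with warnings.catch_warnings(record=True) as caught2:
    warnings.simplefilter("always")
    b = subset_then_frame(mydata, keep)
print(len(caught2))  # 0
```

Both orders yield the same fitted rows (a == b). Only the first one pays for evaluating log on the excluded row.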
ltsreg.default is called once the model frame has been computed. See the ltsreg.default help file for details on the computational algorithm.
Variable names in formulas.
Variables occurring in a formula are evaluated differently from arguments to S-PLUS functions, because the formula is an object that is passed around unevaluated from one function to another. The functions, such as ltsreg.formula, that finally arrange to evaluate the variables in the formula try to establish a context based on the data argument. (More precisely, the function model.frame.default does the actual evaluation, assuming that its caller behaves in the way described here.)
If the data argument to ltsreg.formula is missing or is an object (typically a data frame), then the local context for variable names is the frame of the function that called ltsreg.formula, or the top-level expression frame if the user called ltsreg.formula directly. Names in the formula can refer to variables in the local context as well as to global variables or variables in the data object.
The data argument can also be a number, in which case that number defines the local context. This can arise, for example, if a function is written to call ltsreg.formula, perhaps in a loop, but the local context is definitely not that function. In this case, the function can set data to sys.parent(), and the local context will be the next function up the calling stack. See the myfit example below. A numeric value for data can also be supplied if a local context is being explicitly created by a call to new.frame.
Notice that supplying data as a number implies that this is the only local context; local variables in any other function will not be available when the model frame is evaluated. This is potentially subtle. Fortunately, it is not something the ordinary user of ltsreg.formula needs to worry about; it is relevant for those writing functions that call ltsreg.formula or other such model-fitting functions.
Burns, P. J. (1992). A Genetic Algorithm for Robust Regression Estimation. StatSci Technical Note.

Rousseeuw, P. J. (1984). Least median of squares regression. Journal of the American Statistical Association, 79, 871-881.

Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection. New York: Wiley.
ltsreg(ozone~wind+radiation+temperature, data=air)
stacklts <- ltsreg(stack.loss~stack.x)
# reweighted least squares
stackrls <- lm(stack.loss~stack.x, weights=stacklts$lts.wt)
# myfit calls ltsreg, using the caller to myfit
# as the local context for variables in the formula
# (see aov for an actual example)
myfit <- function(formula, data = sys.parent(), ...) {
    # ... other code ...
    fit <- ltsreg(formula, data, ...)
    # ... other code ...
}