Usage

    lqs(x, ...)

    lqs.formula(formula, data = NULL, ...,
                method = c("lts", "lqs", "lms", "S", "model.frame"),
                subset, na.action = na.fail, model = TRUE,
                x = FALSE, y = FALSE, contrasts = NULL)

    lqs.default(x, y, intercept = TRUE, method = c("lts", "lqs", "lms", "S"),
                quantile, control = lqs.control(...), k0 = 1.548, seed, ...)
Arguments

formula: a formula of the form y ~ x1 + x2 + ....
data: an optional data frame from which the variables specified in formula are preferentially to be taken.
subset: an index vector specifying the cases to be used in fitting. (NOTE: If given, this argument must be named exactly.)

na.action: a function specifying the action to be taken if NAs are found. The default action is for the procedure to fail. Alternatives include na.omit and na.exclude, which lead to omission of cases with missing values on any required variable. (NOTE: If given, this argument must be named exactly.)
x: a matrix or data frame containing the explanatory variables.

y: the response: a vector of length the number of rows of x.

intercept: should the model include an intercept?
method: the method to be used. method = "model.frame" returns the model frame; for the others see the Details section.
quantile: the quantile to be used: see the Details section. This is over-ridden if method = "lms".
control: additional control items: see the Details section.
k0: the cutoff / tuning constant used for the chi function when method = "S", currently corresponding to Tukey's "biweight".
seed: the seed to be used for random sampling: see .Random.seed. The current value of .Random.seed will be preserved if it is set.
model: logical. If TRUE the model frame is returned.
contrasts: an optional list. See the contrasts.arg argument of model.matrix.default.
...: arguments to be passed to lqs.default or lqs.control.
"lqs"
.
This is a list with components
method == "S"
before IWLS refinement.
method ==
"S"
) is based on the variance of those residuals whose
absolute value is less than 2.5 times the initial estimate.
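For instance, these components can be inspected directly on a fitted object (a brief illustration using the stack loss data from the Examples section; coefficients is a further component of the returned list):

    library(MASS)
    fit <- lqs(stack.loss ~ ., data = stackloss)  # an object of class "lqs"
    fit$crit    # criterion value at the best solution found
    fit$scale   # the scale estimate(s) described above
    coef(fit)   # coefficients of the fitted linear model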
Details

Suppose there are n data points and p regressors, including any intercept.
The first three methods minimize some function of the sorted squared residuals. For methods "lqs" and "lms" it is the quantile-th smallest squared residual, and for "lts" it is the sum of the quantile smallest squared residuals. "lqs" and "lms" differ in the defaults for quantile, which are floor((n+p+1)/2) and floor((n+1)/2) respectively. For "lts" the default is floor(n/2) + floor((p+1)/2).
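As a quick illustration of these defaults (plain arithmetic, not a call into the package), take the stack loss data used in the Examples section, with n = 21 cases and p = 4 coefficients including the intercept:

    n <- 21; p <- 4
    floor((n + p + 1)/2)           # default quantile for "lqs": 13
    floor((n + 1)/2)               # default quantile for "lms": 11
    floor(n/2) + floor((p + 1)/2)  # default quantile for "lts": 12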
The "S" estimation method solves for the scale s such that the average of a function chi of the residuals divided by s is equal to a given constant.
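The following is a minimal sketch of that scale equation, assuming Tukey's biweight chi normalized to a maximum of 1 and the usual consistency constant E[chi(Z)] under the standard normal; it illustrates the idea only and is not the routine used internally by lqs:

    ## Tukey's biweight chi, scaled so that chi(u) == 1 for |u| >= k0.
    chi <- function(u, k0 = 1.548) ifelse(abs(u) < k0, 1 - (1 - (u/k0)^2)^3, 1)

    ## Consistency constant: E[chi(Z)] for Z ~ N(0, 1), by numerical integration.
    beta <- integrate(function(z) chi(z) * dnorm(z), -Inf, Inf)$value

    ## Solve mean(chi(resid/s)) == beta for the scale s.
    s_scale <- function(resid)
      uniroot(function(s) mean(chi(resid/s)) - beta,
              interval = c(1e-6, 10 * mad(resid) + 1))$root

    set.seed(1)
    s_scale(rnorm(100))  # close to 1 for standard normal "residuals"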
The control argument is a list with components

psamp: the size of each sample. Defaults to p.

nsamp: the number of samples, or "best" (the default), "exact" or "sample". If "sample" the number chosen is min(5*p, 3000), taken from Rousseeuw and Hubert (1997). If "best" exhaustive enumeration is done up to 5000 samples; if "exact" exhaustive enumeration will be attempted however many samples are needed.

adjust: should the intercept be optimized for each sample? Defaults to TRUE.
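Because control defaults to lqs.control(...), these items can also be supplied as named arguments in the call itself. A brief illustration (the particular values are arbitrary choices for demonstration, not recommendations):

    library(MASS)
    set.seed(123)  # the resampling is random, so fix the seed for reproducibility
    lqs(stack.loss ~ ., data = stackloss,
        psamp = 5,      # samples of size p + 1, as Marazzi (1993) recommends
        nsamp = 1000,   # a fixed number of samples instead of "best"
        adjust = TRUE)  # optimize the intercept for each sample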
Note

There seems no reason other than historical to use the lms and lqs options. LMS estimation is of low efficiency (converging at rate n^(-1/3)) whereas LTS has the same asymptotic efficiency as an M estimator with trimming at the quartiles (Marazzi, 1993, p. 201). LQS and LTS have the same maximal breakdown value of (floor((n-p)/2) + 1)/n, attained if floor((n+p)/2) <= quantile <= floor((n+p+1)/2). The only drawback mentioned of LTS is greater computation, as a sort was thought to be required (Marazzi, 1993, p. 201), but this is not true as a partial sort can be used (and is used in this implementation).
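For concreteness, with the stack loss dimensions again (plain arithmetic):

    n <- 21; p <- 4
    (floor((n - p)/2) + 1)/n  # maximal breakdown value: 9/21, about 0.43
    floor((n + p)/2)          # 12; any quantile in [12, 13] attains it, so the
    floor((n + p + 1)/2)      # 13; "lts" (12) and "lqs" (13) defaults both qualify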
Adjusting the intercept for each trial fit does need the residuals to be sorted, and may be significant extra computation if n is large and p small.
Opinions differ over the choice of psamp. Rousseeuw and Hubert (1997) only consider p; Marazzi (1993) recommends p + 1, and suggests that more samples are better than adjustment for a given computational limit.
The computations are exact for a model with just an intercept and adjustment, and for LQS for a model with an intercept plus one regressor and exhaustive search with adjustment. For all other cases the minimization is only known to be approximate.
References

P. J. Rousseeuw and A. M. Leroy (1987) Robust Regression and Outlier Detection. Wiley.

A. Marazzi (1993) Algorithms, Routines and S Functions for Robust Statistics. Wadsworth & Brooks/Cole.

P. J. Rousseeuw and M. Hubert (1997) Recent developments in PROGRESS. In L1-Statistical Procedures and Related Topics, ed. Y. Dodge, IMS Lecture Notes volume 31, pp. 201-214.
Examples

    library(MASS)
    stackloss <- data.frame(stack.x, stack.loss)  # assemble the stack loss data
    lqs(stack.loss ~ ., data = stackloss)
    lqs(stack.loss ~ ., data = stackloss, method = "S", nsamp = "exact")