ltsreg on a Vector, Matrix, or Data Frame
Fits a least trimmed squares (LTS) regression of y on x.
x may be a vector, a matrix, or a data frame, and y must be a vector or a one-dimensional data frame or matrix.
This is the default method for the function ltsreg.
ltsreg.default(x, y, intercept=T, quan=<<see below>>, singular.ok=F, qr.out=F, wt=T, yname=NULL, mcd=T, popsize=<<see below>>, mutate.prob= c(0.15,0.2,0.2,0.2), random.n=<<see below>>, births.n=<<see below>>, stock=list(), maxslen=<<see below>>, stockprob=<<see below>>, nkeep=1)
By default, intercept=TRUE, so an intercept term is included in the model.
Missing values (NAs) and infinite values (Infs) are allowed in both x and y.
Observations with missing or infinite values in either x or y are excluded from the computations.
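The exclusion rule can be sketched in Python (not S-PLUS code). The function name drop_nonfinite is hypothetical, and NaN stands in for NA, which is an assumption of the sketch.

```python
import math


def drop_nonfinite(x_rows, y):
    """Keep only observations whose x row and y value are all finite.

    x_rows: list of lists, one inner list per observation.
    y: list of responses, same length as x_rows.
    NaN plays the role of NA here (an assumption of this sketch).
    """
    if len(x_rows) != len(y):
        raise ValueError("x and y must have the same number of observations")
    kept_x, kept_y = [], []
    for row, yi in zip(x_rows, y):
        # An observation is dropped if ANY value in x or y is NaN or +/-Inf.
        if math.isfinite(yi) and all(math.isfinite(v) for v in row):
            kept_x.append(row)
            kept_y.append(yi)
    return kept_x, kept_y


x, y = drop_nonfinite([[1.0], [2.0], [float("nan")], [4.0]],
                      [1.0, float("inf"), 3.0, 4.0])
print(x, y)  # [[1.0], [4.0]] [1.0, 4.0]
```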
The default value of quan is max(floor((n+p+1)/2), floor(0.9*n)), where n is the number of observations and p is the number of regression coefficients.
In general, quan must be an integer between floor((n+p+1)/2) and n.
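As a worked illustration of the default and the allowed range, here is a Python sketch (the function names are hypothetical). It uses integer arithmetic, so floor(0.9*n) is computed as (9*n)//10 to avoid floating-point rounding.

```python
def default_quan(n, p):
    """max(floor((n+p+1)/2), floor(0.9*n)) in exact integer arithmetic."""
    return max((n + p + 1) // 2, (9 * n) // 10)


def quan_is_valid(quan, n, p):
    """quan must be an integer between floor((n+p+1)/2) and n."""
    return isinstance(quan, int) and (n + p + 1) // 2 <= quan <= n


# n = 21 observations, p = 4 coefficients (3 predictors plus intercept):
print(default_quan(21, 4))  # 18, since floor(0.9*21) = 18 exceeds floor(26/2) = 13
```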
If singular.ok is FALSE, then an error is generated if x (plus the intercept) is found to be singular.
If qr.out is TRUE, then a list representing the QR decomposition of x is returned.
If wt is TRUE, a vector of weights based on the ltsreg fit is returned.
These weights can be used in lsfit or lm to obtain a weighted least squares solution.
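To show how such a weight vector enters a weighted least squares fit, here is a Python sketch for a single predictor with an intercept. The function weighted_line is hypothetical; the weights are simply supplied by the caller, and the 0/1 weights in the usage line are made up for illustration, not a claim about what ltsreg actually returns.

```python
def weighted_line(xs, ys, ws):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    if sw <= 0:
        raise ValueError("weights must have a positive sum")
    # Weighted means of x and y.
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    # Weighted centered cross-products; a weight of 0 removes an observation.
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


# The last point is an outlier; giving it weight 0 recovers y = 2x.
a, b = weighted_line([1, 2, 3, 4, 5], [2, 4, 6, 8, 100], [1, 1, 1, 1, 0])
```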
The yname argument supplies a name for the response y.
If mcd is TRUE, cov.mcd will be called on x.
The results are needed in plot.lts for the diagnostic plot of robust residuals versus robust x-distances.
The default is (50*p)+(15*p^2), where p is the rank of x (including the intercept, if any).
This default allows reasonably accurate estimation for p up to at least twenty.
You may consider doubling it if you want to ensure very accurate minimization of the objective.
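The arithmetic of this default can be checked with a short Python sketch; default_size is a hypothetical name for the formula (50*p)+(15*p^2), not an S-PLUS function.

```python
def default_size(p):
    """(50*p)+(15*p^2), where p is the rank of x including any intercept."""
    return 50 * p + 15 * p ** 2


# Growth is quadratic in p:
print(default_size(4))       # 440
print(default_size(20))      # 7000
print(2 * default_size(20))  # 14000, the "doubled" setting for p = 20
```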
The default for maxslen is p if (n-p)/2 is less than p, where n is the number of observations; otherwise it is the minimum of trunc((n-p)/2) and 5*p.
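The two-case rule can be written as a small Python function (the name default_maxslen is hypothetical). The comparison (n-p)/2 < p is rewritten as n-p < 2*p so that no floating-point division is needed.

```python
def default_maxslen(n, p):
    """p if (n-p)/2 < p, else min(trunc((n-p)/2), 5*p)."""
    if n - p < 2 * p:                # equivalent to (n-p)/2 < p
        return p
    return min((n - p) // 2, 5 * p)  # n - p >= 0 here, so // equals trunc


print(default_maxslen(21, 4))   # 8: (21-4)/2 = 8.5 is not below 4, and trunc(8.5) = 8 < 20
print(default_maxslen(10, 4))   # 4: (10-4)/2 = 3 is below 4
print(default_maxslen(200, 4))  # 20: capped at 5*p
```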
The default for stockprob is cumsum((2*(popsize:1))/popsize/(popsize+1)).
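The default stockprob is a cumulative distribution over the popsize members of the stock whose weights decrease linearly (popsize, popsize-1, ..., 1), normalized to sum to one. The Python sketch below computes it and draws a member by inverse-CDF sampling. It assumes the stock is ordered best first, so better members are picked more often; both that ordering and the helper names are assumptions, not taken from the S-PLUS code.

```python
import bisect
import random


def default_stockprob(popsize):
    """cumsum((2*(popsize:1))/popsize/(popsize+1)) as a Python list."""
    probs = [2 * k / popsize / (popsize + 1) for k in range(popsize, 0, -1)]
    cum, total = [], 0.0
    for pr in probs:
        total += pr
        cum.append(total)
    return cum


def pick_index(stockprob, rng):
    """Return a 0-based stock index; index 0 (assumed best) is most likely."""
    u = rng.random()                     # uniform on [0, 1)
    i = bisect.bisect_right(stockprob, u)  # first i with stockprob[i] > u
    return min(i, len(stockprob) - 1)    # guard against rounding in the last entry


cum = default_stockprob(4)  # approximately [0.4, 0.7, 0.9, 1.0]
idx = pick_index(cum, random.Random(1))
```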
The function returns an object of class "lts" giving the solution.
See the lts.object help file for details.
The function creates .Random.seed if it does not already exist; otherwise its value is updated.
Least trimmed squares (LTS) regression was proposed by Rousseeuw (1984), and more information can be found in Rousseeuw and Leroy (1987).
This regression method minimizes the sum of the quan smallest squared residuals.
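The objective itself is easy to state in code. The Python sketch below (the name lts_objective is hypothetical) squares the residuals, sorts them, and sums the quan smallest; with quan equal to the number of residuals it reduces to the ordinary residual sum of squares.

```python
def lts_objective(residuals, quan):
    """Sum of the quan smallest squared residuals."""
    if not 1 <= quan <= len(residuals):
        raise ValueError("quan must be between 1 and the number of residuals")
    squared = sorted(r * r for r in residuals)
    return sum(squared[:quan])


res = [1.0, -2.0, 0.5, 10.0]
print(lts_objective(res, 3))  # 5.25: the outlying residual 10 is trimmed
print(lts_objective(res, 4))  # 105.25: the least squares sum of squares
```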
The default value of quan is the maximum of floor((n+p+1)/2) and floor(0.9*n), where n is the number of observations and p is the number of coefficients (including the intercept term, if present).
The user can also set a larger value of quan, up to quan=n, in which case the LTS fit equals the least squares fit (see the help file of lsfit).
The breakdown value of LTS (that is, the fraction of bad data it can withstand) is roughly (n-quan)/n.
For quan=floor((n+p+1)/2), this is about 50%, the highest possible breakdown value.
For the default quan, which is at least floor(0.9*n), it is at most about 10%; and for quan=n, it drops to 0%.
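A quick numeric check of the rough breakdown formula (n-quan)/n for n = 100 and p = 3, written as a Python sketch with hypothetical helper names:

```python
def default_quan(n, p):
    """max(floor((n+p+1)/2), floor(0.9*n)) in exact integer arithmetic."""
    return max((n + p + 1) // 2, (9 * n) // 10)


def approx_breakdown(n, quan):
    """Rough LTS breakdown value (n - quan) / n."""
    return (n - quan) / n


n, p = 100, 3
print(approx_breakdown(n, (n + p + 1) // 2))  # 0.48 for quan = 52
print(approx_breakdown(n, default_quan(n, p)))  # 0.1 for the default quan = 90
print(approx_breakdown(n, n))  # 0.0 for quan = n (least squares)
```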
If the data are univariate (hence p=1), the exact algorithm location.lts is used (see the corresponding help file).
When p>1, LTS regression can be computed by the basic subsampling algorithm of Rousseeuw and Leroy (1987); here, a genetic algorithm (Burns, 1992) is used instead.
Each individual solution is defined by a set of observation numbers and corresponds to the least squares fit to those observations.
A stock of popsize individuals is produced by random sampling; then a number of further random samples are taken, and the best solution is saved in the stock.
During the genetic phase, two parents are picked, and they produce an offspring that contains a sample of the observations from the parents.
The best two of these three individuals are retained in the stock.
The best of all the solutions found is used to compute the coefficients and the residuals.
The standard random sampling algorithm can be obtained by setting popsize to one, maxslen to p, and births.n to zero.
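The search loop described above can be sketched in Python for the simple case of one predictor plus an intercept (p = 2). This is a hypothetical illustration, not the S-PLUS implementation: the names genetic_lts, random_n and births_n only mirror the arguments, parents are picked uniformly rather than through stockprob, each extra random sample replaces the worst stock member if it is better, and mutation is omitted.

```python
import random


def ls_fit(xs, ys, idx):
    """Least squares line y = a + b*x through observations idx; None if x is constant."""
    m = len(idx)
    xbar = sum(xs[i] for i in idx) / m
    ybar = sum(ys[i] for i in idx) / m
    sxx = sum((xs[i] - xbar) ** 2 for i in idx)
    if sxx == 0:
        return None
    b = sum((xs[i] - xbar) * (ys[i] - ybar) for i in idx) / sxx
    return ybar - b * xbar, b


def evaluate(xs, ys, idx, quan):
    """Fit on idx, then score with the LTS objective over ALL observations."""
    coef = ls_fit(xs, ys, idx)
    if coef is None:
        return (float("inf"), idx, None)
    a, b = coef
    sq = sorted((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return (sum(sq[:quan]), idx, coef)


def genetic_lts(xs, ys, quan, popsize=20, random_n=20, births_n=200,
                maxslen=4, seed=1):
    """Hypothetical genetic search for simple-regression LTS (p = 2)."""
    p, n = 2, len(xs)
    rng = random.Random(seed)  # fixed seed: same input gives the same result

    def random_individual():
        size = rng.randint(p, min(maxslen, n))
        return sorted(rng.sample(range(n), size))

    def worst_index():
        return max(range(popsize), key=lambda k: stock[k][0])

    # Initial stock of popsize random individuals.
    stock = [evaluate(xs, ys, random_individual(), quan) for _ in range(popsize)]

    # Further random samples: a better one replaces the worst stock member.
    for _ in range(random_n):
        cand = evaluate(xs, ys, random_individual(), quan)
        w = worst_index()
        if cand[0] < stock[w][0]:
            stock[w] = cand

    # Genetic phase (mutation omitted): keep the best two of parents + child.
    for _ in range(births_n if popsize >= 2 else 0):
        i, j = rng.sample(range(popsize), 2)
        pool = sorted(set(stock[i][1]) | set(stock[j][1]))
        # The offspring has the length of the first parent.
        child_idx = sorted(rng.sample(pool, len(stock[i][1])))
        child = evaluate(xs, ys, child_idx, quan)
        worse = i if stock[i][0] >= stock[j][0] else j
        if child[0] < stock[worse][0]:
            stock[worse] = child

    best = min(stock, key=lambda s: s[0])
    return best[2], best[0]  # (intercept, slope), LTS objective
```

Calling genetic_lts with popsize=1, births_n=0 and maxslen=2 skips the genetic phase and leaves plain random sampling of elemental subsets, in the spirit of the setting described in the text.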
The mutate.prob argument controls the mutation of the offspring.
The length of the offspring is initially set to the length of the first parent.
This length is then reduced by one, increased by one, or given a length uniformly distributed between p and maxslen, according to the last three probabilities in mutate.prob.
The other type of mutation that can occur is for one of the observations of the offspring to be changed to an observation picked at random from among all of the observations; the probability of this mutation is specified by the first element of mutate.prob.
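The two kinds of mutation can be sketched in Python as follows. This is an illustration under stated assumptions, not the S-PLUS code: the function mutate is hypothetical, the three length changes are treated as mutually exclusive (with the default c(0.15,0.2,0.2,0.2) that leaves probability 0.4 of no length change), the length is kept within [p, maxslen], and observations stay distinct.

```python
import random


def mutate(child, n, p, maxslen, mutate_prob, rng):
    """Return a mutated copy of child, a list of distinct 0-based observation indices.

    mutate_prob = (swap, shorten, lengthen, uniform), mirroring the order
    described for mutate.prob: first element = observation change,
    last three = length changes (assumed mutually exclusive here).
    """
    swap_p, down_p, up_p, unif_p = mutate_prob
    obs = list(child)
    hi = min(maxslen, n)

    # Length mutation: at most one of the three kinds is applied.
    u = rng.random()
    target = len(obs)
    if u < down_p:
        target -= 1
    elif u < down_p + up_p:
        target += 1
    elif u < down_p + up_p + unif_p:
        target = rng.randint(p, hi)
    target = max(p, min(hi, target))  # assumption: clamp to [p, maxslen]
    while len(obs) > target:
        obs.pop(rng.randrange(len(obs)))
    while len(obs) < target:
        new = rng.randrange(n)
        if new not in obs:
            obs.append(new)

    # Observation mutation: replace one member by a random observation.
    if rng.random() < swap_p:
        new = rng.randrange(n)
        if new not in obs:
            obs[rng.randrange(len(obs))] = new
    return sorted(obs)


rng = random.Random(0)
offspring = mutate([0, 1, 2], 20, 2, 5, (0.15, 0.2, 0.2, 0.2), rng)
```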
The ltsreg.default function is based on a random algorithm that starts with the same seed on each call to ltsreg.
Therefore, the result is reproducible.
Burns, P. J. (1992). A Genetic Algorithm for Robust Regression Estimation. StatSci Technical Note.
Rousseeuw, P. J. (1984). Least median of squares regression. Journal of the American Statistical Association, 79, 871-881.
Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection. New York: Wiley.
stacklts <- ltsreg(stack.x, stack.loss)
# reweighted least squares
stackrls <- lm(stack.loss ~ stack.x, weights = stacklts$lts.wt)