Use ltsreg on a Vector, Matrix, or Data Frame

DESCRIPTION:

Performs least trimmed squares (LTS) regression (Rousseeuw, 1984) of y on x. x may be a vector, a matrix, or a data frame; y must be a vector or a one-dimensional data frame or matrix. This is the default method for the function ltsreg.

USAGE:

ltsreg.default(x, y, intercept=T, quan=<<see below>>, singular.ok=F, 
               qr.out=F, wt=T, yname=NULL, mcd=T, popsize=<<see below>>, 
               mutate.prob= c(0.15,0.2,0.2,0.2), random.n=<<see below>>, 
               births.n=<<see below>>, stock=list(), maxslen=<<see below>>, 
               stockprob=<<see below>>, nkeep=1) 

REQUIRED ARGUMENTS:

x
a vector, matrix, or data frame of explanatory variables. Rows of the matrix represent observations, columns represent variables. A constant term should not be included; a better result is usually achieved by removing such a column and setting intercept=TRUE. Missing values (NAs) and infinite values (Infs) are allowed.
y
a vector or a one-dimensional data frame or matrix that represents the response variable. Missing values (NAs) and infinite values (Infs) are allowed. Observations with missing or infinite values in either x or y are excluded from the computations.

OPTIONAL ARGUMENTS:

intercept
logical flag: should a constant term (intercept) be included in the model?
quan
the number of squared residuals whose sum will be minimized. The default value is max(floor((n+p+1)/2), floor(0.9*n)), where n is the number of observations and p is the number of regression coefficients. In general, quan must be an integer between floor((n+p+1)/2) and n.
singular.ok
logical flag: if FALSE, an error is signaled when x (plus the intercept, if any) is found to be singular.
qr.out
logical flag: if TRUE, then a list representing the QR decomposition of x is returned.
wt
logical flag: should weights computed by ltsreg be returned? These weights can be used in lsfit or lm to obtain a weighted least squares solution.
yname
character string of the name of the variable y.
mcd
logical flag: if TRUE, cov.mcd will be called on x. The results are needed in plot.lts for the diagnostic plot of robust residuals versus robust x-distances.
popsize
the population size of the genetic stock. The default is 10 times the number of parameters being fit.
mutate.prob
length 4 vector of mutation probabilities for offspring. The first element is the probability of a mutation to one observation in the offspring. The second through fourth elements give the probability that the length of the offspring will be one shorter than the mother, one longer than the mother, or a random length, respectively.
random.n
the number of random samples taken after the stock is filled. The default is 50 times the number of parameters being fit.
births.n
the number of genetic births. The default is (50*p)+(15*p^2), where p is the rank of x (including the intercept, if any). This default allows reasonably accurate estimation for p at least up to twenty. You may consider doubling this if you want to ensure very accurate minimization of the objective.
stock
a list of vectors of observation numbers to be included in the stock. This is typically the stock component of the output of a previous call to the function.
maxslen
the maximum number of observations (including duplicates) in a member of the stock. The default is p if (n-p)/2 is less than p, where n is the number of observations, and it is the minimum of trunc((n-p)/2) and 5*p otherwise.
stockprob
vector of cumulative probabilities that a member of the stock will be chosen as a parent. The ith element corresponds to the individual with the ith lowest objective. The default is cumsum((2*(popsize:1))/popsize/(popsize+1)).
nkeep
the number of individuals in the stock to keep in the output.
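
The default values described in the argument list above can be checked by hand. The following is a minimal Python sketch (not S-PLUS code and not part of ltsreg); the function name lts_defaults is hypothetical, and the sketch assumes that "the number of parameters being fit" and "the rank of x (including the intercept)" are the same value p.

```python
from itertools import accumulate


def lts_defaults(n, p):
    """Default tuning values as stated in the argument descriptions.

    n: number of observations; p: number of coefficients (assumed equal
    to the rank of x including the intercept).
    """
    # floor(0.9*n) is computed with integer arithmetic to avoid rounding noise.
    quan = max((n + p + 1) // 2, (9 * n) // 10)
    popsize = 10 * p
    random_n = 50 * p
    births_n = 50 * p + 15 * p ** 2
    # maxslen is p if (n-p)/2 < p, otherwise min(trunc((n-p)/2), 5*p).
    if (n - p) / 2 < p:
        maxslen = p
    else:
        maxslen = min((n - p) // 2, 5 * p)
    # cumsum((2*(popsize:1))/popsize/(popsize+1)); popsize:1 counts down to 1.
    weights = [2 * k / popsize / (popsize + 1) for k in range(popsize, 0, -1)]
    stockprob = list(accumulate(weights))
    return {
        "quan": quan,
        "popsize": popsize,
        "random.n": random_n,
        "births.n": births_n,
        "maxslen": maxslen,
        "stockprob": stockprob,
    }


defaults = lts_defaults(n=21, p=4)
```

With n=21 and p=4 this gives quan=18, popsize=40, random.n=200, births.n=440, and maxslen=8; the last element of stockprob is 1 up to rounding, as expected for cumulative probabilities.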

VALUE:

a list of class "lts" giving the solution. See the lts.object help file for details.

SIDE EFFECTS:

creates the dataset .Random.seed if it does not already exist; otherwise, its value is updated.

DETAILS:

Least trimmed squares (LTS) regression was proposed by Rousseeuw (1984); more information can be found in Rousseeuw and Leroy (1987). This regression method minimizes the sum of the quan smallest squared residuals. The default value of quan is the maximum of floor((n+p+1)/2) and floor(0.9*n), where n is the number of observations and p is the number of coefficients (including the intercept term, if present). The user can set quan to any integer between floor((n+p+1)/2) and n; for quan=n, the LTS fit equals the least squares fit (see the help file of lsfit).
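
The objective described above can be written in a few lines. This is a minimal Python sketch for illustration only (not S-PLUS code); the function name lts_objective is hypothetical.

```python
def lts_objective(residuals, quan):
    """Sum of the quan smallest squared residuals."""
    if not 1 <= quan <= len(residuals):
        raise ValueError("quan must lie between 1 and the number of residuals")
    squared = sorted(r * r for r in residuals)
    return sum(squared[:quan])


residuals = [0.5, -1.0, 0.2, 10.0, -0.1]
trimmed = lts_objective(residuals, quan=4)  # the residual 10.0 is trimmed away
full = lts_objective(residuals, quan=5)     # quan=n: ordinary sum of squares
```

With quan=4 the large residual is ignored and the objective is about 1.3; with quan=5 the same residual dominates and the objective is about 101.3, which is the least squares criterion.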

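The breakdown value of LTS (that is, the fraction of bad data it can withstand) is roughly (n-quan)/n. For the default quan, this is about 10% whenever floor(0.9*n) is the larger of the two terms in the default; for the smallest allowed value, quan=floor((n+p+1)/2), it is close to 50%, the highest possible breakdown value; for quan=n, it drops to 0%.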
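
The approximation (n-quan)/n is easy to tabulate. The Python sketch below (illustrative only, not S-PLUS code; the function names are hypothetical) evaluates it for n=100 and p=3 at three values of quan.

```python
def approx_breakdown(n, quan):
    """Approximate breakdown value (n - quan) / n."""
    return (n - quan) / n


def smallest_quan(n, p):
    """Smallest allowed quan, floor((n+p+1)/2)."""
    return (n + p + 1) // 2


n, p = 100, 3
low = approx_breakdown(n, smallest_quan(n, p))  # quan = 52
mid = approx_breakdown(n, 90)                   # quan = floor(0.9*n)
none = approx_breakdown(n, n)                   # quan = n, least squares
```

Here quan=52 gives 0.48, quan=90 gives 0.10, and quan=100 gives 0.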

If the data are univariate (hence p=1), the exact algorithm location.lts is used (see the corresponding help file).

When p>1, LTS regression could be computed by the basic subsampling algorithm of Rousseeuw and Leroy (1987); ltsreg.default instead uses a genetic algorithm (Burns, 1992). Each individual solution is defined by a set of observation numbers and corresponds to the least squares fit to those observations. A stock of popsize individuals is first produced by random sampling; random.n further random samples are then taken, and the best solution is saved in the stock. During the genetic phase, two parents are picked and produce an offspring that contains a sample of the observations from the parents. The best two of the three are retained in the stock. The best of all the solutions found is used to compute the coefficients and the residuals. The standard random sampling algorithm can be used by setting popsize to one, maxslen to p, and births.n to zero.
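
The standard random sampling idea mentioned above can be sketched for the simplest case, a straight line with intercept (p=2). This Python sketch is illustrative only: it is not the ltsreg implementation, it does not include the genetic phase, and all function names and the n_samples and seed arguments are hypothetical. Each candidate is the exact fit through a random subset of p=2 observations, scored by the sum of the quan smallest squared residuals.

```python
import random


def lts_objective(residuals, quan):
    """Sum of the quan smallest squared residuals."""
    return sum(sorted(r * r for r in residuals)[:quan])


def lts_line_by_subsampling(x, y, quan, n_samples=200, seed=0):
    """Return (objective, intercept, slope) of the best candidate found."""
    rng = random.Random(seed)
    n = len(x)
    best = None
    for _ in range(n_samples):
        i, j = rng.sample(range(n), 2)  # a subset of p = 2 observations
        if x[i] == x[j]:
            continue  # singular subset: no unique line through it
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        residuals = [yk - (intercept + slope * xk) for xk, yk in zip(x, y)]
        objective = lts_objective(residuals, quan)
        if best is None or objective < best[0]:
            best = (objective, intercept, slope)
    return best


# 16 points exactly on y = 1 + 2x, followed by 4 gross outliers.
x = [float(k) for k in range(20)]
y = [1.0 + 2.0 * xk for xk in x[:16]] + [100.0] * 4
objective, intercept, slope = lts_line_by_subsampling(x, y, quan=15)
```

In this constructed data set, any subset of two uncontaminated points reproduces the line y = 1 + 2x, so the search recovers an intercept of 1 and a slope of 2 with a trimmed objective of 0 despite the four outliers.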

The mutate.prob argument controls the mutation of the offspring. The length of the offspring is initially set to be the length of the first parent. This length is reduced by one, increased by one, or given a length uniformly distributed between p and maxslen, according to the last three probabilities in mutate.prob. The other type of mutation that can occur is for one of the observations of the offspring to be changed to an observation picked at random from among all of the observations; the probability of this mutation is specified by the first element of mutate.prob.
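
One possible reading of the two mutation types is sketched below in Python. This is not the ltsreg code. It assumes that the three length mutations are mutually exclusive (with the remaining probability leaving the length unchanged) and that the resulting length is kept between p and maxslen; neither assumption is confirmed by this help file. The function names are hypothetical.

```python
import random


def mutate_length(parent_len, mutate_prob, p, maxslen, rng):
    """Choose the offspring length from the first parent's length."""
    _, p_shorter, p_longer, p_random = mutate_prob
    u = rng.random()
    if u < p_shorter:
        new_len = parent_len - 1
    elif u < p_shorter + p_longer:
        new_len = parent_len + 1
    elif u < p_shorter + p_longer + p_random:
        new_len = rng.randint(p, maxslen)  # uniform on p..maxslen inclusive
    else:
        new_len = parent_len  # no length mutation
    # Assumption: lengths outside [p, maxslen] are clamped back into range.
    return min(max(new_len, p), maxslen)


def mutate_observation(offspring, n, mutate_prob, rng):
    """With probability mutate_prob[0], replace one observation number."""
    result = list(offspring)
    if rng.random() < mutate_prob[0]:
        position = rng.randrange(len(result))
        result[position] = rng.randint(1, n)  # observation numbers 1..n
    return result


rng = random.Random(1)
mutate_prob = (0.15, 0.2, 0.2, 0.2)  # default from the USAGE section
lengths = [mutate_length(6, mutate_prob, p=4, maxslen=8, rng=rng)
           for _ in range(1000)]
child = mutate_observation([3, 7, 7, 12, 15, 20], n=21,
                           mutate_prob=mutate_prob, rng=rng)
```

Duplicated observation numbers are permitted in the sketch, in line with the description of maxslen, which counts duplicates.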

The ltsreg.default function is based on a random algorithm that starts with the same seed on each call to ltsreg. Therefore, the result is reproducible.

REFERENCES:

Burns, P. J. (1992). A Genetic Algorithm for Robust Regression Estimation. (StatSci Technical Note).

Rousseeuw, P. J. (1984). Least median of squares regression. Journal of the American Statistical Association, 79, 871-881.

Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection. New York: Wiley.

EXAMPLES:

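# LTS fit of stack.loss on stack.x (an intercept is included by default)
stacklts <- ltsreg(stack.x, stack.loss)
# reweighted least squares: refit with lm using the weights returned by ltsreg
stackrls <- lm(stack.loss ~ stack.x, weights = stacklts$lts.wt)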