Robust MM Linear Regression

Robust regression models are useful for fitting linear relationships when the random variation in the data is not Gaussian (normal) or when the data contain significant outliers. In such situations, standard linear regression may return inaccurate estimates.

The robust MM regression method returns a model that is almost identical in structure to a standard linear regression model. This allows the production of familiar plots and summaries with a robust model.

To perform robust MM linear regression

Choose Statistics > Regression > Robust MM. The dialog shown below appears.

Model Page

__image\robmm1.gif

In the Robust MM Linear Regression dialog, the Model page has the following options:

Data

Data Set

Select a data set from the dropdown list or type the name of a data set. You can also type into the Data Set edit field any expression that evaluates to a data set.

Weights

Enter the column that specifies weights to be applied to all observations used in the analysis. To weight all rows equally, leave this blank.

Subset Rows

Enter an S-PLUS expression that identifies the rows to use in the analysis. To use all the rows in the data set, leave this field blank.

Omit Rows with Missing Values

Select this box to omit from the analysis any rows in the data set that contain missing values for any of the variables in the model.
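As a rough illustration of what the Subset Rows and Omit Rows with Missing Values settings select, the sketch below applies a row expression to a small made-up data frame and then drops incomplete rows. The data frame mydata and its columns are hypothetical, and the way the dialog evaluates the expression internally is not described here; the sketch only shows the kind of S expression the field accepts.

```r
# Hypothetical data set; the column names are made up for illustration.
mydata <- data.frame(y = c(2.1, 3.9, NA, 8.2, 9.8),
                     x = c(1, 2, 3, 4, 5))

# A Subset Rows expression can be a logical condition on the columns ...
keep <- mydata$x > 1
sub.data <- mydata[keep, ]

# ... or a vector of row numbers such as 2:5 (same rows in this case).
same.data <- mydata[2:5, ]

# Omitting rows with missing values drops every row that contains an NA.
clean.data <- na.omit(sub.data)
print(clean.data)
```

Here the condition keeps rows 2 through 5, and removing the row with the missing response leaves three rows for the fit.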

Variables

Dependent Variable

Select a variable as the dependent variable in the formula. The variable name will appear in the formula field below, followed by a '~'.

Independent Variables

Select one or more variables as the independent variables, or predictors, in the formula. To select more than one variable, Ctrl-click the variables.

Formula

In the Formula field, enter a formula specifying the desired model. In its simplest form a formula consists of the response variable, a tilde (~), and a list of predictor variables separated by "+"s. An intercept is automatically included by default.
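A minimal sketch of the formula syntax described above, using hypothetical variable names (y, x1, and x2 are not taken from any particular data set). The "- 1" term is the usual S convention for dropping the default intercept.

```r
# Hypothetical variable names: y is the response, x1 and x2 are predictors.
f1 <- y ~ x1 + x2

# An intercept is included by default; in the S formula language,
# adding "- 1" to the right side removes it.
f2 <- y ~ x1 + x2 - 1

# The terms object records whether the intercept is present (1) or not (0).
attr(terms(f1), "intercept")
attr(terms(f2), "intercept")
```

The text to the right of the <- in either assignment is what you would type in the Formula field.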

Save Model Object

In the Save As field, enter the name for the object in which to save the results of the analysis. If an object with this name already exists, its contents are overwritten. The model object can be used in later functions such as plotting.

Options Page

The Options page of the Robust MM Linear Regression dialog contains the controls specific to the robust MM regression method.

__image\robmm2.gif

In the Robust MM Linear Regression dialog, the Options page has the following options:

Estimation Method

Test Based

Select this option to have S-PLUS calculate both initial S-estimates and final M-estimates, and report results for one of these estimates, based on a test for bias.

Loss Functions

Initial Robust

Select this option to have S-PLUS calculate only initial S-estimates.

Final Robust

Select this option to have S-PLUS calculate only final M-estimates.

Inference

Efficiency

Enter the asymptotic efficiency of the final M-estimates.

Confidence Level

Enter the desired level of significance of the test for bias of the final M-estimates.

Optimization

Max. Iterations

Enter the name of a list with components "mxf", "mxr", and "mxs", representing the maximum number of iterations for, respectively, the final coefficient estimates, the refinement step, and the final scale estimate. The default value, Auto, sets all three to 50.

Tolerance

Enter the name of a list with components "tlo", "tua", and "tl", representing, respectively, the relative tolerance in the iterative algorithms, the tolerance used for the determination of pseudo-rank, and the tolerance for scale denominators. The default value, Auto, sets the tolerances as follows:

tlo = 0.0001, tua = 1.5e-006, tl = 1e-006
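Because both fields take the name of a list, one way to supply non-default values is to create the lists first and then type their names into the dialog. The sketch below builds lists that mirror the Auto defaults stated above and then changes one entry; the object names my.maxit and my.tol are hypothetical.

```r
# Lists mirroring the Auto defaults described above.
# The object names my.maxit and my.tol are hypothetical.
my.maxit <- list(mxf = 50, mxr = 50, mxs = 50)
my.tol <- list(tlo = 0.0001, tua = 1.5e-06, tl = 1e-06)

# Example of a change: allow more iterations in the refinement step.
my.maxit$mxr <- 100
```

With these objects defined, you would enter my.maxit in the Max. Iterations field and my.tol in the Tolerance field.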

Resampling Algorithm

Auto

Select this to have S-PLUS determine the algorithm to use based upon the size of the data. If choose(nrow(data), ncol(data)) < 20000, the Exhaustive algorithm is used. Otherwise, the Random algorithm is used.

Random

Select this option to use the random resampling algorithm.

Exhaustive

Select this option to use the exhaustive resampling algorithm. Choose it only if the sample size is less than 300 and the number of predictor variables is less than 10.

Genetic

Select this option to use the genetic resampling algorithm.
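The Auto rule can be checked by hand before fitting. The following is a minimal sketch of the rule as stated above, with n and p standing in for nrow(data) and ncol(data); the function name auto.resampling is hypothetical, and the sketch does not reproduce any other checks S-PLUS may perform.

```r
# Sketch of the Auto rule stated above; n and p stand for
# nrow(data) and ncol(data). The function name is hypothetical.
auto.resampling <- function(n, p) {
    if (choose(n, p) < 20000) "Exhaustive" else "Random"
}

auto.resampling(20, 3)   # choose(20, 3) = 1140
auto.resampling(100, 5)  # choose(100, 5) = 75287520
```

The number of possible subsamples grows very quickly with the size of the data, so Auto falls back to random resampling for all but small problems.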

Genetic Algorithm

Population Size

Enter the population size of the genetic stock. The default is 10 times the number of parameters being fit.

Random Samples

Enter the number of random samples taken after the stock is filled. The default is 50 times the number of parameters being fit.

Max Observations

Enter the maximum number of observations (including duplicates) in a member of the stock. The default is p if (n-p)/2 is less than p, where n is the number of observations and p is the number of parameters being fit; otherwise it is the minimum of trunc((n-p)/2) and 5*p.

Genetic Births

Enter the number of genetic births. The default is (50*p)+(15*p^2).

Mutation Prob.

Enter a length 4 vector of mutation probabilities for offspring.

Stocks

Enter a list of vectors of observation numbers to be included in the stock. This is typically the stock component of the output of a previous run.

Stock Prob.

Enter a vector of cumulative probabilities that a member of the stock will be chosen as a parent. The ith element corresponds to the individual with the ith lowest objective. The default is

cumsum((2*(popsize:1))/popsize/(popsize+1))
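To see what the genetic algorithm defaults work out to for a given problem, the formulas above can be evaluated directly. This is a sketch only: the function name genetic.defaults and the names of the list components are hypothetical, and p is assumed to be the number of parameters being fit.

```r
# Sketch of the documented defaults; the function name is hypothetical.
# n = number of observations, p = number of parameters being fit (assumed).
genetic.defaults <- function(n, p) {
    popsize <- 10 * p
    max.obs <- if ((n - p)/2 < p) p else min(trunc((n - p)/2), 5 * p)
    list(popsize = popsize,
         random.samples = 50 * p,
         max.obs = max.obs,
         births = 50 * p + 15 * p^2,
         stock.prob = cumsum((2 * (popsize:1))/popsize/(popsize + 1)))
}

d <- genetic.defaults(n = 21, p = 4)
d$popsize
d$max.obs
d$births
```

For 21 observations and 4 parameters this gives a population size of 40, a maximum of 8 observations per stock member, and 440 births. The default stock probabilities are cumulative, so the last element is 1, and the increments are largest for the members with the lowest objective.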

Random Resampling

Subsamples

Enter the number of random subsamples to be drawn. The default value, Auto, draws 4.6*2^ncol(x) samples.

Random Seed

Enter the seed parameter used in the random sampling algorithm.
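The Auto number of subsamples doubles with each additional column of x, which the short sketch below makes concrete. The function name auto.subsamples is hypothetical, and rounding up to a whole number is an assumption of this sketch, not something stated for the dialog.

```r
# Auto number of random subsamples, 4.6 * 2^ncol(x).
# Rounding up to a whole number is an assumption of this sketch.
auto.subsamples <- function(p) ceiling(4.6 * 2^p)

auto.subsamples(3)
auto.subsamples(10)
```

With 3 columns the rule gives about 37 subsamples; with 10 columns it gives about 4711, so entering a smaller value may be worthwhile when the model has many predictors.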

Results Page

On the Results page, choose the type of printed results and how you would like the results of the analysis saved. To select or clear an option, click the check box.

__image\robmm3.gif

In the Robust MM Linear Regression dialog, the Results page has the following options:

Printed Results

Short Output for Robust MM Linear Regression

Select this to display a short summary of the model fit. This includes the model formula, the robust estimates of regression coefficients and residual scale, and the degrees of freedom.

Long Output for Robust MM Linear Regression

Select this to display a detailed summary of the model fit.

Correlation Matrix of Estimates

Display the correlation matrix of the regression coefficients. This option is available only if Long Output is selected.

ANOVA Table

Display an analysis of variance table. The sums-of-squares in the table are for the terms added sequentially (Type I sums-of-squares).

Comparison with LS Fit

Select this to display the output of the robust MM-estimate fit together with the results for a standard least squares linear model fit of the same formula.

Saved Results

Save In

Enter the name of a data set in which a part of the analysis, such as fitted values and residuals, predictions, confidence intervals, or standard errors, is saved. If an object with the name you enter does not already exist (in database 1), then it is created.

Fitted Values

Save the fitted values from the model in the object specified in Save In.

Residuals

Save the residuals from the model in the object specified in Save In. These are the ordinary residuals (the response minus the fitted value).

Plot Page

__image\robmm4.gif

In the Robust MM Linear Regression dialog, the Plot page has the following options:

Plots

Residuals vs Fit

Select this to display a plot of the residuals versus the fitted values.

Sqrt Abs Residuals vs Fit

Display a plot of the square root of the absolute values of the residuals versus the fitted values. This plot is useful for checking the constant variance assumption of the model.

Response vs Fit

Display a plot of the response variable versus the fitted values. The line y = x is also drawn on the graph.

Residuals Normal QQ

Display a normal quantile-quantile plot of the residuals.

Residual-Fit Spread

Display a residual-fit spread plot. This is a visual analog of the multiple R-squared statistic. It compares the spread of the fitted values to the spread of the residuals.

Partial Residuals

Display partial residual plots for all the terms in the model.

Options

Include Smooth

Display a smooth curve, computed with loess.smooth, on the Residuals vs Fit, Sqrt Abs Residuals vs Fit, and Response vs Fit plots. See the online Help for loess.smooth for details.

Include Rugplot

Display a rugplot on the Residuals vs Fit, Sqrt Abs Residuals vs Fit, and Response vs Fit plots. A rugplot is a sequence of vertical bars along the x-axis that mark the "observed" x values.

Number of Extreme Points to Identify

Enter the number of extreme points that are identified on the Residuals vs Fit, Sqrt Abs Residuals vs Fit, Residuals Normal QQ, and Cook's Distance plots. The row names from the data set specified on the Model page are used to identify the points.

Comparison Plots with LS Fit

Comparison Plot of Residuals Normal QQ

Select this to include a graph showing the qqnorm plot of the residuals of the robust fit together with the qqnorm plot of the residuals of the standard least squares fit.

Comparison Plot of Estimated Residual Densities

Select this to include a graph showing the density estimate for the residuals of the robust fit together with the density estimate for the residuals of the standard least squares fit.

Comparison Plot of Residuals vs Fit

Select this to include a graph showing the residuals vs fit plots for both the robust model and the standard least squares fit.

Comparison Plot of Response vs Fit

Select this to include a graph showing the response vs fit plots for both the robust model and the standard least squares fit.

Predict Page

__image\robmm5.gif

In the Robust MM Linear Regression dialog, the Predict page has the following options:

New Data

Enter the name of a matrix or data set to use for computing predictions. It must contain the same names as the terms in the right side of the formula for the model. If omitted, the original data are used for computing predictions.

Save

Save In

Enter the name of a data set in which a part of the analysis, such as fitted values and residuals, predictions, confidence intervals, or standard errors, is saved.

Predictions

Select this to save predictions to the data set specified in Save In.

Confidence Intervals

Store lower and upper confidence limits in the object specified in Save In.

Standard Errors

Store the pointwise standard errors for the predictions in the object specified in Save In.

Options

Confidence Level

Enter the confidence level to use when computing confidence intervals. This value should be less than 1 and greater than 0.

S-PLUS language functions related to Linear Models

lmRobMM, plot.lmRobMM, predict.lm, print.lmRobMM, summary.lmRobMM, lmRobMM.robust.control, lmRobMM.genetic.control

Other related S-PLUS language functions

aov, gam, glm, lm, loess, nls