Robust MM Linear Regression
Robust regression models are useful for fitting linear relationships when the random variation in the data is not Gaussian (normal) or when the data contain significant outliers. In such situations, standard linear regression may return inaccurate estimates.
The robust MM regression method returns a model that is almost identical in structure to a standard linear regression model. This allows the production of familiar plots and summaries with a robust model.
To perform robust MM linear regression
Choose Statistics > Regression > Robust MM. The Robust MM Linear Regression dialog appears.
Model Page
In the Robust MM Linear Regression dialog, the Model page has the following options:
Data
Data Set
Select a data set from the dropdown list or type the name of a data set. You can also type into the Data Set edit field any expression that evaluates to a data set.
Weights
Enter the column that specifies weights to be applied to all observations used in the analysis. To weight all rows equally, leave this blank.
Enter an S-PLUS expression that identifies the rows to use in the analysis. To use all the rows in the data set, leave this field blank.
Select this box to omit from the analysis any rows in the data set that contain missing values for any of the variables in the model.
Variables
Dependent Variables
Select a variable as the dependent variable in the formula. The variable name will appear in the formula field below, followed by a '~'.
Independent Variables
Select one or more variables as the independent variables, or predictors, in the formula. To select more than one variable, Ctrl-click the variables.
In the Formula field, enter a formula specifying the desired model. In its simplest form a formula consists of the response variable, a tilde (~), and a list of predictor variables separated by "+"s. An intercept is automatically included by default.
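As an illustration of the simplest formula layout described above, the following sketch assembles a formula string from a response name and a list of predictor names. It is written in Python purely for illustration and is not S-PLUS code; the function name build_formula and the variable names y, x1, and x2 are hypothetical.

```python
def build_formula(response, predictors):
    """Return a formula string: response, a tilde, then predictors joined by '+'."""
    return f"{response} ~ {' + '.join(predictors)}"


# Hypothetical variable names, used only to show the layout.
print(build_formula("y", ["x1", "x2"]))  # prints: y ~ x1 + x2
```

The resulting string is what you would type into the Formula field; the intercept does not need to be written out because it is included by default.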
In the Save As field, enter the name for the object in which to save the results of the analysis. If an object with this name already exists, its contents are overwritten. The model object can be used in later functions such as plotting.
Options Page
The Options page of the Robust MM Linear Regression dialog contains the controls specific to the robust MM estimate regression method.
In the Robust MM Linear Regression dialog, the Options page has the following options:
Estimation Method
Test Based Select this option to have S-PLUS calculate both initial S-estimates and final M-estimates, and report results for one of these estimates, based on a test for bias.
Loss Functions
Initial Robust Select this option to have S-PLUS calculate only initial S-estimates.
Final Robust Select this option to have S-PLUS calculate only final M-estimates.
Inference
Efficiency Enter the asymptotic efficiency of the final M-estimates.
Confidence Level Enter the desired level of significance of the test for bias of the final M-estimates.
Optimization
Max. Iterations Enter the name of a list with components "mxf", "mxr", and "mxs", representing the maximum number of iterations, respectively, for final coefficient estimates, the refinement step, and the final scale estimate. The default value, Auto, sets all three to 50.
Tolerance Enter the name of a list with components "tlo", "tua", and "tl", representing, respectively, the relative tolerance in the iterative algorithms, the tolerance used for the determination of pseudo-rank, and the tolerance for scale denominators. The default value, Auto, sets the tolerances as follows:
tlo = 0.0001, tua = 1.5e-006, tl = 1e-006
Resampling Algorithm
Auto Select this to have S-PLUS determine the algorithm to use based upon the size of the data. If choose(nrow(data), ncol(data)) < 20000 then the Exhaustive algorithm is used. Otherwise the Random algorithm is used.
Random Select this option to use the random resampling algorithm.
Exhaustive Select this option to use the exhaustive resampling algorithm. Use it only if the sample size is less than 300 and the number of predictor variables is less than 10.
Genetic Select this option to use the genetic resampling algorithm.
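The Auto rule described above can be sketched as follows. This is a minimal illustration in Python, not S-PLUS code: the function name auto_resampling_algorithm is hypothetical, and the only logic it encodes is the documented comparison of choose(nrow(data), ncol(data)) with 20000.

```python
from math import comb


def auto_resampling_algorithm(n_rows, n_cols, threshold=20000):
    """Apply the documented Auto rule.

    Exhaustive is chosen when choose(n_rows, n_cols) is below the
    threshold; otherwise Random is chosen.
    """
    return "Exhaustive" if comb(n_rows, n_cols) < threshold else "Random"


# choose(20, 3) = 1140, which is below 20000.
print(auto_resampling_algorithm(20, 3))   # prints: Exhaustive
# choose(100, 5) = 75287520, which is far above 20000.
print(auto_resampling_algorithm(100, 5))  # prints: Random
```

Because the binomial coefficient grows quickly, even moderately sized data sets fall on the Random side of the rule.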
Genetic Algorithm
Population Size Enter the population size of the genetic stock. The default is 10 times the number of parameters being fit.
Random Samples Enter the number of random samples taken after the stock is filled. The default is 50 times the number of parameters being fit.
Max Observations Enter the maximum number of observations (including duplicates) in a member of the stock. The default is p if (n-p)/2 is less than p, where n is the number of observations and p is the number of parameters being fit; otherwise it is the minimum of trunc((n-p)/2) and 5*p.
Genetic Births Enter the number of genetic births. The default is (50*p)+(15*p^2).
Mutation Prob. Enter a length 4 vector of mutation probabilities for offspring.
Stocks Enter a list of vectors of observation numbers to be included in the stock. This is typically the stock component of the output of a previous run.
Stock Prob. Enter a vector of cumulative probabilities that a member of the stock will be chosen as a parent. The ith element corresponds to the individual with the ith lowest objective. The default is
cumsum((2*(popsize:1))/popsize/(popsize+1))
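The default values stated for the genetic algorithm options can be worked through numerically. The sketch below is in Python for illustration only and is not S-PLUS code; the function names genetic_defaults and default_stock_prob are hypothetical, and p is assumed to be the number of parameters being fit.

```python
from itertools import accumulate
from math import trunc


def genetic_defaults(n, p):
    """Compute the documented defaults for n observations and p parameters."""
    if (n - p) / 2 < p:
        max_obs = p
    else:
        max_obs = min(trunc((n - p) / 2), 5 * p)
    return {
        "popsize": 10 * p,              # Population Size
        "random_samples": 50 * p,       # Random Samples
        "max_obs": max_obs,             # Max Observations
        "births": 50 * p + 15 * p**2,   # Genetic Births
    }


def default_stock_prob(popsize):
    """Cumulative parent-selection probabilities, lowest objective first.

    Equivalent to cumsum(2*(popsize:1)/popsize/(popsize+1)).
    """
    weights = [2 * k / popsize / (popsize + 1) for k in range(popsize, 0, -1)]
    return list(accumulate(weights))


print(genetic_defaults(n=50, p=3))
print(default_stock_prob(4))  # approximately [0.4, 0.7, 0.9, 1.0]
```

The individual weights decrease linearly with rank, so the member with the lowest objective is the most likely parent, and the cumulative probabilities end at 1.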
Random Resampling
Subsamples Enter the number of random subsamples to be drawn. The default value, Auto, draws 4.6*2^ncol(x) samples.
Random Seed Enter the seed parameter used in the random sampling algorithm.
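To give a sense of how quickly the Auto number of subsamples grows, the following sketch evaluates 4.6*2^ncol(x) for a few column counts. It is written in Python for illustration only; the function name default_subsamples is hypothetical, and how S-PLUS rounds the result to a whole number of samples is not stated here, so the raw value is shown.

```python
def default_subsamples(n_cols):
    """Evaluate the documented Auto value 4.6 * 2^ncol(x) without rounding."""
    return 4.6 * 2 ** n_cols


for n_cols in (2, 5, 10):
    print(n_cols, default_subsamples(n_cols))
```

The count doubles with each additional column, which is one reason to set Subsamples explicitly for models with many predictors.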
Results Page
On the Results page, choose the type of printed results and how you would like the results of the analysis saved. To select or clear an option, click the check box.
In the Robust MM Linear Regression dialog, the Results page has the following options:
Printed Results
Short Output for Robust MM Linear Regression
Select this to display a short summary of the model fit. This includes the model formula, the robust estimates of regression coefficients and residual scale, and the degrees of freedom.
Long Output for Robust MM Linear Regression
Select this to display a detailed summary of the model fit.
Correlation Matrix of Estimates
Display the correlation matrix of the regression coefficients. This option is available only if Long Output is selected.
ANOVA Table
Display an analysis of variance table. The sums-of-squares in the table are for the terms added sequentially (Type I sums-of-squares).
Comparison with LS Fit
Select this to display the output of the robust MM-estimate fit together with the results for a standard least squares linear model fit of the same formula.
Saved Results
In the Save In field, enter the name of a data set in which a part of the analysis, such as fitted values and residuals, predictions, confidence intervals, or standard errors, is saved. If an object with the name you enter does not already exist (in database 1), then it is created.
Fitted Values
Save the fitted values from the model in the object specified in Save In.
Residuals
Save the residuals from the model in the object specified in Save In. These are the ordinary residuals (the response minus the fitted value).
Plot Page
In the Robust MM Linear Regression dialog, the Plot page has the following options:
Plots
Residuals vs Fit
Select this to display a plot of the residuals versus the fitted values.
Sqrt Abs Residuals vs Fit
Display a plot of the square root of the absolute values of the residuals versus the fitted values. This plot is useful for checking for the constant variance assumption of the model.
Response vs Fit
Display a plot of the response variable versus the fitted values. The line y = x is also drawn on the graph.
Residuals Normal QQ
Display a normal quantile-quantile plot of the residuals.
Residual-Fit Spread
Display a residual-fit spread plot. This is a visual analog of the multiple R-squared statistic. It compares the spread of the fitted values to the spread of the residuals.
Partial Residuals
Display partial residual plots for all the terms in the model.
Options
Include Smooth
Display a smooth curve, computed with loess.smooth, on the Residuals vs Fit, Sqrt Abs Residuals vs Fit, and Response vs Fit plots. See the online Help for loess.smooth for details.
Include Rugplot
Display a rugplot on the Residuals vs Fit, Sqrt Abs Residuals vs Fit, and Response vs Fit plots. A rugplot is a sequence of vertical bars along the x-axis that mark the "observed" x values.
Number of Extreme Points to Identify
Enter the number of extreme points that are identified on the Residuals vs Fit, Sqrt Abs Residuals vs Fit, Residuals Normal QQ, and Cook's Distance plots. The row names from the data set specified on the Model page are used to identify the points.
Comparison Plots with LS Fit
Comparison Plot of Residuals Normal QQ
Select this to include a graph showing the qqnorm plot of the residuals of the robust fit together with the qqnorm plot of the residuals of the standard least squares fit.
Comparison Plot of Estimated Residual Densities
Select this to include a graph showing the density estimate for the residuals of the robust fit together with the density estimate for the residuals of the standard least squares fit.
Comparison Plot of Residuals vs Fit
Select this to include a graph showing the residuals vs fit plots for both the robust model and the standard least squares fit.
Comparison Plot of Response vs Fit
Select this to include a graph showing the response vs fit plots for both the robust model and the standard least squares fit.
Predict Page
In the Robust MM Linear Regression dialog, the Predict page has the following options:
New Data
Enter the name of a matrix or data set to use for computing predictions. It must contain the same names as the terms in the right side of the formula for the model. If omitted, the original data are used for computing predictions.
Save
In the Save In field, enter the name of a data set in which a part of the analysis, such as fitted values and residuals, predictions, confidence intervals, or standard errors, is saved.
Predictions
Select this to save predictions to the data set specified in Save In.
Confidence Intervals
Store lower and upper confidence limits in the object specified in Save In.
Standard Errors
Store the pointwise standard errors for the predictions in the object specified in Save In.
Options
Confidence Level
Enter the confidence level to use when computing confidence intervals. This value should be less than 1 and greater than 0.
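The relationship between a prediction, its pointwise standard error, and the confidence level can be sketched as follows. This Python example is for illustration only and rests on an assumption: it uses a normal quantile, whereas the quantile S-PLUS actually uses (for example, one based on the residual degrees of freedom) is not stated here. The function name confidence_limits is hypothetical.

```python
from statistics import NormalDist


def confidence_limits(prediction, std_error, level=0.95):
    """Return (lower, upper) limits as prediction -/+ z * std_error.

    Assumption: a normal quantile z is used; this may differ from the
    quantile used by the software.
    """
    if not 0 < level < 1:
        raise ValueError("level must be greater than 0 and less than 1")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return prediction - z * std_error, prediction + z * std_error


lower, upper = confidence_limits(prediction=10.0, std_error=2.0, level=0.95)
print(lower, upper)
```

Raising the confidence level widens the interval, and a level of exactly 0 or 1 is rejected, in line with the requirement that the value lie strictly between 0 and 1.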
S-PLUS language functions related to Linear Models
lmRobMM, plot.lmRobMM, predict.lm, print.lmRobMM, summary.lmRobMM, lmRobMM.robust.control, lmRobMM.genetic.control
Other related S-PLUS language functions
aov, gam, glm, lm, loess, nls