Linear Regression

Linear regression is used to describe the effect of continuous or categorical variables upon a continuous response. It is by far the most common regression procedure.

The linear regression model assumes that the response is obtained by taking a specific linear combination of the predictors and adding random variation (error). The error is assumed to have a Gaussian (normal) distribution with constant variance and to be independent of the predictor values.
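As an illustration of these assumptions, the following Python sketch simulates a response as a linear combination of one predictor plus independent Gaussian error with constant variance, then recovers the coefficients by least squares. It is not how S-PLUS fits the model; the helper name fit_simple and the chosen coefficient values are hypothetical.

```python
import random

def fit_simple(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical "true" model: response = 1 + 2 * x + Gaussian error (sd = 1).
rng = random.Random(0)
true_intercept, true_slope, sigma = 1.0, 2.0, 1.0
x = [rng.uniform(0, 10) for _ in range(2000)]
# The error is drawn independently of x and with the same variance for every row.
y = [true_intercept + true_slope * xi + rng.gauss(0, sigma) for xi in x]

intercept, slope = fit_simple(x, y)
print(intercept, slope)  # estimates should land near 1 and 2
```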

If the response of interest is not continuous, logistic regression, log-linear regression, or generalized linear models may be appropriate. If the predictors affect the response in a nonlinear way, nonlinear regression, local regression, or generalized additive models may be appropriate.

If the data contain outliers or the errors are not Gaussian, then robust regression may be appropriate. If the focus is on the effect of categorical variables, then ANOVA may be appropriate.

Other dialogs related to linear regression are: Stepwise Linear Regression, Compare Models, and Multiple Comparisons. The Stepwise Linear Regression dialog uses a stepwise procedure to suggest which variables to include in a model. Compare Models provides tests for determining which of several models is most appropriate. Multiple Comparisons calculates effects for categorical predictors in linear regression or ANOVA.

To perform linear regression

Choose Statistics > Regression > Linear. The dialog shown below appears.

Model Page

On the Model page, specify the data set and any weights to be used, the linear regression formula, and how to save the model object.

[Figure: Linear Regression dialog, Model page]

In the Linear Regression dialog, the Model page has the following options:

Data

Data Set

Select a data set from the dropdown list or type the name of a data set. You can also type into the Data Set edit field any expression that evaluates to a data set.

Weights

Enter the column that specifies weights to be applied to all observations used in the analysis. To weight all rows equally, leave this blank.
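To show what observation weights do, here is a small Python sketch of weighted least squares for one predictor. It assumes the usual convention that each weight multiplies that row's squared residual; the function name fit_weighted and the data are made up for illustration and are not taken from S-PLUS.

```python
def fit_weighted(x, y, w):
    """Weighted least squares: minimizes sum(w * residual**2) for one predictor."""
    total = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / total
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 100.0]  # the last row is far off the line y = 2x

# Equal weights (the same as leaving the Weights field blank).
equal_intercept, equal_slope = fit_weighted(x, y, [1.0, 1.0, 1.0, 1.0])

# A zero weight removes the last row's influence on the fit.
down_intercept, down_slope = fit_weighted(x, y, [1.0, 1.0, 1.0, 0.0])
print(equal_slope, down_slope)
```

With the last row weighted to zero, the fit follows the remaining three points.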

Subset Rows

Enter an S-PLUS expression that identifies the rows to use in the analysis. To use all the rows in the data set, leave this field blank.

Omit Rows with Missing Values

Select this box to omit from the analysis any rows in the data set that contain missing values for any of the variables in the model.
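The effect of this option can be sketched in Python as follows. Only missing values in variables that appear in the model cause a row to be dropped. Here None stands in for a missing value, and the column names and the helper omit_missing are hypothetical.

```python
def omit_missing(rows, model_variables):
    """Keep only rows with no missing value in any variable used by the model."""
    return [
        row for row in rows
        if all(row.get(name) is not None for name in model_variables)
    ]

rows = [
    {"Mileage": 33.0, "Weight": 2560.0, "Notes": None},   # kept: Notes is not in the model
    {"Mileage": None, "Weight": 2345.0, "Notes": "ok"},   # dropped: response is missing
    {"Mileage": 26.0, "Weight": None, "Notes": "ok"},     # dropped: predictor is missing
    {"Mileage": 20.0, "Weight": 3515.0, "Notes": "ok"},   # kept
]

kept = omit_missing(rows, ["Mileage", "Weight"])
print(len(kept))
```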

Variables

Dependent Variables

Select a variable as the dependent variable in the formula. The variable name will appear in the formula field below, followed by a '~'.

Independent Variables

Select one or more variables as the independent variables, or predictors, in the formula. To select more than one variable, Ctrl-click the variables.

Formula

In the Formula field, enter a formula specifying the desired model. In its simplest form, a formula consists of the response variable, a tilde (~), and a list of predictor variables separated by plus signs (+). An intercept is included by default.
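The simplest formula form can be read as a recipe for a design matrix: one column of ones for the default intercept, then one column per predictor. The Python sketch below handles only that simplest form (no interactions, transformations, or intercept removal); parse_formula, design_matrix, and the column names are hypothetical and are not part of S-PLUS.

```python
def parse_formula(formula):
    """Split 'response ~ a + b' into the response name and the predictor names."""
    response, rhs = formula.split("~")
    predictors = [term.strip() for term in rhs.split("+")]
    return response.strip(), predictors

def design_matrix(data, predictors):
    """One row per observation: a leading 1.0 for the intercept, then each predictor."""
    n = len(next(iter(data.values())))
    return [[1.0] + [float(data[name][i]) for name in predictors] for i in range(n)]

data = {
    "Mileage": [33, 26, 20],
    "Weight": [2560, 2345, 3515],
    "Power": [97, 114, 150],
}

response, predictors = parse_formula("Mileage ~ Weight + Power")
X = design_matrix(data, predictors)
print(response, predictors)
print(X[0])
```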

Save Model Object

In the Save As field, enter the name for the object in which to save the results of the analysis. If an object with this name already exists, its contents are overwritten. The model object can be passed to later functions, such as plotting functions.

Results Page

On the Results page, choose the type of printed results and how you would like the results of the analysis saved. To select or clear an option, click the check box.

[Figure: Linear Regression dialog, Results page]

In the Linear Regression dialog, the Results page has the following options:

Printed Results

Short Output for Linear Regression

Display a short summary of the model fit. This includes the model formula, the regression coefficients, the residual standard error and the degrees of freedom.

Long Output for Linear Regression

Display a detailed summary of the model fit.

ANOVA Table

Display an analysis of variance table. The sums-of-squares in the table are for the terms added sequentially (Type I sums-of-squares).
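Sequential (Type I) sums of squares can be illustrated with a short Python sketch: each term's sum of squares is the drop in residual sum of squares when that term is added to the terms before it, so the values depend on the order of terms when predictors are correlated. The helper names and data below are hypothetical, and the fit uses plain normal equations for brevity rather than whatever numerical method S-PLUS uses.

```python
def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef

def rss(columns, y):
    """Residual sum of squares for y fitted on an intercept plus the given columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2
               for row, yi in zip(X, y))

def sequential_ss(terms, y):
    """Type I table: drop in RSS as each (name, column) term is added in order."""
    table, included = [], []
    previous = rss(included, y)  # intercept-only model
    for name, col in terms:
        included.append(col)
        current = rss(included, y)
        table.append((name, previous - current))
        previous = current
    return table, previous  # previous is now the residual sum of squares

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0]
y = [6.1, 4.9, 10.2, 8.8, 14.1, 15.9]

x1_first, resid_a = sequential_ss([("x1", x1), ("x2", x2)], y)
x2_first, resid_b = sequential_ss([("x2", x2), ("x1", x1)], y)
print(x1_first, resid_a)
print(x2_first, resid_b)
```

The term sums of squares plus the residual sum of squares add up to the total sum of squares about the mean in either order, but the split between x1 and x2 changes with the order.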

Correlation Matrix of Estimates

Display the correlation matrix of the regression coefficients. This option is available only if Long Output is selected.

Saved Results

Save In

Enter the name of a data set in which a part of the analysis, such as fitted values and residuals, predictions, confidence intervals, or standard errors, is saved. If an object with the name you enter does not already exist (in database 1), then it is created.

Fitted Values

Save the fitted values from the model in the object specified in Save In.

Residuals

Save the residuals from the model in the object specified in Save In. These are the ordinary residuals (the response minus the fitted value).
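The relationship between the two saved columns can be sketched in Python for a one-predictor fit: each ordinary residual is the response minus the fitted value, so fitted value plus residual reproduces the response. The helper fit_line and the data are illustrative only.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 3.9, 6.1, 8.0, 9.8]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
# Ordinary residuals: the response minus the fitted value.
residuals = [yi - fi for yi, fi in zip(y, fitted)]
print(fitted)
print(residuals)
```

Because the model includes an intercept, these residuals sum to zero up to rounding error.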

Plot Page

On the Plot page, choose the type of plots, smoothing and rugplot options, and partial residual plot options. To select or clear an option, click the check box.

[Figure: Linear Regression dialog, Plot page]

In the Linear Regression dialog, the Plot page has the following options:

Plots

Residuals vs Fit

Select this to display a plot of the residuals versus the fitted values.

Sqrt Abs Residuals vs Fit

Display a plot of the square root of the absolute values of the residuals versus the fitted values. This plot is useful for checking the constant variance assumption of the model.

Response vs Fit

Display a plot of the response variable versus the fitted values. The line y = x is also drawn on the graph.

Residuals Normal QQ

Display a normal quantile-quantile plot of the residuals.

Residual-Fit Spread

Display a residual-fit spread plot. This is a visual analog of the multiple R-squared statistic. It compares the spread of the fitted values to the spread of the residuals.
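The link to multiple R-squared can be made concrete with a small Python sketch. For a least-squares fit with an intercept, the sum of squares of the centered fitted values plus the sum of squares of the residuals equals the total sum of squares, so R-squared is the share of spread carried by the fitted values. The data and variable names are hypothetical.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 3.6, 6.4, 7.7, 10.5, 11.6]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
intercept = ybar - slope * xbar

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# The two quantities whose spreads the plot compares, sorted as quantiles.
centered_fit = sorted(f - ybar for f in fitted)
sorted_residuals = sorted(residuals)

ss_fit = sum(v ** 2 for v in centered_fit)
ss_res = sum(r ** 2 for r in residuals)
ss_total = sum((yi - ybar) ** 2 for yi in y)

r_squared = ss_fit / (ss_fit + ss_res)
print(r_squared)
```

A wide spread of centered fitted values next to a narrow spread of residuals corresponds to an R-squared near 1.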

Cook's Distance

Display a plot of Cook's distance values versus the observation number.
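For reference, Cook's distance for observation i is commonly defined as D_i = (e_i^2 / (p s^2)) * h_i / (1 - h_i)^2, where e_i is the residual, h_i the leverage, p the number of coefficients, and s^2 the residual variance. The Python sketch below applies that textbook formula to a one-predictor model; the function name and data are hypothetical, and the plot produced by the dialog may differ in detail.

```python
def cooks_distance(x, y):
    """Cook's distance for each row of a one-predictor least-squares fit."""
    n = len(x)
    p = 2  # coefficients: intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - p)  # residual variance
    distances = []
    for xi, r in zip(x, residuals):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage of this row
        distances.append(r * r / (p * s2) * h / (1.0 - h) ** 2)
    return distances

x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 20.0]  # last row: far from the others in x and off trend

d = cooks_distance(x, y)
for row, value in enumerate(d, start=1):
    print(row, value)
```

The last observation combines high leverage with a poor fit, so it has by far the largest distance.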

Partial Residuals

Display partial residual plots for all the terms in the model.
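One common definition of the partial residuals for a term is the ordinary residuals plus that term's centered contribution to the fit. The Python sketch below uses that definition for a two-predictor model; it is an assumption that this matches what the dialog plots, and the helper names and data are hypothetical.

```python
def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef

def fit(columns, y):
    """Least-squares coefficients (intercept first) and the design matrix."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty), X

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0]
y = [6.1, 4.9, 10.2, 8.8, 14.1, 15.9]

columns = [x1, x2]
coef, X = fit(columns, y)
residuals = [yi - sum(c * v for c, v in zip(coef, row)) for row, yi in zip(X, y)]

def partial_residuals(j):
    """Residuals plus the centered contribution of predictor j (0-based)."""
    col = columns[j]
    mean = sum(col) / len(col)
    return [r + coef[j + 1] * (v - mean) for r, v in zip(residuals, col)]

print(partial_residuals(0))
print(partial_residuals(1))
```

Plotting partial_residuals(j) against predictor j shows that predictor's relationship to the response after adjusting for the other terms; a straight-line fit through those points has slope equal to the predictor's coefficient.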

Options

Include Smooth

Display a smooth curve, computed with loess.smooth, on the Residuals vs Fit, Sqrt Abs Residuals vs Fit, and Response vs Fit plots. See the online Help for loess.smooth for details.

Include Rugplot

Display a rugplot on the Residuals vs Fit, Sqrt Abs Residuals vs Fit, and Response vs Fit plots. A rugplot is a sequence of vertical bars along the x-axis that mark the "observed" x values.

Number of Extreme Points to Identify

Enter the number of extreme points that are identified on the Residuals vs Fit, Sqrt Abs Residuals vs Fit, Residuals Normal QQ, and Cook's Distance plots. The row names from the data set specified on the Model page are used to identify the points.

Partial Residual Plot Options

Include Partial Fit

Include the partial fit for the term on the plot.

Include Rugplot

Display rugplots on the partial residual plots. A rugplot is a sequence of vertical bars along the x-axis that mark the "observed" x values.

Common Y-Axis Scale

Give all the partial residual plots the same vertical units. This is essential for comparing the importance of fitted terms in additive models.

Predict Page

[Figure: Linear Regression dialog, Predict page]

In the Linear Regression dialog, the Predict page has the following options:

New Data

Enter the name of a matrix or data set to use for computing predictions. It must contain the same names as the terms on the right side of the formula for the model. If omitted, the original data are used for computing predictions.

Save

Save In

Enter the name of a data set in which a part of the analysis, such as fitted values and residuals, predictions, confidence intervals, or standard errors, is saved.

Predictions

Select this to save predictions to the data set specified in Save In.

Confidence Intervals

Store lower and upper confidence limits in the object specified in Save In.

Standard Errors

Store the pointwise standard errors for the predictions in the object specified in Save In.

Options

Confidence Level

Enter the confidence level to use when computing confidence intervals. The value must be greater than 0 and less than 1 (for example, 0.95).
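How predictions, pointwise standard errors, and confidence limits relate can be sketched in Python for a one-predictor model. The sketch assumes the limits are for the fitted mean response, computed as fit plus or minus a t quantile times the standard error. Python's standard library has no t distribution, so the quantile is passed in by the caller; 2.776 is the 0.975 quantile for 4 degrees of freedom, which corresponds to a 0.95 confidence level with six observations. The function name and data are hypothetical.

```python
def predict_with_interval(x, y, new_x, t_quantile):
    """Return (fit, standard error, lower, upper) for each value in new_x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = (rss / (n - 2)) ** 0.5  # residual standard error
    results = []
    for x0 in new_x:
        fit = intercept + slope * x0
        # Pointwise standard error of the fitted mean at x0.
        se = s * (1.0 / n + (x0 - xbar) ** 2 / sxx) ** 0.5
        results.append((fit, se, fit - t_quantile * se, fit + t_quantile * se))
    return results

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

# New data: one value at the center of the observed x range, one well outside it.
results = predict_with_interval(x, y, [3.5, 10.0], t_quantile=2.776)
for fit, se, lower, upper in results:
    print(fit, se, lower, upper)
```

The standard error, and therefore the interval width, is smallest at the mean of the observed predictor values and grows as the new value moves away from it.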

S-PLUS language functions related to Linear Models

lm, plot.lm, predict.lm, print.lm, summary.lm

Other related S-PLUS language functions

aov, gam, glm, loess, nls