Scatterplot Matrix

A scatterplot matrix is an array of pairwise scatter plots that shows the relationship between each pair of variables in a multivariate data set.
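As a rough illustration of the layout, the following sketch enumerates the off-diagonal panels of a scatterplot matrix. It is written in Python rather than S-PLUS, and the function name and the variable names are made up for the example. Each panel plots one variable against another, so k variables give k(k-1) panels.

```python
from itertools import permutations

def panel_pairs(variables):
    """Return (row_variable, column_variable) for every off-diagonal panel."""
    return list(permutations(variables, 2))

# Hypothetical column names, used only for illustration.
pairs = panel_pairs(["Weight", "Disp.", "Mileage"])
print(len(pairs))  # 3 variables give 3 * 2 = 6 panels
```

Each pair appears twice, once on each side of the diagonal, with the axes exchanged.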

To generate a scatter plot matrix

Choose Graph > Multiple Variables > Scatter Plot Matrix. The dialog shown below appears.

Data page

__image\scatplotmat1.gif

In the Scatter Plot Matrix dialog, the Data page has the following options:

Data

Data Set

Select a data set from the dropdown list or type the name of a data set. You can also type into the Data Set edit field any expression that evaluates to a data set.

Subset Rows

Enter an S-PLUS expression that identifies the rows to use in the analysis. To use all the rows in the data set, leave this field blank.

Variables

Value

Select the columns of values to plot against one another in the scatter plot matrix.

Conditioning

Select the columns specifying conditioning values.

Save Graph Object

Save As

Enter the name for the object in which to save the results of the analysis.

Plot page

__image\scatplotmat2.gif

In the Scatter Plot Matrix dialog, the Plot page has the following options:

Plot Type

Type

Specify the type of plot desired. This may be "Points", "Lines", "Both Points and Lines", "Overlaid Points and Lines", "Stairstep Lines", or "Vertical Lines".

Pre-Sort Data

Specify whether to sort the values before connecting them with lines. This may be "None" for no sorting, "Sort on X", or "Sort on Y".
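The effect of Pre-Sort Data can be sketched as follows. This is a Python illustration, not the S-PLUS implementation; the function name presort is hypothetical. Points are reordered as pairs, so each x value stays with its y value, and the line segments then join the points in the new order.

```python
def presort(x, y, mode="None"):
    """Reorder paired points before they are joined by line segments.

    mode is "None", "Sort on X", or "Sort on Y", mirroring the dialog choices.
    """
    points = list(zip(x, y))
    if mode == "Sort on X":
        points.sort(key=lambda p: p[0])
    elif mode == "Sort on Y":
        points.sort(key=lambda p: p[1])
    elif mode != "None":
        raise ValueError("unknown mode: " + mode)
    return points

x = [3, 1, 2]
y = [30, 10, 25]
print(presort(x, y, "Sort on X"))  # [(1, 10), (2, 25), (3, 30)]
```

With "None", the points are joined in the order in which they occur in the data set, which can produce a line that doubles back on itself.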

Vary Style by Group

Group Variables

The variable names for the selected data set appear in the scrolling list.

Bar Color

Specify the color to use for histogram bars.

Include Border

Specify whether to include a border around each bar.

Include Legend

Symbol/Line Color

Color

Select the color to use for symbols and lines. This is ignored if Vary Color by Group is specified.

Symbol

Symbol Style

Select the symbol style to use. This is ignored if Vary Symbol Style by Group is specified.

Symbol Size

Specify the size of symbol to use. This is the amount of character expansion to use relative to the device's standard size.

Line

Line Style

Specify the line style to use. This is ignored if Vary Line Style by Group is specified.

Line Width

Specify the line width to use. This is the width relative to the device's standard line width.

Fit page

__image\scatplotmat3.gif

In the Scatter Plot Matrix dialog, the Fit page has the following options:

Regression

Regression Type

Specify the type of regression line to add. Select "None" for no line, "Least Squares" for the standard least-squares regression, or "Robust" for a robust-MM regression.
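For reference, the "Least Squares" line in each panel is the ordinary least-squares fit of y on x. The Python sketch below shows that calculation only; the function name is hypothetical, and the robust MM fit is not shown.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line of y on x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Points that lie exactly on y = 1 + 2x.
intercept, slope = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)
```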

Smooth

Smoothing Type

Choose from the following options:

None uses no smoothing.

Kernel uses the ksmooth function to perform a kernel smooth, which is a generalization of local average smoothing.

Loess uses the loess function to fit a local regression.

Smoothing Spline uses the smooth.spline function and the predict.smooth.spline function to calculate predictions from a cubic B-spline. The regression is fit by penalized least squares between knots. For small data vectors (n < 50), a knot is placed at every distinct point. For larger data sets the number of knots is chosen judiciously in order to keep the computation time manageable.

Supersmoother uses the supsmu function to compute Friedman's variable span smoother. It uses a symmetric k-nearest neighbor linear least squares fitting procedure. The algorithm is fast, and by default uses cross validation to pick the span.

User lets you specify your own smoothing function. The requirements for this function are listed under User-Defined Smoothing.

# Output Points

Specify the number of points to be produced by the smoothing. If not specified, the default number of points will vary based upon which smoothing algorithm is used.

Kernel Specs

Bandwidth

Enter a numeric value for the kernel bandwidth smoothing parameter. All kernels are scaled so the upper and lower quartiles of the kernel are 0.25 and -0.25 when the bandwidth is 1. Larger values of bandwidth make smoother estimates, while smaller values make less smooth estimates. The default bandwidth is 0.5.

Kernel

From the dropdown menu, choose Box (a rectangular box), Triangle (a box convolved with itself), Parzen (the Parzen function, a box convolved with a triangle), or Normal (a Gaussian density function). The default Kernel is Normal.
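The quartile scaling described for Bandwidth can be sketched in Python as follows. This is an illustration of a kernel-weighted average, not the ksmooth source; the function names are hypothetical, and only the Box and Normal kernels are shown. A box with quartiles at ±0.25 times the bandwidth extends to ±0.5 times the bandwidth, and a normal density with the same quartiles has standard deviation 0.25 times the bandwidth divided by 0.6745.

```python
import math

NORMAL_UPPER_QUARTILE = 0.6744897501960817  # upper quartile of the standard normal

def kernel_weight(u, bandwidth, kernel="Normal"):
    """Unnormalized kernel weight at distance u, with quartiles at +/- 0.25 * bandwidth."""
    if kernel == "Box":
        return 1.0 if abs(u) <= 0.5 * bandwidth else 0.0
    if kernel == "Normal":
        sd = 0.25 * bandwidth / NORMAL_UPPER_QUARTILE
        return math.exp(-0.5 * (u / sd) ** 2)
    raise ValueError("kernel not covered by this sketch: " + kernel)

def kernel_smooth(x, y, x_out, bandwidth=0.5, kernel="Normal"):
    """Kernel-weighted average of y, evaluated at each point of x_out."""
    fitted = []
    for x0 in x_out:
        weights = [kernel_weight(xi - x0, bandwidth, kernel) for xi in x]
        total = sum(weights)
        if total > 0:
            fitted.append(sum(w * yi for w, yi in zip(weights, y)) / total)
        else:
            fitted.append(float("nan"))  # no data within reach of the kernel
    return fitted

# Box kernel, bandwidth 2: averages the points within 1 unit of x0 = 1.
print(kernel_smooth([0, 1, 2, 3], [0, 3, 6, 100], [1], bandwidth=2, kernel="Box"))
```

A larger bandwidth draws more points into each average, which is why it gives a smoother curve.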

Loess Specs

Span

Select a number between 0 and 1 that will be used to control the amount of smoothing. Smaller values result in less smoothing. Very small values close to 0 are not recommended. By default, automatic (variable) span selection is done by means of cross validation. Reasonable span values are from 0.3 to 0.5. For small samples (n < 50), or if there are substantial serial correlations between observations close in x-value, a prespecified fixed span smoother should be used.

Degree

Select the overall degree of the locally fitted polynomial. One gives locally linear fitting, and Two gives locally quadratic fitting.

Family

Select either Symmetric or Gaussian. The Symmetric option combines local fitting with a robustness feature that guards against distortion by outliers. The Gaussian option uses local fitting alone, without the robustness feature.
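The roles of Span and Degree can be seen in a minimal local regression sketch. The Python code below is an illustration, not the loess function: it assumes tricube weights, uses Degree One only, and omits the robustness iterations of the Symmetric family. The function name is hypothetical.

```python
import math

def local_linear_fit(x, y, x0, span=0.5):
    """Fit a weighted straight line to the points nearest x0 and evaluate it at x0."""
    n = len(x)
    q = max(2, min(n, math.ceil(span * n)))      # number of neighbors in the window
    d = sorted(abs(xi - x0) for xi in x)[q - 1]  # distance to the q-th nearest point
    if d == 0:
        raise ValueError("all neighbors coincide with x0")
    # Tricube weights: largest at x0, falling to zero at distance d.
    w = [(1 - (abs(xi - x0) / d) ** 3) ** 3 if abs(xi - x0) < d else 0.0 for xi in x]
    sw = sum(w)
    if sw == 0:
        raise ValueError("span too small: no point receives positive weight")
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        return my  # only one distinct x has weight, so fall back to the local mean
    slope = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sxx
    return my + slope * (x0 - mx)

xs = list(range(10))
ys = [2 * v + 1 for v in xs]  # exactly linear data
print(local_linear_fit(xs, ys, 4.5, span=0.5))
```

Because the example data are exactly linear, the local fit reproduces the line. With a smaller span, fewer points enter each local fit and the curve follows the data more closely.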

Smoothing Spline Specs

Deg. Of Freedom

The degrees of freedom should be between 1 and the number of input data points minus 1. The lower the degrees of freedom, the smoother the line. If Auto is selected, cross-validation is used to choose the degrees of freedom.

Supersmoother Specs

Span

Select a number between 0 and 1 that will be used to control the amount of smoothing. Smaller values result in less smoothing. Very small values close to 0 are not recommended. By default, automatic (variable) span selection is done by means of cross validation. Reasonable span values are from 0.3 to 0.5. For small samples (n < 50), or if there are substantial serial correlations between observations close in x-value, a prespecified fixed span smoother should be used.
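A fixed-span smoother of the kind the supersmoother is built from can be sketched as a running-lines fit over symmetric nearest neighbors. The Python code below is a simplified illustration with a hypothetical function name. It is not the supsmu algorithm, which combines several spans and selects among them by cross validation.

```python
def running_line_smooth(x, y, span=0.4):
    """Fit a least-squares line in a symmetric window around each point.

    Returns the sorted x values and the fitted value at each of them.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    n = len(xs)
    half = max(1, int(span * n / 2))  # neighbors taken on each side
    fitted = []
    for i in range(n):
        lo = max(0, i - half)        # the window is cut short at the ends
        hi = min(n, i + half + 1)
        wx, wy = xs[lo:hi], ys[lo:hi]
        m = len(wx)
        mx = sum(wx) / m
        my = sum(wy) / m
        sxx = sum((v - mx) ** 2 for v in wx)
        sxy = sum((a - mx) * (b - my) for a, b in zip(wx, wy))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xs[i] - mx))
    return xs, fitted

sx, sy = running_line_smooth([4, 0, 3, 1, 2], [9, 1, 7, 3, 5], span=0.4)
print(sx, sy)
```

A larger span widens each window, so more points contribute to each local line and the result is smoother.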

User-Defined Smoothing

Function Name

Specify the name of the function to use for smoothing. The first arguments must be:

x: vector of x data

y: vector of y data

z: vector of z data (can be NULL)

w: vector of w data (can be NULL)

subscripts: vector of row indices

panelnum: panel number if conditioned

It must return a list containing the following components:

x: a vector of x data for line drawing

y: a vector of y data for line drawing
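The shape of such a function is sketched below. The smoothing function itself must be written in S-PLUS; this Python version is only an analogy that shows the argument order and the form of the return value. The running-mean method and the extra window argument are assumptions made for the example.

```python
def running_mean_smoother(x, y, z=None, w=None, subscripts=None,
                          panelnum=None, window=3):
    """Toy smoother: running mean of y over `window` neighboring points.

    z, w, subscripts, and panelnum are accepted to match the required
    argument list but are not used by this simple method.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    half = window // 2
    smoothed = []
    for i in range(len(xs)):
        lo = max(0, i - half)
        hi = min(len(xs), i + half + 1)
        smoothed.append(sum(ys[lo:hi]) / (hi - lo))
    # Return the two components used for line drawing.
    return {"x": xs, "y": smoothed}

result = running_mean_smoother([3, 1, 2], [30, 10, 20])
print(result)  # {'x': [1, 2, 3], 'y': [15.0, 20.0, 25.0]}
```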

Other

Other Arguments

Enter any optional arguments to pass to the function that implements the selected smoothing type. For example, if Friedman's supersmoother is used, the underlying supsmu function is called. If bass=5 is entered in the Other Arguments field, it is passed to the supsmu function when the smooth is computed.

Titles page

__image\scatplotmat4.gif

In the Scatter Plot Matrix dialog, the Titles page has the following options:

Titles

Main Title

Specify a main title to add at the top of the page.

Subtitle

Specify a subtitle to add at the bottom of the page.

Labels

X Axis Label

Specify a label for the x-axis.

Y Axis Label

Specify a label for the y-axis.

Multipanel page

__image\scatplotmat5.gif

In the Scatter Plot Matrix dialog, the Multipanel page has the following options:

Layout

Number of Columns/Rows/Pages

Control the layout of the panels by specifying the number of columns, rows, and pages.

Panel Order

Choose from Graph Order or Table Order. Graph Order draws the first panel in the bottom left corner of the graph and proceeds to the right and then up. Table Order draws the first panel in the upper left corner and proceeds to the right and then down.

Include Strip Labels

Check this to include strip labels on panels.
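The two Panel Order choices differ only in which row is filled first. The Python sketch below maps the drawing sequence to a grid position; the function name is hypothetical, and rows are counted from the top of the page.

```python
def panel_position(index, n_cols, n_rows, order="Graph Order"):
    """Return (row_from_top, column) of the panel drawn at 0-based `index`."""
    if not 0 <= index < n_cols * n_rows:
        raise ValueError("index is outside the page")
    filled_rows, column = divmod(index, n_cols)
    if order == "Table Order":
        return filled_rows, column               # fill from the top row down
    if order == "Graph Order":
        return n_rows - 1 - filled_rows, column  # fill from the bottom row up
    raise ValueError("unknown order: " + order)

# First panel on a page with 2 rows and 3 columns.
print(panel_position(0, 3, 2, "Graph Order"))  # (1, 0): bottom left
print(panel_position(0, 3, 2, "Table Order"))  # (0, 0): upper left
```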

Continuous Conditioning

Number of Panels

If a conditioning variable is continuous, its range is divided into intervals and one panel is drawn for each interval. Specify the number of panels in this field.

Fraction of Shared Points

Create overlapping intervals by specifying the fraction of data points that are shared across two panels.

Interval Type

Choose from Equal Counts or Equal Ranges. Equal Counts places an equal number of data points in each panel. Equal Ranges makes the interval widths all equal.
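The difference between the two interval types, and the effect of Fraction of Shared Points, can be sketched in Python as follows. The equal-count rule shown is one common scheme for overlapping intervals; whether the dialog uses exactly this rule is an assumption, and both function names are hypothetical.

```python
def equal_count_intervals(values, number, overlap=0.5):
    """Intervals holding about the same number of points each.

    `overlap` is the fraction of points shared by neighboring intervals.
    """
    xs = sorted(values)
    n = len(xs)
    per_interval = n / (number * (1 - overlap) + overlap)  # points per interval
    intervals = []
    for k in range(number):
        start = k * (1 - overlap) * per_interval
        first = max(1, round(1 + start))            # 1-based rank of lowest point
        last = min(n, round(per_interval + start))  # 1-based rank of highest point
        intervals.append((xs[first - 1], xs[last - 1]))
    return intervals

def equal_range_intervals(values, number):
    """Non-overlapping intervals of equal width covering the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / number
    return [(lo + k * width, lo + (k + 1) * width) for k in range(number)]

data = list(range(1, 13))  # the values 1, 2, ..., 12
print(equal_count_intervals(data, 3, overlap=0.5))  # [(1, 6), (4, 9), (7, 12)]
print(equal_range_intervals([0, 3, 12], 3))
```

In the equal-count result, each interval holds six of the twelve points, and neighboring intervals share three of them, which is the requested fraction of 0.5.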