Note: This function has been superseded by the function .
stepwise(x, y, wt=<<see below>>, intercept=T, tolerance=1.e-07, method="efroymson", size.max=ncol(x), nbest=3, f.crit=2, xinclude, plot=T, time=0.02)
x: matrix of explanatory variables; a data frame may also be given. The number of rows of x should equal the length of y, and there should be fewer columns than rows. Missing values are allowed. If a data frame, x is coerced into a numeric matrix, hence factor data are transformed to numeric values using the function codes.
wt: optional vector of weights for the observations; the weights should be inversely proportional to the variance. By default, an unweighted regression is carried out. Missing values are allowed.
method: one of "forward", "backward", "efroymson", or "exhaustive", selecting forward selection, backward elimination, Efroymson's forward stepwise, or exhaustive search, respectively. Only enough of the string for a unique match needs to be given.
size.max: the largest subset size to consider; pertinent when method="exhaustive" or method="forward". If ncol(x) is large (say > 35), then size.max should be specified smaller to make exhaustive searches possible.
nbest: the number of best subsets of each size to record. This argument is used only when method="exhaustive".
xinclude: a logical vector of length ncol(x), with values set TRUE for each column of x that is to be forced into the subsets. This argument only works with method="exhaustive".
plot: if TRUE, and a graphics device is available, the residual sum of squares for each model is plotted against subset size.
time: if ncol(x) > 12, an estimate of the time required for the computations is made. If the estimated time for the search (in hours) is greater than this value, a message will be printed giving the estimated time. The estimates are very approximate.
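For illustration, a hedged sketch of a call that exercises several of these arguments; evap.x and evap.y are the data used in the Examples below, and the weight values here are placeholders (real weights should be inversely proportional to the variance of each observation).

w <- rep(1, length(evap.y))                                     # placeholder weights
z <- stepwise(evap.x, evap.y, wt=w, method="back", plot=FALSE)  # "back" uniquely matches "backward"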
For each subset examined, the result records which columns of x are in the subset, with one row per subset. For the forward method there are ncol(x) rows, with subsets of size 1, ..., ncol(x). For the backward method there are ncol(x) rows, with subsets of size ncol(x), ..., 1. For Efroymson's method there is a row for each step of the stepwise procedure. For the exhaustive search there are nbest subsets of each size (if available). The row labels consist of the subset size with some additional information in parentheses. For the stepwise methods the extra information is +n or -n, indicating that the n-th variable has been added or dropped. For the exhaustive method, the extra information is #i, where i is the subset number.
If plot=TRUE, a plot of residual sum of squares versus model size is created on the current graphics device.
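As a small, hedged illustration of this side effect (assuming a graphics device is open; evap.x and evap.y are from the Examples below):

z <- stepwise(evap.x, evap.y, method="exhaustive", nbest=3, plot=TRUE)
# besides returning the subsets described above, the call draws the residual
# sum of squares against subset size on the current graphics device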
The forward selection procedure starts with an empty subset, and
at each step adds the independent variable that gives the largest
reduction of the residual sum of squares.
The backward elimination procedure starts with a complete set, and
at each step drops the independent variable that gives the smallest
increase in the residual sum of squares.
Efroymson's stepwise method is like forward selection, except that when
each new variable is added to the subset, partial correlations
are considered to see if any of the variables in the subset
should now be dropped.
The exhaustive search considers all possible subsets of a given
size, and chooses the one with the smallest residual sum of
squares.
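To make the forward-selection idea above concrete, here is a minimal S-style sketch; it is not the Fortran routine that stepwise calls internally, the name forward.sketch is invented for illustration, and lsfit is used only to obtain the residual sum of squares at each step.

forward.sketch <- function(x, y, size.max = ncol(x)) {
    chosen <- integer(0)                 # columns selected so far
    remaining <- 1:ncol(x)               # candidate columns
    rss <- numeric(0)                    # residual sum of squares after each step
    while (length(chosen) < size.max) {
        # residual sum of squares obtained by adding each remaining column
        step.rss <- sapply(remaining, function(j) {
            fit <- lsfit(x[, c(chosen, j), drop = FALSE], y)
            sum(fit$residuals^2)
        })
        best <- remaining[order(step.rss)[1]]   # column giving the largest reduction
        chosen <- c(chosen, best)
        remaining <- remaining[remaining != best]
        rss <- c(rss, min(step.rss))
    }
    list(order = chosen, rss = rss)
}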
An observation is considered missing if there is a nonfinite value in the
response variable, any explanatory variable or the weight (if present)
for that observation.
Such observations are dropped from the computations.
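A small, hedged sketch of this rule, again using the Examples data; the copy xm and the position of the introduced NA are arbitrary.

xm <- evap.x
xm[1, 1] <- NA                 # the first observation now has a nonfinite value
z <- stepwise(xm, evap.y)      # that observation is dropped from the computations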
This function is based on Fortran code written by Alan Miller, CSIRO Division of Mathematics and Statistics; his monograph provides details of the methods used and advice on how to use these procedures.
The
stepwise
function provides several methods for selecting
regressions (sets of explanatory variables) for further study.
As a first step one or more of the stepwise methods should be
used, as these quickly indicate how many explanatory variables
may be needed in the regression.
Next, the exhaustive search may be useful to select the "best"
set of explanatory variables. The
stepwise
function has an
advantage over the
leaps
function in that it can search all
subsets of size
size.max
, where
size.max
is less than
ncol(x)
.
Also it does not have the restriction that
ncol(x)
must be less
than 32.
Depending on the speed of the computer it is possible to search
for subsets with
size.max
up to 30 or 35.
Subsets larger than this may be handled if it is clear that
some explanatory variables must be included,
and the
xinclude
argument can specify this.
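A hedged sketch of this strategy for a large problem; x and y stand for the user's data, and the columns forced into every subset here are purely illustrative.

must.have <- rep(FALSE, ncol(x))
must.have[c(1, 2)] <- TRUE     # variables that clearly must be included
z <- stepwise(x, y, method="exhaustive", size.max=30, xinclude=must.have)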
Draper, N. R. and Smith, H. (1981).
Applied Regression Analysis,
(second edition). New York: Wiley.
Gentleman, W. M. (1974).
Basic procedures for large sparse or weighted least-squares.
Applied Statistics
23, 448-454.
Miller, A. J. (1990).
Subset Selection in Regression.
Monographs on Statistics and Applied Probability 40,
London: Chapman and Hall.
Miller, A. J. (1984).
Selection of subsets of regression variables (with discussion).
Journal of the Royal Statistical Society, Series A
147, 389-425.
Osborne, M. R. (1976).
On the computation of stepwise regressions.
Australian Computer Journal
8, 61-68.
z1 <- stepwise(evap.x, evap.y)               # use Efroymson's method
z2 <- stepwise(evap.x, evap.y, method="ex")  # exhaustive search