leaps(x, y, wt=<<see below>>, int=T, method="Cp", keep.int=T,
      keep=<<see below>>, nbest=10, names=<<see below>>, df=nrow(x),
      dropint=T)
ARGUMENTS:

x: matrix of independent variables; each column is a variable, each
row an observation. There must be at least 3 and no more than 31
columns. The matrix must be of full rank, and there must be fewer
columns than rows. Missing values are not accepted.
y: vector of the dependent variable, with length equal to the number
of rows of x. Missing values are not accepted.
wt: vector of weights for weighted regression, as in lsfit; that is,
the weights should be inversely proportional to the variance.
"Cp"
,
"r2"
, and
"adjr2"
corresponding to Mallows Cp statistic, R-squared, and
adjusted R-squared. Only the first character need be
supplied.
keep.int: logical flag: if TRUE, the intercept is forced into every
subset considered.

keep: vector of column numbers of x naming variables that are forced
into every subset.
"r2"
or
"Cp"
methods, the
nbest
subsets (of any size) are guaranteed to be included
in the output (but note that more subsets will also be
included).
names: vector of names for the variables; its length must be ncol(x).
By default, the names are 1, 2, ... 9, A, B, ...
df: degrees of freedom of y. Useful if, for example, x and y have
already been adjusted for previous independent variables. The degrees
of freedom used are decreased by 1 if int is TRUE.

dropint: logical flag: if TRUE, the intercept indicator is dropped
from the which matrix of the result (see the note under VALUE).
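For instance, a call asking for the five best subsets under the
adjusted R-squared criterion, with custom variable names, might look
like the following sketch (x and y are assumed to be an already
defined predictor matrix and response vector):

    r <- leaps(x, y, method="adjr2", nbest=5,
               names=c("a", "b", "c", "d", "e"))  # one name per column of x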
VALUE:

which: logical matrix with one row per subset and one column per
variable; each row can be used to select the columns of x making up
that subset. Cp (or adjr2 or r2), size and label will all have the
same length -- one element per subset; this will be the number of
rows in which.
Cp, adjr2, or r2: named according to the method used for evaluating
the subsets. This component gives the values of the desired statistic.
If r2 or adjr2 is used, the result is in percent.
size: the number of variables (including the intercept if int is TRUE)
in each subset.

label: character vector giving, for each subset, the names of the
columns of x in the subset.
If dropint is FALSE, the which matrix contains, as its first column,
the status of the intercept variable. If you desire to test the
intercept term also, then include a column of 1s in x and set int to
FALSE.
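As an illustration of how these components fit together, the following
sketch refits the subset with the smallest Cp using lsfit (x and y are
assumed to be already defined):

    r <- leaps(x, y, method="Cp")          # all-subsets search by Cp
    best <- order(r$Cp)[1]                 # row index of the smallest Cp
    r$label[best]                          # names of the variables chosen
    fit <- lsfit(x[, r$which[best, ]], y)  # refit the chosen subset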
DETAILS:

The leaps function provides a way to select a few promising
regressions (sets of explanatory variables) for further study. By
using robustness weights found by a robust regression on the full set
of variables, this search can find suitable regressions even when
outliers are present.
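A minimal sketch of that idea, mirroring the Longley example below
(it assumes lmsreg is available, and that x and y are defined):

    rob.wt <- lmsreg(x, y)$lms.wt  # robustness weights from a robust fit
    r <- leaps(x, y, wt=rob.wt)    # weighted all-subsets search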
The best known criterion for regression is the coefficient of
determination (R-squared). It has definite limitations in the context
of the leaps function, since the largest R-squared always belongs to
the full set of explanatory variables. To take account of the number
of parameters being fit, an adjusted R-squared can be used. The higher
the adjusted R-squared (which, by the way, can be negative), the
better. It has been noted, however, that the adjusted R-squared tends
to favor large regressions over smaller ones.
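For reference, with n observations and p fitted parameters (including
the intercept), adjusted R-squared is the usual correction
1 - (1 - R2)(n - 1)/(n - p); a sketch, on the proportion scale rather
than the percent scale that leaps reports:

    adj.r2 <- function(r2, n, p)
        1 - (1 - r2) * (n - 1) / (n - p)
    adj.r2(0.95, n=16, p=7)  # illustrative numbers only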
Another method of selecting regressions is Mallows Cp. Small values of
Cp, close to or less than p (the number of parameters in the model,
including the intercept), are good.
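For a subset model with p parameters fit to n observations, Cp is
RSS.p/s2 - n + 2p, where RSS.p is the subset model's residual sum of
squares and s2 is the residual mean square of the full model. A
sketch (the helper function is hypothetical, not part of leaps):

    mallows.cp <- function(rss.p, s2.full, n, p)
        rss.p / s2.full - n + 2 * p
    mallows.cp(rss.p=12.3, s2.full=1.1, n=50, p=4)  # illustrative numbers only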
REFERENCES:

Furnival, G. M. and Wilson, R. W. Jr. (1974). Regressions by Leaps
and Bounds. Technometrics 16, 499-511.

Seber, G. A. F. (1977). Linear Regression Analysis. Wiley, New York.

Weisberg, S. (1985). Applied Linear Regression (second edition).
Wiley, New York.
EXAMPLES:

r <- leaps(x, y)
lsfit(x[, r$which[3, ]], y)  # regression corresponding to third subset

longley.wt <- lmsreg(longley.x, longley.y)$lms.wt
longley.leap <- leaps(longley.x, longley.y, longley.wt,
                      names=c("D", "G", "U", "A", "P", "Y"))
plot(longley.leap$size, longley.leap$Cp, type="n", ylim=c(0, 15))
text(longley.leap$size, longley.leap$Cp, longley.leap$label)
abline(0, 1)
legend(2, 15, pch="DGUAPY", legend=dimnames(longley.x)[[2]])
title(main="Cp Plot for Longley Data")