step.gam(object, scope, scale, direction, trace = T, keep, steps)
gam or any class that inherits from it.
A 1 in the formula allows the additional option of leaving the
term out of the model entirely.
"both", "backward", or "forward", with a default of "both".
If TRUE, information is printed while step.gam is running.
This is a helpful choice in general, since step.gam can take some time
to compute, either for large models or when called with an extensive
scope argument.
A simple one-line model summary is printed for each model
visited in the search, and the selected model is noted at each step.
gam object and the associated "AIC" statistic, and whose output is arbitrary.
Typically keep will select a subset of the components of the
object and return them. The default is not to keep anything.
"anova" component corresponding to the steps taken in the search,
as well as a "keep" component if the keep= argument was supplied in the call.
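As a rough illustration of the keep filter described above, here is a minimal sketch. The function name keep_summary and its argument names (object, AIC) are assumptions made for this example; the text only states that the filter receives the fitted object and the associated "AIC" statistic. A glm fit from base R stands in for a gam fit so the snippet runs without extra packages.

```r
# Hypothetical keep filter: receives a fitted model and its AIC value and
# returns a small list instead of the whole fitted object.
keep_summary <- function(object, AIC) {
  list(
    formula     = formula(object),     # model formula of this fit
    df.residual = object$df.residual,  # residual degrees of freedom
    AIC         = AIC                  # AIC value passed in by the caller
  )
}

# Stand-in demonstration (assumption: a glm fit is used in place of a gam fit).
fit  <- glm(mpg ~ wt, data = mtcars)
kept <- keep_summary(fit, AIC(fit))
names(kept)
```

Returning only a few components keeps the stored record small, which matters when many models are visited during the search.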
Each of the formulas in scope specifies a "regimen" of candidate forms in
which the particular term may enter the model.
For example, a term formula might be
    ~ Income + log(Income) + s(Income)
This means that Income could either appear linearly,
linearly in its logarithm, or as a smooth function estimated nonparametrically.
Every term in the model is described by such a term formula,
and the final model is built up by selecting a component from each formula.
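The idea of selecting one component from each term formula can be sketched in base R as follows. This is only an illustration, not how step.gam is implemented: the helpers split_plus and regimen, the response name y, and the chosen positions in pick are all hypothetical. The formulas are only parsed, never evaluated, so s() does not need to be defined.

```r
# Split the right-hand side of a one-sided formula at top-level "+" signs,
# keeping an explicit 1 as its own candidate.
split_plus <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("+")) && length(e) == 3) {
    c(split_plus(e[[2]]), split_plus(e[[3]]))
  } else {
    deparse(e)
  }
}

# A regimen is the ordered vector of candidate forms of one term formula.
regimen <- function(f) split_plus(f[[length(f)]])

scope <- list(
  Age = ~ 1 + Age + log(Age),
  BP  = ~ 1 + BP + poly(BP, 2) + s(BP)
)
regs <- lapply(scope, regimen)

# Hypothetical selection: 2nd form of Age, 4th form of BP.
pick   <- c(Age = 2, BP = 4)
chosen <- mapply(function(r, i) r[i], regs, pick)

# Assemble a model formula; the response name y is an assumption.
model <- as.formula(paste("y ~", paste(chosen, collapse = " + ")))
```

Keeping each regimen as an ordered character vector makes "one step up or down" a simple change of index.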
The supplied model object is used as the starting model,
and hence there is the requirement that one term from each of
the term formulas be present in formula(object).
This also implies that any terms in formula(object) not contained in
any of the term formulas will be forced to be present in every model considered.
While step.glm uses score-test approximations to speed up the search,
step.gam forgoes this speedup in favor of greater generality.
We describe the most general setup, when direction="both".
At any stage there is a current model comprising a
single term from each of the term formulas supplied in the scope argument.
A series of models is fitted, each corresponding to a
formula obtained by moving one of the terms a single step up or
down in its regimen, relative to the formula of the current model.
If the current value for any term is at either of the
extreme ends of its regimen, only one rather than two steps can be considered.
So if there are p term formulas, at most 2*p - 1 models are considered.
A record is kept of all the models ever visited (hence
the -1 above), to avoid repetition.
Once each of these models has been fitted, the
best in terms of the AIC statistic is selected and defines the step.
The entire process is repeated until either the maximum number of steps has
been used, or until the AIC criterion cannot be decreased
by any of the eligible steps.
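The search described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the step.gam code: lm() and AIC() stand in for the gam fit and its AIC statistic, the regimens use poly() and log() instead of smooth terms, and the names regimens, make_formula, key, and step_search are hypothetical. The built-in mtcars data set is used only so the example runs.

```r
# Hypothetical regimens: ordered candidate forms for two terms.
regimens <- list(
  wt = c("1", "wt", "poly(wt, 2)"),
  hp = c("1", "hp", "log(hp)")
)

# Build a model formula from one position per regimen.
make_formula <- function(pos) {
  chosen <- unique(mapply(function(r, i) r[i], regimens, pos))
  as.formula(paste("mpg ~", paste(chosen, collapse = " + ")))
}

# Text key identifying a model, used for the record of visited models.
key <- function(pos) paste(pos, collapse = "-")

step_search <- function(start, max_steps = 10) {
  current  <- start
  best_aic <- AIC(lm(make_formula(current), data = mtcars))
  visited  <- key(current)
  for (s in seq_len(max_steps)) {
    # Candidates: move one term a single step up or down in its regimen.
    cands <- list()
    for (j in seq_along(current)) {
      for (d in c(-1, 1)) {
        pos <- current
        pos[j] <- pos[j] + d
        inside <- pos[j] >= 1 && pos[j] <= length(regimens[[j]])
        if (inside && !(key(pos) %in% visited)) {
          cands[[length(cands) + 1]] <- pos
        }
      }
    }
    if (length(cands) == 0) break
    visited <- c(visited, vapply(cands, key, character(1)))
    aics <- vapply(cands,
                   function(p) AIC(lm(make_formula(p), data = mtcars)),
                   numeric(1))
    # Stop when no eligible step decreases the AIC.
    if (min(aics) >= best_aic) break
    current  <- cands[[which.min(aics)]]
    best_aic <- min(aics)
  }
  list(position = current, formula = make_formula(current), aic = best_aic)
}

# Start from the linear form of each term (position 2 in each regimen).
result <- step_search(c(wt = 2, hp = 2))
```

Every candidate is refitted in full at each step, which mirrors the trade-off noted above: the search is slower than one based on approximations, but it places no restrictions on the kind of terms in a regimen.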
step(gam.object,
     scope = list(
       "Age"  = ~ 1 + Age + log(Age),
       "BP"   = ~ 1 + BP + poly(BP, 2) + s(BP),
       "Chol" = ~ s(Chol, df = 4) + s(Chol, df = 7)
     ))