lrm
or
ols
and for a vector or list of penalties,
fits a series of logistic or linear models using penalized maximum likelihood
estimation, and saves the effective degrees of freedom, Akaike Information
Criterion (AIC), Schwarz Bayesian Information Criterion (BIC), and
Hurvich and Tsai's corrected AIC (AIC_c). Optionally
pentrace
can
use the
nlminb
function to solve for the optimum penalty factor or
combination of factors penalizing different kinds of terms in the model.
The
effective.df
function prints the original and effective
degrees of freedom for a penalized fit or for an unpenalized fit and
the best penalization determined from a previous invocation of
pentrace
if
method="grid"
(the default).
The effective d.f. is computed separately for each class of terms in
the model (e.g., interaction, nonlinear).
A
plot
method exists to plot the results, and a
print
method exists
to print the most pertinent components. Both AIC and BIC
may be plotted if
there is only one penalty factor type specified in
penalty
. Otherwise,
the first two types of penalty factors are plotted, showing only the AIC.
pentrace(fit, penalty, penalty.matrix, method=c('grid','optimize'), which=c('aic.c','aic','bic'), target.df, fitter, pr=FALSE, tol=1e-7, keep.coef=FALSE, complex.more=TRUE, verbose=FALSE, maxit=12, subset) effective.df(fit, object) ## S3 method for class 'pentrace': print(x, ...) ## S3 method for class 'pentrace': plot(x, method=c('points','image'), which=c('effective.df','aic','aic.c','bic'), pch=2, add=FALSE, ylim, ...)
lrm
or
ols
with
x=TRUE, y=TRUE
and without using
penalty
or
penalty.matrix
(or optionally using penalization in the case of
effective.df
)
penalty
, with a penalty of zero automatically added.
penalty
can
also be a list in the format documented in the
lrm
function, except that
elements of the list can be vectors. The
expand.grid
function is
invoked by
pentrace
to generate all possible combinations of
penalties. For example, specifying
penalty=list(simple=1:2, nonlinear=1:3)
will generate 6 combinations
to try, so that the analyst can attempt to determine whether penalizing
more complex terms in the model more than the linear or categorical
variable terms will be beneficial. If
complex.more=TRUE
, it is assumed
that the variables given in
penalty
are listed in order from less
complex to more complex. With
method="optimize"
penalty
specifies
an initial guess for the penalty or penalties. If all term types are
to be equally penalized,
penalty
should be a single number,
otherwise it should be a list containing single numbers as elements,
e.g.,
penalty=list(simple=1, nonlinear=2)
. Experience has shown that the optimization algorithm is more likely to find a reasonable solution when the starting value specified in
penalty
is too large rather than too small.
pentrace
. For
effective.df
,
object
can be
omitted if the
fit
was penalized.
lrm
method="grid"
to print various indexes for all
combinations of penalty parameters given by the user. Specify
method="optimize"
to have
pentrace
use
nlminb
to solve for the
combination of penalty parameters that gives the maximum value of the
objective named in
which
, or, if
target.df
is given, to find the
combination that yields
target.df
effective total degrees of freedom
for the model. When
target.df
is specified,
method
is set to
"optimize"
automatically.
For
plot.pentrace
this parameter applies only if more than one
penalty term-type was used. The default is to use open triangles
whose sizes are proportional to the ranks of the AICs, plotting the
first two penalty factors respectively on the x and y axes. Use
method="image"
to plot an image plot.
method
. Default is
"aic.c"
(corrected
AIC).
For
plot.pentrace
,
which
is a vector of names of criteria to show;
default is to plot all 4 types, with effective d.f. in its own separate plot
method="optimize"
. See
method
.
target.df
makes
sense mainly when a single type of penalty factor is specified.
lrm.fit
(
lm.pfit
is always used for
ols
).
TRUE
to print intermediate results
lrm.fit, solvet
)
TRUE
to store matrix of regression coefficients for all the fits (corresponding
to increasing values of
penalty
) in object
Coefficients
in the
returned list. Rows correspond to penalties, columns to regression
parameters.
penalty
is a list, combinations of penalties for which
complex terms are penalized less than less complex terms will be
dropped after
expand.grid
is invoked. Set
complex.more=FALSE
to
allow more complex terms to be penalized less. Currently this option
is ignored for
method="optimize"
.
TRUE
to print number of intercepts and sum
of effective degrees of freedom
maxit
if you had to when fitting the
original unpenalized model.
pentrace
to see if the best penalty can be estimated
with little error so that variation due to selecting the optimal
penalty can
be safely ignored when bootstrapping standard errors of regression
coefficients and measures of predictive accuracy. See an example below.
pentrace
method="points"
TRUE
to add to an existing plot. In that case, the effective
d.f. plot is not re-drawn, but the AIC/BIC plot is added to.
plot
,
lines
, or
image
"pentrace"
with elements
penalty, df, objective, fit, var.adj, diag, results.all
, and
optionally
Coefficients
.
The first 6 elements correspond to the fit that had the best objective
as named in the
which
argument, from the sequence of fits tried.
Here
fit
is the fit object from
fitter
which was a penalized fit,
diag
is the diagonal of the matrix used to compute the effective
d.f., and
var.adj
is Gray (1992) Equation 2.9, which is an improved
covariance matrix for the penalized beta.
results.all
is a data
frame whose first few variables are the components of
penalty
and
whose other columns are
df, aic, bic, aic.c
.
results.all
thus
contains a summary of results for all fits attempted. When
method="optimize"
, only two components are returned:
penalty
and
objective
, and the object does not have a class.
Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu
Gray RJ: Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. JASA 87:942–951, 1992.
Hurvich CM, Tsai, CL: Regression and time series model selection in small samples. Biometrika 76:297–307, 1989.
n <- 1000 # define sample size set.seed(17) # so can reproduce the results age <- rnorm(n, 50, 10) blood.pressure <- rnorm(n, 120, 15) cholesterol <- rnorm(n, 200, 25) sex <- factor(sample(c('female','male'), n,TRUE)) # Specify population model for log odds that Y=1 L <- .4*(sex=='male') + .045*(age-50) + (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male')) # Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)] y <- ifelse(runif(n) < plogis(L), 1, 0) f <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)), x=TRUE, y=TRUE) p <- pentrace(f, seq(.2,1,by=.05)) plot(p) p$diag # may learn something about fractional effective d.f. # for each original parameter pentrace(f, list(simple=c(0,.2,.4), nonlinear=c(0,.2,.4,.8,1))) # Bootstrap pentrace 5 times, making a plot of corrected AIC plot with 5 reps n <- nrow(f$x) plot(pentrace(f, seq(.2,1,by=.05)), which='aic.c', col=1, ylim=c(30,120)) #original in black for(j in 1:5) plot(pentrace(f, seq(.2,1,by=.05), subset=sample(n,n,TRUE)), which='aic.c', col=j+1, add=TRUE) # Find penalty giving optimum corrected AIC. Initial guess is 1.0 if(!.R.) pentrace(f, 1, method='optimize') # Find penalty reducing total regression d.f. effectively to 5 if(!.R.) pentrace(f, 1, target.df=5) # Re-fit with penalty giving best aic.c without differential penalization f <- update(f, penalty=p$penalty) effective.df(f)