cr.setup
for how to
fit forward continuation ratio models with
lrm
.
lrm(formula, data, subset, na.action=na.delete, method="lrm.fit", model=FALSE, x=FALSE, y=FALSE, linear.predictors=TRUE, se.fit=FALSE, penalty=0, penalty.matrix, tol=1e-7, strata.penalty=0, var.penalty=c('simple','sandwich'), weights, normwt, ...)
offset
term can be included. The offset causes
fitting of a model such as logit(Y=1) = Xβ + W, where W is the
offset variable having no estimated coefficient.
The response variable can be any data type;
lrm
converts it
in alphabetic or numeric order to an S factor variable and
recodes it 0,1,2,... internally.
NA
s in the data. Default is
na.delete
, which
deletes any observation having response or predictor missing, while
preserving the attributes of the predictors and maintaining frequencies
of deletions due to each variable in the model.
This is usually specified using
options(na.action="na.delete")
.
lrm.fit
.
x
.
y
.
linear.predictors
. When the response variable has
more than two levels, only the first intercept is used.
se.fit
.
penalty
factors * penalty.matrix
and
penalty.matrix
is
defined below. The default is
penalty=0
implying that ordinary
unpenalized maximum likelihood estimation is used.
If
penalty
is a scalar, it is assumed to be a penalty factor that
applies
to all non-intercept parameters in the model. Alternatively, specify a
list to penalize different types of model terms by differing amounts.
The elements in this list are named
simple, nonlinear, interaction
and
nonlinear.interaction
. If you omit elements on the right of this
series, values are inherited from elements on the left. Examples:
penalty=list(simple=5, nonlinear=10)
uses a penalty factor of 10
for nonlinear or interaction terms.
penalty=list(simple=0, nonlinear=2, nonlinear.interaction=4)
does not
penalize linear main effects, uses a penalty factor of 2 for nonlinear or
interaction effects (that are not both), and 4 for nonlinear interaction
effects.
penalty.matrix
, you may set
the rows and columns for certain parameters to zero so as to not
penalize those parameters.
Depending on
penalty
, some elements of
penalty.matrix
may
be overridden automatically by setting them to zero.
The penalty matrix that is used in the actual fit is
penalty times diag(pf) times penalty.matrix times diag(pf),
where pf is the vector
of square roots of penalty factors computed from
penalty
by
Penalty.setup
in
Design.Misc
. If you specify
penalty.matrix
you must specify a nonzero value of
penalty
or no penalization will be
done.
lrm.fit
)
strat
variable
var
component of the fit when penalization is used. The default is the
inverse of the penalized information matrix. Specify
var.penalty="sandwich"
to use the sandwich estimator (see below
under
var
), which limited simulation studies have shown yields
variances estimates that are too low.
y
) of possibly fractional case weights
weights
so they sum to the length of
y
; useful for sample surveys as opposed to the default of
frequency weighting
lrm.fit
.
lrm
contains the following components in addition
to the ones mentioned under the optional arguments.
Y
in order of increasing
Y
"Model L.R."
is computed
without the penalty factor, and
"d.f."
is the effective d.f. from
Gray's (1992) Equation 2.9.
The P-value uses this corrected model
L.R. chi-square and corrected d.f.
The score chi-square statistic uses first derivatives which contain
penalty components.
TRUE
if convergence failed (and
maxiter>1
)
penalty>0
,
var
is either the inverse of the penalized
information matrix (the default, if
var.penalty="simple"
) or the
sandwich-type variance - covariance
matrix estimate (Gray Eq. 2.6) if
var.penalty="sandwich"
. For the
latter case the simple information-matrix - based variance
matrix is returned under the name
var.from.info.matrix
.
penalty>0
. It is the vector whose sum is the effective
d.f. of the model (counting intercept terms).
X
fitted (intercepts are not counted)
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
Le Cessie S, Van Houwelingen JC: Ridge estimators in logistic regression. Applied Statistics 41:191–201, 1992.
Verweij PJM, Van Houwelingen JC: Penalized likelihood in Cox regression. Stat in Med 13:2427–2436, 1994.
Gray RJ: Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. JASA 87:942–951, 1992.
Shao J: Linear model selection by cross-validation. JASA 88:486–494, 1993.
Verweij PJM, Van Houwelingen JC: Crossvalidation in survival analysis. Stat in Med 12:2305–2314, 1993.
Harrell FE: Model uncertainty, penalization, and parsimony. ISCB Presentation on UVa Web page, 1998.
#Fit a logistic model containing predictors age, blood.pressure, sex #and cholesterol, with age fitted with a smooth 5-knot restricted cubic #spline function and a different shape of the age relationship for males #and females. # n <- 1000 # define sample size set.seed(17) # so can reproduce the results age <- rnorm(n, 50, 10) blood.pressure <- rnorm(n, 120, 15) cholesterol <- rnorm(n, 200, 25) sex <- factor(sample(c('female','male'), n,TRUE)) label(age) <- 'Age' # label is in Hmisc label(cholesterol) <- 'Total Cholesterol' label(blood.pressure) <- 'Systolic Blood Pressure' label(sex) <- 'Sex' units(cholesterol) <- 'mg/dl' # uses units.default in Hmisc units(blood.pressure) <- 'mmHg' # Specify population model for log odds that Y=1 L <- .4*(sex=='male') + .045*(age-50) + (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male')) # Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)] y <- ifelse(runif(n) < plogis(L), 1, 0) cholesterol[1:3] <- NA # 3 missings, at random ddist <- datadist(age, blood.pressure, cholesterol, sex) options(datadist='ddist') fit <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)), x=TRUE, y=TRUE) # x=TRUE, y=TRUE allows use of resid(), which.influence below # could define d <- datadist(fit) after lrm(), but data distribution # summary would not be stored with fit, so later uses of plot.Design # or summary.Design would require access to the original dataset or # d or specifying all variable values to summary, plot, nomogram anova(fit) plot(fit, age=NA, sex=NA) plot(fit, age=20:70, sex="male") # need if datadist not used print(cbind(resid(fit,"dfbetas"), resid(fit,"dffits"))[1:20,]) which.influence(fit, .3) # latex(fit) #print nice statement of fitted model # #Repeat this fit using penalized MLE, penalizing complex terms #(for nonlinear or interaction effects) # fitp <- update(fit, penalty=list(simple=0,nonlinear=10), x=TRUE, y=TRUE) effective.df(fitp) # or lrm(y ~ ..., penalty=...) #Get fits for a variety of penalties and assess predictive accuracy #in a new data set. Program efficiently so that complex design #matrices are only created once. set.seed(201) x1 <- rnorm(500) x2 <- rnorm(500) x3 <- sample(0:1,500,rep=TRUE) L <- x1+abs(x2)+x3 y <- ifelse(runif(500)<=plogis(L), 1, 0) new.data <- data.frame(x1,x2,x3,y)[301:500,] # for(penlty in seq(0,.15,by=.005)) { if(penlty==0) { f <- lrm(y ~ rcs(x1,4)+rcs(x2,6)*x3, subset=1:300, x=TRUE, y=TRUE) # True model is linear in x1 and has no interaction X <- f$x # saves time for future runs - don't have to use rcs etc. Y <- f$y # this also deletes rows with NAs (if there were any) penalty.matrix <- diag(diag(var(X))) Xnew <- predict(f, new.data, type="x", incl.non.slopes=FALSE) # expand design matrix for new data Ynew <- new.data$y } else f <- lrm.fit(X,Y, penalty.matrix=penlty*penalty.matrix) # cat("\nPenalty :",penlty,"\n") pred.logit <- f$coef[1] + (Xnew %*% f$coef[-1]) pred <- plogis(pred.logit) C.index <- somers2(pred, Ynew)["C"] Brier <- mean((pred-Ynew)^2) Deviance<- -2*sum( Ynew*log(pred) + (1-Ynew)*log(1-pred) ) cat("ROC area:",format(C.index)," Brier score:",format(Brier), " -2 Log L:",format(Deviance),"\n") } #penalty=0.045 gave lowest -2 Log L, Brier, ROC in test sample for S+ # #Use bootstrap validation to estimate predictive accuracy of #logistic models with various penalties #To see how noisy cross-validation estimates can be, change the #validate(f, ...) to validate(f, method="cross", B=10) for example. #You will see tremendous variation in accuracy with minute changes in #the penalty. This comes from the error inherent in using 10-fold #cross validation but also because we are not fixing the splits. #20-fold cross validation was even worse for some #indexes because of the small test sample size. Stability would be #obtained by using the same sample splits for all penalty values #(see above), but then we wouldn't be sure that the choice of the #best penalty is not specific to how the sample was split. This #problem is addressed in the last example. # penalties <- seq(0,.7,by=.1) # really use by=.02 index <- matrix(NA, nrow=length(penalties), ncol=9, dimnames=list(format(penalties), c("Dxy","R2","Intercept","Slope","Emax","D","U","Q","B"))) i <- 0 for(penlty in penalties) { cat(penlty, "") i <- i+1 if(penlty==0) { f <- lrm(y ~ rcs(x1,4)+rcs(x2,6)*x3, x=TRUE, y=TRUE) # fit whole sample X <- f$x Y <- f$y penalty.matrix <- diag(diag(var(X))) # save time - only do once } else f <- lrm(Y ~ X, penalty=penlty, penalty.matrix=penalty.matrix, x=TRUE,y=TRUE) val <- validate(f, method="boot", B=20) # use larger B in practice index[i,] <- val[,"index.corrected"] } par(mfrow=c(3,3)) for(i in 1:9) { plot(penalties, index[,i], xlab="Penalty", ylab=dimnames(index)[[2]][i]) lines(lowess(penalties, index[,i])) } options(datadist=NULL) # Example of weighted analysis x <- 1:5 y <- c(0,1,0,1,0) reps <- c(1,2,3,2,1) lrm(y ~ x, weights=reps) x <- rep(x, reps) y <- rep(y, reps) lrm(y ~ x) # same as above # #Study performance of a modified AIC which uses the effective d.f. #See Verweij and Van Houwelingen (1994) Eq. (6). Here AIC=chisq-2*df. #Also try as effective d.f. equation (4) of the previous reference. #Also study performance of Shao's cross-validation technique (which was #designed to pick the "right" set of variables, and uses a much smaller #training sample than most methods). Compare cross-validated deviance #vs. penalty to the gold standard accuracy on a 7500 observation dataset. #Note that if you only want to get AIC or Schwarz Bayesian information #criterion, all you need is to invoke the pentrace function. #NOTE: the effective.df( ) function is used in practice # ## Not run: for(seed in c(339,777,22,111,3)){ # study performance for several datasets set.seed(seed) n <- 175; p <- 8 X <- matrix(rnorm(n*p), ncol=p) # p normal(0,1) predictors Coef <- c(-.1,.2,-.3,.4,-.5,.6,-.65,.7) # true population coefficients L <- X %*% Coef # intercept is zero Y <- ifelse(runif(n)<=plogis(L), 1, 0) pm <- diag(diag(var(X))) #Generate a large validation sample to use as a gold standard n.val <- 7500 X.val <- matrix(rnorm(n.val*p), ncol=p) L.val <- X.val %*% Coef Y.val <- ifelse(runif(n.val)<=plogis(L.val), 1, 0) # Penalty <- seq(0,30,by=1) reps <- length(Penalty) effective.df <- effective.df2 <- aic <- aic2 <- deviance.val <- Lpenalty <- single(reps) n.t <- round(n^.75) ncv <- c(10,20,30,40) # try various no. of reps in cross-val. deviance <- matrix(NA,nrow=reps,ncol=length(ncv)) #If model were complex, could have started things off by getting X, Y #penalty.matrix from an initial lrm fit to save time # for(i in 1:reps) { pen <- Penalty[i] cat(format(pen),"") f.full <- lrm.fit(X, Y, penalty.matrix=pen*pm) Lpenalty[i] <- pen* t(f.full$coef[-1]) %*% pm %*% f.full$coef[-1] f.full.nopenalty <- lrm.fit(X, Y, initial=f.full$coef, maxit=1) info.matrix.unpenalized <- solve(f.full.nopenalty$var) effective.df[i] <- sum(diag(info.matrix.unpenalized %*% f.full$var)) - 1 lrchisq <- f.full.nopenalty$stats["Model L.R."] # lrm does all this penalty adjustment automatically (for var, d.f., # chi-square) aic[i] <- lrchisq - 2*effective.df[i] # pred <- plogis(f.full$linear.predictors) score.matrix <- cbind(1,X) * (Y - pred) sum.u.uprime <- t(score.matrix) %*% score.matrix effective.df2[i] <- sum(diag(f.full$var %*% sum.u.uprime)) aic2[i] <- lrchisq - 2*effective.df2[i] # #Shao suggested averaging 2*n cross-validations, but let's do only 40 #and stop along the way to see if fewer is OK dev <- 0 for(j in 1:max(ncv)) { s <- sample(1:n, n.t) cof <- lrm.fit(X[s,],Y[s], penalty.matrix=pen*pm)$coef pred <- cof[1] + (X[-s,] %*% cof[-1]) dev <- dev -2*sum(Y[-s]*pred + log(1-plogis(pred))) for(k in 1:length(ncv)) if(j==ncv[k]) deviance[i,k] <- dev/j } # pred.val <- f.full$coef[1] + (X.val %*% f.full$coef[-1]) prob.val <- plogis(pred.val) deviance.val[i] <- -2*sum(Y.val*pred.val + log(1-prob.val)) } postscript(hor=TRUE) # along with graphics.off() below, allow plots par(mfrow=c(2,4)) # to be printed as they are finished plot(Penalty, effective.df, type="l") lines(Penalty, effective.df2, lty=2) plot(Penalty, Lpenalty, type="l") title("Penalty on -2 log L") plot(Penalty, aic, type="l") lines(Penalty, aic2, lty=2) for(k in 1:length(ncv)) { plot(Penalty, deviance[,k], ylab="deviance") title(paste(ncv[k],"reps")) lines(supsmu(Penalty, deviance[,k])) } plot(Penalty, deviance.val, type="l") title("Gold Standard (n=7500)") title(sub=format(seed),adj=1,cex=.5) graphics.off() } ## End(Not run) #The results showed that to obtain a clear picture of the penalty- #accuracy relationship one needs 30 or 40 reps in the cross-validation. #For 4 of 5 samples, though, the super smoother was able to detect #an accurate penalty giving the best (lowest) deviance using 10-fold #cross-validation. Cross-validation would have worked better had #the same splits been used for all penalties. #The AIC methods worked just as well and are much quicker to compute. #The first AIC based on the effective d.f. in Gray's Eq. 2.9 #(Verweij and Van Houwelingen (1994) Eq. 5 (note typo)) worked best.