For an ordinal logistic model, all of these residuals are computed using, as temporary binary responses, dichotomizations of Y along with the corresponding P, the probability that Y >= cutoff. For type="partial", all possible dichotomizations are used; for type="score", the actual components of the first derivative of the log likelihood of the ordinal model are used. Alternatively, specify type="score.binary" to use binary-model score residuals computed for all cutpoints of Y (these are plotted only, not returned). The score.binary, partial, and perhaps score residuals are useful for checking the proportional odds assumption.
If the option pl=TRUE is used to plot the score or score.binary residuals, a score residual plot is made for each column of the design (predictor) matrix, with Y cutoffs on the x-axis and the mean +/- 1.96 standard errors of the score residuals on the y-axis. You can instead use a box plot to display these residuals, for both score.binary and score. Proportional odds dictates a horizontal score.binary plot. Partial residual plots use smooth nonparametric estimates, separately for each cutoff of Y. One examines such a plot for parallelism of the curves to check the proportional odds assumption, as well as to see whether the predictor behaves linearly.
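For example, a minimal sketch of such a check, assuming the rms (formerly Design) package is attached and using a simulated 4-level ordinal response (all variable names here are illustrative):

    library(rms)                       # provides lrm and its residuals method
    set.seed(1)
    x <- rnorm(300)
    y <- cut2(x + rnorm(300), g=4)     # 4-level ordinal response (cut2 is in Hmisc)
    f <- lrm(y ~ x, x=TRUE, y=TRUE)    # x=TRUE, y=TRUE are needed for residuals
    resid(f, "score.binary", pl=TRUE)  # flat bands across Y cutoffs support prop. odds
    resid(f, "partial", pl=TRUE)       # parallel smooth curves support prop. odds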
Also computes a variety of influence statistics and the le Cessie-van Houwelingen-Copas-Hosmer unweighted sum of squares test for global goodness of fit, done separately for each cutoff of Y in the case of an ordinal model.
The plot.lrm.partial function computes partial residuals for a series of binary logistic model fits that all used the same predictors and that specified x=TRUE, y=TRUE. It then computes smoothed partial residual relationships (using lowess with iter=0) and plots them separately for each predictor, with residual plots from all model fits shown on the same plot for that predictor.
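As a sketch of that workflow (assuming y, age, and blood.pressure as simulated in the Examples section below):

    f1 <- lrm(y >= 1 ~ age + blood.pressure, x=TRUE, y=TRUE)
    f2 <- update(f1, y == 2 ~ .)   # same predictors, different dichotomization
    par(mfrow=c(2,1))
    plot.lrm.partial(f1, f2, labels=c('y>=1', 'y==2'))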
Usage:

## S3 method for class 'lrm'
residuals(object,
          type=c("ordinary", "score", "score.binary", "pearson",
                 "deviance", "pseudo.dep", "partial", "dfbeta",
                 "dfbetas", "dffit", "dffits", "hat", "gof", "lp1"),
          pl=FALSE, xlim, ylim, kint, label.curves=TRUE, which, ...)

plot.lrm.partial(..., labels, center=FALSE)
Arguments:

...: for lrm residuals, applies to type="partial" when pl is not FALSE; these are extra arguments passed to the smoothing function. Can also be used to pass extra arguments to boxplot for type="score" or "score.binary". For plot.lrm.partial this specifies the series of binary model fit objects.
type="lp1"
to get approximate leave-out-1
linear predictors, derived by subtracting the
dffit
from the original
linear predictor values.
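For instance, a sketch with f a binary lrm fit made with x=TRUE, y=TRUE and y its 0/1 response (as in the first example below):

    lp1 <- resid(f, "lp1")                  # approximate leave-out-1 linear predictors
    -2*sum(y*lp1 + log(1 - plogis(lp1)))    # approximate leave-out-1 deviance (binary y)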
type="partial"
,
"score"
, and
"score.binary"
.
For score residuals in an ordinal model, set
pl=TRUE
to get means and
approximate 0.95 confidence bars
vs. Y, separately for each X. Alternatively, specify
pl="boxplot"
to use
boxplot
to
draw the plot, with notches and with width proportional to the square
root of the cell sizes.
For partial residuals, set
pl=TRUE
(which uses
lowess
) or
pl="supsmu"
to get smoothed partial
residual plots for all columns of X using
supsmu
.
Use
pl="loess"
to use
loess
and get confidence bands (
"loess"
is not
implemented for ordinal responses). Under R,
pl="loess"
uses
lowess
and does not provide confidence bands.
If there is more than one X,
you should probably use
par(mfrow=c( , ))
before calling
resid
.
Note that
pl="loess"
results in
plot.loess
being called, which
requires a large memory allocation.
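A sketch comparing these display options, assuming an ordinal fit f with two predictors as in the Examples below:

    par(mfrow=c(2,2))                  # 2 x 2 grid of panels
    resid(f, "score", pl=TRUE)         # means +/- 1.96 SE vs. Y cutoffs
    resid(f, "score", pl="boxplot")    # notched box plots instead
    resid(f, "partial", pl=TRUE)       # lowess-smoothed partial residuals
    resid(f, "partial", pl="supsmu")   # supsmu smoothing instead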
ylim: plotting range for the y-axis. Defaults to the range of the score or score.binary residuals, or the range of all smoothed curves for partial if pl=TRUE, or the 0.1 and 0.9 quantiles of the residuals for pl="boxplot".
kint: for an ordinal model, for residuals other than partial, score, or score.binary, specifies the intercept (and the cutoff of Y) to use for the calculations. Specifying kint=2, for example, means to use Y >= 3rd level.
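A sketch, with f an ordinal fit:

    # dfbeta requires choosing a single dichotomization of an ordinal Y;
    # kint=2 uses the second intercept, i.e., the cutoff Y >= 3rd level
    d <- resid(f, "dfbeta", kint=2)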
label.curves: set to FALSE to suppress curve labels when type="partial". The default, TRUE, causes labcurve to be invoked to label curves where they are most separated. label.curves can also be a list containing the opts parameter for labcurve, to send options such as tilt to labcurve. The default for tilt here is TRUE.
type="partial","score","score.binary"
.
labels: for plot.lrm.partial, a vector of character strings providing labels for the series of binary fits. By default, the names of the fit objects are used as labels. The labcurve function is used to label the curves with the labels.
center: for plot.lrm.partial, set to TRUE to center the partial residuals for every model to have a mean of zero before smoothing and plotting.
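A sketch, with f1 and f2 the binary fits from the sketch above:

    plot.lrm.partial(f1, f2, center=TRUE)   # zero-center each model's residuals before smoothing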
Details:

For the goodness-of-fit test, the le Cessie-van Houwelingen normal test statistic for the unweighted sum of squared errors (the Brier score times n) is used. For an ordinal response variable, the test for predicting the probability that Y >= j is done separately for all j (except the first). Note that the test statistic can exhibit strange behavior (i.e., it is far too large) if the model has no predictive value.
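For example (a sketch; f is any lrm fit made with x=TRUE, y=TRUE):

    resid(f, "gof")   # for an ordinal fit, one unweighted SSE test per cutoff Y >= j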
For most of the values of type, you must have specified x=TRUE, y=TRUE to lrm.
There is not yet literature on interpreting score residual plots for the ordinal model. Simulations in which proportional odds holds have still shown a U-shaped score residual plot; the series of binary-model score residuals for all cutoffs of Y appears to provide a better check of the assumptions. See the last example.
type="partial","dfbeta","dfbetas","score"
),
test statistic (
type="gof"
), or a vector otherwise.
For partial residuals from an ordinal
model, the returned object is a 3-way array (rows of X by columns
of X by cutoffs of Y), and NAs deleted during the fit
are not re-inserted into the residuals. For
score.binary
, nothing
is returned.
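For instance, with the ordinal fit f from the Examples below:

    r <- resid(f, "partial")
    dim(r)   # observations x columns of X x cutoffs of Y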
Author:

Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu
References:

Landwehr JM, Pregibon D, Shoemaker AC. Graphical methods for assessing logistic regression models (with discussion). JASA 79:61–83, 1984.

le Cessie S, van Houwelingen JC. A goodness-of-fit test for binary regression models, based on smoothing methods. Biometrics 47:1267–1282, 1991.

Hosmer DW, Hosmer T, le Cessie S, Lemeshow S. A comparison of goodness-of-fit tests for the logistic regression model. Stat in Med 16:965–980, 1997.

Copas JB. Unweighted sum of squares test for proportion. Applied Statistics 38:71–80, 1989.
Examples:

set.seed(1)
x1 <- runif(200, -1, 1)
x2 <- runif(200, -1, 1)
L  <- x1^2 - .5 + x2
y  <- ifelse(runif(200) <= plogis(L), 1, 0)
f  <- lrm(y ~ x1 + x2, x=TRUE, y=TRUE)
resid(f)                   # add rows for NAs back to data
resid(f, "score")          # also adds back rows
r <- resid(f, "partial")   # for checking transformations of X's
par(mfrow=c(1,2))
for(i in 1:2) {
  xx <- if(i==1) x1 else x2
  if(.R.) {   # .R. was the Design package's flag: TRUE under R, FALSE under S-Plus
    plot(xx, r[,i], xlab=c('x1','x2')[i])
    lines(lowess(xx, r[,i]))
  } else {
    g <- loess(r[,i] ~ xx)
    plot(g, coverage=0.95, confidence=7)
    points(xx, r[,i])
  }
}
resid(f, "partial", pl="loess")   # same as last 3 lines
resid(f, "partial", pl=TRUE)      # plots for all columns of X using lowess
resid(f, "gof")                   # global test of goodness of fit
lp1 <- resid(f, "lp1")            # approx. leave-out-1 linear predictors
-2*sum(y*lp1 + log(1-plogis(lp1)))   # approx. leave-out-1 deviance
                                     # (formula assumes y is binary)

# Simulate data from a population proportional odds model
set.seed(1)
n <- 400
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
L   <- .05*(age-50) + .03*(blood.pressure-120)
p12 <- plogis(L)     # Pr(Y >= 1)
p2  <- plogis(L-1)   # Pr(Y = 2)
p   <- cbind(1-p12, p12-p2, p2)   # individual class probabilities
# Cumulative probabilities:
cp  <- matrix(cumsum(t(p)) - rep(0:(n-1), rep(3,n)), byrow=TRUE, ncol=3)
# Simulate a multinomial response with varying probabilities:
y <- (cp < runif(n)) %*% rep(1,3)
# Thanks to Dave Krantz for this trick
f <- lrm(y ~ age + blood.pressure, x=TRUE, y=TRUE)
par(mfrow=c(2,2))
resid(f, 'score.binary', pl=TRUE)   # plot score residuals
resid(f, 'partial', pl=TRUE)        # plot partial residuals
resid(f, 'gof')                     # test GOF for each level separately

# Make a series of binary fits and draw 2 partial residual plots
f1 <- lrm(y>=1 ~ age + blood.pressure, x=TRUE, y=TRUE)
f2 <- update(f1, y==2 ~ .)
par(mfrow=c(2,1))
plot.lrm.partial(f1, f2)

# Simulate data from both a proportional odds and a non-proportional
# odds population model.  Check how 3 kinds of residuals detect
# non-prop. odds
set.seed(71)
n <- 400
x <- rnorm(n)
par(mfrow=c(2,3))
for(j in 1:2) {   # 1: prop. odds   2: non-prop. odds
  if(j==1)
    L <- matrix(c(1.4,.4,-.1,-.5,-.9), nrow=n, ncol=5, byrow=TRUE) + x/2
  else {
    # Slopes and intercepts for cutoffs of 1:5
    slopes <- c(.7,.5,.3,.3,0)
    ints   <- c(2.5,1.2,0,-1.2,-2.5)
    L <- matrix(ints,   nrow=n, ncol=5, byrow=TRUE) +
         matrix(slopes, nrow=n, ncol=5, byrow=TRUE)*x
  }
  p <- plogis(L)
  if(!.R.) dim(p) <- dim(L)
  # Cell probabilities
  p <- cbind(1-p[,1], p[,1]-p[,2], p[,2]-p[,3], p[,3]-p[,4],
             p[,4]-p[,5], p[,5])
  # Cumulative probabilities from left to right
  cp <- matrix(cumsum(t(p)) - rep(0:(n-1), rep(6,n)), byrow=TRUE, ncol=6)
  y <- (cp < runif(n)) %*% rep(1,6)
  f <- lrm(y ~ x, x=TRUE, y=TRUE)
  for(cutoff in 1:5) print(lrm(y>=cutoff ~ x)$coef)
  print(resid(f, 'gof'))
  resid(f, 'score', pl=TRUE)
  # Note that full ordinal model score residuals exhibit a
  # U-shaped pattern even under prop. odds
  ti <- if(j==2) 'Non-Proportional Odds\nSlopes=.7 .5 .3 .3 0' else
        'True Proportional Odds\nOrdinal Model Score Residuals'
  title(ti)
  resid(f, 'score.binary', pl=TRUE)
  if(j==1) ti <- 'True Proportional Odds\nBinary Score Residuals'
  title(ti)
  resid(f, 'partial', pl=TRUE)
  if(j==1) ti <- 'True Proportional Odds\nPartial Residuals'
  title(ti)
}
par(mfrow=c(1,1))

# Get data used in the Hosmer et al. paper and reproduce their calculations
if(FALSE && .R.) {
  v <- Cs(id, low, age, lwt, race, smoke, ptl, ht, ui, ftv, bwt)
  d <- read.table("http://www-unix.oit.umass.edu/~statdata/data/lowbwt.dat",
                  skip=6, col.names=v)
  d <- upData(d, race=factor(race, 1:3, c('white','black','other')))
  f <- lrm(low ~ age + lwt + race + smoke, data=d, x=TRUE, y=TRUE)
  f
  resid(f, 'gof')
  # Their Table 7 Line 2 found sum of squared errors=36.91, expected
  # value under H0=36.45, variance=.065, P=.071
  # We got 36.90, 36.45, SD=.26055 (var=.068), P=.085
  # Note that two logistic regression coefficients differed a bit
  # from their Table 1
}