perimeter
is a function used to generate the boundary of data to plot
when a 3-d plot is made. It finds the area where there are sufficient
data to generate believable interaction fits.
Legend
is a generic function for adding legends to an existing graph
according to the specific plot made by
plot.Design
. The specific
Legend
method for
plot.Design
is
Legend.plot.Design
. It handles
legends for
image
plots. For other plots with one or more curves,
make legends using the
label.curves
parameter.
datadensity
is a function for showing the data density (raw data) on
each curve generated for curve-type plots. This is a rug plot showing
the location/density of data values for the x-axis variable. If
there was a second variable specified to
plot
that generated separate
curves, the data density specific to each class of points is shown.
This assumes that the second variable was categorical. The rug plots
are drawn by
scat1d
.
To plot effects instead of estimates (e.g., treatment differences as a
function of interacting factors) see
contrast.Design
and
summary.Design
.
perimeter(x, y, xinc=diff(range(x))/10, n=10, lowess.=TRUE) ## S3 method for class 'Design': plot(x, ..., xlim, ylim, fun, xlab, ylab, conf.int=.95, conf.type=c('mean','individual'), add=FALSE, label.curves=TRUE, eye, theta=0, phi=15, perspArgs=NULL, lty, col=1, lwd=par('lwd'), lwd.conf=1, pch=1, adj.zero=FALSE, ref.zero=FALSE, adj.subtitle, cex.adj, non.slopes, time=NULL, loglog=FALSE, val.lev=FALSE, digits=4, log="", perim, method=c("persp","contour","image","dotchart","default"), sortdot=c('neither','ascending','descending'), nlevels=10, name, zlim=range(zmat,na.rm=TRUE), vnames=c('labels','names'), abbrev=FALSE) # Just say plot(fit, ...) ## S3 method for class 'plot.Design': print(x, ...) ## S3 method for class 'perimeter': lines(x, ...) Legend(object, ...) ## S3 method for class 'plot.Design': Legend(object, x, y, size=c(1,1), horizontal=TRUE, nint=50, fun, at, zlab, ...) ## S3 method for class 'plot.Design': datadensity(object, x1, x2, ...)
Design()
in effect.
options(datadist="d")
must have been specified (where
d
was created by
datadist
), or
it must have been in effect with
fit
was created.
x=NA
to use the
default display range, or any range you choose (e.g.
seq(0,100,by=2),c(2,3,7,14)
).
The default list of values for which predictions are made
is taken as the list of unique values of the variable if they number fewer
than 11. For variables with >10 unique values, 100 equally spaced
values in the range are used for plotting if the range is not specified.
If there is a second variable listed, and its range is
NA
or a single value,
that variable is displayed on the y-axis. If the second variable's range
has fewer than 40 levels, separate curves are generated for each value
of the variable. Otherwise, a three dimensional perspective plot is drawn
using 40 equally-spaced values of
y
. Names may be abbreviated.
plot
senses that a variable is not to be displayed by checking if the list of values for
the variable is a scalar instead of a vector. Variables not specified are set to the default
adjustment value
limits[2]
, i.e. the median for continuous variables and a reference category for
non-continuous ones. Due to a bug in S, the first variable mentioned
may not be named
x
. This would cause the general scatterplot function
plot
to be invoked by mistake.
Variables after the first or second specified to
plot
define adjustment settings.
For categorical variables, specify the class labels in quotes when specifying variable values. If the levels of a categorical variable are numeric,
you may omit the quotes. For variables not described using
datadist
,
you must specify explicit ranges and adjustment settings for predictors
that were in the model. Note that you can omit
variables
entirely. In
that case, all non-interaction effects will be plotted automatically as
if you said
plot(fit, age=NA); plot(fit, sex=NA); ...
. In this case
you have no control over the settings of the variables for the x-axis,
i.e.,
NA
is always assumed.
For a plot made up of multiple curves, these are extra graphical arguments
will be passed to
key
from
Legend
. For
image
plots, these
arguments are passed to
par
and have temporary effect.
For
datadensity
these extra arguments are passed along to
scat1d
.
perim
. For
Legend
, is either a vector of 1 or 2
x-coordinates or a list with elements
x
and
y
each
with 1 or 2 coordinates. For
method="image"
plots, 1 or 2
coordinates may be given, and for other plot types, 1 coordinate is
given. A single coordinate represents the upper left corner of the
legend box. For
Legend
,
x
and
y
are optional. If
omitted,
locator
is used to position legends with the mouse.
For
lines.perimeter
,
x
is the result of
perimeter
.
For
print.plot.Design
,
x
is the result of
plot.Design
.
perim
, or y-coordinates for
Legend
. If omitted,
x
is assumed to be a list with both
x
and
y
components.
x
over which to examine the density of
y
in
perimeter
x
for
perimeter
, takes the informative range of
y
to be
the nth smallest to the nth largest values of
y
. If there aren't
at least 2n
y
values in the
x
interval, no
y
ranges are used
for that interval.
FALSE
to not have
lowess
smooth the data perimeters
variables
specifications. One reason to use
xlim
is to plot a
factor
variable on the x-axis that was created with the
cut2
function
with the
levels.mean
option, with
val.lev=TRUE
specified to
plot.Design
.
In this case you may want the axis to
have the range of the original variable values given to
cut2
rather
than the range of the means within quantile groups.
fun=function(x)1/(1+exp(-x))
or
fun=plogis
,
and to take the anti-log, specify
fun=exp
.
for
Legend
,
fun
is
a function for transforming tick mark labels for color or gray scale
legends for
method="image"
. For example, if
plot.Design
is used
to make an image plot of log odds ratios, specifying
fun=plogis
will
cause the color legend to be labeled with probability values rather
than log odds.
x
-axis. Default is one given to
asis, rcs
, etc., which may have been
the
"label"
attribute of the variable.
y
-axis (
z
-axis if perspective plot). If
fun
is not given,
default is
"log Odds"
for
lrm
,
"log Relative Hazard"
for
cph
, name of the response
variable for
ols
,
TRUE
or
log(TRUE)
for
psm
, or
"X * Beta"
otherwise.
If
fun
is given, the default is
""
.
If
time
is given, the default is
"(time) (units) Survival Probability"
or
"log[-log S(time)]"
depending on the
loglog
parameter.
.95
. Specify
FALSE
to suppress confidence bands.
ols
fits there is the option of obtaining confidence limits for
individual predicted values by specifying
conf.type="individual"
.
TRUE
to add to an existing plot without drawing new axes. Default is
FALSE
.
See the warning note under
sortdot
.
FALSE
to suppress labeling of separate curves.
Default is
TRUE
, which
causes
labcurve
to be invoked to place labels at positions where the
curves are most separated, labeling each curve with the full curve label.
Set
label.curves
to a
list
to specify options to
labcurve
, e.g.,
label.curves=
list(method="arrow", cex=.8)
.
These option names may be abbreviated in the usual way arguments
are abbreviated. Use for example
label.curves=list(keys=letters[1:5])
to draw single lower case letters on 5 curves where they are most
separated, and automatically position a legend
in the most empty part of the plot. The
col
,
lty
, and
lwd
parameters
are passed automatically to
labcurve
although they may be overridden
here.
persp
function for defining perspective in 3-d plots.
Default is
(-6, -6, 9)
. This is for S-PLUS only.
persp
par("lwd")
.
method="dotchart"
. Default is 1
(open circle). See
points
for other values, or use the
show.pch
function in Hmisc.
1
(black). Specify
a vector of integers to assign different colors to different curves.
FALSE
to suppress subtitling the graph with the list of settings of non-graphed adjustment values. Default is
TRUE
if
<= 6
non-plotted factors.
cex
parameter for size of adjustment settings in subtitles. Default is
0.75 times
par("cex")
.
TRUE
to adjust all non-plotted variables to 0 (or reference cell for
categorical variables) and to omit intercept(s) from consideration. Default
is
FALSE
.
x
-variable yields
y=0
. This is
done before applying function
fun
.
non.slopes=c(0,1,0)
. The default is
non.slopes=rep(0,k)
if
adj.zero=TRUE
, where
k
is the number of intercepts in the model.
If
adj.zero=FALSE
, the default is
(1,0,0,...,0)
.
u
to cause function
survest
to be invoked
to plot the probability of surviving until time
u
when the fit
is from
cph
or
psm
.
loglog=TRUE
to plot
log[-log(survival)]
instead of survival,
when
time
is given.
TRUE
to use these values in
plotting. An ordinary axis with uniform spacing will be used rather than
spacing dictated by the value labels. When
val.lev=FALSE
, category
labels dictate how axis tick marks are made.
val.lev
is used
typically when the variable being plotted is a categorical variable
that was collapsed into intervals, with the value label for a category
representing interval means or midpoints. Such variables are created
for example by the
cut2
function, specifying
levels.mean=TRUE
. For
plotting a discrete numeric variable you can specify
val.lev=TRUE
to
force plotting of the variable as if it were continuous.
log="x", "y"
or
"xy"
to plot log scales on one or both axes.
perimeter
when used for 3-d plots of
two continuous predictors. When the combination of variables is outside
the range in
perim
, that section of the plot is suppressed. If
perim
is omitted, 3-d plotting will use the marginal distributions of the
two predictors to determine the plotting region, when the grid is
not specified explicitly in
variables
. When instead a series of
curves is being plotted,
perim
specifies a function having two
arguments. The first is the vector of values of the first variable that
is about to be plotted on the x-axis. The second argument is the single
value of the variable representing different curves, for the current
curve being plotted. The function's returned value must be a logical
vector whose length is the same as that of the first argument, with
values
TRUE
if the corresponding point should be plotted for the
current curve,
FALSE
otherwise. See one of the latter examples.
method="persp"
for perspective plots (
persp()
,
the default),
method="contour"
to use
contour()
, or
method="image"
to use
image()
. Specify
method="dotchart"
to make a horizontal dot
chart to represent predicted values associated with categorical predictors.
The
log
argument does not apply to these plot types.
You can specify
method="default"
to get the default behaviour.
For
"dotchart"
, the
dotchart2
function in the Hmisc library is used.
method="dotchart"
. The default is to plot the points in
the order requested for predictions. Specify
method="ascending"
or
an abbreviation such as
method="a"
to sort in ascending order before
plotting the dot chart. You may also specify
method="descending"
.
Unless
method="neither"
, specifying
add=TRUE
may not work properly.
method="contour"
variables
list, you can specify one or more variables to plot by
specifying a vector of character string variable names in the
name
argument. Using this mode you cannot specify a list of
variable values to use; plotting is done as if you had said e.g.
age=NA
. Also, interacting factors can only be set to their reference values
using this notation.
"label"
attributes) to instead use variable names, specify
vnames="names"
.
plot.Design
TRUE
to use the
abbreviate
function to abbreviate levels of
categorical factors for labeling tick marks on the x-axis.
image.legend
fun
is specified to
Legend
,
at
may be given.
at
is a vector
of values at which to evaluate
fun
for drawing tick marks in the
color legend. For example, if you want to show the median survival time
for a log-normal survival model whereas the linear predictor (log median)
was used in constructing the image plot, and if you want to place tick
marks at nice median values, specify
fun=exp, at=log(c(1,10,100,1000))
.
image
color axis legend. Default is from the model
(e.g.,
"Log Odds"
), but
zlab
will often be specified if
fun
was specified to
plot.Design
or
Legend
.
plot
(
x
-axis variable)
plot
if it was not constant
(curve-generating variable)
When there are no intercepts in the fitted model, plot subtracts adjustment values from each factor while computing variances for confidence limits.
perimeter
is a kind of generalization of
datadist
for 2 continuous
variables. First, the
n
smallest and largest
x
values are determined.
These form the lowest and highest possible
x
s to display. Then
x
is grouped into intervals bounded by these two numbers, with the interval
widths defined by
xinc
. Within each interval,
y
is sorted and the
nth smallest and largest
y
are taken as the interval containing
sufficient data density to plot interaction surfaces. The interval
is ignored when there are insufficient
y
values. When
plot.Design
readies the data for
persp
, it uses the
approx
function to do
linear interpolation of the
y
-boundaries as a function of the
x
values actually used in forming the grid (the values of the
first variable specified to
plot
). To make the perimeter smooth,
specify
lowess.=TRUE
to
perimeter
.
Specifying
time
will not work for Cox models with time-dependent
covariables. Use
survest
or
survfit
for that purpose.
Use
ps.slide
,
win.slide
,
gs.slide
to set up nice defaults for
plotting. These also set a system option
mgp.axis.labels
to allow x
and y-axes to have differing
mgp
graphical parameters (see
par
).
This is important when labels for y-axis tick marks are to be written
horizontally (
par(las=1)
), as a larger gap between the labels and
the tick marks are needed. You can set the axis-specific 2nd
components of
mgp
using
mgp.axis.labels(c(xvalue,yvalue))
.
Note that because the generic
plot
method has the variable
x
as its first argument, you cannot explicitly specify that you
want to plot the effect of a predictor named
x
.
perimeter
returns a matrix of class
perimeter
. This outline can be
conveniently plotted by
lines.perimeter
.
Legend.plot.Design
invisibly returns the position of the legend.
plot.Design
invisibly returns an invisible object of class
"plot.Design"
with the following components (use
print.plot.Design
to get a nice printout of the object):
x
-values. If a
second variable was plotted, second column is sequence of
y
-values.
Next column is estimated X beta, followed by a column of lower confidence
limits and upper confidence limits. If fun was specified, these last three
columns are transformed by the specified function.
"sex=male age=50"
containing settings of
non-plotted factors.
y
-variable if it determined different
curves on the plot. E.g.,
c("female","male")
or
c("10","20","30")
.
This vector is useful as an argument to the S
key
function if
label.curves=FALSE
.
"curves"
or
"3d"
plot.Design
call
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
n <- 1000 # define sample size set.seed(17) # so can reproduce the results age <- rnorm(n, 50, 10) blood.pressure <- rnorm(n, 120, 15) cholesterol <- rnorm(n, 200, 25) sex <- factor(sample(c('female','male'), n,TRUE)) label(age) <- 'Age' # label is in Hmisc label(cholesterol) <- 'Total Cholesterol' label(blood.pressure) <- 'Systolic Blood Pressure' label(sex) <- 'Sex' units(cholesterol) <- 'mg/dl' # uses units.default in Hmisc units(blood.pressure) <- 'mmHg' # Specify population model for log odds that Y=1 L <- .4*(sex=='male') + .045*(age-50) + (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male')) # Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)] y <- ifelse(runif(n) < plogis(L), 1, 0) ddist <- datadist(age, blood.pressure, cholesterol, sex) options(datadist='ddist') fit <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)), x=TRUE, y=TRUE) par(mfrow=c(2,2)) plot(fit) # Plot effects of all 4 predictors par(mfrow=c(1,2)) plot(fit, name=c('age','cholesterol')) # Make 2 plots par(mfrow=c(1,1)) plot(fit, age=seq(20,80,length=100), sex=NA, conf.int=FALSE) # Plot relationship between age and log # odds, separate curve for each sex, # no C.I. z <- plot(fit, age=NA, sex=NA, label.curves=FALSE) # use label.curves=list(keys=c('a','b'))' # to use 1-letter abbreviations datadensity(z, age, sex) # rug plots (1-dimensional scatterplots) # on each treatment curve, with treatment- # specific density of age plot(fit, age=seq(20,80,length=100), sex='male') # works if datadist not used plot(fit, age=NA, cholesterol=NA)# 3-dimensional perspective plot for age, # cholesterol, and log odds using default # ranges for both variables boundaries <- perimeter(age, cholesterol, lowess=TRUE) plot(age, cholesterol) # show bivariate data density lines(boundaries) # and perimeter that will be used for 3-D plot z <- plot(fit, age=NA, cholesterol=NA, perim=boundaries, method='image') # draws image() plot # don't show estimates where data are sparse # doesn't make sense here since vars don't interact if(!.R.)Legend(z, fun=plogis, at=qlogis(c(.01,.05,.1,.2,.3,.4,.5)), zlab='Probability') # gray scale or color legend for prob. plot(fit, age=NA, fun=function(x) 1/(1+exp(-x)) , # or fun=plogis ylab="Prob", conf.int=.9) # Plot estimated probabilities instead of # log odds # Plot the age effect as an odds ratio # comparing the age shown on the x-axis to age=30 years ddist$limits$age[2] <- 30 # make 30 the reference value for age # Could also do: ddist$limits["Adjust to","age"] <- 30 fit <- update(fit) # make new reference value take effect plot(fit, age=NA, ref.zero=TRUE, fun=exp, ylab='Age=x:Age=30 Odds Ratio') abline(h=1, lty=2, col=2); abline(v=30, lty=2, col=2) # Make two curves, and plot the predicted curves as two trellis panels w <- plot(fit, age=NA, sex=NA) # Would be nice if a pl=FALSE option was avail. z <- data.frame(w$x.xbeta) # Makes variable names legal if(.R.) library(lattice) xyplot(log.odds ~ age | sex, data=z, type='l') # To add confidence bands we need to use the Hmisc xYplot function in # place of xyplot xYplot(Cbind(log.odds,lower,upper) ~ age | sex, data=z, method='bands', type='l') # If non-displayed variables were in the model, add a subtitle to show # their settings using title(sub=paste('Adjusted to',w$adjust),adj=0) # See predict.Design for an example using predict and xYplot without plot() # Plots for a parametric survival model n <- 1000 set.seed(731) age <- 50 + 12*rnorm(n) label(age) <- "Age" sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4))) cens <- 15*runif(n) h <- .02*exp(.04*(age-50)+.8*(sex=='Female')) t <- -log(runif(n))/h label(t) <- 'Follow-up Time' e <- ifelse(t<=cens,1,0) t <- pmin(t, cens) units(t) <- "Year" ddist <- datadist(age, sex) Srv <- Surv(t,e) # Fit log-normal survival model and plot median survival time vs. age f <- psm(Surv(t, e) ~ rcs(age), dist=if(.R.)'lognormal' else 'gaussian') med <- Quantile(f) # Creates function to compute quantiles # (median by default) plot(f, age=NA, fun=function(x)med(lp=x), ylab="Median Survival Time") # Note: This works because med() expects the linear predictor (X*beta) # as an argument. Would not work if use # plot(..., ref.zero=TRUE or adj.zero=TRUE) # Also, confidence intervals from this method are approximate since # they don't take into account estimation of scale parameter # Fit an ols model to log(y) and plot the relationship between x1 # and the predicted mean(y) on the original scale without assuming # normality of residuals; use the smearing estimator set.seed(1) x1 <- runif(300) x2 <- runif(300) ddist <- datadist(x1,x2) y <- exp(x1+x2-1+rnorm(300)) f <- ols(log(y) ~ pol(x1,2)+x2) r <- resid(f) smean <- function(yhat)smearingEst(yhat, exp, res, statistic='mean') formals(smean) <- list(yhat=numeric(0), res=r[!is.na(r)]) #smean$res <- r[!is.na(r)] # define default res argument to function plot(f, x1=NA, fun=smean, ylab='Predicted Mean on y-scale') options(datadist=NULL) ## Not run: # Example in which separate curves are shown for 4 income values # For each curve the estimated percentage of voters voting for # the democratic party is plotted against the percent of voters # who graduated from college. scat1d is used to indicate # the income-interval-specific data density for college. For # this purpose show the distribution of percent in college for # those having an income level within +/- the half-width of # the income interval. scat1d shows the rug plot superimposed # on the estimated curve. Data are county-level percents. # This can't be done automatically using datadensity on the object # returned by plot.Design, as the variable representing different # curves (income) is a continuous variable. incomes <- seq(22900, 32800, length=4) # equally spaced to outer quintiles pl <- plot(f, college=NA, income=incomes, conf.int=FALSE, xlim=c(0,35), ylim=c(30,55), lty=1, lwd=c(.25,1.5,3.5,6), col=c(1,1,2,2)) graph.points <- pl$x.xbeta for(i in 1:4) { college.in.income.group <- college[abs(income-incomes[i]) < 1650] this.income <- graph.points[,'income']==incomes[i] scat1d(college.in.income.group, curve=list(x=graph.points[this.income,'college'], y=graph.points[this.income,'democrat'])) } # Instead of showing a rug plot on each curve, erase end portions # of each curve where there are fewer than 10 counties having # % college graduates to the left of the x-coordinate being plotted, # for the subset of counties having median family income with 1650 # of the target income for the curve show.pts <- function(college.pts, income.pt) { s <- abs(income - income.pt) < 1650 #assumes income known to top frame x <- college[s] x <- sort(x[!is.na(x)]) n <- length(x) low <- x[10]; high <- x[n-9] college.pts >= low & college.pts <= high } plot(f, college=NA, income=incomes, conf.int=FALSE, xlim=c(0,35), ylim=c(30,55), lty=1, lwd=c(.25,1.5,3.5,6), col=c(1,1,2,2), perim=show.pts) ## End(Not run)