plot.Design
,
summary.Design
,
survplot
, and
nomogram.Design
.
If
datadist
is called before
a model fit and the resulting object pointed to with
options(datadist="name")
,
the data characteristics will be stored with the fit by
Design()
, so
that later predictions and summaries of the fit will not need to access
the original data used in the fit. Alternatively, you can specify the
values for each variable in the model when using these functions, or
specify the values of some of them and let the functions look up the
remainder (for example, the adjustment levels) from an object created by
datadist
.
The best method is probably to run
datadist
once before any models are
fitted, storing the distribution summaries for all potential variables.
Adjustment values are
0
for binary variables, the most frequent
category (or optionally the first category level)
for categorical (
factor
) variables, the middle level for
ordered factor
variables, and medians for continuous variables.
See descriptions of
q.display
and
q.effect
for how display and
effect ranges are chosen for continuous variables.
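The adjustment-value rules above can be sketched in base R. This is a minimal illustration, not the code used by datadist: adjust_to() is a hypothetical helper, the 0/1 test for binary variables is an assumption, and so is the choice of the lower middle level when an ordered factor has an even number of levels.

```r
# Illustrative only: mimics the adjust-to rules described above in base R.
# adjust_to() is a hypothetical helper, not part of Design.
adjust_to <- function(x, adjto.cat = c("mode", "first")) {
  adjto.cat <- match.arg(adjto.cat)
  x <- x[!is.na(x)]
  if (is.ordered(x)) {                       # check before is.factor(): ordered factors are factors too
    lev <- levels(x)
    return(lev[ceiling(length(lev) / 2)])    # middle level (assumed: lower middle if even count)
  }
  if (is.factor(x)) {
    if (adjto.cat == "first") return(levels(x)[1])
    return(names(which.max(table(x))))       # most frequent level; first such level on ties
  }
  if (all(x %in% c(0, 1))) return(0)         # binary variable (assumed to be coded 0/1)
  median(x)                                  # continuous variable
}

sex    <- factor(c("male", "female", "male", "male", "female"))
grade  <- factor(c("low", "mid", "high", "mid"),
                 levels = c("low", "mid", "high"), ordered = TRUE)
smoker <- c(0, 1, 1, 0, 1)
age    <- c(31, 45, 52, 60, 38)

adjust_to(sex)            # "male"   (modal category)
adjust_to(sex, "first")   # "female" (first level)
adjust_to(grade)          # "mid"
adjust_to(smoker)         # 0
adjust_to(age)            # 45
```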
datadist(..., data, q.display, q.effect=c(0.25, 0.75),
         adjto.cat=c('mode','first'), n.unique=10)

## S3 method for class 'datadist':
print(x, ...)

# options(datadist="dd")
# used by summary, plot, survplot, sometimes predict
# For dd substitute the name of the result of datadist
Design
information. The first element in this list may
also be an object created by an earlier call to
datadist
; then
the later variables are added to this
datadist
object.
For a fit object, the variables named
in the fit are retrieved from the active data frame or from the location
pointed to by
data=frame number
or
data="data frame name"
.
For
print
, this argument is ignored.
If
data
is a search position,
it is assumed that a data frame is attached in that position, and all
its variables are used. If you specify both individual variables in
...
and
data
, the two sets of variables are combined. Unless the
first argument is a fit object,
data
must be an integer.
q.display
,
those quantiles are used whether or not n<200.
"mode"
, indicating that the modal (most frequent) category
for categorical (factor) variables is the adjust-to setting.
Specify
"first"
to use the first level of factor variables as the
adjustment values. If several levels tie for the maximum
frequency, the first such level is used for
"mode"
.
n.unique
or fewer unique values are considered
to be discrete variables in that their unique values are stored in the
values
list. This will affect how functions such as
nomogram.Design
determine whether variables are discrete or not.
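As a rough illustration of the n.unique rule, the base R sketch below keeps the sorted unique values of a variable only when there are n.unique or fewer of them. The helper discrete_values() is hypothetical and is not how datadist itself is implemented; dropping missing values before counting is also an assumption.

```r
# Hypothetical helper illustrating the n.unique rule; not Design's own code.
discrete_values <- function(x, n.unique = 10) {
  u <- sort(unique(x[!is.na(x)]))          # assumed: NAs are not counted as a value
  if (length(u) <= n.unique) u else NULL   # NULL here means "treat as continuous"
}

discrete_values(c(3, 1, 2, 2, 3, NA))   # 1 2 3
discrete_values(seq(0, 1, by = 0.01))   # NULL (101 unique values)
```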
datadist
For categorical variables, the 7 limits are set to character strings
(factors) which correspond to
c(NA,adjto.level,NA,1,k,1,k)
, where
k
is the number of levels.
For ordered variables with numeric levels, the limits are set to
c(L,M,H,L,H,L,H)
, where
L
is the lowest level,
M
is the middle
level, and
H
is the highest level.
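The two layouts of the 7 limits can be written out in base R as follows. This is only a sketch of the patterns stated above: both helper functions are hypothetical, and the rule for picking the middle level when the number of levels is even is an assumption.

```r
# Hypothetical helpers, for illustration only; not the Design implementation.

# Categorical variable: level names in positions (NA, adjto.level, NA, 1, k, 1, k).
categorical_limits <- function(x, adjto.level) {
  lev <- levels(x)
  k <- length(lev)
  c(NA, adjto.level, NA, lev[1], lev[k], lev[1], lev[k])
}

# Ordered variable with numeric levels: c(L, M, H, L, H, L, H).
ordered_limits <- function(levels_numeric) {
  lev <- sort(unique(levels_numeric))
  L <- lev[1]
  M <- lev[ceiling(length(lev) / 2)]   # assumed choice when the number of levels is even
  H <- lev[length(lev)]
  c(L, M, H, L, H, L, H)
}

categorical_limits(factor(c("a", "b", "c", "b")), "b")   # NA "b" NA "a" "c" "a" "c"
ordered_limits(c(1, 2, 3, 4, 5))                          # 1 3 5 1 5 1 5
```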
"datadist"
with the following components
n.unique
unique values
Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu
## Not run:
d <- datadist(data=1)         # use all variables in search pos. 1
d <- datadist(x1, x2, x3)
page(d)                       # if your options(pager) leaves up a pop-up
                              # window, this is a useful guide in analyses
d <- datadist(data=2)         # all variables in search pos. 2
d <- datadist(data=my.data.frame)
d <- datadist(my.data.frame)  # same as previous. Run for all potential vars.
d <- datadist(x2, x3, data=my.data.frame)   # combine variables
d <- datadist(x2, x3, q.effect=c(.1,.9), q.display=c(0,1))
# uses inter-decile range odds ratios,
# total range of variables for regression function plots
d <- datadist(d, z)           # add a new variable to an existing datadist
options(datadist="d")         # often a good idea, to store info with fit
f <- ols(y ~ x1*x2*x3)

options(datadist=NULL)        # default at start of session
f <- ols(y ~ x1*x2)
d <- datadist(f)              # info not stored in `f'
d$limits["Adjust to","x1"] <- .5   # reset adjustment level to .5
options(datadist="d")

f <- lrm(y ~ x1*x2, data=mydata)
d <- datadist(f, data=mydata)
options(datadist="d")

f <- lrm(y ~ x1*x2)           # datadist not used - specify all values for
summary(f, x1=c(200,500,800), x2=c(1,3,5))   # obtaining predictions
plot(f, x1=200:800, x2=3)

# Change reference value to get a relative odds plot for a logistic model
d$limits$age[2] <- 30    # make 30 the reference value for age
# Could also do: d$limits["Adjust to","age"] <- 30
fit <- update(fit)       # make new reference value take effect
plot(fit, age=NA, ref.zero=TRUE, fun=exp, ylab='Age=x:Age=30 Odds Ratio')
## End(Not run)