A cohort variable is used to define the current qualifying condition for a cohort of subjects, e.g., y >= 2.
cr.setup creates the needed auxiliary variables.
See predab.resample and validate.lrm for information about validating CR models (e.g., using the bootstrap to sample with replacement from the original subjects instead of the records used in the fit, or validating the model separately for user-specified values of cohort).
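To make the expansion concrete, here is a minimal base-R sketch of the records a forward CR fit needs. The function cr_expand_sketch is hypothetical and is not the rms implementation of cr.setup. It assumes the sorted factor levels of y are in ordinal order, and it always labels cohorts with the fixed name y. It encodes the usual CR construction: a subject whose response is in category j (coded from 0) gets one record for each cutoff 0, ..., min(j, k - 1), and the binary response is 1 only at cutoff j.

```r
# Hypothetical helper, NOT part of rms: a base-R sketch of the record
# expansion used to fit a forward continuation ratio (CR) model.
cr_expand_sketch <- function(y) {
  y <- factor(y)                  # sorted levels, assumed to be ordinal
  k <- nlevels(y) - 1             # number of binary cutoffs (cohorts)
  if (k < 1) stop("y must have at least 2 levels")
  j <- as.integer(y) - 1          # ordinal codes 0..k; NA stays NA
  # A subject with code j is at risk at cutoffs 0..min(j, k - 1);
  # a missing response keeps a single (missing) record
  reps <- ifelse(is.na(j), 1, pmin(j, k - 1) + 1)
  subs <- rep(seq_along(j), reps)
  cutoff <- unlist(lapply(j, function(v) if (is.na(v)) NA else 0:min(v, k - 1)))
  # Binary response: 1 only at the cutoff equal to the subject's code
  yy <- as.integer(rep(j, reps) == cutoff)
  labels <- c("all", if (k > 1) paste0("y>=", levels(y)[2:k]))
  cohort <- factor(cutoff, levels = 0:(k - 1), labels = labels)
  list(y = yy, cohort = cohort, subs = subs, reps = reps)
}

# Same input as the first example on this page
u <- cr_expand_sketch(c(NA, 10, 21, 32, 32))
u$reps
data.frame(subs = u$subs, y = u$y, cohort = u$cohort)
```

Subjects in the highest category are replicated once per cutoff with the binary response always 0, which is why the expanded vectors are usually longer than the original response.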
cr.setup(y)
y: a character, numeric, category, or factor vector containing values of the response variable. For category or factor variables, the levels of the variable are assumed to be listed in ordinal order, lowest first.
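When y is character rather than a factor, the category order depends on how it is converted to a factor. The short base-R illustration below assumes conversion with factor(), which sorts character levels alphabetically and can therefore scramble an ordinal scale. Supplying the levels explicitly avoids this. The pain values are made up.

```r
# Made-up character responses on an ordinal pain scale
pain <- c("none", "mild", "severe", "mild", "none")
# factor() sorts character levels alphabetically: "mild" "none" "severe",
# which is not the clinical ordering
levels(factor(pain))
# Supplying the levels explicitly puts them in ordinal order, lowest first
pain.f <- factor(pain, levels = c("none", "mild", "severe"))
levels(pain.f)
as.integer(pain.f) - 1   # ordinal codes 0..2
```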
cr.setup returns a list with components y, cohort, subs, and reps.
y is a new binary variable that is to be used in the binary logistic fit.
cohort is a factor vector specifying which cohort condition currently applies.
subs is a vector of subscripts that can be used to replicate other variables the same way y was replicated.
reps specifies how many times each original observation was replicated.
y, cohort, and subs all have the same length, which is at least the length of the original y vector and usually greater; reps has the same length as the original y vector.
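The length relationships can be checked with a small base-R sketch. The reps and age values are made up for illustration, and subs is rebuilt from reps on the assumption that it lists each original observation's index once per replicate, in order. This matches the two equivalent ways of replicating covariates mentioned in the examples (age[u$subs] or rep(age, u$reps)).

```r
# Hypothetical replication counts for five original subjects
reps <- c(1, 1, 2, 2, 2)
# subs repeats each subject's index once per expanded record
subs <- rep(seq_along(reps), reps)
# A hypothetical covariate measured once per original subject
age <- c(34, 51, 47, 62, 29)
# Two equivalent ways to carry the covariate onto the expanded records
age.sub <- age[subs]
age.rep <- rep(age, reps)
identical(age.sub, age.rep)
length(subs) == sum(reps)
```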
The subs vector is suitable for passing to validate.lrm or calibrate, which pass this vector under the name cluster on to predab.resample so that bootstrapping can be done by sampling with replacement from the original subjects rather than from the individual records created by cr.setup.
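The idea behind passing subs as cluster can be sketched in base R: draw original subjects with replacement and keep all of each drawn subject's expanded records. cluster_boot_index is a hypothetical helper illustrating a generic cluster bootstrap; it is not the code used by predab.resample, and the subs vector below is made up.

```r
# Hypothetical helper illustrating a cluster (subject-level) bootstrap;
# a generic sketch, not the code used by predab.resample.
cluster_boot_index <- function(subs) {
  ids <- unique(subs)
  # Draw original subjects (clusters) with replacement
  drawn <- ids[sample.int(length(ids), length(ids), replace = TRUE)]
  # Keep every expanded record of each drawn subject
  unlist(lapply(drawn, function(id) which(subs == id)))
}

set.seed(1)
subs <- c(1, 2, 3, 3, 4, 4, 5, 5)   # hypothetical subscripts for 5 subjects
idx <- cluster_boot_index(subs)
subs[idx]                           # records come in whole-subject blocks
```

Resampling records directly would split a subject's records across bootstrap samples and understate the variability, which is what resampling by subject avoids.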
Frank Harrell
Department of Biostatistics
Vanderbilt University
f.harrell@vanderbilt.edu
Berridge DM, Whitehead J: Analysis of failure time data with ordinal categories of response. Stat in Med 10:1703–1710, 1991.
library(rms)   # cr.setup and lrm are in rms
y <- c(NA, 10, 21, 32, 32)
cr.setup(y)
set.seed(171)
y <- sample(0:2, 100, replace=TRUE)
sex <- sample(c("f","m"), 100, replace=TRUE)
sex <- factor(sex)
table(sex, y)
options(digits=5)
tapply(y==0, sex, mean)
tapply(y==1, sex, mean)
tapply(y==2, sex, mean)
cohort <- y>=1
tapply(y[cohort]==1, sex[cohort], mean)
u <- cr.setup(y)
Y <- u$y
cohort <- u$cohort
sex <- sex[u$subs]
lrm(Y ~ cohort + sex)
f <- lrm(Y ~ cohort*sex) # saturated model - has to fit all data cells
f
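# Cell probabilities from the fitted coefficients, in the order intercept,
# cohort, sex, cohort x sex (an S-PLUS run of this example gave
# -.50078, .31845, .11301, -.07379, respectively)
b <- coef(f)
plogis(b[1])                       # Prob(y=0 | female)
plogis(b[1] + b[3])                # Prob(y=0 | male)
plogis(b[1] + b[2])                # Prob(y=1 | y>=1, female)
plogis(b[1] + b[2] + b[3] + b[4])  # Prob(y=1 | y>=1, male)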
combinations <- expand.grid(cohort=levels(cohort), sex=levels(sex))
combinations
p <- predict(f, combinations, type="fitted")
p
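p0 <- p[c(1,3)]   # Prob(y=0 | sex), f then m (cohort 'all')
p1 <- p[c(2,4)]   # Prob(y=1 | y>=1, sex), f then m
p1.unconditional <- (1 - p0) * p1   # Prob(y=1 | sex)
p1.unconditional
p2.unconditional <- 1 - p0 - p1.unconditional   # Prob(y=2 | sex)
p2.unconditional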
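As a cross-check that needs no rms functions, the saturated fit above can be sketched in base R on the same simulated data. Two substitutions are assumptions of this sketch: glm() with family = binomial stands in for lrm(), and the expansion is hand-coded for a three-level response instead of coming from cr.setup. Because the model is saturated, the unconditional probabilities should reproduce the observed proportions by sex.

```r
# Base-R sketch: a saturated CR model fitted with glm() instead of lrm(),
# using a hand-coded expansion for a three-level response (0, 1, 2).
set.seed(171)
y   <- sample(0:2, 100, replace = TRUE)
sex <- factor(sample(c("f", "m"), 100, replace = TRUE))

# y = 0 contributes only to cohort "all"; y = 1 or 2 also enters "y>=1"
reps   <- ifelse(y == 0, 1, 2)
subs   <- rep(seq_along(y), reps)
cutoff <- unlist(lapply(reps, function(r) seq_len(r) - 1))
cohort <- factor(cutoff, levels = 0:1, labels = c("all", "y>=1"))
Y      <- as.integer(y[subs] == cutoff)  # 1 where the subject stops here
sx     <- sex[subs]

fit  <- glm(Y ~ cohort * sx, family = binomial)
grid <- expand.grid(cohort = levels(cohort), sx = levels(sx))
p    <- predict(fit, grid, type = "response")
p0   <- p[c(1, 3)]                     # Prob(y=0 | sex), f then m
p1   <- p[c(2, 4)]                     # Prob(y=1 | y>=1, sex)
p1.unconditional <- (1 - p0) * p1      # Prob(y=1 | sex)
p2.unconditional <- 1 - p0 - p1.unconditional

# A saturated model reproduces the observed proportions by sex
cbind(fitted = p1.unconditional, observed = tapply(y == 1, sex, mean))
```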
## Not run:
dd <- datadist(inputdata) # do this on non-replicated data
options(datadist='dd')
pain.severity <- inputdata$pain.severity
u <- cr.setup(pain.severity)
# data frame inputdata holds age and sex along with pain.severity
attach(inputdata[u$subs,]) # replicate age, sex
# If age, sex already available, could do age <- age[u$subs] etc., or
# age <- rep(age, u$reps), etc.
y <- u$y
cohort <- u$cohort
dd <- datadist(dd, cohort) # add to dd
f <- lrm(y ~ cohort + age*sex) # ordinary cont. ratio model
g <- lrm(y ~ cohort*sex + age, x=TRUE,y=TRUE) # allow unequal slopes for
# sex across cutoffs
cal <- calibrate(g, cluster=u$subs, subset=cohort=='all')
# cluster=u$subs makes the bootstrap resample subjects (the correct units);
# subset=cohort=='all' restricts the calibration check to the predicted
# Prob(pain.severity=0)
## End(Not run)