coxph
function to fit the Cox model and
its extension, the Andersen-Gill model. The latter allows for interval
time-dependent covariables, time-dependent strata, and repeated events.
The
Survival
method for an object created by
cph
returns an S
function for computing estimates of the survival function.
The
Quantile
method for
cph
returns an S function for computing
quantiles of survival time (median, by default).
The
Mean
method returns a function for computing the mean survival
time. This function issues a warning if the last follow-up time is censored,
unless a restricted mean is explicitly requested.
cph(formula = formula(data), data=if(.R.) parent.frame() else sys.parent(),
    weights, subset, na.action=na.delete,
    method=c("efron","breslow","exact","model.frame","model.matrix"),
    singular.ok=FALSE, robust=FALSE,
    model=FALSE, x=FALSE, y=FALSE, se.fit=FALSE,
    eps=1e-4, init, iter.max=10, tol=1e-9, surv=FALSE, time.inc,
    type, vartype, conf.type, ...)

## S3 method for class 'cph':
Survival(object, ...)
# Evaluate result as g(times, lp, stratum=1, type=c("step","polygon"))

## S3 method for class 'cph':
Quantile(object, ...)
# Evaluate like h(q, lp, stratum=1, type=c("step","polygon"))

## S3 method for class 'cph':
Mean(object, method=c("exact","approximate"), type=c("step","polygon"),
     n=75, tmax, ...)
# E.g. m(lp, stratum=1, type=c("step","polygon"), tmax, ...)
Surv
object on the left-hand side.
The
terms
can specify any S model formula with up to third-order interactions. The
strat
function may appear in the terms, as a main effect or an interacting
factor. To stratify on both race and sex, you would include both
terms
strat(race)
and
strat(sex)
. Stratification
factors may interact with non-stratification factors;
not all stratification terms need interact with the same modeled
factors.
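As a hedged illustration of the formula conventions just described, the sketch below builds two formulas without fitting anything. The variable names dt, e, age, race, and sex are assumptions made for illustration. Only base R's terms() is used to inspect the result, so strat and Surv are never actually called.

```r
# Hypothetical variables: dt (time), e (event indicator), age, race, sex.
# Stratify on both race and sex by including both strat() terms.
f1 <- Surv(dt, e) ~ age + strat(race) + strat(sex)

# A stratification factor (sex) interacting with a modeled factor (age);
# race is also a stratification factor but does not interact with age.
f2 <- Surv(dt, e) ~ age * strat(sex) + strat(race)

# terms() parses a formula symbolically; nothing on the right is evaluated.
labels1 <- attr(terms(f1), "term.labels")
labels2 <- attr(terms(f2), "term.labels")
print(labels1)
print(labels2)
```

The second formula shows that not every stratification term has to interact with the same modeled factors.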
cph
with
surv=TRUE
age>50 & sex=="male"
or
c(1:100,200:300)
respectively to use the observations satisfying a logical expression or those having
row numbers in the given vector.
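The two forms of subset can be tried on a plain data frame. This is a minimal sketch in base R with a made-up data frame d; it shows only how the logical expression and the row-number vector select observations, not how cph processes them internally.

```r
# Hypothetical data frame used only to illustrate the two forms of subset.
d <- data.frame(age = c(45, 62, 58, 71, 39),
                sex = c("male", "male", "female", "male", "female"))

# Logical expression: keep rows where both conditions hold.
# Comparison uses the double == operator.
keep_logical <- with(d, age > 50 & sex == "male")
print(d[keep_logical, ])

# Row numbers: keep the rows whose positions appear in the vector.
keep_rows <- c(1:2, 4:5)
print(d[keep_rows, ])
```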
na.delete
,
which causes observations with any variable missing to be deleted. The main difference
between
na.delete
and the S-supplied function
na.omit
is that
na.delete
makes a list
of the number of observations that are missing on each variable in the model.
The
na.action
is usually specified by e.g.
options(na.action="na.delete")
.
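To make the difference concrete, the base R sketch below applies na.omit to a made-up data frame and then tallies missing values per variable by hand. The tally stands in for the kind of summary na.delete is described as recording; na.delete itself is not called here, and the data are hypothetical.

```r
# Hypothetical data with missing values in two variables.
d <- data.frame(age = c(50, NA, 61, 47),
                sex = c("male", "female", NA, "male"),
                dt  = c(2.1, 3.4, 0.7, 5.0))

# na.omit drops every row that has any missing value.
complete <- na.omit(d)

# Per-variable count of missing values, computed by hand.
n_missing <- sapply(d, function(v) sum(is.na(v)))

print(nrow(complete))
print(n_missing)
```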
cph
, specifies a particular fitting method,
"model.frame"
instead to return the model frame
of the predictor and response variables satisfying any subset or missing value
checks, or
"model.matrix"
to return the expanded design matrix.
The default is
"efron"
, to use Efron's likelihood for fitting the
model.
For
Mean.cph
,
method
is
"exact"
to use numerical
integration of the
survival function at any linear predictor value to obtain a mean survival
time. Specify
method="approximate"
to use an approximate method that is
slower when
Mean.cph
is executing but then is essentially instant
thereafter. For the approximate method, the area is computed for
n
points equally spaced between the min and max observed linear predictor
values. This calculation is done separately for each stratum. Then the
n
pairs (X beta, area) are saved in the generated S function, and when
this function is evaluated, the
approx
function is used to evaluate
the mean for any given linear predictor values, using linear interpolation
over the
n
X beta values.
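The tabulate-then-interpolate idea behind the approximate method can be sketched in base R. Everything here is an assumption made for illustration: the area function uses a closed-form exponential survival curve truncated at tmax in place of numerical integration of a fitted curve, the linear predictor range is invented, and a single stratum is used.

```r
# Hypothetical stand-in for the integrated survival curve: exponential
# survival with hazard exp(lp), truncated at tmax.
tmax <- 10
area <- function(lp) (1 - exp(-exp(lp) * tmax)) / exp(lp)

# Tabulate the area at n equally spaced linear predictor values.
n <- 75
lp_grid   <- seq(-2, 1, length.out = n)   # assumed min and max of X beta
area_grid <- area(lp_grid)

# The generated function interpolates linearly over the stored pairs.
approx_mean <- function(lp) approx(lp_grid, area_grid, xout = lp)$y

print(approx_mean(c(-1, 0, 0.5)))  # close to area(c(-1, 0, 0.5))
print(approx_mean(3))              # NA: outside the tabulated range
```

With approx's default settings, values outside the tabulated range come back as NA rather than being extrapolated.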
TRUE
, the program will automatically skip over columns of the X matrix
that are linear combinations of earlier columns. In this case the
coefficients for such columns will be NA, and the variance matrix will contain
zeros. For ancillary calculations, such as the linear predictor, the missing
coefficients are treated as zeros. The singularities will prevent many of
the features of the
Design
library from working.
TRUE
a robust variance estimate is returned. Default is
TRUE
if the
model includes a
cluster()
operative,
FALSE
otherwise.
FALSE
(false). Set to
TRUE
to return the model frame as element
model
of the fit object.
FALSE
. Set to
TRUE
to return the expanded design matrix as element
x
(without intercept indicators) of the
returned fit object.
FALSE
. Set to
TRUE
to return the vector of response values (
Surv
object) as element
y
of the fit.
FALSE
. Set to
TRUE
to compute the estimated standard errors of
the estimate of X beta and store them in element
se.fit
of the fit. The predictors are first centered to their means
before computing the standard errors.
init
to MLEs and others to zero and specifying
iter.max=1
.
0
to obtain certain
null-model residuals.
TRUE
to compute underlying survival estimates for each
stratum, and to store these along with standard errors of log Lambda(t),
maxtime
(maximum observed survival or censoring time),
and
surv.summary
in the returned object. Set
surv="summary"
to only compute and store
surv.summary
, not survival estimates
at each unique uncensored failure time. If you specify
x=TRUE
and
y=TRUE
,
you can obtain predicted survival later, with accurate confidence
intervals for any set of predictor values. The standard error information
stored as a result of
surv=TRUE
is only accurate at the mean of all
predictors. If the model has no covariables, these are of course OK.
The main reason for using
surv
is to greatly speed up the computation
of predicted survival probabilities as a function of the covariables,
when accurate confidence intervals are not needed.
surv.summary
. Survival,
number at risk, and standard error will be stored for
t=0, time.inc, 2 time.inc, ..., maxtime
,
where
maxtime
is the maximum survival time over all strata.
time.inc
is also used in constructing the time axis in the
survplot
function (see below). The default value for
time.inc
is 30 if
units(ftime) = "Day"
or no
units
attribute has been attached to the survival time variable. If
units(ftime)
is a word other than
"Day"
, the default
for
time.inc
is 1 when it is omitted, unless
maxtime<1
, in which case
maxtime/10
is used as
time.inc
. If
time.inc
is not given and
maxtime/ default time.inc
> 25,
time.inc
is increased.
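The default rule for time.inc can be paraphrased as a small helper. This is a sketch of one reading of the text, not the library's code: default_time_inc is a hypothetical name, a missing units attribute is represented by the default "Day", the maxtime<1 exception is applied only to units other than "Day", and the adjustment made when maxtime divided by the default exceeds 25 is left out because the text does not say how large it is.

```r
# Sketch of the stated default only; the increase applied when
# maxtime / time.inc > 25 is not specified in the text and is omitted.
default_time_inc <- function(maxtime, units = "Day") {
  if (units == "Day") {
    30
  } else if (maxtime < 1) {
    maxtime / 10
  } else {
    1
  }
}

inc  <- default_time_inc(maxtime = 12, units = "Year")
grid <- seq(0, 12, by = inc)   # t = 0, time.inc, 2 time.inc, ..., maxtime
print(inc)
print(grid)
print(default_time_inc(maxtime = 0.5, units = "Year"))
print(default_time_inc(maxtime = 400))
```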
cph
) applies if
surv
is
TRUE
or
"summary"
.
If
type
is omitted, the method consistent with
method
is used.
See
survfit.coxph
(under
survfit
) or
survfit.cph
for details and for the
definitions of values of
type
.
For
Survival, Quantile, Mean
, set to
"polygon"
to use linear
interpolation instead of the usual step function. For
Mean
, the default
of
step
will yield the sample mean in the case of no censoring and no
covariables, if
type="kaplan-meier"
was specified to
cph
.
For
method="exact"
, the value of
type
is passed to the
generated function, and it can be overridden when that function is
actually invoked. For
method="approximate"
,
Mean.cph
generates the function in different ways according to
type
, and this
cannot be changed when the function is actually invoked.
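The claim that the step option reproduces the sample mean when there is no censoring and no covariables can be checked by hand. The sketch below uses made-up uncensored times with no ties and computes the area under the Kaplan-Meier curve twice: once as a step function and once with linear interpolation between the same points, which is one plausible reading of "polygon". It does not call cph or Mean.

```r
# Hypothetical uncensored survival times, no covariables, no ties.
times <- c(2, 3, 5, 8, 12)
n <- length(times)

# Kaplan-Meier estimate with no censoring: S drops by 1/n at each time.
t_grid <- c(0, sort(times))
S      <- c(1, (n - seq_len(n)) / n)

# Step function: S holds its value until the next event time.
mean_step <- sum(diff(t_grid) * S[-length(S)])

# Polygon: linear interpolation between the points (trapezoidal rule).
mean_poly <- sum(diff(t_grid) * (S[-length(S)] + S[-1]) / 2)

print(mean_step)    # equals mean(times)
print(mean_poly)    # smaller: the interpolated curve lies below the steps
```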
survfit.coxph
survfit.cph
; by default, confidence limits are based on log-log survival.
coxph.fit
from
cph
. Ignored by
other functions.
"sex=male"
) to use in getting
survival probabilities
method="approximate"
in
Mean.cph
.
Mean.cph
, the default is to compute the overall mean (and produce
a warning message if there is censoring at the end of follow-up).
To compute a restricted mean life length, specify the truncation point as
tmax
.
For
method="exact"
,
tmax
is passed to the generated function and it
may be overridden when that function is invoked. For
method="approximate"
,
tmax
must be specified at the time that
Mean.cph
is run.
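A restricted mean is the area under the survival curve from 0 to the truncation point. The base R sketch below computes it for a made-up step survival curve whose last follow-up time is censored, so the curve never reaches zero. The function restricted_mean and the curve values are hypothetical and only illustrate the role of tmax.

```r
# Hypothetical step curve: S[i] holds on [t_grid[i], t_grid[i + 1]).
t_grid <- c(0, 2, 3, 5, 8)
S      <- c(1, 0.8, 0.6, 0.4, 0.2)   # never reaches 0 (censored at the end)

# Area under the step function from 0 to tmax.
restricted_mean <- function(tmax) {
  cuts <- pmin(c(t_grid, Inf), tmax)  # truncate every interval at tmax
  sum(diff(cuts) * S)
}

print(restricted_mean(5))   # 2*1 + 1*0.8 + 2*0.6
print(restricted_mean(6))   # adds 1*0.4 for the interval from 5 to 6
```

Because the curve stays above zero, the unrestricted area depends on how far the last value is extended, which is why a truncation point is needed in this situation.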
If there is any strata by covariable interaction in the model such that
the mean X beta varies greatly over strata,
method="approximate"
may
not yield very accurate estimates of the mean in
Mean.cph
.
For
method="approximate"
if you ask for an estimate of the mean for
a linear predictor value that was outside the range of linear predictors
stored with the fit, the mean for that observation will be
NA
.
Survival
,
Quantile
, or
Mean
, an S function is returned. Otherwise,
in addition to what is listed below, formula/design information and
the components
maxtime, time.inc, units, model, x, y, se.fit
are stored, the last 5
depending on the settings of options by the same names.
The vectors or matrix stored if
y=TRUE
or
x=TRUE
have rows deleted according to
subset
and
to missing data, and have names or row names that come from the
data frame used as input data.
Obs
,
Events
,
Model L.R.
,
d.f.
,
P
,
Score
,
Score P
, and
R2
.
surv="T"
)
surv=TRUE
.
The first dimension is time ranging from 0 to
maxtime
by
time.inc
. The second dimension refers to strata.
The third dimension contains the time-oriented matrix with
Survival, n.risk
(number of subjects at risk),
and
std.err
(standard error of log-log
survival).
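The three-dimensional layout can be mimicked with a plain array. In this sketch only the ordering of the dimensions and the three statistic names come from the description above; the numeric values, the stratum labels, and the use of times as row names are assumptions made for illustration.

```r
# Hypothetical values; only the layout follows the description.
times  <- seq(0, 4, by = 2)               # 0 to maxtime by time.inc
strata <- c("sex=Female", "sex=Male")
stats  <- c("Survival", "n.risk", "std.err")

ss <- array(NA_real_,
            dim = c(length(times), length(strata), length(stats)),
            dimnames = list(as.character(times), strata, stats))
ss[, "sex=Male", "Survival"] <- c(1, 0.9, 0.75)
ss[, "sex=Male", "n.risk"]   <- c(100, 88, 70)

print(ss["2", "sex=Male", "Survival"])  # survival at t = 2, second stratum
print(dim(ss))                          # time x strata x statistic
```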
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
# Simulate data from a population model in which the log hazard
# function is linear in age and there is no age x sex interaction

n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n,
              rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
label(dt) <- 'Follow-up Time'
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
dd <- datadist(age, sex)
options(datadist='dd')
Srv <- Surv(dt,e)

f <- cph(Srv ~ rcs(age,4) + sex, x=TRUE, y=TRUE)
cox.zph(f, "rank")             # tests of PH
anova(f)
plot(f, age=NA, sex=NA)        # plot age effect, 2 curves for 2 sexes
survplot(f, sex=NA)            # time on x-axis, curves for x2
res <- resid(f, "scaledsch")
time <- as.numeric(dimnames(res)[[1]])
z <- loess(res[,4] ~ time, span=0.50)   # residuals for sex
if(.R.) plot(time, fitted(z)) else
        plot(z, coverage=0.95, confidence=7, xlab="t",
             ylab="Scaled Schoenfeld Residual",ylim=c(-3,5))
lines(supsmu(time, res[,4]),lty=2)
plot(cox.zph(f,"identity"))    #Easier approach for last 6 lines
# latex(f)

f <- cph(Srv ~ age + strat(sex), surv=TRUE)
g <- Survival(f)   # g is a function
g(seq(.1,1,by=.1), stratum="sex=Male", type="poly") #could use stratum=2
med <- Quantile(f)
plot(f, age=NA, fun=function(x) med(lp=x))  #plot median survival

# g <- cph(Surv(hospital.charges) ~ age, surv=TRUE)
# Cox model very useful for analyzing highly skewed data, censored or not
# m <- Mean(g)
# m(0)                           # Predicted mean charge for reference age

#Fit a time-dependent covariable representing the instantaneous effect
#of an intervening non-fatal event
rm(age)
set.seed(121)
dframe <- data.frame(failure.time=1:10, event=rep(0:1,5),
                     ie.time=c(NA,1.5,2.5,NA,3,4,NA,5,5,5),
                     age=sample(40:80,10,rep=TRUE))
z <- ie.setup(dframe$failure.time, dframe$event, dframe$ie.time)
S <- z$S
ie.status <- z$ie.status
attach(dframe[z$subs,])    # replicates all variables

f <- cph(S ~ age + ie.status, x=TRUE, y=TRUE)
#Must use x=TRUE,y=TRUE to get survival curves with time-dep. covariables

#Get estimated survival curve for a 50-year old who has an intervening
#non-fatal event at 5 days
new <- data.frame(S=Surv(c(0,5), c(5,999), c(FALSE,FALSE)), age=rep(50,2),
                  ie.status=c(0,1))
g <- survfit(f, new)
plot(c(0,g$time), c(1,g$surv[,2]), type='s',
     xlab='Days', ylab='Survival Prob.')
# Not certain about what columns represent in g$surv for survival5
# but appears to be for different ie.status
#or:
#g <- survest(f, new)
#plot(g$time, g$surv, type='s', xlab='Days', ylab='Survival Prob.')

#Compare with estimates when there is no intervening event
new2 <- data.frame(S=Surv(c(0,5), c(5, 999), c(FALSE,FALSE)), age=rep(50,2),
                   ie.status=c(0,0))
g2 <- survfit(f, new2)
lines(c(0,g2$time), c(1,g2$surv[,2]), type='s', lty=2)
#or:
#g2 <- survest(f, new2)
#lines(g2$time, g2$surv, type='s', lty=2)
detach("dframe[z$subs, ]")
options(datadist=NULL)