coxph(formula, data=sys.parent(), weights=<<see below>>, subset, na.action=na.fail, init, control=coxph.control, method="efron", singular.ok=T, robust=<<see below>>, model=F, x=F, y=T)
~
operator and the terms on the right. The response must be a survival object as returned by the
Surv
function.
formula
,
subset
, and
weights
arguments.
If
weights
is a vector of integers, the estimated coefficients are equivalent to those obtained by fitting the model to data with the individual cases replicated as many times as indicated by
weights
.
Multiplying all weights by a positive constant
c
does not change the estimated coefficients or the robust standard errors computed by
coxph
.
However, the standard errors of the coefficients will decrease by a factor of
sqrt(c)
.
By default, no weights are included in the model.
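The replication and rescaling properties described above can be checked numerically. The following is a minimal sketch, assuming the R survival package (whose coxph descends from this function, though its defaults may differ). The weight values are hypothetical, the Breslow method is chosen because the equivalence with replicated data is exact under it, and robust = FALSE is passed explicitly so that fit$var holds the model-based variance.

```r
library(survival)

# Test data from this page's examples, plus hypothetical integer weights w.
d <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                status = c(1, 1, 1, 0, 1, 1, 0),
                x      = c(0, 2, 1, 1, 1, 0, 0),
                w      = c(1, 2, 1, 2, 3, 1, 2))

# Weighted fit with the model-based (non-robust) variance.
fit_w <- coxph(Surv(time, status) ~ x, data = d, weights = w,
               method = "breslow", robust = FALSE)

# Unweighted fit on data in which row i appears w[i] times.
d_rep   <- d[rep(seq_len(nrow(d)), times = d$w), ]
fit_rep <- coxph(Surv(time, status) ~ x, data = d_rep, method = "breslow")

# Multiply every weight by a positive constant k.
k    <- 4
d$wk <- k * d$w
fit_k <- coxph(Surv(time, status) ~ x, data = d, weights = wk,
               method = "breslow", robust = FALSE)

# Model-based standard errors.
se <- function(fit) sqrt(diag(fit$var))

coef(fit_w); coef(fit_rep); coef(fit_k)   # same coefficient in all three fits
se(fit_w) / se(fit_k)                     # ratio of standard errors, sqrt(k)
```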
data
should be used in the fit. This can be a logical vector, which is replicated to have length equal to the number of observations, a numeric vector indicating which observation numbers are to be included (or excluded if negative), or a character vector of row names to be included. All observations are included by default.
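As an illustration of the subset forms listed above, here is a minimal sketch assuming the R survival package. It selects the same four rows with a logical vector, with positive observation numbers, and with negative observation numbers; the data are the test data used elsewhere on this page.

```r
library(survival)

d <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                status = c(1, 1, 1, 0, 1, 1, 0),
                x      = c(0, 2, 1, 1, 1, 0, 0),
                sex    = c(0, 0, 0, 0, 1, 1, 1))

# Logical vector: keep the rows where sex is 0 (rows 1 to 4).
fit_lgl <- coxph(Surv(time, status) ~ x, data = d, subset = sex == 0)

# Numeric vector of observation numbers to include.
fit_pos <- coxph(Surv(time, status) ~ x, data = d, subset = 1:4)

# Negative numbers exclude observations instead.
fit_neg <- coxph(Surv(time, status) ~ x, data = d, subset = -(5:7))

fit_lgl$n       # number of observations used in the fit
coef(fit_lgl); coef(fit_pos); coef(fit_neg)
```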
model.frame
after any
subset
argument has been used. The default is
na.fail
, which returns an error if any missing values are found. An alternative is
na.exclude
, which deletes observations that contain one or more missing values.
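The difference between the two missing-value filters can be sketched as follows, assuming the R survival package (where the default na.action comes from options() rather than being na.fail, so both filters are passed explicitly). The NA placed in x is hypothetical.

```r
library(survival)

# Test data from this page, with one covariate value set to NA for illustration.
d <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                status = c(1, 1, 1, 0, 1, 1, 0),
                x      = c(0, 2, 1, NA, 1, 0, 0))

# na.fail stops with an error as soon as a missing value is found.
res_fail <- tryCatch(
  coxph(Surv(time, status) ~ x, data = d, na.action = na.fail),
  error = function(e) e)

# na.exclude drops the incomplete observation and fits on the rest.
fit_excl <- coxph(Surv(time, status) ~ x, data = d, na.action = na.exclude)

inherits(res_fail, "error")   # TRUE: the fit was refused
fit_excl$n                    # observations actually used
```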
coxph.control
for the available control options and their default settings.
"efron"
,
"breslow"
, and
"exact"
.
If there are no tied death times, all the methods are equivalent.
Nearly all Cox regression programs use the Breslow method by default, but S-PLUS uses the Efron approximation.
The Efron method is much more accurate when dealing with tied death times, and is as efficient computationally.
The
exact
method computes the exact partial likelihood, which is equivalent to a conditional logistic model.
If there are a large number of ties, the computational time will be excessive.
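To see how the three tie-handling methods relate, the sketch below (assuming the R survival package, which still accepts the method argument) fits the same model with each method, once on data with tied death times and once on a hypothetical data set without ties.

```r
library(survival)

methods <- c("efron", "breslow", "exact")

# Fit Surv(time, status) ~ x with each method and collect the coefficient.
fit_all <- function(data) {
  vapply(methods, function(m) {
    unname(coef(coxph(Surv(time, status) ~ x, data = data, method = m)))
  }, numeric(1))
}

# Test data from this page: two deaths are tied at time 2.
tied <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                   status = c(1, 1, 1, 0, 1, 1, 0),
                   x      = c(0, 2, 1, 1, 1, 0, 0))

# Hypothetical data with distinct times, so no ties at all.
untied <- data.frame(time   = 1:8,
                     status = c(1, 1, 0, 1, 1, 1, 0, 1),
                     x      = c(1, 0, 1, 1, 0, 0, 1, 0))

coef_tied   <- fit_all(tied)     # the methods disagree when ties are present
coef_untied <- fit_all(untied)   # the methods agree when there are no ties
coef_tied
coef_untied
```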
x
.
If
singular.ok=TRUE
, the program automatically skips over columns of
x
that are linear combinations of earlier columns.
In this case, the coefficients for such columns are
NA
and the variance matrix contains zeros.
For ancillary calculations such as the linear predictor, the missing coefficients are treated as zeros.
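A minimal sketch of this behavior, assuming the R survival package: the hypothetical column x2 is an exact multiple of x, so it is a linear combination of an earlier column and should be skipped. Any warning about the singular matrix is suppressed here to keep the output short.

```r
library(survival)

d <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                status = c(1, 1, 1, 0, 1, 1, 0),
                x      = c(0, 2, 1, 1, 1, 0, 0))
d$x2 <- 2 * d$x   # hypothetical redundant column, collinear with x

fit <- suppressWarnings(
  coxph(Surv(time, status) ~ x + x2, data = d, singular.ok = TRUE))

coef(fit)               # the coefficient for x2 is NA
fit$var                 # variance matrix for inspection
fit$linear.predictors   # computed with the missing coefficient treated as zero
```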
TRUE
, a robust variance estimate is returned. The default is
TRUE
if the model includes a
cluster
term and
FALSE
otherwise.
TRUE
, the model frame is returned in the component named
model
. By default,
model=FALSE
.
TRUE
, the model matrix is returned in the component named
x
. By default,
x=FALSE
.
TRUE
, the response is returned in the component named
y
. By default,
y=TRUE
.
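The effect of the model, x, and y flags on the returned object can be inspected directly. This is a minimal sketch assuming the R survival package; components are looked up through names() and [[ ]] to avoid partial matching of component names.

```r
library(survival)

d <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                status = c(1, 1, 1, 0, 1, 1, 0),
                x      = c(0, 2, 1, 1, 1, 0, 0))

# Defaults: only the response is kept.
fit_default <- coxph(Surv(time, status) ~ x, data = d)

# Ask for the model frame and model matrix as well.
fit_full <- coxph(Surv(time, status) ~ x, data = d,
                  model = TRUE, x = TRUE, y = TRUE)

c("model", "x", "y") %in% names(fit_default)
c("model", "x", "y") %in% names(fit_full)

dim(fit_full[["x"]])       # model matrix: one row per observation
class(fit_full[["y"]])     # the Surv response
```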
"coxph"
representing the fit. See
coxph.object
for details.
predict
,
residuals
, and
survfit
routines may need to reconstruct the model matrix created by
coxph
.
Differences in the environment, such as which data frames are attached or the value of
options()$contrasts
, may cause this computation to fail or, worse, to be incorrect.
See the Guide to Statistics for details.
The proportional hazards model is usually expressed in terms of a single survival time value for each person, with possible censoring. Andersen and Gill reformulated the same problem as a counting process; as time marches onward we observe the events for a subject, rather like watching a Geiger counter. The data for a subject is presented as multiple rows or observations, each of which applies to an observation interval
(start, stop]
.
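The counting process layout can be illustrated by rewriting ordinary survival data as (start, stop] intervals. The sketch below, assuming the R survival package, splits every subject still under observation after a hypothetical cut point into two rows and checks that the fit is unchanged, since the covariate is constant and the risk sets are the same.

```r
library(survival)

d <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                status = c(1, 1, 1, 0, 1, 1, 0),
                x      = c(0, 2, 1, 1, 1, 0, 0))

cut_point <- 2                 # hypothetical split time
late <- d$time > cut_point     # subjects observed beyond the cut point

# First interval: (0, min(time, cut_point)]; no event yet for late subjects.
first <- data.frame(start = 0,
                    stop  = pmin(d$time, cut_point),
                    event = ifelse(late, 0, d$status),
                    x     = d$x)

# Second interval, late subjects only: (cut_point, time] with the final status.
second <- data.frame(start = cut_point,
                     stop  = d$time[late],
                     event = d$status[late],
                     x     = d$x[late])

cp <- rbind(first, second)     # several rows per subject

fit_single   <- coxph(Surv(time, status) ~ x, data = d)
fit_counting <- coxph(Surv(start, stop, event) ~ x, data = cp)

coef(fit_single)
coef(fit_counting)
```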
There are two special terms that may be used in the model equation.
A
strata
term identifies a stratified Cox model, in which a separate baseline hazard function is fit for each stratum.
The
cluster
term is used to compute a robust variance for the model.
The term
cluster(id)
, where
id==unique(id)
, is equivalent to specifying the
robust=T
argument and produces an approximate jackknife estimate of the variance.
If the values in
id
are not unique, but instead identify clusters of correlated observations, then the variance estimate is based on a grouped jackknife.
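The relationship between robust=T and the cluster term can be sketched as follows, assuming the R survival package (where the model-based variance is kept in the naive.var component when a robust variance is computed). The id and grp columns are hypothetical.

```r
library(survival)

d <- data.frame(time   = c(4, 3, 1, 1, 2, 2, 3),
                status = c(1, 1, 1, 0, 1, 1, 0),
                x      = c(0, 2, 1, 1, 1, 0, 0))
d$id  <- seq_len(nrow(d))          # one cluster per observation
d$grp <- c(1, 1, 2, 2, 3, 3, 4)    # hypothetical clusters of correlated rows

# Robust variance requested directly.
fit_rob <- coxph(Surv(time, status) ~ x, data = d, robust = TRUE)

# Same thing through a cluster term with unique ids.
fit_clu <- coxph(Surv(time, status) ~ x + cluster(id), data = d)

# Grouped jackknife: ids shared within clusters.
fit_grp <- coxph(Surv(time, status) ~ x + cluster(grp), data = d)

fit_rob$var; fit_clu$var   # robust variance, same in both fits
fit_grp$var                # robust variance based on the grouped jackknife
fit_grp$naive.var          # model-based variance, unaffected by clustering
```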
In certain cases, the actual maximum likelihood estimate of a coefficient is infinity (e.g., a dichotomous variable where one of the groups has no events).
When this happens, the associated coefficient grows at a steady pace and a race condition exists in the fitting routine: either the log likelihood converges, the information matrix becomes effectively singular, an argument to
exp
becomes too large for the computer's hardware, or the maximum number of iterations is exceeded.
The routine attempts to detect when this has happened, but is not always successful.
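The dichotomous-variable case mentioned above is easy to reproduce. In this minimal sketch, assuming the R survival package, the hypothetical group g = 1 has no events, so the coefficient drifts toward minus infinity until the routine stops. Any warnings are collected rather than printed; their exact wording is not assumed here.

```r
library(survival)

# Hypothetical data: every subject with g = 1 is censored.
d <- data.frame(time   = 1:8,
                status = c(1, 0, 1, 0, 1, 0, 1, 0),
                g      = c(0, 1, 0, 1, 0, 1, 0, 1))

msgs <- character(0)
fit <- withCallingHandlers(
  coxph(Surv(time, status) ~ g, data = d),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))   # remember the warning text
    invokeRestart("muffleWarning")          # and carry on with the fit
  })

coef(fit)   # large in absolute value; the true maximum is at infinity
msgs        # whatever the routine reported, if anything
```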
Andersen, P. and Gill, R. (1982). Cox's regression model for counting processes, a large sample study. Annals of Statistics 10: 1100-1120.
Therneau, T., Grambsch, P., and Fleming, T. (1990). Martingale based residuals for survival models. Biometrika 77: 147-160.
# Create the simplest test data set
test1 <- list(time = c(4,3,1,1,2,2,3),
              status = c(1,1,1,0,1,1,0),
              x = c(0,2,1,1,1,0,0),
              sex = c(0,0,0,0,1,1,1))

# Fit a stratified model
coxph(Surv(time, status) ~ x + strata(sex), data = test1)

# Create a simple data set for a time-dependent model
test2 <- list(start = c(1,2,5,2,1,7,3,4,8,8),
              stop = c(2,3,6,7,8,9,9,9,14,17),
              event = c(1,1,1,1,1,1,1,0,0,0),
              x = c(1,0,0,1,0,1,1,1,0,0))
summary(coxph(Surv(start, stop, event) ~ x, data = test2))

# Fit a stratified model, clustered on patients
bladder1 <- bladder
bladder1$start <- NULL
bladder1 <- bladder1[bladder1$enum < 5, ]
coxph(Surv(stop, event) ~ (rx + size + number) * strata(enum) + cluster(id),
      data = bladder1, method = "breslow")