survfit(formula, data=sys.parent(), weights, subset, na.action, newdata, individual=F, conf.int=.95, se.fit=T, type=<<see below>>, error=<<see below>>, conf.type="log", conf.lower="usual", start.time)
A formula object or a coxph object.
If a formula object is supplied it must have a Surv object as the response
on the left of the ~ operator and, if desired, terms separated by + operators
on the right. One of the terms may be a strata object.
For a single survival curve the "~ 1" part of the formula is not required.
subset and the weights argument.
subset argument.
subset argument has been used. Default is options()$na.action.
coxph formula. Only applicable when formula is a coxph object.
The curve(s) produced will be representative of a cohort whose covariates
correspond to the values in newdata.
Default is the mean of the covariates used in the coxph fit.
If TRUE, the data frame represents different time epochs for only one
individual. If FALSE, multiple rows indicate multiple individuals.
If TRUE, only one curve will be produced; if FALSE, there will be one curve
per row in newdata.
Only applicable when formula is a coxph object.
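As a rough illustration of newdata, the sketch below requests one predicted curve per row. It assumes a current R installation of the survival package, in which the ovarian data set is available; the ages 50 and 65 are hypothetical values chosen only for the example.

```r
library(survival)

# Cox model with a single covariate, as in the examples at the end of this page
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)

# Two hypothetical subjects; with multiple individuals there is
# one curve per row of newdata
curves <- survfit(fit, newdata = data.frame(age = c(50, 65)))
plot(curves, lty = 1:2, xlab = "Survival in Days")
```

Omitting newdata gives the single curve at the mean of the covariates instead.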
TRUE.
"kaplan-meier", "fleming-harrington" or "fh2" if a formula is given, and
"aalen" or "kaplan-meier" if the first argument is a coxph object
(only the first two characters are necessary).
The default is "aalen" when a coxph object is given, and it is "kaplan-meier"
otherwise.
Earlier versions of survfit used type="tsiatis" to get the "aalen" estimator.
For backward compatibility, this is still allowed.
"greenwood" for the Greenwood formula or "tsiatis" for the Tsiatis formula
(only the first character is necessary).
The default is "tsiatis" when a coxph object is given, and it is "greenwood"
otherwise.
"none" for no confidence intervals,
"plain" for standard intervals, "curve +- k*se(curve)", where k is determined
from conf.int,
"log" for intervals based on the cumulative hazard or log(survival)
(the default), and
"log-log" for intervals based on the log hazard or log(-log(survival)).
The last type will never extend past 0 or 1.
Only enough of the string to uniquely identify it is necessary.
"usual", "peto", and "modified".
The modified lower limit is based on an "effective n" argument.
The confidence bands will agree with the usual calculation at each death time,
but unlike the usual bands the confidence interval becomes wider
at each censored observation.
The extra width is obtained by multiplying the usual variance
by a factor m/n, where n is the number currently at risk
and m is the number at risk at the last death time.
This is especially useful for survival curves with a long flat tail.
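The m/n adjustment can be sketched with a few hypothetical numbers. Here the usual variance is assumed to stay constant between deaths, and three successive censorings reduce the number at risk. None of the values come from a real data set.

```r
# Hypothetical setting: 20 subjects at risk at the last death time (m),
# then three censorings in a row leave 19, 18 and 17 at risk (n).
m <- 20
n <- c(20, 19, 18, 17)

usual_var    <- 0.004                # assumed variance, unchanged between deaths
modified_var <- usual_var * m / n    # usual variance inflated by m/n

# Relative widening of the standard error at each step; the first entry is 1
# because the two calculations agree at the death time itself
sqrt(modified_var / usual_var)
```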
"survfit"
.
See survfit.object for details.
Methods defined for survfit objects are print, plot, lines, and points.
The estimates used are the Kalbfleisch-Prentice
(Kalbfleisch and Prentice, 1980, p.86) and the Tsiatis/Link/Breslow,
which reduce to the Kaplan-Meier and Fleming-Harrington estimates,
respectively, when the weights are unity.
When curves are fit for a Cox model, subject weights of
exp(sum(coef*(x-center))) are used, ignoring any value for weights
input by the user.
There is also an extra term in the variance of the curve,
due to the variance of the coefficients
and hence variance in the computed weights.
Details of the Aalen estimator and its variance are found in Tsiatis (1981).
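The subject weights exp(sum(coef*(x-center))) can be worked through by hand. The sketch below uses made-up coefficients, centering values and covariate rows, so every number in it is an assumption for illustration and not output from a fitted model.

```r
# Hypothetical fitted coefficients and centering values (covariate means)
coef   <- c(age = 0.05, treat = -0.40)
center <- c(age = 60,   treat = 0.5)

# Covariate rows for three hypothetical subjects
x <- rbind(c(age = 60, treat = 0.5),
           c(age = 70, treat = 0),
           c(age = 50, treat = 1))

# exp(sum(coef * (x - center))) for each row:
# sweep() subtracts the centering value from each column
risk_weight <- drop(exp(sweep(x, 2, center) %*% coef))
risk_weight
```

A subject whose covariates equal the centering values, such as the first row here, gets weight 1.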
The Greenwood formula for the variance is a sum of terms
d/(n*(n-m)), where d is the number of deaths at a given time point, n
is the sum of weights for all individuals still at risk at that time, and
m is the sum of weights for the deaths at that time. The
justification is based on a binomial argument when weights are all
equal to one; extension to the weighted case is ad hoc. Tsiatis
(1981) proposes a sum of terms d/(n*n), based on a counting process
argument which includes the weighted case.
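To compare the two variance formulas, the following sketch evaluates the individual terms for hypothetical counts with unit weights, so that d and m coincide. The numbers at risk and the death counts are invented for the example.

```r
# Unit weights, so d (deaths) and m (sum of weights for the deaths) are equal
n <- c(10, 7, 4)   # hypothetical numbers at risk at three death times
d <- c(3, 1, 2)    # hypothetical numbers of deaths at those times
m <- d

greenwood_terms <- d / (n * (n - m))   # terms of the Greenwood formula
tsiatis_terms   <- d / (n * n)         # terms of the Tsiatis formula

# Running sums of the terms at each death time
cbind(greenwood = cumsum(greenwood_terms),
      tsiatis   = cumsum(tsiatis_terms))
```

Each Greenwood term is at least as large as the matching Tsiatis term, because n - m is never larger than n.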
For the Fleming-Harrington estimate, two different methods for
handling ties have been implemented.
If there were 3 deaths out of 10 at risk, then the original estimate
of Nelson increments the hazard by 3/10, while the modification of
Fleming and Harrington increments it by 1/10 + 1/9 + 1/8.
For curves created after a Cox model these correspond
to the Breslow and Efron estimates,
respectively, and the proper choice is made automatically.
The fh2 method will give results closer to the Kaplan-Meier.
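The tie-handling example above (3 deaths out of 10 at risk) can be checked numerically. This is a small sketch with hypothetical variable names. It compares the two hazard increments and the survival factors they imply against the Kaplan-Meier factor 1 - 3/10.

```r
n <- 10   # number at risk
d <- 3    # tied deaths

nelson_increment   <- d / n                           # 3/10
modified_increment <- sum(1 / (n - seq_len(d) + 1))   # 1/10 + 1/9 + 1/8

# Survival factors implied at this time point
km_factor       <- 1 - d / n
nelson_factor   <- exp(-nelson_increment)
modified_factor <- exp(-modified_increment)

c(km = km_factor, nelson = nelson_factor, modified = modified_factor)
```

With these numbers the modified factor lies between the Nelson and Kaplan-Meier factors.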
When the data set includes left censored or interval censored data (or both),
then the EM approach of Turnbull is used to compute the overall curve.
When the baseline method is the Kaplan-Meier, this is known to converge to
the maximum likelihood estimate.
Based on the work of Link (1984), the log transform is expected to produce
the most accurate confidence intervals. If there is heavy censoring, then
based on the work of Dorey and Korn (1987) the modified estimate will give
a more reliable confidence band for the tails of the curve.
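The "plain", "log" and "log-log" interval types can be compared with a small hand calculation at a single point on a curve. The helper ci_sketch below is hypothetical and is not part of survfit. It assumes se_logS is the standard error of log(survival), and it approximates the standard error of the curve itself as S * se_logS.

```r
# Hypothetical helper, not part of survfit: pointwise confidence limits
# for a survival estimate S with standard error se_logS on the log scale.
ci_sketch <- function(S, se_logS, conf.int = 0.95,
                      conf.type = c("log", "plain", "log-log")) {
  conf.type <- match.arg(conf.type)
  k <- qnorm(1 - (1 - conf.int) / 2)   # normal quantile determined by conf.int
  switch(conf.type,
    # curve +- k * se(curve), with se(curve) approximated by S * se_logS
    plain = c(lower = S - k * S * se_logS, upper = S + k * S * se_logS),
    # symmetric on the log(survival) scale; the upper limit can exceed 1
    log = c(lower = S * exp(-k * se_logS), upper = S * exp(k * se_logS)),
    # symmetric on the log(-log(survival)) scale; stays inside (0, 1)
    "log-log" = {
      w <- k * se_logS / log(S)        # negative, since log(S) < 0
      c(lower = S^exp(-w), upper = S^exp(w))
    })
}

# Example with a deliberately large standard error
ci_sketch(0.5, 0.4, conf.type = "plain")
ci_sketch(0.5, 0.4, conf.type = "log")
ci_sketch(0.5, 0.4, conf.type = "log-log")
```

With an input this uncertain, the "log" upper limit goes above 1 while the "log-log" limits remain between 0 and 1.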
Dorey, F. J. and Korn, E. L. (1987). Effective sample sizes for confidence
intervals for survival probabilities. Statistics in Medicine 6, 679-87.
Fleming, T. R. and Harrington, D. P. (1984). Nonparametric estimation of the
survival distribution in censored data. Comm. in Statistics 13, 2469-86.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of
Failure Time Data. New York: Wiley.
Link, C. L. (1984). Confidence intervals for the survival function using Cox's
proportional hazards model with covariates. Biometrics 40, 601-610.
Tsiatis, A. (1981). A large sample study of the estimate for the integrated
hazard function in Cox's regression model for survival data.
Annals of Statistics 9, 93-108.
Turnbull, B. W. (1974). Nonparametric estimation of a survivorship function
with doubly censored data.
Journal of the American Statistical Association 69, 169-173.
# Fit a Kaplan-Meier and plot it
fit <- survfit(Surv(time, status) ~ group, data=leukemia)
plot(fit, lty=2:3)
legend(100, .8, c("Maintained", "Nonmaintained"), lty=2:3)

# Fit a Cox proportional hazards model and plot the
# predicted survival curve at the average predictor
fit <- coxph(Surv(futime, fustat) ~ age, data=ovarian)
plot(survfit(fit), xlab="Survival in Days")

# Here is the data set from Turnbull
# There are no interval censored subjects, only left-censored (status=2),
# right-censored (status=0) and observed events (status=1)
tdata <- data.frame(time=c(1,1,1,2,2,2,3,3,3,4,4,4),
                    status=rep(c(1,0,2), 4),
                    n=c(12,3,2,6,2,4,2,0,2,3,3,5))
fit <- survfit(Surv(time, time, status, type='interval') ~ 1,
               data=tdata, weights=n)