Sample Data Sets For Survival Analysis

SUMMARY:

These data sets are included to illustrate the survival analysis methods in S-PLUS.

ARGUMENTS:

bladder
Study on time to recurrence of bladder cancer from Wei, Lin and Weissfeld (1989). The data frame has multiple rows per patient. The columns:
id
patient ID
rx
treatment group (1 = placebo, 2 = thiotepa)
number
the number of initial tumors
size
size of the largest initial tumor
start
time of entry into the study or of the previous recurrence
stop
time to event (months)
event
indicator of cancer recurrence (1) or censoring (0)
enum
number of recurrences of bladder cancer
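The start, stop, and event columns store each patient's follow-up as a sequence of intervals, one row per interval. A minimal Python sketch of that layout, assuming we know one patient's recurrence times and end of follow-up in months; the function name counting_process_rows and the input values are hypothetical and not part of S-PLUS:

```python
def counting_process_rows(recurrence_times, followup_end):
    """Split one patient's follow-up into (start, stop, event) intervals.

    Each recurrence closes an interval with event = 1. If follow-up
    continues past the last recurrence, a final interval ending at
    followup_end is added with event = 0 (censored).
    """
    rows = []
    start = 0
    for t in sorted(recurrence_times):
        rows.append((start, t, 1))  # interval ending in a recurrence
        start = t
    if followup_end > start:
        rows.append((start, followup_end, 0))  # censored final interval
    return rows


# Hypothetical patient: recurrences at months 6 and 14, last seen at month 30.
print(counting_process_rows([6, 14], 30))  # [(0, 6, 1), (6, 14, 1), (14, 30, 0)]
```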



capacitor
Simulated accelerated life test data for capacitors, from Meeker and Duke (1982). A data frame with columns:
days
time to failure or censoring (days)
event
indicator of failure (1) or censoring (0)
voltage
voltage at which the test was run
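A simple first look at accelerated life data compares failure rates across stress levels. The sketch below assumes, purely for illustration, an exponential (constant-hazard) lifetime at each voltage; the source does not specify a lifetime model. Under that assumption the maximum likelihood estimate of the rate with censored data is the number of failures divided by the total time on test. The function name and records are hypothetical:

```python
from collections import defaultdict


def exponential_rate_by_voltage(records):
    """Censored-data MLE of a constant failure rate at each voltage level.

    records: iterable of (days, event, voltage) tuples, where event is 1
    for a failure and 0 for a censored unit. Assuming exponential
    lifetimes, the MLE is failures / total time on test.
    """
    failures = defaultdict(int)
    exposure = defaultdict(float)
    for days, event, voltage in records:
        failures[voltage] += event   # count only observed failures
        exposure[voltage] += days    # censored units still add exposure
    return {v: failures[v] / exposure[v] for v in exposure}


# Hypothetical records, not taken from the capacitor data set.
rates = exponential_rate_by_voltage(
    [(100, 1, 200), (300, 0, 200), (50, 1, 250), (50, 1, 250)]
)
print(rates)
```

Censored units contribute their running time to the denominator but no failure to the numerator, which is why dropping them would overstate the rate.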



heart
The Stanford heart transplant data from Kalbfleisch and Prentice (1980). A data frame with each patient represented by two rows. The first entry for a patient has:
start
= 0
transplant
= 0
stop
time to transplant in days

The second entry for a patient has:
start
time to transplant
transplant
= 1
stop
time to death or censoring
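The two-row layout splits follow-up at the transplant day, so that transplant status can change over time. A minimal Python sketch, assuming times in days; the function name heart_rows is hypothetical. The single-row branch for a patient with no transplant is an assumption that goes beyond the description above:

```python
def heart_rows(transplant_day, end_day, died):
    """Build (start, stop, event, transplant) rows for one patient.

    transplant_day: day of transplant, or None if none took place.
    end_day: day of death or censoring.
    died: True if the patient died at end_day.
    """
    event = 1 if died else 0
    if transplant_day is None:
        # Assumption beyond the description above: a patient who never
        # received a transplant contributes one pre-transplant row.
        return [(0, end_day, event, 0)]
    return [
        (0, transplant_day, 0, 0),            # waiting period: no death
        (transplant_day, end_day, event, 1),  # post-transplant period
    ]


# Hypothetical patient: transplant on day 30, died on day 200.
print(heart_rows(30, 200, True))  # [(0, 30, 0, 0), (30, 200, 1, 1)]
```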

The other columns are:
event
indicator of death (1) or censoring (0)
age
age at acceptance in years (days/365.25), minus 48
year
date of acceptance in years since October 1, 1967 (days/365.25)
surgery
prior surgery (1 = yes, 0 = no)
id
patient ID
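The age and year columns are simple transformations of calendar dates. The Python sketch below reproduces the two formulas, assuming the raw inputs are a birth date and an acceptance date; those inputs and the function name heart_covariates are hypothetical:

```python
from datetime import date

ORIGIN = date(1967, 10, 1)  # reference date used for the year column


def heart_covariates(birth_date, acceptance_date):
    """Return (age, year) computed with the formulas above."""
    age_days = (acceptance_date - birth_date).days
    age = age_days / 365.25 - 48                          # centered at 48 years
    year = (acceptance_date - ORIGIN).days / 365.25       # years since origin
    return age, year


# Hypothetical patient born 1920-01-01 and accepted 1968-01-01.
print(heart_covariates(date(1920, 1, 1), date(1968, 1, 1)))
```

Centering age at 48 keeps regression coefficients interpretable near the middle of the age range rather than at age zero.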



leukemia
Data from Embury et al. (1977) on a trial evaluating the efficacy of maintenance chemotherapy for acute myelogenous leukemia. A data frame with columns:
time
length of remission after chemotherapy, that is, time to relapse or censoring (weeks)
status
indicator of relapse (1) or censoring (0)
group
treatment group, "maintained" or "nonmaintained"
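Data in this time/status form are commonly summarized with a Kaplan-Meier survival curve for each treatment group. A minimal, dependency-free Python sketch, assuming status = 1 marks an observed event and 0 a censored time; the function name and the numbers below are hypothetical and are not the leukemia data:

```python
def kaplan_meier(times, statuses):
    """Kaplan-Meier estimate as (time, survival) pairs at each event time.

    times: follow-up times; statuses: 1 = event observed, 0 = censored.
    Observations censored at time t are still counted as at risk at t.
    """
    data = sorted(zip(times, statuses))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every observation tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk  # conditional survival past t
            curve.append((t, surv))
        n_at_risk -= removed  # drop events and censorings at t
    return curve


# Hypothetical (time, status) values for two groups.
groups = {
    "maintained": ([9, 13, 13, 18], [1, 1, 0, 1]),
    "nonmaintained": ([5, 5, 8, 8], [1, 1, 1, 1]),
}
for name, (times, statuses) in groups.items():
    print(name, kaplan_meier(times, statuses))
```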



lung
Lung cancer data from Mayo Clinic (Loprinzi et al. 1994). A data frame with columns:
inst
code for the institution at which the patient was hospitalized
time
survival time (days)
status
indicator of death (2) or censoring (1)
age
patient's age
sex
1 = male, 2 = female
ph.ecog
physician's estimate of the ECOG performance score (0-4)
ph.karno
physician's estimate of the Karnofsky score, an alternative to the ECOG performance score
pat.karno
patient's estimate of his/her Karnofsky score
meal.cal
calories consumed at meals excluding beverages and snacks
wt.loss
weight loss in the last six months
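Unlike the other data sets here, lung codes status as 2 (death) and 1 (censoring). When combining or comparing data sets, it can help to recode it to the 1/0 convention used elsewhere. A minimal Python sketch; the function name is hypothetical:

```python
def lung_event_indicator(status):
    """Recode lung status (2 = death, 1 = censored) to 1/0."""
    if status not in (1, 2):
        raise ValueError(f"unexpected status code: {status!r}")
    return status - 1


# Hypothetical status column.
print([lung_event_indicator(s) for s in [2, 1, 2, 2]])  # [1, 0, 1, 1]
```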



ovarian
Data from Edmunson et al. (1979) on ovarian cancer. A data frame with columns:
futime
number of days in study
fustat
indicator of death (1) or censoring (0)
age
patient age in years (days/365.25)
residual.dz
an indicator of the extent of the residual disease
rx
treatment given
ecog.ps
ECOG performance status, a measure of functional status on the Eastern Cooperative Oncology Group scale



The survival analysis chapter in the S-PLUS documentation describes these data sets further and illustrates survival analysis methods with them.

REFERENCE:

Edmunson, J. H., Fleming, T. R., Decker, D. G., Malkasian, G. D., Jefferies, J. A., Webb, M. J., and Kvols, L. K. (1979). Different chemotherapeutic sensitivities and host factors affecting prognosis in advanced ovarian carcinoma vs. minimal residual disease. Cancer Treatment Reports 63, 241-247.

Embury, S. H., Elias, L., Heller, P. H., Hood, C. E., Greenberg, P. L., and Schrier, S. L. (1977). Remission maintenance therapy in acute myelogenous leukemia. Western Journal of Medicine 126, 267-272.

Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.

Loprinzi, C. L., Laurie, J. A., Wieand, H. S., Krook, J. E., Novotny, P. J., Kugler, J. W., Bartel, J., Law, M., Bateman, M., Klatt, N. E., Dose, A. M., Etzell, P. S., Nelimark, R. A., Mailliard, J. A., and Moertel, C. G. (1994). Prospective evaluation of prognostic variables from patient-completed questionnaires. Journal of Clinical Oncology 12, 601-607.

Meeker, W. Q., Jr. and Duke, S. D. (1982). User's Manual for CENSOR - A User-Oriented Computer Program for Life Data Analysis. Statistical Laboratory, Iowa State University, Ames, IA 50011.

Wei, L. J., Lin, D. Y., and Weissfeld, L. (1989). Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. Journal of the American Statistical Association 84, 1065-1073.