pyears(formula, data, weights, subset, na.action, ratetable=survexp.us, scale=365.25, expect="event", model=FALSE, x=FALSE, y=FALSE)
Surv object containing the follow-up time and an event indicator.
The predictors consist of optional grouping variables separated by + operators (exactly as in survfit), time-dependent grouping variables such as age (specified with tcut), and optionally a ratetable term.
This latter matches each subject to his/her expected cohort.
formula
,
or in the
subset
and the
weights
argument.
subset
argument has been used.
Default is
options()$na.action
.
survexp.uswhite
.
"event"
then the output table includes
the expected number of events.
If
"pyears"
then the output table
includes the expected number of person-years of observation.
This is only valid with a rate table.
TRUE
then the model frame
is included as component
model
in the object returned by the function.
TRUE
then the model matrix
is returned as component
x
in the object returned by the function.
TRUE
then the response
is returned as component
y
in the object returned by the function.
pyears
array.
Surv
object.
expect="pyears"
).
This will be present only if there was a
ratetable
term.
pyears
array.
This is often useful as an error check;
if there is a mismatch of units between two variables,
nearly all the person years may be off table.
na.action
routine, if any.
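The off-table count described above can serve as a quick unit check. The following is a minimal sketch, assuming the survival package is installed and that the component is named offtable, as it is in current releases of that package. The data frame d and its values are made up for illustration. Age is deliberately left in days while the tcut cutpoints are in years, so every subject starts beyond the last cutpoint.

```r
library(survival)

# Hypothetical data: follow-up in years, but age mistakenly left in days.
d <- data.frame(futime = c(10, 3, 5),
                age_days = c(4, 1, 30) * 365.25)

# Cutpoints are in years, so the ages (in days) fall past the last cutpoint.
bad <- pyears(futime ~ tcut(age_days, c(0, 2, 10, 25, 100)),
              data = d, scale = 1)

# Same call with age converted back to years.
good <- pyears(futime ~ tcut(age_days / 365.25, c(0, 2, 10, 25, 100)),
               data = d, scale = 1)

# Share of all follow-up that fell outside the table in each fit.
total <- sum(d$futime)
off_share <- c(bad = bad$offtable, good = good$offtable) / total
print(off_share)
```

A share close to 1 for the mismatched call, against a share close to 0 for the corrected one, is the pattern that signals a units problem.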
Because pyears may have several time variables, it is necessary that all of them be in the same units.
For instance, in the call
py <- pyears(futime ~ rx + ratetable(age=age, sex=sex, year=entry.dt))
with a ratetable whose natural unit is days, it is important that futime, age and entry.dt all be in days.
Given the wide range of possible inputs, it is difficult for the routine to do sanity checks of this aspect.
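Since the routine cannot check units itself, the conversion has to happen before the call. The sketch below uses only base R. The data frame raw, its column names and its values are hypothetical; only the variable names age, futime and entry.dt come from the call above.

```r
# Hypothetical raw data recorded in mixed units.
raw <- data.frame(age_years = c(62.5, 71),
                  futime_months = c(18, 30),
                  entry = as.Date(c("1985-03-01", "1988-07-15")))

days_per_year <- 365.25

# Convert every time variable to days before building the pyears formula.
dat <- data.frame(age = raw$age_years * days_per_year,
                  futime = raw$futime_months * days_per_year / 12,
                  entry.dt = raw$entry)   # Date objects already count in days

print(dat)
```

Doing the conversion once, in a separate data frame, keeps the formula itself free of scattered scaling factors, which is where unit mismatches tend to creep in.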
A special function tcut is needed to specify time-dependent cutpoints.
For instance, assume that age is in years, and that the desired final arrays have as one of their margins the age groups 0-2, 2-10, 10-25, and 25+.
A subject who enters the study at age 4 and remains under observation for 10 years will contribute follow-up time to both the 2-10 and 10-25 subsets.
If cut(age, c(0,2,10,25,100)) were used in the formula, the subject would be classified according to his starting age only.
The tcut function has the same arguments as cut, but produces a different output object which allows the pyears function to correctly track the subject.
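The difference between cut and tcut can be seen on a small made-up cohort. This is a sketch, assuming the survival package is installed. The data frame d, the breaks vector and the column names of comparison are illustrative only. The first subject is the one from the text: entry at age 4 with 10 years of follow-up.

```r
library(survival)

# Hypothetical cohort: age at entry and follow-up time, both in years.
d <- data.frame(age = c(4, 1, 30, 12), futime = c(10, 3, 5, 2))
breaks <- c(0, 2, 10, 25, 100)

# cut(): each subject's whole follow-up is assigned to the starting age group.
static <- pyears(futime ~ cut(age, breaks), data = d, scale = 1)

# tcut(): follow-up is split as the subject ages across the cutpoints.
dynamic <- pyears(futime ~ tcut(age, breaks), data = d, scale = 1)

comparison <- data.frame(age_group = c("0-2", "2-10", "10-25", "25+"),
                         by_cut = as.vector(static$pyears),
                         by_tcut = as.vector(dynamic$pyears))
print(comparison)
```

With cut, all 10 years of the first subject are counted in the 2-10 group. With tcut, 6 of those years should land in 2-10 and the remaining 4 in 10-25.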
The results of pyears are normally used as input to further calculations.
The print routine, therefore, is designed to give only a summary of the table.
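As a small illustration of using the result as input to further calculations, the sketch below pulls the event and person-years arrays out of a fit and computes crude rates. It assumes the survival package is installed. The data frame d, the grouping variable rx and all values are made up.

```r
library(survival)

# Hypothetical data: follow-up in years, event indicator, treatment arm.
d <- data.frame(futime = c(2, 5, 3, 4, 6, 1),
                status = c(1, 0, 1, 0, 1, 0),
                rx = factor(c("A", "A", "A", "B", "B", "B")))

fit <- pyears(Surv(futime, status) ~ rx, data = d, scale = 1)

# Pull the arrays out of the fit and compute crude event rates per person-year.
rates <- data.frame(rx = levels(d$rx),
                    events = as.vector(fit$event),
                    pyears = as.vector(fit$pyears))
rates$rate <- rates$events / rates$pyears
print(rates)
```

The same pattern, flattening the arrays into a data frame, is what a regression on the table would start from.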
The example below is from a study of hip fracture rates from 1930 to 1990
in Olmsted County, Minnesota.
Survival post hip fracture has increased over that time,
but so has the survival of elderly subjects in the population at large.
A model of relative survival helps to clarify what has happened:
Poisson regression is used, but replacing exposure time with expected
exposure (for an age and sex matched control).
Death rates change with age, of course, so the result is carved into
1 year increments of time.
Males and females were done separately.
# Example #1:
attach(makehips)
temp1 <- tcut(dt.fracture, seq(from=julian(1,1,30), by=365.25, length=61))
temp2 <- tcut(age*365.25, 365.25*(0:105))   # max age was > 100!
pfit.m <- pyears(Surv(futime, status) ~ temp1 + temp2 +
                 ratetable(age=age*365.25, year=dt.fracture, sex=1),
                 subset=(sex==1), ratetable=survexp.minnwhite)

# Now, convert the arrays into a data frame:
tdata <- data.frame(age=(0:105)[col(pfit.m$pyears)],
                    yr=(1930:1990)[row(pfit.m$pyears)],
                    y=c(pfit.m$event),
                    time=c(pfit.m$expect))

# Fit the gam model:
gfit.m <- gam(y ~ s(age) + s(yr) + offset(log(time)), family=poisson, data=tdata)
plot(gfit.m, se=TRUE)

# Example #2
# Create the hearta data frame (last record for each subject):
hearta <- by(heart, heart$id, function(x) x[x$stop == max(x$stop),])
hearta <- do.call("rbind", hearta)

# Produce pyears table:
pyears(stop/365.25 ~ tcut(age + 48, c(0,50,60,70,100)) + surgery, data=hearta, scale=1)