"gee"
that represents a fit of a Generalized Estimating Equation model.
gee(formula, cluster, variance, data=sys.parent(), family=gaussian, link=NULL, correlation="independent", start=NULL, contrasts=NULL, subset, na.action=na.omit, control=list(algorithm=2))
glm function.
cbind(cluster.id, record.id) to specify, in which the variables have to be in the search path or in the data frame entered in data.
The first column is the cluster id and the second column is the record id.
Each row of the matrix corresponds to the identification of an observation in a cluster.
Observations within the same cluster might be correlated, while observations from
different clusters are uncorrelated.
All of the observations within the same cluster are assumed to have the same variance.
The record id is important when the data are unbalanced or the correlation is coordinate-dependent, i.e. discrete time AR, stationary, nonstationary and unstructured correlation.
In these cases, the ordering of observations within a cluster has an impact on the results.
For unbalanced data, all unique record ids together constitute a complete cluster, and each cluster is a subset of the complete cluster.
In modeling repeated measures with coordinate-independent correlation structures such as independent, exchangeable and continuous AR, the record id can be arbitrary.
Only for balanced data does the default generate a vector of integers from 1 to the size of a cluster for record.id.
In all other cases, provide two variables for this argument.
The function recordDesign is useful for creating record ids for some data.
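The default record id for balanced data can be pictured with a small sketch in plain R/S syntax. The sizes and the names n.clusters and n.records are hypothetical, and this is only an illustration of the layout described above, not the code that gee or recordDesign actually runs.

```r
## Hypothetical balanced layout: 3 clusters with 4 records each.
n.clusters <- 3
n.records <- 4

## Cluster id: 1,1,1,1,2,2,2,2,3,3,3,3
cluster.id <- rep(1:n.clusters, each = n.records)

## Record id: 1,2,3,4 repeated for every cluster, i.e. the integers
## from 1 to the cluster size, identical across clusters.
record.id <- rep(1:n.records, times = n.clusters)

## Two-column matrix in the shape described for the cluster argument:
## first column is the cluster id, second column is the record id.
cluster <- cbind(cluster.id, record.id)
```

Each row of the resulting matrix identifies one observation within one cluster. For unbalanced data no such regular pattern exists, which is why both columns must then be supplied explicitly.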
"glm.scale"
and
"glm.1"
.
Enter "glm.scale" to indicate that the variance follows the structure of a generalized linear model multiplied by a scale parameter.
If the initial value of the scale parameter is known, then enter that value.
If the scale parameter is known to be exactly equal to 1, enter "glm.1".
Use varDesign for more complicated variance structures.
formula, subset, cluster, variance, and correlation. If data is missing, then the variables should be on the search list.
glm "family" object or character string identifying the family. Families supported are "gaussian", "binomial", "poisson", "Gamma", and "inverse.gaussian". The default is "gaussian".
family.
For example, use "log" for poisson, and "logit" or "probit" for binomial. Not all links are supported for each family. The supported links are "logit", "probit", "cloglog", "log", "identity", "power(x)", "inverse", and "1/mu^2".
The "gaussian" and "inverse.gaussian" families have only one supported link, "identity" and "1/mu^2", respectively. The "power(x)" link is parameterized by a non-negative real value x and may be used with all families.
geeDesign for more complicated correlation structures.
Typically, a character string is entered to specify a correlation structure.
The following character strings are permitted:
autoregressive correlation with discrete occasions,
a for correlation r under "AR" is r=a^d and under "contAR" is r=exp(-d/a), where d is the difference between two time points.
For other structures, each cell of the correlation matrix is either r=a or r=0, e.g. the cells off the non-zero bands in stationary.
The "stationary" and "nonstationary" structures require a numeric parameter to specify the number of bands, and the default is 1.
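The two autoregressive formulas above can be made concrete with a small sketch in base R/S syntax. The time points and the parameter values a.disc and a.cont are hypothetical, and the code only evaluates r=a^d and r=exp(-d/a) cell by cell; it is not the routine gee uses internally.

```r
## Hypothetical observation times within one cluster.
times <- c(1, 2, 3, 4)

## d: absolute difference between every pair of time points.
d <- abs(outer(times, times, "-"))

## Discrete-time "AR": r = a^d, with an assumed parameter a = 0.5.
a.disc <- 0.5
R.ar <- a.disc^d

## Continuous-time "contAR": r = exp(-d/a), with an assumed parameter a = 2.
a.cont <- 2
R.contar <- exp(-d / a.cont)
```

Both matrices have ones on the diagonal, because d = 0 there, and the correlation decays as the gap between time points grows.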
Except for "independent" and "exchangeable", all structures are either coordinate or covariate dependent and thus require additional variables to identify the time variable, locate missed experiments in unbalanced data, or resolve the ordering of occasions in balanced data.
By default, the integer record.id in cluster is used for indexing the discrete time "AR", "nonstationary", "stationary" and "unstruct" cases.
If the data are not both balanced and sorted according to cluster and record id, and the structure is one of "AR", "contAR", "nonstationary", "stationary" or "unstruct", enter a list for the correlation argument with the following component names:
"type": one of the above correlation structures.
"x.layer": the name of the factor or variable that identifies the levels or coordinates of observations within clusters. If "x.layer" is not provided, the record.id in cluster is used whenever the specified type requires a variable or an index. This default might not be suitable for certain correlation structures.
"par": the parameter value, such as the number of bands required by the "stationary" or "nonstationary" correlation structures.
Examples of the correlation argument:
correlation = "AR"
correlation = list(type = "contAR", x.layer = "time")
correlation = list(type = "stationary", par = 2)
"na.omit"
.
geeControl object to control the iteration procedures. These include algorithm, tolerance.reg, tolerance.cor, maxit, trace, and sorted.
geeControl. See the help file for the geeControl function for details.
"gee"
is returned. See gee.object for details.
A complete specification of a GEE model includes the mean, the variance, and a working correlation matrix.
A simple form of GEE model uses the mean and variance structures of a Generalized Linear Model, and these can be specified by the arguments family, link and variance.
These are similar to, but not exactly the same as, the family and link arguments of the glm function.
The link function of a family relates the mean to the linear predictor, which contains the regression parameters of interest.
The linear predictor is specified in the required argument formula.
The argument cluster identifies independent clusters by cluster id and record id.
All of the observations within the same cluster are assumed to have the same variance.
For some simple cases, the function recordDesign can sort the data and generate these variables.
By default, the initial values of the regression coefficients are estimated by glm, assuming independent clusters.
If the initial values are known, pass them in the argument start.
For each family, the variance is assumed to be a known function of the mean multiplied by a scale parameter.
If the scale is exactly 1, set the argument variance to "glm.1".
If the scale is a known constant, enter a positive number for variance.
Otherwise, the scale is an unknown parameter, so enter "glm.scale".
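As a hedged illustration, the variance assumption above can be written out as follows. The symbols y_ij, mu_ij, phi and V are notation introduced here rather than taken from this help file, and the variance functions listed are the standard generalized linear model ones, assumed to carry over to the supported families.

```latex
\operatorname{Var}(y_{ij}) = \phi \, V(\mu_{ij}),
\qquad
V(\mu) =
\begin{cases}
1            & \text{gaussian} \\
\mu(1 - \mu) & \text{binomial} \\
\mu          & \text{poisson} \\
\mu^{2}      & \text{Gamma} \\
\mu^{3}      & \text{inverse.gaussian}
\end{cases}
```

In this notation, "glm.1" fixes phi at 1, a positive number fixes phi at that value, and "glm.scale" leaves phi to be estimated.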
The correlation matrix is parameterized by a vector.
The estimates of the regression coefficients and correlation parameters are obtained from a pair of estimating equations (Prentice, 1988).
Fisher scoring is used in the iterations.
Therefore, estimates of the regression coefficients and the correlation vector, together with their variance estimates, are available.
This algorithm is called GEE2. In Liang and Zeger (1986), the correlation vector is estimated by the method of moments, and this algorithm is called GEE1.
Alternatively, the covariance can be held constant during the iterations and the nuisance parameters estimated by the method of moments after the regression parameters have converged. This algorithm is called GEE0.
The algorithm is selected through the control argument, as in control=list(algorithm=2).
The control argument can also set other parameters such as convergence criteria. In addition, if the data are already sorted, enter control=list(sorted=T) to avoid unnecessary sorting.
The argument correlation specifies the working correlation matrix.
In general, stationary and nonstationary require a variable and a parameter to identify the corresponding parameterization.
In this case, enter correlation as a list with components type, x.layer, and par, in which type is one of the correlation structures; x.layer is the name of a variable that identifies records; and par is an integer giving the number of bands of the stationary or nonstationary structure.
Note that the variable named by x.layer must be a column in the data frame data or be in the search path.
Only when the data are balanced and sorted according to cluster id and record id can the default record id be used for x.layer.
The default uses the integers from 1 to the size of an individual cluster, and this record id is the same for all clusters.
For unbalanced data, an x.layer that identifies the records in a cluster is required.
For discrete AR, the time variable usually serves as this x.layer, and it might not be the record id.
On the other hand, a continuous time variable is not necessarily the same as a discrete time or a record id and should be avoided for such purposes in discrete or continuous AR.
For example, when unstructured correlation is applied to unbalanced longitudinal data, a continuous time variable cannot replace the record id in identifying the parameters of the unstructured correlation across different clusters. So, a correct x.layer is required.
To fit a more complicated model, see geeDesign, which provides advanced methods for variance structures and correlation structures in modeling overdispersed and hierarchical data.
Liang, K. Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22.
Prentice, R. L. (1988). Correlated binary regression with covariates specific to each binary observation. Biometrics 44, 1033-1048.
## Create clusterID and recordID and sort the data; add an offset
## based on the no. of weeks of observation, baseline=8, treatment=2
Seizure.Subject <- recordDesign("Subject",
    data.frame(Seizure, offset=rep(log(c(8,2,2,2,2)),59)))
gee.out <- gee(y~group+offset(offset), cluster=cbind(clusterID,recordID),
    variance="glm.scale", data=Seizure.Subject, family=poisson, link=log,
    correlation="exchangeable", contrasts=list(group=contr.treatment),
    control=geeControl(trace = T))

## Add baseline indicator to isolate a baseline effect
Seizure1.Subject <- data.frame(Seizure.Subject, post=rep(c(0,1,1,1,1),59))
gee.out <- gee(y~group*post+offset(offset), cluster=cbind(clusterID,recordID),
    variance="glm.scale", family="poisson", link="log", data=Seizure1.Subject,
    correlation=list(type="stationary",par=4), subset=Subject!=49,
    contrasts=list(group=contr.treatment))
summary(gee.out)

## For a known scale in variance structure
SpruceGrpd.Subject <- recordDesign("Subject", na.omit(SpruceGrpd))
gee.out <- gee(y~Time + group, cluster=cbind(clusterID,recordID),
    variance=0.02, family=Gamma, link="power(1.5)", data=SpruceGrpd.Subject,
    correlation=list(type="contAR", x.layer="Time"),
    contrasts=list(group=contr.treatment),
    control=list(algorithm=2))