Create a Multiblock or a Multilayer Correlation Design

DESCRIPTION:

Creates a correlation design object for a fixed design, a nested design, or a balanced or unbalanced block design.

USAGE:

corDesign(design.option, n.layer=1, correlation.matrix=NULL, 
    size, type.layer=rep("exchangeable", n.layer), block, data) 

REQUIRED ARGUMENTS:

design.option
a character string specifying a correlation design option. The choices available are:

"block"
balanced or unbalanced block design;
"nested"
nested design (special case of block design);
"fixed"
fixed correlation matrix;
"indep"
independent correlation matrix (special case of fixed design).

OPTIONAL ARGUMENTS:

n.layer
a non-negative integer specifying the number of correlation structures (layers) in nested designs and in balanced or unbalanced block designs. The default is 1, indicating either a fixed design as specified in argument correlation.matrix, a 1-layer generic structured correlation design as specified in type.layer, or a 1-layer block-wise correlation design as specified in block and type.layer.

An independent correlation matrix has a 0-layer correlation parameterization. In layer-wise designs, an independent structure may be used and counted as a layer of structure in the construction of the block-layer relationship, but it is not a layer of unknown parameterization, and the computation only considers layers with nonzero correlation.
correlation.matrix
the correlation matrix for a fixed design. The argument n.layer is set to 1 for fixed designs, and arguments size, type.layer and block are ignored. A matrix with zero correlation is a special case of fixed designs and n.layer of such designs is set to zero.
The correlation matrix uses its integer "dimnames" attribute for its row and column names. Such integers are the indexes of record names, which are used to match the record id as specified in the cluster argument of a gee call to identify observations within clusters.
size
a positive integer specifying the number of rows of a correlation design matrix. The default for a fixed design is the number of rows in correlation.matrix. In other cases, the default is a 2-level design for each layer, i.e. a (2 x 2) correlation matrix for n.layer=1.
type.layer
a vector of character strings or a data frame in which each element or row represents a correlation structure of a layer. The number of rows has to be consistent with the value specified in argument n.layer. The choices for generic correlation structures are:

"AR"
autoregressive correlation with discrete occasions;
"contAR"
autoregressive correlation with continuous occasions;
"exchangeable"
exchangeable correlation;
"independent"
independent correlation;
"nonstationary"
nonstationary correlation;
"stationary"
stationary correlation;
"unstruct"
unstructured correlation.
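
As a rough illustration of what two of these generic structures look like, the following base-code sketch builds an exchangeable matrix and a discrete-occasion autoregressive matrix. It assumes the usual textbook definitions (a common off-diagonal value rho, and rho raised to the lag |i - j|). The helper names exch.cor and ar.cor are hypothetical and are not part of the library, whose internal parameterization may differ.

```r
# Hypothetical helpers for illustration only; not library functions.

# Exchangeable: every off-diagonal entry equals rho.
exch.cor <- function(size, rho) {
  m <- matrix(rho, size, size)
  diag(m) <- 1
  m
}

# AR with discrete occasions: entry (i, j) is rho^|i - j|.
ar.cor <- function(size, rho) {
  lag <- abs(outer(seq_len(size), seq_len(size), "-"))
  rho^lag
}

exch3 <- exch.cor(3, 0.5)
ar4 <- ar.cor(4, 0.5)
print(exch3)
print(ar4)
```

Both matrices are symmetric with a unit diagonal. Only the off-diagonal pattern differs between the two structures.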

By default, the data is considered to have been collected from a balanced hierarchical design and all variables used in the design are treated as ordered. In this case type.layer should be a vector of n.layer character strings chosen from one of the generic structures.

In more complicated cases such as unbalanced or multilayer designs, type.layer and block are required arguments. In these cases type.layer should be a data frame with n.layer rows. Each row represents a layer of structured correlation, and the sequence of the rows corresponds to the order of the layers. By default, the integer row.names give this order. The numeric order of a layer is that layer's id, layer.id. For a nested design, layer.id indicates the ordering in the hierarchical structure. That is, the first row, layer.id=1, is the first and highest layer, defined by default as the layer closest to the diagonal of the design matrix. The last row is the lowest layer in this definition, although it is the top level of the hierarchical structure. A layer that comes earlier in layer.id order always has precedence over a later one, so the first layer has the highest precedence.

For nested designs, type.layer provides sufficient information and the argument block can be omitted. Specifying a correlation structure for a layer.id requires a correlation type. The stationary, nonstationary, AR and unstruct structures are coordinate-dependent, and some designs require a covariate or a parameter to complete the specification. For example, the "AR" or "contAR" structures require additional time variables, and the "stationary" or "nonstationary" structures require a parameter to specify non-zero correlations. In such cases, type.layer should be a data frame with three variables:

"type"
a vector of character strings chosen from the generic correlation structures listed above;
"x.layer"
a vector of character strings, each string being the name of a factor or variable associated with the layer. This is used to identify the levels or the occasions of observations within the layer.id for nested designs or for correlation structures requiring variables or parameters, and is optional for a 1-layer design (n.layer=1). NA or "NA" is permitted, with each NA indicating that the parameterization of the corresponding layer.id does not require any variable. NAs are replaced by record id's.
Variables in x.layer have to be included either in the data or in the search list.
"par"
a vector of numeric values as required by type. This column is optional, and NA is permitted for each row.

The default is exchangeable correlation for all layer.id's.
block
a data frame to list the blocks in block-wise and layer-wise correlation designs. Each row represents a block in the upper triangle of the correlation (modeling) design matrix. Each block has an id, an associated layer, a beginning cell and an ending cell. For each block, the upper-left corner is the beginning cell and the lower-right corner is the ending cell. So, the coordinates of beginning and ending cells specify the position of a block. As a rule of thumb, a block has precedence over another block if that block's column-coordinate is smaller.

The data frame specifying block information should have the following five columns: layer.id, begin.row, begin.col, end.row, end.col. The layer.id associates a block with a layer. The other four columns specify the locations of blocks in the upper triangle of the correlation design matrix. The row.names of this data frame should identify the individual blocks. Note that by default, integers are used for row.names in a data frame.

The layer.id should correspond to the id or order of the layer as specified in the argument type.layer, so that each block has an associated layer. A layer.id may be associated with more than one block. Blocks of the same layer.id have the same correlation structure as defined in argument type.layer. The (begin.row, begin.col) are the (row, column)-coordinates of the beginning cell, and the (end.row, end.col) are the (row, column)-coordinates of the ending cell. The values of these coordinates should lie between one and the argument size.
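
To make the block coordinates concrete, the sketch below checks a block data frame against the rules just described and paints each block onto a size x size map of layer ids. It is an illustration only: layer.map is a hypothetical helper, not a library function. It also rests on two assumptions: only cells strictly above the diagonal are treated as belonging to a layer, and overlaps are resolved in favor of the first layer, following the precedence described under type.layer.

```r
# Hypothetical helper for illustration; not part of the library.
layer.map <- function(block, size) {
  coords <- c("begin.row", "begin.col", "end.row", "end.col")
  stopifnot(all(c("layer.id", coords) %in% names(block)))
  # Coordinates must lie between one and size.
  vals <- unlist(block[coords])
  stopifnot(all(vals >= 1), all(vals <= size))
  # The beginning cell is the upper-left corner of the block.
  stopifnot(all(block$begin.row <= block$end.row),
            all(block$begin.col <= block$end.col))
  m <- matrix(0, size, size)  # 0 means no layer assigned
  # Paint later layers first so that layer 1 is written last and
  # wins any overlap (assumed precedence rule).
  for (i in order(block$layer.id, decreasing = TRUE)) {
    rows <- block$begin.row[i]:block$end.row[i]
    cols <- block$begin.col[i]:block$end.col[i]
    m[rows, cols] <- block$layer.id[i]
  }
  # Assumption: keep only the strict upper triangle.
  m[!upper.tri(m)] <- 0
  m
}

# Two blocks in layer 1 and one block in layer 2, size 8.
blk <- data.frame(layer.id  = c(1, 1, 2),
                  begin.row = c(1, 5, 1), begin.col = c(2, 6, 5),
                  end.row   = c(3, 7, 4), end.col   = c(4, 8, 8))
map8 <- layer.map(blk, size = 8)
print(map8)
```

Printing the map before calling corDesign is a cheap way to see whether the intended cells are covered and whether any cell of the upper triangle was left without a layer.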
data
a data frame including indexes and variables for identifying the records of a complete cluster. The data should include all variables specified in type.layer. The number of rows of the data must be the same as the row dimension of the correlation matrix of a complete cluster. By default, an index variable record.names is generated, which is the record identification of a complete cluster. This index variable is used to match the second variable of the cluster argument of a gee or a geeDesign call, so that the records of any incomplete cluster and its correlation matrix can be identified.

VALUE:

An object of class "corDesign" is returned. See corDesign.object for details.

SIDE EFFECTS:

A correlation design only needs to specify parameters for one complete block (a pivot) in each layer. In multiblock and multilayer designs, the first block of each layer is required to be a complete block. When a correlation structure such as "unstruct" requires specification of a variable, the values of the variable have to be consistent across blocks. If a variable is not provided in x.layer (see argument type.layer), the record id in data is used by default. This default is adequate for an "unstruct" layer with only one block but may not be correct for multiblock and multilayer designs.
An "unstruct" layer with multiple blocks requires an indexing variable to specify the (row,column)-coordinate of each parameter. The (row,column)-coordinates in each block have to be the same as those specified by the indexing variable for a pivot block. Any cluster in an unbalanced data set is a realization of all pivot blocks and can be identified by the record id's and other variables in x.layer.

DETAILS:

The output of corDesign is mainly used to specify the correlation argument for the geeDesign function. Such correlation designs are desirable when the correct correlation structure is partly known or is important in specifying the working correlation in GEE. This is particularly the case for multivariate correlated data and multilevel data. It also helps with efficiency concerns when the number of clusters is small and the clusters are large and complicated. A nested design in this context means that a layer of correlation is nested in another layer. This is not exactly the same notion as in experimental designs. Here, recursively nested structures are allowed.

corDesign is meant for arbitrary unbalanced designs, and the arguments are complicated as a result. It is recommended that users try simple nested designs first and always review the summary of the corDesign object before calling geeDesign and gee.fit. For more details, use as.list to see all components as described in the help file of corDesign.object.

If n.layer is greater than one, the design has a multilayer structured correlation, of which nested designs and unbalanced block designs are examples. These designs have more than one layer of correlation structures, and each layer has an id, a correlation type, and within-layer block information as specified by arguments type.layer and block. Each layer defines a unique parameterization for correlation with a type of structure chosen from the generic structures listed above. A layer may have one or more blocks, and blocks of the same layer have the same parameterization and correlation type. Different layers have distinct parameterizations, so that blocks of different parameterizations have to be considered as different layers.

Cells with zero correlation in the design matrix may be used to construct the representation of multilayer designs, but they do not constitute a layer of correlation parameterization. The diagonal of the design matrix is also not considered as a layer of correlation.

For designs with more than one layer, arguments type.layer and block are necessary. If they are not provided, a nested (2^n.layer) factorial design assuming 2 levels for each layer is generated. This design has (2^n.layer-1) blocks in the upper triangle of the correlation matrix. See examples.
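
The block count can be checked with a small sketch. The hypothetical helper nested.blocks below enumerates one layout that is consistent with the description: layer k has 2^(n.layer-k) blocks, each of 2^(k-1) rows by 2^(k-1) columns, with layer 1 closest to the diagonal. Whether corDesign generates exactly this layout and ordering is an assumption. Only the total of (2^n.layer-1) blocks is taken from the text.

```r
# Hypothetical helper for illustration; not part of the library.
nested.blocks <- function(n.layer) {
  out <- NULL
  for (k in seq_len(n.layer)) {
    h <- 2^(k - 1)                    # rows (and columns) per block
    j <- seq_len(2^(n.layer - k)) - 1 # 0-based block index in layer k
    out <- rbind(out,
                 data.frame(layer.id  = k,
                            begin.row = j * 2 * h + 1,
                            begin.col = j * 2 * h + h + 1,
                            end.row   = j * 2 * h + h,
                            end.col   = (j + 1) * 2 * h))
  }
  out
}

nb <- nested.blocks(3)
print(nb)
print(nrow(nb) == 2^3 - 1)
```

For n.layer=3 this gives four single-cell blocks in layer 1, two (2 x 2) blocks in layer 2 and one (4 x 4) block in layer 3, which together fill the upper triangle of an (8 x 8) matrix.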

If argument type.layer has a layer variable, x.layer, it must be in the search path or in the data frame when gee is called. In a gee call with unbalanced data, the record id specified in the argument cluster is used to match the row names of the "dimnames" attribute of the correlation matrix or the design matrix.
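
The matching of record ids against the "dimnames" attribute can be pictured with plain matrix indexing. The sketch below is not the library's code. The matrix values and the record ids are made up, and the example only shows how the correlation matrix of an incomplete cluster can be read off as a submatrix of the complete-cluster matrix.

```r
# Complete-cluster correlation matrix with four records
# (hypothetical values, labeled by integer record ids).
R <- 0.3^abs(outer(1:4, 1:4, "-"))
dimnames(R) <- list(1:4, 1:4)

# An incomplete cluster observed only at records 1, 3 and 4.
record.id <- c(1, 3, 4)

# Match the record ids to the row names, then take the submatrix.
idx <- match(record.id, as.integer(rownames(R)))
R.sub <- R[idx, idx]
print(R.sub)
```

The submatrix keeps the row and column labels of the records that were observed, so each remaining entry still refers to the same pair of records as in the complete cluster.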

REFERENCES:

Chao, E. C. (2003). Structured correlation designs in modeling clustered data. Insightful technical report.

EXAMPLES:

## A nested 2^3 factorial design 
corDesign(design.option="nested", n.layer=3)
 
## A 1-layer design 
ex1 <- data.frame(layer.id="1", begin.row=1,  
  begin.col=2, end.row=11, end.col=12)
corDesign(design.option="block", size=12, n.layer=1, type.layer="AR", block=ex1)  

## A 2-layer block design with one block per layer
ex2 <- data.frame(layer.id = c(1, 2), 
                  begin.row = c(1, 3), begin.col = c(1,3), 
                  end.row = c(2, 5), end.col = c(2, 5))

## An alternative way for large number of layers or blocks
ex2 <- data.frame(c(1,2),t(matrix(c(1,1,2,2,3,3,5,5),4)))
names(ex2) <- c("layer.id","begin.row","begin.col","end.row","end.col")

ex2.design <- corDesign(design.option="block", size=5, n.layer=2,  
  type.layer=data.frame(type=c("AR","exchangeable")), block=ex2) 

summary(ex2.design)

as.list(ex2.design)

## Use the design as the working correlation in a GEE fit
Seizure.Subject <- recordDesign(cluster = "Subject", data = Seizure)

ex2.gee <- geeDesign(y ~ Time + group, cluster = cbind(clusterID,recordID), 
  variance = "glm.scale", family =  "poisson", link = "log", 
  correlation = ex2.design, data = Seizure.Subject)

gee.fit(ex2.gee)

## A layer of nonstationary correlation nested in a layer of unstructured correlation
ex1 <- data.frame(t(matrix(c(1,1,2,3,4,1,5,6,7,8,2,1,5,4,8),5)))
names(ex1) <- c("layer.id","begin.row","begin.col","end.row","end.col")

type.1 <- data.frame(type=c("nonstationary","unstruct"),
  x.layer=c("record","xx"),par=c(2,NA))

data.1 <- data.frame(record=rep(1:4,2),xx=1:8)

xx1 <- corDesign(design.option = "block", n.layer = 2, 
  size = 8, type.layer = type.1, block = ex1, data=data.1)

summary(xx1)