Conditional Gaussian Model for Complete Data

DESCRIPTION:

Compute conditional Gaussian model parameter estimates for data with no missing values.

USAGE:

completeCgm(data, margins, gauss, design, optData, subset, prior = 1, 
    start = NULL, control = emCgm.control(), contrasts = NULL) 

REQUIRED ARGUMENTS:

data
a data frame or matrix containing the raw data.

When data is a data frame and margins is not given, the loglinear part of the model is taken to be the saturated model for the table formed by all factor variables. If gauss is not given, all numeric variables in the data frame are included in the conditional Gaussian part of the model.

When data is a matrix, margins must be given; it names the variables that form the discrete part of the model. If gauss is omitted, all remaining columns of the matrix are used in the Gaussian part of the model.
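
The default split of a data frame described above can be sketched with base functions. The helper defaultSplit is hypothetical and not part of the library. It assumes the rule is exactly "factors form the table, remaining numeric columns form the Gaussian part", and it accepts margins only as a character vector of names.

```r
# Hypothetical helper (not part of the library): mimics the documented default
# split of a data frame into discrete (table) and Gaussian variables.
defaultSplit <- function(df, margins = NULL) {
  isFactor <- sapply(df, is.factor)
  # Without margins, every factor variable is used to form the table
  discrete <- if (is.null(margins)) names(df)[isFactor] else margins
  # Numeric variables not already used in the table go to the Gaussian part
  isNumeric <- sapply(df, is.numeric)
  gauss <- setdiff(names(df)[isNumeric], discrete)
  list(discrete = discrete, gauss = gauss)
}

df <- data.frame(LAN = factor(c("1", "2", "1")), SEX = factor(c("F", "M", "M")),
                 HGPA = c(3.1, 2.8, 3.5), FLAS = c(80, 75, 90))
vars <- defaultSplit(df)
vars$discrete  # "LAN" "SEX"
vars$gauss     # "HGPA" "FLAS"
```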

OPTIONAL ARGUMENTS:

margins
the marginal totals to be fitted in the loglinear model. A margin is described by the factors not summed over. Thus list(1:2, 3:4) indicates fitting the (1,2) margin (summing over variables 3 and 4) and the (3,4) margin (summing over variables 1 and 2) in a four-way table. The same model can be specified with variable names, e.g. list(c("V1", "V2"), c("V3", "V4")), or with formula notation, as in margins = ~V1:V2 + V3:V4.
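
To make "the factors not summed over" concrete, the sketch below computes the two observed marginal tables named by margins = list(1:2, 3:4) for a small four-way table of counts. The counts are made up for illustration. For a hierarchical loglinear model these marginal totals are the sufficient statistics, and the maximum likelihood fitted table reproduces them; how completeCgm computes its fit internally is not shown here.

```r
# A 2 x 2 x 2 x 2 table of counts (values chosen only for illustration)
tab <- array(1:16, dim = c(2, 2, 2, 2))
# (1,2) margin: keep variables 1 and 2, sum over variables 3 and 4
m12 <- apply(tab, c(1, 2), sum)
# (3,4) margin: keep variables 3 and 4, sum over variables 1 and 2
m34 <- apply(tab, c(3, 4), sum)
m12  # rows: 28 36 / 32 40
m34  # rows: 10 42 / 26 58
```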

If margins is not specified and data is a data frame, the saturated model involving all factor variables is fitted. When data is a matrix, margins must be specified.

If a class "missmodel" object is input and margins is not given, margins defaults to the margins specified in the call statement of the input "missmodel" object.
gauss
identifies the variables to be used in the conditional Gaussian part of the model. They may be specified in three ways: as a vector of variable indices, e.g. c(1, 2, 4); as a vector of variable names, e.g. c("V1", "V2", "V4"); or in formula notation, e.g. ~V1 + V2 + V4. If gauss is omitted, all numeric variables that do not appear in margins are used in the multivariate Gaussian part of the model.
design
a formula giving the regression model that predicts the cell means of the numeric variables as a linear function of the factor variables and of the variables supplied in optData.

Let i = 1, ..., ncell index the cells of the loglinear model, and let mu(i) denote the vector of means of the numeric variables in cell i. The formula design generates the design matrix used to predict these cell means. For example, let "V1" and "V2" be the names of the factor variables, and let "age" be a variable giving the average age of the subjects in each cell. Then design = ~V1 + V2 specifies a main-effects model for the cell means, while design = ~V1 + V2 + age specifies a main-effects model adjusted for average cell age.
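
In matrix form, the regression described above can be written as follows. The symbols U (the ncell by m design matrix with rows u(i)), B (an m by p matrix of regression coefficients) and p (the number of Gaussian variables) are notation introduced here for illustration only.

```latex
\mu(i)^{\top} = u(i)^{\top} B, \qquad i = 1, \dots, n_{\mathrm{cell}},
\qquad \text{equivalently} \qquad M = U B,
```

where M is the ncell by p matrix whose ith row is mu(i) transposed. When U is the identity matrix, M = B and every cell has its own mean vector.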

Alternatively, an ncell by m matrix may be supplied directly as the design matrix; the cell means are then modeled as linear functions of its columns.

If design is not specified, the design matrix is the ncell by ncell identity matrix, so that each cell has its own unrestricted mean vector.
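
The sketch below builds, with model.matrix, the kind of ncell by m matrix that design = ~V1 + V2 and design = ~V1 + V2 + age describe, one row per cell, together with the default identity design. It assumes completeCgm expands the formula over the cells in the usual way; that is not confirmed here. The optData values are hypothetical. Default contrasts differ between R (treatment) and S-PLUS (Helmert), so the individual columns may be coded differently, although they span the same space.

```r
# Cells of a 2 x 2 table; expand.grid varies the first factor fastest
cells <- expand.grid(V1 = factor(c("a", "b")), V2 = factor(c("x", "y")))
# design = ~V1 + V2: main effects, one row per cell
U <- model.matrix(~ V1 + V2, data = cells)
# design = ~V1 + V2 + age, with age taken from a hypothetical optData
optData <- data.frame(age = c(30, 35, 40, 45))
Uage <- model.matrix(~ V1 + V2 + age, data = cbind(cells, optData))
# Default design: the identity, so each cell has its own mean vector
Idesign <- diag(nrow(cells))
dim(U)     # 4 3
dim(Uage)  # 4 4
```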
optData
a data frame with ncell rows containing predictors to be used in computing the design matrix. In the example given in the description for argument design, the variable age would be input in argument optData.
subset
an expression specifying which rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of rows), a numeric vector of the observation numbers to include, or a character vector of the row names to include. All observations are included by default. If data is a data frame, the expression may refer to variables in the data frame.
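
The three forms of subset listed above correspond to the three ordinary ways of indexing rows in S. The sketch shows them selecting the same rows of a small data frame; it assumes, without confirmation from this page, that completeCgm applies subset like ordinary row indexing.

```r
# Three ways to select rows 1 and 3 of a 4-row data frame
d <- data.frame(x = c(10, 20, 30, 40), row.names = c("a", "b", "c", "d"))
byLogical <- d[c(TRUE, FALSE), , drop = FALSE]  # recycled to TRUE FALSE TRUE FALSE
byNumber  <- d[c(1, 3), , drop = FALSE]         # observation numbers
byName    <- d[c("a", "c"), , drop = FALSE]     # row names
```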
prior
the hyperparameters of the Dirichlet prior distribution assumed for the loglinear part of the model. A noninformative prior is always assumed for the Gaussian parameters.

Supply a character string, an object of class "priorLoglin", or a vector of hyperparameters.

Valid character strings are "ml" (maximum likelihood), "noninformative", and "data.dependent". Partial matching is used, so "m", "n", or "d" suffices. The hyperparameter values implied by each string depend on the algorithm (see for details); for example, "noninformative" means a common value of 1 for EM and a common value of 0.5 for DA.

A class "priorLoglin" object is created by routine priorLoglin.

If a vector of hyperparameters is supplied, its length must equal the number of cells formed by the factor variables. The vector is ordered so that the levels of the first variable vary fastest, those of the second variable next fastest, and so on. A single numeric value is replicated for all cells in the table. The hyperparameters for a data-dependent prior (following an independence model) can be generated with routine dataDepPrior. See for details.

The default value is "noninformative". When a class "missmodel" object is input, any value specified in a previous call has priority over the default value (but not over any currently specified value).

Structural zeros must be coded as missing (NA) when a vector of hyperparameters is input as argument prior.
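
The cell ordering and the NA coding for structural zeros can be illustrated with expand.grid, which varies its first argument fastest, the same order documented for prior. The factors SEX and LAN and the structural zero chosen here are hypothetical.

```r
# Cells for factors SEX (2 levels) and LAN (3 levels); the first variable
# varies fastest, matching the documented ordering of prior
cells <- expand.grid(SEX = c("F", "M"), LAN = c("1", "2", "3"))
prior <- rep(1, nrow(cells))  # common hyperparameter of 1 in every cell
# Hypothetical structural zero: the cell SEX = "M", LAN = "3" cannot occur
prior[cells$SEX == "M" & cells$LAN == "3"] <- NA
paste(cells$SEX, cells$LAN)  # "F 1" "M 1" "F 2" "M 2" "F 3" "M 3"
prior                        # 1 1 1 1 1 NA
```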

If a class "missmodel" object is input and prior is not given, prior defaults to the prior specified in the call statement of the input "missmodel" object. If none was specified there, the default (which depends on the algorithm) is used.
start
starting values of the parameters. This argument is not used by completeCgm; it is included for consistency with the other missing data functions.
control
a list of parameters used to control the algorithm. This argument is not used by completeCgm; it is included for consistency with the other missing data functions.
contrasts
a list giving contrasts for some or all of the factors appearing in the design formula. The elements of the list should have the same name as the variable and should be either a contrast matrix (specifically, any full-rank matrix with as many rows as there are levels in the factor), or else a function to compute such a matrix given the number of levels.
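
As an illustration of what a contrast matrix does to the design, R's model.matrix accepts a comparable list through its contrasts.arg argument. Whether completeCgm passes contrasts on in exactly this way is an assumption. Here V1 receives sum-to-zero coding through the full-rank 2 by 1 matrix contr.sum(2).

```r
cells <- expand.grid(V1 = factor(c("a", "b")), V2 = factor(c("x", "y")))
# Sum-to-zero coding for V1, supplied as a contrast matrix (2 levels -> 2 x 1)
U <- model.matrix(~ V1 + V2, data = cells,
                  contrasts.arg = list(V1 = contr.sum(2)))
U[, 2]  # V1 column: 1 for level "a", -1 for level "b"
```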

VALUE:

An object of class "missmodel" is returned; see for details. The paramIter component is of class "cgm" and is a matrix whose rows contain parameter estimates. The algorithm component contains an object of class "em".

DETAILS:

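The completeCgm function computes Bayes estimates of the parameters of a conditional Gaussian model (Schafer, 1997) from data with no missing values: a loglinear model for the cell probabilities of the factor variables, combined with a multivariate normal model for the numeric variables whose means vary across cells according to design.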
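
For intuition about the loglinear part, consider the special case of a saturated model. Under a Dirichlet prior with hyperparameters alpha, the posterior mode of the cell probabilities has the standard closed form (n(i) + alpha(i) - 1) / (n + sum(alpha) - ncell). With a common value of 1 (the "noninformative" EM setting) this reduces to the maximum likelihood estimate n(i)/n. The helper dirichletMode is hypothetical. Treating completeCgm's Bayes estimates as posterior modes is an assumption, and nonsaturated margins are not covered by this formula.

```r
# Hypothetical helper: posterior mode of saturated-model cell probabilities
# under a Dirichlet(alpha) prior, assuming counts + alpha > 1 in every cell
dirichletMode <- function(counts, alpha) {
  alpha <- rep(alpha, length.out = length(counts))  # a scalar is replicated
  (counts + alpha - 1) / (sum(counts) + sum(alpha) - length(counts))
}

counts <- c(10, 20, 30, 40)
dirichletMode(counts, 1)  # 0.1 0.2 0.3 0.4, the same as counts / sum(counts)
dirichletMode(counts, 2)  # (11, 21, 31, 41) / 104
```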

REFERENCES:

Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, Chapman & Hall, London.

EXAMPLES:

# Fit using only the rows in which SEX and HGPA are both observed
completeCgm(language[, c("LAN", "SEX", "HGPA", "FLAS")],
            subset = !(is.na(SEX) | is.na(HGPA)), prior = 1)
# Equivalent to:
mdCgm(language[, c("LAN", "SEX", "HGPA", "FLAS")],
      subset = !(is.na(SEX) | is.na(HGPA)))