Log-Linear Models for Complete Data

DESCRIPTION:

Compute log-linear model estimates in tables containing no missing values.

USAGE:

completeLoglin(data, frequency, margins, subset, prior = 1, start = NULL, 
               control = emLoglin.control()) 

REQUIRED ARGUMENTS:

data
a data frame or matrix containing the raw data. When a data frame is input, the table is specified by the levels of the factor variables. When a matrix is input, it is assumed that the levels of a variable form a sequence of integers from one to the maximum value of the variable.

OPTIONAL ARGUMENTS:

frequency
the frequency of the corresponding row in argument data. If data is a data frame and this is the (unquoted) name of a variable in the data frame, then that variable is used. If omitted, all frequencies are assumed to be 1 (unless specified in argument margins).
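The following Python sketch shows how integer-coded rows and an optional frequency vector can be cross-classified into a table of counts. It assumes, as documented for a matrix data argument, that each column's levels are the integers from 1 to that column's maximum, and that rows without a frequency count once. The function name table_from_codes is hypothetical and is not part of the library.

```python
import numpy as np

def table_from_codes(codes, frequency=None):
    """Hypothetical helper: cross-classify integer-coded rows into a
    contingency table. Each column's levels are assumed to be 1..max(column);
    rows without a frequency count once."""
    codes = np.asarray(codes, dtype=int)
    if frequency is None:
        frequency = np.ones(codes.shape[0])     # default frequency of 1 per row
    shape = tuple(codes.max(axis=0))            # number of levels per variable
    table = np.zeros(shape)
    for row, f in zip(codes, frequency):
        table[tuple(row - 1)] += f              # shift 1-based codes to 0-based
    return table
```

For example, table_from_codes([[1, 1], [1, 2], [2, 1], [1, 1]]) yields a 2 x 2 table in which cell (1, 1) holds a count of 2.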
margins
the marginal totals to be fitted. A margin is described by the factors that are not summed over. For example, in a four-way table, list(1:2, 3:4) fits the 1,2 margin (summing over variables 3 and 4) and the 3,4 margin (summing over variables 1 and 2). The same model can be specified using the names of the variables (e.g., list(c("V1", "V2"), c("V3", "V4"))) or using formula notation, as in margins = ~V1:V2 + V3:V4. When formula notation is used, the frequency variable can be given as the response (as in margins = frequency~V1:V2 + V3:V4).

If margins is not specified, a saturated model is fitted. When a matrix is input as argument data, a saturated model is defined as a model with a single interaction term that includes every column in the data matrix. When a data frame is input, a saturated model includes all factor variables in the single interaction term. Cell counts in the table are determined by the frequency variable.
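To make "the factors not summed over" concrete, this Python sketch computes a marginal table by summing over every variable that is not listed. The helper name margin is hypothetical; variable numbers are 1-based to match margins = list(1:2, 3:4). Listing every variable returns the full table, which corresponds to the saturated model.

```python
import numpy as np

def margin(table, keep):
    """Hypothetical helper: marginal table over the 1-based variables in
    `keep`, obtained by summing over all other variables."""
    keep0 = {k - 1 for k in keep}
    drop = tuple(ax for ax in range(table.ndim) if ax not in keep0)
    return table.sum(axis=drop)

# Illustrative 2 x 2 x 2 x 2 table of counts.
table = np.arange(1, 17, dtype=float).reshape(2, 2, 2, 2)
m12 = margin(table, [1, 2])   # the 1,2 margin: sums over variables 3 and 4
m34 = margin(table, [3, 4])   # the 3,4 margin: sums over variables 1 and 2
```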
subset
expression specifying which rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of rows), a numeric vector indicating the observation numbers to be included, or a character vector of the row names to be included. All observations are included by default. If data is a data frame, this expression may use variables in the data frame.
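The three documented forms of subset can be illustrated with the Python sketch below, which maps each form to 0-based row positions. It is a hypothetical helper, not the library's code: a logical vector is recycled to the number of rows, numbers are treated as 1-based observation numbers, and strings are looked up among the row names.

```python
def resolve_subset(subset, n_rows, row_names=None):
    """Hypothetical helper: return 0-based row positions selected by
    `subset` (logical, 1-based observation numbers, or row names)."""
    if subset is None:
        return list(range(n_rows))                  # default: all observations
    if len(subset) == 0:
        return []
    if all(isinstance(s, bool) for s in subset):
        # Logical vector: replicate to length n_rows, keep True positions.
        recycled = [subset[i % len(subset)] for i in range(n_rows)]
        return [i for i, keep in enumerate(recycled) if keep]
    if all(isinstance(s, str) for s in subset):
        return [row_names.index(s) for s in subset]  # row names
    return [int(s) - 1 for s in subset]              # 1-based numbers
```

For example, resolve_subset([True, False], 5) selects rows 1, 3, and 5 (positions 0, 2, and 4).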
prior
specifies Dirichlet prior hyperparameters. Supply a character string, an object of class "priorLoglin", or an array of hyperparameters.

Valid character strings are "ml" (maximum likelihood) and "noninformative". Partial matching is used, so "m" or "n" is sufficient. The hyperparameter values depend on the algorithm (see for details). For example, "noninformative" means a common value of 1 for EM and a common value of 0.5 for DA.

A class "priorLoglin" object is created by routine priorLoglin.

See argument start for the order to use in specifying a vector of hyperparameters. If a single numeric value is input, its value is replicated for all cells in the table. The hyperparameters for a data dependent prior (following an independence model) can be generated using routine dataDepPrior. See for details.

The default value is "noninformative". When a class "missmodel" object is input, any value specified in a previous call has priority over the default value (but not over any currently specified value).

When a vector of hyperparameters is supplied as argument prior, structural zeros must be coded as missing (NA).
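For the saturated model the posterior mode has a closed form, which helps explain the EM default: a Dirichlet prior with every hyperparameter equal to 1 is flat, so the posterior mode coincides with the maximum likelihood estimate. The Python sketch below computes the mode (n_i + alpha_i - 1) / sum_j (n_j + alpha_j - 1) of the Dirichlet(n + alpha) posterior, treats NaN hyperparameters as structural zeros (mirroring the NA convention), and replicates a single value to all cells. It applies only to the saturated case; the function name is hypothetical, and non-saturated models require an iterative algorithm.

```python
import numpy as np

def saturated_posterior_mode(counts, prior):
    """Hypothetical helper: posterior mode of the cell probabilities of a
    saturated multinomial model under a Dirichlet(prior) prior. NaN prior
    entries mark structural zeros and receive probability 0."""
    counts = np.asarray(counts, dtype=float)
    prior = np.broadcast_to(np.asarray(prior, dtype=float), counts.shape)
    live = ~np.isnan(prior)                    # cells that are not structural zeros
    a = counts[live] + prior[live] - 1.0       # numerators of the Dirichlet mode
    if np.any(a < 0):
        raise ValueError("mode undefined when n + alpha < 1 in some cell")
    p = np.zeros_like(counts)
    p[live] = a / a.sum()
    return p
```

With prior = 1, the result equals the observed proportions counts / N.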
start
starting values of the parameters. The parameters estimated by completeLoglin are the cell probabilities. Thus, start is an array whose length equals the total number of cells in the table, containing a probability estimate for each cell. Starting values for cells that are structural zeros should be zero. By default, every starting value equals one divided by the total number of cells in the table. Suppose that the table is defined by the variables X1, X2, and X3. Then the cells are ordered so that the index for X1 varies fastest, the index for X2 varies next fastest, and so on.
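The cell ordering described above (X1 varying fastest) is column-major order. This Python sketch builds the default uniform start vector for an illustrative 2 x 3 x 2 table, locates a cell by its 1-based indices, and sets a structural zero to 0. The helper cell_position is hypothetical, and renormalising the remaining values to sum to 1 is a choice made here, not something the documentation requires.

```python
import numpy as np

# Illustrative 2 x 3 x 2 table defined by X1, X2, X3.
dims = (2, 3, 2)
n_cells = int(np.prod(dims))

# Default starting values: 1 / (number of cells) in every cell.
start = np.full(n_cells, 1.0 / n_cells)

def cell_position(idx, dims):
    """Hypothetical helper: 0-based position in `start` of the cell with
    1-based indices `idx`, with X1 varying fastest (column-major order)."""
    pos, stride = 0, 1
    for i, d in zip(idx, dims):
        pos += (i - 1) * stride
        stride *= d
    return pos

# Structural zero at (X1=2, X2=3, X3=1): its start value must be 0.
start[cell_position((2, 3, 1), dims)] = 0.0
start /= start.sum()   # renormalise the remaining probabilities (assumption)
```

The position agrees with numpy's np.ravel_multi_index(..., order="F") applied to the 0-based indices.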
control
a list of parameters used to control the EM algorithm; see for details. The only control parameters relevant to this function are maxit, tolerance, and trace.
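As an illustration of an iterative fit governed by a maximum iteration count and a tolerance, the Python sketch below uses iterative proportional fitting (IPF), the classical algorithm for fitting hierarchical log-linear models to complete tables, described by Bishop, Fienberg, and Holland. This documentation does not state that completeLoglin uses IPF internally; its control list refers to the EM algorithm. The names maxit and tolerance mirror the control parameters, but the stopping rule used here (largest absolute change in any fitted cell probability) is an assumption.

```python
import numpy as np

def ipf(table, margins, maxit=100, tolerance=1e-8):
    """Fit a hierarchical log-linear model by iterative proportional fitting.
    `margins` lists 0-based axis tuples to match, e.g. [(0, 1), (2, 3)].
    Returns fitted cell probabilities and the number of iterations used."""
    table = np.asarray(table, dtype=float)
    target = table / table.sum()                    # observed cell proportions
    fit = np.full(table.shape, 1.0 / table.size)    # uniform starting values
    for it in range(1, maxit + 1):
        old = fit.copy()
        for keep in margins:
            drop = tuple(ax for ax in range(table.ndim) if ax not in keep)
            obs = target.sum(axis=drop, keepdims=True)
            cur = fit.sum(axis=drop, keepdims=True)
            # Rescale so the fitted margin matches the observed margin.
            ratio = np.divide(obs, cur, out=np.zeros_like(cur), where=cur > 0)
            fit = fit * ratio
        if np.max(np.abs(fit - old)) < tolerance:
            break
    return fit, it
```

For a two-way table, margins = [(0,), (1,)] fits the independence model and [(0, 1)] fits the saturated model, whose fitted probabilities equal the observed proportions.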

VALUE:

an object of class "missmodel" is returned; see for details. In the class "missmodel" object returned by completeLoglin, the paramIter component contains one or more rows of parameter estimates, and the algorithm element contains an object of class "em".

DETAILS:

The completeLoglin function computes estimates of the cell probabilities in hierarchical log-linear models. A hierarchical log-linear model is a multinomial model in which the logarithm of each cell probability is expressed as a linear combination of factorial effects (main effects and interactions). In a hierarchical model, including an interaction effect automatically includes all lower-order effects contained in it. For example, for factors A, B, and C, including A:B:C automatically means that A, B, C, A:B, A:C, and B:C are also included in the model.
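The hierarchy rule amounts to taking every non-empty subset of the factors in each listed interaction. This Python sketch (with a hypothetical function name) enumerates the implied effects, ordered by interaction order.

```python
from itertools import combinations

def hierarchical_terms(*interactions):
    """Hypothetical helper: all effects implied by the given interaction
    terms in a hierarchical model, i.e. every non-empty subset of each
    term's factors, sorted by interaction order and then by name."""
    terms = set()
    for term in interactions:
        factors = tuple(term.split(":"))
        for k in range(1, len(factors) + 1):
            terms.update(":".join(c) for c in combinations(factors, k))
    return sorted(terms, key=lambda t: (t.count(":"), t))
```

For example, hierarchical_terms("A:B:C") returns A, B, C, A:B, A:C, B:C, and A:B:C, matching the list above.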

REFERENCES:

Agresti, A. (1990), Categorical Data Analysis, John Wiley & Sons, New York.

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975), Discrete Multivariate Analysis: Theory and Practice, MIT Press, Cambridge, MA.

EXAMPLES:

completeLoglin(data = na.omit(crime),
               margins = count~Visit.1:Visit.2)