Estimates for Loglinear Models

DESCRIPTION:

Estimates parameters for a loglinear model. There are four methods for handling missing values.

USAGE:

mdLoglin(object, frequency, margins, subset, prior = <<see below>>, 
       na.proc = "fail", start = <<see below>>, control) 

REQUIRED ARGUMENTS:

object
a class "preLoglin" or "missmodel" object, or data frame or matrix containing the raw data. When a data frame is input, the table is specified by the levels of the factor variables. When a matrix is input, it is assumed that the levels of a variable form a sequence of integers from one to the maximum value of the variable. Using a class "preLoglin" object shortens the total compute time if mdLoglin is called more than once. Use routine preLoglin to create this object.

OPTIONAL ARGUMENTS:

frequency
the frequency of the corresponding row in argument object. If omitted, all frequencies are assumed to be 1 (unless specified in argument margins). This argument is not used if argument object is a class "preLoglin" or "missmodel" object. If object is a data frame and this is the (unquoted) name of a variable in the data frame, then that variable is used.
margins
the marginal totals to be fit. A margin is described by the factors not summed over. Thus list(1:2, 3:4) would indicate fitting the 1,2 margin (summing over variables 3 and 4) and the 3,4 margin in a four-way table. This same model can be specified using the names of the variables (e.g., list(c("V1", "V2"), c("V3", "V4"))), or using formula notation, as in margins = ~V1:V2 + V3:V4. When formula notation is used, the argument frequency can be included as the dependent variable (as in margins = frequency~V1:V2 + V3:V4).

If margins is not specified, a saturated model is fitted. When the argument object is a matrix, a saturated model is defined as a model with a single interaction term that includes every column in the data matrix. When a data frame is input, a saturated model includes all factor variables in the single interaction term. Cell counts in the table are determined by the frequency variable.
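
To make the phrase "the factors not summed over" concrete, the following is a minimal sketch in base S (also valid R). It is not part of mdLoglin; the array tab and its counts are made up purely for illustration.

```r
# Hypothetical 2 x 2 x 2 x 2 table of counts (made-up values 1..16).
tab <- array(1:16, dim = c(2, 2, 2, 2))

# The 1,2 margin: keep variables 1 and 2, sum over variables 3 and 4.
margin12 <- apply(tab, 1:2, sum)

# The 3,4 margin: keep variables 3 and 4, sum over variables 1 and 2.
margin34 <- apply(tab, 3:4, sum)
```

Fitting list(1:2, 3:4) means the fitted table reproduces these two sets of marginal totals.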
subset
expression specifying which rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of rows), a numeric vector indicating the observation numbers to be included, or a character vector of the row names to be included. All observations are included by default. If object is a data frame, this expression may use variables in the data frame. This argument is not used if argument object is a class "preLoglin" or "missmodel" object.
prior
specifies Dirichlet prior hyperparameters. Supply either a character string, or an object of class "priorLoglin", or an array of hyperparameters.

Valid character strings are "ml" (maximum likelihood) or "noninformative". String matching is used, so the characters "m" or "n" are sufficient. The values of the hyperparameters change with the algorithm. For example, "noninformative" means a common value of 1 for EM and a common value of 0.5 for DA.

A class "priorLoglin" object is created by routine priorLoglin.

See argument start for the order to use in specifying a vector of hyperparameters. If a single numeric value is input, its value is replicated for all cells in the table. The hyperparameters for a data-dependent prior (following an independence model) can be generated using routine dataDepPrior.

The default value is "noninformative". When a class "missmodel" object is input, any value specified in a previous call has priority over the default value (but not over any currently specified value).

Structural zeros must be coded as missing (NA) when a vector of hyperparameters is input as argument prior.
na.proc
character, the method to use in handling missing data. In mdLoglin missing values are only allowed in the variables whose levels define the table to be analyzed -- missing values in the frequency variable are not allowed. Possible values are:
"fail"
stop with an error message if missing values are encountered,
"omit"
omit observations with missing values,
"em"
use the EM algorithm, and
"da"
use a data augmentation algorithm.

When argument object is a class "preLoglin" or "missmodel" object, argument na.proc must be either "da" or "em".
start
starting values of the parameters. The parameters estimated by mdLoglin are the cell probabilities. Thus, start is a vector with length equal to the total number of cells in the table and containing a probability estimate for each cell in the table. Starting values for cells that are structural zeros in the table should be zero. Suppose that the table is defined by the variables X1, X2, and X3. Then the cells in the table are ordered such that the index for variable X1 varies fastest, the index for variable X2 varies next fastest, etc.

In most cases the default starting values are equal to one divided by the number of cells in the table. An exception occurs when argument object is a class "missmodel" object. In this case, if the argument margins is not specified, then argument start defaults to the final estimates in the input "missmodel" object. If argument margins is specified, then argument start must be provided.

Also notice that when a class "missmodel" object is input and argument margins is specified, care must be taken to ensure that structural zeros in these final estimates are also structural zeros in the new model.
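
The cell ordering described above (index of X1 varying fastest) matches the row order produced by expand.grid, so a start vector can be laid out as in the following sketch. This is base S (also valid R) and only an illustration: the table dimensions, the choice of structural zero, and the renormalization step are assumptions, not documented behavior of mdLoglin.

```r
# Hypothetical table: X1 has 2 levels, X2 has 3 levels, X3 has 2 levels.
dims <- c(2, 3, 2)
cells <- expand.grid(X1 = 1:2, X2 = 1:3, X3 = 1:2)  # X1 varies fastest
n.cells <- nrow(cells)

# Equal starting probabilities, one per cell.
start <- rep(1 / n.cells, n.cells)

# Position in the vector of the cell (X1 = 2, X2 = 3, X3 = 1).
idx <- c(2, 3, 1)
pos <- 1 + sum((idx - 1) * c(1, cumprod(dims)[-length(dims)]))

# Assumption: treat that cell as a structural zero, then rescale the
# remaining starting values so that they still sum to one.
start[pos] <- 0
start <- start / sum(start)
```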
control
A list of parameters used to control the algorithm. If not given, these default to the emLoglin.control values, or to the daLoglin.control values as appropriate. See the help files for emLoglin.control and daLoglin.control for details.

When a class "missmodel" object is input, the control values specified in a previous call have priority over the default values (but not over any currently specified values), but only if they are of the required class ("da" or "em").

VALUE:

an object of class "missmodel" is returned.

SIDE EFFECTS:

The function mdLoglin causes creation of the data set .Random.seed if it does not already exist, otherwise its value is updated.

DETAILS:

The mdLoglin function estimates the cell probabilities in hierarchical log-linear models. A hierarchical log-linear model predicts the log of the cell probabilities for the multinomial as a linear factorial model. In a hierarchical model the inclusion of an interaction effect automatically means that all associated lower level effects are included in the model. For example, for factors A, B, and C, inclusion of A:B:C automatically means that A, B, C, A:B, A:C, and B:C are also included in the model.
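
The hierarchy rule can be sketched by listing every nonempty subset of the factors in an interaction. The code below is base S (also valid R) and purely illustrative; the names factors and implied are hypothetical and mdLoglin is not known to expand terms this way internally.

```r
# Factors appearing in the highest-order term A:B:C.
factors <- c("A", "B", "C")
p <- length(factors)

# Each integer k from 1 to 2^p - 1 encodes one nonempty subset:
# bit j of k says whether factor j belongs to the term.
implied <- character(0)
for (k in 1:(2^p - 1)) {
  keep <- (k %/% 2^(0:(p - 1))) %% 2 == 1
  implied <- c(implied, paste(factors[keep], collapse = ":"))
}
```

The vector implied then holds the seven terms A, B, C, A:B, A:C, B:C, and A:B:C, that is, the interaction itself plus the six lower-level effects named above.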

mdLoglin provides several methods for handling missing values. The EM algorithm computes the mode of the posterior probability distribution, given the specified Dirichlet prior (when all Dirichlet parameters are one, maximum likelihood estimates are computed). Alternatively, the data augmentation algorithm uses Markov chain Monte Carlo (MCMC) methods to alternately simulate values for the missing data and values for the parameters. With this method, care must be taken to ensure that the Markov chain has reached a steady state. The sequence of estimates should be analyzed to diagnose convergence.
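
As a rough illustration of the EM idea for a saturated 2 x 2 table, the sketch below allocates partially classified counts to cells in proportion to the current cell probabilities (E-step) and then re-estimates the probabilities as the Dirichlet posterior mode (M-step). It is written in base S (also valid R), uses made-up counts, and is not the algorithm as implemented in mdLoglin; the function name em.step and all data values are hypothetical.

```r
# Hypothetical data for a 2 x 2 table (rows = X1, columns = X2).
full     <- matrix(c(20, 10, 5, 15), 2, 2)  # both variables observed
row.only <- c(8, 4)                         # X1 observed, X2 missing
col.only <- c(6, 6)                         # X2 observed, X1 missing
alpha    <- 1                               # common Dirichlet hyperparameter

em.step <- function(theta) {
  # E-step: expected cell counts given the current probabilities.
  # Row-only counts are spread across columns within each row,
  # column-only counts are spread across rows within each column.
  x <- full +
    diag(row.only / apply(theta, 1, sum)) %*% theta +
    theta %*% diag(col.only / apply(theta, 2, sum))
  # M-step: posterior mode under the Dirichlet prior
  # (with alpha = 1 this is the maximum likelihood update).
  (x + alpha - 1) / sum(x + alpha - 1)
}

theta <- matrix(0.25, 2, 2)  # equal starting values
for (iter in 1:200) theta <- em.step(theta)
```

In practice one would monitor the change in theta between iterations rather than run a fixed number of steps; the fixed count here only keeps the sketch short.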

When the mdLoglin function is called more than once, it is preferable to precompute many of the statistics used by mdLoglin. This may be done using the preLoglin function.

REFERENCES:

Agresti, A. (1990), Categorical Data Analysis, John Wiley & Sons, New York.

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975), Discrete Multivariate Analysis: Theory and Practice, MIT Press, Cambridge.

Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, Chapman & Hall, London.

EXAMPLES:

mdLoglin(object = crime)    # fails by default
                            # because crime has missing data

# use EM to fit saturated model under a Jeffreys prior
mdLoglin(object = crime, margins=count~Visit.1:Visit.2,
                     na.proc = "em", prior = 0.5)

# same, but first create preLoglin object for greater efficiency
crime.pre <- preLoglin(crime, margins=count~Visit.1:Visit.2)
crime.em <- mdLoglin(crime.pre, margins=~Visit.1:Visit.2,
                     na.proc = "em", prior = 0.5)
crime.em <- emLoglin(crime.pre, margins=~Visit.1:Visit.2,
                     prior = 0.5)        #same

# Data augmentation: start with last parameter estimates
# in crime.em, save iterates 1 to 5100
crime.da <- mdLoglin(crime.em, na.proc="da",
                               control=list(save=1:5100))
crime.da <- daLoglin(crime.em, control=list(save=1:5100)) #same