mdLoglin(object, frequency, margins, subset, prior = <<see below>>, na.proc = "fail", start = <<see below>>, control)
A class
"preLoglin"
or
"missmodel"
object, or data frame or
matrix
containing the raw data. When a data frame is input,
the table is specified by the levels of the factor
variables. When a matrix is input, it is assumed that the
levels of a variable form a sequence of integers from one to the maximum
value of the variable. Using a class
"preLoglin"
object
shortens the total compute time if
mdLoglin
is called
more than once. Use routine
preLoglin
to create this object.
object
. If
omitted, all frequencies are assumed to be 1 (unless specified in
argument
margins
). This argument is not used if argument
object
is a class
"preLoglin"
or
"missmodel"
object.
If
object
is a data frame and this is the
(unquoted) name of a variable in the data frame, then that variable is used.
list(1:2, 3:4)
would indicate fitting
the 1,2 margin (summing over variables 3 and 4) and the 3,4 margin in
a four-way table. This same model can be specified using the names of
the variables (e.g.,
list(c("V1", "V2"), c("V3", "V4"))
), or using
formula notation, as in
margins = ~V1:V2 + V3:V4
. When formula
notation is used, the argument
frequency
can be included as the
dependent variable (as in
margins = frequency~V1:V2 + V3:V4
).
If
margins
is not specified, a saturated model is fitted. When
the argument
object
is a matrix, a saturated model is defined as
a model with a single interaction term that includes every column in
the data matrix. When a data frame is input, a
saturated model includes all factor variables in the single
interaction term. Cell counts in the table are determined by the
frequency
variable.
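As a rough illustration of what "fitting the 1,2 margin and the 3,4 margin" refers to, the sketch below computes those two marginal tables from a four-way array in base R. The array `tab` and its counts are made up for this example, and `apply` is used only to show the summation; this is not how `mdLoglin` is implemented.

```r
# Hypothetical 2 x 2 x 2 x 2 table of counts (values are invented for illustration)
tab <- array(1:16, dim = c(2, 2, 2, 2),
             dimnames = list(V1 = c("a", "b"), V2 = c("a", "b"),
                             V3 = c("a", "b"), V4 = c("a", "b")))

# The margins named by list(1:2, 3:4)
margins <- list(1:2, 3:4)

# Each margin is the full table summed over the variables not listed in it
marginal.tables <- lapply(margins, function(m) apply(tab, m, sum))

marginal.tables[[1]]  # V1 x V2 margin, summed over V3 and V4
marginal.tables[[2]]  # V3 x V4 margin, summed over V1 and V2
```

Both marginal tables have the same grand total as the full table, since each is obtained by collapsing it.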
If
object
is a data frame,
this expression may use variables in the data frame.
This argument is not used if argument
object
is a class
"preLoglin"
or
"missmodel"
object.
"priorLoglin"
, or an array of
hyperparameters.
"ml"
(maximum likelihood) or
"noninformative"
.
String matching is used,
so the characters
"m"
or
"n"
are sufficient. The values
of the hyperparameters change with the algorithm (see
for details). For example,
"noninformative"
means a common value of 1 for
EM, and a common value of 0.5 for DA.
A class
"priorLoglin"
object is created by routine
priorLoglin
.
start
for the order to use in specifying a vector of
hyperparameters. If a single numeric value is input, its value is
replicated for all cells in the table.
The hyperparameters for a data dependent prior (following an
independence model) can be generated using routine
dataDepPrior
.
See
for details.
"noninformative"
. When a class
"missmodel"
object is input, any value specified in a previous call has priority
over the default value (but not over any currently specified value).
NA
) when a vector of
hyperparameters is input as argument
prior
.
mdLoglin
missing
values are only allowed in the variables whose levels define the table
to be analyzed -- missing values in the frequency variable are not
allowed.
Possible values are:
stop with an error message if missing values are encountered,
object
is a class
"preLoglin"
or
"missmodel"
object, argument
na.proc
must be either
"da"
or
"em"
.
mdLoglin
are the cell probabilities. Thus,
start
is a vector with
length equal to the total number of cells in the table and containing a
probability estimate for each cell in the table. Starting values for
cells that are structural zeros in the table should be zero. Suppose
that the table is defined by the variables
X1
,
X2
, and
X3
. Then
the cells in the table are ordered such that the index for variable
X1
varies fastest, the index for variable
X2
varies next fastest,
etc.
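The cell ordering described above (first variable varying fastest) can be sketched in base R as follows. The level counts and the choice of structural zero are hypothetical, and the uniform starting values are just one possible choice, not a documented default.

```r
# Hypothetical table: X1 has 2 levels, X2 has 3 levels, X3 has 2 levels.
# expand.grid varies its first argument fastest, which matches the
# ordering described above.
cells <- expand.grid(X1 = 1:2, X2 = 1:3, X3 = 1:2)
n.cells <- nrow(cells)

# Uniform starting probabilities, one per cell
start <- rep(1 / n.cells, n.cells)

# Suppose (for illustration only) that cell X1 = 2, X2 = 3, X3 = 1
# is a structural zero: its starting value is set to zero
zero.cell <- cells$X1 == 2 & cells$X2 == 3 & cells$X3 == 1
start[zero.cell] <- 0
start <- start / sum(start)   # renormalize so the probabilities sum to 1

# The same position computed directly: x1 + (x2 - 1)*2 + (x3 - 1)*2*3
idx <- 2 + (3 - 1) * 2 + (1 - 1) * 2 * 3
```

Here `which(zero.cell)` and `idx` point at the same element of `start`, which is a convenient check that a hand-built vector follows the expected order.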
object
is a class
"missmodel"
object. In this case, if the
argument
margins
is not specified, then argument
start
defaults to the final estimates in the input
"missmodel"
object. If
argument
margins
is specified, then argument
start
must be provided.
When a class
"missmodel"
object is input and argument
margins
is specified, care must be taken to ensure that structural
zeros in these final estimates are also structural zeros in the new model.
emLoglin.control
values, or to the
daLoglin.control
values as appropriate. See the help files for
and
for details.
When a class
"missmodel"
object is input, the control values
specified on a previous call have priority over the default values (but
not over any currently specified value), but only if these are of the
required class (
"da"
or
"em"
).
"missmodel"
is returned; see
for details.
mdLoglin
causes creation of the data set
.Random.seed
if it does not already exist, otherwise its value is updated.
The
mdLoglin
function estimates the cell probabilities in
hierarchical log-linear models. A hierarchical log-linear model
predicts the log of the cell probabilities for
the multinomial as a linear factorial model. In a hierarchical model
the inclusion of an interaction effect automatically means that all
associated lower level effects are included in the model. For example,
for factors
A
,
B
, and
C
, inclusion of
A:B:C
automatically
means that
A
,
B
,
C
,
A:B
,
A:C
, and
B:C
are also included in
the model.
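The hierarchy principle can be made concrete with a small helper that lists every effect implied by a single generating term. The function name `implied.terms` is hypothetical and not part of the library; it is only a sketch of the rule stated above.

```r
# Return every effect implied by a generating term such as "A:B:C".
# (implied.terms is an illustrative helper, not a library function.)
implied.terms <- function(term) {
    vars <- strsplit(term, ":", fixed = TRUE)[[1]]
    out <- character(0)
    for (k in seq_along(vars)) {
        # all subsets of size k, each joined back together with ":"
        out <- c(out, combn(vars, k, FUN = paste, collapse = ":"))
    }
    out
}

implied.terms("A:B:C")
```

For `"A:B:C"` this yields the three main effects, the three two-way interactions, and the three-way interaction itself.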
mdLoglin
provides several methods for handling missing values. The
EM algorithm computes the modes of the posterior probability
distribution, given the specified Dirichlet prior (when all Dirichlet
parameters are one, maximum likelihood estimates are computed).
Alternatively, the data augmentation algorithm uses Markov chain Monte
Carlo (MCMC) methods to alternately simulate data for the missing
values and parameter estimates. With this method, care must be taken
to ensure that the Markov chain has reached a steady state. The
sequence of estimates should be analyzed to diagnose convergence.
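To give a feel for the EM iteration, here is a minimal base R sketch for a saturated model on a 2 x 2 table in which some units are missing one of the two variables. All counts are invented, the variable names are hypothetical, and the code is a textbook-style illustration under those assumptions rather than the algorithm as implemented in `mdLoglin`. With the common Dirichlet hyperparameter `alpha` set to 1, the mode coincides with the maximum likelihood estimate, as noted above.

```r
# Hypothetical counts for a 2 x 2 table (rows index V1, columns index V2)
full     <- matrix(c(40, 10, 15, 35), nrow = 2)  # both variables observed
row.only <- c(12, 8)    # V1 observed, V2 missing
col.only <- c(6, 14)    # V2 observed, V1 missing
alpha    <- 1           # common Dirichlet hyperparameter; 1 gives ML

theta <- matrix(0.25, 2, 2)   # starting cell probabilities
converged <- FALSE
for (iter in 1:500) {
    # E-step: spread each partially observed count over the cells it could
    # belong to, in proportion to the current cell probabilities
    x <- full +
         row.only * theta / rowSums(theta) +
         t(col.only * t(theta) / colSums(theta))
    # M-step: posterior mode under the Dirichlet prior
    new.theta <- (x + alpha - 1) / sum(x + alpha - 1)
    converged <- max(abs(new.theta - theta)) < 1e-10
    theta <- new.theta
    if (converged) break
}
theta
```

The expected complete-data counts `x` always add up to the total number of units (here 140), which is a quick sanity check on the E-step.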
When the
mdLoglin
function is called more than once, it is
preferable to precompute many of the statistics used by
mdLoglin
. This may be done using the
preLoglin
function.
Agresti, A. (1990), Categorical Data Analysis, John Wiley & Sons, New York.
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975), Discrete Multivariate Analysis: Theory and Practice, MIT Press, Cambridge, MA.
Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, Chapman & Hall, London.
mdLoglin(object = crime)  # fails by default
                          # because crime has missing data

# use EM to fit saturated model under a Jeffreys prior
mdLoglin(object = crime, margins = count~Visit.1:Visit.2,
         na.proc = "em", prior = 0.5)

# same, but first create preLoglin object for greater efficiency
crime.pre <- preLoglin(crime, margins = count~Visit.1:Visit.2)
crime.em <- mdLoglin(crime.pre, margins = ~Visit.1:Visit.2,
                     na.proc = "em", prior = 0.5)
crime.em <- emLoglin(crime.pre, margins = ~Visit.1:Visit.2, prior = 0.5)  # same

# Data augmentation: start with last parameter estimates
# in crime.em, save iterates 1 to 5100
crime.da <- mdLoglin(crime.em, na.proc = "da", control = list(save = 1:5100))
crime.da <- daLoglin(crime.em, control = list(save = 1:5100))  # same