mdCgm(object, margins, gauss, design, optData, subset, prior = <<see below>>, na.proc = "fail", start = NULL, control, contrasts = NULL)
object: a class "preCgm" or "missmodel" object, or a data frame or matrix containing the raw data.
For a data frame, if the margins argument is not provided, the loglinear part of the model is assumed to be a saturated model in which all factor variables are used to form the table. If the gauss argument is not provided, all numeric variables in the data frame are included in the conditional Gaussian part of the model.
For a matrix, the margins argument is required; it identifies the variables to use in the discrete part of the model. If the gauss argument is omitted, all remaining variables in the matrix are used in the Gaussian part of the conditional Gaussian distribution.
If a "missmodel" object is input, its paramIter component must be of class "cgm".
margins: the marginal tables defining the hierarchical log-linear model for the factor variables. For example, list(1:2, 3:4) indicates fitting the 1,2 margin (summing over variables 3 and 4) and the 3,4 margin in a four-way table. The same model can be specified using the names of the variables, e.g. list(c("V1", "V2"), c("V3", "V4")), or using formula notation, as in margins = ~V1:V2 + V3:V4. If margins is not specified, a saturated model is fitted.
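The margins convention is the same one used by the margin argument of base R's stats::loglin, which fits hierarchical log-linear models by iterative proportional fitting. The sketch below uses loglin, not mdCgm, on a made-up 2 x 2 x 2 x 2 table to show what fitting the 1,2 and 3,4 margins means: the fitted table reproduces both observed margins and, for this particular model, equals the product of the two margins divided by the total count. The helper marginsToList is hypothetical and only illustrates how a formula such as ~V1:V2 + V3:V4 maps onto the list form.

```r
# Made-up counts for a four-way table with named dimensions V1..V4
tab <- array(c(12, 5, 8, 3, 7, 9, 4, 6, 10, 2, 5, 11, 3, 8, 6, 9),
             dim = c(2, 2, 2, 2),
             dimnames = list(V1 = c("a", "b"), V2 = c("a", "b"),
                             V3 = c("a", "b"), V4 = c("a", "b")))

# Fit the model with margins list(1:2, 3:4) using stats::loglin
fit <- loglin(tab, margin = list(1:2, 3:4), fit = TRUE, print = FALSE)

# The fitted table reproduces the observed 1,2 and 3,4 margins
m12 <- margin.table(tab, c(1, 2))
m34 <- margin.table(tab, c(3, 4))
all.equal(c(margin.table(fit$fit, c(1, 2))), c(m12))
all.equal(c(margin.table(fit$fit, c(3, 4))), c(m34))

# Hypothetical helper: turn a margins formula into a list of name vectors
marginsToList <- function(form) {
  labs <- attr(terms(form), "term.labels")
  strsplit(labs, ":", fixed = TRUE)
}
marginsToList(~V1:V2 + V3:V4)
```

Because the two margins share no variables, a single pass of proportional fitting already gives the closed form outer(m12, m34) / sum(tab).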
When a matrix is input as argument object, argument margins must be specified. When a data frame is input and argument margins is missing, a saturated model involving all factor variables is fitted.
When a "missmodel" object is input and margins is not given, margins defaults to the margins specified in the call statement of the input "missmodel" object.
gauss: the numeric variables used in the Gaussian part of the model. These may be given as a vector of column numbers, e.g. c(1, 2, 4), as a vector of variable names, e.g. c("V1", "V2", "V4"), or using formula notation, e.g. ~V1+V2+V4. If argument gauss is omitted, all numeric variables that do not appear in argument margins are used in the multivariate Gaussian model.
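Since the three forms of gauss are interchangeable, each can be reduced to a common vector of variable names. The helper gaussNames below is hypothetical (it is not part of mdCgm) and uses only base R: all.vars extracts names from a formula, and column numbers are looked up in names(data). It also sketches the stated default of all numeric variables not named in margins.

```r
# Hypothetical helper: resolve a gauss specification to variable names
gaussNames <- function(gauss, data, margins = character(0)) {
  if (missing(gauss) || is.null(gauss)) {
    # Default: every numeric column not used in the discrete part
    isNum <- vapply(data, is.numeric, logical(1))
    return(setdiff(names(data)[isNum], margins))
  }
  if (inherits(gauss, "formula")) return(all.vars(gauss))
  if (is.numeric(gauss)) return(names(data)[gauss])
  as.character(gauss)
}

df <- data.frame(V1 = c(1.2, 3.4), V2 = c(5.6, 7.8),
                 V3 = factor(c("x", "y")), V4 = c(0.1, 0.2))
gaussNames(c(1, 2, 4), df)            # "V1" "V2" "V4"
gaussNames(c("V1", "V2", "V4"), df)   # "V1" "V2" "V4"
gaussNames(~V1 + V2 + V4, df)         # "V1" "V2" "V4"
gaussNames(NULL, df, margins = "V3")  # "V1" "V2" "V4"
```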
design: a formula, which may involve variables in optData, specifying the design matrix for the cell means. Optionally, an ncell by m matrix may be input directly as the design matrix.
Let i = 1, ..., ncell denote the cells in the loglinear model, and let mu(i) denote the vector of numeric variable means in cell i. The formula design then provides the design matrix for predicting the cell means. For example, let "V1" and "V2" be the names of the factor variables, and let "age" be a vector giving the average age of the subjects in each cell. The formula design = ~V1 + V2 indicates a main effects model for the cell means, while design = ~V1 + V2 + age indicates a main effects model for the cell means adjusted for average cell age.
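To see what such a formula produces, the sketch below builds the ncell by m design matrix with base R's model.matrix for a hypothetical table with a 2-level factor V1 and a 3-level factor V2, plus made-up average ages per cell. The cell order used by expand.grid (first factor varying fastest) is an assumption here; mdCgm's own cell ordering is not stated in this section. Given a coefficient matrix beta with one column per numeric variable, row i of X %*% beta is mu(i).

```r
# Hypothetical cell grid: V1 has 2 levels, V2 has 3 levels, so ncell = 6
cells <- expand.grid(V1 = factor(c("a", "b")), V2 = factor(c("p", "q", "r")))
cells$age <- c(20, 22, 25, 27, 30, 31)   # made-up average age per cell

X.main <- model.matrix(~V1 + V2, data = cells)        # main effects
X.age  <- model.matrix(~V1 + V2 + age, data = cells)  # adjusted for age
dim(X.main)   # 6 4: one row per cell, intercept + 1 + 2 contrast columns
dim(X.age)    # 6 5: one extra column for age

# Made-up coefficients for two numeric variables (one column each)
beta <- matrix(c(10, 1, 2, 3, 0.5,
                 50, -1, 0, 4, 0.2), nrow = 5)
mu <- X.age %*% beta   # row i holds mu(i); t(mu) is variables by cells
```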
If an ncell by m matrix is input instead, the regression model for the cell means is a linear function of the columns of the input matrix.
If design is not specified, the design matrix is the ncell by ncell identity matrix, so that each cell has its own unrestricted mean vector.
optData: a data frame with ncell rows containing predictors to be used in computing the design matrix. In the example given for argument design, the variable age would be input in argument optData.
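A per-cell predictor such as average age has to be computed before the call. A minimal sketch, assuming made-up subject-level data and the same first-factor-fastest cell order as expand.grid: tapply averages age within each V1 by V2 cell, and as.vector flattens the result in column-major order so the values line up with the rows of an expand.grid cell table. The name optData.sketch is only illustrative.

```r
# Made-up subject-level data: two factors and each subject's age
subj <- data.frame(
  V1  = factor(c("a", "b", "a", "b", "a", "b", "a", "b")),
  V2  = factor(c("p", "p", "q", "q", "r", "r", "p", "q")),
  age = c(20, 24, 30, 34, 40, 44, 22, 36)
)

# Average age in each cell; rows index V1, columns index V2
cellAge <- tapply(subj$age, list(subj$V1, subj$V2), mean)

# One row per cell with V1 varying fastest (the expand.grid order)
optData.sketch <- expand.grid(V1 = levels(subj$V1), V2 = levels(subj$V2))
optData.sketch$age <- as.vector(cellAge)
optData.sketch   # age column: 21 24 30 35 40 44
```

Empty cells would produce NA here, so a real table with empty cells would need an explicit rule for those rows.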
subset: an expression selecting the observations (rows) to be used. When object is a data frame, this expression may use variables in the data frame.
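Modeling functions in S and R typically evaluate such an expression with the data frame as its environment, so column names like SEX or HGPA resolve directly. The function subsetRows below is a hypothetical sketch of that mechanism using substitute and eval, not mdCgm's code; it applies the complete-case subset from the examples to a small made-up data frame standing in for language. Treating an NA condition as "drop the row" is this sketch's own choice.

```r
# Hypothetical sketch: evaluate a subset expression inside a data frame
subsetRows <- function(data, subset) {
  keep <- eval(substitute(subset), data, parent.frame())
  keep[is.na(keep)] <- FALSE      # sketch's choice: NA condition drops the row
  data[keep, , drop = FALSE]
}

toy <- data.frame(SEX  = factor(c("M", NA, "F", "F")),
                  HGPA = c(3.1, 2.5, NA, 3.8),
                  FLAS = c(70, 65, 80, 75))
subsetRows(toy, !(is.na(SEX) | is.na(HGPA)))   # keeps rows 1 and 4
```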
prior: a character string, a class "priorLoglin" object, or a vector of hyperparameters for the Dirichlet prior on the log-linear model.
Valid character strings are "ml" (maximum likelihood), "noninformative", and "data.dependent". Partial string matching is used, so "m", "n", or "d" is sufficient. The values of the hyperparameters depend on the algorithm; for example, "noninformative" means a common value of 1 for EM and a common value of 0.5 for DA.
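The partial matching and the algorithm-dependent meaning of "noninformative" can be mimicked with base R's match.arg. The function resolvePrior is a hypothetical sketch that covers only the values stated above; it returns NA for "ml" and "data.dependent" because their hyperparameters are not given in this section.

```r
# Hypothetical sketch: resolve an abbreviated prior name by partial
# matching, then look up the common hyperparameter for "noninformative"
resolvePrior <- function(prior, algorithm = c("em", "da")) {
  prior <- match.arg(prior, c("ml", "noninformative", "data.dependent"))
  algorithm <- match.arg(algorithm)
  value <- if (prior == "noninformative") {
    c(em = 1, da = 0.5)[[algorithm]]
  } else {
    NA_real_   # values for "ml" and "data.dependent" are not covered here
  }
  list(prior = prior, value = value)
}

resolvePrior("n", "em")   # prior "noninformative", value 1
resolvePrior("n", "da")   # prior "noninformative", value 0.5
resolvePrior("d")$prior   # "data.dependent"
```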
A "priorLoglin" object is created by routine priorLoglin. See also dataDepPrior.
The default is "noninformative". When a class "missmodel" object is input, a value specified in the previous call takes priority over the default value (but not over a value specified in the current call).
Structural zeros are indicated by missing values (NA) when a vector of hyperparameters is input as argument prior.
When a "missmodel" object is input and argument prior is not given, prior defaults to the prior specified in the call statement of the input "missmodel" object. If none was specified there, the default (which depends on the algorithm) is used.
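na.proc: a character string specifying how missing values are handled. The default, "fail", causes mdCgm to stop with an error message if missing values are encountered; "em" and "da" request the EM algorithm and data augmentation, respectively. If object is a class "preCgm" or "missmodel" object, na.proc must be either "da" or "em".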
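The two rules stated above (stop on missing values under "fail", and require "da" or "em" for precomputed objects) can be written as a small argument check. checkNaProc is a hypothetical sketch, not mdCgm's code; the precomputed flag stands in for "object is a preCgm or missmodel object".

```r
# Hypothetical sketch of the stated na.proc rules (not mdCgm's code)
checkNaProc <- function(data, na.proc = "fail", precomputed = FALSE) {
  if (precomputed && !(na.proc %in% c("da", "em")))
    stop('na.proc must be "da" or "em" for "preCgm" or "missmodel" objects')
  if (na.proc == "fail" && any(is.na(data)))
    stop("missing values encountered")
  invisible(na.proc)
}

toy <- data.frame(x = c(1.5, NA, 2.0), g = factor(c("a", "b", "a")))
res <- tryCatch(checkNaProc(toy), error = conditionMessage)
res                      # "missing values encountered"
checkNaProc(toy, "em")   # accepted; returns "em" invisibly
```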
start: a class "cgm" object of starting values for the model parameters. The parameters estimated by mdCgm are the cell means and variance-covariance matrix of the multivariate Gaussian distribution, and the log-linear model cell probabilities. Alternatively, start may be a list with a matrix component mu giving the means, one column for each of the ncell cells (the columns must be in the same order as the log-linear model cells, and the rows in the same order as the continuous variables), a matrix component sigma giving the variance-covariance matrix, and a vector pi giving the cell probabilities. If structural zeros appear in the contingency table, start$pi must contain zeros to indicate them.
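A list of this form can be assembled by hand. The sketch below uses made-up values for two continuous variables and a 2 x 2 table (ncell = 4) with one structural zero, checks the dimensions described above, and then draws a few observations from the conditional Gaussian model these parameters define: a cell is drawn with probabilities pi, and the continuous values are drawn from a normal distribution with that cell's column of mu and the common sigma. The simulation only illustrates the model; it is not part of mdCgm.

```r
set.seed(1)
p <- 2; ncell <- 4
start <- list(
  # rows = continuous variables, columns = cells (same order as the table)
  mu = matrix(c(0, 10, 1, 12, 2, 14, 3, 16), nrow = p, ncol = ncell),
  sigma = matrix(c(1, 0.3, 0.3, 2), nrow = p),  # common covariance matrix
  pi = c(0.4, 0.3, 0, 0.3)                      # cell 3 is a structural zero
)
stopifnot(dim(start$mu) == c(p, ncell), dim(start$sigma) == c(p, p),
          length(start$pi) == ncell, isTRUE(all.equal(sum(start$pi), 1)))

# Simulate n observations from the general location model
n <- 5
cell <- sample.int(ncell, n, replace = TRUE, prob = start$pi)
R <- chol(start$sigma)                  # sigma = t(R) %*% R
z <- matrix(rnorm(n * p), n, p)         # independent standard normals
y <- t(start$mu[, cell, drop = FALSE]) + z %*% R
data.frame(cell = cell, y1 = y[, 1], y2 = y[, 2])
```

Right-multiplying the standard normal rows by the Cholesky factor R gives each row covariance t(R) %*% R, which is sigma.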
A "cgm" object created as the paramIter component of a class "missmodel" object may also be input as starting values. Routines mdCgm, daCgm, and emCgm may be used to create an appropriate "missmodel" object.
By default, the starting values are a vector of 1s for pi, together with a matrix of means and a diagonal matrix of variances calculated from the numeric observations with no missing values.
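The default starting values can be approximated as follows. This is a sketch under stated assumptions: each variable's mean and variance are computed from its own non-missing values (the text above does not say whether complete cases or available cases are used), and the same mean vector is repeated in every cell column. The helper defaultStart is hypothetical.

```r
# Hypothetical sketch of default starting values (not mdCgm's code)
defaultStart <- function(y, ncell) {
  y <- as.matrix(y)
  m <- colMeans(y, na.rm = TRUE)              # per-variable means
  v <- apply(y, 2, var, na.rm = TRUE)         # per-variable variances
  list(mu = matrix(m, nrow = ncol(y), ncol = ncell),  # same means in each cell
       sigma = diag(v, nrow = ncol(y)),               # diagonal covariance
       pi = rep(1, ncell))                            # a vector of 1s
}

y <- cbind(verbal1 = c(100, 110, NA, 120), verbal2 = c(90, NA, 95, 100))
s <- defaultStart(y, ncell = 3)
s$mu      # each column is c(110, 95)
s$sigma   # diagonal entries 100 and 25
```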
If object is a class "missmodel" object, start defaults to the final estimates in the input "missmodel" object.
control: a list of control values. It defaults to the emCgm.control values or to the daCgm.control values, as appropriate; see the help files for those functions for details.
When a "missmodel" object is input, control values specified in the previous call take priority over the default values (but not over any value specified in the current call), provided they are of the required class ("da" or "em").
contrasts: a list giving contrasts for some or all of the factors appearing in the design formula. The elements of the list should have the same names as the variables and should be either a contrast matrix (specifically, any full-rank matrix with as many rows as there are levels in the factor) or a function that computes such a matrix given the number of levels.
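This list follows the same convention as the contrasts.arg argument of base R's model.matrix, so its effect on a design matrix can be previewed there. The sketch assumes a hypothetical cell grid with factors V1 and V2, supplies a contrast function for V1, and supplies an explicit full-rank 3 x 2 matrix (linear and quadratic contrasts, in the spirit of lc in the examples) for V2.

```r
cells <- expand.grid(V1 = factor(c("a", "b")), V2 = factor(c("p", "q", "r")))

# V1: a function of the number of levels; V2: a full-rank 3 x 2 matrix
lin.quad <- matrix(c(-1, 0, 1,     # linear
                      1, -2, 1),   # quadratic
                   ncol = 2)
X <- model.matrix(~V1 + V2, data = cells,
                  contrasts.arg = list(V1 = contr.sum, V2 = lin.quad))
X   # 6 cells by 4 columns; column 2 codes V1 as +1 / -1
```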
Value: an object of class "missmodel".
Side effects: mdCgm creates the data set .Random.seed if it does not already exist; otherwise it updates its value.
Details: The mdCgm function estimates the parameters of a conditional Gaussian model (also known as a "general location model") in which the factor variables follow a hierarchical log-linear model and, conditional on the factor variables, the numeric variables are multivariate normal. In a hierarchical model, including an interaction effect automatically includes all corresponding lower-order effects. For example, for factors A, B, and C, including A:B:C automatically means that A, B, C, A:B, A:C, and B:C are also included in the model.
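The set of terms implied by a single generator is every non-empty subset of its variables, which base R's combn can enumerate. impliedTerms is a hypothetical helper for illustration. Note that in ordinary R model formulas ~A:B:C does not add the lower-order terms; the hierarchical expansion corresponds to ~A*B*C, which the test below checks.

```r
# Hypothetical helper: all terms implied by one interaction generator
impliedTerms <- function(generator) {
  vars <- strsplit(generator, ":", fixed = TRUE)[[1]]
  unlist(lapply(seq_along(vars), function(k)
    combn(vars, k, FUN = paste, collapse = ":")))
}

impliedTerms("A:B:C")
# "A" "B" "C" "A:B" "A:C" "B:C" "A:B:C"
```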
mdCgm handles missing values in one of four ways, as indicated by the argument na.proc.
A Dirichlet prior distribution may be specified for the parameters of the log-linear model. A noninformative prior is always assumed for the parameters of the multivariate normal distribution.
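For a multinomial table with counts n and a Dirichlet prior with hyperparameters alpha, the posterior mode of the cell probabilities is (n_i + alpha_i - 1) / (N + sum(alpha) - K) over the K cells that are not structural zeros, provided every n_i + alpha_i is at least 1. This is a standard result shown here as a hedged sketch for the saturated case only (it ignores any log-linear restrictions and is not mdCgm's computation). It explains the examples' prior = 1.05: with alpha = 1 the mode equals the maximum likelihood estimate, so an empty cell gets probability 0 on the boundary, whereas 1.05 keeps every mode strictly positive. Treating NA in alpha as a structural zero is an assumption based on the description of hyperparameter vectors under prior.

```r
# Posterior mode of saturated-model cell probabilities under a Dirichlet
# prior; NA in alpha marks a structural zero (assumption, see lead-in)
dirichletMode <- function(n, alpha) {
  alpha <- rep_len(alpha, length(n))
  free <- !is.na(alpha)
  stopifnot(all(n[free] + alpha[free] >= 1))   # the mode formula applies
  num <- ifelse(free, n + alpha - 1, 0)
  num / sum(num)
}

n <- c(12, 0, 5, 3)                      # made-up counts; cell 2 is empty
dirichletMode(n, 1)                      # equals n / sum(n); cell 2 is 0
dirichletMode(n, 1.05)                   # every cell strictly positive
dirichletMode(n, c(1.05, NA, 1.05, 1.05))  # cell 2 fixed at 0
```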
Because the emCgm function is often called more than once, it is usually preferable to precompute the quantities it uses. This may be done with the preCgm function.
References:
Agresti, A. (1990), Categorical Data Analysis, John Wiley & Sons, New York.
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975), Discrete Multivariate Analysis: Theory and Practice, MIT Press, Cambridge, MA.
Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, Chapman & Hall, London.
Examples:
mdGauss(object = language)  # fails by default
                            # because language has missing data
# Fit model on part of data with no missing values:
mdCgm(language[, c("LAN", "SEX", "HGPA", "FLAS")],
      subset = !(is.na(SEX) | is.na(HGPA)))
# Equivalent to:
completeCgm(language[, c("LAN", "SEX", "HGPA", "FLAS")],
            subset = !(is.na(SEX) | is.na(HGPA)), prior = 1)

mdCgm(object = stlouis[, -1], margins = ~D1:D2 + risk,
      gauss = ~verbal1 + verbal2, design = ~D1 + D2 + risk,
      na.proc = "em", subset = verbal2 > 100 | is.na(verbal2))
# Equivalent to:
emCgm(object = stlouis[, -1], margins = ~D1:D2 + risk,
      gauss = ~verbal1 + verbal2, design = ~D1 + D2 + risk,
      subset = verbal2 > 100 | is.na(verbal2))

# Preprocess
language.s <- preCgm(language)
# Categorical variables LAN, AGE, PRI, SEX, GRD specify a 5-dimensional
# contingency table with 4*5*5*2*5 = 1000 cells.
# Specify a loglinear model with all main effects and 2-variable associations:
margins.form <- ~ LAN + AGE + PRI + SEX + GRD +
  LAN:AGE + LAN:PRI + LAN:SEX + LAN:GRD + AGE:PRI + AGE:SEX +
  AGE:GRD + PRI:SEX + PRI:GRD + SEX:GRD
# Linear contrast
lc <- c(-2, -1, 0, 1, 2)
# Set up contrasts to get a dummy-coded design matrix
options(contrasts = c("contr.treatment", "contr.poly"))
design.form <- ~ LAN + C(AGE, lc, 1) + C(PRI, lc, 1) + SEX + C(GRD, lc, 1)
# Set the hyperparameter to 1.05 to ensure a mode in the
# interior of the parameter space
language.em <- mdCgm(language.s, margins = margins.form,
                     design = design.form, prior = 1.05, na.proc = "em")
# Same as:
emCgm(language.s, margins = margins.form, design = design.form, prior = 1.05)
# Data augmentation
language.da <- mdCgm(language.em,
                     control = list(niter = 1000, save = 100:1000),
                     na.proc = "da")
# Same as:
daCgm(language.em, control = list(niter = 1000, save = 100:1000))