impCgm.default(object, nimpute = 3, margins, gauss, design, optData, subset,
               prior = 0.5, start = NULL, iterOn1 = T,
               control = daCgm.control(), contrasts = NULL,
               return.type = "data.frame")
impCgm.preCgm(object, nimpute = 3, margins, gauss, design, optData,
              prior = 0.5, start = NULL, iterOn1 = T,
              control = daCgm.control(), contrasts = NULL,
              return.type = "data.frame")
impCgm.missmodel(object, nimpute = 3, margins, gauss, design, optData,
                 prior = 0.5, start = NULL, iterOn1 = T,
                 control = daCgm.control(), contrasts = NULL,
                 return.type = "data.frame")
impCgm.default: a data frame or matrix containing the raw data. When a data frame is input and the margins argument is not provided, the loglinear part of the model is assumed to be a saturated model in which all factor variables are used to form the table. If the gauss argument is not provided, then all numeric variables in the data frame are included in the conditional Gaussian part of the model. When a matrix is input, you must provide the margins argument, which identifies the variables to use in the discrete part of the model. If the gauss argument is omitted, then all remaining variables in the matrix are used in the Gaussian part of the conditional Gaussian distribution.
impCgm.preCgm: an object of class "preCgm" (produced by the preCgm function).
impCgm.missmodel: an object of class "missmodel" containing the results of a previous analysis. Any of the functions mdCgm, completeCgm, emCgm, or daCgm may be used to produce the missmodel object.
nimpute is ignored if several chains are used to produce imputations, in which case the number of imputations is determined as described under the argument start below.
list(1:2, 3:4) would indicate fitting the 1,2 margin (summing over variables 3 and 4) and the 3,4 margin in a four-way table. This same model can be specified using the names of the variables (e.g., list(c("V1", "V2"), c("V3", "V4"))), or using formula notation, as in margins = ~V1:V2 + V3:V4.
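As a rough illustration of what "fitting the 1,2 margin" refers to, the sketch below computes the two margins of a four-way table with base functions only. The table and its counts are made up, the object names are hypothetical, and nothing here calls the missing-data library itself.

```r
# Hypothetical 2 x 2 x 2 x 2 contingency table; the counts are made up.
tab <- array(1:16, dim = c(2, 2, 2, 2),
             dimnames = list(V1 = c("a", "b"), V2 = c("a", "b"),
                             V3 = c("a", "b"), V4 = c("a", "b")))

# The 1,2 margin: sum the cell counts over variables 3 and 4.
margin12 <- apply(tab, c(1, 2), sum)

# The 3,4 margin: sum the cell counts over variables 1 and 2.
margin34 <- apply(tab, c(3, 4), sum)

# Three equivalent ways to describe the same pair of margins.
by.index   <- list(1:2, 3:4)
by.name    <- list(c("V1", "V2"), c("V3", "V4"))
by.formula <- ~ V1:V2 + V3:V4
```

Each margin keeps the grand total of the table, which is a quick check that the right variables were summed out.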
If margins is not specified, a saturated model is fitted.
impCgm.default: When a matrix is input as argument object, argument margins must be specified. When a data frame is input and argument margins is missing, then a saturated model involving all factor variables is fitted.
impCgm.missmodel: If not given, argument margins defaults to the margins specified in the call statement of the input "missmodel" object.
c(1, 2, 4); as a vector of variable names, e.g. c("V1", "V2", "V4"); and using formula notation, e.g. ~V1+V2+V4. If argument gauss is omitted, then all numeric variables (which do not appear in argument margins) are used in the multivariate Gaussian model.
optData. Optionally, an ncell by m matrix may be input directly as the design matrix.
Let i=1, ..., ncell denote the cells in the loglinear model, and let mu(i) denote the vector of numeric variable means in cell i. Then the formula design provides the design matrix for predicting the cell means. As an example, let "V1" and "V2" be the names of the factor variables, and let "age" be a vector giving an average age for the subjects in each cell. Then formula design=~V1+V2 indicates a main effect model for the cell means, while design=~V1 + V2 + age indicates a main effect model for the cell means, adjusted for average cell age.
ncell by m matrix may be input. In this case, the regression model is obtained as a linear function of the columns of the input matrix.
If design is not specified, then the design matrix is taken to be an identity matrix.
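The sketch below shows, under stated assumptions, what an ncell by m design matrix for the cell means can look like. It uses only the base function model.matrix on a made-up grid of cells; the factor levels and the age values are hypothetical, and the library may build its own matrix differently (for example with other default contrasts).

```r
# Hypothetical cells of a 2 x 3 table formed by factors V1 and V2.
cells <- expand.grid(V1 = factor(c("a", "b")), V2 = factor(c("x", "y", "z")))
ncell <- nrow(cells)

# Made-up average age per cell, the kind of predictor optData would carry.
cells$age <- c(31, 35, 42, 40, 55, 58)

# Main-effects design for the cell means.
X.main <- model.matrix(~ V1 + V2, data = cells)

# Main effects adjusted for average cell age.
X.age <- model.matrix(~ V1 + V2 + age, data = cells)

# With no design, every cell gets its own mean: an identity matrix.
X.identity <- diag(ncell)
```

Each matrix has one row per cell. The main-effects versions use fewer columns than cells, which is what makes them more parsimonious than the identity design.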
ncell rows containing predictors to be used in computing the design matrix. In the example given in the description for argument design, the variable age would be input in argument optData.
If object is a data frame, this expression may use variables in the data frame.
"priorLoglin"
, or a vector of hyperparameters.
"ml"
(maximum likelihood), "noninformative", and "data.dependent". String matching is used, so the characters "m", "n", or "d" are sufficient. The values of the hyperparameters change with the algorithm (see for details). E.g. "noninformative" means a common value of 1 for EM, and a common value of 0.5 for DA.
"priorLoglin"
object is created by routine priorLoglin.
dataDepPrior. See for details.
"noninformative"
.
impCgm.missmodel: If not given, argument prior defaults to the prior probabilities specified in the call statement of the input "missmodel" object. If these are not specified, then the default (which depends on the algorithm) is used.
start depends on whether the imputations are generated from one long chain, or from several chains.
mdCgm are the cell means and variance-covariance matrix of a multivariate Gaussian distribution, and log-linear model cell probabilities.
start is a list with matrix component mu giving the matrix of means in each of its ncell columns (where the columns must be in the same order as the log-linear model cells, and the rows must be in the same order as the continuous variables), a matrix component sigma giving the variance-covariance matrix, and a vector pi giving the cell probabilities. If structural zeros appear in the contingency table, start$pi must contain zeros to indicate the structural zeros; see for details. For one long chain, you must supply the argument nimpute.
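A minimal sketch of a starting-value list with the three components described above. The model size (4 cells, 2 continuous variables), the variable names, and every number are made up for illustration; only the component names mu, sigma, and pi and their layout come from the description.

```r
# Hypothetical model: 4 cells, 2 continuous variables; all values are made up.
ncell <- 4
vars  <- c("height", "weight")

start <- list(
  # One column of means per cell, one row per continuous variable.
  mu = matrix(c(170, 70,  165, 62,  180, 85,  175, 78),
              nrow = length(vars), ncol = ncell,
              dimnames = list(vars, paste("cell", 1:ncell, sep = ""))),
  # Common variance-covariance matrix of the continuous variables.
  sigma = matrix(c(50, 20,
                   20, 90), nrow = 2, dimnames = list(vars, vars)),
  # Cell probabilities; cell 3 is treated as a structural zero.
  pi = c(0.5, 0.3, 0, 0.2)
)
```

The columns of mu and the entries of pi refer to the same cells in the same order, so a zero in pi marks the corresponding column of mu as belonging to a structural-zero cell.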
start may be a list of such lists, a class "cgm" object, or a list of "cgm" objects.
"cgm"
object is the paramIter component of a class "missmodel" object, produced by routines such as mdCgm, daCgm, and emCgm. The number of imputations equals the number of rows in the matrix paramIter.
"cgm"
objects are input, the estimates in the final row of each paramIter component are used to start a chain. The number of imputations equals the number of "cgm" objects.
1s for pi (normalized so that they sum to 1), and a matrix of means and a diagonal matrix of variances estimated from the numeric observations with no missing values.
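One way such default starting values could be computed is sketched below with base functions. The data are made up, and using the same complete-case means in every cell is an assumption made for the illustration, not something the description confirms.

```r
# Hypothetical numeric data with missing values; the numbers are made up.
x <- cbind(height = c(170, NA, 180, 175, 165),
           weight = c(70, 62, NA, 78, 60))
ncell <- 4

# Keep only the rows with no missing values.
complete <- x[!is.na(rowSums(x)), , drop = FALSE]

# Equal weight for every cell, normalized to sum to 1.
pi0 <- rep(1, ncell)
pi0 <- pi0 / sum(pi0)

# Assumption: the same vector of complete-case means is used in every cell.
mu0 <- matrix(colMeans(complete), nrow = ncol(x), ncol = ncell,
              dimnames = list(colnames(x), NULL))

# Diagonal matrix of complete-case variances.
sigma0 <- diag(apply(complete, 2, var))
```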
impCgm.missmodel: If start, margins, and gauss are not specified, then argument start defaults to the final estimates in the input "missmodel" object. If either margins or gauss is specified, then start must be provided. Also note that when argument margins is specified, care must be taken to ensure that structural zeros in these final estimates are also structural zeros in the new model.
If iterOn1 is FALSE, then the first imputation is drawn under the parameter given in start. If iterOn1 is TRUE, then data augmentation starts from start, and runs for control$niter iterations before producing the first imputation. Each of the remaining imputations is produced after data augmentation runs for control$niter further iterations.
If iterOn1 is FALSE, then the imputations are drawn under the parameters given in the start matrix. If iterOn1 is TRUE, then data augmentation starts from each row of start, and runs for control$niter iterations before producing each of the imputations.
impCgm.missmodel: If not given, argument control defaults to the control parameters specified in the call statement of the input "missmodel" object, but only if these are of the correct class. If these are not given (or are not of the correct class), then the argument control defaults to the daCgm.control values.
design formula. The elements of the list should have the same name as the variable and should be either a contrast matrix (specifically, any full-rank matrix with as many rows as there are levels in the factor), or else a function to compute such a matrix given the number of levels.
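The two kinds of list element described above (a contrast matrix, or a function of the number of levels) can be sketched with base functions as follows. The data frame is made up, and model.matrix with its contrasts.arg argument is used only as a stand-in to show the effect of such a list; how the library applies the list internally is not shown here.

```r
# Linear contrast for a five-level factor, as a one-column contrast matrix.
lc <- matrix(c(-2, -1, 0, 1, 2), ncol = 1, dimnames = list(NULL, "linear"))

# Hypothetical factors; the names AGE and SEX follow the example data.
d <- data.frame(AGE = factor(rep(1:5, 2)),
                SEX = factor(rep(c("f", "m"), each = 5)))

# Named list: a matrix for AGE, a function of the number of levels for SEX.
contrasts.arg <- list(AGE = lc, SEX = contr.sum)

# Design matrix built with those contrasts: intercept, linear AGE, SEX.
X <- model.matrix(~ AGE + SEX, data = d, contrasts.arg = contrasts.arg)
```

The list names must match the factor names in the formula, otherwise the contrasts are not picked up for those factors.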
"data.frame"
(the default), the returned object is a data frame whose variables may inherit from class "miVariable". If "matrix", then an "miVariable" containing a matrix is returned.
"miVariable"
objects, or "miVariable" object containing a matrix, depending on the value of return.type.
.Random.seed if it does not already exist, otherwise its value is updated.
Computations in the impCgm function are made more efficient by first calculating a preCgm object. Therefore, if a preCgm object already exists (e.g. through using the preCgm function before calling emCgm or daCgm), then it will save computation time to pass in this object instead of the original data.
See the help file for
for additional details.
Best, N. G., Cowles, M. K. and Vines, S. K. (1997), CODA: Convergence Diagnosis and Output Analysis Software for Gibbs Sampling Output, Version 0.4, Cambridge: Medical Research Council Biostatistics Unit.
Gilks, W. R., Richardson, S. and Spiegelhalter, D. J., editors (1996), Markov Chain Monte Carlo in Practice, London: Chapman and Hall.
Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, London: Chapman and Hall.
# First generate starting values
# Categorical variables LAN, AGE, PRI, SEX, GRD specify a 5 dimensional
# contingency table with 4*5*5*2*5 = 1000 cells
# Specify loglinear model with all main effects and 2-variable associations:
margins.form <- ~ LAN + AGE + PRI + SEX + GRD + LAN:AGE + LAN:PRI +
    LAN:SEX + LAN:GRD + AGE:PRI + AGE:SEX + AGE:GRD + PRI:SEX +
    PRI:GRD + SEX:GRD

# linear contrast
lc <- c(-2, -1, 0, 1, 2)
design.form <- ~ LAN + C(AGE, lc, 1) + C(PRI, lc, 1) + SEX + C(GRD, lc, 1)

language.pre <- preCgm(language)

# Set hyperparameter to 1.05 to ensure a mode in the
# interior of the parameter space
language.em <- emCgm(language.pre, margins = margins.form,
    design = design.form, prior = 1.05)

# 5 imputations produced by parallel chains, each
# started from one row of a matrix of starting values,
# and run for 100 iterations
start.langEM <- matrix(rep(language.em$paramIter[2, ], 5),
    nrow = 5, byrow = T)
language.imp <- impCgm(language, margins = margins.form,
    design = design.form, prior = 1.05, start = start.langEM,
    control = list(niter = 100))

# Single chain
# The following are equivalent:
impCgm.default(language, nimpute = 5, margins = margins.form,
    design = design.form, prior = 1.05,
    start = language.em$paramIter[2, ])

language.pre <- preCgm(data = language)
impCgm.preCgm(object = language.pre, nimpute = 5, margins = margins.form,
    design = design.form, prior = 1.05,
    start = language.em$paramIter[2, ])

impCgm.missmodel(language.em, nimpute = 5)