Impute Multivariate Normal Data

DESCRIPTION:

Methods for imputing numeric data under a multivariate normal model using data augmentation.
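
For readers unfamiliar with data augmentation, the following Python sketch (using NumPy) shows one common form of the algorithm for a multivariate normal model. The I-step draws each row's missing values from their conditional normal distribution given that row's observed values. The P-step draws new parameters from their complete-data posterior. This is an illustration, not the impGauss implementation. It assumes NaN marks a missing value, and its P-step assumes the Jeffreys prior |sigma|^(-(p+1)/2), which may differ from the priors impGauss offers. The names i_step, p_step and draw_imputation are hypothetical.

```python
import numpy as np


def i_step(Y, mu, sigma, rng):
    """I-step: draw each row's missing entries (NaN) from their conditional
    normal distribution given that row's observed entries."""
    Y = Y.copy()
    for row in Y:  # each row is a view, so assignments update the copy
        miss = np.isnan(row)
        if not miss.any():
            continue
        obs = ~miss
        cond_mu = mu[miss]
        cond_sigma = sigma[np.ix_(miss, miss)]
        if obs.any():
            s_mo = sigma[np.ix_(miss, obs)]
            # Regression coefficients of the missing on the observed variables
            coef = np.linalg.solve(sigma[np.ix_(obs, obs)], s_mo.T).T
            cond_mu = cond_mu + coef @ (row[obs] - mu[obs])
            cond_sigma = cond_sigma - coef @ s_mo.T
        cond_sigma = (cond_sigma + cond_sigma.T) / 2  # guard against round-off
        row[miss] = rng.multivariate_normal(cond_mu, cond_sigma)
    return Y


def p_step(Y, rng):
    """P-step: draw (mu, sigma) from the complete-data posterior under the
    Jeffreys prior |sigma|^(-(p+1)/2). Requires n - 1 >= p."""
    n, p = Y.shape
    ybar = Y.mean(axis=0)
    centered = Y - ybar
    S = centered.T @ centered  # sums of squares and cross-products
    S_inv = np.linalg.inv(S)
    S_inv = (S_inv + S_inv.T) / 2
    # A Wishart(n - 1, S^-1) draw as a sum of n - 1 outer products ...
    Z = rng.multivariate_normal(np.zeros(p), S_inv, size=n - 1)
    # ... so its inverse is an inverse-Wishart(n - 1, S) draw
    sigma = np.linalg.inv(Z.T @ Z)
    sigma = (sigma + sigma.T) / 2
    mu = rng.multivariate_normal(ybar, sigma / n)
    return mu, sigma


def draw_imputation(Y, mu, sigma, niter, rng):
    """Run niter I-step/P-step iterations from (mu, sigma), then draw one
    completed data set under the current parameters."""
    for _ in range(niter):
        mu, sigma = p_step(i_step(Y, mu, sigma, rng), rng)
    return i_step(Y, mu, sigma, rng), mu, sigma
```

With niter = 0, draw_imputation draws the completed data directly under the starting parameters. With a positive niter, it first runs that many iterations, and the returned parameters can seed the next draw in a long chain.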

USAGE:

impGauss.default(object, nimpute = 3, subset, prior = <<see below>>, 
    start = <<see below>>, iterOn1 = T, control = daGauss.control(), 
    return.type = "data.frame") 
impGauss.preGauss(object, nimpute = 3, prior = <<see below>>, 
    start = <<see below>>, iterOn1 = T, control = daGauss.control(), 
    return.type = "data.frame") 
impGauss.missmodel(object, nimpute = 3, prior = <<see below>>, 
    start = <<see below>>, iterOn1 = T, control = daGauss.control(), 
    return.type = "data.frame") 

REQUIRED ARGUMENTS:

object
for impGauss.default: a data frame or matrix containing the raw data. When a data frame is supplied, only its numeric variables enter the model; when a matrix is supplied, all of its columns are used.

for impGauss.preGauss: an object of class "preGauss" (produced by the preGauss function).

for impGauss.missmodel: an object of class "missmodel" containing the results of a previous analysis. Any of the functions mdGauss, completeGauss, emGauss, or daGauss may be used to produce the missmodel object.

OPTIONAL ARGUMENTS:

nimpute
an integer giving the number of imputations. nimpute is ignored when several chains are used to produce the imputations; in that case, the number of imputations is determined by start, as described under that argument below.
subset
expression specifying which rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of rows), a numeric vector indicating the observation numbers to be included, or a character vector of the row names to be included. All observations are included by default. If object is a data frame, this expression may use variables in the data frame.
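
The three accepted forms of subset can be illustrated with a small Python sketch. resolve_subset is a hypothetical helper, not part of the library. It converts each form to 0-based row positions and treats numeric entries as 1-based observation numbers, as S does. It does not model other S indexing features such as negative indices.

```python
def resolve_subset(subset, row_names):
    """Return the 0-based positions of the rows selected by subset."""
    n = len(row_names)
    if subset is None:                       # default: every observation
        return list(range(n))
    if len(subset) == 0:
        return []
    if all(isinstance(s, bool) for s in subset):
        # logical vector, recycled to the number of rows
        return [i for i in range(n) if subset[i % len(subset)]]
    if all(isinstance(s, int) for s in subset):
        # observation numbers, 1-based as in S
        return [s - 1 for s in subset]
    if all(isinstance(s, str) for s in subset):
        position = {name: i for i, name in enumerate(row_names)}
        return [position[s] for s in subset]  # KeyError for unknown names
    raise TypeError("subset must be logical, numeric, or character")
```

The bool check comes before the int check because Python treats bool as a subclass of int.
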
prior
an object of class "priorGauss" giving the hyperparameters of the prior distribution. Routine priorGauss is used to create the class "priorGauss" object. Alternatively, the character strings "ml" (for no prior, i.e., maximum likelihood estimation), "noninformative" (for a noninformative prior), or "ridge" (for the default ridge prior) may be used. Pattern matching means that only the first character in the string is required. See for details.

The default value is a noninformative prior. When a class "missmodel" object is input, a prior specified in the call that produced it takes priority over the default value, but not over a prior specified in the current call.
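
The partial matching of the prior name can be pictured as prefix matching against the three accepted strings. The Python sketch below is a hypothetical illustration, and match_prior is not a library function. It accepts any unambiguous prefix. A single character therefore suffices, because the three choices begin with different letters. Case-sensitive matching is assumed here.

```python
PRIOR_CHOICES = ("ml", "noninformative", "ridge")


def match_prior(name):
    """Resolve an abbreviated prior name by unique prefix matching."""
    matches = [choice for choice in PRIOR_CHOICES if choice.startswith(name)]
    if len(matches) != 1:
        raise ValueError(f"cannot match prior {name!r}")
    return matches[0]
```
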
start
starting values of the parameters. The form of start determines whether the imputations are generated from one long chain, or from several chains.

For one long chain, start should be a list with vector component mu giving the mean and matrix component sigma giving the variance-covariance matrix.

For several chains, start may be a list of such lists, a class "Gauss" object, or a list of "Gauss" objects.

For a list of lists, each interior list must contain the two components mu and sigma. The number of imputations equals the length of the outermost list.

A class "Gauss" object is the paramIter component of a class "missmodel" object, produced by routines such as mdGauss, daGauss, and emGauss. It is a matrix in which each row holds one set of parameter values. One chain is started from each row, so the matrix has as many rows as there are imputations.

If a list of "Gauss" objects is input, the estimates in the final row of each paramIter component are used to start a chain. The number of imputations equals the number of "Gauss" objects.
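
The rules above, which determine how many chains are run and how many imputations result, can be sketched in Python. The representation is hypothetical. A single set of starting values is a dict with keys "mu" and "sigma", and GaussMatrix is a stand-in for a class "Gauss" object whose rows are parameter vectors. resolve_start is likewise an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class GaussMatrix:
    """Stand-in for a class "Gauss" object: one parameter vector per row."""
    rows: list


def resolve_start(start, nimpute):
    """Return (chain_starts, number_of_imputations) for a given start."""
    if isinstance(start, dict):                            # one long chain
        return [start], nimpute
    if isinstance(start, GaussMatrix):                     # one chain per row
        return list(start.rows), len(start.rows)
    if all(isinstance(s, GaussMatrix) for s in start):     # final row of each
        return [g.rows[-1] for g in start], len(start)
    if all(isinstance(s, dict) for s in start):            # list of lists
        return list(start), len(start)
    raise TypeError("unrecognized form for start")
```

Only the single-chain case uses nimpute. In every multi-chain case, the count comes from start itself.
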

In most cases the default starting values are equal to the mean and a diagonal variance-covariance matrix estimate obtained from the observations with no missing values. If an entire column is missing, the default mean for the column is zero, and the default variance for the column is one. Another exception occurs when argument object is a class "missmodel" object. In this case argument start defaults to the final estimates in the input "missmodel" object.
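
The default starting values can be sketched as follows. This Python sketch reads "observations with no missing values" column by column, using each variable's non-missing values. That is an interpretation, but it is the reading under which the per-column exception for an entirely missing column makes sense. The n - 1 variance divisor is also an assumption. default_start is a hypothetical name, and None marks a missing value.

```python
import statistics


def default_start(columns):
    """Per-column means and a diagonal covariance matrix computed from the
    non-missing values; None marks a missing value."""
    mu, var = [], []
    for col in columns:
        observed = [x for x in col if x is not None]
        if not observed:              # entire column missing: mean 0, variance 1
            mu.append(0.0)
            var.append(1.0)
        else:
            mu.append(statistics.fmean(observed))
            # n - 1 divisor (assumed); raises StatisticsError for fewer than 2 values
            var.append(statistics.variance(observed))
    p = len(columns)
    sigma = [[var[i] if i == j else 0.0 for j in range(p)] for i in range(p)]
    return {"mu": mu, "sigma": sigma}
```

A column with exactly one observed value is not covered by the description above, and this sketch simply raises an error for it.
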
iterOn1
logical flag which determines whether the data augmentation algorithm is iterated before producing (1) the first imputation (in one long chain) or (2) each of the imputations (for parallel chains). The default value is TRUE.

In particular, for one long chain, if iterOn1 is FALSE, the first imputation is drawn under the parameters given in start. If iterOn1 is TRUE, data augmentation starts from start and runs for control$niter iterations before producing the first imputation. Each subsequent imputation is produced after data augmentation runs for a further control$niter iterations.

Similarly, for parallel chains, if iterOn1 is FALSE, the imputations are drawn under the parameters given in start, one set per chain. If iterOn1 is TRUE, data augmentation starts from each set of starting values in start and runs for control$niter iterations before producing each imputation.
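
The iterOn1 rules can be summarized by the cumulative number of data augmentation iterations, per chain, after which each imputation is drawn. The Python function below is a hypothetical illustration of those rules. Its niter argument stands for control$niter, and parallel distinguishes several chains from one long chain.

```python
def imputation_schedule(nimpute, niter, iter_on_1, parallel):
    """Per-chain iteration counts after which each imputation is drawn."""
    first = niter if iter_on_1 else 0
    if parallel:
        # each chain yields exactly one imputation
        return [first] * nimpute
    # one long chain: a further niter iterations between successive imputations
    return [first + k * niter for k in range(nimpute)]
```

For example, with nimpute = 3, niter = 100 and one long chain, imputations are drawn after 100, 200 and 300 iterations when iterOn1 is TRUE, and after 0, 100 and 200 when it is FALSE.
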
control
A list of parameters used to control the algorithm; see for details.

For impGauss.missmodel: if not given, argument control defaults to the control parameters specified in the call stored in the input "missmodel" object, but only if these are of the correct class. If they are not given (or are not of the correct class), argument control defaults to the daGauss.control values.
return.type
character string. If "data.frame" (the default), the returned object is a data frame whose variables may inherit from class "miVariable". If "matrix", an "miVariable" object containing a matrix is returned.

VALUE:

a data frame containing "miVariable" objects, or an "miVariable" object containing a matrix, depending on the value of return.type.

SIDE EFFECTS:

All methods create the dataset .Random.seed if it does not already exist; otherwise, its value is updated.

DETAILS:

Internally, impGauss first computes a preGauss object from the data, which makes the subsequent computations more efficient. Therefore, if a preGauss object already exists (for example, because the preGauss function was called before emGauss or daGauss), passing that object instead of the original data saves computation time.
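
One standard device in data augmentation software for this model, shown here only to illustrate why such precomputation pays off, is to group rows by their missing-data pattern. The conditional-regression quantities in the I-step then need to be computed once per pattern rather than once per row. Whether a preGauss object stores exactly this information is an assumption. group_by_pattern is a hypothetical Python helper, and None marks a missing value.

```python
from collections import defaultdict


def group_by_pattern(rows):
    """Map each missingness pattern (tuple of booleans, True = missing)
    to the indices of the rows that share it."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        pattern = tuple(x is None for x in row)
        groups[pattern].append(i)
    return dict(groups)
```
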

See the help file for for additional details.

REFERENCES:

Best, N. G., Cowles, M. K. and Vines, S. K. (1997), CODA: Convergence Diagnosis and Output Analysis Software for Gibbs Sampling Output, Version 0.4, Cambridge: Medical Research Council Biostatistics Unit.

Gilks, W. R., Richardson, S. and Spiegelhalter, D. J., editors (1996), Markov Chain Monte Carlo in Practice, London: Chapman and Hall.

Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data, London: Chapman and Hall.

EXAMPLES:

# Draw 10 imputations from 10 parallel chains,
#   each started from the EM estimate and run for 100 iterations
cholesterol.em <- emGauss(cholesterol)
start <- matrix(rep(cholesterol.em$paramIter[2, ], 10), nrow = 10, byrow=T)
cholesterol.imp <- impGauss(cholesterol, start = start,
                            control = list(niter = 100))

# The following are equivalent: each draws 10 imputations from one long chain
#   started at the EM estimate
impGauss.default(cholesterol, nimpute = 10,
                 start = cholesterol.em$paramIter[2, ])
cholesterol.pre <- preGauss(data = cholesterol)
impGauss.preGauss(object = cholesterol.pre, nimpute = 10,
                 start = cholesterol.em$paramIter[2, ])
impGauss.missmodel(cholesterol.em, nimpute = 10)