Diagnostics for "Missing Completely At Random"

DESCRIPTION:

Diagnostics to check whether the missingness patterns depend on the values of numerical variables.

USAGE:

mcar(x, miss.obj = miss(x),  
     mu = <<see below>>, cov = <<see below>>,  
     tol = sqrt(.Machine$double.eps)) 

REQUIRED ARGUMENTS:

x
matrix, or dataframe.

OPTIONAL ARGUMENTS:

miss.obj
the output of miss(x); if supplied, this will save some processing.
mu, cov
parameter estimates to use for Little's test. By default these are maximum likelihood estimates, obtained using emGauss. If only cov is supplied then mu is the generalized least squares (normal model maximum likelihood) estimate given the available data and cov. A sketch of supplying these explicitly follows this argument list.
tol
tolerance level for detecting negative eigenvalues and for detecting singular covariance matrices when producing a generalized inverse of the covariance matrix.
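
For example, precomputed estimates can be passed in explicitly. The lines below are a minimal sketch only; the component names "mean" and "cov" used to extract the estimates from the emGauss result are assumptions for illustration, so check the emGauss help file for the names it actually uses.

fit <- emGauss(x)                      # ML estimates under a Gaussian model (component names assumed)
mcar(x, mu = fit$mean, cov = fit$cov)  # supply both estimates
mcar(x, cov = fit$cov)                 # mu then defaults to the GLS estimate given cov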

VALUE:

a list of class "mcar" with components:
n, var.mis, pattern, rep.pattern, row.order, column.order
these are produced by miss and are described in the help file for that function.
avg
matrix with a row for each pattern of missing values and a column for each numeric variable with enough non-missing observations. The matrix entries are the sample average of observed values of the numeric variables for rows matching the missingness pattern.
d2
Little's d-squared statistic, which has an asymptotic chi-square null distribution.
df.d2
degrees of freedom for Little's d-squared statistic.
Mahalanobis
vector with length equal to the number of patterns. These are normalized Mahalanobis distances which sum to Little's d-squared statistic for testing "MCAR".
expected.Mahalanobis
approximate expected values for the Mahalanobis terms if the null hypothesis of "MCAR" holds. These sum to the degrees of freedom.
t.values, t.present, t.missing
matrices with one column for each numeric variable, and one row for each variable. For each pair of a numeric variable Y (with at least 4 non-missing observations) and another variable G, a two-sample pooled variance t-statistic is computed to compare the nonmissing values of Y, split into groups according to whether G is missing. Positive t-values indicate that the mean of Y is larger when G is present than when G is missing. The values of the t-statistics are in t.values, and the two group sizes in t.present and t.missing. The statistics are computed only if there are at least two observations in each group.

DETAILS:

The result of this function is normally printed by print.mcar, which provides a formatted display. To see all components of the result use print.default(mcar(x)).

The test will fail if any eigenvalues of cov are negative.

BACKGROUND:

This function provides diagnostic measures to check whether data are Missing Completely At Random (MCAR), which means (informally) that whether a variable is missing is independent of the values (either observed or missing) of other variables and of the variable itself; see the references below. The diagnostics here check whether the missingness of each variable depends upon the values of other numerical variables, but do not check whether there is dependence with the values of the variable itself, as occurs for example with censoring.

Little's d-squared is the sum of normalized Mahalanobis distances between the overall mu and the sample mean of each subset of the data consisting of rows with the same pattern of missing columns. Large distances provide evidence that the values of one or more numerical variables depend on whether one or more other variables are missing. Here a "normalized" Mahalanobis distance means that the covariance matrix used is the covariance matrix of the sample mean for each pattern, i.e. the matrix for a single observation divided by the number of observations in the pattern.
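
As an illustration only (not the internal implementation), the normalized Mahalanobis term for a single pattern might be computed along the following lines, where rows indexes the rows sharing the pattern, obs indexes the variables observed in that pattern, and mu and cov are the overall estimates described above:

xj   <- x[rows, obs, drop = F]
ybar <- apply(xj, 2, mean)                # pattern mean (a row of the avg component)
nj   <- length(rows)                      # number of rows with this pattern
dev  <- ybar - mu[obs]
d2j  <- nj * sum(dev * solve(cov[obs, obs], dev))  # one normalized Mahalanobis term
# Little's d-squared is the sum of d2j over all missingness patterns.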

The t-values provide similar information for each pair of a numeric variable and another variable with missing values.
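
For a single pair, the statistic is an ordinary two-sample pooled-variance t statistic. The following sketch is illustrative only; the variable names y and g are hypothetical, and the internal code also enforces the group-size requirements described above.

obs  <- !is.na(y)                          # usable values of the numeric variable y
y1   <- y[obs & !is.na(g)]                 # y where the other variable g is present
y2   <- y[obs & is.na(g)]                  # y where g is missing
n1   <- length(y1); n2 <- length(y2)
sp2  <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)  # pooled variance
tval <- (mean(y1) - mean(y2)) / sqrt(sp2 * (1/n1 + 1/n2))
# tval > 0 means the mean of y is larger when g is present than when it is missing.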

If data are not MCAR then some methods for handling missing data (e.g. in var) give biased and inconsistent estimates.
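
A small simulation (not part of this help file) illustrates the point: when the missingness of one variable depends on the value of another, even a simple available-case summary such as the mean is biased.

set.seed(1)
u <- rnorm(1000)
y <- u + rnorm(1000)
y[u > 0.5] <- NA          # missingness of y depends on u, so the data are not MCAR
mean(y, na.rm = T)        # systematically below the true mean of 0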

REFERENCES:

Little, R. J. A. (1988). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association 83, 1198-1202.

Little, R. J. A., and Rubin, D. B. (1987). Statistical Analysis with Missing Data. Wiley, New York.

Hesterberg, Tim C. (1999). A Graphical Representation of Little's Test for MCAR. Technical Report No. 94, Research Department, Insightful Corporation. http://www.insightful.com/Hesterberg/articles/tech94-mi-little.pdf

Schafer, J. L. (1997). Analysis of Incomplete Multivariate Data. Chapman & Hall, London.

SEE ALSO:

miss, emGauss, print.mcar, var.

EXAMPLES:

x <- longley.x; x[runif(96) > .9] <- NA  # random missing data 
miss(x)                                  # view patterns of missing data 
M <- mcar(x)                             # compute MCAR diagnostics
M                                        # formatted display via print.mcar
print.default(M)                         # show all components of the result
plot(M)                                  # graphical display of the diagnostics