Describe Missing Data Patterns

DESCRIPTION:

Describe patterns of missing data.

USAGE:

miss(x, sort = T) 

REQUIRED ARGUMENTS:

x
vector, matrix, or data frame.

OPTIONAL ARGUMENTS:

sort
if TRUE or "rc" then rows and columns are first sorted by the number of missing values, then reordered to put columns and rows with similar missingness patterns together (see below for details). If "c" then only columns are sorted, if "r" then only rows are sorted, and if FALSE then neither is sorted. If "R" or "Rc" then rows are reordered to put similar rows together, but without first sorting by the number missing in each row.

VALUE:

an object of class "miss" with components:
n
number of observations (rows of x).
var.mis
number of missing values in each variable (column of x).
pattern
matrix with one column for each variable in x, and one row for each unique pattern of missing values. The ordering is determined by the sort argument.
rep.pattern
vector with one element for each row in pattern, indicating the number of rows of x with that pattern of missing values.
row.order
permutation vector to reorder rows of x so that all rows with the first missingness pattern are first, rows with the second pattern are next, etc.
column.order
permutation vector to reorder columns. This is present only if sort is TRUE or contains "c".
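
As a rough illustration of what these components hold, the sketch below derives similar quantities with base functions only. It is not the source of miss: the example matrix, the string key, and the coding of pattern (TRUE marks a missing value) are assumptions made for this sketch, and patterns appear in order of first occurrence rather than in the order chosen by the sort argument.

```r
# Hypothetical illustration only; this is not how miss() is implemented.
x <- matrix(c( 1, NA,  3,
               4,  5, NA,
               7, NA,  9,
              10, 11, 12,
              13, 14, NA), ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

na      <- is.na(x)      # TRUE where a value is missing
n       <- nrow(x)       # analogous to component n
var.mis <- colSums(na)   # analogous to component var.mis

# One string key per row, e.g. "010" means only the second variable is missing
key   <- apply(na, 1, function(r) paste(as.integer(r), collapse = ""))
first <- !duplicated(key)               # first row showing each pattern

pattern     <- na[first, , drop = FALSE]   # one row per unique pattern
rep.pattern <- as.vector(table(factor(key, levels = key[first])))
row.order   <- order(match(key, key[first]))  # groups rows by pattern

var.mis       # a = 0, b = 2, c = 2
rep.pattern   # 2 2 1
row.order     # 1 3 2 5 4
```

Here x[row.order, ] lists both rows with the first pattern, then both rows with the second, then the single complete row.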

NOTE:

miss requires that missing data are denoted by NAs or NaNs.

DETAILS:

The result of this function is normally printed by print.miss, which provides a formatted display. In order to see all components of the result use print.default(miss(x)).

Columns are sorted first by the number missing in the column. Then, among columns with equal number missing, columns are reordered to form two groups, such that columns which have nonmissing data in the row with the most nonmissing observations are in the first group. These groups are reordered, recursively, according to missingness in the row with the second (third, ...) most nonmissing observations. Rows with the same number of nonmissing observations are used in their original order.
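
The column rule above behaves like a multi-key sort: the number missing is the primary key, and missingness in each row, taken from the row with the most nonmissing values downward, breaks ties. The sketch below expresses that reading with order(), which leaves unresolved ties in their original order. It is an assumed equivalent written for illustration, not the code used by miss, and the example matrix is made up.

```r
# Hypothetical sketch of the column ordering rule; not the source of miss().
x <- matrix(c( 1,  2,  3, NA,
              NA,  6,  7,  8,
               9, NA, 11, NA,
              13, 14, 15, 16), ncol = 4, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c", "d")))
na <- is.na(x)

# Rows from most to fewest nonmissing values; ties keep their original order
row.rank <- order(rowSums(na))

# Primary key: number missing per column.
# Tie-breakers: missingness in each ranked row (FALSE, i.e. nonmissing, sorts first)
keys <- c(list(colSums(na)),
          lapply(row.rank, function(i) na[i, ]))
col.order <- do.call(order, keys)

row.rank    # 4 1 2 3
col.order   # 3 2 1 4
```

Column c has no missing values and comes first. Columns a and b each have one missing value; the tie is broken at row 2, where b is nonmissing and a is not, so b precedes a. Column d, with two missing values, comes last.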

After columns are sorted (or not sorted), rows are sorted in much the same way as the columns: first by the number missing in the row (this step is optional), then by missingness in the first column, then the second, etc. In other words, rows with the same number of missing values are ordered by the positions of their first missing values. As an example, suppose rows x[i,] and x[j,] both have k missing values. Let x[i,im] and x[j,jm] be the first missing values in rows i and j respectively. Then x[i,] is placed before x[j,] if im > jm. If im == jm then the position of the next missing value is considered, etc.
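
The row rule can be read the same way, as a sort on the number missing followed by missingness in each column from left to right. The sketch below is an assumed equivalent using order(), with a made-up matrix; it is not taken from miss itself.

```r
# Hypothetical sketch of the row ordering rule; not the source of miss().
x <- matrix(c(NA,  2,  3,
               4,  5, NA,
               7, NA,  9,
              10, 11, 12,
              NA, NA, 15), ncol = 3, byrow = TRUE)
na <- is.na(x)

# Primary key: number missing per row.
# Tie-breakers: missingness in column 1, then column 2, etc.
keys <- c(list(rowSums(na)),
          lapply(seq_len(ncol(na)), function(j) na[, j]))
row.order <- do.call(order, keys)

row.order   # 4 2 3 1 5
```

Rows 1, 2, and 3 each have one missing value, first occurring in columns 1, 3, and 2 respectively. The row whose first missing value is furthest right comes first, giving 2, 3, 1, which matches the im > jm rule.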

EXAMPLES:

x <- longley.x; x[runif(96) > .9] <- NA  # random missing data 
M <- miss(x) 
M                      # equivalent to print(M) or print.miss(M) 
print.default(M)       # print all components, no special format 
print(M, all.obs=F)    # omit last part of printout 
plot(M) 
# Other information about missing values can be obtained using e.g.: 
rowSums(is.na(x))                  # number missing in each row 
rowSums(!is.na(x))                 # number not missing in each row 
round(100 * colMeans(is.na(x)), 1) # percent missing by column 
round(cor(is.na(x)), 2)            # correlation of missingness patterns 
# Missing value codes other than NA should be changed to NA, e.g. 
x[x == -9] <- NA                   # Do this before calling miss(x).