Create a sparse kinship matrix

DESCRIPTION:

Compute the overall kinship matrix for a collection of families, and store it efficiently.

USAGE:

makekinship(famid, id, father.id, mother.id, unrelated=0)

REQUIRED ARGUMENTS:

famid
a vector of family identifiers
id
a vector of unique subject identifiers
father.id
for each subject, the identifier of their biological father
mother.id
for each subject, the identifier of their biological mother

OPTIONAL ARGUMENTS:

unrelated
subjects with this family id are considered to be unrelated singletons, i.e., not related to each other or to anyone else.

VALUE:

a sparse kinship matrix of class bdsmatrix

DETAILS:

For each family of more than one member, the kinship function is called to calculate a per-family kinship matrix. These are stored efficiently in a single block-diagonal sparse matrix object, taking advantage of the fact that all between-family entries in the full matrix are 0. Unrelated individuals are treated as families of size 1 (singletons) and are placed first in the matrix.

The final order of the rows within this matrix will not necessarily be the same as in the original data, since each family must be contiguous. The dimnames of the matrix contain the id variable for each row/column. Also note that to create the kinship matrix for a subset of the data it is necessary to create the full kinship matrix first and then subset it. One cannot first subset the data and then call the function. For instance, a call using only the female data would not detect that a particular man's sister and his daughter are related.

SEE ALSO:

kinship, makefamid

EXAMPLES:

# Data set from a large family study of breast cancer
#  there are 26050 subjects in the file, from 426 families
> table(cdata$sex)
     F     M 
 12699 13351
> length(unique(cdata$famid))
[1] 426

> kin1 <- makekinship(cdata$famid, cdata$gid, cdata$dadid, cdata$momid)
> dim(kin1)
[1] 26050 26050
> class(kin1)
[1] "bdsmatrix"
# The next line shows that only a small fraction of the full matrix is
#  actually stored; every entry outside the stored blocks is 0
> length(kin1@blocks)/ prod(dim(kin1))
[1] 0.00164925
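
# A minimal sketch of why the full matrix must be built before subsetting.
#  The six-person pedigree below is hypothetical (it is not part of cdata):
#  subject 1 is a man, 2 his sister, 3 his daughter, 4 and 5 the parents
#  of 1 and 2, and 6 the mother of 3.  A parent id of 0 marks a founder.
#  This assumes makekinship is available, e.g., after library(kinship2),
#  and that as.matrix works on the object it returns.
> id  <- 1:6
> dad <- c(4, 4, 1, 0, 0, 0)
> mom <- c(5, 5, 6, 0, 0, 0)
> sex <- c("M", "F", "F", "M", "F", "F")
> fam <- rep(1, 6)
> full <- as.matrix(makekinship(fam, id, dad, mom))
> full["2", "3"]    # aunt and niece: the theoretical kinship is 1/8
# Subsetting the data first drops subject 1, the only link between 2 and 3,
#  so the two women should now appear to be unrelated (kinship 0)
> fem <- sex == "F"
> wrong <- as.matrix(makekinship(fam[fem], id[fem], dad[fem], mom[fem]))
> wrong["2", "3"]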

# kinship matrix for the females only
> femid <- cdata$gid[cdata$sex=='F']
> femindex <- !is.na(match(dimnames(kin1)[[1]], femid))
> kin2 <- kin1[femindex, femindex]
#
# Note that "femindex <- match(femid, dimnames(kin1)[[1]])" is wrong, since
#  then kin1[femindex, femindex] might improperly reorder the rows/cols 
#  (if families were not contiguous in cdata).  
# However sort(match(femid, dimnames(kin1)[[1]])) would be okay.
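#
# A small illustration of the ordering issue, using base functions only.
#  The vectors rn and want are hypothetical stand-ins for
#  dimnames(kin1)[[1]] and femid.
> rn <- c("c", "a", "d", "b")
> want <- c("a", "b", "c")
> match(want, rn)                   # follows the order of want: reorders rows
[1] 2 4 1
> sort(match(want, rn))             # follows the matrix order
[1] 1 2 4
> which(!is.na(match(rn, want)))    # the logical-index approach, same rows
[1] 1 2 4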