mstree(x, plane=T)
plane=FALSE
, a vector as described by
mst
.
x
,
y
,
mst
and
order
)
describing the planing, minimal spanning tree, and two versions of lining.
nrow(x)--1
describing the edges in the
minimal spanning tree. The
i
th value in this vector is an
observation number, indicating that this observation and the
i
th observation should be linked in the minimal spanning
tree.
nrow(x)
by
2
, giving two types of ordering: The
first column presents the standard ordering from one extreme
of the minimal spanning tree to the other.
This starts on one end of a diameter and numbers the points in a certain
order so that points close together in Euclidean space tend to be close
in the sequence.
The second column presents the
radial ordering, based on distance from the center of the minimal spanning tree.
These can be used to detect clustering.
See below for graph theory definitions.
Multivariate planing uses the minimal spanning tree to represent multivariate data in two dimensions. The technique can also be used to explore the ability to discriminate between groups. See Friedman and Rafsky (1981) for more details. Results are currently computed to single-precision accuracy only.
A tree in graph theory is a set of
n
points and
n-1
edges (lines)
connecting points
such that there is a path along edges between any given points.
Because there are only
n-1
edges, such a path is unique (there are no
cycles).
A diameter of a tree is a path between two points with a maximal number of
edges; a center of a tree is a point as close to the middle of a diameter
as possible.
A minimal spanning tree is a tree that has the minimal sum of the lengths of
the edges. Such trees are useful in a variety of situations.
Friedman, J. H. and Rafsky, L. C. (1981).
Graphics for the Multivariate Two-Sample Problem.
Journal of the American Statistical Association
76, 277-287.
Reingold, E. M., Nievergelt, J. and Deo, N. (1977).
Combinatorial Algorithms: Theory and Practice.
Prentice-Hall, Englewood Cliffs, New Jersey.
plot(x, y) # plot original data mst <- mstree(cbind(x, y), plane=F) # minimal spanning tree # show tree on plot segments(x[seq(mst)], y[seq(mst)], x[mst], y[mst]) i <- rbind(iris[,,1], iris[,,2], iris[,,3]) tree <- mstree(i) # multivariate planing plot(tree, type="n") # plot data in plane text(tree, label=c(rep(1, 50), rep(2, 50), rep(3, 50))) # identify points # get the absolute value stress e distp <- dist(i) dist2 <- dist(cbind(tree$x, tree$y)) e <- sum(abs(distp - dist2))/sum(distp) # a function for multivariate planing of two samples mult.plane <- function(samp1, samp2, all = T) { if(ncol(samp1)!=ncol(samp2)) stop("Number of columns (variables) must match in mult.plane") tree <- mstree(rbind(samp1, samp2)) rowbreak <- nrow(samp1) plpoin <- c(tree$mst[1:rowbreak] > rowbreak, tree$mst[ - seq(rowbreak)] < rowbreak) plot(tree, type = "n") points(tree$x[seq(rowbreak)], tree$y[seq(rowbreak)], col = 2) points(tree$x[ - seq(rowbreak)], tree$y[ - seq(rowbreak)], col = 3) if(all) segments(tree$x[seq(tree$mst)], tree$y[seq(tree$mst)], tree$x[ tree$mst], tree$y[tree$mst], col = 4) segments(tree$x[plpoin], tree$y[plpoin], tree$x[tree$mst[plpoin]], tree$y[tree$mst[plpoin]]) } state.ms <- mstree(cbind(state.center$x, state.center$y), plane=F) usa() segments(state.center$x[seq(state.ms)], state.center$y[seq(state.ms)], state.center$x[state.ms], state.center$y[state.ms]) points(state.center, pch=18) title(main="Minimal Spanning Tree of State Centers")