diana(x, diss=F, metric="euclidean", stand=F, save.x=T, save.diss=T)
How x is interpreted depends on the diss argument.
As a dissimilarity matrix, x is typically the output of daisy or dist.
A vector of length n*(n-1)/2 is also allowed (where n is the number of
observations) and will be interpreted in the same way as the output of the
above-mentioned functions. Missing values (NAs) are not allowed.
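
The vector form can be illustrated with a small sketch. It assumes R's dist,
which stores the lower triangle of the dissimilarity matrix column by column;
the three points are made up for the example.

```r
# Three made-up observations in two dimensions.
x <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)

d <- dist(x)        # Euclidean distances by default
n <- nrow(x)

# Flatten the dist object to a plain vector of length n*(n-1)/2.
# The order is d(2,1), d(3,1), d(3,2): lower triangle, column by column.
v <- as.vector(d)
length(v) == n * (n - 1) / 2
```

A vector laid out in this order carries the same information as the dist
object it came from.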
If diss is TRUE, then x will be considered as a dissimilarity matrix.
If FALSE, then x will be considered as a matrix of observations by variables.
If x is already a dissimilarity matrix, then the metric argument will be ignored.
If stand is TRUE, the measurements in x are standardized before calculating
the dissimilarities. Measurements are standardized for each variable (column),
by subtracting the variable's mean value and dividing by the variable's mean
absolute deviation. If x is already a dissimilarity matrix, then this argument
will be ignored.
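
The standardization described above can be sketched as follows. The helper
standardize_mad is hypothetical and not part of the diana interface; it assumes
a numeric matrix without missing values, and the data are made up.

```r
# Hypothetical helper: centre each column on its mean and scale it by its
# mean absolute deviation (not the standard deviation).
standardize_mad <- function(x) {
  x <- as.matrix(x)
  centred <- sweep(x, 2, colMeans(x), "-")
  mean_abs_dev <- colMeans(abs(centred))
  sweep(centred, 2, mean_abs_dev, "/")
}

x <- cbind(a = c(1, 2, 3, 6), b = c(10, 20, 30, 40))
z <- standardize_mad(x)

# After standardization every column has mean 0 and mean absolute deviation 1.
colMeans(z)
colMeans(abs(z))
```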
An object of class "diana" representing the clustering.
See diana.object for details.
diana is fully described in chapter 6 of Kaufman and Rousseeuw (1990).
It is probably unique in computing a divisive hierarchy, whereas most other
software for hierarchical clustering is agglomerative. Moreover, diana provides
(a) the divisive coefficient (see diana.object), which measures the amount of
clustering structure found; and (b) the banner, a novel graphical display
(see plot.diana).
The diana algorithm constructs a hierarchy of clusterings, starting with one
large cluster containing all n observations. Clusters are divided until each
cluster contains only a single observation. At each stage, the cluster with
the largest diameter is selected. (The diameter of a cluster is the largest
dissimilarity between any two of its observations.)
To divide the selected cluster, the algorithm first looks for its most
disparate observation (i.e., the one with the largest average dissimilarity to
the other observations of the selected cluster). This observation initiates
the "splinter group". In subsequent steps, the algorithm reassigns observations
that are closer to the "splinter group" than to the "old party". The result is
a division of the selected cluster into two new clusters.
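
A single division step can be sketched in a few lines. This is an illustrative
reading of the description above, not the code used by diana: the function
names diameter and split_cluster are hypothetical, "closer" is taken to mean a
smaller average dissimilarity, observations are moved one at a time, and the
loop stops once no remaining observation is closer to the splinter group.

```r
# Diameter of a cluster: largest dissimilarity between any two members.
diameter <- function(d, members) {
  d <- as.matrix(d)
  max(d[members, members])
}

# Split one cluster (given by the indices in 'members') into a splinter
# group and the remaining "old party".
split_cluster <- function(d, members) {
  d <- as.matrix(d)
  if (length(members) < 2) stop("need at least two observations to split")

  # Average dissimilarity of observation i to the other members of 'group'.
  avg_to <- function(i, group) mean(d[i, setdiff(group, i)])

  # The most disparate observation starts the splinter group.
  seed_scores <- sapply(members, avg_to, group = members)
  splinter <- members[which.max(seed_scores)]
  old <- setdiff(members, splinter)

  repeat {
    if (length(old) < 2) break          # keep the old party non-empty
    # Positive gain: i is, on average, closer to the splinter group.
    gain <- sapply(old, function(i) avg_to(i, old) - mean(d[i, splinter]))
    if (max(gain) <= 0) break
    mover <- old[which.max(gain)]
    splinter <- c(splinter, mover)
    old <- setdiff(old, mover)
  }
  list(splinter = splinter, old = old)
}

# Made-up data: five points on a line, two of them far from the rest.
d <- dist(c(1, 2, 3, 10, 11))
res <- split_cluster(d, 1:5)
res$splinter   # observations 5 and 4
res$old        # observations 1, 2 and 3
```

Applying diameter to each current cluster and splitting the one with the
largest value repeats the stage described above.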
Cluster analysis divides a dataset into groups (clusters) of observations
that are similar to each other. Hierarchical methods like agnes, diana, and
mona construct a hierarchy of clusterings, with the number of clusters ranging
from one to the number of observations. Partitioning methods like pam, clara,
and fanny require that the number of clusters be given by the user.
Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An
Introduction to Cluster Analysis. Wiley, New York.
Struyf, A., Hubert, M. and Rousseeuw, P. J. (1997). Integrating robust
clustering techniques in S-PLUS. Computational Statistics and Data Analysis,
26, 17-37.
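# Cluster the raw data with the Manhattan metric and standardized variables
dia1 <- diana(votes.repub, metric="manhattan", stand=T)
print(dia1)
plot(dia1)
# Pass a dissimilarity object computed by daisy
dia2 <- diana(daisy(votes.repub), diss=T)
plot(dia2)
# The output of dist is accepted in the same way
diana(dist(votes.repub), diss=T)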