Creates plots for visualizing a partition object.
plot.partition(x, ask=F, which.plots=NULL, ...)
x is an object of class "partition", e.g. created by the functions pam, clara, and fanny.
If ask=T, plot.partition operates in interactive mode.
All optional arguments of clusplot.default (except for the diss option) may also be supplied to this function. Graphical parameters may also be supplied as arguments to this function.
Two plots can be produced: a clusplot and a silhouette plot.
When ask=T, rather than producing each plot sequentially, plot.partition displays a menu listing all the plots that can be produced.
If the menu is not desired but a pause between plots is still wanted, one must set par(ask=T) before invoking the plot command.
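A minimal sketch of the second approach (a pause between plots, no menu), assuming R's cluster package, where plot() on a pam fit dispatches to plot.partition. The data and the variable names fit and op are illustrative only.

```r
library(cluster)

# Illustrative data: two well-separated groups of points.
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
fit <- pam(x, 2)

op <- par(ask = TRUE)  # ask before each new figure (R only prompts in interactive sessions)
plot(fit)              # ask is left at its default, so no menu: the plots follow one another
par(op)                # restore the previous graphics setting
```

Saving the old value returned by par() and restoring it afterwards keeps the pause from leaking into later plots in the same session.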
The clusplot of a cluster partition consists of a two-dimensional
representation of the observations, in which the clusters are
indicated by ellipses. (See clusplot.partition for more details.)
The silhouette plot of a nonhierarchical clustering is fully described in
Rousseeuw (1987) and in chapter 2 of Kaufman and Rousseeuw (1990).
For each observation i, a bar is drawn, representing the silhouette width s(i)
of the observation. Observations are grouped per cluster, starting with
cluster 1 at the top. Observations with a large s(i) (almost 1) are very well
clustered, a small s(i) (around 0) means that the observation lies between
two clusters, and observations with a negative s(i) are probably placed in
the wrong cluster.
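The s(i) values behind the plot can also be inspected directly. A minimal sketch, assuming R's cluster package, whose silhouette() returns one row per observation (in the original order when called with a vector of cluster codes and a dist object) with columns "cluster", "neighbor" and "sil_width". The 0.1 cutoff for "around 0" is an arbitrary illustrative choice, not part of the method.

```r
library(cluster)

set.seed(2)
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
fit <- pam(x, 2)

# One row per observation; column "sil_width" holds s(i).
sil <- silhouette(fit$clustering, dist(x))

# Negative s(i): the observation is probably placed in the wrong cluster.
misplaced <- which(sil[, "sil_width"] < 0)

# s(i) near 0: the observation lies between two clusters
# (0.1 is an illustrative cutoff only).
borderline <- which(abs(sil[, "sil_width"]) < 0.1)
```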
To choose the number of clusters k, perform the clustering for several values of k and keep the value of k with the largest overall average silhouette width.
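This selection rule can be sketched as a short loop, assuming R's cluster package, where a pam fit stores the overall average silhouette width in its silinfo$avg.width component. The candidate range 2:6 is an arbitrary illustrative choice.

```r
library(cluster)

# Illustrative data: two well-separated groups of points.
set.seed(1)
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))

# Fit pam for each candidate k and record the overall average silhouette width.
ks <- 2:6
avg_width <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)

# Keep the k with the largest overall average silhouette width.
best_k <- ks[which.max(avg_width)]
best_k
```

Note that the silhouette width is undefined for k = 1, so this rule can only compare partitions with at least two clusters.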
The silhouette width is computed as follows:
Put a(i) = average dissimilarity between i and all other points of the
cluster to which i belongs. For all other clusters C, put d(i,C) = average
dissimilarity of i to all observations of C. The smallest of these d(i,C) is
denoted as b(i), and can be seen as the dissimilarity between i and its
neighbor cluster. Finally, put s(i) = ( b(i) - a(i) ) / max( a(i), b(i) ).
The overall average silhouette width is then simply the average of s(i) over
all observations i.
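The definition translates almost line for line into base R. A minimal sketch: the function name silhouette_widths is hypothetical, it assumes a full dissimilarity matrix (or a dist object) and at least two clusters, and it sets s(i) = 0 for an observation that forms a cluster on its own, the convention of Rousseeuw (1987). Its results should agree with silhouette() in R's cluster package, but this is an independent illustration, not that package's code.

```r
# Silhouette widths s(i), computed directly from the definition above.
# d: n x n dissimilarity matrix or dist object; cl: cluster labels (length n).
silhouette_widths <- function(d, cl) {
  d <- as.matrix(d)
  stopifnot(nrow(d) == length(cl), length(unique(cl)) >= 2)
  n <- length(cl)
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- cl == cl[i]
    if (sum(own) == 1) {      # singleton cluster: s(i) = 0 by convention
      s[i] <- 0
      next
    }
    # a(i): average dissimilarity to the other members of i's cluster
    # (d[i, i] = 0, so including it in the sum adds nothing)
    a <- sum(d[i, own]) / (sum(own) - 1)
    # d(i, C) for every other cluster C; b(i) is the smallest of these
    others <- setdiff(unique(cl), cl[i])
    b <- min(vapply(others, function(k) mean(d[i, cl == k]), numeric(1)))
    s[i] <- (b - a) / max(a, b)
  }
  s
}

# Usage: four points in two obvious clusters.
pts <- rbind(c(0, 0), c(0, 1), c(5, 5), c(5, 6))
s <- silhouette_widths(dist(pts), c(1, 1, 2, 2))
mean(s)  # overall average silhouette width
```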
In the silhouette plot, observation labels are printed only when there are fewer than 40 observations, for readability. Moreover, labels are truncated to at most 5 characters.
Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53-65.

Struyf, A., Hubert, M. and Rousseeuw, P. J. (1997). Integrating robust clustering techniques in S-PLUS. Computational Statistics and Data Analysis, 26, 17-37.
# generate 25 objects, divided into 2 clusters
library(cluster)  # in R, pam() and plot.partition() live in the cluster package
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
plot(pam(x, 2))