Computes a fuzzy clustering of the data into k clusters.
fanny(x, k, diss=F, metric="euclidean", stand=F, save.x=T,
      save.diss=T)
Data matrix or data frame, or dissimilarity matrix, depending on the
value of the diss argument.
x is typically the output of daisy or dist.
A vector of length n*(n-1)/2 is also allowed (where n is the number of
observations) and will be interpreted in the same way as the output of
those functions.
Missing values (NAs) are not allowed.
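As a small illustration of the vector form, the sketch below uses the base function dist on made-up data (the matrix `obs` and the name `d.vec` are hypothetical) and checks that the result holds n*(n-1)/2 dissimilarities. It is written for R-compatible S and is only meant to show the expected length and ordering.

```r
# Hypothetical data: 5 observations on 2 variables.
obs <- matrix(c(0, 0,
                1, 0,
                0, 1,
                3, 3,
                4, 3), ncol = 2, byrow = TRUE)
n <- nrow(obs)

# dist() stores only the lower triangle, column by column:
# d(2,1), d(3,1), ..., d(n,1), d(3,2), ...
d <- dist(obs)

# Drop the attributes to get a plain numeric vector of dissimilarities.
d.vec <- as.vector(d)

# One entry per unordered pair of observations.
length(d.vec) == n * (n - 1) / 2
```

According to the description above, a vector of this kind should be accepted in place of the dist object when x is treated as a dissimilarity matrix.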
If TRUE, then x will be considered as a dissimilarity matrix.
If FALSE, then x will be considered as a matrix of observations by
variables.
If x is already a dissimilarity matrix, then this argument will
be ignored.
If TRUE, then the measurements in x are standardized
before calculating the dissimilarities.
Measurements are standardized for each variable (column)
by subtracting the variable's mean value and dividing
by the variable's mean absolute deviation.
If x is already a dissimilarity matrix,
then this argument will be ignored.
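The standardization described above can be sketched with base functions only. The helper `standardize` and the matrix `obs` are hypothetical names used for illustration; this is not the routine fanny uses internally, just the same arithmetic written out.

```r
# Hypothetical data matrix: rows are observations, columns are variables.
obs <- matrix(c(1, 10,
                2, 20,
                3, 60,
                6, 30), ncol = 2, byrow = TRUE)

standardize <- function(m) {
  centers <- apply(m, 2, mean)
  centered <- sweep(m, 2, centers)            # subtract each column's mean
  mad.mean <- apply(abs(centered), 2, mean)   # mean absolute deviation per column
  sweep(centered, 2, mad.mean, "/")           # divide each column by its deviation
}

z <- standardize(obs)
z
```

After this step every column has mean 0 and mean absolute deviation 1, so variables measured on very different scales contribute comparably to the dissimilarities.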
"fanny"
representing the clustering.
See
fanny.object for details.
In a fuzzy clustering, each observation is "spread out" over the various
clusters. Denote by u(i,v) the membership of observation i to cluster v.
The memberships are nonnegative, and for a fixed observation i they sum to 1.
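To make the membership constraints concrete, here is a minimal sketch with a made-up membership matrix (the values in `u` are hypothetical, not fanny output). It checks the two properties stated above and shows one simple way to read off a crisp assignment, by taking the cluster with the largest membership for each observation.

```r
# Hypothetical memberships u(i,v): 4 observations (rows), 2 clusters (columns).
u <- matrix(c(0.90, 0.10,
              0.80, 0.20,
              0.45, 0.55,
              0.10, 0.90), ncol = 2, byrow = TRUE)

# Memberships are nonnegative.
all(u >= 0)

# For a fixed observation i, the memberships sum to 1.
row.sums <- apply(u, 1, sum)
row.sums

# Crisp assignment: the cluster with the largest membership per observation.
hard <- apply(u, 1, which.max)
hard
```

The third observation is close to evenly spread over both clusters, which is the kind of ambiguity a hard partition would hide.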
The particular method fanny stems from chapter 4 of
Kaufman and Rousseeuw (1990).
Compared to other fuzzy clustering methods, fanny has the following
features: (a) it also accepts a dissimilarity matrix; (b) it is
more robust to the spherical cluster assumption; (c) it provides
a novel graphical display, the silhouette plot (see plot.partition).
Fanny aims to minimize the objective function
\[
\sum_{v=1}^{k} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} u(i,v)^2 \, u(j,v)^2 \, d(i,j)}{2 \sum_{j=1}^{n} u(j,v)^2}
\]
where n is the number of observations, k is the number of clusters and
d(i,j) is the dissimilarity between observations i and j.
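The objective function can be evaluated directly for any membership matrix, which helps when comparing candidate solutions. The sketch below is an illustration in base R-compatible S, not the optimizer fanny uses; `fanny.objective`, `pos`, `crisp` and `fuzzy` are hypothetical names, and d is assumed to be a full symmetric n by n matrix.

```r
# Objective for memberships u (n x k) and dissimilarity matrix d (n x n).
fanny.objective <- function(u, d) {
  u2 <- u^2
  total <- 0
  for (v in 1:ncol(u)) {
    w <- u2[, v]
    # sum over i and j of u(i,v)^2 * u(j,v)^2 * d(i,j)
    numerator <- sum(outer(w, w) * d)
    total <- total + numerator / (2 * sum(w))
  }
  total
}

# Hypothetical example: 4 points on a line, forming two well-separated pairs.
pos <- c(0, 1, 10, 11)
d <- abs(outer(pos, pos, "-"))

# Memberships that match the two pairs exactly.
crisp <- matrix(c(1, 0,
                  1, 0,
                  0, 1,
                  0, 1), ncol = 2, byrow = TRUE)

# Memberships spread evenly over both clusters.
fuzzy <- matrix(0.5, nrow = 4, ncol = 2)

fanny.objective(crisp, d)
fanny.objective(fuzzy, d)
```

For this toy data the memberships that follow the two pairs give a smaller objective value than the evenly spread ones, which is the direction the minimization pushes in.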
Cluster analysis divides a dataset into groups (clusters) of observations that
are similar to each other. Partitioning methods like pam, clara, and fanny
require that the number of clusters be given by the user.
Hierarchical methods like agnes, diana, and mona construct a
hierarchy of clusterings, with the number of clusters ranging from one to
the number of observations.
Kaufman, L. and Rousseeuw, P. J. (1990).
Finding Groups in Data: An Introduction to Cluster Analysis.
Wiley, New York.
Struyf, A., Hubert, M. and Rousseeuw, P. J. (1997).
Integrating robust clustering techniques in S-PLUS.
Computational Statistics and Data Analysis,
26, 17-37.
# generate 25 objects, divided into two clusters,
# and 3 objects lying between those clusters.
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)),
           cbind(rnorm(3, 3.5, 0.5), rnorm(3, 3.5, 0.5)))
# fuzzy clustering of the raw data into 2 clusters
fannyx <- fanny(x, 2)
fannyx
summary(fannyx)
plot(fannyx)
# fuzzy clustering from a precomputed Manhattan dissimilarity matrix
fanny(daisy(x, metric = "manhattan"), 2, diss = TRUE)