k clusters.
fanny(x, k, diss=F, metric="euclidean", stand=F, save.x=T, save.diss=T)
diss argument.
x is typically the output of daisy or dist.
Also a vector with length n*(n-1)/2 is allowed
(where n is the number of observations),
and will be interpreted in the same way as the output
of the above-mentioned functions.
Missing values (NAs) are not allowed.
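The vector layout mentioned above can be illustrated with a small sketch. It assumes base R's dist and a made-up 4-by-2 matrix named obs (hypothetical data, not from this page); the vector holds the lower triangle of the dissimilarity matrix, column by column.

```r
# Hypothetical data: 4 observations on 2 variables.
obs <- matrix(c(0, 0,
                3, 4,
                6, 8,
                0, 5), ncol = 2, byrow = TRUE)
n <- nrow(obs)

# dist() stores the lower triangle column by column:
# d(2,1), d(3,1), ..., d(n,1), d(3,2), ..., d(n,n-1).
d <- dist(obs)
d_vec <- as.vector(d)          # plain vector of length n*(n-1)/2

# Rebuild the full symmetric matrix from the plain vector to see the layout.
full <- matrix(0, n, n)
full[lower.tri(full)] <- d_vec
full <- full + t(full)
```

Here length(d_vec) is 6, and full agrees with as.matrix(d), so a plain vector in this order carries the same information as the dist output.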
If TRUE, then x will be considered as a dissimilarity matrix.
If FALSE, then x will be considered as a matrix of observations by variables.
If x is already a dissimilarity matrix, then this argument will be ignored.
If TRUE, then the measurements in x are standardized before calculating the dissimilarities.
Measurements are standardized for each variable (column),
by subtracting the variable's mean value and dividing
by the variable's mean absolute deviation.
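The standardization described above can be sketched in base R as follows. The function name standardize_mad and the example data are hypothetical and are not part of fanny itself; the sketch only mirrors the stated rule (subtract the column mean, divide by the column's mean absolute deviation).

```r
# Standardize each column: subtract its mean, then divide by its
# mean absolute deviation (the mean of |value - column mean|).
standardize_mad <- function(x) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, colMeans(x), "-")
  mean_abs_dev <- colMeans(abs(centered))
  sweep(centered, 2, mean_abs_dev, "/")
}

# Hypothetical data: two variables on very different scales.
obs <- cbind(height = c(1, 2, 3), weight = c(10, 20, 30))
z <- standardize_mad(obs)
z   # both columns become -1.5, 0, 1.5
```

After standardization the two columns are identical, which shows that the difference in measurement scale no longer influences the dissimilarities. A constant column has a mean absolute deviation of zero and would need separate handling.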
If x is already a dissimilarity matrix, then this argument will be ignored.
An object of class "fanny" representing the clustering. See fanny.object for details.
In a fuzzy clustering, each observation is "spread out" over the various
clusters. Denote by u(i,v) the membership of observation i to cluster v.
The memberships are nonnegative, and for a fixed observation i they sum to 1.
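The two membership properties can be checked on a fitted object. This is a minimal sketch assuming the R cluster package, whose fanny returns a component named membership; the data are simulated and the seed is arbitrary.

```r
library(cluster)  # assumption: the R 'cluster' package, which provides fanny()

set.seed(1)
# Two well-separated groups plus three points lying in between.
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)),
           cbind(rnorm(3, 3.5, 0.5), rnorm(3, 3.5, 0.5)))

fit <- fanny(x, 2)
u <- fit$membership   # n x k matrix; u[i, v] corresponds to u(i,v)

range(u)              # all memberships are nonnegative
rowSums(u)            # each observation's memberships sum to 1
```

Observations lying between the two groups typically show memberships closer to 0.5 in both clusters than observations in the core of a group.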
The particular method fanny stems from chapter 4 of Kaufman and Rousseeuw (1990).
Compared to other fuzzy clustering methods, fanny has the following
features: (a) it also accepts a dissimilarity matrix; (b) it is
more robust to the spherical cluster assumption; (c) it provides
a novel graphical display, the silhouette plot (see plot.partition).
Fanny aims to minimize the objective function

$$
\sum_{v=1}^{k} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} u(i,v)^{2} \, u(j,v)^{2} \, d(i,j)}{2 \sum_{j=1}^{n} u(j,v)^{2}}
$$

where n is the number of observations, k is the number of clusters and
d(i,j) is the dissimilarity between observations i and j.
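To make the objective concrete, the following base R sketch evaluates it for a given membership matrix u (n rows, k columns) and dissimilarity matrix d. The function name fanny_objective is hypothetical; it is a direct transcription of the formula, not the routine fanny uses internally.

```r
# Evaluate the fuzzy clustering objective for memberships u (n x k matrix)
# and dissimilarities d (n x n matrix or dist object).
fanny_objective <- function(u, d) {
  d <- as.matrix(d)
  u2 <- u^2
  per_cluster <- vapply(seq_len(ncol(u2)), function(v) {
    w <- u2[, v]                          # squared memberships in cluster v
    sum(outer(w, w) * d) / (2 * sum(w))   # numerator / denominator for cluster v
  }, numeric(1))
  sum(per_cluster)
}

# Two observations at dissimilarity 1.
d <- matrix(c(0, 1,
              1, 0), 2, 2)
crisp <- diag(2)             # each observation fully in its own cluster
fuzzy <- matrix(0.5, 2, 2)   # each observation split evenly

fanny_objective(crisp, d)    # 0
fanny_objective(fuzzy, d)    # 0.25
```

The crisp assignment gives 0 because no cluster contains a pair of distinct observations, while the evenly split assignment is penalized for placing dissimilar observations in the same cluster. A cluster whose memberships are all zero would produce a division by zero and is not handled in this sketch.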
Cluster analysis divides a dataset into groups (clusters) of observations that
are similar to each other. Partitioning methods like pam, clara, and fanny
require that the number of clusters be given by the user.
Hierarchical methods like agnes, diana, and mona construct a
hierarchy of clusterings, with the number of clusters ranging from one to
the number of observations.
Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
Struyf, A., Hubert, M. and Rousseeuw, P. J. (1997). Integrating robust clustering techniques in S-PLUS. Computational Statistics and Data Analysis, 26, 17-37.
# generate 25 objects, divided into two clusters,
# and 3 objects lying between those clusters.
x <- rbind(cbind(rnorm(10,0,0.5), rnorm(10,0,0.5)),
           cbind(rnorm(15,5,0.5), rnorm(15,5,0.5)),
           cbind(rnorm(3,3.5,0.5), rnorm(3,3.5,0.5)))
fannyx <- fanny(x, 2)
fannyx
summary(fannyx)
plot(fannyx)
fanny(daisy(x, metric="manhattan"), 2, diss=T)