Partitioning Around Medoids

Cluster analysis is the search for groups (clusters) in data such that objects belonging to the same cluster resemble each other, whereas objects in different clusters are dissimilar.

The partitioning around medoids algorithm is similar to k-means but uses medoids (representative objects chosen from the data) rather than centroids. Partitioning around medoids has certain advantages: (1) it accepts a dissimilarity matrix; (2) it is more robust because it minimizes a sum of dissimilarities instead of a sum of squared Euclidean distances; and (3) it provides novel graphical displays (silhouette plots and clusplots).
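The idea of minimizing a sum of dissimilarities to medoids can be illustrated with a minimal sketch. It is written in Python rather than S-PLUS, it is not the pam implementation, and the function name pam_sketch and its arguments are hypothetical. It assumes a square, symmetric dissimilarity matrix given as a list of lists, and it uses a simplified greedy build step followed by a first-improvement swap step.

```python
def pam_sketch(diss, k, max_iter=100):
    """Simplified k-medoids search on a dissimilarity matrix (illustrative only)."""
    n = len(diss)

    def cost(medoids):
        # Sum over all objects of the dissimilarity to the nearest medoid.
        return sum(min(diss[i][m] for m in medoids) for i in range(n))

    # Build step: start with the most central object, then greedily add
    # the object that lowers the total cost the most.
    medoids = [min(range(n), key=lambda c: sum(diss[i][c] for i in range(n)))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: cost(medoids + [c])))

    # Swap step: replace a medoid by a non-medoid whenever that lowers the cost.
    best = cost(medoids)
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                trial_cost = cost(trial)
                if trial_cost < best:
                    medoids, best, improved = trial, trial_cost, True
        if not improved:
            break

    # Assign each object to its nearest medoid (label = position in medoids).
    labels = [min(range(k), key=lambda j: diss[i][medoids[j]]) for i in range(n)]
    return medoids, labels, best


# Hypothetical example data: six points on a line, absolute differences.
points = [0, 1, 2, 10, 11, 12]
diss = [[abs(a - b) for b in points] for a in points]
medoids, labels, total = pam_sketch(diss, k=2)
```

Because the sketch only ever reads entries of the dissimilarity matrix, it works equally well when no coordinates are available, which is the property described in advantage (1).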

To perform partitioning around medoids

Choose Statistics > Cluster Analysis > Partitioning Around Medoids. The dialog shown below appears.

Model page

[Image: Model page of the Partitioning Around Medoids dialog]

In the Partitioning Around Medoids dialog, the Model page has the following options:

Data

Data Set

Select a data set from the dropdown list or type the name of a data set. You can also type into the Data Set edit field any expression that evaluates to a data set.

Clustering Variables

Select numeric variables from the dropdown list. If your data set contains factor variables, use the Compute Dissimilarities dialog to create dissimilarity objects to be used in the cluster analysis. Note, however, that dissimilarity objects cannot be used in K-Means or Monothetic clustering.

Subset Rows

Enter an S-PLUS expression that identifies the rows to use in the analysis. To use all the rows in the data set, leave this field blank.

Omit Rows with Missing Values

Select this box to omit from the analysis any rows in the data set that contain missing values for any of the variables in the model.

Dissimilarity Object

If your data set contains non-numeric columns (for example, factors), use the Compute Dissimilarities dialog to produce a dissimilarity object, and then use this object in clustering. The Compute Dissimilarities dialog provides special options for handling factors. (To open this dialog, choose Statistics > Cluster Analysis > Compute Dissimilarities from the main menu.)

Use Dissimilarity Object

Select this to use a dissimilarity object in the analysis.

Saved Object

Specify a dissimilarity object.

Dissimilarity Measure

Metric

Select the metric to be used for calculating dissimilarities between objects. The available options are euclidean and manhattan. The Euclidean distance is the square root of the sum of squared differences, and the Manhattan distance is the sum of absolute differences. If Data Set is already a dissimilarity matrix, this setting is ignored.
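The two metrics can be written out as a short sketch. This is Python for illustration only, not S-PLUS code, and the function names are hypothetical. Each object is assumed to be a sequence of numeric values of equal length.

```python
import math


def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def dissimilarity_matrix(rows, metric=euclidean):
    # Full square matrix of pairwise dissimilarities between rows.
    return [[metric(r, s) for s in rows] for r in rows]


# Hypothetical example: two objects measured on two variables.
d_euclidean = euclidean((0, 0), (3, 4))
d_manhattan = manhattan((0, 0), (3, 4))
```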

Standardize Variables

Select this to standardize each data column by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. If Data Set is already a dissimilarity matrix, this setting is ignored.
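The standardization described above can be sketched as follows. This is an illustrative Python sketch, not the S-PLUS implementation, and the function name standardize is hypothetical. It assumes a list of numeric rows with no missing values and no constant column (a constant column would have a mean absolute deviation of zero).

```python
def standardize(rows):
    """Center each column on its mean and scale by its mean absolute deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    # Mean absolute deviation: average absolute distance from the column mean.
    mads = [sum(abs(v - m) for v in col) / n for col, m in zip(columns, means)]
    return [
        [(v - m) / s for v, m, s in zip(row, means, mads)]
        for row in rows
    ]


# Hypothetical example: two variables on very different scales.
scaled = standardize([[1, 10], [2, 20], [3, 30]])
```

After standardization both columns in the example take the same values, so neither variable dominates the dissimilarities because of its measurement units.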

Options

Number of Clusters

Specify the number of clusters to form when generating cluster membership indices, or a matrix of initial values for cluster centers.

Use Large Data Algorithm

Select this to use the large data algorithm known as clara. Note that this algorithm involves drawing subsamples of rows, so it cannot be used on dissimilarities.

Number of Samples

Enter the number of samples to be drawn from the data set. This is used in the large data algorithm.

Sample Size

Enter the number of observations in each sample. Sample Size should be greater than the Number of Clusters and at most the number of observations (nrow(x)). This is used in the large data algorithm.
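The roles of Number of Samples and Sample Size can be illustrated with a minimal sketch of the subsampling idea. This is Python for illustration, not the clara implementation; the function names and defaults are hypothetical. As a labeled simplification, an exhaustive search over each small sample stands in for running partitioning around medoids on it, and rows are assumed to be tuples of numbers.

```python
import random
from itertools import combinations


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def total_cost(rows, medoids, metric):
    # Sum over rows of the dissimilarity to the nearest medoid.
    return sum(min(metric(r, m) for m in medoids) for r in rows)


def clara_sketch(rows, k, n_samples=5, sample_size=10, metric=manhattan, seed=0):
    # Sample size must exceed the number of clusters and not exceed the row count.
    if not k < sample_size <= len(rows):
        raise ValueError("need k < sample_size <= number of rows")
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(rows, sample_size)
        # Simplification: exhaustive search replaces PAM on the sample.
        medoids = min(combinations(sample, k),
                      key=lambda ms: total_cost(sample, ms, metric))
        # Judge the sample's medoids on the full data set, keep the best set.
        cost = total_cost(rows, medoids, metric)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return list(best_medoids), best_cost


# Hypothetical example data: six one-variable observations.
rows = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
medoids, cost = clara_sketch(rows, k=2, n_samples=3, sample_size=6)
```

The sketch draws rows, not dissimilarities, which is why this kind of algorithm needs the original data rather than a dissimilarity object.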

Save Model Object

In the Save As field, enter the name for the object in which to save the results of the analysis. If an object with this name already exists, its contents are overwritten. The model object can be used in later functions such as plotting.

Save Data

Select this box to store a copy of the data in the model object. This is necessary if you wish to produce a clusplot for the model.

Save Dissimilarities

Select this box to store a copy of the dissimilarities in the model object. This is necessary if you wish to produce a clusplot for the model.

Results page

[Image: Results page of the Partitioning Around Medoids dialog]

In the Partitioning Around Medoids dialog, the Results page has the following options:

Printed Results

Output Type

Select None for no printed output, Short for a short printed summary, or Long for a more detailed printed summary. (Long output is not available for all functions.)

Save In

Specify the name of a data set in which to save cluster membership if Cluster Membership is selected.

Cluster Membership

Select this to save a vector of indices giving cluster memberships in the specified data set.

Plot page

[Image: Plot page of the Partitioning Around Medoids dialog]

In the Partitioning Around Medoids dialog, the Plot page has the following options:

Clusplot

Select this to create a clusplot for the clustering. This plot represents the data in a bivariate plot. Ellipses are drawn to indicate the clusters.

Silhouette Plot

Select this to create a silhouette plot for the clustering. This plot indicates the strength of cluster membership for each observation.
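The quantity shown in a silhouette plot is the silhouette width of each observation. A minimal sketch of the standard definition follows, in Python for illustration only; the function name is hypothetical. For observation i, a is the average dissimilarity to the other members of its own cluster, b is the smallest average dissimilarity to the members of any other cluster, and the width is (b - a) / max(a, b). The sketch assumes at least two clusters and at least some nonzero dissimilarities, and it assigns a width of 0 to observations in single-member clusters.

```python
def silhouette_widths(diss, labels):
    """Silhouette width per observation from a dissimilarity matrix and labels."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # convention for single-member clusters
            continue
        # a: average dissimilarity to the other members of the same cluster.
        a = sum(diss[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest average dissimilarity to any other cluster.
        b = min(
            sum(diss[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        widths.append((b - a) / max(a, b))
    return widths


# Hypothetical example: two well-separated pairs of points on a line.
points = [0, 1, 10, 11]
diss = [[abs(p - q) for q in points] for p in points]
widths = silhouette_widths(diss, [0, 0, 1, 1])
```

Widths near 1 indicate observations that sit firmly in their cluster, widths near 0 indicate observations between clusters, and negative widths suggest a possible misassignment.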

Related programming language functions

pam