K-Means Clustering

Cluster analysis searches for groups (clusters) in the data such that objects belonging to the same cluster resemble each other, whereas objects in different clusters are dissimilar.

One of the best-known partitioning methods is k-means. In the k-means algorithm, each observation is classified as belonging to one of k groups. Group membership is determined by calculating the centroid of each group (the multidimensional version of the mean) and assigning each observation to the group with the closest centroid.
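As a rough illustration of the rule just described (not the S-PLUS implementation), the following Python sketch computes the centroid of each group and assigns observations to the group with the closest centroid. The function names centroid and nearest, the sample data, and the choice of squared Euclidean distance are assumptions made for this example.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

# Two hypothetical groups of two-dimensional observations.
groups = [[(1.0, 1.0), (1.0, 3.0)], [(8.0, 8.0), (10.0, 8.0)]]
centroids = [centroid(g) for g in groups]

# Assign two new observations to the group with the closest centroid.
labels = [nearest(p, centroids) for p in [(0.0, 2.0), (9.0, 9.0)]]
```

Squared distance is used because it ranks centroids in the same order as the distance itself and avoids a square root.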

To perform k-means clustering

Choose Statistics > Cluster Analysis > K-Means. The K-Means Clustering dialog appears.

Model page

[Screenshot: K-Means Clustering dialog, Model page]

In the K-Means Clustering dialog, the Model page has the following options:

Data Set

Select a data set from the dropdown list or type the name of a data set. You can also type into the Data Set edit field any expression that evaluates to a data set.

Clustering Variables

Select numeric variables from the dropdown list. If your data set contains factor variables, you can use the Compute Dissimilarities dialog to create dissimilarity objects for use in cluster analysis. However, dissimilarity objects cannot be used in K-Means or Monothetic clustering, so only numeric variables can be used here.

Subset Rows

Enter an S-PLUS expression that identifies the rows to use in the analysis. To use all the rows in the data set, leave this field blank.

Omit Rows with Missing Values

Select this box to omit from the analysis any rows in the data set that contain missing values for any of the variables in the model.
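The effect of this option can be sketched in Python as follows. This is only an illustration under stated assumptions: rows are represented as dictionaries, missing values as None or NaN, and the names is_missing and omit_missing are hypothetical. Note that a row is dropped only when a variable used in the model is missing.

```python
import math

def is_missing(value):
    """True for None or a floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def omit_missing(rows, variables):
    """Keep only rows with no missing value in any of the model variables."""
    return [row for row in rows
            if not any(is_missing(row[name]) for name in variables)]

rows = [
    {"x": 1.0, "y": 2.0, "note": None},          # kept: "note" is not in the model
    {"x": float("nan"), "y": 3.0, "note": "a"},  # dropped: x is missing
    {"x": 4.0, "y": None, "note": "b"},          # dropped: y is missing
]
complete = omit_missing(rows, ["x", "y"])
```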

Options

Number of Clusters

Specify either the number of clusters to form when generating cluster membership indices, or a matrix of initial values for the cluster centers.
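The two accepted forms of this setting can be pictured with the Python sketch below. The function initial_centers is hypothetical, and the rule used when only a count is given (take the first k distinct rows as starting centers) is an assumption chosen for simplicity; this page does not say how S-PLUS derives starting centers from a count.

```python
def initial_centers(data, clusters):
    """Return starting centers from either a cluster count or explicit centers."""
    width = len(data[0])
    if isinstance(clusters, int):
        if clusters < 1:
            raise ValueError("number of clusters must be at least 1")
        # ASSUMPTION: use the first k distinct rows as starting centers.
        chosen = []
        for row in data:
            candidate = [float(v) for v in row]
            if candidate not in chosen:
                chosen.append(candidate)
            if len(chosen) == clusters:
                return chosen
        raise ValueError("fewer distinct rows than requested clusters")
    # Otherwise treat the argument as a matrix with one center per row.
    centers = [[float(v) for v in row] for row in clusters]
    if not centers or any(len(c) != width for c in centers):
        raise ValueError("each center needs one value per clustering variable")
    return centers

data = [(1.0, 1.0), (1.0, 1.0), (2.0, 2.0), (9.0, 9.0)]
from_count = initial_centers(data, 2)
from_matrix = initial_centers(data, [[0, 0], [5, 5]])
```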

Maximum Iteration

Specify the maximum number of iterations of the k-means algorithm. Each iteration alternates between calculating centroids and assigning observations to the cluster with the closest centroid.
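The alternation described above, capped by an iteration limit, can be sketched in Python as shown here. This is a minimal Lloyd-style loop written for illustration; the algorithm actually used by S-PLUS may differ in its details, and the names kmeans_sketch and max_iter, the early stop when assignments no longer change, and the sample data are all assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_sketch(data, centers, max_iter=10):
    """Alternate assignment and centroid updates for at most max_iter iterations."""
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    centers = [list(c) for c in centers]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each observation goes to the closest center.
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in data]
        if new_labels == labels:
            break  # assignments are stable, so further iterations change nothing
        labels = new_labels
        # Update step: each center becomes the centroid of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers, iteration

data = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
labels, centers, used = kmeans_sketch(data, [(0.0, 0.0), (5.0, 5.0)], max_iter=10)
```

In this sketch the limit only matters when the assignments have not stabilized by the time it is reached; the second iteration here detects that nothing changed and stops.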

Save Model Object

In the Save As field, enter the name for the object in which to save the results of the analysis. If an object with this name already exists, its contents are overwritten. The model object can be used in later functions such as plotting.

Results page

[Screenshot: K-Means Clustering dialog, Results page]

In the K-Means Clustering dialog, the Results page has the following options:

Printed Results

Output Type

Select None for no printed output, Short for a short printed summary, or Long for a more detailed printed summary. (Long output is not available for all functions.)

Save In

Specify the name of the data set in which to save the cluster memberships. This field is used only if Cluster Membership is selected.

Cluster Membership

Select this option to save a vector of indices giving the cluster membership of each observation in the data set specified in Save In.
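Conceptually, saving cluster membership adds one index per row to the chosen data set. The Python sketch below illustrates the idea only; the function save_membership, the column name "cluster", the dictionary representation of rows, and the conversion to indices that start at 1 are assumptions for this example, not a description of what the dialog writes.

```python
def save_membership(rows, labels, column="cluster"):
    """Return copies of rows with a cluster index added under the given column."""
    if len(rows) != len(labels):
        raise ValueError("need exactly one cluster label per row")
    # ASSUMPTION: labels are 0-based here and stored as 1-based indices.
    return [{**row, column: label + 1} for row, label in zip(rows, labels)]

rows = [{"x": 1.0}, {"x": 1.5}, {"x": 9.0}]
saved = save_membership(rows, [0, 0, 1])
```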

Related programming language functions

kmeans