Cluster analysis is the search for groups (clusters) in data such that objects belonging to the same cluster resemble each other, whereas objects in different clusters are dissimilar.
Hierarchical algorithms proceed by combining or dividing existing groups, producing a hierarchical structure displaying the order in which groups are merged or divided. Divisive methods start with all observations in a single group and proceed until each observation is in a separate group.
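The divisive procedure described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the dialog. It assumes manhattan dissimilarities and the "splinter group" splitting rule commonly associated with the diana method, and all function names (manhattan, diameter, avg_dissim, split, divisive) are hypothetical.

```python
from itertools import combinations


def manhattan(a, b):
    """Sum of absolute differences between two observations."""
    return sum(abs(x - y) for x, y in zip(a, b))


def diameter(cluster, points):
    """Largest dissimilarity between any two members (0 for a singleton)."""
    return max(
        (manhattan(points[i], points[j]) for i, j in combinations(cluster, 2)),
        default=0,
    )


def avg_dissim(i, group, points):
    """Average dissimilarity from observation i to the other members of group."""
    others = [j for j in group if j != i]
    if not others:
        return 0.0
    return sum(manhattan(points[i], points[j]) for j in others) / len(others)


def split(cluster, points):
    """Split one cluster in two using the assumed splinter-group heuristic."""
    rest = list(cluster)
    # Seed the splinter group with the observation farthest, on average, from the others.
    seed = max(rest, key=lambda i: avg_dissim(i, rest, points))
    rest.remove(seed)
    splinter = [seed]
    while len(rest) > 1:
        # Positive gain: observation i is closer to the splinter group than to the rest.
        gains = {
            i: avg_dissim(i, rest, points) - avg_dissim(i, splinter, points)
            for i in rest
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        rest.remove(best)
        splinter.append(best)
    return splinter, rest


def divisive(points):
    """Start with one group and split until every observation stands alone.

    Returns the splits in order as (diameter of split cluster, part A, part B).
    """
    clusters = [list(range(len(points)))]
    history = []
    while any(len(c) > 1 for c in clusters):
        # Only clusters with at least two members can be split.
        target = max(
            (c for c in clusters if len(c) > 1),
            key=lambda c: diameter(c, points),
        )
        clusters.remove(target)
        a, b = split(target, points)
        history.append((diameter(target, points), sorted(a), sorted(b)))
        clusters.extend([a, b])
    return history


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
for height, part_a, part_b in divisive(points):
    print(height, part_a, part_b)
```

With n observations the loop performs n - 1 splits, and the recorded diameters are the heights at which a clustering tree would branch.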
To perform divisive hierarchical clustering
Choose Statistics > Cluster Analysis > Divisive Hierarchical. The dialog shown below appears.
Model page
In the Divisive Hierarchical Clustering dialog, the Model page has the following options:
Data
Specify a data set or a dissimilarity object. To use a subset of rows or columns, use standard S-PLUS subscripting of the data set.
Clustering Variables
Select numeric variables from the dropdown list. If your data set contains factor variables, use the Compute Dissimilarities dialog to create a dissimilarity object to use in the cluster analysis. Note, however, that dissimilarity objects cannot be used in K-Means or Monothetic clustering.
Enter an S-PLUS expression that identifies the rows to use in the analysis. To use all the rows in the data set, leave this field blank.
Select this box to omit from the analysis any rows in the data set that contain missing values for any of the variables in the model.
Dissimilarity Object
If your data set contains non-numeric columns (for example, factors), use the Compute Dissimilarities dialog to produce a dissimilarity object, and then use this object in clustering. The Compute Dissimilarities dialog provides special options for handling factors. (To open this dialog, choose Statistics > Cluster Analysis > Compute Dissimilarities from the main menu.)
Use Dissimilarity Object
Select this to use a dissimilarity object in the analysis.
Saved Object
Specify a dissimilarity object.
Dissimilarity Measure
Metric
Select the metric to be used for calculating dissimilarities between objects. The available options are euclidean and manhattan. Euclidean distances are the square root of the sum of squared differences, and manhattan distances are the sum of absolute differences. If Data Set is already a dissimilarity matrix, then this argument is ignored.
Standardize Variables
Select this to standardize each data column by subtracting the variable's mean value and dividing by the variable's mean absolute deviation. If Data Set is already a dissimilarity matrix, then this argument is ignored.
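As a small illustration of the two metrics and the standardization described above, the following Python sketch computes each quantity directly from its definition. The function names are hypothetical and are not part of S-PLUS. The standardize function assumes the column is not constant, since a constant column has a mean absolute deviation of zero.

```python
import math


def euclidean(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def standardize(column):
    """Subtract the mean, then divide by the mean absolute deviation.

    Assumes the column is not constant (otherwise the divisor is zero).
    """
    mean = sum(column) / len(column)
    mad = sum(abs(x - mean) for x in column) / len(column)
    return [(x - mean) / mad for x in column]


print(euclidean((0, 0), (3, 4)))   # 5.0
print(manhattan((0, 0), (3, 4)))   # 7
print(standardize([2, 4, 6, 8]))   # [-1.5, -0.5, 0.5, 1.5]
```

Standardizing first keeps a variable measured on a large scale from dominating the dissimilarities.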
In the Save As field, enter the name for the object in which to save the results of the analysis. If an object with this name already exists, its contents are overwritten. The model object can be used in later functions such as plotting.
Save Data
Select this box to store a copy of the data in the model object. This is necessary if you wish to produce a clusplot for the model.
Save Dissimilarities
Select this box to store a copy of the dissimilarities in the model object. This is necessary if you wish to produce a clusplot for the model.
Results page
In the Divisive Hierarchical Clustering dialog, the Results page has the following options:
Printed Results
Output Type
Select None for no printed output, Short for a short printed summary, or Long for a more detailed printed summary. (Long output is not available for all functions.)
Save In
Specify the name of a data set in which to save cluster membership if Cluster Membership is selected.
Cluster Membership
Select this to save a vector of indices giving cluster memberships in the specified data set.
Number of Clusters
Specify the number of clusters to form when generating cluster membership indices, or a matrix of initial values for cluster centers.
Plot page
In the Divisive Hierarchical Clustering dialog, the Plot page has the following options:
Plots
Clustering Tree
Select this to create a plot of a hierarchical clustering tree indicating the order in which groups were split or combined. The leaves of the clustering tree are the original observations. A branch splits up at the diameter of the cluster being split.
Banner Plot
Select this to create a banner plot. The banner plot displays the hierarchy of clusters, and is equivalent to a tree. The banner plots the diameter of each cluster being split.
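Both plots are drawn from cluster diameters. The sketch below assumes the usual definition of diameter as the largest dissimilarity between any two observations in a cluster. The function name and the small dissimilarity matrix are made up for illustration.

```python
from itertools import combinations


def diameter(members, dissim):
    """Largest pairwise dissimilarity among members (0 for a single observation)."""
    return max((dissim[i][j] for i, j in combinations(members, 2)), default=0)


# Hypothetical symmetric dissimilarity matrix for four observations.
dissim = [
    [0, 1, 20, 21],
    [1, 0, 19, 20],
    [20, 19, 0, 1],
    [21, 20, 1, 0],
]

print(diameter([0, 1, 2, 3], dissim))  # 21: height at which the full data set splits
print(diameter([0, 1], dissim))        # 1: height at which the pair {0, 1} splits
```

A subcluster never has a larger diameter than the cluster it came from, so the split heights cannot increase from the top of the tree toward the leaves.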
Related programming language functions
diana