Quality of Clusterings
• Two metrics:
  • SSE (sum of squared errors)
  • Dissimilarity ratio
Computing SSE
• Save clusters. Two new columns are created: Cluster and Distance.
• Create a new formula column. Name it dist-sqr and define it as Distance².
• Run Analyze > Distribution on dist-sqr. Take the mean and multiply it by N (the number of rows) to obtain the SSE.
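The SSE steps above can be sketched outside JMP as well. This is a minimal Python illustration, not the JMP procedure itself: the function name `sse_from_distances` and the sample distances are hypothetical, and the input is assumed to be the saved Distance column (each row's distance to its cluster centroid).

```python
def sse_from_distances(distances):
    """Compute SSE from each row's distance to its cluster centroid."""
    squared = [d ** 2 for d in distances]  # plays the role of the dist-sqr column
    n = len(squared)                       # N, the number of rows
    mean_sq = sum(squared) / n             # the mean a distribution report would show
    return mean_sq * n                     # mean * N is the sum of squared distances


# Hypothetical Distance values for four rows (not taken from the slides).
distances = [1.0, 2.0, 2.0, 1.0]
sse = sse_from_distances(distances)
print(sse)
```

Multiplying the mean by N simply recovers the sum of the squared distances, so summing the dist-sqr column directly gives the same value.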
Computing Dissimilarity Ratio
• Dissimilarity ratio = inter-cluster distance / intra-cluster distance
• Inter-cluster distance is the smallest distance between normalized centroids. To compute it:
  • Take the centroid coordinates from the cluster output.
  • Find the mean and standard deviation of each dimension from the histogram (Distribution) output.
  • Normalize each centroid coordinate: (x - mean) / std dev
  • Compute the distance between each pair of normalized centroids.
  • The inter-cluster distance is the smallest of these distances.
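The inter-cluster distance steps above could look roughly like this in Python. The slides do not say which distance measure to use, so Euclidean distance is an assumption here; the function names and the sample centroids, means, and standard deviations are hypothetical.

```python
import math
from itertools import combinations


def normalize(centroid, means, std_devs):
    """Apply (x - mean) / std dev to every coordinate of one centroid."""
    return [(x - m) / s for x, m, s in zip(centroid, means, std_devs)]


def inter_cluster_distance(centroids, means, std_devs):
    """Smallest pairwise distance between normalized centroids.

    Assumption: Euclidean distance (the slides do not name the metric).
    """
    normalized = [normalize(c, means, std_devs) for c in centroids]
    return min(math.dist(a, b) for a, b in combinations(normalized, 2))


# Hypothetical values: three centroids in two dimensions.
centroids = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
means = [3.0, 4.0]      # per-dimension mean of the data
std_devs = [3.0, 2.0]   # per-dimension standard deviation of the data
inter = inter_cluster_distance(centroids, means, std_devs)
print(inter)
```

Normalizing first keeps a dimension with a large numeric range from dominating the centroid distances.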
Dissimilarity Ratio – cont.
• Intra-cluster distance is the average of the clusters' maximum distances.
• The maximum distance of each cluster is listed in the cluster output in JMP.
• Compute the dissimilarity ratio (DR) for each clustering.
• The higher the DR, the better the clustering.
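Putting the two pieces together, the comparison of clusterings by DR might be sketched as follows. The function name, the cluster counts, and all numbers are hypothetical; the inter-cluster distance is assumed to have been computed already, and the maximum distances are assumed to be read from the cluster output.

```python
def dissimilarity_ratio(inter_distance, max_distances):
    """DR = inter-cluster distance / average of the per-cluster max distances."""
    intra = sum(max_distances) / len(max_distances)  # intra-cluster distance
    return inter_distance / intra


# Hypothetical results for two clusterings, keyed by number of clusters:
# (inter-cluster distance, list of each cluster's maximum distance).
clusterings = {
    3: (2.0, [1.0, 1.5, 0.5]),
    4: (1.2, [0.8, 1.0, 0.6, 0.8]),
}

ratios = {k: dissimilarity_ratio(inter, maxes)
          for k, (inter, maxes) in clusterings.items()}
best_k = max(ratios, key=ratios.get)  # the higher the DR, the better
print(ratios, best_k)
```

With these made-up numbers the three-cluster solution has the higher DR, so it would be preferred under this metric.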