Cluster Analysis
• Single Link Cluster Analysis
• Ward’s Minimum Sum of Squares
• k-Means Cluster Analysis
• SPSS TwoStep Cluster Analysis
Single-Link Clustering (most popular method)
Single Link: Join an item to the cluster that contains the single closest member. Since B < q, join the star to the Left cluster, even though A > q and C > q.
[Figure: Left and Right clusters plotted on Complete Pain Relief (Importance) vs. Cost (Importance), with the distances A, B, C, and q marked from the star.]
Cluster Analysis: Single Chain Agglomerative Procedure (most popular method)
Part-Worth Coefficients of “Complete Pain Relief”:
Therapy A = 2, Therapy B = 5, Therapy C = 9, Therapy D = 10, Therapy E = 15
Single Link: Join an item to the cluster that contains the single closest member.
First Stage: A = 2, B = 5, C = 9, D = 10, E = 15
Second Stage (Euclidean distance): AB = 3, AC = 7, AD = 8, AE = 13, BC = 4, BD = 5, BE = 10, CD = 1, CE = 6, DE = 5
Third Stage: CDA = 7, CDB = 4, CDE = 5, AB = 3, AE = 13, BE = 10
Fourth Stage: ABCD = 4, ABE = 10, CDE = 5
Fifth Stage: ABCDE = 5
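The stages above can be reproduced with a minimal Python sketch. The function name `single_link` and the dictionary layout are illustrative assumptions, not part of the original slides; the distance between two clusters is taken to be the smallest gap between any two of their members.

```python
from itertools import combinations

def link_distance(cluster_a, cluster_b):
    # Single-link distance: smallest gap between any two members.
    return min(abs(x - y) for x in cluster_a for y in cluster_b)

def single_link(points):
    """points maps a label to a 1-D value; returns [(merged_label, height), ...]."""
    clusters = {name: [value] for name, value in points.items()}
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest single-link distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda p: link_distance(clusters[p[0]], clusters[p[1]]))
        height = link_distance(clusters[a], clusters[b])
        label = "".join(sorted(a + b))
        clusters[label] = clusters.pop(a) + clusters.pop(b)
        merges.append((label, height))
    return merges

therapies = {"A": 2, "B": 5, "C": 9, "D": 10, "E": 15}
for label, height in single_link(therapies):
    print(label, height)
```

With the five therapy values, the merges come out as CD at 1, AB at 3, ABCD at 4, and ABCDE at 5, matching the stage minima listed above.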
Single Chain Agglomerative Clustering Output: Dendrogram
[Dendrogram: leaves A, B, C, D, E; C and D join at height 1, A and B at 3, AB and CD at 4, and E joins last at 5.]
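Ward’s Clustering
Ward’s Cluster: Join an item to the cluster that yields the smallest error sum of squares (ESS). In this case, if the star is joined to the Left cluster, ESS = A² + B² + C² + D², where A, B, C, and D are the distances from each point to the mean location of the points in the proposed cluster.
[Figure: Left and Right clusters plotted on Water Resistance (Importance) vs. Strength (Importance), with the mean location of the proposed cluster marked.]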
Ward’s Minimum Variance Agglomerative Clustering Procedure
First Stage: A = 2, B = 5, C = 9, D = 10, E = 15
Second Stage (ESS of each candidate pair): AB = 4.5, AC = 24.5, AD = 32.0, AE = 84.5, BC = 8.0, BD = 12.5, BE = 50.0, CD = 0.5, CE = 18.0, DE = 12.5
Third Stage (total ESS over all clusters after each candidate merge): CDA = 38.0, CDB = 14.0, CDE = 20.67, AB = 5.0, AE = 85.0, BE = 50.5
Fourth Stage: ABCD = 41.0, ABE = 93.17, CDE = 25.17
Fifth Stage: ABCDE = 98.8
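As a check on the figures above, here is a small Python sketch of the same procedure. The names `ess` and `ward` are hypothetical; at each stage the sketch merges the pair of clusters that leaves the smallest total ESS across all clusters, which is how the stage values above appear to be computed.

```python
from itertools import combinations

def ess(values):
    # Error sum of squares: squared deviations from the cluster mean.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def ward(points):
    """points maps a label to a 1-D value; returns [(merged_label, total_ess), ...]."""
    clusters = {name: [value] for name, value in points.items()}
    merges = []

    def total_after(pair):
        # Total ESS over all clusters if this pair were merged.
        a, b = pair
        untouched = sum(ess(v) for k, v in clusters.items() if k not in pair)
        return ess(clusters[a] + clusters[b]) + untouched

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=total_after)
        total = total_after((a, b))
        label = "".join(sorted(a + b))
        clusters[label] = clusters.pop(a) + clusters.pop(b)
        merges.append((label, round(total, 2)))
    return merges

therapies = {"A": 2, "B": 5, "C": 9, "D": 10, "E": 15}
for label, total in ward(therapies):
    print(label, total)
```

On the therapy data this merges CD (0.5), then AB (5.0), then CDE (25.17), then ABCDE (98.8). Note that E joins CD before AB does, so the tree differs from the single-link result.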
Ward’s Minimum Variance Agglomerative Clustering Output
[Dendrogram: leaves A, B, C, D, E; merges at total ESS of 0.5, 5, 25.17, and 98.8.]
k-Means Clustering
1. Begin with two starting center points and allocate each item to the nearest cluster center.
2. Recalculate the center of each cluster. Stop if the centers have not changed.
3. Allocate items to the nearest cluster center. Go to step 2.
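The loop described in these steps can be sketched in Python as follows. This is a one-dimensional illustration only; the function name `k_means`, the reuse of the therapy values, and the starting centers 2 and 5 are assumptions made for the example, not values from the slides.

```python
def k_means(values, centers):
    """Repeat allocation and re-centering until the centers stop changing."""
    centers = list(centers)
    while True:
        # Allocate each item to the nearest cluster center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Recalculate each center as the mean of its items
        # (an empty cluster keeps its old center).
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        # Stop once the centers no longer move.
        if new_centers == centers:
            return centers, groups
        centers = new_centers

centers, groups = k_means([2, 5, 9, 10, 15], centers=[2, 5])
print(centers, groups)
```

From these starting centers the items settle into the groups [2, 5] and [9, 10, 15]. A different choice of starting centers can lead to a different final partition.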
k-Means Clustering
[Figure: five panels, numbered 1 to 5, each showing the cluster centers A and B.]
SPSS TwoStep Cluster Method
• A scalable cluster analysis algorithm designed to handle very large data sets.
• Can handle both continuous and categorical variables or attributes.
• Can automatically select the number of clusters.
Step 1: Pre-cluster the cases (or records) into many small sub-clusters.
Step 2: Cluster the sub-clusters resulting from the pre-cluster step into the desired number of clusters.
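The two-step idea can be illustrated with a deliberately simplified Python sketch. This is not the SPSS implementation: the function names, the fixed distance `threshold`, and the merging of sub-clusters by closest means are all stand-ins chosen for illustration, and the sketch handles only one continuous variable with a user-supplied number of clusters.

```python
def mean(cluster):
    return sum(cluster) / len(cluster)

def pre_cluster(values, threshold):
    """Step 1 (simplified): one pass over the data. A value joins the nearest
    sub-cluster if its mean is within `threshold`; otherwise it starts a new one."""
    subs = []
    for v in values:
        best = min(subs, key=lambda s: abs(v - mean(s)), default=None)
        if best is not None and abs(v - mean(best)) <= threshold:
            best.append(v)
        else:
            subs.append([v])
    return subs

def merge_sub_clusters(subs, k):
    """Step 2 (simplified): merge the two sub-clusters with the closest means
    until only k clusters remain."""
    clusters = [list(s) for s in subs]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: abs(mean(clusters[p[0]]) - mean(clusters[p[1]])))
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters

subs = pre_cluster([2, 5, 9, 10, 15], threshold=1.5)
print(subs)
print(merge_sub_clusters(subs, k=2))
```

The point of the design is that the second step works on the much smaller set of sub-clusters rather than on every case, which is what lets the approach scale to large data sets.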