Cluster analysis. Partition methods: divide data into disjoint clusters. Hierarchical methods: build a hierarchy of the observations and deduce the clusters from it. K-means. Criteria. Same criteria with multivariate data. Justifying the criteria. Anova: decomposition of the variance.
Partition Methods: divide data into disjoint clusters • Hierarchical Methods: build a hierarchy of the observations and deduce the clusters from it.
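Justifying the criteria • Anova: decomposition of the variance. Univariate: SST = SSW + SSB (total sum of squares = within-cluster + between-cluster sums of squares). Multivariate: the analogous decomposition holds. Because the total is fixed for a given data set, minimizing the within-cluster variance is equivalent to maximizing the between-cluster variance (the difference between clusters).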
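The univariate decomposition can be checked numerically. The following is a minimal sketch in plain Python; the function name `variance_decomposition` and the toy data are illustrative assumptions, not part of the original slides. It computes SST, SSW and SSB for two different labelings of the same data, showing that SST stays the same while a lower SSW goes with a higher SSB.

```python
def variance_decomposition(values, labels):
    """Return (SST, SSW, SSB) for univariate data with cluster labels."""
    grand_mean = sum(values) / len(values)

    # Group the observations by their cluster label.
    clusters = {}
    for x, g in zip(values, labels):
        clusters.setdefault(g, []).append(x)

    # Total sum of squares around the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in values)

    ssw = 0.0  # within-cluster sum of squares
    ssb = 0.0  # between-cluster sum of squares
    for members in clusters.values():
        cluster_mean = sum(members) / len(members)
        ssw += sum((x - cluster_mean) ** 2 for x in members)
        ssb += len(members) * (cluster_mean - grand_mean) ** 2
    return sst, ssw, ssb


# Hypothetical toy data: two well-separated groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

good = variance_decomposition(values, [0, 0, 0, 1, 1, 1])
bad = variance_decomposition(values, [0, 1, 0, 1, 0, 1])

print(good)  # (125.5, 4.0, 121.5)
print(bad)   # (125.5, 112.0, 13.5)
```

Both labelings share the same SST, so the labeling with the smaller SSW is necessarily the one with the larger SSB.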
Problems of k-means • Very sensitive to outliers • Euclidean distances are not appropriate for elliptical clusters • It does not determine the number of clusters; k must be chosen in advance.
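The sensitivity to outliers can be illustrated with a small experiment. The sketch below is a minimal one-dimensional version of the standard assign-then-update k-means iteration, written in plain Python; the function name `kmeans_1d`, the starting centers and the toy data are assumptions chosen for illustration. Adding a single extreme value makes the two natural groups collapse into one cluster while the outlier takes the other.

```python
def kmeans_1d(data, centers, max_iter=100):
    """Minimal 1-D k-means: alternate assignment and mean update."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)),
                          key=lambda j: (x - centers[j]) ** 2)
            groups[nearest].append(x)

        # Update step: each center becomes the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers


clean = [1, 2, 3, 10, 11, 12]
with_outlier = clean + [100]

centers_clean = kmeans_1d(clean, [1, 12])
centers_outlier = kmeans_1d(with_outlier, [1, 12])

print(centers_clean)    # [2.0, 11.0]
print(centers_outlier)  # [6.5, 100.0]
```

The result depends on the starting centers, so this shows one possible outcome rather than a general rule.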
Problems of hierarchical clustering • If n is large, it is slow: each merge step compares every pair of remaining clusters, starting with n(n-1)/2 comparisons. • Euclidean distances are not always appropriate • If n is large, the dendrogram is difficult to interpret
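To give a sense of how quickly n(n-1)/2 grows, here is a small sketch in plain Python. The helper `pair_count` and the sample points are illustrative assumptions. It builds all pairwise distances for a handful of one-dimensional points and prints the number of pairs for larger n.

```python
from itertools import combinations


def pair_count(n):
    """Number of distinct pairs among n observations: n(n-1)/2."""
    return n * (n - 1) // 2


# Hypothetical one-dimensional observations.
points = [0.5, 1.0, 4.0, 7.5, 9.0]

# One distance per unordered pair (i, j) with i < j.
distances = {
    (i, j): abs(points[i] - points[j])
    for i, j in combinations(range(len(points)), 2)
}
print(len(distances), pair_count(len(points)))  # 10 10

# The number of pairs grows quadratically with n.
for n in (100, 1_000, 10_000):
    print(n, pair_count(n))
# 100 4950
# 1000 499500
# 10000 49995000
```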