
Iterative Optimization and Simplification of Hierarchical Clusterings




Presentation Transcript


  1. Iterative Optimization and Simplification of Hierarchical Clusterings Doug Fisher, Department of Computer Science, Vanderbilt University. Journal of Artificial Intelligence Research, 4 (1996) 147-179. Presented by: Biyu Liang

  2. Outline • Introduction • Generating Initial Hierarchical Clustering • Iterative Optimization Methods and Comparison • Simplification of Hierarchical Clustering • Conclusion

  3. Introduction • Clustering is an unsupervised learning process that groups objects into clusters. • Major clustering methods: • Partitioning • Hierarchical • Density-based • Grid-based • Model-based

  4. Introduction (Continued) • Clustering systems differ in • objective function • control strategy • A search strategy usually cannot be both computationally inexpensive and able to guarantee the quality of its clusterings.

  5. Introduction (Continued) • This paper discusses the use of iterative optimization and simplification to construct clusterings that satisfy both conditions: • High quality • Computationally inexpensive • The suggested method involves 3 steps: • Constructing an initial clustering inexpensively • Iterative optimization to improve the clustering • Retrospective simplification of the clustering

  6. Outline • Introduction • Generating Initial Hierarchical Clustering • Iterative Optimization Methods and Comparison • Simplification of Hierarchical Clustering • Conclusion

  7. Category Utility • CU(C_k) = P(C_k) Σ_i Σ_j [ P(A_i = V_ij | C_k)² − P(A_i = V_ij)² ] • PU({C_1, C_2, …, C_N}) = Σ_k CU(C_k) / N, where an observation is a vector of values V_ij along attributes (or variables) A_i • This measure rewards clusters C_k that increase the predictability of the V_ij within C_k (i.e. P(A_i = V_ij | C_k)) relative to their predictability in the population as a whole (i.e. P(A_i = V_ij))
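As a concrete reading of these formulas, here is a minimal Python sketch. It assumes observations are represented as dicts of nominal attribute values; the representation and function names are illustrative, not taken from the paper.

```python
from collections import Counter

def category_utility(cluster, population, attributes):
    """CU(C_k) = P(C_k) * sum_i sum_j [P(A_i=V_ij|C_k)^2 - P(A_i=V_ij)^2]."""
    p_ck = len(cluster) / len(population)
    score = 0.0
    for attr in attributes:
        in_cluster = Counter(obs[attr] for obs in cluster)
        overall = Counter(obs[attr] for obs in population)
        for value in overall:
            p_cond = in_cluster[value] / len(cluster)   # P(A_i = V_ij | C_k)
            p_base = overall[value] / len(population)   # P(A_i = V_ij)
            score += p_cond ** 2 - p_base ** 2
    return p_ck * score

def partition_utility(partition, population, attributes):
    """PU({C_1, ..., C_N}): mean CU over the N clusters of a partition."""
    return sum(category_utility(c, population, attributes)
               for c in partition) / len(partition)
```

The later sketches in this transcript reuse partition_utility as the objective function.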

  8. [Figure: an example partition illustrating within-cluster probabilities such as P(Color = green | C1)]

  9. Hierarchical Sorting • Given an observation and the current partition, evaluate the quality of the clusterings that result from • Placing the observation in each of the existing clusters • Creating a new cluster that covers only the new observation • Select the option that yields the highest quality score (PU), as sketched below
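A one-level sketch of this decision, reusing partition_utility from the sketch above. Full hierarchical sorting would recurse into whichever cluster receives the observation; that recursion is omitted here.

```python
def sort_observation(partition, obs, population, attributes):
    """Evaluate every placement of `obs` and keep the highest-PU option."""
    pop = population + [obs]
    best_score, best_partition = float("-inf"), None
    # Option 1: place the observation in each existing cluster in turn.
    for i in range(len(partition)):
        candidate = [c + [obs] if j == i else list(c)
                     for j, c in enumerate(partition)]
        score = partition_utility(candidate, pop, attributes)
        if score > best_score:
            best_score, best_partition = score, candidate
    # Option 2: create a new cluster covering only the new observation.
    candidate = [list(c) for c in partition] + [[obs]]
    if partition_utility(candidate, pop, attributes) > best_score:
        best_partition = candidate
    return best_partition
```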

  10. Outline • Introduction • Generating Initial Hierarchical Clustering • Iterative Optimization Methods and Comparison • Simplification of Hierarchical Clustering • Conclusion

  11. Iterative Optimization Methods • Reorder-resort (Cluster/2): seed selection, reordering, and re-clustering • Iterative redistribution of single observations: moving single observations one by one • Iterative hierarchical redistribution: moving clusters together with their sub-trees

  12. Reorder-resort (k-means) • k random seeds are selected, and k clusters are grown around these attractors • The centroids of the resulting clusters are picked as new seeds, and new clusters are grown around them • The process iterates until there is no further improvement in the quality of the generated clustering

  13. Reorder-resort (k-means) cont'd • Ordering the data so that consecutive observations are dissimilar leads to good clusterings • A biased "dissimilarity" ordering can be extracted from the hierarchical clustering itself • The cycle is: initial sort, extraction of a dissimilarity ordering, re-clustering (see the sketch below)
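The control loop these two slides describe might look as follows. Here build_tree, extract_dissimilar_order, and quality are assumed callables standing in for hierarchical sorting of an ordered stream, the ordering-extraction step, and a PU-based score; none of these names come from the paper.

```python
import random

def reorder_resort(observations, build_tree, extract_dissimilar_order,
                   quality, max_rounds=10):
    """Reorder-resort control loop: cluster, extract a dissimilarity-biased
    ordering, re-cluster, and repeat while quality keeps improving."""
    order = list(observations)
    random.shuffle(order)                        # initial ordering is arbitrary
    tree = build_tree(order)                     # initial sort
    best = quality(tree)
    for _ in range(max_rounds):
        order = extract_dissimilar_order(tree)   # consecutive items dissimilar
        candidate = build_tree(order)            # re-cluster in the new order
        score = quality(candidate)
        if score <= best:                        # stop when no further improvement
            break
        tree, best = candidate, score
    return tree
```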

  14. Iterative Redistribution of Single Observations • Moves single observations from cluster to cluster • A cluster that contains only one observation is removed and its single observation is resorted • Iterates until two consecutive iterations yield the same clustering

  15. Single-Observation Redistribution Variations • The ISODATA algorithm determines a target cluster for each observation but does not move any observation until targets for all observations have been determined • A sequential version moves each observation as soon as its target is identified through sorting
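A sketch of one ISODATA-style pass, again reusing partition_utility: targets for all observations are chosen before any move is applied (a sequential variant would move each observation as soon as its target is found). This illustrates the batch/sequential distinction and is not the paper's exact procedure.

```python
def redistribute_once(partition, population, attributes):
    """One batch pass: choose a target cluster per observation, then move."""
    targets = []
    for src, cluster in enumerate(partition):
        for obs in cluster:
            best_score, best_dst = float("-inf"), src
            for dst in range(len(partition)):
                candidate = [list(c) for c in partition]
                candidate[src].remove(obs)
                candidate[dst].append(obs)
                candidate = [c for c in candidate if c]   # drop emptied clusters
                score = partition_utility(candidate, population, attributes)
                if score > best_score:
                    best_score, best_dst = score, dst
            targets.append((obs, src, best_dst))
    new_partition = [list(c) for c in partition]
    for obs, src, dst in targets:                 # apply all moves at once
        if src != dst:
            new_partition[src].remove(obs)
            new_partition[dst].append(obs)
    # Clusters left with a single observation would be removed and their
    # observation resorted (not shown here).
    return [c for c in new_partition if c]
```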

  16. Iterative Hierarchical Redistribution • Takes larger steps in the search for a better clustering • Removes and resorts whole sub-trees instead of single observations • Requires updating the variable-value counts of the ancestor clusters and the host cluster

  17. Scheme • Given an existing hierarchical clustering, a recursive loop examines sibling clusters in the hierarchy in a depth-first fashion • An inner, iterative loop reclassifies each sibling based on the objective function, repeating until two consecutive iterations lead to the same set of siblings

  18. (Continued) • The recursive loop then turns its attention to the children of each of the remaining siblings • Eventually the leaves are reached and resorted • The whole recursive pass is applied repeatedly until no changes occur from one pass to the next
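In outline, the recursion these two slides describe could be expressed as below. The Node class and the reclassify_siblings callable are assumptions: reclassify_siblings stands in for removing each sibling sub-tree in turn and re-sorting it under the objective function, with the variable-value count updates mentioned on slide 16. As slide 18 notes, this whole pass is itself repeated until a pass produces no changes.

```python
class Node:
    """Minimal tree node for illustration: an internal node holds child
    clusters; a leaf holds a single observation."""
    def __init__(self, children=None, obs=None):
        self.children = children if children is not None else []
        self.obs = obs

def hierarchical_redistribution(node, reclassify_siblings):
    """Depth-first outer loop of hierarchical redistribution."""
    if not node.children:
        return
    while True:                                   # inner loop: repeat until stable
        new_children = reclassify_siblings(node.children)
        if new_children == node.children:         # two identical iterations in a row
            break
        node.children = new_children
    for child in node.children:                   # then descend to the children
        hierarchical_redistribution(child, reclassify_siblings)
```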

  19. Main findings from the experiments • Hierarchical redistribution achieves the highest mean PU scores in most cases • Reordering and re-clustering comes closest to hierarchical redistribution's performance in all cases • Single-observation redistribution modestly improves an initial sort, but is substantially worse than the other two optimization methods

  20. Outline • Introduction • Generating Initial Hierarchical Clustering • Iterative Optimization Methods and Comparison • Simplification of Hierarchical Clustering • Conclusion

  21. Simplifying Hierarchical Clusterings • Goals: simplify the hierarchical clustering and minimize classification cost • Minimize error rate • A validation set is used to identify, for each variable, a frontier of clusters to use for predicting that variable • Nodes that lie below the frontier of every variable are pruned

  22. Validation • For each variable A_i, the objects from the validation set are each classified through the hierarchical clustering with the value of A_i "masked" for purposes of classification • At each cluster encountered during classification, a prediction is correct if the observation's value for A_i equals the most frequent value for A_i at that cluster • A count of correct predictions for each variable is maintained at each cluster • A preferred frontier is identified for each variable that maximizes the number of correct predictions for that variable
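The per-variable counting step might be sketched as follows. classify_path (the sequence of clusters an object visits from the root down) and modal_value (a cluster's most frequent value for a variable) are assumed helpers, not functions from the paper.

```python
from collections import Counter

def correct_counts(root, validation, attr, classify_path, modal_value):
    """Count per-cluster correct predictions of `attr` on a validation set,
    classifying each object with `attr` masked."""
    counts = Counter()
    for obs in validation:
        truth = obs[attr]
        masked = {a: v for a, v in obs.items() if a != attr}   # hide the target
        for node in classify_path(root, masked):
            if modal_value(node, attr) == truth:               # modal value matches?
                counts[id(node)] += 1
    return counts
```

A preferred frontier for each variable is then the cut through the tree that maximizes these counts, and nodes below every variable's frontier can be pruned.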

  23. Concluding Remarks • There are three phases in searching the space of hierarchical clusterings: • Inexpensive generation of an initial clustering • Iterative optimization of the clustering • Retrospective simplification of the generated clustering • Experiments found that the new method, hierarchical redistribution optimization, works well

  24. Thanks! Questions?

  25. Final Exam Questions • The main idea in this paper is to construct clusterings that satisfy two conditions. • Name the conditions (p.5) • Name the three steps used to satisfy the conditions (p.5) • Describe the three iterative methods for clustering optimization (p.12-20) • A cluster is better when its relative CU score is a) big, b) small, c) equal to 0 (p.7) • Which sorting method is better? a) random sorting, b) similarity sorting (p.14)
