This study introduces a novel divide-and-merge technique combining top-down and bottom-up clustering methods to generate hierarchies and flat clusters effectively. The authors propose a spectral algorithm for the divide phase and dynamic programming for the merge phase. Experimental evaluations show promising results on real-world data.
A Divide-and-Merge Methodology for Clustering Advisor: Dr. Hsu Presenter: Hsin-Yi Huang Authors: David Cheng, Ravi Kannan, Santosh Vempala, and Grant Wang 2007.TODS.27
Outline • Motivation • Objective • Methodology • Divide Phase • Merge Phase • Application • Experiment • Conclusion • Comments
Motivation • Previous algorithms use either top-down or bottom-up methods to construct a hierarchical clustering. • Others produce a flat clustering using local search (e.g., k-means).
Objective • The authors present a divide-and-merge methodology that combines top-down and bottom-up techniques to create both a hierarchy (divide) and a flat clustering (merge).
Divide Phase • For the divide phase, the authors suggest an efficient spectral algorithm.
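To make the divide step concrete, here is a minimal sketch of one spectral bisection in plain Python. It assumes a symmetric, non-negative similarity matrix with positive row sums, uses power iteration to approximate the second eigenvector of the row-normalized matrix, and then picks the minimum-conductance cut along the sorted eigenvector. The function names, the iteration count, and the toy matrix are illustrative assumptions, not the authors' implementation; applying the bisection recursively to each side would yield the tree.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def second_eigenvector(sim: Matrix, iters: int = 200) -> List[float]:
    """Approximate the second eigenvector of D^-1 * sim by power iteration.

    Assumes sim is symmetric, non-negative, and every row sum is positive.
    """
    n = len(sim)
    root = [math.sqrt(sum(row)) for row in sim]        # sqrt of each degree
    norm = math.sqrt(sum(r * r for r in root))
    top = [r / norm for r in root]                      # known top eigenvector
    x = [float(i + 1) for i in range(n)]                # arbitrary start vector
    for _ in range(iters):
        # Project out the top eigenvector so the iteration finds the second one.
        dot = sum(a * b for a, b in zip(x, top))
        x = [a - dot * b for a, b in zip(x, top)]
        # Apply (N + I) / 2 with N = D^-1/2 * sim * D^-1/2; the shift keeps
        # all eigenvalues non-negative so the largest one dominates.
        y = []
        for i in range(n):
            s = sum(sim[i][j] * x[j] / (root[i] * root[j]) for j in range(n))
            y.append(0.5 * (s + x[i]))
        length = math.sqrt(sum(a * a for a in y))
        x = [a / length for a in y]
    # Map back to an eigenvector of the row-normalized matrix D^-1 * sim.
    return [a / r for a, r in zip(x, root)]


def spectral_bisect(sim: Matrix) -> Tuple[List[int], List[int]]:
    """Split the items in two using the minimum-conductance sweep cut."""
    n = len(sim)
    v = second_eigenvector(sim)
    order = sorted(range(n), key=lambda i: v[i])
    deg = [sum(row) for row in sim]
    total = sum(deg)
    best = None
    for t in range(1, n):                               # try all n - 1 prefix cuts
        inside, outside = order[:t], order[t:]
        cut = sum(sim[i][j] for i in inside for j in outside)
        vol = sum(deg[i] for i in inside)
        phi = cut / min(vol, total - vol)               # conductance of this cut
        if best is None or phi < best[0]:
            best = (phi, inside, outside)
    return sorted(best[1]), sorted(best[2])


# Toy data (hypothetical): items 0-2 and items 3-5 form two tight groups.
sim = [[1.0 if i == j else 0.9 if i // 3 == j // 3 else 0.1 for j in range(6)]
       for i in range(6)]
left, right = spectral_bisect(sim)
```

On the toy matrix the sweep separates the two groups; which group comes back as `left` depends on the sign of the eigenvector.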
Merge Phase • Given an objective function g to optimize, a dynamic program finds the best tree-respecting clustering, C_OPT-TREE, in the tree. [Figure: error versus the number of clusters]
Merge Phase (cont.) • K-means • Min-Diameter • Min-Sum • Correlation Clustering
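The merge-phase dynamic program can be sketched for the k-means objective as follows. The idea is that the best i-clustering respecting a subtree is either the node itself (when i = 1) or the best split of i between its two children. The `Node` class, the function names, and the toy tree are assumptions made for illustration; this is a sketch of the recurrence, not the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """A node of the divide-phase tree; leaves have no children."""
    points: List[Point]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def kmeans_cost(points: List[Point]) -> float:
    """Sum of squared distances from the points to their centroid."""
    n, dim = len(points), len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    return sum((p[d] - centroid[d]) ** 2 for p in points for d in range(dim))


def opt_tree(node: Node, k: int) -> Dict[int, Tuple[float, List[List[Point]]]]:
    """Return {i: (cost, clusters)} for every feasible i <= k.

    Each cluster is the point set of some node in the subtree, so the
    clustering respects the tree.
    """
    best = {1: (kmeans_cost(node.points), [node.points])}
    if node.left is None or node.right is None:
        return best
    left = opt_tree(node.left, k)
    right = opt_tree(node.right, k)
    for i in range(2, k + 1):
        for j in range(1, i):                 # j clusters left, i - j right
            if j in left and (i - j) in right:
                cost = left[j][0] + right[i - j][0]
                if i not in best or cost < best[i][0]:
                    best[i] = (cost, left[j][1] + right[i - j][1])
    return best


# Toy tree (hypothetical): four 1-D points split into {0, 1} and {10, 11}.
leaves = [Node([(x,)]) for x in (0.0, 1.0, 10.0, 11.0)]
low = Node(leaves[0].points + leaves[1].points, leaves[0], leaves[1])
high = Node(leaves[2].points + leaves[3].points, leaves[2], leaves[3])
root = Node(low.points + high.points, low, high)
table = opt_tree(root, 4)
```

Here `table[2]` holds the two-cluster solution `{0, 1}` and `{10, 11}`. The other objectives listed above would need their own base cost and, for objectives such as min-diameter, a different rule for combining the children's values.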
Application (cont.) • The authors implemented the methodology in a meta-search engine, and the web site is located at http://eigencluster.csail.mit.edu • The divide phase: spectral algorithm • The merge phase: relaxed correlation clustering, whose objective has two terms: the dissimilarity within a cluster, and the amount of similarity the clustering fails to capture.
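The two terms of the relaxed correlation clustering objective can be illustrated with a small sketch. It assumes pairwise similarities in [0, 1] and combines the terms with weights `alpha` and `beta`; the weight values, the function name, and the toy matrix are hypothetical and are not taken from the slides.

```python
from typing import List, Sequence


def relaxed_correlation_cost(sim: Sequence[Sequence[float]],
                             clusters: List[List[int]],
                             alpha: float = 0.5,
                             beta: float = 0.5) -> float:
    """Weighted sum of within-cluster dissimilarity and missed similarity.

    alpha and beta are assumed weights; sim[u][v] is assumed to lie in [0, 1].
    """
    total = 0.0
    for cluster in clusters:
        members = set(cluster)
        outside = [v for v in range(len(sim)) if v not in members]
        # Dissimilarity within the cluster, counted once per unordered pair.
        within = sum(1.0 - sim[u][v] for u in cluster for v in cluster if u < v)
        # Similarity to items outside the cluster, which the clustering misses.
        missed = sum(sim[u][v] for u in cluster for v in outside)
        total += alpha * within + beta * missed
    return total


# Toy data (hypothetical): items {0, 1} and {2, 3} are two perfect groups.
sim = [[1.0, 1.0, 0.0, 0.0],
       [1.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 1.0],
       [0.0, 0.0, 1.0, 1.0]]
cost_matching = relaxed_correlation_cost(sim, [[0, 1], [2, 3]])
cost_single = relaxed_correlation_cost(sim, [[0, 1, 2, 3]])
cost_singletons = relaxed_correlation_cost(sim, [[0], [1], [2], [3]])
```

Lumping everything together is penalized by the first term, and over-splitting is penalized by the second, so the number of clusters does not have to be fixed in advance.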
Experiment • F-Measure • Entropy • Accuracy • Confusion Matrix: the columns are the classes Cj and the rows are the clusters.
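As an illustration of how two of these measures can be read off a confusion matrix, the sketch below uses commonly used definitions: a class-weighted best F-score per class, and a cluster-size-weighted entropy with base-2 logarithms. Whether these match the exact definitions used in the experiments is an assumption, and accuracy is left out because it depends on how clusters are matched to classes. Rows are clusters and columns are classes, as on the slide.

```python
import math
from typing import Sequence

Confusion = Sequence[Sequence[int]]   # m[cluster][class] = number of items


def f_measure(m: Confusion) -> float:
    """Class-size-weighted average of the best F-score each class achieves."""
    n = sum(sum(row) for row in m)
    total = 0.0
    for c in range(len(m[0])):
        class_size = sum(row[c] for row in m)
        if class_size == 0:
            continue
        best = 0.0
        for row in m:
            hit = row[c]
            if hit == 0:
                continue
            precision = hit / sum(row)
            recall = hit / class_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += class_size / n * best
    return total


def entropy(m: Confusion) -> float:
    """Cluster-size-weighted average entropy of the class mix in each cluster."""
    n = sum(sum(row) for row in m)
    total = 0.0
    for row in m:
        size = sum(row)
        if size == 0:
            continue
        h = -sum((x / size) * math.log2(x / size) for x in row if x > 0)
        total += size / n * h
    return total


# Toy matrices (hypothetical): a perfect clustering and a single mixed cluster.
perfect = [[5, 0], [0, 5]]
mixed = [[5, 5]]
scores = (f_measure(perfect), entropy(perfect), f_measure(mixed), entropy(mixed))
```

A higher F-measure and a lower entropy both indicate that clusters line up more closely with the true classes.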
Experiment (cont.) • Divide Phase • Reuters • Merge Phase
Conclusion • The authors present a divide-and-merge methodology for clustering. • The divide phase uses an efficient and effective spectral algorithm. • The merge phase uses dynamic programming formulations that compute the optimal tree-respecting clustering for standard objective functions. • A thorough experimental evaluation shows that the technique is effective on real-world data.
Comments • Advantage • an interesting idea • Drawback • … • Application • clustering