Data Clustering: 50 years beyond K-means Presenter : Jiang-Shan Wang Authors : Anil K. Jain 國立雲林科技大學 National Yunlin University of Science and Technology PRL 2010
Outline • Motivation • Objective • Data clustering • User’s dilemma • K-means • Extensions of K-means • Trends in data clustering • Summary • Comments
Motivation • To provide a brief overview of clustering and to point out some of the emerging and useful research directions.
Objective Summarize well-known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions.
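Data clustering • Three main purposes: • Underlying structure: gain insight into data, generate hypotheses, detect anomalies, and identify salient features • Natural classification: identify the degree of similarity among forms or organisms • Compression: organize and summarize data through cluster prototypes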
K-means • Three user-specified parameters • Number of clusters K • Cluster initialization • Distance metric
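A minimal sketch of Lloyd-style K-means in plain Python, showing where each of the three user-specified parameters enters. The names `kmeans`, `seed`, and `dist` are illustrative assumptions; initializing with k randomly sampled data points is only one common choice, and the mean-based update is the exact minimizer only for squared Euclidean distance.

```python
import random


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, dist=squared_euclidean, max_iter=100):
    """Lloyd-style K-means (hypothetical helper) on a list of numeric tuples.

    k: number of clusters
    seed: controls the random initialization (k distinct input points)
    dist: distance used in the assignment step; the mean update below is
          only exact for squared Euclidean distance
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2, seed=0)
print(labels, centers)
```

Rerunning with different `seed` values is the simplest way to see the sensitivity to initialization listed on the slide; on less well-separated data, different seeds can converge to different local minima.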
Extensions of K-means • Fuzzy C-means • Bisecting K-means • X-means • K-medoids • Kernel K-means
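Of the extensions listed, Fuzzy C-means changes K-means most directly: hard labels become membership degrees. This is a sketch assuming Euclidean distance, random initial memberships, and a fuzzifier `m = 2`; `fuzzy_c_means` is a hypothetical helper, not a library API.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Fuzzy C-means sketch: every point gets a membership degree in every cluster.

    m > 1 is the fuzzifier (m close to 1 behaves almost like hard K-means).
    Returns (u, centers); u[i][j] is the membership of point i in cluster j,
    and each row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of all points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            # Floor distances so a point sitting on a center cannot divide by zero.
            d = [max(math.dist(p, cj), 1e-12) for cj in centers]
            new_u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                          for j in range(c)])
        shift = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if shift < tol:
            break
    return u, centers
```

Taking the largest membership in each row recovers a hard partition that can be compared directly with K-means output.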
User’s dilemma Representation
User’s dilemma Purpose of grouping
User’s dilemma Number of clusters
User’s dilemma Cluster validity
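One widely used internal validity index is the silhouette coefficient; the slide does not name a specific index, so this choice is an assumption. The helper below is a standalone sketch (not scikit-learn's `silhouette_score`) using Euclidean distance via `math.dist`; `mean_silhouette` is a hypothetical name.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette coefficient in [-1, 1]; higher means tighter,
    better-separated clusters. Requires at least two clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = labels[i]
        # Average distance from point i to the members of each cluster (excluding i).
        avg = {}
        for c in clusters:
            ds = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            avg[c] = sum(ds) / len(ds) if ds else None
        if avg[own] is None:
            continue  # singleton cluster: s(i) = 0 by convention
        a = avg[own]                                   # cohesion
        b = min(avg[c] for c in clusters if c != own)  # separation
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)
```

Comparing this score across candidate partitions (for example, different numbers of clusters) is one way to turn the validity question into a number, although no single index suits every kind of cluster structure.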
User’s dilemma Comparing clustering algorithms
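To compare the partitions that two algorithms produce on the same data, one option (assumed here; the slide does not specify a measure) is the Rand index: the fraction of point pairs on which both partitions agree. `rand_index` is an illustrative helper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree.

    A pair agrees if both partitions put it in the same cluster, or both put it
    in different clusters, so cluster names need not match between partitions.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two points: trivially identical
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)
```

The plain Rand index is not corrected for chance; the adjusted Rand index is usually preferred when the partitions being compared have different numbers of clusters.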
User’s dilemma • Admissibility analysis of clustering algorithms • Fisher and Van Ness’s criteria • Convex • Cluster proportion • Cluster omission • Monotone • Kleinberg’s criteria (no clustering function can satisfy all three) • Scale invariance • Richness • Consistency
Trends in data clustering Clustering ensembles
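Clustering ensembles combine several partitions of the same data. The sketch below follows the spirit of evidence accumulation: build a co-association matrix from the ensemble, then link pairs that co-occur in more than half of the partitions. The 0.5 threshold, the connected-components combiner, and the helper names are assumptions for illustration.

```python
def co_association(partitions):
    """C[i][j] = fraction of the ensemble's partitions that put i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[cnt / len(partitions) for cnt in row] for row in counts]


def consensus_partition(partitions, threshold=0.5):
    """Link pairs whose co-association exceeds `threshold` and return the
    connected components as the combined clustering. This is equivalent to
    cutting a single-link dendrogram of the co-association similarities."""
    C = co_association(partitions)
    n = len(C)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over the thresholded graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and C[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels
```

Because the co-association matrix only records whether two points share a cluster, ensemble members may use different numbers of clusters and arbitrary label names.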
Trends in data clustering Semi-supervised clustering
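Semi-supervised clustering often receives side information as pairwise must-link and cannot-link constraints. This sketch follows the idea of COP-KMeans (Wagstaff et al.): each point goes to the nearest center that violates no constraint with already-assigned points. Here the caller supplies the initial centers, and `cop_kmeans`, `violates`, and `_partner` are hypothetical names.

```python
import math


def _partner(i, pair):
    """Return the other index in `pair` if i is in it, else None."""
    a, b = pair
    if a == i:
        return b
    if b == i:
        return a
    return None


def violates(i, cluster, labels, must_link, cannot_link):
    """True if putting point i in `cluster` breaks a constraint with an assigned point."""
    for pair in must_link:
        other = _partner(i, pair)
        if other is not None and labels[other] is not None and labels[other] != cluster:
            return True
    for pair in cannot_link:
        other = _partner(i, pair)
        if other is not None and labels[other] == cluster:
            return True
    return False


def cop_kmeans(points, centers, must_link=(), cannot_link=(), max_iter=50):
    """Constrained K-means sketch; raises if some point has no feasible cluster."""
    centers = [list(c) for c in centers]
    k = len(centers)
    prev = None
    for _ in range(max_iter):
        labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try centers from nearest to farthest and keep the first feasible one.
            order = sorted(range(k), key=lambda j: math.dist(p, centers[j]))
            feasible = [j for j in order
                        if not violates(i, j, labels, must_link, cannot_link)]
            if not feasible:
                raise ValueError(f"no feasible cluster for point {i}")
            labels[i] = feasible[0]
        if labels == prev:
            break  # assignments stable: converged
        prev = labels
        # Ordinary K-means update step.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Because points are assigned greedily in order, this style of constrained assignment can fail on satisfiable constraint sets; the sketch simply reports such a failure rather than backtracking.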
Trends in data clustering • Large-scale clustering • Approaches studied: • Efficient nearest-neighbor (NN) search • Data summarization • Distributed computing • Incremental clustering • Sampling-based methods
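Incremental clustering processes data one point at a time, so the full data set never has to be held in memory. The following is a minimal sketch of sequential (MacQueen-style) K-means, where each arriving point updates only its nearest center through a running mean. Seeding with the first k points, the class name `OnlineKMeans`, and the method name `partial_fit` are assumptions for illustration.

```python
import math


class OnlineKMeans:
    """Sequential K-means sketch: one pass over the data, constant memory per cluster."""

    def __init__(self, k):
        self.k = k
        self.centers = []  # current center coordinates
        self.counts = []   # number of points absorbed by each center

    def partial_fit(self, point):
        """Absorb one point and return the index of the cluster it joined."""
        if len(self.centers) < self.k:
            # The first k points seed the centers.
            self.centers.append(list(point))
            self.counts.append(1)
            return len(self.centers) - 1
        j = min(range(self.k), key=lambda c: math.dist(point, self.centers[c]))
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]
        # Running-mean update: c <- c + (x - c) / n_j
        self.centers[j] = [c + eta * (x - c) for c, x in zip(self.centers[j], point)]
        return j
```

Because the result depends on the order in which points arrive, incremental methods like this trade some clustering quality for a single pass over the data.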
Trends in data clustering • Multi-way clustering • Heterogeneous data: rank data, dynamic data, graph data, relational data
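Summary • There needs to be a suite of benchmark data (with ground truth) for testing and evaluating clustering methods. • Clustering algorithms need a tighter integration with the needs of the application. • Most clustering methods are eventually posed as optimization problems, so computational cost becomes critical for large-scale data. • Stability or consistency: a good clustering principle should yield partitions that are stable under perturbations of the data. • Choose clustering principles according to how well they satisfy the stated axioms. • Develop semi-supervised clustering methods.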
Comments • Advantage • Many figures to aid understanding. • Drawback • … • Application • Clustering.