A new data clustering approach: Generalized cellular automata
• Presenter: Shao-Wei Cheng
• Authors: Dianxun Shuai, Yumin Dong, Qing Shuai
• IS 2007
Outline
• Motivation
• Objective
• Methodology
• Experiments
• Conclusion
• Personal Comments
Motivation
• Many clustering methods have the following limitations and shortcomings in enterprise computing:
  • The run time increases rapidly.
  • They need pairwise computation or pre-processing.
  • Clustering optimality is not guaranteed.
  • Clustering performance and quality are sensitive to cluster shape and cluster distribution.
  • They cannot suppress the effect of noise well.
  • Clustering performance is poor for high-dimensional data.
  • They have no learning ability.
  • Dynamic changes to the clustered data objects are usually not allowed while the algorithm is running.
Objectives
• The paper presents a novel GCA for self-organizing data clustering in enterprise computing, intended to overcome the limitations and shortcomings above.
• GCA stands for Generalized Cellular Automata.
• A GCA has the following components and features:
  • Cells
  • States
  • Neighborhood
  • Rule
  • Parallel computation
  • Local
  • Homogeneous
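Methodology
• Rule
  • The cells form an $N \times N$ cellular array.
  • $c_{ij}(t)$ is a cell.
  • $s_{ij}(t)$ is the state of the cell $c_{ij}(t)$; the empty state is denoted by Ø.
  • $p = 1$, $f(\Delta H)$, or $1 - f(\Delta H)$
  • $\Delta H$ is the harmony increment.
  • $\Gamma(t)$ is a matrix.
  • $w_{ij}$ is a weight coefficient.
  • $N_{ij} = \{ c_{i,j-1},\ c_{i,j+1},\ c_{i-1,j},\ c_{i+1,j} \}$ is the neighborhood of $c_{ij}$.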
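The four-cell neighborhood $N_{ij}$ and the parallel, local, homogeneous update style can be illustrated with a minimal Python sketch. Several things here are assumptions and do not come from the slides: `None` stands in for the empty state Ø, cells on the border simply have fewer neighbors, and `copy_first_neighbour` is a hypothetical placeholder rule, not the paper's rule (which depends on $p$, $f(\Delta H)$, $\Gamma(t)$ and $w_{ij}$ and is not fully specified on the slide).

```python
from typing import Callable, List, Optional, Tuple

State = Optional[int]   # None is an assumed stand-in for the empty state Ø
Grid = List[List[State]]


def neighbourhood(i: int, j: int, n: int) -> List[Tuple[int, int]]:
    """Coordinates of N_ij = {c(i,j-1), c(i,j+1), c(i-1,j), c(i+1,j)}.

    Assumption: positions outside the N x N array are dropped.
    """
    candidates = [(i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)]
    return [(x, y) for x, y in candidates if 0 <= x < n and 0 <= y < n]


def step(grid: Grid, rule: Callable[[State, List[State]], State]) -> Grid:
    """One synchronous update: every cell applies the same local rule,
    reading only the states from the previous time step."""
    n = len(grid)
    return [
        [
            rule(grid[i][j], [grid[x][y] for x, y in neighbourhood(i, j, n)])
            for j in range(n)
        ]
        for i in range(n)
    ]


def copy_first_neighbour(state: State, neighbours: List[State]) -> State:
    """Hypothetical placeholder rule: an empty cell takes the state of its
    first non-empty neighbour; an occupied cell keeps its state."""
    if state is not None:
        return state
    for s in neighbours:
        if s is not None:
            return s
    return None


if __name__ == "__main__":
    grid = [[None, 7], [None, None]]
    print(step(grid, copy_first_neighbour))
```

Because `step` builds the new array only from the old one, the order in which cells are visited does not matter, which is what makes the update suitable for parallel execution.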
Methodology
• $d(s_{ij}(t), s_{i'j'}(t))$ is the similarity between two cell states.
• If $s_{ij}(t) \neq$ Ø and $s_{i'j'}(t) \neq$ Ø, then $0 \le d(s_{ij}(t), s_{i'j'}(t)) \le 1$.
• Otherwise, $d(s_{ij}(t), s_{i'j'}(t)) = -1$.
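The case split in the similarity definition can be sketched as follows. The slide only fixes the range of $d$ and the value $-1$ for empty states; the concrete formula $1 / (1 + \text{Euclidean distance})$, the use of `None` for Ø, and the representation of a state as a tuple of coordinates are all assumptions made for illustration.

```python
import math
from typing import Optional, Sequence

EMPTY = None  # assumed stand-in for the empty state Ø


def similarity(a: Optional[Sequence[float]],
               b: Optional[Sequence[float]]) -> float:
    """Return -1 if either state is empty, otherwise a value in [0, 1].

    Assumed formula: 1 / (1 + Euclidean distance), which is 1 for
    identical states and approaches 0 as the states move apart.
    """
    if a is EMPTY or b is EMPTY:
        return -1.0
    return 1.0 / (1.0 + math.dist(a, b))


if __name__ == "__main__":
    print(similarity((0.0, 0.0), (0.0, 0.0)))
    print(similarity((0.0, 0.0), (3.0, 4.0)))
    print(similarity(EMPTY, (1.0, 2.0)))
```

Returning a value outside $[0, 1]$ for empty cells lets a rule tell "no data object here" apart from "very dissimilar data objects" with a single comparison.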
Experiments
• Number of clusters: 60.
• Data set size: 20,000.
• t = number of iterations.
• [Figure: clustering state at t = 0, 20, 40, 60, 80, and 200]
Experiments
• Number of clusters: 25.
• Average data objects per cluster: 500.
• Data set size: 12,500.
• Execution times of the GCAA: 1000.
Experiments
• PAM: Partitioning Around Medoids (a k-medoids method, comparable to K-means)
• CLARANS: Clustering Large Applications based on RANdomized Search
• CURE: Clustering Using REpresentatives
Conclusion
• The GCA approach has shown many advantages over other widely used clustering algorithms in the following respects:
  • Faster clustering speed.
  • The ability to handle and recognize clusters of varying shape and size.
  • Robustness to outliers.
  • The ability to learn.
  • Suitability for high-dimensional data sets.
Personal Comments
• Advantage
  • A novel data clustering approach.
• Drawback
  • …
• Application
  • Clustering in enterprise computing.