A new data clustering approach: Generalized cellular automata

Presenter: Shao-Wei Cheng • Authors: Dianxun Shuai, Yumin Dong, Qing Shuai • IS 2007

Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Personal Comments

Motivation • Many clustering methods have the following limitations and shortcomings in enterprise computing: • Run time increases rapidly. • Some pairwise computation or pre-processing is required. • Clustering optimality is not guaranteed. • Clustering performance and quality are sensitive to cluster shape and cluster distribution. • The effect of noise cannot be suppressed well. • Clustering performance on high-dimensional data is poor. • There is no learning ability. • Clustered data objects usually cannot change dynamically during algorithm execution. [Figure labels: "Local solution", "Start"]

Objectives • This paper proposes a novel GCA for self-organizing data clustering in enterprise computing that overcomes the limitations and shortcomings above. • GCA stands for Generalized Cellular Automata. • A GCA has the following components and features: • Cells • States • Neighborhood • Rule • Parallel computation • Locality • Homogeneity
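The listed features can be illustrated with a minimal Python sketch of one synchronous cellular-automaton step. This is not the paper's GCA rule: the majority-label rule, the use of None for an empty cell, and the helper names (neighbors, step, majority_rule) are hypothetical. What it shows is the three features: every cell is updated in parallel from the previous grid (parallel computation), each update reads only the cell's four neighbors (locality), and the same rule is applied to every cell (homogeneity).

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence

State = Optional[int]  # a cluster label, or None for an empty cell (assumption)
Grid = List[List[State]]
Rule = Callable[[State, Sequence[State]], State]


def neighbors(grid: Grid, i: int, j: int) -> List[State]:
    """States of the four von Neumann neighbours of (i, j); off-grid positions are skipped."""
    n = len(grid)
    out: List[State] = []
    for di, dj in ((0, -1), (0, 1), (-1, 0), (1, 0)):
        a, b = i + di, j + dj
        if 0 <= a < n and 0 <= b < n:
            out.append(grid[a][b])
    return out


def step(grid: Grid, rule: Rule) -> Grid:
    """One synchronous update: every cell applies the same rule to the *old* grid."""
    n = len(grid)
    return [[rule(grid[i][j], neighbors(grid, i, j)) for j in range(n)] for i in range(n)]


def majority_rule(own: State, nbrs: Sequence[State]) -> State:
    """Hypothetical rule: an occupied cell adopts the most common label around it.

    Empty cells stay empty, and a cell keeps its own label when it ties for most common.
    """
    if own is None:
        return None
    counts = Counter(s for s in [own, *nbrs] if s is not None)
    best = max(counts.values())
    if counts[own] == best:
        return own
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    grid = [[1, 1, 1], [1, 2, 1], [1, 1, 1]]
    print(step(grid, majority_rule))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

Because step builds a new grid instead of mutating the old one, the result does not depend on the order in which cells are visited, which is what makes the update safe to parallelize.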

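Methodology • Rule • An $N \times N$ cellular array • $s_{ij}(t)$ is the state of cell $c_{ij}$ at iteration $t$; the empty state is denoted by $\varnothing$. • $c_{ij}(t)$: a cell • $p$ takes the values $1$, $f(\Delta H)$, or $1 - f(\Delta H)$ • $\Delta H$: harmony increment • $\Gamma(t)$ is a matrix • $w_{ij}$ is a weight coefficient • $N_{ij} = \{c_{i,j-1},\ c_{i,j+1},\ c_{i-1,j},\ c_{i+1,j}\}$ (the four-cell von Neumann neighborhood)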
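A minimal sketch of how a harmony increment ΔH and the values 1, f(ΔH), and 1 − f(ΔH) might fit together for one candidate move of a data object. Only the neighborhood N_ij comes from the slide; everything else is an assumption: local harmony is taken as a single weight w times the summed similarity to occupied neighbors, f is a logistic function, and a move is accepted with p = 1 when ΔH > 0, otherwise with probability f(ΔH) (and rejected with 1 − f(ΔH)). The paper's actual definitions of H, f, Γ(t), and w_ij are not given here, and all function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Vector = Tuple[float, ...]
Grid = List[List[Optional[Vector]]]  # None stands for the empty state
Similarity = Callable[[Vector, Vector], float]


def neighborhood(n: int, i: int, j: int) -> List[Tuple[int, int]]:
    """N_ij = {c(i,j-1), c(i,j+1), c(i-1,j), c(i+1,j)}, clipped to the N x N array."""
    cand = [(i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)]
    return [(a, b) for a, b in cand if 0 <= a < n and 0 <= b < n]


def local_harmony(grid: Grid, i: int, j: int, obj: Vector, w: float, sim: Similarity) -> float:
    """Hypothetical harmony of placing obj at (i, j): w times summed similarity to occupied neighbours."""
    total = 0.0
    for a, b in neighborhood(len(grid), i, j):
        other = grid[a][b]
        if other is not None:  # empty neighbours contribute nothing in this sketch
            total += w * sim(obj, other)
    return total


def delta_harmony(grid: Grid, src: Tuple[int, int], dst: Tuple[int, int],
                  w: float, sim: Similarity) -> float:
    """Harmony increment of moving the object at src to the (assumed empty) cell dst."""
    obj = grid[src[0]][src[1]]
    if obj is None:
        raise ValueError("source cell is empty")
    grid[src[0]][src[1]] = None  # vacate src so the object is not counted as its own neighbour
    try:
        h_old = local_harmony(grid, src[0], src[1], obj, w, sim)
        h_new = local_harmony(grid, dst[0], dst[1], obj, w, sim)
    finally:
        grid[src[0]][src[1]] = obj  # restore the grid
    return h_new - h_old


def f(delta_h: float, temperature: float = 1.0) -> float:
    """Hypothetical f: numerically stable logistic function of delta_h / temperature."""
    x = delta_h / temperature
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def accept_move(delta_h: float, rng: random.Random) -> bool:
    """Assumed rule: p = 1 if delta_h > 0; otherwise move with p = f(delta_h), stay with 1 - f(delta_h)."""
    if delta_h > 0:
        return True
    return rng.random() < f(delta_h)


if __name__ == "__main__":
    def euclid_sim(x: Vector, y: Vector) -> float:
        return 1.0 / (1.0 + math.dist(x, y))  # hypothetical similarity in (0, 1]

    grid: Grid = [[(0.0, 0.0), None, None],
                  [None, None, None],
                  [None, None, (5.0, 5.0)]]
    dh = delta_harmony(grid, (2, 2), (0, 1), w=1.0, sim=euclid_sim)
    print(dh > 0, accept_move(dh, random.Random(0)))  # True True
```

The logistic f keeps acceptance probabilities strictly between 0 and 1 for non-positive ΔH, so a harmony-decreasing move is still occasionally taken; that is one common way to escape local solutions, not necessarily the paper's choice.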

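Methodology • $d(s_{ij}(t), s_{i'j'}(t))$ is the similarity between the states of cells $c_{ij}$ and $c_{i'j'}$. • If $s_{ij}(t) \neq \varnothing$ and $s_{i'j'}(t) \neq \varnothing$, then $0 \le d(s_{ij}(t), s_{i'j'}(t)) \le 1$. • Otherwise, $d(s_{ij}(t), s_{i'j'}(t)) = -1$.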
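The similarity convention above (a value in [0, 1] for two occupied cells, −1 if either is empty) can be sketched as follows. The slide does not give the formula for the occupied case, so the Gaussian kernel and its width sigma are hypothetical choices; only the range and the −1 convention come from the slide.

```python
import math
from typing import Optional, Sequence

State = Optional[Sequence[float]]  # None stands for the empty state


def d(s: State, t: State, sigma: float = 1.0) -> float:
    """Similarity between two cell states, following the slide's convention.

    Returns a value in [0, 1] when both cells hold data objects and -1 when
    either cell is empty. The Gaussian kernel for the non-empty case is a
    hypothetical choice, not the paper's formula.
    """
    if s is None or t is None:
        return -1.0
    dist = math.dist(s, t)  # Euclidean distance; raises ValueError on dimension mismatch
    return math.exp(-(dist ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    print(d((0.0, 0.0), (0.0, 0.0)))  # 1.0
    print(d((0.0, 0.0), None))        # -1.0
```

Since the exponent is never positive, the occupied case always falls in [0, 1], so the sentinel −1 can never be confused with a genuine similarity value.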

Methodology

Experiments • Number of clusters: 60. • Data set size: 20,000. • $t$ = number of iterations. [Figure: clustering snapshots at $t$ = 0, 20, 40, 60, 80, and 200]

Experiments • Number of clusters: 25. • Average data objects per cluster: 500. • Data set size: 12,500. • Execution times of the GCAA: 1000.

Experiments • Algorithms compared: • PAM, Partitioning Around Medoids (a k-medoids method) • CLARANS, Clustering Large Applications based on RANdomized Search • CURE, Clustering Using REpresentatives

Conclusion • The GCA approach has shown many advantages over other widely used clustering algorithms: • Faster clustering. • The ability to handle and recognize clusters of varying shape and size. • Robustness to outliers. • The ability to learn. • Suitability for high-dimensional data sets.

Personal Comments • Advantage • A novel data clustering approach. • Drawback • … • Application • Clustering in enterprise computing.
