Advisor : Dr. Hsu Graduate : Sheng-Hsuan Wang Author : Sitao Wu Tommy W.S. Chow

Clustering of the self-organizing map using a clustering validity index based on inter-cluster and intra-cluster density • Advisor: Dr. Hsu • Graduate: Sheng-Hsuan Wang • Authors: Sitao Wu, Tommy W.S. Chow • Department of Information Management • Pattern Recognition, Volume 37, Issue 2, February 2004, pp. 175-188.

Outline • Motivation • Objective • Introduction • SOM and Clustering • Clustering of the SOM using local clustering validity index and preprocessing of the SOM for filtering • Experimental results • Conclusions • Personal opinion • Review

Motivation • Classical clustering methods based on the SOM often fail to deliver satisfactory results, especially when clusters have arbitrary shapes.

Objective • Preprocessing techniques for filtering out noise and outliers. • A new two-level SOM-based clustering algorithm. • A clustering validity index based on inter-cluster and intra-cluster density.

Introduction • Self-Organizing Map (SOM). • Clustering algorithms. • Two-level SOM-based clustering. • In this paper, a new two-level algorithm for clustering of the SOM is proposed: • Level 1: SOM. • Level 2: Agglomerative hierarchical clustering.

SOM and Clustering • SOM and visualization. • Clustering algorithms. • Clustering of the SOM.

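SOM and visualization • Initial step: initialize the weight vectors of the neurons. • Training step: • Find the winner (the neuron whose weight vector is closest to the input vector) from (1). • Update the winner and its neighborhood according to (2).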
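The two training steps can be sketched in Python. This is a minimal sketch under assumptions not taken from the slides: a rectangular map, random initialization inside the bounding box of the data, Euclidean distance for the winner search (1), and a Gaussian neighborhood with a linearly decaying learning rate and width for the update (2). The names `find_winner`, `update`, `train_som` and all default parameter values are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def find_winner(weights, x):
    """Step (1): grid position (r, c) of the neuron whose weight vector is closest to x."""
    rows, cols = len(weights), len(weights[0])
    return min(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: sq_dist(x, weights[rc[0]][rc[1]]))


def update(weights, x, winner, lr, sigma):
    """Step (2): pull the winner and its grid neighbors toward x (Gaussian neighborhood)."""
    br, bc = winner
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
            for d in range(len(w)):
                w[d] += lr * h * (x[d] - w[d])


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM; returns weights[r][c] as lists of floats."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    # Initial step (assumed): random weights inside the bounding box of the data.
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    weights = [[[rng.uniform(lo[d], hi[d]) for d in range(dim)]
                for _ in range(cols)] for _ in range(rows)]
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = t / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.1)  # shrinking neighborhood width
            update(weights, x, find_winner(weights, x), lr, sigma)
            t += 1
    return weights
```

For example, `train_som([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 10.1]], 3, 3)` trains a 3x3 map, and `find_winner` then gives the neuron each input vector is labeled with.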

SOM and visualization

Clustering algorithms • The categories of clustering methods • Hierarchical • Partitioning • Density-based • Grid-based • Model-based

Clustering of the SOM • Agglomerative hierarchical clustering of the SOM. • Merging criteria: inter-cluster distance; inter-cluster and intra-cluster density. • Filtering noise and outliers before clustering of the SOM.
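The distance-based merging criterion can be illustrated with the usual linkage definitions of agglomerative hierarchical clustering. This is a sketch only: the slides do not say which four hierarchical variants the paper compares, and the function names are hypothetical. None of these criteria looks at density, which is what the density-based criterion adds.

```python
import math
from itertools import combinations, product


def single_link(a, b):
    """Distance of the closest pair of points, one from each cluster."""
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_link(a, b):
    """Distance of the farthest pair of points, one from each cluster."""
    return max(math.dist(p, q) for p, q in product(a, b))


def average_link(a, b):
    """Mean distance over all cross-cluster point pairs."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid_link(a, b):
    """Distance between the two cluster centroids."""
    ca = [sum(col) / len(a) for col in zip(*a)]
    cb = [sum(col) / len(b) for col in zip(*b)]
    return math.dist(ca, cb)


def closest_pair(clusters, linkage):
    """Indices (i, j) of the two clusters a distance-based HCA would merge next."""
    return min(combinations(range(len(clusters)), 2),
               key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
```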

Clustering of the SOM using local clustering validity index and preprocessing of the SOM for filtering • Global clustering validity index for different clustering algorithms. • Merging criterion using the CDbw. • Preprocessing before clustering of the SOM. • Clustering of the SOM. • The algorithm of clustering of the SOM.

Global clustering validity index for different clustering algorithms • Three types of criteria are used to assess cluster validity: • External criteria. • Internal criteria. • Relative criteria. • Goal: compact and well-separated clusters. • The newly proposed multi-representation clustering validity index (CDbw).

CDbw • Notation used in the clustering validity index: • A set of representative points represents the i th cluster. • stdev(i) is the standard deviation vector of the i th cluster. • The p th component of stdev(i) is defined by • The average standard deviation is given by
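The defining formulas for stdev(i) and the average standard deviation did not survive on this slide. The sketch below assumes one common CDbw formulation: the p th component of stdev(i) is the sample standard deviation (denominator n_i - 1) of the p th coordinate over the points of cluster i, and the average standard deviation is sqrt(sum_i ||stdev(i)||^2 / c) over c clusters. Both are assumptions to check against the paper; the function names are hypothetical.

```python
import math
import statistics


def stdev_vector(points):
    """stdev(i): per-dimension sample standard deviation of one cluster (needs >= 2 points)."""
    return [statistics.stdev(col) for col in zip(*points)]


def average_stdev(clusters):
    """Assumed average standard deviation: sqrt(sum_i ||stdev(i)||^2 / c)."""
    sq_norms = [sum(s * s for s in stdev_vector(pts)) for pts in clusters]
    return math.sqrt(sum(sq_norms) / len(clusters))
```

In this formulation the average standard deviation serves as the neighborhood radius when intra-cluster density is counted around the representative points.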

CDbw – Intra_den & Inter_den

CDbw • The definition of the clusters’ separation • The overall clustering validity index, which is called “Composing Density Between and With clusters”.
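The formulas for Intra_den, Inter_den, the separation and the overall index appear only as images on these slides. The sketch below follows one common formulation of CDbw and should be checked against the paper before use. Assumptions: the closest representatives of two clusters are simplified to the single closest pair; the density around a representative is the share of the cluster's points within the average standard deviation (normalized by cluster size); the density at the midpoint u_ij counts points of both clusters within (||stdev(i)|| + ||stdev(j)||)/2; and the representative points are supplied by the caller (for the SOM, the weight vectors of a cluster's neurons are one plausible choice). All names are hypothetical.

```python
import math
import statistics


def stdev_norm(points):
    """||stdev(i)||: norm of the per-dimension sample standard deviation vector."""
    return math.sqrt(sum(statistics.stdev(col) ** 2 for col in zip(*points)))


def count_within(points, center, radius):
    """Number of points whose Euclidean distance to center is at most radius."""
    return sum(1 for p in points if math.dist(p, center) <= radius)


def closest_reps(reps_i, reps_j):
    """Simplification: the single closest pair (v_i, v_j) of representatives."""
    return min(((a, b) for a in reps_i for b in reps_j), key=lambda ab: math.dist(*ab))


def cdbw(clusters, reps):
    """CDbw-style index = Intra_den * Sep (higher is better); needs >= 2 clusters."""
    c = len(clusters)
    norms = [stdev_norm(pts) for pts in clusters]
    avg_stdev = math.sqrt(sum(s * s for s in norms) / c)

    # Intra_den: share of cluster i's points within avg_stdev of each of its representatives.
    intra = sum(count_within(clusters[i], v, avg_stdev) / len(clusters[i])
                for i in range(c) for v in reps[i]) / (c * avg_stdev)

    inter = dist_sum = 0.0
    for i in range(c):
        for j in range(c):
            if i == j:
                continue
            a, b = closest_reps(reps[i], reps[j])
            d = math.dist(a, b)
            mid = [(x + y) / 2 for x, y in zip(a, b)]  # u_ij, midpoint of the closest reps
            both = clusters[i] + clusters[j]
            dens = count_within(both, mid, (norms[i] + norms[j]) / 2) / len(both)
            inter += d / (norms[i] + norms[j]) * dens  # Inter_den contribution
            dist_sum += d
    sep = dist_sum / (1.0 + inter)  # Sep: cluster distances penalized by inter-cluster density
    return intra * sep
```

Two clusters that are far apart with an empty region between them get a larger value than the same clusters moved close enough for points to gather around the midpoint of their closest representatives.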

Merging criterion using the CDbw • Find the pair of clusters with the minimal (locally computed) value of the CDbw, and merge them.
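A single merging step reduces to evaluating a local index on every candidate pair and merging the pair with the smallest value. The sketch below takes the local index as a parameter. To follow the slides, pass a function that computes the CDbw on just the two candidate clusters. `toy_separation` is a hypothetical stand-in (not the CDbw) used only so the example runs; like the CDbw, it is small when two clusters are poorly separated.

```python
import math
from itertools import combinations


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def toy_separation(a, b):
    """HYPOTHETICAL stand-in for a local CDbw on clusters a and b:
    centroid gap relative to the clusters' mean spread (small = poorly separated)."""
    ca, cb = centroid(a), centroid(b)
    spread = (sum(math.dist(p, ca) for p in a) / len(a)
              + sum(math.dist(p, cb) for p in b) / len(b))
    return math.dist(ca, cb) / (spread + 1e-12)


def best_pair_to_merge(clusters, local_index):
    """(i, j) with the minimal local index value among all cluster pairs."""
    return min(combinations(range(len(clusters)), 2),
               key=lambda ij: local_index(clusters[ij[0]], clusters[ij[1]]))


def merge_step(clusters, local_index):
    """One agglomerative step: replace the chosen pair by their union."""
    i, j = best_pair_to_merge(clusters, local_index)
    rest = [cl for k, cl in enumerate(clusters) if k not in (i, j)]
    return rest + [clusters[i] + clusters[j]]
```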

Preprocessing before clustering of the SOM • Step 1: Labeling (assign each input vector to its winning neuron). • Step 2: Compute the distance deviation dev_j = ||w_j - m_j|| for each neuron j, together with mean_dev and std_dev. If dev_j > mean_dev + std_dev, exclude neuron j. • Step 3: Compute the distances dis_j(x_i) = ||x_i - w_j|| for the input vectors of neuron j, together with mean_dis_j and std_dis_j. If dis_j(x_i) > mean_dis_j + std_dis_j, filter out the input vector x_i. • Step 4: Compute the number of data num_j belonging to the j th cluster, together with mean_num and std_num. If num_j < mean_num - std_num, exclude neuron j.
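The four preprocessing steps can be sketched as one filtering function. Assumptions not stated on the slide: m_j is taken to be the mean of the input vectors labeled with neuron j, std is the population standard deviation, neurons without data are skipped in Step 2, and input vectors of excluded neurons are simply dropped rather than relabeled. The name `preprocess` is hypothetical.

```python
import math
import statistics


def preprocess(data, weights):
    """Filter neurons and input vectors before clustering of the SOM.

    data: list of input vectors x_i; weights: flattened list of neuron weight vectors w_j.
    Returns (indices of kept neurons, indices of kept input vectors).
    """
    # Step 1 (labeling): each input vector goes to its winning (nearest) neuron.
    label = [min(range(len(weights)), key=lambda j: math.dist(x, weights[j])) for x in data]
    members = {j: [i for i, l in enumerate(label) if l == j] for j in range(len(weights))}
    neurons = set(members)

    # Step 2: dev_j = ||w_j - m_j||; ASSUMPTION: m_j is the mean of the inputs labeled j.
    dev = {}
    for j, idx in members.items():
        if idx:
            m_j = [sum(col) / len(idx) for col in zip(*(data[i] for i in idx))]
            dev[j] = math.dist(weights[j], m_j)
    if len(dev) > 1:
        vals = list(dev.values())
        cut = statistics.mean(vals) + statistics.pstdev(vals)  # population std (assumption)
        neurons -= {j for j, d in dev.items() if d > cut}

    # Step 3: drop input vectors with dis_j(x_i) > mean_dis_j + std_dis_j.
    kept = set()
    for j in neurons:
        idx = members[j]
        if not idx:
            continue
        dis = [math.dist(data[i], weights[j]) for i in idx]
        cut = statistics.mean(dis) + statistics.pstdev(dis)
        kept |= {i for i, d in zip(idx, dis) if d <= cut}

    # Step 4: exclude neurons with num_j < mean_num - std_num.
    num = {j: sum(1 for i in members[j] if i in kept) for j in neurons}
    if len(num) > 1:
        vals = list(num.values())
        cut = statistics.mean(vals) - statistics.pstdev(vals)
        neurons -= {j for j, n in num.items() if n < cut}

    # Inputs whose neuron was excluded are not returned (no relabeling in this sketch).
    return sorted(neurons), sorted(i for i in kept if label[i] in neurons)
```

With two tight groups of inputs, one stray point near the first group and one neuron sitting between the groups, the stray point is filtered in Step 3 and the in-between neuron is excluded in Step 2.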

Clustering of the SOM

The algorithm of clustering of the SOM • Step 1: Train the SOM on the input data. • Step 2: Preprocess the SOM before clustering (filtering). • Step 3: Cluster the SOM using agglomerative hierarchical clustering, with the CDbw as the merging criterion. • Step 4: Find the optimal partition of the input data according to the CDbw.
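Steps 3 and 4 can be sketched as a greedy loop: start from the clusters defined by the remaining neurons, repeatedly merge the pair chosen by a local criterion, score every intermediate partition with a global index, and return the best-scoring one. In the slides both indices are the CDbw. Here they are parameters, and the example swaps in two stand-ins so the block stays short: the distance between centroids as the local criterion, and the Calinski-Harabasz index as the global score. Neither stand-in is the authors' method, and the function names are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def centroid_gap(a, b):
    """Stand-in local merging criterion: distance between the two clusters' centroids."""
    return math.dist(centroid(a), centroid(b))


def calinski_harabasz(clusters):
    """Stand-in global index (Calinski-Harabasz); higher is better. Needs 2 <= k < n."""
    points = [p for c in clusters for p in c]
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(c) * math.dist(centroid(c), overall) ** 2 for c in clusters)
    within = sum(math.dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))


def agglomerate(clusters, local_index, global_index, min_clusters=2):
    """Merge greedily by local_index; return the partition with the best global_index."""
    current = [list(c) for c in clusters]
    history = []
    while True:
        history.append((global_index(current), current))
        if len(current) <= min_clusters:
            break
        i, j = min(combinations(range(len(current)), 2),
                   key=lambda ij: local_index(current[ij[0]], current[ij[1]]))
        current = [c for k, c in enumerate(current) if k not in (i, j)] + [current[i] + current[j]]
    return max(history, key=lambda h: h[0])[1]
```

Keeping every intermediate partition and choosing the best one afterwards mirrors Step 4, which selects the optimal partition by the validity index rather than stopping at a fixed number of clusters.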

Experimental results • 2D synthetic data set (200 points). • Contains some noise and outliers. • Compared: k-means, four hierarchical clustering algorithms (HCA), and the proposed algorithm. • Iris data set (150 points). • Three classes with 50 points each. • Compared: single-linkage and the proposed clustering algorithm. • 15D synthetic data set (1780 points). • Generated from 20 uniformly distributed random 15D points. • Wine data set (178 points). • Three classes with 59, 71, and 48 points, respectively.

2D synthetic data set

Iris data set

15D synthetic data set

Wine data set

Conclusions • In this paper, we propose a new SOM-based clustering algorithm. • The clustering validity index is used locally to determine which pair of clusters to merge. • Preprocessing steps filter out noise and outliers. • The experimental results are better than those of other clustering algorithms on the SOM.

Personal opinion • This method is more precise than the others. • We could also consider entropy or other indices besides distance and density.

Review • Self-Organizing Map, SOM. • Clustering methods. • Two-level Clustering. • Clustering Validity index – CDbw. • The preprocessing steps.
