A Robust and Efficient Clustering Algorithm based on Cohesion Self-Merging • Advisor: Dr. Hsu • Graduate: Sheng-Hsuan Wang • Authors: Cheng-Ru Lin, Ming-Syan Chen
Outline • Motivation • Objective • Introduction • Preliminaries • Cohesion-Based Self-Merging Algorithm • Performance Studies • Conclusion • Personal opinion
Motivation • Dissimilarity measures between two clusters are vulnerable to outliers, and removing the outliers precisely is itself a difficult task.
Objective • We propose a new similarity measurement, referred to as “cohesion”, to measure the inter-cluster distances.
Introduction • Hierarchical clustering algorithms. • Good clustering quality. • Partitional clustering algorithms. • Good execution time and space requirements. • Hybrid clustering algorithms. • Combine the features of partitional and hierarchical clustering methods.
Preliminaries • Hierarchical Clustering Algorithms. • Hierarchical Clustering Algorithm. • Single-link and Complete-link. • Algorithm CURE.
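The single-link and complete-link distances named on this slide can be illustrated with a minimal sketch. The function names and the sample points are assumptions made for illustration and do not come from the paper; the example also shows how one stray point changes the single-link distance, which is the kind of outlier sensitivity the Motivation slide refers to.

```python
import math

def single_link(c1, c2):
    """Single-link distance: the smallest pairwise distance between the clusters."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def complete_link(c1, c2):
    """Complete-link distance: the largest pairwise distance between the clusters."""
    return max(math.dist(p, q) for p in c1 for q in c2)

# Hypothetical sample data: two well-separated clusters on a line.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(10.0, 0.0), (11.0, 0.0)]

# The same cluster with one outlier lying close to cluster_b.
cluster_a_noisy = cluster_a + [(9.0, 0.0)]

clean_single = single_link(cluster_a, cluster_b)        # 9.0
noisy_single = single_link(cluster_a_noisy, cluster_b)  # 1.0
clean_complete = complete_link(cluster_a, cluster_b)    # 11.0

print(clean_single, noisy_single, clean_complete)
```

A single outlier is enough to shrink the single-link distance from 9.0 to 1.0, so the two clusters would look like strong merge candidates.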
Preliminaries • Partitional Clustering Algorithms. • The K-means algorithm. • Algorithm CLARA and CLARANS.
Preliminaries • Hybrid Clustering Algorithms. • Phase 1: Partition. • Phase 2: Merge. • Algorithm BIRCH.
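The two-phase structure of a hybrid method can be sketched as follows. This is only an illustration under stated assumptions: phase 1 uses a simple Lloyd-style k-means seeded with the first m points, and phase 2 merges the pair of subclusters with the closest centroids. Centroid distance is a stand-in chosen for brevity; it is not the cohesion measure the paper proposes, and the helper names `partition` and `merge` are hypothetical.

```python
import math

def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def partition(points, m, iterations=10):
    """Phase 1 (assumed): Lloyd-style k-means, seeded with the first m points."""
    centers = [points[i] for i in range(m)]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(m)]
        for p in points:
            nearest = min(range(m), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center when a group ends up empty.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return [g for g in groups if g]

def merge(subclusters, k):
    """Phase 2 (assumed): merge the closest pair by centroid distance until k remain."""
    clusters = [list(c) for c in subclusters]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: math.dist(centroid(clusters[ij[0]]),
                                                   centroid(clusters[ij[1]])))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

# Hypothetical data: two square groups of four points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
result = merge(partition(points, m=4), k=2)
print(result)
```

Partitioning first keeps the costly pairwise merge step small: it runs over m subclusters rather than over every data point.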
Cohesion-Based Self-Merging Algorithm • We propose a new similarity measurement, namely cohesion, based on the joinability of a data point to another cluster.
Cohesion-Based Self-Merging Algorithm • Definition 1: • Given a cluster Cl consisting of n data points, p1, p2, …, pn, the radius r of Cl is defined as
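The formula for Definition 1 is not preserved on this slide. As a minimal sketch, the code below assumes a commonly used definition of cluster radius, the root-mean-square distance of the points to the cluster centroid (the form used, for example, by BIRCH). The paper's exact formula may differ, and the function names are hypothetical.

```python
import math

def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def radius(points):
    """Assumed definition: root-mean-square distance from the points to the centroid."""
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))

# Hypothetical cluster: two points that are 2 units apart.
r = radius([(0.0, 0.0), (2.0, 0.0)])
print(r)
```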
Cohesion-Based Self-Merging Algorithm • Definition 2: • Given a data point p of a cluster Ci and another cluster Cj, the joinability of p to Cj is defined as
Cohesion-Based Self-Merging Algorithm • Definition 3: • The cohesion of two clusters Ci and Cj is defined as
Cohesion-Based Self-Merging Algorithm • Algorithm CSM • Input: • The input data set, n. • The number of subclusters, m. • The desired number of clusters, k. • Output: • The hierarchical structure of the k clusters.
Performance Studies • Experiment 1: Clustering Quality of Algorithm CSM.
Performance Studies • Experiment 2: Efficiency of Algorithm CSM.
Conclusion • Algorithm CSM not only resists outliers but also produces clustering results similar to those of algorithm CURE, with a much shorter execution time.
Personal Opinion • The paper includes examples that help us understand the method. • Cohesion: a good method to resist outliers. • Weakness: how should the number of subclusters, m, and the desired number of clusters, k, be chosen?