A Density-Based Cluster Validity Approach Using Multi-Representatives

Presenter: Lin, Shu-Han
Authors: Maria Halkidi, Michalis Vazirgiannis
Pattern Recognition Letters 29 (2008)

Outline
• Motivation
• Objective
• Methodology
• Experiments
• Conclusion
• Personal Comments

Motivation
• Clustering algorithms that rest on different assumptions often produce qualitatively different results. The resulting partitionings therefore need to be evaluated for validity against widely accepted criteria.
• The paper motivates cluster validity assessment with an example: K-Means produces different partitionings when it runs with different input parameter values (ipvs). For given ipvs the algorithm only seeks the best possible partitioning; nothing indicates that the resulting clusters are the ones that best fit the data.
Fig. The different partitionings defined by K-Means when it runs with different ipvs.

Objectives
• To define and evaluate a new validity index, CDbw (Composed Density between and within clusters), and a methodology that, given a data set S and a set of algorithms A = {alg_i}, enables:
  • (i) finding the input parameter values that lead each alg_i to its best possible clustering result, i.e., the best partitioning of the data set that alg_i can produce;
  • (ii) taking the results of (i) into account, finding the alg_i that returns the best partitioning of S among those defined by the considered algorithms.
Fig. Partitioning of DS3 into three clusters as defined by different clustering algorithms: (a) K-Means, (b) CURE and (c) DBSCAN, CLUTO.

Methodology
• A cluster validity approach based on density
Fig. Inter-cluster density definition.

Methodology (Cont.)
• (A) Cluster representative points definition
  • Closest representative points
  • Respective closest representative points
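The slide names the two notions without defining them. A minimal sketch, assuming that representatives are picked by farthest-first selection (as in CURE) and that a "respective closest" pair is a pair of representatives from two clusters that are each other's closest representative. The function names are hypothetical and the selection scheme is an assumption, not taken from the slides.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def center(cluster):
    """Component-wise mean of a list of points."""
    n = len(cluster)
    return tuple(sum(c) / n for c in zip(*cluster))

def representatives(cluster, r):
    """Pick up to r well-scattered points by farthest-first selection (assumed scheme)."""
    c = center(cluster)
    reps = [max(cluster, key=lambda p: dist(p, c))]
    while len(reps) < min(r, len(cluster)):
        # Next representative: the point farthest from those already chosen.
        reps.append(max((p for p in cluster if p not in reps),
                        key=lambda p: min(dist(p, v) for v in reps)))
    return reps

def closest_rep(v, reps):
    """The representative in reps that is closest to point v."""
    return min(reps, key=lambda u: dist(u, v))

def respective_closest_reps(reps_i, reps_j):
    """Pairs (v, u) in which v and u are each other's closest representative."""
    return [(v, u) for v in reps_i for u in reps_j
            if closest_rep(v, reps_j) == u and closest_rep(u, reps_i) == v]
```

For two unit squares placed side by side, with two representatives each, only the pair of representatives facing each other across the gap is mutual, so the list contains a single pair.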

Methodology (Cont.)
• (B) Clusters' separation in terms of density
  • Density between clusters
  • Inter-cluster density
  • Clusters' separation (Sep)
Stdev: the standard deviation, a measure of the dispersion of a set of values.
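The idea of density between clusters can be sketched as counting how many points fall near the midpoint between two representatives of different clusters. This is an illustration under stated assumptions, not the paper's exact definition: `spread` is one possible scalar standard deviation (root-mean-square distance to the mean), and the neighborhood radius is simply passed in by the caller. All names are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def spread(points):
    """Root-mean-square distance of points from their mean (a scalar stdev)."""
    n = len(points)
    c = tuple(sum(x) / n for x in zip(*points))
    return math.sqrt(sum(dist(p, c) ** 2 for p in points) / n)

def density_between(v, u, points, radius):
    """Fraction of points lying within radius of the midpoint of v and u."""
    mid = tuple((a + b) / 2 for a, b in zip(v, u))
    inside = sum(1 for p in points if dist(p, mid) <= radius)
    return inside / len(points)
```

With well-separated clusters the region around the midpoint is empty and the density is zero; points bridging the two clusters raise it, which signals weaker separation.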

Methodology (Cont.)
• (C) Clusters' compactness in terms of density
  • The compactness of a clustering
  • Relative intra-cluster density
s ∈ [0.1, 0.8] (user-defined)
Fig. (figure labels: Ci.center, Vij, stdev, s = 0.1, ..., 0.8)
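The figure labels suggest that each representative Vij is moved toward Ci.center by a factor s, and that density is measured around the moved point. A minimal sketch under that assumption: the radius is supplied by the caller, density is taken as the fraction of the cluster's points inside the neighborhood, and all function names are hypothetical.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def shrink(rep, center, s):
    """Move a representative toward the cluster center by factor s."""
    return tuple(v + s * (c - v) for v, c in zip(rep, center))

def density_around(reps, center, cluster, s, radius):
    """Average fraction of cluster points within radius of each shrunken representative."""
    total = 0.0
    for rep in reps:
        v = shrink(rep, center, s)
        total += sum(1 for p in cluster if dist(p, v) <= radius) / len(cluster)
    return total / len(reps)

# The user-defined range of shrinking factors from the slide: 0.1, 0.2, ..., 0.8.
S_VALUES = [round(0.1 * k, 1) for k in range(1, 9)]
```

Evaluating `density_around` for every s in `S_VALUES` gives a density profile from the cluster boundary toward its center.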

Methodology (Cont.)
• (D) Assessing the quality of a data clustering
  • Clusters' cohesion
  • Intra-density changes
  • Cohesion
  • Separation wrt compactness
• (E) CDbw definition
s ∈ [0.1, 0.8] (user-defined)
Fig. (figure labels: Ci.center, Vij, stdev, s = 0.1, ..., 0.8)
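The slide lists the components but not how they combine. The sketch below assumes the following composition, which should be checked against the paper: compactness is the mean intra-cluster density over the s values, intra-density change is the mean absolute difference between consecutive s values, Cohesion = Compactness / (1 + change), SC = Sep × Compactness, and CDbw = Cohesion × SC. The inputs (`intra_dens`, `sep`) are assumed to be computed elsewhere.

```python
def compactness(intra_dens):
    """Mean intra-cluster density over the considered shrinking factors."""
    return sum(intra_dens) / len(intra_dens)

def intra_change(intra_dens):
    """Mean absolute change of intra-cluster density between consecutive s values."""
    diffs = [abs(b - a) for a, b in zip(intra_dens, intra_dens[1:])]
    return sum(diffs) / len(diffs)

def cohesion(intra_dens):
    """Compactness penalized by how much the density varies inside clusters."""
    return compactness(intra_dens) / (1 + intra_change(intra_dens))

def cdbw(intra_dens, sep):
    """Assumed composition: CDbw = Cohesion * SC, with SC = Sep * Compactness."""
    sc = sep * compactness(intra_dens)  # separation with respect to compactness
    return cohesion(intra_dens) * sc
```

Under this composition a clustering scores high only when clusters are well separated, dense, and uniformly dense: a flat density profile leaves cohesion equal to compactness, while a profile that changes sharply with s lowers it.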

Experiments
• (A) Selecting the partitioning that best fits a data set
Fig. CDbw as a function of the number of clusters for DS1 (CLUTO).
Fig. CDbw vs. the number of clusters for Nd_Set, a 120-dimensional data set.

Experiments (Cont.)
• (B) Selecting the clustering algorithm
Table. Best partitioning found by CDbw for different clustering algorithms.
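Both selection steps, over input parameter values and over algorithms, reduce to scoring each candidate partitioning and keeping the one with the highest index value. A minimal sketch, assuming higher index values are better; `select_best` and its arguments are hypothetical names, and `score` stands in for any validity index such as CDbw.

```python
def select_best(candidates, score):
    """Return the name of the highest-scoring candidate and all scores.

    candidates: dict mapping a label (e.g. "K-Means, k=3") to a partitioning.
    score: callable that maps a partitioning to a validity index value.
    """
    scores = {name: score(partition) for name, partition in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

The same function covers step (i) when the candidates come from one algorithm run with different parameters, and step (ii) when they are the per-algorithm winners of step (i).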

Experiments (Cont.)
Fig. Synthetic data sets: (a) DS1 and partitioning of DS1 using CLUTO, (b) K-Means, and (c) CURE.
Fig. Partitioning of DS3 into three clusters as defined by different clustering algorithms: (a) K-Means, (b) CURE and (c) DBSCAN, CLUTO.

Experiments (Cont.)
Fig. Partitioning of DS2 into four clusters as defined by (a) K-Means, (b) CURE, (c) the CLUTO algorithm and (d) DBSCAN.
Table. Accuracy of the clusterings presented with respect to the expected partitioning of DS2.
• (C) Comparison to other cluster validity indices
Table. Best partitioning proposed by validity indices compared with CDbw*

Conclusions
• The paper defines a new validity index, CDbw, and a methodology for finding, among the clusterings defined by one algorithm or by different clustering algorithms, the one that best fits the data.
• It achieves this by considering multiple representative points per cluster. Contrary to other validity indices, it includes a cohesion criterion that estimates density changes within clusters.

Personal Comments
• Advantage
  • Accuracy
  • Data independent
  • Algorithm independent
• Drawback
  • …
• Application
  • …
