A modified version of the K-means algorithm with a distance based on cluster symmetry

A modified version of the K-means algorithm with a distance based on cluster symmetry Advisor ：Dr. Hsu Reporter：Chun Kai Chen Author：Mu-Chun Su and Chien-Hsing Chou IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2001

Outline • Motivation • Objective • Introduction • The Point Symmetry Distance • Experimental Results • Conclusions • Personal Opinion

Motivation • Since clusters can be of arbitrary shapes and sizes, the Minkowski metrics seem not a good choice for situations where no a priori information about the geometric characteristics of the data set to be clustered exists

Objective • Therefore, we have to find another more flexible measure • One of the basic features of shapes and objects is symmetry • Propose a nonmetric measure based on the concept of point symmetry

10 9 8 7 6 5 Update the cluster means 4 3 2 1 0 0 1 2 3 4 5 6 7 8 9 10 reassign reassign Update the cluster means K-means Partitional Clustering

10 9 8 7 6 5 Update the cluster means 4 3 2 1 0 0 1 2 3 4 5 6 7 8 9 10 reassign reassign reassign Symmetry-based version of the K-means algorithm Update the cluster means Fine-Tuning Coarse-Tuning

Introduction(1/4) • Most of the conventional clustering methods assume that patterns having similar locations or constant density create a single cluster • Location or density becomes a characteristic property of a cluster

Introduction(2/4) • Mathematically identify clusters in a data set • usually necessary to first define a measure of similarity or proximity which will establish a rule for assigning patterns to the domain of a particular cluster center • the most popular similarity measure • the Euclidean distance

Introduction(3/4) • Euclidean distance as a measure of similarity • hyperspherical-shaped clusters of equal size are usually detected • Mahalanobis distance • take care of hyperellipsoidal-shaped clusters, is one of the popular choices

Introduction(4/4) • The major difficulties using the Mahalanobis distance • have to recompute the inverse of the sample covariance matrix every time a pattern changes its cluster domain, which is computationally expensive • In fact, not only similarity measures, but also the number of clusters which cannot always be defined a priori will influence the clustering results • In this paper • we focus on the selection of similarity measures

Symmetry • Symmetry is so common in the abstract and in nature • reasonable to assume some kinds of symmetry exit in the structures of clusters • immediate problem is how to find a metric to measure symmetry

The Point Symmetry Distance • The point symmetry distance is defined as follows: Given N patterns, xi; i=1,…,N, and a reference vector c (e.g., a cluster centroid) • the denominator term is used to normalize • If the right hand term of (2) is minimized when xi = xj*, then the pattern xj* is denoted as the symmetrical pattern relative to xj with respect to c

Example of The Point Symmetry Distance

Symmetry-based version of the K-means algorithm(1/3) • Step 1: Initialization • randomly choose K data points from the data set to initialize K cluster centroids, c1, c2 . . . ; cK. • Step 2: Coarse-Tuning • use the ordinary K-means algorithm with the Euclidean distance to update the K cluster centroids • after the K cluster centroids converge or some kind of terminating criteria is satisfied

Symmetry-based version of the K-means algorithm(2/3) • Step 3: Fine-Tuning • For pattern x, find the cluster centroid nearest it in the symmetrical sense • If the point symmetry distance is smaller than a prespecified parameter θ, then assign the data point x to the k*th cluster • ds(x,ck) is the point symmetry distance • Otherwise, the data point is assigned to the cluster centroid k using the following criterion: • d(x,ck) is the Euclidean distance

Symmetry-based version of the K-means algorithm(3/3) • Step 4: Updating • Compute the new centroids of the K clusters • where Sk(t) is the set whose elements are the patterns assigned to the kth cluster at time t and Nk is the number of elements in Sk. • Step 5: Continuation • If no patterns change categories or the number of iterations has reached a prespecified maximum number, then stop. Otherwise, go to Step 3.

Experimental Results • Used four examples to compare the SBKM algorithm and the SBCL algorithm • In addition, we use one example to show how to use the point symmetry distance in face detections

Mixture of Spherical and Ellipsoidal clusters ordinary K-means SBCL SBKM

Ring-shaped clusters ordinary K-means SBCL SBKM

Linear structures ordinary K-means SBCL SBKM

Combination of ring-shaped, compact,and linear clusters ordinary K-means SBKM SBCL

Detecting a face in a complex background

Conclusion • Both use the point symmetry distance as the dissimilarity measure, the SBKM algorithm outperformed the SBCL algorithm in many cases • The proposed SBKM algorithm can be used to group a given data set into a set of clusters of different geometrical structures • Besides, we can also apply the point symmetry distance to detect human faces. The experimental results are encouraging

Personal Opinion • Advantage • Idea, innovate • Application • clustering • Future Work • Adopt symmetry distance on SOM

A modified version of the K-means algorithm with a distance based on cluster symmetry

A modified version of the K-means algorithm with a distance based on cluster symmetry

Presentation Transcript

K-means algorithm

OpenFOAM on a GPU-based Heterogeneous Cluster

A network-based quark-cluster algorithm

K-means algorithm

You will be creating a PowerPoint on Symmetry with the focus on Rotational Symmetry

CompAPO : A complete version of the APO Algorithm

Modified global k-means algorithm for minimum sum-of-squares clustering problems

A Modified Backpropagation Learning Algorithm With Added Emotional Coefficients

Fast modified global k-means algorithm for incremental cluster construction

K-means algorithm

The Antnet Routing Algorithm - A Modified Version

A Semantic Match Algorithm for Web Services Based on Improved Semantic Distance

A Genetic Algorithm Approach to K -Means Clustering

Rek-means A k-means Based Clustering Algorithm

K-means algorithm

Steps of K-means algorithm

A new segmentation algorithm for medical volume image based on K-means clustering

K-Means Clustering Algorithm - Cluster Analysis | Machine Learning Algorithm | Edureka

A Comparison of ABK Means Algorithm with Traditional Algorithms

K-means Clustering Algorithm with Matlab Source code

CompAPO : A complete version of the APO Algorithm

Categorical K-means Clustering Algorithm