240 likes | 432 Views
A modified version of the K-means algorithm with a distance based on cluster symmetry. Advisor : Dr. Hsu Reporter : Chun Kai Chen Author : Mu-Chun Su and Chien-Hsing Chou. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2001. Outline. Motivation Objective Introduction
E N D
A modified version of the K-means algorithm with a distance based on cluster symmetry Advisor :Dr. Hsu Reporter:Chun Kai Chen Author:Mu-Chun Su and Chien-Hsing Chou IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2001
Outline • Motivation • Objective • Introduction • The Point Symmetry Distance • Experimental Results • Conclusions • Personal Opinion
Motivation • Since clusters can be of arbitrary shapes and sizes, the Minkowski metrics seem not a good choice for situations where no a priori information about the geometric characteristics of the data set to be clustered exists
Objective • Therefore, we have to find another more flexible measure • One of the basic features of shapes and objects is symmetry • Propose a nonmetric measure based on the concept of point symmetry
10 9 8 7 6 5 Update the cluster means 4 3 2 1 0 0 1 2 3 4 5 6 7 8 9 10 reassign reassign Update the cluster means K-means Partitional Clustering
10 9 8 7 6 5 Update the cluster means 4 3 2 1 0 0 1 2 3 4 5 6 7 8 9 10 reassign reassign reassign Symmetry-based version of the K-means algorithm Update the cluster means Fine-Tuning Coarse-Tuning
Introduction(1/4) • Most of the conventional clustering methods assume that patterns having similar locations or constant density create a single cluster • Location or density becomes a characteristic property of a cluster
Introduction(2/4) • Mathematically identify clusters in a data set • usually necessary to first define a measure of similarity or proximity which will establish a rule for assigning patterns to the domain of a particular cluster center • the most popular similarity measure • the Euclidean distance
Introduction(3/4) • Euclidean distance as a measure of similarity • hyperspherical-shaped clusters of equal size are usually detected • Mahalanobis distance • take care of hyperellipsoidal-shaped clusters, is one of the popular choices
Introduction(4/4) • The major difficulties using the Mahalanobis distance • have to recompute the inverse of the sample covariance matrix every time a pattern changes its cluster domain, which is computationally expensive • In fact, not only similarity measures, but also the number of clusters which cannot always be defined a priori will influence the clustering results • In this paper • we focus on the selection of similarity measures
Symmetry • Symmetry is so common in the abstract and in nature • reasonable to assume some kinds of symmetry exit in the structures of clusters • immediate problem is how to find a metric to measure symmetry
The Point Symmetry Distance • The point symmetry distance is defined as follows: Given N patterns, xi; i=1,…,N, and a reference vector c (e.g., a cluster centroid) • the denominator term is used to normalize • If the right hand term of (2) is minimized when xi = xj*, then the pattern xj* is denoted as the symmetrical pattern relative to xj with respect to c
Symmetry-based version of the K-means algorithm(1/3) • Step 1: Initialization • randomly choose K data points from the data set to initialize K cluster centroids, c1, c2 . . . ; cK. • Step 2: Coarse-Tuning • use the ordinary K-means algorithm with the Euclidean distance to update the K cluster centroids • after the K cluster centroids converge or some kind of terminating criteria is satisfied
Symmetry-based version of the K-means algorithm(2/3) • Step 3: Fine-Tuning • For pattern x, find the cluster centroid nearest it in the symmetrical sense • If the point symmetry distance is smaller than a prespecified parameter θ, then assign the data point x to the k*th cluster • ds(x,ck) is the point symmetry distance • Otherwise, the data point is assigned to the cluster centroid k using the following criterion: • d(x,ck) is the Euclidean distance
Symmetry-based version of the K-means algorithm(3/3) • Step 4: Updating • Compute the new centroids of the K clusters • where Sk(t) is the set whose elements are the patterns assigned to the kth cluster at time t and Nk is the number of elements in Sk. • Step 5: Continuation • If no patterns change categories or the number of iterations has reached a prespecified maximum number, then stop. Otherwise, go to Step 3.
Experimental Results • Used four examples to compare the SBKM algorithm and the SBCL algorithm • In addition, we use one example to show how to use the point symmetry distance in face detections
Mixture of Spherical and Ellipsoidal clusters ordinary K-means SBCL SBKM
Ring-shaped clusters ordinary K-means SBCL SBKM
Linear structures ordinary K-means SBCL SBKM
Combination of ring-shaped, compact,and linear clusters ordinary K-means SBKM SBCL
Conclusion • Both use the point symmetry distance as the dissimilarity measure, the SBKM algorithm outperformed the SBCL algorithm in many cases • The proposed SBKM algorithm can be used to group a given data set into a set of clusters of different geometrical structures • Besides, we can also apply the point symmetry distance to detect human faces. The experimental results are encouraging
Personal Opinion • Advantage • Idea, innovate • Application • clustering • Future Work • Adopt symmetry distance on SOM