Nanjing University of Science & Technology. Pattern Recognition: Statistical and Neural. Lonnie C. Ludeman Lecture 27 Nov 9, 2005. Lecture 27 Topics. K-Means Clustering Algorithm Details K-Means Step by Step Example ISODATA Algorithm -Overview
Lecture 27 Topics
1. K-Means Clustering Algorithm Details
2. K-Means Step by Step Example
3. ISODATA Algorithm: Overview
4. Agglomerative Hierarchical Clustering Algorithm Description
K-Means Clustering Algorithm: Basic Procedure
• Randomly select K cluster centers from the pattern space.
• Distribute the set of patterns to the cluster centers using minimum distance.
• Compute a new cluster center for each cluster.
• Continue this process until the cluster centers do not change.
Step 1: Initialization. Choose K initial cluster centers M1(1), M2(1), ... , MK(1).
• Method 1: First K samples
• Method 2: K data samples selected randomly
• Method 3: K random vectors
Set m = 1 and go to Step 2.
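The three initialization methods of Step 1 might be sketched as follows. The function names and the sample data are made up for illustration, and drawing Method 3's random vectors uniformly from the bounding box of the data is an assumption, since the slide does not say how the random vectors are generated.

```python
import random

def init_first_k(samples, k):
    """Method 1: use the first K samples as the initial centers."""
    return [list(x) for x in samples[:k]]

def init_random_samples(samples, k, rng):
    """Method 2: use K distinct data samples chosen at random."""
    return [list(x) for x in rng.sample(samples, k)]

def init_random_vectors(samples, k, rng):
    """Method 3: K random vectors (assumed: uniform in the data's bounding box)."""
    dims = range(len(samples[0]))
    lo = [min(x[d] for x in samples) for d in dims]
    hi = [max(x[d] for x in samples) for d in dims]
    return [[rng.uniform(lo[d], hi[d]) for d in dims] for _ in range(k)]

# Made-up 2-D pattern vectors, not the data set from the lecture example.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
rng = random.Random(0)  # seeded so the run is repeatable

centers_1 = init_first_k(samples, 2)
centers_2 = init_random_samples(samples, 2, rng)
centers_3 = init_random_vectors(samples, 2, rng)
```

Method 1 is deterministic, which makes hand-worked examples easy to follow; Methods 2 and 3 depend on the random seed.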
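Step 2: Determine New Clusters. Using the cluster centers, distribute the pattern vectors by minimum distance.
• Method 1: Use Euclidean distance
• Method 2: Use other distance measures
Assign sample xj to cluster Clk(m) if
$$\lVert x_j - M_k(m) \rVert \le \lVert x_j - M_i(m) \rVert \quad \text{for all } i = 1, 2, \ldots, K$$
Go to Step 3.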
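A minimal sketch of the Step 2 minimum-distance assignment, using Euclidean distance (Method 1). The function name and data are illustrative. For repeatability this sketch breaks ties by taking the lowest cluster index, which is an assumption of the sketch rather than a rule from the lecture.

```python
import math

def assign_to_clusters(samples, centers):
    """Return, for each sample, the index k of its nearest center."""
    labels = []
    for x in samples:
        distances = [math.dist(x, m) for m in centers]
        # list.index returns the first minimum, so ties go to the lowest index.
        labels.append(distances.index(min(distances)))
    return labels

# Made-up pattern vectors and centers, for illustration only.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_to_clusters(samples, centers)
```

Swapping `math.dist` for another distance function gives Method 2.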
Step 3: Compute New Cluster Centers. Using the new cluster assignment Clk(m), k = 1, 2, ... , K, compute the new cluster centers Mk(m+1), k = 1, 2, ... , K, using
$$M_k(m+1) = \frac{1}{N_k} \sum_{x_j \in Cl_k(m)} x_j$$
where Nk, k = 1, 2, ... , K, is the number of pattern vectors in Clk(m). Go to Step 4.
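The Step 3 update takes each new center to be the mean of the pattern vectors assigned to that cluster. A small sketch follows, with illustrative names and data. The lecture does not say what to do when a cluster ends up empty, so raising an error in that case is an assumption of this sketch.

```python
def update_centers(samples, labels, k):
    """Return the mean vector of the samples assigned to each of the k clusters."""
    dims = len(samples[0])
    centers = []
    for cluster in range(k):
        members = [x for x, lab in zip(samples, labels) if lab == cluster]
        if not members:
            # Assumption: the source does not cover empty clusters.
            raise ValueError(f"cluster {cluster} has no samples")
        n_k = len(members)
        centers.append([sum(x[d] for x in members) / n_k for d in range(dims)])
    return centers

# Made-up pattern vectors with a given assignment, for illustration only.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = [0, 0, 0, 1, 1]
new_centers = update_centers(samples, labels, 2)
```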
Step 4: Check for Convergence. Using the cluster centers from Step 3, check for convergence. Convergence occurs if the means do not change, that is, if Mk(m+1) = Mk(m) for every k. If convergence occurs, clustering is complete and the results are given. If there is no convergence, go to Step 5.
Step 5: Check for Maximum Number of Iterations. Define MAXIT as the maximum acceptable number of iterations. If m = MAXIT, then display "no convergence" and stop. If m < MAXIT, then set m = m + 1 (increment m) and return to Step 2.
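Steps 2 through 5 can be combined into one loop, sketched below under stated assumptions: Euclidean distance, ties resolved by lowest cluster index, and an empty cluster keeping its previous center. The function name `k_means`, its return values, and the data are illustrative only.

```python
import math

def k_means(samples, initial_centers, maxit=100):
    """Run Steps 2-5; return (labels, centers, iterations, converged)."""
    centers = [list(c) for c in initial_centers]
    labels = []
    for m in range(1, maxit + 1):
        # Step 2: assign each sample to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
            for x in samples
        ]
        # Step 3: recompute each center as the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [x for x, lab in zip(samples, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[k])  # assumption: keep the old center
        # Step 4: convergence when the means do not change.
        if new_centers == centers:
            return labels, centers, m, True
        centers = new_centers
    # Step 5: MAXIT reached without convergence.
    return labels, centers, maxit, False

# Made-up data; the initial centers are the first K = 2 samples (Method 1).
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels, centers, iterations, converged = k_means(samples, samples[:2], maxit=20)
```

The exact equality test in Step 4 works here because unchanged assignments reproduce the same means bit for bit; a tolerance-based test is a common alternative.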
Example: K-Means clustering algorithm. Given the following set of pattern vectors: [Figure: set of pattern vectors]
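(a) Solution: 2-class case. Initial cluster centers. [Figure: plot of the data points in the given set of samples]
Initial cluster centers: distances from all samples to the cluster centers. [Table: distances and resulting labels, as extracted: Cl2, Cl1, Cl2, Cl1, Cl2, Cl2, Cl2] With a tie, select randomly. This gives the first cluster assignment.
[Figure: plot of the data points in the given set of samples, marking the points closest to x1 and the points closest to x2]
Using the first cluster assignment, compute the new cluster centers.
New cluster centers. [Figure: plot of the data points in the given set of samples]
Iteration 2: distances from all samples to the cluster centers. [Table: distances and resulting labels, as extracted: Cl2, Cl2, Cl1, Cl1, Cl2, Cl2, Cl1] This gives the second cluster assignment.
[Figure: plot of the data points in the given set of samples, showing the old cluster centers, the new clusters, and the cluster centers M1(2) and M2(2)]
[Figure: plot of the data points in the given set of samples, showing the new clusters and the cluster centers M1(3) and M2(3)]
Iteration 3: distances from all samples to the cluster centers. [Table: distances and resulting labels, as extracted: Cl1, Cl1, Cl1, Cl2, Cl2, Cl2, Cl2] Compute the new cluster centers.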
(b) Solution: 3-class case. Select the initial cluster centers. The first cluster assignment uses the distances from the pattern vectors to the initial cluster centers.
Compute the new cluster centers. The second cluster assignment uses the distances from the pattern vectors to the new cluster centers.
At the next step the cluster centers do not change, so the algorithm has converged and the final cluster assignment is:
[Figure: final 3-class clusters Cl1, Cl2, and Cl3 with the final cluster centers, plotted over the data points in the given set of samples]
ISODATA Algorithm (Iterative Self-Organizing Data Analysis Technique A). Performs clustering of unclassified quantitative data with an unknown number of clusters. It is similar to K-Means but has the ability to merge and split clusters, which gives flexibility in the number of clusters.
ISODATA: parameters that need to be specified. [Parameter list not recoverable; the one surviving fragment reads "merged at each step"] ISODATA requires more specified information than the K-Means algorithm.
ISODATA Algorithm Final Clustering
Hierarchical Clustering
• Approach 1, Agglomerative: combines groups at each level
• Approach 2, Divisive: splits groups at each level
Only agglomerative hierarchical clustering is presented here, as it is the most used.
Agglomerative Hierarchical Clustering. Consider a set S of patterns to be clustered: S = { x1, x2, ... , xk, ... , xN }. Define Level N by S1(N) = { x1 }, S2(N) = { x2 }, ... , SN(N) = { xN }. The clusters at Level N are the individual pattern vectors.
Define Level N-1 to be the N-1 clusters formed by merging two of the Level N clusters, by the following process: compute the distances between all the clusters at Level N and merge the two with the smallest distance (resolve ties randomly). This merging gives the Level N-1 clusters S1(N-1), S2(N-1), ... , SN-1(N-1).
The process of merging two clusters at each step is performed sequentially until Level 1 is reached. Level 1 is a single cluster containing all samples: S1(1) = { x1, x2, ... , xk, ... , xN }. Thus hierarchical clustering provides cluster assignments for every number of clusters from N down to 1.
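The level-by-level merging can be sketched as follows. The lecture does not say how the distance between two clusters is measured, so the single-link distance used here (smallest pairwise Euclidean distance) is an assumed choice. Ties are resolved by taking the first closest pair found rather than randomly, so that the run is repeatable. All names and data are illustrative.

```python
import math

def single_link(a, b):
    """Assumed cluster distance: smallest pairwise Euclidean distance."""
    return min(math.dist(x, y) for x in a for y in b)

def agglomerate(samples, cluster_distance=single_link):
    """Return {level: clusters}, where level is the number of clusters."""
    clusters = [[x] for x in samples]  # Level N: one cluster per pattern
    levels = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest distance.
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
        levels[len(clusters)] = [list(c) for c in clusters]
    return levels

# Made-up pattern vectors, for illustration only.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
levels = agglomerate(samples)
two_clusters = sorted(sorted(c) for c in levels[2])
```

Here `levels[k]` holds the clustering with k clusters, so one run provides an assignment for every number of clusters from N down to 1.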
Definition: A dendrogram is a tree-like structure that illustrates the merging of clusters at each step of the hierarchical approach. A typical dendrogram appears on the next slide.
Summary: Lecture 27
1. Presented the details of the K-Means clustering algorithm
2. Showed a step-by-step example of clustering using the K-Means algorithm
3. Briefly discussed the ISODATA algorithm
4. Introduced the agglomerative hierarchical clustering algorithm