KI2 - 7 Clustering Algorithms Johan Everts Kunstmatige Intelligentie / RuG
What is Clustering? Find K clusters (or a classification consisting of K clusters) so that the objects within one cluster are similar to each other, whereas objects from different clusters are dissimilar. (Bacher 1996)
The Goals of Clustering • Determine the intrinsic grouping in a set of unlabeled data. • What constitutes a good clustering? • All clustering algorithms will produce clusters, regardless of whether the data actually contains them. • There is no gold standard; it depends on the goal: • data reduction • “natural” clusters • “useful” clusters • outlier detection
Hierarchical Clustering Agglomerative clustering treats each data point as a singleton cluster and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around: it starts with all points in one cluster and recursively splits clusters until each point is a singleton.
Agglomerative Clustering Single link In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.
Agglomerative Clustering Complete link In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.
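Both linkage criteria can be tried on a toy data set with SciPy's hierarchy module; the data points and the choice of two clusters below are illustrative assumptions, not from the slides:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated groups of 2-D points (illustrative data)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# Single link: merge the two clusters whose closest members are nearest
single = fcluster(linkage(X, method="single"), t=2, criterion="maxclust")

# Complete link: merge the two clusters whose merger has the smallest diameter
complete = fcluster(linkage(X, method="complete"), t=2, criterion="maxclust")

# Both criteria separate the two groups on this easy data set
print(single, complete)
```

On well-separated data like this, single and complete link agree; they differ mainly on elongated or chained clusters, where single link tends to chain points together.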
K-Means • Step 0: Start with a random partition into K clusters. • Step 1: Generate a new partition by assigning each pattern to its closest cluster center. • Step 2: Compute new cluster centers as the centroids of the clusters. • Step 3: Repeat steps 1 and 2 until the membership no longer changes (the cluster centers then also remain the same).
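The four steps above can be sketched in NumPy; the initialisation (K random data points as centers) and the toy data are illustrative assumptions:

```python
import numpy as np

def kmeans(X, k, rng=np.random.default_rng(0)):
    # Step 0: start from K randomly chosen data points as cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    while True:
        # Step 1: assign each pattern to its closest cluster center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Step 2: recompute the centers as the centroids of the clusters
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 3: stop when the membership (and hence the centers) no longer changes
        if np.allclose(new, centers):
            return labels, new
        centers = new

X = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
labels, centers = kmeans(X, k=2)
print(labels)
```

Note that this sketch assumes no cluster ever becomes empty; production implementations (and better initialisations such as k-means++) handle that case explicitly.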
Locating the ‘knee’ The knee of a curve is defined as the point of maximum curvature. When the within-cluster error is plotted against the number of clusters K, the knee suggests a suitable value for K.
Leader - Follower • Online algorithm: specify a threshold distance. • For each new instance, find the closest cluster center. • Distance above threshold? Create a new cluster. • Or else (distance below threshold), add the instance to that cluster and update the cluster center.
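The Leader-Follower procedure can be sketched as a single pass over a data stream; the threshold and the learning rate eta below are illustrative assumptions:

```python
import numpy as np

def leader_follower(stream, threshold, eta=0.1):
    """Online clustering: one pass over the data, no K required up front."""
    centers = []
    for x in stream:
        x = np.asarray(x, dtype=float)
        if centers:
            d = [np.linalg.norm(x - c) for c in centers]
            j = int(np.argmin(d))
            if d[j] <= threshold:
                # Follower: pull the winning center towards the instance
                centers[j] += eta * (x - centers[j])
                continue
        # Leader: distance above threshold -> start a new cluster here
        centers.append(x.copy())
    return centers

centers = leader_follower([[0, 0], [0.2, 0.1], [9, 9], [9.1, 8.9]],
                          threshold=2.0)
print(len(centers))  # -> 2: two clusters emerge from this stream
```

The result depends on the presentation order of the stream, which is one source of the instability noted in the performance analysis.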
Kohonen SOM’s The Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm. It is a compromise between biological modeling and statistical data processing.
Kohonen SOM’s • Each neuron’s weight vector is representative of a certain input. • Input patterns are shown to all neurons simultaneously. • Competitive learning: the neuron with the largest response is chosen.
Kohonen SOM’s • Initialize weights • Repeat until convergence • Select next input pattern • Find the Best Matching Unit • Update weights of the winner and its neighbours • Decrease learning rate & neighbourhood size
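The training loop above can be sketched as follows; this is a minimal illustration with a Gaussian neighbourhood and linearly decaying parameters, not Kohonen's reference implementation:

```python
import numpy as np

def train_som(data, grid=(8, 8), iters=2000, seed=0):
    """Minimal SOM sketch: map n-D inputs onto a 2-D grid of neurons."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Initialize weights: one weight vector per grid neuron
    W = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates, used to measure neighbourhood distance on the map
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    for t in range(iters):
        # Decrease learning rate and neighbourhood size over time
        lr = 0.5 * (1 - t / iters)
        sigma = max(1.0, (max(rows, cols) / 2) * (1 - t / iters))
        # Select next input pattern
        x = data[rng.integers(len(data))]
        # Find the Best Matching Unit (largest response = smallest distance)
        bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)),
                               (rows, cols))
        # Update the winner and its neighbours, weighted by map distance
        g = ((coords - np.array(bmu)) ** 2).sum(-1)
        h = np.exp(-g / (2 * sigma ** 2))
        W += lr * h[..., None] * (x - W)
    return W

# Toy use, echoing the ai-junkie demo: map random RGB colours onto a grid
colours = np.random.default_rng(1).random((200, 3))
W = train_som(colours)
print(W.shape)
```

After training, nearby neurons hold similar colours: the 3-D colour space has been mapped onto the 2-D grid while preserving neighbourhood relations.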
Kohonen SOM’s Distance-related learning: the weight update for each neighbour is scaled by its distance from the winning neuron on the map.
Kohonen SOM’s • Kohonen SOM Demo (from ai-junkie.com): mapping a 3D colorspace on a 2D Kohonen map
Performance Analysis • K-Means • Depends a lot on a priori knowledge (K) • Very Stable • Leader Follower • Depends a lot on a priori knowledge (Threshold) • Faster but unstable
Performance Analysis • Self Organizing Map • Stability and Convergence Assured • Principle of self-ordering • Slow and many iterations needed for convergence • Computationally intensive
Conclusion • No Free Lunch theorem • Any elevated performance over one class of problems is exactly paid for in performance over another class. • Ensemble clustering? • Use SOM and the basic Leader-Follower algorithm to identify clusters, then use k-means clustering to refine them.