Explore the K-means optimization objective, random initialization, and how to determine the number of clusters, along with an introduction to hierarchical clustering and soft clustering (Fuzzy C-Means). Learn how to choose the value of K, evaluate K-means against its optimization criterion, and apply these methods to practical cluster analysis in unsupervised learning.
Machine Learning Clustering
• Unsupervised Learning
• K-means
• Optimization objective
• Random initialization
• Determining the Number of Clusters
• Hierarchical Clustering
• Soft Clustering (Fuzzy C-Means)
References
• Nilsson, N. J. (1996). Introduction to Machine Learning. An early draft of a proposed textbook. (Chapter 9)
• Marsland, S. (2014). Machine Learning: An Algorithmic Perspective. CRC Press. (Chapter 9)
• Jang, J.-S. R., Sun, C.-T., & Mizutani, E. (1997). Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. (Chapter 15, Fuzzy C-Means)
• …
Supervised learning
Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$
=> Classification: estimating the separating hyperplane
Unsupervised learning
Training set: $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ (no labels)
=> Clustering
Applications of Clustering
• Social network analysis (giant component analysis in networks)
• Market segmentation
• Astronomical data analysis
• Organizing computing clusters
(Image credit: NASA/JPL-Caltech/E. Churchwell, Univ. of Wisconsin, Madison)
K-means Algorithm K: number of clusters. First step: randomly initialize the K cluster centroids
K-means Algorithm Second step: assigning each sample to its closest cluster centroid
K-means Algorithm Third step: moving each cluster centroid to the mean of the samples assigned to it
K-means Algorithm Reassigning samples to the updated centroids
K-means Algorithm Moving the centroids to the new means
K-means Algorithm Reassigning samples
K-means Algorithm Moving the centroids to the new means
K-means Algorithm Reassigning samples: no change, so the algorithm has converged!
K-means algorithm
• Input:
• $K$ (number of clusters)
• Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$
K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
  for $i$ = 1 to $m$: $c^{(i)}$ := index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$  (cluster assignment step)
  for $k$ = 1 to $K$: $\mu_k$ := average (mean) of the points assigned to cluster $k$  (move centroid step)
}
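A minimal NumPy sketch of this loop (the function and variable names are ours, not from the slides):

```python
import numpy as np

def kmeans(X, K, n_iters=100, rng=None):
    """Plain K-means: X is an (m, n) data matrix, K the number of clusters."""
    rng = np.random.default_rng(rng)
    m = X.shape[0]
    # Random initialization: pick K distinct training examples as centroids.
    mu = X[rng.choice(m, size=K, replace=False)]
    for _ in range(n_iters):
        # Cluster assignment step: index of the closest centroid per sample.
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (m, K)
        c = dists.argmin(axis=1)
        # Move centroid step: mean of the points assigned to each cluster
        # (a cluster that lost all its points keeps its old centroid).
        new_mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # no centroid moved: converged
            break
        mu = new_mu
    return c, mu
```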
Distance Metrics
• Euclidean distance (L2 norm): $d(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$
• L1 norm (Manhattan distance): $d(x, y) = \sum_i |x_i - y_i|$
• Cosine similarity (correlation): $\cos(x, y) = \frac{x \cdot y}{\|x\|\,\|y\|}$ (transform to a distance by subtracting from 1: $d(x, y) = 1 - \cos(x, y)$)
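The three metrics side by side in NumPy, for concreteness:

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))   # L2 norm of the difference

def manhattan(x, y):
    return np.sum(np.abs(x - y))           # L1 norm of the difference

def cosine_distance(x, y):
    cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return 1.0 - cos_sim                   # similarity turned into a distance
```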
K-means for non-separated clusters
T-shirt sizing example (figure: customers plotted by height vs. weight, partitioned into size groups by K-means)
Local optima
With K = 3, different random initializations can lead K-means to different local optima (note: initialization requires K < m)
Random initialization to escape local optima
For i = 1 to 100 {
  Randomly initialize K-means.
  Run K-means. Get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
  Compute cost function (distortion) $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$
}
Pick the clustering that gave the lowest cost $J$
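A sketch of this restart loop. It reuses the kmeans function from the earlier sketch, and distortion implements the cost J defined on the next slides (names are ours):

```python
import numpy as np

def distortion(X, c, mu):
    """Cost J: average squared distance from each sample to its centroid."""
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))

def best_of_n_runs(X, K, n_runs=100):
    best = None
    for _ in range(n_runs):
        c, mu = kmeans(X, K)          # kmeans as sketched earlier
        J = distortion(X, c, mu)
        if best is None or J < best[0]:
            best = (J, c, mu)
    return best                       # (lowest J, its assignments, its centroids)
```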
Optimality of clusters
• Optimal clusters should
• minimize distance within clusters
• maximize distance between clusters
• Fisher criterion: maximize the ratio of between-cluster scatter to within-cluster scatter, e.g. $J = \mathrm{tr}(S_W^{-1} S_B)$
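A small NumPy sketch of this scatter-ratio idea, using the scalar (summed squared distance) form of the within- and between-cluster scatter; the function name is ours:

```python
import numpy as np

def scatter_ratio(X, c, K):
    """Between-cluster scatter over within-cluster scatter (higher is better)."""
    overall_mean = X.mean(axis=0)
    within, between = 0.0, 0.0
    for k in range(K):
        Xk = X[c == k]
        mu_k = Xk.mean(axis=0)
        within += np.sum((Xk - mu_k) ** 2)                       # scatter inside cluster k
        between += len(Xk) * np.sum((mu_k - overall_mean) ** 2)  # cluster k vs. grand mean
    return between / within
```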
Content
• Unsupervised Learning
• K-means
• Optimization objective
• Random initialization
• Determining the Number of Clusters
• Hierarchical Clustering
• Soft Clustering (Fuzzy C-Means)
Choosing the value of K
Sometimes you run K-means to get clusters for some later (downstream) purpose. Evaluate K-means based on a metric for how well it serves that purpose.
E.g., in the T-shirt sizing example, compare K = 3 (S, M, L) against K = 5 (XS, S, M, L, XL) by how well the resulting sizes fit customers.
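When no downstream metric exists, a common heuristic (not on these slides) is the "elbow" method: plot the distortion J against K and look for a kink. A sketch reusing the kmeans and distortion helpers from the earlier sketches:

```python
def distortion_curve(X, max_K=10):
    """Distortion J for K = 1..max_K; look for an 'elbow' in the curve."""
    curve = []
    for K in range(1, max_K + 1):
        c, mu = kmeans(X, K)              # kmeans as sketched earlier
        curve.append(distortion(X, c, mu))
    return curve
```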
K-means optimization objective
• $c^{(i)}$ = index of the cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned
• $\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
• $\mu_{c^{(i)}}$ = cluster centroid of the cluster to which example $x^{(i)}$ has been assigned
Optimization objective:
$$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2, \qquad \min_{c^{(1)}, \ldots, c^{(m)},\, \mu_1, \ldots, \mu_K} J$$
K-means optimization objective
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
  for $i$ = 1 to $m$: $c^{(i)}$ := index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$  (minimizes $J$ over the assignments $c^{(1)}, \ldots, c^{(m)}$ with the centroids fixed)
  for $k$ = 1 to $K$: $\mu_k$ := average (mean) of the points assigned to cluster $k$  (minimizes $J$ over $\mu_1, \ldots, \mu_K$ with the assignments fixed)
}
Each iteration can only decrease $J$, so K-means converges, though possibly to a local optimum.
Content
• Unsupervised Learning
• K-means
• Optimization objective
• Random initialization
• Determining the Number of Clusters
• Hierarchical Clustering
• Soft Clustering (Fuzzy C-Means)
Hierarchical clustering: example Clustering important cities in Iran for a business purpose
Hierarchical clustering: forming clusters
• Forming clusters from dendrograms: cutting the tree at a chosen height yields a flat partition of the data
Hierarchical Clustering
• Given the input set S, the goal is to produce a hierarchy (dendrogram) in which nodes represent subsets of S.
• Features of the resulting tree:
• The root is the whole input set S.
• The leaves are the individual elements of S.
• The internal nodes are defined as the union of their children.
• Each level of the tree represents a partition of the input data into several (nested) clusters or groups.
Hierarchical clustering (agglomerative)
• Input: a pairwise distance matrix over all instances in S
• Algorithm (see the sketch after this list):
1. Place each instance of S in its own cluster (singleton), creating the list of clusters L (initially, the leaves of T): L = S1, S2, S3, ..., Sn-1, Sn.
2. Compute a merging cost function between every pair of elements in L to find the two closest clusters {Si, Sj}, the cheapest pair to merge.
3. Remove Si and Sj from L.
4. Merge Si and Sj to create a new internal node Sij in T, which will be the parent of Si and Sj in the resulting tree, and add Sij to L.
5. Go to Step 2 until only one set remains.
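A minimal sketch of this bottom-up procedure with SciPy; the toy data and the "average" linkage (merging cost) are illustrative assumptions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two loose groups of 2-D points (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])

# Build the dendrogram bottom-up; 'average' linkage is the merging cost here.
Z = linkage(X, method="average", metric="euclidean")

# Form flat clusters by cutting the tree into 2 groups.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```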
Soft Clustering: Fuzzy C-Means
• An extension of K-means
• K-means and hierarchical clustering generate hard partitions:
• each data point is assigned to exactly one cluster
• Soft clustering gives probabilities that an instance belongs to each of a set of clusters:
• Fuzzy C-Means allows data points to be assigned to more than one cluster
• each data point has a degree of membership (or probability) of belonging to each cluster
• Fuzzy C-Means is available as the fcm command in MATLAB
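For readers outside MATLAB, a compact NumPy sketch of the fuzzy C-means update loop (the fuzzifier m = 2, the tolerance, and the function name are our assumptions):

```python
import numpy as np

def fuzzy_c_means(X, C, m=2.0, n_iters=100, tol=1e-5, rng=None):
    """Returns (memberships U of shape (n_samples, C), centroids)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    # Random membership matrix, rows normalized to sum to 1.
    U = rng.random((n, C))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        Um = U ** m
        # Centroid update: membership-weighted mean of all points.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distances from every point to every centroid.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero
        # Membership update: inverse-distance weighting with fuzzifier m.
        inv = d ** (-2.0 / (m - 1.0))
        new_U = inv / inv.sum(axis=1, keepdims=True)
        if np.max(np.abs(new_U - U)) < tol:  # memberships stabilized
            break
        U = new_U
    return U, centers
```

Unlike K-means, each row of U holds a point's degrees of membership in all C clusters; taking each row's argmax recovers a hard assignment.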