
Unsupervised Learning: Clustering

Explore unsupervised learning and clustering algorithms. Learn about K-Means Clustering, Hierarchical Clustering, problems with K-Means, and constrained clustering techniques. Discover advantages of hierarchical clustering.


Presentation Transcript


  1. Unsupervised Learning: Clustering Some material adapted from slides by Andrew Moore, CMU. See http://www.autonlab.org/tutorials/ for a repository of Data Mining tutorials

  2. Unsupervised Learning • Supervised learning uses labeled data pairs (x, y) to learn a function f : X→Y • But what if we don’t have labels? • No labels = unsupervised learning • Only some points are labeled = semi-supervised learning • Getting labels may be expensive, so we only get a few • Clustering is the unsupervised grouping of data points; it can be used for knowledge discovery

  3. Clustering algorithms • Many clustering algorithms exist • Clustering is typically done using a distance measure defined between instances • The distance is defined in the instance feature space (e.g., Euclidean distance; a sketch follows below) • The agglomerative approach works bottom up: • Treat each instance as a cluster • Merge the two closest clusters • Repeat until a stop condition is met • The top-down (divisive) approach starts with a single cluster containing all instances: • Find a cluster to split into two or more smaller clusters • Repeat until a stop condition is met
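
For concreteness, distances are usually computed directly on the numeric feature vectors. A minimal sketch of one common choice, Euclidean distance, follows; the slides do not fix a particular metric, so this is an illustrative assumption:

    import math

    def euclidean(x, y):
        # Distance between two instances given as equal-length numeric feature vectors
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    # e.g. euclidean([0, 0], [3, 4]) == 5.0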

  4. Clustering Data

  5. K-Means Clustering • K-Means ( k , data ) • Randomly choose k cluster center locations (centroids) • Loop until convergence • Assign each point to the cluster of the closest centroid • Re-estimate the cluster centroids based on the data assigned to each • Convergence: no point is assigned to a different cluster
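
A minimal NumPy sketch of the loop described on this slide (choosing random data points as the initial centroids and skipping empty clusters are assumptions of this sketch, not the original course code):

    import numpy as np

    def k_means(data, k, seed=0):
        # data: (n, d) array of numeric instances
        rng = np.random.default_rng(seed)
        centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
        assignment = np.full(len(data), -1)
        while True:
            # Assign each point to the cluster of the closest centroid
            dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
            new_assignment = dists.argmin(axis=1)
            if np.array_equal(new_assignment, assignment):
                return centroids, assignment  # converged: no point changed cluster
            assignment = new_assignment
            # Re-estimate each centroid as the mean of the points assigned to it
            for j in range(k):
                if np.any(assignment == j):
                    centroids[j] = data[assignment == j].mean(axis=0)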


  9. Problems with K-Means • Only works for numeric data (typically reals) • Very sensitive to the initial points • Do many runs of K-Means, each with different initial centroids, and keep the best result • Seed the centroids using a better method than random, e.g., farthest-first sampling (a sketch follows below) • Must manually choose k • Learn the optimal k for the clustering (note that this requires a performance measure)
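
A sketch of farthest-first seeding, one of the mitigations mentioned above. The exact variant intended by the slides is not specified; this is one common form, where each new centroid is the point farthest from the centroids chosen so far:

    import numpy as np

    def farthest_first_centroids(data, k, seed=0):
        rng = np.random.default_rng(seed)
        centroids = [data[rng.integers(len(data))]]   # first centroid chosen at random
        while len(centroids) < k:
            # distance from every point to its nearest already-chosen centroid
            dists = np.min([np.linalg.norm(data - c, axis=1) for c in centroids], axis=0)
            centroids.append(data[dists.argmax()])    # take the farthest point
        return np.array(centroids, dtype=float)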

  10. Problems with K-Means • How do you tell it which clustering you want? • Constrained clustering techniques: same-cluster constraints (must-link) and different-cluster constraints (cannot-link)
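
One family of constrained techniques (e.g., COP-KMeans) keeps the K-Means loop but rejects any assignment that violates a constraint. A minimal sketch of that check, assuming constraints are given as pairs of point indices (an illustrative representation, not from the slides):

    def violates_constraints(point, cluster, assignment, must_link, cannot_link):
        # assignment: dict mapping already-assigned point index -> cluster id
        for a, b in must_link:
            other = b if a == point else a if b == point else None
            if other in assignment and assignment[other] != cluster:
                return True   # must-link partner already sits in a different cluster
        for a, b in cannot_link:
            other = b if a == point else a if b == point else None
            if other in assignment and assignment[other] == cluster:
                return True   # cannot-link partner already sits in this cluster
        return False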

  11. Hierarchical Clustering • Recursive partitioning/merging of a data set • [Figure: a dendrogram over five points; cutting it at successive levels gives the 1-clustering through the 5-clustering]

  12. Dendrogram • Represents all partitionings of the data • Can get a K-clustering by looking at the connected components at any given level (see the SciPy sketch below) • Frequently binary dendrograms, but n-ary dendrograms are easy to obtain with minor changes to the algorithms
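
With SciPy, for example, the full hierarchy can be built once and then cut to read off a K-clustering at any level (the library calls are real; the toy data is an assumption):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    data = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [9.0, 9.0]])
    tree = linkage(data, method="average")              # encodes the whole dendrogram
    labels = fcluster(tree, t=3, criterion="maxclust")  # read off a 3-clustering
    print(labels)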

  13. Advantages of hierarchical clustering • Don’t need to specify the number of clusters • Good for data visualization • See how the data points interact at many levels • Can view the data at multiple levels of granularity • Understand how all points interact • Specifies all of the K clusterings/partitions

  14. Hierarchical Clustering • Common in many domains • Biologists and social scientists • Gene expression data • Document/web page organization • DMOZ, Yahoo directories • [Figure: a taxonomy tree: animal splits into vertebrate (fish, reptile, amphibian, mammal) and invertebrate (worm, insect, crustacean)] • Two main approaches…

  15. Divisive hierarchical clustering • Top-down • Finding the best partitioning of the data is generally exponential in time • Common approach: • Let C be a set of clusters • Initialize C to be the one-clustering of the data • While there exists a cluster c in C that should be split (stop condition not yet met) • remove c from C • partition c into two clusters c1 and c2 using a flat clustering algorithm • add c1 and c2 to C • Bisecting k-means is a common instance (see the sketch below)
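
A sketch of bisecting k-means, reusing the k_means sketch from slide 5. Which cluster to split next is a design choice; splitting the largest remaining cluster here is an assumption of this sketch:

    import numpy as np

    def bisecting_k_means(data, num_clusters, seed=0):
        clusters = [data]                            # start with the one-clustering
        while len(clusters) < num_clusters:
            # pick a cluster to split -- here simply the largest remaining one
            i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
            c = clusters.pop(i)
            _, assignment = k_means(c, 2, seed)      # split it with flat 2-means
            clusters.append(c[assignment == 0])
            clusters.append(c[assignment == 1])
        return clusters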

  16-22. Divisive clustering (figures) • Start with one cluster containing all of the data • Split it using a flat clustering algorithm, e.g., K-means • Repeat, splitting further clusters until the desired clustering is reached

  23. Hierarchical Agglomerative Clustering • Let C be a set of clusters • Initialize C to all points/docs as separate clusters • While C contains more than one cluster • find c1 and c2 in C that are closest together • remove c1 and c2 from C • merge c1 and c2 and add resulting cluster to C • History of merging forms a binary tree or hierarchy • Q: How to measure distance between clusters?
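
A naive O(n^3) sketch of the agglomerative loop, using single-link distance between clusters (distance measure choices are the topic of the next slides; recording merges as index pairs is an illustrative simplification of the hierarchy):

    import numpy as np
    from itertools import combinations

    def single_link(c1, c2):
        # cluster distance = distance of the closest pair of points
        return min(np.linalg.norm(p - q) for p in c1 for q in c2)

    def agglomerative(data, num_clusters=1):
        clusters = [data[i:i + 1] for i in range(len(data))]   # every point is its own cluster
        merges = []                                            # history of merges (the hierarchy)
        while len(clusters) > num_clusters:
            # find the two clusters that are closest together
            i, j = min(combinations(range(len(clusters)), 2),
                       key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
            merges.append((i, j))
            merged = np.vstack([clusters[i], clusters[j]])
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        return clusters, merges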

  24. Distance between clusters • Single-link • Similarity of the most similar (closest) pair of points

  25. Distance between clusters • Complete-link • Similarity of the “furthest” points, the least similar pair • Why are these “local” methods used? Efficiency

  26. Distance between clusters • Centroid • Clusters whose centroids (centers of gravity) are the most similar

  27. Distance between clusters • Average-link • Average similarity between all pairs of elements
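
The four criteria on slides 24-27 can be written as small functions over two clusters (arrays of points); a minimal sketch using Euclidean distance:

    import numpy as np

    def dist(p, q):
        return np.linalg.norm(p - q)

    def single_link(c1, c2):     # most similar (closest) pair
        return min(dist(p, q) for p in c1 for q in c2)

    def complete_link(c1, c2):   # least similar (furthest) pair
        return max(dist(p, q) for p in c1 for q in c2)

    def centroid_link(c1, c2):   # distance between centers of gravity
        return dist(c1.mean(axis=0), c2.mean(axis=0))

    def average_link(c1, c2):    # average over all pairs
        return float(np.mean([dist(p, q) for p in c1 for q in c2]))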
