
CS26110 AI Toolbox


Presentation Transcript


  1. CS26110 AI Toolbox: Clustering 3

  2. Clustering lectures overview • Datasets, data points, dimensionality, distance • What is clustering? • Partitional clustering • k-means algorithm • Extensions (fuzzy) • Hierarchical clustering • Agglomerative/Divisive • Single-link, complete-link, average-link

  3. Hierarchical clustering algorithms • Agglomerative (bottom-up): • Start with each data point being a single cluster • Merge based on closeness • Eventually all data points belong to the same cluster • Divisive (top-down): • Start with all data points belonging to the same cluster • Split up based on distance • Eventually each node forms a cluster on its own • Does not require the number of clusters k in advance • Needs a termination/readout condition

  4. Hierarchical Agglomerative Clustering • Assumes a similarity function for determining the similarity of two data points • = distance function from before • Starts with all points in separate clusters and then repeatedly joins the two clusters that are most similar until there is only one cluster • The history of merging forms a binary tree or hierarchy

  5. Hierarchical Agglomerative Clustering • Clustering obtained by cutting the dendrogram at a desired level: each connected component forms a cluster
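As a concrete illustration of cutting a dendrogram (not part of the original slides), the sketch below uses SciPy's hierarchical clustering routines; the toy 2-D points and the cut height of 4.0 are assumptions chosen for the example, not values from the lecture.

  # Build a merge hierarchy with SciPy and cut it at a chosen height.
  import numpy as np
  from scipy.cluster.hierarchy import linkage, fcluster

  points = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
                     [5.0, 8.0], [6.0, 8.5], [9.0, 11.0]])

  Z = linkage(points, method='single')               # merge history = the dendrogram
  labels = fcluster(Z, t=4.0, criterion='distance')  # components connected below height 4.0
  print(labels)                                      # cluster label for each point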

  6. Hierarchical Agglomerative Clustering • Basic algorithm is straightforward • Compute the distance matrix (= distance between any 2 points) • Let each data point be a cluster • Repeat • Merge the two (or more) closest clusters • Update the distance matrix • Until only a single cluster remains • Key operation is the computation of the proximity of two clusters • Different approaches to define the distance between clusters distinguish the different algorithms
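The loop above is small enough to sketch directly in plain Python; the function and variable names below are my own choices rather than anything from the module, and the inter-cluster distance is passed in as a parameter so that the linkage variants on the following slides can be plugged in.

  # Minimal sketch of the agglomerative loop described above.
  # `linkage` measures the distance between two clusters (lists of points).
  def hac(points, linkage):
      clusters = [[p] for p in points]   # start: every point is its own cluster
      merges = []                        # merge history (the dendrogram)
      while len(clusters) > 1:
          # find the pair of clusters with the smallest inter-cluster distance
          best = None
          for i in range(len(clusters)):
              for j in range(i + 1, len(clusters)):
                  d = linkage(clusters[i], clusters[j])
                  if best is None or d < best[0]:
                      best = (d, i, j)
          d, i, j = best
          merges.append((list(clusters[i]), list(clusters[j]), d))
          clusters[i] = clusters[i] + clusters[j]   # merge the two closest clusters
          del clusters[j]                           # (the "distance matrix update")
      return merges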

  7. Hierarchical clustering • Two important questions: • How do you determine the “nearness” of clusters? • How do you represent a cluster of more than one point?

  8. Example: two clusters of points (labelled 1 to 6), with the inter-cluster distance marked between them

  9. Closest pair of clusters Many variants to defining closest pair of clusters • Single-link • Distance of the “closest” points • Complete-link • Distance of the “furthest” points • Centroid • Distance of the centroids (centers of gravity) • Average-link • Average distance between pairs of elements
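For 1-D points (as in the exercise later) the four variants above boil down to a few lines each. These helper names are mine, written to plug into the hac() sketch above, and |a - b| is assumed as the point-level distance.

  # Inter-cluster distances for 1-D points, with |a - b| as the point distance.
  def single_link(a, b):      # distance of the closest pair
      return min(abs(x - y) for x in a for y in b)

  def complete_link(a, b):    # distance of the furthest pair
      return max(abs(x - y) for x in a for y in b)

  def centroid_link(a, b):    # distance between the centres of gravity
      return abs(sum(a) / len(a) - sum(b) / len(b))

  def average_link(a, b):     # average distance over all cross-cluster pairs
      return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))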

  10. Examples: single-link, complete-link, average-link

  11. Single-link agglomerative clustering • Use the minimum distance over pairs: dist(ci, cj) = min { dist(x, y) : x ∈ ci, y ∈ cj } • Can result in “straggly” (long and thin) clusters due to the chaining effect • After merging ci and cj, the distance from the resulting cluster to another cluster ck is: dist(ci ∪ cj, ck) = min(dist(ci, ck), dist(cj, ck)) (equivalently, its similarity to ck is the maximum of the two similarities)
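The chaining effect is easy to reproduce. In the sketch below (the data are made up for illustration), a line of evenly spaced points ends up in one long, straggly cluster under single-link, because every neighbouring pair is close even though the ends of the line are far apart.

  # Chaining effect: a line of evenly spaced points plus a distant pair.
  import numpy as np
  from scipy.cluster.hierarchy import linkage, fcluster

  chain = np.arange(0, 10, 1.0).reshape(-1, 1)    # points 0, 1, ..., 9 on a line
  data = np.vstack([chain, [[20.0], [21.0]]])

  Z = linkage(data, method='single')
  print(fcluster(Z, t=2, criterion='maxclust'))   # ask for two clusters
  # Single-link keeps the whole chain together as one straggly cluster.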

  12. Exercise • Given the following 1D data: {6, 8, 18, 26, 13, 32, 24}, perform single-link HAC • Compute the distance matrix (= distance between any 2 points) • Let each data point be a cluster • Repeat • Merge the two (or more) closest clusters • Update the distance matrix • Until only a single cluster remains
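A quick way to check your working for this exercise is to hand the same data to SciPy (an external library, not part of the lecture), or to the plain-Python hac() sketch above with single_link. The snippet below prints the merge history, i.e. which clusters join and at what distance.

  import numpy as np
  from scipy.cluster.hierarchy import linkage

  data = np.array([6, 8, 18, 26, 13, 32, 24], dtype=float).reshape(-1, 1)
  Z = linkage(data, method='single')
  print(Z)   # each row: the two clusters merged, the merge distance, the new cluster size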

  13. Distance matrix

  14. Distance matrix

  15. Dendrogram (data points: 6 8 13 18 24 26 32)

  16. Distance matrix: single-link

  17. Distance matrix: single-link

  18. Dendrogram (data points: 6 8 13 18 24 26 32)

  19. Distance matrix: single-link

  20. Distance matrix: single-link

  21. Final dendrogram (data points: 6 8 13 18 24 26 32)

  22. Final dendrogram (data points: 6 8 13 18 24 26 32)

  23. Final clustering: HAC (data points: 6 8 13 18 24 26 32)

  24. Final clustering: k-means (data points: 6 8 13 18 24 26 32)

  25. Final dendrogram (data points: 6 8 13 18 24 26 32)

  26. Final clustering: HAC (data points: 6 8 13 18 24 26 32)
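For completeness, both results compared above can be regenerated in code; the slide figures themselves are not reproduced here, and the choice of three clusters for both methods is an assumption made for this sketch, not a value read off the slides.

  import numpy as np
  from scipy.cluster.hierarchy import linkage, fcluster
  from sklearn.cluster import KMeans

  data = np.array([6, 8, 18, 26, 13, 32, 24], dtype=float).reshape(-1, 1)

  # Single-link HAC cut into 3 clusters (3 is an assumed value).
  hac_labels = fcluster(linkage(data, method='single'), t=3, criterion='maxclust')

  # k-means with k = 3 on the same points (also an assumption).
  km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)

  print('HAC:    ', hac_labels)
  print('k-means:', km_labels)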

  27. Complete-link agglomerative clustering • Use the maximum distance over pairs: dist(ci, cj) = max { dist(x, y) : x ∈ ci, y ∈ cj } • Makes “tighter”, more spherical clusters that are typically preferable • After merging ci and cj, the distance from the resulting cluster to another cluster ck is: dist(ci ∪ cj, ck) = max(dist(ci, ck), dist(cj, ck))
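To see the contrast with single-link, the same chain data from the earlier sketch can be cut at one height under both linkages; complete-link resists chaining and breaks the line into compact pieces. The data and cut height are again illustrative assumptions.

  import numpy as np
  from scipy.cluster.hierarchy import linkage, fcluster

  chain = np.arange(0, 10, 1.0).reshape(-1, 1)
  data = np.vstack([chain, [[20.0], [21.0]]])

  print(fcluster(linkage(data, method='single'),   t=5, criterion='distance'))
  print(fcluster(linkage(data, method='complete'), t=5, criterion='distance'))
  # At the same cut height, complete-link splits the long chain into compact
  # groups, while single-link keeps it as a single straggly cluster.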

  28. Exercise • Given the following 1D data: {6, 8, 18, 26, 13, 32, 24}, perform complete-link HAC • Compute the distance matrix (= distance between any 2 points) • Let each data point be a cluster • Repeat • Merge the two (or more) closest clusters • Update the distance matrix • Until only a single cluster remains
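The same SciPy check as before works for this exercise by switching the linkage method; the merge distances reported are now furthest-pair distances.

  import numpy as np
  from scipy.cluster.hierarchy import linkage

  data = np.array([6, 8, 18, 26, 13, 32, 24], dtype=float).reshape(-1, 1)
  Z = linkage(data, method='complete')
  print(Z)   # merge history under complete-link (maximum-pair distances)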

  29. Distance matrix

  30. Distance matrix

  31. Dendrogram (data points: 6 8 13 18 24 26 32)

  32. Distance matrix: complete-link

  33. Distance matrix: complete-link

  34. Dendrogram (data points: 6 8 13 18 24 26 32)

  35. Distance matrix: complete-link

  36. Distance matrix: complete-link

  37. Dendrogram (data points: 6 8 13 18 24 26 32)

  38. Distance matrix: complete-link

  39. Dendrogram (data points: 6 8 13 18 24 26 32)

  40. Final dendrogram (data points: 6 8 13 18 24 26 32)

  41. Final clustering: HAC (data points: 6 8 13 18 24 26 32)

  42. Final dendrogram (data points: 6 8 13 18 24 26 32)

  43. Final clustering: HAC (data points: 6 8 13 18 24 26 32)

  44. HAC critique • What do you think are the advantages of HAC over k-means? • k not required at the start • A hierarchy is obtained (which can be quite informative) • Many possible clusterings can be derived • ... • What are the disadvantages? • Where to slice the dendrogram? (a cluster validity measure might help here though; see the sketch below) • Complexity (see next slide) • Which choice of linkage? (average-link is very costly)
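One common way to answer the "where to slice" question is to score every candidate cut with a cluster validity measure and keep the best one. The sketch below uses the silhouette score, which is my choice of measure for illustration rather than one prescribed by the module, applied to the exercise data.

  import numpy as np
  from scipy.cluster.hierarchy import linkage, fcluster
  from sklearn.metrics import silhouette_score

  data = np.array([6, 8, 18, 26, 13, 32, 24], dtype=float).reshape(-1, 1)
  Z = linkage(data, method='single')

  scores = {}
  for k in range(2, len(data)):                       # candidate numbers of clusters
      labels = fcluster(Z, t=k, criterion='maxclust')
      if len(set(labels)) > 1:                        # ties can collapse a requested cut
          scores[k] = silhouette_score(data, labels)

  best_k = max(scores, key=scores.get)
  print(scores, '-> slice the dendrogram to give', best_k, 'clusters')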

  45. Time complexity • In the first iteration, all HAC methods need to compute the similarity of every pair of the n individual instances, which is O(mn²) (where m is the cost of one distance computation, e.g. the number of dimensions) • In each of the subsequent merging iterations, compute the distance between the most recently created cluster and all other existing clusters • Maintaining a heap of distances allows the whole process to run in O(mn² log n)

  46. What to take away • Understand the HAC process and its limitations • Be able to apply HAC to new data
