

Explore recent developments in hierarchical clustering of vector data, including algorithms, objectives, and implementations. Learn about single-linkage, complete-linkage, and average-linkage clustering methods. Discover Dasgupta's and the Moseley-Wang objectives for hierarchical clustering and the random recursive partitioning baseline. Review approximations and conjectures for average-linkage clustering, and consider the Projected Random Cut algorithm for improved guarantees under the Gaussian kernel similarity.





Presentation Transcript


  1. Advances in Hierarchical Clustering of Vector Data Joint work with Moses Charikar, Vaggos Chatziafratis, Rad Niazadeh (Stanford), AISTATS’19 Grigory Yaroslavtsev Indiana University (Bloomington) http://grigory.us/blog

  2. Clustering • Key problem in unsupervised learning • Workshop at TTI-Chicago “Recent Trends in Clustering” • Tentatively end of August, stay tuned! • [Slide graphic with labels: Machine Learning, Algorithms, Clustering]

  3. Hierarchical Clustering of Clustering Methods • Flat (single-level) clustering • # clusters K is fixed: K-means, K-median, K-center, etc. • # clusters selected by the algorithm: correlation clustering, … • Hierarchical clustering • Tree over the data points • Can pick any # of clusters • Captures relationships between clusters

  4. Data we can hierarchically cluster • Graphs: weighted graph G(V, E) • Recent work with Facebook [Avdiukhin, Pupyrev, Y. VLDB’19?] • Scales to the Facebook graph (billions of vertices, hundreds of billions of edges) • Vectors: arbitrary $v_1, \dots, v_n \in \mathbb{R}^d$

  5. Why little theoretical progress on HC? Lots of heuristics, (almost) no rigorous objectives • Bottom-up (linkage-based heuristics) • Single-linkage clustering • Average-linkage clustering • Complete-linkage clustering • Centroid linkage • All sorts of other linkage methods • Implementations of HC in: • Mathematica, R, Matlab • SciPy, Scikit-learn, …

  6. Bottom-up linkage-based heuristics • Start with singletons • Iteratively merge the two closest clusters
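A minimal sketch (not from the talk) of this bottom-up heuristic using SciPy's `scipy.cluster.hierarchy`, which implements the single-, complete-, and average-linkage rules described on the following slides; the random data and the choice of 5 clusters are purely illustrative.

```python
# Minimal sketch: bottom-up (agglomerative) hierarchical clustering with SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))               # 100 illustrative points in R^8

# Build the full merge tree with different linkage rules.
Z_single = linkage(X, method="single")      # distance between closest points
Z_complete = linkage(X, method="complete")  # distance between furthest points
Z_average = linkage(X, method="average")    # average distance over all pairs

# The tree lets us pick any number of clusters after the fact.
labels = fcluster(Z_average, t=5, criterion="maxclust")
print(labels[:10])
```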

  7. Single-Linkage Clustering • Cluster distance = distance between the two closest points (one from each cluster)

  8. MST: Single-Linkage Clustering • [Zahn’71]: to get $k$ clusters, remove the $k-1$ longest MST edges • Objective: maximizes the minimum inter-cluster distance • Not true if the MST is approximate (approximating the total weight vs. the $k$-th longest edge)! • Scalable algorithms for Single-Linkage Clustering of vectors [Y., Vadapalli ICML’18] • Objective: maximize min(a, b, c) [slide figure: three clusters with pairwise distances a, b, c]
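A small illustrative sketch of the Zahn-style MST view of single-linkage: build a minimum spanning tree over pairwise distances and delete the $k-1$ longest edges. This quadratic-memory version is for illustration only; it is not the scalable algorithm of [Y., Vadapalli ICML’18].

```python
# Sketch: MST-based single-linkage. Build the MST, drop the k-1 longest edges,
# and read off connected components as clusters.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def mst_single_linkage(X, k):
    D = squareform(pdist(X))                    # dense pairwise distances
    mst = minimum_spanning_tree(csr_matrix(D)).tocoo()
    order = np.argsort(mst.data)                # MST edges sorted by length
    keep = order[: len(mst.data) - (k - 1)]     # drop the k-1 longest edges
    pruned = csr_matrix((mst.data[keep], (mst.row[keep], mst.col[keep])),
                        shape=mst.shape)
    _, labels = connected_components(pruned, directed=False)
    return labels

X = np.random.default_rng(1).normal(size=(50, 2))
print(mst_single_linkage(X, k=3))
```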

  9. Complete Linkage • Cluster distance = distance between the two furthest points (one from each cluster)

  10. Average-Linkage Clustering • Cluster distance = average distance over all pairs of points (one from each cluster)

  11. Distance vs. Similarity • Distance $d(u, v)$ • Arbitrary: any metric satisfying the triangle inequality • Vectors: $d(u, v) = \|u - v\|$

  12. Distance vs. Similarity • Similarity $w(u, v)$ • Arbitrary: symmetric, non-negative • Metric case: monotone decreasing in the distance • Vector case: • Threshold: $w(u, v) = 1$ if $\|u - v\| \le r$, $0$ otherwise • Gaussian kernel: $w(u, v) = e^{-\|u - v\|^2 / (2\sigma^2)}$
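For concreteness, a tiny sketch of the two vector similarities above; the parameters `r` and `sigma` are illustrative placeholders, not values from the talk.

```python
# Two similarity measures for vectors: threshold and Gaussian kernel.
import numpy as np

def threshold_similarity(u, v, r=1.0):
    # w(u, v) = 1 if ||u - v|| <= r, else 0
    return 1.0 if np.linalg.norm(u - v) <= r else 0.0

def gaussian_kernel_similarity(u, v, sigma=1.0):
    # w(u, v) = exp(-||u - v||^2 / (2 sigma^2))
    return np.exp(-np.linalg.norm(u - v) ** 2 / (2 * sigma ** 2))

u, v = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(threshold_similarity(u, v, r=2.0), gaussian_kernel_similarity(u, v))
```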

  13. Dasgupta’s objective for HC [STOC’16] Given $n$ data points and a similarity measure $w(u, v)$ • Build a tree $T$ with the data points as leaves • For a pair of points $(u, v)$: look at their lowest common ancestor LCA$(u, v)$ in $T$

  14. Dasgupta’s objective for HC [STOC’16] Given $n$ data points and a similarity measure $w(u, v)$ • Build a tree $T$ with the data points as leaves • For a pair of points $(u, v)$: Minimize $\sum_{u,v} w(u,v) \cdot |\mathrm{leaves}(\mathrm{LCA}(u,v))|$ • LCA of $u, v$ taken in $T$ • $|\mathrm{leaves}(\mathrm{LCA}(u,v))|$ = # leaves under LCA(u,v)

  15. Dasgupta’s objective for HC [STOC’16] Given $n$ data points and a similarity measure $w(u, v)$ • Build a tree $T$ with the data points as leaves: Minimize $\mathrm{cost}(T) = \sum_{u,v} w(u,v) \cdot |\mathrm{leaves}(\mathrm{LCA}(u,v))|$ • $|\mathrm{leaves}(\mathrm{LCA}(u,v))|$ = # leaves under LCA(u,v) • Currently best known approximation: $O(\sqrt{\log n})$ [Charikar, Chatziafratis, SODA’17; Roy, Pokutta NIPS’16] • LP/SDP-based algorithms, don’t scale to large datasets • As hard as Sparsest Cut (no constant-factor approximation under UGC)
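A short sketch that evaluates Dasgupta's cost for a given tree; the nested-tuple tree encoding and the toy similarity matrix below are hypothetical choices made only for illustration.

```python
# Evaluating Dasgupta's cost: sum over pairs (u, v) of
# w[u, v] * (# leaves under LCA(u, v)).
import numpy as np

def dasgupta_cost(tree, w):
    def walk(node):
        # Returns (leaves under node, cost of pairs whose LCA lies inside node).
        if isinstance(node, int):
            return [node], 0.0
        left_leaves, left_cost = walk(node[0])
        right_leaves, right_cost = walk(node[1])
        leaves = left_leaves + right_leaves
        # Pairs split at this node have their LCA here: factor = len(leaves).
        cross = sum(w[u, v] for u in left_leaves for v in right_leaves)
        return leaves, left_cost + right_cost + cross * len(leaves)
    return walk(tree)[1]

w = np.array([[0, 3, 1, 1],
              [3, 0, 1, 1],
              [1, 1, 0, 3],
              [1, 1, 3, 0]], dtype=float)
good = ((0, 1), (2, 3))   # merges the similar pairs first
bad = ((0, 2), (1, 3))
print(dasgupta_cost(good, w), dasgupta_cost(bad, w))  # lower cost is better
```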

  16. Moseley-Wang objective for HC [NIPS’17] Given $n$ data points and a similarity measure $w(u, v)$ • Build a tree $T$ with the data points as leaves: Maximize $\sum_{u,v} w(u,v) \cdot \big(n - |\mathrm{leaves}(\mathrm{LCA}(u,v))|\big)$ • $|\mathrm{leaves}(\mathrm{LCA}(u,v))|$ = # leaves under LCA(u,v)

  17. Moseley-Wang objective for HC [NIPS’17] Given $n$ data points and a similarity measure $w(u, v)$ • Build a tree $T$ with the data points as leaves: Maximize $\sum_{u,v} w(u,v) \cdot \big(n - |\mathrm{leaves}(\mathrm{LCA}(u,v))|\big)$ • $|\mathrm{leaves}(\mathrm{LCA}(u,v))|$ = # leaves under LCA(u,v) • Average-linkage gives a 1/3-approximation [Moseley, Wang, NIPS’17] • Random recursive partitioning also gives 1/3 (in expectation) • Best known approximation is $1/3 + \epsilon$ for a small constant $\epsilon$ [Charikar, Chatziafratis, Niazadeh SODA’19] • Uses SDP, doesn’t scale to large data • Average-linkage can’t beat 1/3
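A matching sketch for the Moseley-Wang reward, using the same hypothetical nested-tuple tree encoding as the Dasgupta example above.

```python
# Evaluating the Moseley-Wang objective: sum over pairs (u, v) of
# w[u, v] * (n - # leaves under LCA(u, v)).
import numpy as np

def moseley_wang(tree, w):
    n = w.shape[0]
    def walk(node):
        if isinstance(node, int):
            return [node], 0.0
        ll, lv = walk(node[0])
        rl, rv = walk(node[1])
        leaves = ll + rl
        cross = sum(w[u, v] for u in ll for v in rl)
        # Pairs whose LCA is this node get reward (n - # leaves under the node).
        return leaves, lv + rv + cross * (n - len(leaves))
    return walk(tree)[1]

w = np.array([[0, 3, 1, 1],
              [3, 0, 1, 1],
              [1, 1, 0, 3],
              [1, 1, 3, 0]], dtype=float)
print(moseley_wang(((0, 1), (2, 3)), w))  # higher reward is better
```

Note that for any tree the Moseley-Wang reward plus Dasgupta's cost equals $n \cdot \sum_{u,v} w(u,v)$, so the two objectives share the same optimal tree but have very different approximation landscapes.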

  18. Random recursive partitioning [MW’17] Algorithm: split the points randomly into two parts and recurse Analysis: decompose the objective over triples of points and compare to a per-triple upper bound • Max-Upper

  19. Random recursive partitioning [MW’17] Algorithm: split the points randomly into two parts and recurse Analysis: decompose the objective over triples of points • Max-Upper $= \sum_{\{i,j,k\}} \max\{w(i,j), w(i,k), w(j,k)\}$
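A sketch of the random recursive partitioning baseline as described above: each point goes to the left or right child independently and uniformly at random, and we recurse on both sides; the function name and re-sampling of empty splits are illustrative implementation choices.

```python
# Random recursive partitioning [MW'17]: split uniformly at random and recurse.
# Yields a random binary tree; a 1/3-approximation in expectation for the
# Moseley-Wang objective (per the slide).
import random

def random_recursive_partition(points, rng=None):
    rng = rng or random.Random(0)
    points = list(points)
    if len(points) <= 1:
        return points[0] if points else None
    # Each point goes left or right independently with probability 1/2;
    # re-sample if one side ends up empty so the recursion makes progress.
    while True:
        mask = [rng.random() < 0.5 for _ in points]
        left = [p for p, m in zip(points, mask) if m]
        right = [p for p, m in zip(points, mask) if not m]
        if left and right:
            break
    return (random_recursive_partition(left, rng),
            random_recursive_partition(right, rng))

print(random_recursive_partition(range(8)))
```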

  20. Can we do better for vector data? [Charikar, Chatziafratis, Niazadeh, Y. AISTATS’19] (one-dimensional data) Algorithm: Random-Cut$(x_1, \dots, x_n)$ • Pick a cut point $c$ uniformly at random in $[\min_i x_i, \max_i x_i]$ • Recursively cluster the points in $\{x_i \le c\}$ and $\{x_i > c\}$ Analysis: gives a ½-approximation
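A sketch of Random-Cut for one-dimensional data, following the steps above; the fallback for a degenerate cut is an implementation choice, not part of the talk.

```python
# Random-Cut for points on a line: pick a cut uniformly at random in the
# current range and recurse on the two sides.
import random

def random_cut(xs, rng=None):
    rng = rng or random.Random(0)
    xs = sorted(xs)
    if len(xs) <= 1:
        return xs[0] if xs else None
    c = rng.uniform(xs[0], xs[-1])        # uniform cut point in [min, max]
    left = [x for x in xs if x <= c]
    right = [x for x in xs if x > c]
    if not right:                         # degenerate cut (e.g. all points equal)
        left, right = xs[:1], xs[1:]
    return (random_cut(left, rng), random_cut(right, rng))

print(random_cut([0.1, 0.2, 0.9, 1.0, 5.0]))
```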

  21. Can we do better for vector data? [Charikar, Chatziafratis, Niazadeh, Y. AISTATS’19] (one-dimensional data) Average-linkage also gives ½ Conjectures: • Average-linkage gives ¾? • Dynamic programming gives the optimum solution?

  22. Can we do better for vector data? • In general, the vector case is not easier than the general case • General instances can be embedded into high-dimensional vectors • Average-linkage can’t beat 1/3 even for vectors

  23. Projected Random Cut Algorithm Algorithm: • Pick a random Gaussian vector $g \sim N(0, I_d)$ • Compute projections $x_i = \langle g, v_i \rangle$ • Run Random-Cut on $(x_1, \dots, x_n)$
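A sketch of Projected Random Cut following the three steps above: one random Gaussian direction, projections onto it, then Random-Cut on the resulting line; the index-based tree representation is an illustrative choice.

```python
# Projected Random Cut: project onto one random Gaussian direction, then run
# Random-Cut on the one-dimensional projections. Returns a binary tree over
# the original point indices.
import numpy as np

def projected_random_cut(V, seed=0):
    rng = np.random.default_rng(seed)
    g = rng.normal(size=V.shape[1])            # random Gaussian direction
    x = V @ g                                  # projections <g, v_i>
    order = np.argsort(x)

    def random_cut(idx):                       # idx: indices sorted by projection
        if len(idx) == 1:
            return int(idx[0])
        lo, hi = x[idx[0]], x[idx[-1]]
        c = rng.uniform(lo, hi)                # uniform cut in the current range
        split = np.searchsorted(x[idx], c, side="right")
        split = min(max(split, 1), len(idx) - 1)   # keep both sides non-empty
        return (random_cut(idx[:split]), random_cut(idx[split:]))

    return random_cut(order)

V = np.random.default_rng(1).normal(size=(6, 3))
print(projected_random_cut(V))
```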

  24. Projected Random Cut Algorithm • Theorem: Projected Random Cut gives an approximation strictly better than 1/3 under the Gaussian kernel similarity, where $w(u, v) = e^{-\|u - v\|^2 / (2\sigma^2)}$ • Key lemma: the probability of not scoring an edge of a triangle is proportional to the opposite angle

  25. Key lemma

  26. Projected Random Cut gives the guarantee above: if $w(i, j)$ is the largest similarity in a triple, then edge $(i, j)$ is the shortest side of the triangle, its opposite angle is the smallest (at most $\pi/3$), and by the key lemma we score it with the highest probability among the three edges (at least 1/3)

  27. Experimental results for PRC

  28. Scaling it up • Ran PRC on the largest vector datasets from the UCI ML repository (SIFT 10M, HIGGS) • Approx. $10^7$ points in up to 128 dimensions • Fast (near-linear) running time • Can run on much larger data too • Can scale Single-Linkage clustering too, but: • Uses PCA to reduce dimension • Requires a large Apache Spark cluster

  29. Thank you! • Questions? • Some other topics I work(ed) on • Other clustering methods (e.g. correlation clustering) • Massively parallel, streaming, sublinear algorithms • Data compression methods for data analysis • Submodular optimization
