spectral clustering between friends
One of these things is not like the other…
spectral clustering (a la Ng-Jordan-Weiss): data → similarity graph; edges have weights w(i, j), e.g. [formula not preserved]
the Laplacian: diagonal matrix D; Normalized Laplacian: [formula not preserved]
energy: [formula not preserved]
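The formulas on these slides did not survive extraction. As a reference, the standard textbook definitions are sketched below; that the slides used exactly these forms (in particular the Gaussian weight with bandwidth σ, which is the affinity used by Ng, Jordan and Weiss) is an assumption. W denotes the matrix of weights w(i, j) and f is any function on the vertices.

```latex
% Example similarity weight (assumed; sigma is a bandwidth parameter)
w(i,j) = \exp\!\left(-\frac{\lVert x_i - x_j\rVert^2}{2\sigma^2}\right)

% Diagonal degree matrix and (unnormalized) Laplacian
D_{ii} = \sum_{j} w(i,j), \qquad L = D - W

% Energy (quadratic form) of a function f on the vertices
f^{\top} L f = \sum_{\{i,j\}} w(i,j)\,\bigl(f(i) - f(j)\bigr)^2

% Normalized Laplacian
\mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}
```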
Normalized Laplacian: compute the first k eigenvectors v1, v2, …, vk; these give the spectral embedding.
clustering: run k-means to cluster the points.
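The pipeline on the preceding slides (similarity graph, normalized Laplacian, first k eigenvectors, k-means) can be sketched in a few lines of NumPy. This is a minimal illustration, not the speaker's code: the Gaussian weight and its bandwidth `sigma`, the row normalization of the embedding, the farthest-point initialization, and the function names `spectral_embedding`, `kmeans`, and `njw_cluster` are all assumptions made for the example.

```python
import numpy as np

def spectral_embedding(X, k, sigma=1.0):
    """Embed each data point using the first k eigenvectors of the normalized Laplacian."""
    # Similarity graph: Gaussian weights w(i, j) (an assumed choice), no self-loops.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Normalized Laplacian I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the first k columns
    # are the eigenvectors of the k smallest eigenvalues.
    _, vecs = np.linalg.eigh(L)
    V = vecs[:, :k]
    # Row normalization, as in the Ng-Jordan-Weiss variant.
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def kmeans(Y, k, iters=50):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Y[labels == c].mean(axis=0)
    return labels

def njw_cluster(X, k, sigma=1.0):
    """Spectral embedding followed by k-means on the embedded points."""
    return kmeans(spectral_embedding(X, k, sigma), k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated blobs of 20 points each.
    X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(0, 0.3, (20, 2)) + 5.0])
    print(njw_cluster(X, 2))
```

The dense n × n weight matrix and full eigendecomposition keep the sketch short; for large data sets a sparse graph and an iterative eigensolver would be the usual substitutes.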
spectral clustering … what to prove? Many, many variants… (e.g. Sidi et al. 2011 [TelAviv-SFU]). Many opinions: "it's amazing!", "it's mediocre!", "it's antiquated".
why should spectral clustering work? The spectral embedding of k perfect clusters.
graph expansion. Expansion: for a subset S ⊆ V, define [formula not preserved], where E(S) = set of edges with exactly one endpoint in S.
graph expansion (figure: subsets S1, S2, S3, S4). Expansion: for a subset S ⊆ V, define [formula not preserved], where E(S) = set of edges with exactly one endpoint in S. k-way expansion constant: [formula not preserved]. Theorem [Cheeger70, Alon-Milman85, Sinclair-Jerrum89]: [formula not preserved] ("most important result in spectral graph theory" -- Wikipedia)
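The Cheeger-type theorem cited on this slide can be checked numerically on a small graph. The sketch below assumes one common normalization, which may differ from the one on the slide: the expansion of S is φ(S) = w(E(S)) / vol(S), where vol(S) is the total weighted degree of S, the 2-way expansion constant is the minimum over S of max(φ(S), φ(V \ S)), and the inequality reads λ2/2 ≤ ρ(2) ≤ √(2 λ2) with λ2 the second-smallest eigenvalue of the normalized Laplacian. The helper names and the example graph (two triangles joined by one edge) are made up for illustration.

```python
import itertools
import numpy as np

def expansion(W, S):
    """phi(S): weight of edges leaving S divided by the total degree of S."""
    mask = np.zeros(len(W), dtype=bool)
    mask[list(S)] = True
    cut = W[mask][:, ~mask].sum()
    vol = W[mask].sum()
    return cut / vol

def two_way_expansion(W):
    """Brute-force min over proper subsets S of max(phi(S), phi(complement))."""
    n = len(W)
    best = np.inf
    for r in range(1, n):
        for S in itertools.combinations(range(n), r):
            T = [v for v in range(n) if v not in S]
            best = min(best, max(expansion(W, S), expansion(W, T)))
    return best

def lambda_2(W):
    """Second-smallest eigenvalue of the normalized Laplacian."""
    s = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - s[:, None] * W * s[None, :]
    return np.linalg.eigvalsh(L)[1]

def two_triangles():
    """Adjacency matrix of two triangles {0,1,2} and {3,4,5} joined by edge (2,3)."""
    W = np.zeros((6, 6))
    for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        W[i, j] = W[j, i] = 1.0
    return W

if __name__ == "__main__":
    W = two_triangles()
    rho, lam = two_way_expansion(W), lambda_2(W)
    print(lam / 2 <= rho <= np.sqrt(2 * lam))
```

The brute-force search is exponential in the number of vertices and is only meant for toy graphs; the point of the theorem is that λ2 certifies the same quantity up to a square root.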
Miclo's conjecture (figure: subsets S1, S2, S3, S4). Higher-order Cheeger Conjecture [Miclo 08]: for every graph G and k ∈ ℕ, we have [formula not preserved] for some C(k) depending only on k. [Lee-OveisGharan-Trevisan 2012]: true with [formula not preserved]. This bound for C(k) is tight. The algorithm of Ng-Jordan-Weiss works, changing the last step.
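The inequality itself is missing from the transcript. For orientation, the usual way the higher-order Cheeger inequality is stated is sketched below; the notation ρ_G(k) for the k-way expansion constant and φ for the expansion are assumptions about the slide, and the value C(k) = O(k²) is the one given in the published Lee, Oveis Gharan and Trevisan paper, not something recoverable from this text.

```latex
% k-way expansion constant: best k disjoint nonempty sets, judged by the worst one
\rho_G(k) = \min_{\substack{S_1,\dots,S_k \subseteq V \\ \text{disjoint, nonempty}}}
            \; \max_{1 \le i \le k} \phi(S_i)

% Higher-order Cheeger inequality, with \lambda_k the k-th smallest
% eigenvalue of the normalized Laplacian
\frac{\lambda_k}{2} \;\le\; \rho_G(k) \;\le\; C(k)\,\sqrt{\lambda_k},
\qquad C(k) = O(k^2)
```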
the clustering step: instead of "Run k-means to cluster the points", we do random projection followed by a random space partition.
hybrid algorithms: Suppose the data has some nice low-dimensional structure. Spectral embedding could lose that information: we are back in a high-dimensional space.
hybrid algorithms: Suppose the data has some nice low-dimensional structure. Use spectral embedding distances to deform the data, then do clustering on the transformed data set.
the unique games conjecture: Consider linear equations in two variables, modulo a prime p. Variables: x1, x2, …, xn. Equations: x12 + x2 = 4, x4 – 3x7 = 1, x9 + 8x12 = 9, … If there exists a solution that satisfies 99% of the equations, can you find one that satisfies 10%? Conjectured to be NP-hard [Khot 2002].
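To make the question concrete, the fraction of equations an assignment satisfies can be computed directly. In the sketch below each equation a·x_i + b·x_j = c (mod p) is stored as a tuple `(i, a, j, b, c)`; this encoding, the choice p = 11, the function name `satisfied_fraction`, and the reading of "x12" as the variable with index 12 are all assumptions made for the example.

```python
def satisfied_fraction(equations, assignment, p):
    """Fraction of equations a*x_i + b*x_j = c (mod p) satisfied by the assignment."""
    satisfied = 0
    for i, a, j, b, c in equations:
        if (a * assignment[i] + b * assignment[j]) % p == c % p:
            satisfied += 1
    return satisfied / len(equations)

# The three sample equations from the slide, with an assumed modulus p = 11:
#   x12 + x2 = 4,   x4 - 3*x7 = 1,   x9 + 8*x12 = 9
P = 11
EQUATIONS = [(12, 1, 2, 1, 4), (4, 1, 7, -3, 1), (9, 1, 12, 8, 9)]

if __name__ == "__main__":
    good = {12: 1, 2: 3, 4: 1, 7: 0, 9: 1}   # satisfies every equation
    zeros = {12: 0, 2: 0, 4: 0, 7: 0, 9: 0}  # satisfies none of them
    print(satisfied_fraction(EQUATIONS, good, P))
    print(satisfied_fraction(EQUATIONS, zeros, P))
```

Checking a given assignment is easy; the conjecture concerns the difficulty of finding even a mediocre assignment when a very good one is promised to exist.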
a spectral attack: Construct a graph with one vertex for every variable, and an edge whenever two variables occur in the same constraint (x12 + x2 = 4, x4 – 3x7 = 1, x9 + 8x12 = 9, …). A "good" solution to the equations implies a partition of the graph into p nice clusters!
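The graph construction described on this slide is short enough to write out. The sketch assumes each equation is stored as a tuple `(i, a, j, b, c)` meaning a·x_i + b·x_j = c (mod p); that encoding and the name `constraint_graph` are hypothetical, and only the variable indices matter for the graph.

```python
from collections import defaultdict

def constraint_graph(equations):
    """One vertex per variable; an edge whenever two variables share an equation."""
    adjacency = defaultdict(set)
    for i, _a, j, _b, _c in equations:
        adjacency[i].add(j)
        adjacency[j].add(i)
    return dict(adjacency)

# The sample equations from the slide: x12 + x2 = 4, x4 - 3*x7 = 1, x9 + 8*x12 = 9
EQUATIONS = [(12, 1, 2, 1, 4), (4, 1, 7, -3, 1), (9, 1, 12, 8, 9)]

if __name__ == "__main__":
    for vertex, neighbours in sorted(constraint_graph(EQUATIONS).items()):
        print(vertex, sorted(neighbours))
```

The adjacency sets can be converted to a weight matrix and passed to any spectral partitioning routine.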
a spectral attack (figure: subsets S1, S2, S3, S4). Higher-order Cheeger Theorem: for every graph G and k ∈ ℕ, we have [formula not preserved]. Unnecessary for large k: [Arora-Barak-Steurer 2010]. A better asymptotic dependence would disprove the UGC.