Explore low-rank weighted correlation clustering and its implications for graph clustering: polynomial-time solutions and NP-hardness results for rank-d matrices, heuristic algorithms, and connections to specialized algorithms.
My relationship with correlation clustering started in 2016
• From June to July 2016 I visited Melbourne as part of the East Asia and Pacific Summer Institute Fellowship.
• Tony and I studied weighted correlation clustering with low-rank advice.
• The project was based on an observation Tony and my advisor David Gleich made in 2015 about rank-1 correlation clustering.
[figure: small signed graph with weighted edges]
Nate Veldt
Many algorithms focus on complete, unweighted correlation clustering
• Given a signed graph G
• Each edge indicates similarity (+) or dissimilarity (−)
[figure: complete signed graph with + and − edges]
In general, edges can be weighted
Weights can be stored in an adjacency matrix.
[figure: signed graph with weighted edges and its adjacency matrix]
The rank-1 positive semidefinite case is very simple
If the weight matrix is A = vv^T for some vector v, then each entry A_ij = v_i v_j is positive exactly when v_i and v_j have the same sign. Ordering the nodes by their entries in v and splitting them by sign gives a perfect clustering: every positive edge is inside a cluster and every negative edge crosses clusters.
[figure: rank-1 example; ordering v gives a perfect clustering]
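The rank-1 observation above can be checked in a few lines. This is a minimal sketch (the vector `v` is a made-up example): build A = vv^T, cluster nodes by the sign of v, and confirm there are no mistakes.

```python
import numpy as np

# Rank-1 PSD case: A = v v^T, so sign(A[i, j]) = sign(v_i * v_j),
# and splitting nodes by the sign of v is a perfect clustering.
v = np.array([3.0, -2.0, 2.0, -1.0])   # hypothetical node values
A = np.outer(v, v)                      # rank-1 PSD weight matrix

clusters = (v > 0).astype(int)          # cluster 0: v_i < 0, cluster 1: v_i > 0

# Count mistakes: positive edges across clusters, negative edges within.
mistakes = 0
n = len(v)
for i in range(n):
    for j in range(i + 1, n):
        same = clusters[i] == clusters[j]
        if (A[i, j] > 0 and not same) or (A[i, j] < 0 and same):
            mistakes += 1
print(mistakes)  # 0
```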
A simple solution for rank-1 positive semidefinite correlation clustering always exists. What happens for other low-rank matrices?
Our contributions:
• A polynomial-time solution for rank-d PSD matrices
• An NP-hardness result for matrices with negative eigenvalues
• A heuristic algorithm for PSD matrices
Rank-d PSD correlation clustering is equivalent to clustering vectors in R^d
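The equivalence rests on the fact that any rank-d PSD matrix A factors as A = VV^T, so each node i gets a vector x_i in R^d with A_ij = x_i · x_j. A small sketch of recovering such vectors via an eigendecomposition (the data here is synthetic):

```python
import numpy as np

# A PSD matrix of rank d factors as A = V V^T, giving each node a
# vector in R^d whose dot products reproduce the edge weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))      # hypothetical points in R^2
A = X @ X.T                           # rank-2 PSD weight matrix

w, U = np.linalg.eigh(A)
keep = w > 1e-10                      # nonzero eigenvalues give the rank
V = U[:, keep] * np.sqrt(w[keep])     # rows of V are the recovered vectors

print(np.allclose(V @ V.T, A))  # True
```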
Main observations
For each cluster C_i, define its "sum point" S_i as the sum of the vectors in C_i.
• For a fixed clustering, the objective can be written in terms of the sum points S_i.
• Also, we can show that the number of clusters in an optimal solution is bounded above by d + 1.
[figure: cluster C_i with its sum point S_i]
Why is the number of clusters bounded?
If the clustering is optimal, the sum points will have pairwise negative dot products, i.e. S_i · S_j < 0 for i ≠ j. If not, this would indicate that clusters i and j on the whole are "similar", and merging them would improve the objective.
Fact: The maximum number of vectors in R^d with pairwise negative dot products is d + 1. [Rankin 1947]
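The merging step has a one-line algebraic core. Assuming the within-cluster weight takes the form Σ_i ||S_i||² (this form is my assumption; the slides only say the objective can be written in terms of sum points), merging clusters i and j changes it by exactly 2 S_i · S_j, so a positive dot product means the merge helps:

```python
import numpy as np

# Assumed identity behind the merging argument:
# ||S_i + S_j||^2 - ||S_i||^2 - ||S_j||^2 = 2 * (S_i . S_j),
# so merging helps precisely when the sum points have positive dot product.
rng = np.random.default_rng(1)
Si = rng.standard_normal(3)   # hypothetical sum points
Sj = rng.standard_normal(3)

gain = np.dot(Si + Sj, Si + Sj) - np.dot(Si, Si) - np.dot(Sj, Sj)
print(np.isclose(gain, 2 * np.dot(Si, Sj)))  # True
```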
Our problem can be seen as a special case of the vector partition problem
The vector partition problem can be solved in polynomial time by visiting the vertices of the d²-dimensional signing zonotope [Onn & Schulman 2001]. This leads to a polynomial-time algorithm for rank-d positive semidefinite CC.
In practice we developed a faster heuristic for sampling vertices of the zonotope.
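The zonotope algorithm itself is involved, but the d + 1 cluster bound already enables a tiny brute-force baseline (this is a sanity check of my own, not the paper's method): enumerate all assignments of n points into at most d + 1 clusters and keep the best.

```python
import itertools
import numpy as np

# Brute-force baseline (not the zonotope algorithm): an optimal rank-d
# PSD solution needs at most d + 1 clusters, so for tiny n we can try
# every labeling with d + 1 labels.
def cc_mistakes(A, labels):
    """Total weight of disagreements for a signed weight matrix A."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            same = labels[i] == labels[j]
            if A[i, j] > 0 and not same:
                total += A[i, j]        # positive edge cut apart
            elif A[i, j] < 0 and same:
                total += -A[i, j]       # negative edge kept together
    return total

X = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2], [0.1, -1.0]])
A = X @ X.T                 # rank-2 PSD, so 3 clusters suffice
d = 2
best = min(itertools.product(range(d + 1), repeat=len(X)),
           key=lambda labels: cc_mistakes(A, labels))
print(cc_mistakes(A, best))
```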
Observation: Assuming low-rank edge weights leads to new complexity results, new algorithms, and connections to other problems.
General question: What other special weighted versions of correlation clustering lead to specialized algorithms and new connections?
A new idea: simple but unequal weights for positive and negative edges
Assign weights with respect to a resolution parameter λ ∈ (0, 1).
• No particular connection to low-rank correlation clustering. However, it similarly leads to:
• New complexity results and algorithms
• Connections to other known partitioning problems
This is motivated by applications to graph clustering
Given G = (V, E), construct a signed graph G' = (V, E+, E−), an instance of correlation clustering. Without weights, correlation clustering on G' is the same as a problem called cluster editing.
[figure: graph G and its signed graph G']
Adding weights: each edge of G becomes a positive edge of weight 1 − λ, and each non-edge becomes a negative edge of weight λ. The parameter λ controls your interpretation of the existence or absence of an edge in the network.
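The construction is mechanical enough to sketch directly (the weight scheme follows the (1 − λ)/λ split used in the two-cluster derivation later in the talk):

```python
import numpy as np

# LambdaCC construction: edges of G get positive weight 1 - lam,
# non-edges get negative weight lam.
def lambda_cc_weights(adj, lam):
    """adj: 0/1 symmetric adjacency matrix; returns signed weight matrix."""
    W = np.where(adj == 1, 1.0 - lam, -lam)
    np.fill_diagonal(W, 0.0)   # no self-loops
    return W

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
W = lambda_cc_weights(adj, lam=0.25)
print(W[0, 1], W[0, 3])  # edge -> +0.75, non-edge -> -0.25
```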
LambdaCC generalizes several graph clustering objectives
Modularity, normalized cut, sparsest cut, cluster deletion, and standard correlation clustering (cluster editing), in both degree-weighted and standard variants (m = |E|).
Let's take a quick look at two of these: sparsest cut and cluster deletion!
Sparse and dense clustering objectives
• Sparsest cut: minimize cut(S) / (|S| |S̄|) over sets S (the scaled version).
• Cluster deletion: minimize the number of edges removed to partition the graph into cliques.
Consider a restriction to two clusters S and S̄
Positive mistakes: (1 − λ) cut(S)
Negative mistakes: λ|E−| − λ[ |S||S̄| − cut(S) ]
Total weight of mistakes = cut(S) − λ|S||S̄| + λ|E−|
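The identity above is easy to verify numerically. This sketch counts the weighted mistakes of a bipartition directly and compares against the closed form (the graph is a made-up example):

```python
import numpy as np

# Check: total mistakes = cut(S) - lam*|S|*|S_bar| + lam*|E_minus|.
def two_cluster_mistakes(adj, S, lam):
    """Direct count of LambdaCC mistakes for the bipartition (S, V \\ S)."""
    n = adj.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            same = (i in S) == (j in S)
            if adj[i, j] == 1 and not same:   # positive edge cut
                total += 1.0 - lam
            elif adj[i, j] == 0 and same:     # negative edge inside
                total += lam
    return total

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
S, lam = {0, 1}, 0.3
n = adj.shape[0]
cut = sum(adj[i, j] for i in S for j in range(n) if j not in S)
E_minus = n * (n - 1) // 2 - adj.sum() // 2   # number of non-edges
formula = cut - lam * len(S) * (n - len(S)) + lam * E_minus
print(np.isclose(two_cluster_mistakes(adj, S, lam), formula))  # True
```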
Two-cluster LambdaCC can be written as minimizing cut(S) − λ|S||S̄|, plus the constant λ|E−|.
Note that cut(S) − λ|S||S̄| < 0 exactly when cut(S) / (|S||S̄|) < λ. This is a scaled version of sparsest cut!
The relationship with sparsest cut holds in general
The general LambdaCC objective can be written in an analogous form.
Theorem: Minimizing this objective produces clusters with scaled sparsest cut at most λ (if they exist). There exists some λ' such that minimizing LambdaCC will return the minimum sparsest cut partition.
For large λ, LambdaCC generalizes cluster deletion
Cluster deletion is correlation clustering with infinite penalties on negative edges. We show this is equivalent to LambdaCC for the right choice of λ, where λ ≫ (1 − λ).
Algorithms for LambdaCC
• Adapting the approach of van Zuylen and Williamson, we obtain new algorithms based on LP relaxations:
  • ThreeLP: a 3-approximation for LambdaCC when λ > ½
  • TwoLP: a 2-approximation for cluster deletion (the best known approximation for cluster deletion!)
• We also provide scalable heuristic algorithms:
  • Lambda-Louvain: based on the Louvain method for modularity
  • GrowCluster: a greedy agglomeration technique
[A. van Zuylen and D. P. Williamson. Deterministic pivoting algorithms for constrained ranking and clustering problems. Mathematics of Operations Research, 34(3):594–620, 2009.]
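To give a flavor of the greedy agglomeration idea, here is a minimal sketch in the spirit of GrowCluster; the actual algorithm's details are not in the slides, so the stopping rule and update are my assumptions. It grows one cluster from a seed, always adding the node that most decreases the LambdaCC objective:

```python
import numpy as np

# Hypothetical greedy sketch (not the paper's exact GrowCluster):
# grow a cluster from a seed, adding whichever node improves the
# LambdaCC objective most, until no addition helps.
def grow_cluster(adj, seed, lam):
    n = adj.shape[0]
    cluster, outside = {seed}, set(range(n)) - {seed}
    while True:
        # Adding v removes a (1 - lam) mistake per edge to the cluster
        # and adds a lam mistake per non-edge to the cluster.
        def delta(v):
            edges = sum(adj[v, u] for u in cluster)
            return -edges * (1.0 - lam) + (len(cluster) - edges) * lam
        best = min(outside, key=delta, default=None)
        if best is None or delta(best) >= 0:
            return cluster
        cluster.add(best)
        outside.remove(best)

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
print(sorted(grow_cluster(adj, seed=0, lam=0.4)))  # [0, 1, 2]
```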
We cluster social networks with various λ to understand the correlation between communities and metadata attributes
Dataset: Cornell University (Facebook100). For each attribute (student/faculty status, graduation year, dorm) we plot the probability that two people who share a cluster also share the metadata attribute, alongside the probability that they share a related fake attribute. The gap shows that there is a noticeable correlation between each attribute and the clustering.
The same experiment on Swarthmore and Yale: student/faculty status and graduation year peak early, while the dorm attribute is more correlated with small, dense communities.
Conclusions and other work
• We've considered several special cases of correlation clustering that come with novel approximation guarantees and are motivated by different applications in data science.
Other work:
• Solving the LP relaxation of CC (with James Saunderson)
• Choosing λ for LambdaCC
• Higher-order correlation clustering
Future work:
• Correlation clustering for record linkage
• Practical algorithms for higher-order correlation clustering
• New questions about other low-rank objectives
Thanks!
Papers (arXiv): 1611.07305 (at WWW 2017), 1712.05825 (at WWW 2018), 1809.09493 (ISAAC, to appear), 1809.01678 (submitted)
Software (GitHub): nveldt/LamCC, nveldt/MetricOptimization
With David Gleich (Purdue), Tony Wirth (Melbourne), and James Saunderson (Monash)