BiG-Align: Fast Bipartite Graph Alignment

BiG-Align: Fast Bipartite Graph Alignment
Danai Koutra, Hanghang Tong, David Lubensky
IEEE ICDM, 7-10 December 2013, Dallas, Texas, USA

Can we identify users across social networks? Same or “similar” users? Danai Koutra (CMU)

More applications?
• chemical compound comparison
• link prediction & viral marketing
• protein-protein alignment
• optical character recognition
• IR: synonym extraction
• wiki translation
• structure matching in DB

RoadMap
• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
• Conclusions

Problem Definition
INPUT: A, B (users × groups adjacency matrices)
OUTPUT: P and Q (permutation matrices) s.t. $\min_{P,Q} \|PAQ - B\|_F^2$
• P (users) permutes the users of A; Q (groups) permutes the groups of A
• Graph isomorphism: HARD (P or NP-complete?)
• Subgraph isomorphism: NP-complete
• And now what? Constraints / relaxations
[Figure: binary users × groups matrices A and B; PAQ is the user/group permutation of A that is compared against B]
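
To make the objective concrete, here is a minimal Python sketch (not from the talk) that evaluates $\|PAQ - B\|_F^2$ on a toy users × groups matrix. The toy matrices and the helper names `matmul`, `frob_sq` and `alignment_cost` are illustrative assumptions.

```python
def matmul(X, Y):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def frob_sq(X, Y):
    """Squared Frobenius norm of X - Y."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def alignment_cost(P, A, Q, B):
    """Objective ||PAQ - B||_F^2 for a user matrix P and a group matrix Q."""
    return frob_sq(matmul(matmul(P, A), Q), B)

# Toy data (assumed): 3 users x 2 groups.
A = [[1, 0],
     [0, 1],
     [1, 1]]
# B is A with the users reordered as (2, 0, 1) and the two groups swapped.
B = [[1, 1],
     [0, 1],
     [1, 0]]

P_true = [[0, 0, 1],   # row j of P selects which user of A lands in row j of B
          [1, 0, 0],
          [0, 1, 0]]
Q_true = [[0, 1],      # swaps the two group columns
          [1, 0]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
I2 = [[1, 0], [0, 1]]

print(alignment_cost(P_true, A, Q_true, B))  # 0: the true permutations align A to B
print(alignment_cost(I3, A, I2, B))          # 2: leaving A unpermuted mismatches two entries
```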

Problem Definition: constraints
INPUT: A, B
OUTPUT: P, Q correspondence matrices s.t. $\min_{P,Q} \|PAQ - B\|_F^2$
CONSTRAINTS:
• $P_{ij}$, $Q_{ij}$ = probabilities (not a 1-1 mapping)
• sparse matrices P and Q (more efficient for large-scale graphs)
[Figure: P (users) and Q (groups) applied to A and compared against B]

What’s different? BiG-Align vs. other approaches
• Focus on bipartite graphs
• New optimization problem/constraints
The hope is that the specific graph structure will lead to more accurate graph alignment.

Why bipartite graphs?
• ubiquitous, e.g. users-files, authors-papers, customers-products, users-messages/groups, user-movie rating graphs
• coupled alignment: individual (nodes) and community level
• conversion of a unipartite graph to a bipartite one: clustering + coupled alignment (the second point above)
• general formulation:
  • match clouds of points (point-feature graph)
  • tensors (e.g. time-evolving, or another 3rd dimension)
[Figures: nodes × communities matrix; users × email × time tensor]

BiG-Align: algorithm (details)
• alternating, projected gradient descent, repeated until convergence
• Probabilistic constraint
• Sparsity constraint: $\min f = \min_{P,Q} \|PAQ - B\|_F^2 + \lambda \sum_{i,j} P_{ij} + \mu \sum_{i,j} Q_{ij}$
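
The alternating scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation (which the talk reports was written in Matlab): the probabilistic constraint is enforced by clipping every entry to [0, 1], the step size `eta` is a small constant, a fixed iteration count stands in for the convergence test, and the names `big_align_sketch`, `cost` and `step` are hypothetical.

```python
def matmul(X, Y):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def residual(P, A, Q, B):
    """PAQ - B."""
    return [[x - b for x, b in zip(rx, rb)]
            for rx, rb in zip(matmul(matmul(P, A), Q), B)]

def cost(P, A, Q, B, lam, mu):
    """f = ||PAQ - B||_F^2 + lam * sum(P) + mu * sum(Q)."""
    fit = sum(r * r for row in residual(P, A, Q, B) for r in row)
    return fit + lam * sum(map(sum, P)) + mu * sum(map(sum, Q))

def step(M, G, eta):
    """Gradient step, then projection of every entry onto [0, 1] (assumed projection)."""
    return [[min(1.0, max(0.0, m - eta * g)) for m, g in zip(rm, rg)]
            for rm, rg in zip(M, G)]

def big_align_sketch(A, B, P, Q, lam=0.01, mu=0.01, eta=0.01, iters=200):
    for _ in range(iters):
        # Update P with Q fixed: grad_P = 2 (PAQ - B)(AQ)^T + lam
        R = residual(P, A, Q, B)
        GP = [[2 * g + lam for g in row]
              for row in matmul(R, transpose(matmul(A, Q)))]
        P = step(P, GP, eta)
        # Update Q with P fixed: grad_Q = 2 (PA)^T (PAQ - B) + mu
        R = residual(P, A, Q, B)
        GQ = [[2 * g + mu for g in row]
              for row in matmul(transpose(matmul(P, A)), R)]
        Q = step(Q, GQ, eta)
    return P, Q

# Toy usage (assumed data): 3 users x 2 groups, uniform starting point.
A = [[1, 0], [0, 1], [1, 1]]
B = [[1, 1], [0, 1], [1, 0]]
P0 = [[1 / 3] * 3 for _ in range(3)]
Q0 = [[0.5] * 2 for _ in range(2)]
P, Q = big_align_sketch(A, B, P0, Q0)
print(cost(P0, A, Q0, B, 0.01, 0.01), cost(P, A, Q, B, 0.01, 0.01))
```

The constant step is only safe here because the toy matrices are tiny; the optimization slides that follow discuss better step choices.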

RoadMap
• Problem Definition
• What’s different?
• BiG-Align
  • Optimizations
• Uni-Align
• Conclusions

BiG-Align: Optimizations (details)
• alternating, projected gradient descent until convergence

Optimization 1: Structurally equivalent nodes (details)
• Aggregation to super-nodes
[Figure: Graph A with structurally equivalent nodes merged into super-nodes]
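
As a rough illustration of the aggregation step, the sketch below merges users whose rows in A are identical (the same set of groups) into one super-node. Treating "structurally equivalent" as "identical rows" is an assumption, and the function name `aggregate_equivalent_rows` is hypothetical; how the super-nodes are weighted or expanded afterwards is not recoverable from the transcript.

```python
def aggregate_equivalent_rows(A):
    """Merge rows of A with identical neighbor sets into super-nodes.

    Returns (A_super, members): one representative row per super-node and
    the original row indices that each super-node stands for.
    """
    groups = {}
    for i, row in enumerate(A):
        groups.setdefault(tuple(row), []).append(i)  # dicts keep insertion order
    A_super = [list(key) for key in groups]
    members = list(groups.values())
    return A_super, members

# Toy data (assumed): users 0, 1, 3 share one group set; users 2, 4 share another.
A = [[1, 0, 0],
     [1, 0, 0],
     [0, 1, 1],
     [1, 0, 0],
     [0, 1, 1]]
A_super, members = aggregate_equivalent_rows(A)
print(A_super)   # [[1, 0, 0], [0, 1, 1]]
print(members)   # [[0, 1, 3], [2, 4]]
```

The alignment then runs on the smaller matrix `A_super`, which is where the speed-up would come from.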

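Optimization 2: Initialization of P and Q (details)
• Why is the initialization important? The objective has local minima besides the global minimum, so the starting point affects where gradient descent ends up.
[Figure: objective curve with several local minima and one global minimum]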

Optimization 2: Initialization of P and Q (details)
• Social networks are structured: the degree distribution is power-law-like.
[Plot: log(degree) vs. ranked nodes]

Optimization 2: Initialization of P and Q (details)
• Network-inspired initialization:
  • 1-1 matching of the top k nodes
  • 1-1 matching of clusters of degrees
[Figure: ranked user degrees of G_A and G_B (degree vs. rank of node) with a knee at rank k; the top k nodes are matched one by one, the remaining nodes fall into degree clusters 1..n that are matched cluster by cluster in P]
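
One way to picture this initialization is the sketch below. It is a simplification under several assumptions that the transcript does not confirm: both graphs have the same number of users, `k` is supplied by the caller rather than detected from the knee, the "clusters of degrees" are approximated by equal-size buckets of the degree ranking, and matched clusters receive uniform probabilities. The name `degree_init` is hypothetical.

```python
def degree_init(A, B, k, n_clusters):
    """Degree-based initial P (rows: users of B, columns: users of A)."""
    # Rank users by degree (row sum), highest first; ties keep index order.
    rank_a = sorted(range(len(A)), key=lambda i: -sum(A[i]))
    rank_b = sorted(range(len(B)), key=lambda j: -sum(B[j]))
    P = [[0.0] * len(A) for _ in B]

    # 1-1 matching of the top-k nodes by degree rank.
    for i, j in zip(rank_a[:k], rank_b[:k]):
        P[j][i] = 1.0

    def buckets(ranked):
        """Split a ranking into at most n_clusters contiguous buckets."""
        size = -(-len(ranked) // n_clusters) if ranked else 1  # ceiling division
        return [ranked[s:s + size] for s in range(0, len(ranked), size)]

    # Remaining nodes: bucket c of A is matched to bucket c of B,
    # spreading the probability uniformly over the bucket.
    for ca, cb in zip(buckets(rank_a[k:]), buckets(rank_b[k:])):
        for j in cb:
            for i in ca:
                P[j][i] = 1.0 / len(ca)
    return P

# Toy data (assumed): B is A with the user order reversed.
A = [[1, 1, 1, 1],
     [1, 1, 1, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]
B = A[::-1]
P = degree_init(A, B, k=2, n_clusters=1)
for row in P:
    print(row)
```

In the toy run the two highest-degree users are pinned to each other, while the three degree-1 users cannot be told apart by degree and share probability 1/3 each.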

Optimization 3: Steps of gradient descent (details)
• Constant step: thrashing (step too large) or slow convergence (step too small)

Optimization 3: Steps of gradient descent (details)
• Variable step with line search: strategy for local optimum
  • $\eta_P = \arg\min_{\eta_P} f(\eta_P) = g_1(P, Q, A, B)$
  • $\eta_Q = \arg\min_{\eta_Q} f(\eta_Q) = g_2(P, Q, A, B)$
  • both are closed formulas
• BiG-Align-Exact: computes the steps at every iteration
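
The transcript does not preserve the closed formulas $g_1$ and $g_2$. As a plausibility check only, the following derivation (an assumption of this write-up, ignoring the projection step) shows why such a closed form exists for $\eta_P$: with $Q$ fixed, $f$ is a quadratic in the step size along the gradient direction $G_P$.

```latex
\begin{align*}
G_P &= \nabla_P f = 2\,(PAQ - B)(AQ)^T + \lambda\,\mathbf{1}\mathbf{1}^T \\
f(P - \eta_P G_P) &= \|(PAQ - B) - \eta_P\, G_P A Q\|_F^2
  + \lambda \sum_{i,j} \bigl(P_{ij} - \eta_P\, (G_P)_{ij}\bigr) + \mu \sum_{i,j} Q_{ij} \\
\frac{d f}{d \eta_P} = 0 \;&\Rightarrow\;
\eta_P = \frac{2\,\mathrm{tr}\!\left[(PAQ - B)^T G_P A Q\right] + \lambda \sum_{i,j} (G_P)_{ij}}
              {2\,\|G_P A Q\|_F^2}
       = \frac{\|G_P\|_F^2}{2\,\|G_P A Q\|_F^2}
\end{align*}
```

The last equality uses $\mathrm{tr}[(PAQ - B)^T G_P A Q] = \langle (PAQ - B)(AQ)^T, G_P \rangle$. The same argument with $P$ fixed and $G_Q = 2\,(PA)^T(PAQ - B) + \mu\,\mathbf{1}\mathbf{1}^T$ gives $\eta_Q = \|G_Q\|_F^2 / (2\,\|P A G_Q\|_F^2)$. Whether these coincide exactly with the slide's $g_1$ and $g_2$ cannot be confirmed from the transcript.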

Optimization 3: Steps of gradient descent (details)
• But: slow change in the steps
• BiG-Align-Skip: compute the η’s every m (= 500) iterations
[Plot: step size (η), roughly between $10^{-4}$ and $5 \cdot 10^{-4}$, vs. iterations, up to $3 \cdot 10^4$]

RoadMap
• Problem Definition
• What’s different?
• BiG-Align
  • Experiments
• Uni-Align
• Conclusions

Experimental Setup
• Implementation: Matlab
• Dataset: IMDB movie-genre graph and subgraphs (1027 movies × 27 genres)
• Setup:
  • random permutations (ground truth)
  • noise level: 0-20% (simulate real-world applications)
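
A setup of this kind can be mimicked as follows. This is a hedged sketch rather than the authors' protocol: the slide does not say how noise is injected, so flipping each entry independently with probability `noise` is an assumption, and `permute_with_noise` is a hypothetical helper name.

```python
import random

def permute_with_noise(A, noise, seed=0):
    """Return (B, row_perm, col_perm).

    B is A with rows and columns shuffled; afterwards each 0/1 entry is
    flipped independently with probability `noise` (assumed noise model).
    The permutations are returned as the ground truth for evaluation.
    """
    rng = random.Random(seed)
    row_perm = list(range(len(A)))
    col_perm = list(range(len(A[0])))
    rng.shuffle(row_perm)
    rng.shuffle(col_perm)
    B = [[A[i][j] for j in col_perm] for i in row_perm]
    B = [[1 - b if rng.random() < noise else b for b in row] for row in B]
    return B, row_perm, col_perm

# Toy usage (assumed data): 4 items x 3 categories, 10% noise.
A = [[1, 0, 0],
     [0, 1, 1],
     [1, 1, 0],
     [0, 0, 1]]
B, row_perm, col_perm = permute_with_noise(A, noise=0.10, seed=42)
print(row_perm, col_perm)
print(B)
```

With `noise=0` the recovered permutations can be compared against `row_perm` and `col_perm` exactly; with noise the ground truth stays the same but B no longer matches A perfectly.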

State-of-the-art (background)
• Umeyama’s algorithm [Umeyama88]: SVD-based
• NMF-based approach [Ding+08]: builds on top of Umeyama’s approach
• Net-Align [Bayati+09]: belief propagation
[Slide labels: "Bi-partite" and "Uni-partite", marking the graph type each method targets; the assignment is not recoverable from the transcript]

BiG-Align: Accuracy vs. Runtime
[Plot: accuracy vs. runtime for BiG-Align exact, BiG-Align skip, NMF-based, Umeyama, NetAlign; marker size is related to graph size]
BiG-Align improves both speed and accuracy.

BiG-Align: Accuracy w.r.t. noise
[Plot: accuracy vs. noise level for BiG-Align-skip, BiG-Align-exact, NMF-based, NetAlign-deg, NetAlign-full, Umeyama]
BiG-Align improves the accuracy for almost all levels of noise.

Algorithm: Uni-Align (details)
• Describe each of the n nodes by d features: node degree, clustering coefficient, …
• Objective: $\min \|PAQ - B\|_F^2$ (figure labels: "P", "fixed")
• SVD: $A = U S V^T$
• $P = g^*(A, B, S, U)$: closed-form solution, $O(n d^2)$

RoadMap
• Problem Definition
• What’s different?
• BiG-Align
• Uni-Align
  • Experiments
• Conclusions

Uni-Align
• Dataset: Facebook friendship graph (64K users)
• Setup: unipartite → bipartite graph
• Feature extraction:
  • node degree
  • egonet degree
  • edges in egonet
  • mean degree of node’s neighbors
[Figure: an egonet]
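
The four features listed above can be computed per node as in the sketch below, which turns a unipartite graph into a node × feature matrix. The definitions are assumptions where the slide is silent: the egonet is taken as the node plus its direct neighbors, and "egonet degree" is read as the number of edges with exactly one endpoint inside the egonet. The name `egonet_features` is hypothetical.

```python
def egonet_features(adj):
    """Per-node features for an undirected graph given as {node: set(neighbors)}.

    Returns {node: [degree, egonet_degree, edges_in_egonet, mean_neighbor_degree]}.
    """
    feats = {}
    for v, nbrs in adj.items():
        ego = nbrs | {v}
        degree = len(nbrs)
        # Each internal edge is counted from both endpoints, hence // 2.
        edges_in = sum(len(adj[u] & ego) for u in ego) // 2
        # Assumption: egonet degree = edges leaving the egonet.
        edges_out = sum(len(adj[u] - ego) for u in ego)
        mean_nbr_deg = sum(len(adj[u]) for u in nbrs) / degree if degree else 0.0
        feats[v] = [degree, edges_out, edges_in, mean_nbr_deg]
    return feats

# Toy graph (assumed): triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
feats = egonet_features(adj)
print(feats[0])  # [2, 1, 3, 2.5]
print(feats[4])  # [1, 1, 1, 2.0]
```

Stacking the feature lists row by row gives the n × d matrix that plays the role of A (and likewise B for the second graph) in the Uni-Align objective.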

Uni-Align: Accuracy vs. Runtime
[Plot: accuracy vs. runtime for Uni-Align, NMF-based, NetAlign, Umeyama]
Uni-Align, followed by Net-Align, is more accurate and faster than the other approaches.
