
BiG-Align: Fast Bipartite Graph Alignment

Danai Koutra, Hanghang Tong, David Lubensky. IEEE ICDM, 7-10 December 2013, Dallas, Texas, USA. Can we identify users across social networks? Same or “similar” users? More applications? Chemical compound comparison.


Presentation Transcript


  1. BiG-Align: Fast Bipartite Graph Alignment. Danai Koutra, Hanghang Tong, David Lubensky. IEEE ICDM, 7-10 December 2013, Dallas, Texas, USA

  2. Can we identify users across social networks? Same or “similar” users? Danai Koutra (CMU)

  3. More applications? Chemical compound comparison; link prediction & viral marketing; protein-protein alignment; optical character recognition; IR: synonym extraction; wiki translation; structure matching in DB

  4. RoadMap • Problem Definition • What’s different? • BiG-Align • Uni-Align • Conclusions

  5. Problem Definition • INPUT: A, B (binary users × groups matrices) [Figure: example 0/1 matrices A and B, rows = users, columns = groups]

  6. Problem Definition • INPUT: A, B • OUTPUT: P and … (permutation matrices) [Figure: P (users) permutes the rows of A to match B]

  7. Problem Definition • INPUT: A, B • OUTPUT: P and Q (permutation matrices) s.t. $\min_{P,Q} \|PAQ - B\|_F^2$, where P permutes the users and Q the groups of A • constraints / relaxations? Graph isomorphism: HARD (P or NP-complete?). Subgraph isomorphism: NP-complete. And now what? [Figure: P (users) and Q (groups) applied to A to match B]
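To make the objective concrete, here is a minimal pure-Python sketch (the helper names `matmul`, `perm_matrix` and `alignment_cost` are ours, not from the paper). It builds B from a toy 3 users × 2 groups matrix A with known permutations, then evaluates $\|PAQ - B\|_F^2$ for the true permutations and for the identity permutations.

```python
# Minimal sketch: evaluate ||PAQ - B||_F^2 for small dense 0/1 matrices.
# Helper names are hypothetical; matrices are plain lists of rows.

def matmul(X, Y):
    """Dense matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def perm_matrix(order):
    """Permutation matrix whose row i has its single 1 in column order[i]."""
    n = len(order)
    return [[1 if j == order[i] else 0 for j in range(n)] for i in range(n)]

def alignment_cost(P, A, Q, B):
    """Squared Frobenius norm of PAQ - B."""
    M = matmul(matmul(P, A), Q)
    return sum((M[i][j] - B[i][j]) ** 2
               for i in range(len(B)) for j in range(len(B[0])))

# Toy bipartite graph: 3 users x 2 groups.
A = [[1, 0],
     [0, 1],
     [1, 1]]
P_true = perm_matrix([2, 0, 1])  # P permutes the users (rows of A)
Q_true = perm_matrix([1, 0])     # Q permutes the groups (columns of A)
B = matmul(matmul(P_true, A), Q_true)

print(alignment_cost(P_true, A, Q_true, B))                                # 0
print(alignment_cost(perm_matrix([0, 1, 2]), A, perm_matrix([0, 1]), B))  # 2
```

The true pair scores 0. The identity permutations leave two mismatched entries, which is the quantity the relaxations below try to drive down without searching over all permutations.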

  8. Problem Definition: constraints • INPUT: A, B • OUTPUT: P, Q correspondence matrices s.t. $\min_{P,Q} \|PAQ - B\|_F^2$ [Figure: P (users) and Q (groups) as correspondence matrices between A and B]

  9. Problem Definition: constraints • INPUT: A, B • OUTPUT: P, Q correspondence matrices s.t. $\min_{P,Q} \|PAQ - B\|_F^2$ • CONSTRAINTS: • $P_{ij}$, $Q_{ij}$ are probabilities (not a 1-1 mapping) • P and Q are sparse (more efficient for large-scale graphs) [Figure: P (users) and Q (groups) as correspondence matrices between A and B]
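One simple reading of the probabilistic constraint, assumed here rather than taken from the paper, is that every entry of P and Q must stay in [0, 1]. Under that reading, entries that leave the interval after an unconstrained update can be projected back by clipping. The sketch also reports the fraction of nonzero entries, which is what the sparsity constraint tries to keep small. Both helpers, `project_to_unit_interval` and `density`, are hypothetical.

```python
# Sketch, assuming "probabilities" means every entry of P and Q lies in [0, 1].
# Helper names are hypothetical.

def project_to_unit_interval(M):
    """Clip every entry of a dense matrix (list of rows) to [0, 1]."""
    return [[min(1.0, max(0.0, x)) for x in row] for row in M]

def density(M):
    """Fraction of nonzero entries; a sparser matrix has a lower value."""
    total = sum(len(row) for row in M)
    nonzero = sum(1 for row in M for x in row if x != 0)
    return nonzero / total

P_step = [[1.3, -0.2],
          [0.4, 0.0]]  # entries after an unconstrained gradient step
P = project_to_unit_interval(P_step)
print(P)           # [[1.0, 0.0], [0.4, 0.0]]
print(density(P))  # 0.5
```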

  10. RoadMap • Problem Definition • What’s different? • BiG-Align • Uni-Align • Conclusions

  11. What’s different? BiG-Align vs. other approaches • Focus on bipartite graphs

  12. What’s different? BiG-Align vs. other approaches • Focus on bipartite graphs • New optimization problem/constraints

  13. What’s different? BiG-Align vs. other approaches • Focus on bipartite graphs • New optimization problem/constraints The hope is: • the specific graph structure will lead to more accurate graph alignment

  14. Why bipartite graphs? • ubiquitous – e.g., users-files, authors-papers, customers-products, users-msg/groups, user-movie rating graphs

  15. Why bipartite graphs? • ubiquitous – e.g., users-files, authors-papers, customers-products, users-msg/groups, user-movie rating graphs • coupled alignment: individual & community level [Figure: nodes and communities]

  16. Why bipartite graphs? • ubiquitous – e.g., users-files, authors-papers, customers-products, users-msg/groups, user-movie rating graphs • coupled alignment: individual & community level • conversion of a uni-partite graph to a bipartite one: clustering + (2) [Figure: nodes and communities]

  17. Why bipartite graphs? • ubiquitous – e.g., users-files, authors-papers, customers-products, users-msg/groups, user-movie rating graphs • coupled alignment: individual & community level • conversion of a uni-partite graph to a bipartite one: clustering + (2) • general formulation: match clouds of points (point-feature graph); tensors (e.g., time-evolving, or another 3rd dimension) [Figure: nodes and communities; users × email × time tensor]

  18. RoadMap • Problem Definition • What’s different? • BiG-Align • Uni-Align • Conclusions

  19. BiG-Align: algorithm (Details) • alternating, projected gradient descent until convergence

  20. BiG-Align: algorithm (Details) • Probabilistic constraint • iterate until convergence

  21. BiG-Align: algorithm (Details) • Sparsity constraint • iterate until convergence

  22. BiG-Align: algorithm (Details) • Sparsity constraint • iterate until convergence: $\min_{P,Q} f(P,Q) = \|PAQ - B\|_F^2 + \lambda \sum_{i,j} P_{ij} + \mu \sum_{i,j} Q_{ij}$
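Here is a rough sketch of alternating, projected gradient descent on this objective. It is an illustration of the technique, not the authors' Matlab implementation, and it rests on our own assumptions:

- matrices are dense pure-Python lists;
- entries are projected onto [0, 1] by clipping;
- the step `eta` is a small constant;
- the gradients are derived by hand: $\nabla_P f = 2(PAQ - B)(AQ)^T + \lambda$ and $\nabla_Q f = 2(PA)^T(PAQ - B) + \mu$, with the constant added to every entry.

All function names are hypothetical.

```python
# Hypothetical sketch of alternating, projected gradient descent on
#   f(P, Q) = ||PAQ - B||_F^2 + lam * sum(P) + mu * sum(Q)
# with every entry of P and Q clipped to [0, 1] after each step.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def residual(P, A, Q, B):
    M = matmul(matmul(P, A), Q)
    return [[M[i][j] - B[i][j] for j in range(len(B[0]))] for i in range(len(B))]

def objective(P, A, Q, B, lam, mu):
    fit = sum(r * r for row in residual(P, A, Q, B) for r in row)
    return fit + lam * sum(map(sum, P)) + mu * sum(map(sum, Q))

def projected_step(M, G, eta):
    """Gradient step followed by projection of every entry onto [0, 1]."""
    return [[min(1.0, max(0.0, m - eta * g)) for m, g in zip(rm, rg)]
            for rm, rg in zip(M, G)]

def alternating_pgd(A, B, P, Q, lam=0.01, mu=0.01, eta=0.01, iters=200):
    for _ in range(iters):
        # P-step with Q fixed: grad_P f = 2 (PAQ - B)(AQ)^T + lam
        AQ = matmul(A, Q)
        G_P = [[2 * x + lam for x in row]
               for row in matmul(residual(P, A, Q, B), transpose(AQ))]
        P = projected_step(P, G_P, eta)
        # Q-step with P fixed: grad_Q f = 2 (PA)^T (PAQ - B) + mu
        PA = matmul(P, A)
        G_Q = [[2 * x + mu for x in row]
               for row in matmul(transpose(PA), residual(P, A, Q, B))]
        Q = projected_step(Q, G_Q, eta)
    return P, Q

A = [[1, 0], [0, 1], [1, 1]]  # 3 users x 2 groups
B = [[1, 1], [0, 1], [1, 0]]  # A with users and groups permuted
P0 = [[0.5] * 3 for _ in range(3)]
Q0 = [[0.5] * 2 for _ in range(2)]
before = objective(P0, A, Q0, B, 0.01, 0.01)
P, Q = alternating_pgd(A, B, P0, Q0)
after = objective(P, A, Q, B, 0.01, 0.01)
print(before, after)  # the objective goes down
```

For this toy A, `eta = 0.01` is below the inverse Lipschitz constant of each subproblem, so no projected step increases f. The optimizations on the following slides target exactly the cost and convergence of this loop.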

  23. RoadMap • Problem Definition • What’s different? • BiG-Align • Optimizations • Uni-Align • Conclusions

  24. BiG-Align: Optimizations (Details) • alternating, projected gradient descent until convergence

  25. BiG-Align: Optimizations (Details) • alternating, projected gradient descent until convergence

  26. Optimization 1: Structurally equivalent nodes (Details) • Aggregation to super-nodes [Figure: Graph A]
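Structurally equivalent nodes have exactly the same neighbors, so they can be merged before alignment. The minimal sketch below assumes a binary users × groups matrix. Users with identical rows are grouped into one super-node, and a hypothetical `weights` list records how many users each super-node stands for. How BiG-Align uses those counts afterwards is not specified here. The function name `aggregate_structurally_equivalent` is also ours.

```python
from collections import defaultdict

def aggregate_structurally_equivalent(A):
    """Group users (rows of a binary users x groups matrix A) that have exactly
    the same neighbors; each group becomes one super-node (hypothetical helper)."""
    groups = defaultdict(list)           # neighbor pattern -> user indices
    for i, row in enumerate(A):
        groups[tuple(row)].append(i)
    members = list(groups.values())      # insertion order = first occurrence
    A_super = [list(A[m[0]]) for m in members]
    weights = [len(m) for m in members]  # users represented by each super-node
    return A_super, weights, members

A = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1],
     [1, 0, 1]]
A_super, weights, members = aggregate_structurally_equivalent(A)
print(A_super)  # [[1, 0, 1], [0, 1, 0]]
print(weights)  # [3, 1]
print(members)  # [[0, 2, 3], [1]]
```

Here four users shrink to two super-nodes, so the P being optimized would be 2 × 2 instead of 4 × 4.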

  27. BiG-Align: Optimizations (Details) • alternating, projected gradient descent until convergence

  28. Optimization 2: Initialization of P and Q (Details) • Why is the initialization important? [Figure: objective with several local minima and one global minimum]

  29. Optimization 2: Initialization of P and Q (Details) • Social networks are structured: the degree distribution is power-law-like. [Figure: log(degree) vs. ranked nodes]

  30. Optimization 2: Initialization of P and Q (Details) • 1-1 matching of clusters of degrees • 1-1 matching of the top k nodes • Network-inspired initialization [Figure: sorted user degrees of G_A and G_B, the knee at rank k, degree clusters 1 … n, and the resulting P]
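Below is a hypothetical sketch of a degree-based initialization in the spirit of this slide. Users of $G_A$ and $G_B$ are ranked by degree and the top k are matched one-to-one. The remaining ranks are cut into `n_clusters` clusters whose members share uniform probability. Using a fixed k and equal-size clusters is our simplification, since the slide picks k at the knee of the degree plot and clusters by degree. Only P is shown, and `degree_init` is not the paper's function.

```python
# Hypothetical degree-based initialization of P (n x n; row i = user i of B,
# column j = user j of A). Simplifications (ours): fixed k, equal-size
# clusters over the remaining degree ranks.

def ranked_by_degree(M):
    """User indices sorted by decreasing degree (row sum); ties keep index order."""
    return sorted(range(len(M)), key=lambda i: -sum(M[i]))

def degree_init(A, B, k, n_clusters):
    n = len(A)
    rank_a, rank_b = ranked_by_degree(A), ranked_by_degree(B)
    P = [[0.0] * n for _ in range(n)]
    # 1-1 matching of the top-k highest-degree users.
    for r in range(k):
        P[rank_b[r]][rank_a[r]] = 1.0
    # 1-1 matching of degree clusters: uniform probability inside a cluster pair.
    rest_a, rest_b = rank_a[k:], rank_b[k:]
    size = max(1, -(-len(rest_a) // n_clusters))  # ceiling division, at least 1
    for c in range(0, len(rest_a), size):
        block_a, block_b = rest_a[c:c + size], rest_b[c:c + size]
        for i in block_b:
            for j in block_a:
                P[i][j] = 1.0 / len(block_a)
    return P

A = [[1, 1], [1, 0], [0, 0], [1, 1], [0, 1]]  # 5 users x 2 groups
B = [[0, 1], [1, 1], [0, 0], [1, 0], [1, 1]]  # the same users, shuffled
P = degree_init(A, B, k=1, n_clusters=2)
for row in P:
    print(row)
```

Every row of the resulting P sums to 1, so it is a valid probabilistic starting point that is already sparse.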

  31. BiG-Align: Optimizations (Details) • alternating, projected gradient descent until convergence

  32. Optimization 3: Steps of gradient descent (Details) • Constant step: thrashing or slow convergence

  33. Optimization 3: Steps of gradient descent (Details) • Variable step with line search (strategy for a local optimum), using closed formulas: $\eta_P = \arg\min_{\eta_P} f(\eta_P) = g_1(P,Q,A,B)$ and $\eta_Q = \arg\min_{\eta_Q} f(\eta_Q) = g_2(P,Q,A,B)$
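The slide does not spell out $g_1$. For a gradient step $P - \eta G$ with the projection ignored, f is a quadratic in η, which gives one possible closed form. This is our derivation, not necessarily the paper's $g_1$. With $R = PAQ - B$ and $D = GAQ$:

$$\eta_P^* = \frac{2\langle R, D\rangle + \lambda \sum_{ij} G_{ij}}{2\|D\|_F^2}$$

The sketch computes $\eta_P^*$ and checks that f is lowest there along that line. All names are hypothetical.

```python
# Our own derivation (not necessarily the paper's g_1): exact line search for
# the P-step of f(P, Q) = ||PAQ - B||_F^2 + lam*sum(P) + mu*sum(Q), ignoring
# the projection onto [0, 1].

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def objective(P, A, Q, B, lam, mu):
    M = matmul(matmul(P, A), Q)
    fit = sum((M[i][j] - B[i][j]) ** 2
              for i in range(len(B)) for j in range(len(B[0])))
    return fit + lam * sum(map(sum, P)) + mu * sum(map(sum, Q))

def exact_step_P(P, A, Q, B, lam):
    """Return (eta*, G): G = grad_P f and eta* minimizing f(P - eta * G)."""
    AQ = matmul(A, Q)
    M = matmul(P, AQ)
    R = [[M[i][j] - B[i][j] for j in range(len(B[0]))] for i in range(len(B))]
    G = [[2 * x + lam for x in row] for row in matmul(R, transpose(AQ))]
    D = matmul(G, AQ)  # along the step, PAQ becomes PAQ - eta * D
    inner = sum(r * d for rr, dr in zip(R, D) for r, d in zip(rr, dr))
    norm2 = sum(d * d for dr in D for d in dr)
    eta = (2 * inner + lam * sum(map(sum, G))) / (2 * norm2)
    return eta, G

def along(P, G, eta):
    """The point P - eta * G (no projection)."""
    return [[p - eta * g for p, g in zip(rp, rg)] for rp, rg in zip(P, G)]

A = [[1, 0], [0, 1], [1, 1]]
B = [[1, 1], [0, 1], [1, 0]]
P = [[0.5] * 3 for _ in range(3)]
Q = [[0.5] * 2 for _ in range(2)]
lam = mu = 0.01

eta, G = exact_step_P(P, A, Q, B, lam)
f_at = lambda e: objective(along(P, G, e), A, Q, B, lam, mu)
print(eta, f_at(eta), f_at(eta - 0.01), f_at(eta + 0.01))
```

The $\eta_Q$ step follows the same pattern, with $D = PAG_Q$ and μ in place of λ.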

  34. Optimization 3: Steps of gradient descent (Details) • Variable step with line search (strategy for a local optimum), using closed formulas: $\eta_P = \arg\min_{\eta_P} f(\eta_P) = g_1(P,Q,A,B)$ and $\eta_Q = \arg\min_{\eta_Q} f(\eta_Q) = g_2(P,Q,A,B)$ • BiG-Align-Exact: computes the steps at every iteration

  35. Optimization 3: Steps of gradient descent (Details) • But: the steps change slowly [Plot: step size η (axis 10^-4 to 5·10^-4) vs. iterations (axis 10^4 to 3·10^4)]

  36. Optimization 3: Steps of gradient descent (Details) • But: the steps change slowly • BiG-Align-Skip: computes the η’s only every m (= 500) iterations [Plot: step size η (axis 10^-4 to 5·10^-4) vs. iterations (axis 10^4 to 3·10^4)]
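The skip idea amounts to caching the step size. Below is a generic sketch that counts how often the expensive step computation actually runs. `run_with_skip` is a hypothetical name. The toy objective and the fixed 0.25 returned by the step function stand in for BiG-Align's closed-form line search.

```python
def run_with_skip(step_size_fn, update_fn, state, iters, m=500):
    """Run `iters` updates, recomputing the (expensive) step size only when the
    iteration counter is a multiple of m and reusing the cached value otherwise."""
    eta, computed = None, 0
    for t in range(iters):
        if t % m == 0:
            eta = step_size_fn(state)
            computed += 1
        state = update_fn(state, eta)
    return state, computed

# Toy problem: minimize g(x) = (x - 3)^2, whose gradient is 2 (x - 3).
# The step function is a placeholder that always returns 0.25.
x, computed = run_with_skip(lambda x: 0.25,
                            lambda x, eta: x - eta * 2 * (x - 3),
                            state=0.0, iters=1200, m=500)
print(computed)  # 3: the step was computed at t = 0, 500 and 1000
```

With 1,200 iterations and m = 500, the step is computed three times instead of 1,200. The saving only pays off if η changes slowly between recomputations, as the plot suggests.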

  37. RoadMap • Problem Definition • What’s different? • BiG-Align • Experiments • Uni-Align • Conclusions

  38. Experimental Setup • Implementation: Matlab • Dataset: IMDB movie-genre graph and subgraphs (1027 movies × 27 genres) • Setup: random permutations (ground truth); noise level 0-20% (simulates real-world applications)
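Here is a sketch of how such test pairs could be generated. It assumes the noise model is random bit flips in the permuted matrix, because the slide does not say how noise is added. `permuted_noisy_copy` is a hypothetical helper. The returned row and column orders are the ground truth that an aligner should recover.

```python
import random

def permuted_noisy_copy(A, noise, seed=0):
    """Hypothetical helper: shuffle the rows and columns of a 0/1 matrix A,
    then flip a `noise` fraction of entries (assumed noise model)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(A), len(A[0])
    row_order, col_order = list(range(n_rows)), list(range(n_cols))
    rng.shuffle(row_order)
    rng.shuffle(col_order)
    # Ground truth: B[i][j] comes from A[row_order[i]][col_order[j]].
    B = [[A[row_order[i]][col_order[j]] for j in range(n_cols)]
         for i in range(n_rows)]
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    for i, j in rng.sample(cells, round(noise * len(cells))):
        B[i][j] = 1 - B[i][j]  # flip an edge on or off
    return B, row_order, col_order

A = [[1, 0, 0, 1],
     [0, 1, 1, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 1, 0]]  # toy 5 movies x 4 genres
B_clean, rows, cols = permuted_noisy_copy(A, noise=0.0)
B_noisy, rows2, cols2 = permuted_noisy_copy(A, noise=0.2)
flips = sum(B_noisy[i][j] != A[rows2[i]][cols2[j]]
            for i in range(5) for j in range(4))
print(flips)  # 20% of the 20 entries
```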

  39. State-of-the-art (Background) • Umeyama’s algorithm [Umeyama88]: SVD-based • NMF-based approach [Ding+08]: builds on top of Umeyama’s approach • Net-Align [Bayati+09]: belief propagation

  40. State-of-the-art (Background) • Umeyama’s algorithm [Umeyama88]: SVD-based • NMF-based approach [Ding+08]: builds on top of Umeyama’s approach • Net-Align [Bayati+09]: belief propagation [Figure labels: bipartite, uni-partite]

  41. BiG-Align: Accuracy vs. Runtime [Plot: accuracy vs. runtime of BiG-Align-exact, BiG-Align-skip, NMF-based, Umeyama and NetAlign; marker size reflects graph size]

  42. BiG-Align: Accuracy vs. Runtime [Plot: BiG-Align-exact, BiG-Align-skip, NMF-based, Umeyama, NetAlign] BiG-Align improves both speed and accuracy.

  43. BiG-Align: Accuracy w.r.t. noise [Plot: BiG-Align-skip, BiG-Align-exact, NMF-based, NetAlign-deg, NetAlign-full, Umeyama]

  44. BiG-Align: Accuracy w.r.t. noise [Plot: BiG-Align-skip, BiG-Align-exact, NMF-based, NetAlign-deg, NetAlign-full, Umeyama] BiG-Align improves the accuracy for almost all levels of noise.

  45. RoadMap • Problem Definition • What’s different? • BiG-Align • Uni-Align • Conclusions

  46. Algorithm: Uni-Align (Details) • A and B: n nodes × d features (node degree, clustering coefficient, …) • $\min_P \|PAQ - B\|_F^2$ with Q fixed

  47. Algorithm: Uni-Align (Details) • A and B: n nodes × d features • $\min_P \|PAQ - B\|_F^2$ • SVD: $A = USV^T$ • closed-form solution $P = g^*(A, B, S, U)$, computed in $O(nd^2)$
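Suppose Q is the identity (the features already correspond) and P is unconstrained. Then $\min_P \|PA - B\|_F^2$ is an ordinary least-squares problem with solution $P = BA^{+}$. When A has full column rank, $A^{+} = VS^{-1}U^T$ for the thin SVD $A = USV^T$. This is our assumption about the form of $g^*$, not the paper's formula.

To stay dependency-free, the sketch uses the equivalent form $A^{+} = (A^TA)^{-1}A^T$. Forming $A^TA$ and $(A^TA)^{-1}A^T$ costs $O(nd^2)$, plus $O(d^3)$ for the inverse. Materializing the dense n × n P adds $O(n^2 d)$. The function names are hypothetical.

```python
# Least-squares sketch of a Uni-Align-style step, assuming Q = I and an
# unconstrained P: P = B A^+ with A^+ = (A^T A)^{-1} A^T (A full column rank).

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def inverse(M):
    """Gauss-Jordan inverse (partial pivoting) of a small non-singular matrix."""
    n = len(M)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def uni_align_P(A, B):
    """Hypothetical helper: P minimizing ||PA - B||_F^2, i.e. P = B (A^T A)^{-1} A^T."""
    At = transpose(A)
    return matmul(B, matmul(inverse(matmul(At, A)), At))

A = [[1, 2], [3, 1], [0, 1], [2, 2]]  # 4 nodes x 2 features (toy values)
B = [A[2], A[0], A[3], A[1]]          # the same nodes, shuffled
P = uni_align_P(A, B)
PA = matmul(P, A)
err = max(abs(PA[i][j] - B[i][j]) for i in range(4) for j in range(2))
print(err)  # close to 0, up to floating-point error
```

Because $A^{+}A = I$ for a full-column-rank A, the fitted P reproduces B exactly. Turning P into an actual node matching is a separate step; the slides run Net-Align afterwards.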

  48. RoadMap • Problem Definition • What’s different? • BiG-Align • Uni-Align • Experiments • Conclusions

  49. Uni-Align • Dataset: Facebook friendship graph (64K users) • Setup: uni-partite → bipartite graph • Feature extraction: node degree; egonet degree; edges in egonet; mean degree of the node’s neighbors [Figure: egonet]
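Here is a sketch of the four per-node features on an undirected graph stored as a dict of neighbor sets. The egonet is taken to be the node plus its neighbors. The slide does not define "egonet degree", so it is assumed here to mean the number of edges leaving the egonet. `egonet_features` is a hypothetical name.

```python
def egonet_features(adj):
    """adj: dict node -> set of neighbors (undirected graph).
    Returns node -> (degree, egonet degree, edges in egonet, mean neighbor degree).
    Assumption: 'egonet degree' = number of edges leaving the egonet."""
    feats = {}
    for v, nbrs in adj.items():
        ego = nbrs | {v}
        degree = len(nbrs)
        # Edges with both endpoints in the egonet (each seen twice, so halve).
        inside = sum(1 for u in ego for w in adj[u] if w in ego) // 2
        # Edges with exactly one endpoint in the egonet.
        boundary = sum(1 for u in ego for w in adj[u] if w not in ego)
        mean_nbr = sum(len(adj[u]) for u in nbrs) / degree if degree else 0.0
        feats[v] = (degree, boundary, inside, mean_nbr)
    return feats

# Triangle 0-1-2 plus a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
feats = egonet_features(adj)
print(feats[0])  # (2, 1, 3, 2.5)
print(feats[3])  # (1, 2, 1, 3.0)
```

Stacking these tuples row by row gives the n nodes × d features matrix that Uni-Align aligns.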

  50. Uni-Align: Accuracy vs. Runtime [Plot: Uni-Align, NMF-based, NetAlign, Umeyama] Uni-Align, followed by Net-Align, is more accurate and faster than other approaches.
