
Center-Piece Subgraphs: Problem Definition and Fast Solutions

Hanghang Tong and Christos Faloutsos, Carnegie Mellon University.


Presentation Transcript


  1. Center-Piece Subgraphs: Problem Definition and Fast Solutions • Hanghang Tong • Christos Faloutsos • Carnegie Mellon University

  2. Center-Piece Subgraph (Ceps) • Given Q query nodes • Find the center-piece • Input of Ceps • Q query nodes • Budget b • K_SoftAnd coefficient k • Applications • Social networks • Law enforcement • Gene networks • …

  3. Challenges in Ceps • Q1: How to measure the importance? • Q2: How to extract connection subgraph? • Q3: How to do it efficiently?

  4. Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: Extract Alg. • Q3: Efficiency Issue • Experimental Results • Conclusion

  5. Ceps Overview • Individual Score Calculation • Measure importance wrt individual query • Combine Individual Scores • Measure importance wrt query set • “Extract” Alg. • … the connection subgraphs

  6. Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: “Extract” Alg. • Q3: Efficiency Issue • Experimental Results • Conclusion

  7. RWR: Individual Score Calculation • Goal • Individual importance score r(i,j) = r_{i,j} • For each node j wrt each query i • How to • Random walk with restart • Steady-state probability

  8. An Illustrating Example • Starting from node 1 • Move randomly to a neighbor • With some probability p, return to node 1 • Score of node j = Prob(the random walk finally stays at j) [Figure: example graph with nodes 1 to 13]
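
The walk described on this slide can be sketched as a simple power iteration. This is a minimal sketch, not the authors' implementation: the function name `rwr_scores`, the column normalization of the adjacency matrix, and the default restart probability of 0.5 are all assumptions made for illustration.

```python
import numpy as np

def rwr_scores(adj, query, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state probabilities of a random walk with restart from `query`.

    `restart` is the probability of jumping back to the query node at each
    step (an assumed value; the slides only say "some p").
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    # Column-normalize: column v holds the transition probabilities out of v.
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column
    W = adj / col_sums
    e = np.zeros(n)
    e[query] = 1.0  # restart vector: all mass on the query node
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1 - restart) * W @ r + restart * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Tiny path graph 0 - 1 - 2, walking from node 0.
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
scores = rwr_scores(path_graph, query=0)
```

Running this once per query node yields one score vector r(i, ·) per query i.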

  9. Individual Score Calculation

  10. Individual Score Calculation

  11. AND: Combine Scores • Q: How to combine scores? • A: Multiply • …= prob. 3 random particles coincide on node j
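
Written out, the multiplication rule on this slide might look as follows. The notation r(Q, j) for the combined score is an assumption, and the product presumes the Q random particles move independently of one another.

```latex
r(Q, j) \;=\; \prod_{i=1}^{Q} r(i, j)
```

With Q = 3 this is the probability that three independent particles, one per query node, all sit on node j at steady state.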

  12. K_SoftAnd: Combine Scores • Generalization (SoftAnd): we want nodes close to k of the Q query nodes (k < Q) • Q: How to do that?

  13. K_SoftAnd: Combine Scores • Generalization (SoftAnd): we want nodes close to k of the Q query nodes (k < Q) • Q: How to do that? • A: Prob(at least k out of Q particles meet each other at j)
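
One way to compute "at least k out of Q" from the individual scores is a small dynamic program over the number of particles present at node j. This is a sketch under the assumption that the Q particles are independent; the function name `k_softand` is hypothetical and the slides do not say how the authors compute this value.

```python
def k_softand(scores, k):
    """Probability that at least k of the independent events occur.

    `scores` holds r(i, j) for one node j, one entry per query node i.
    """
    # dist[m] = probability that exactly m of the events seen so far occurred
    dist = [1.0]
    for p in scores:
        nxt = [0.0] * (len(dist) + 1)
        for m, prob in enumerate(dist):
            nxt[m] += prob * (1 - p)   # this particle is not at node j
            nxt[m + 1] += prob * p     # this particle is at node j
        dist = nxt
    return sum(dist[k:])

# Individual scores of one node with respect to three query nodes.
node_scores = [0.5, 0.2, 0.1]
and_score = k_softand(node_scores, 3)    # k = Q reduces to the AND product
soft_score = k_softand(node_scores, 2)   # 2_SoftAnd
```

Setting k = Q recovers the plain product used for the AND query, so a single routine covers both cases.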

  14. K_SoftAnd: Relaxation of AND • Asking an AND query? No answer! • Disconnected communities • Noise

  15. AND query vs. K_SoftAnd query [Figure panels: AND query (axis scale ×1e-4); 2_SoftAnd query]

  16. 1_SoftAnd query = OR query
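
The equivalence on this slide follows from the "at least k out of Q" definition with k = 1. The notation r(Q, j, k) for the K_SoftAnd score is an assumption, as is the independence of the Q particles.

```latex
r(Q, j, 1) \;=\; 1 - \prod_{i=1}^{Q} \bigl(1 - r(i, j)\bigr)
```

The right-hand side is the probability that at least one particle is at node j, which is the usual probabilistic OR.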

  17. Measuring Importance • Individual scores: random walk with restart → steady-state probability • Combining scores: K_SoftAnd (And, 2_SoftAnd, …) → meeting probability

  18. Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: “Extract” Alg. • Q3: Efficiency • Experimental Results • Conclusion

  19. “Extract” Alg. • Goal • Maximize total scores and • ‘Appropriate’ connections • How to… “Extract” Alg. • Dynamic programming • Greedy alg. • Pick up a promising node • Find the ‘best’ path [Figure: example graphs]
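
The greedy loop on this slide (pick a promising node, then find the 'best' path to it) can be illustrated with a simplified sketch. It is not the authors' algorithm: the slides mention dynamic programming, whereas this version swaps in Dijkstra with a step cost of -log(score), and the names `best_path` and `extract` are hypothetical. Scores are assumed to lie in (0, 1].

```python
import heapq
import math

def best_path(graph, scores, source, target, selected):
    """Cheapest path from source to target (Dijkstra).

    Entering an already-selected node is free; entering any other node v
    costs -log(scores[v]), so high-scoring nodes are preferred.
    """
    dist, prev, heap = {source: 0.0}, {}, [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v in graph[u]:
            step = 0.0 if v in selected else -math.log(max(scores[v], 1e-300))
            if d + step < dist.get(v, math.inf):
                dist[v], prev[v] = d + step, u
                heapq.heappush(heap, (d + step, v))
    if target not in dist:
        return []  # target unreachable from source
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def extract(graph, scores, queries, budget):
    """Greedily grow a subgraph around the query nodes, up to `budget` nodes."""
    selected, edges, tried = set(queries), set(), set()
    while len(selected) < budget:
        candidates = [v for v in graph if v not in selected and v not in tried]
        if not candidates:
            break
        dest = max(candidates, key=lambda v: scores[v])  # promising node
        tried.add(dest)
        paths = [best_path(graph, scores, q, dest, selected) for q in queries]
        new_nodes = {v for p in paths for v in p} - selected
        if not new_nodes or len(selected) + len(new_nodes) > budget:
            continue  # unreachable, or would exceed the budget
        selected |= new_nodes
        edges |= {frozenset(e) for p in paths for e in zip(p, p[1:])}
    return selected, edges

# Nodes 0 and 1 are the queries; node 2 connects them, node 3 hangs off node 0.
graph = {0: {2, 3}, 1: {2}, 2: {0, 1}, 3: {0}}
combined = {0: 0.2, 1: 0.2, 2: 0.5, 3: 0.1}
nodes, links = extract(graph, combined, queries=[0, 1], budget=3)
```

Making already-selected nodes free to traverse encourages later paths to reuse earlier ones, which keeps the output connected and within budget.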

  20. Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: “Extract” Alg. • Q3: Efficiency • Experimental Results • Conclusion

  21. Graph Partition: Efficiency Issue • Straightforward way • Solve Q linear systems • Linear in the number of edges • Observation • Skewed distribution • How to… • Graph partition
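
A rough sketch of how partitioning could cut the cost: keep only the partitions that contain a query node and solve the Q linear systems on that smaller graph. Everything here is an assumption for illustration, including the function name `rwr_on_query_partitions`, the precomputed `labels` array (the partitioner itself is out of scope), and the restart probability of 0.5. The slides do not spell out the Fast Ceps procedure.

```python
import numpy as np

def rwr_on_query_partitions(adj, labels, queries, restart=0.5):
    """Approximate RWR scores using only the partitions that hold query nodes.

    Returns an array of shape (len(queries), n); nodes outside the kept
    partitions get a score of zero.
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    # Indices of all nodes whose partition contains at least one query node.
    keep = np.flatnonzero(np.isin(labels, labels[list(queries)]))
    sub = adj[np.ix_(keep, keep)]  # induced subgraph; cross edges are dropped
    col_sums = sub.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    W = sub / col_sums             # column-normalized transition matrix
    system = np.eye(len(keep)) - (1 - restart) * W
    scores = np.zeros((len(queries), adj.shape[0]))
    for row, q in enumerate(queries):
        e = np.zeros(len(keep))
        e[np.searchsorted(keep, q)] = 1.0  # position of q inside `keep`
        # One linear system per query node: (I - (1 - c) W) r = c e
        scores[row, keep] = restart * np.linalg.solve(system, e)
    return scores

# Two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3.
adj = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0
part_scores = rwr_on_query_partitions(adj, labels=[0, 0, 0, 1, 1, 1], queries=[0])
```

The result is approximate because the walk can never leave the kept partitions, which is the kind of quality-for-speed trade-off the later slides report.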

  22. Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: “Extract” Alg. • Q3: Efficiency Issue • Experimental Results • Conclusion

  23. Experimental Setup • Dataset • DBLP/authorship • Author-Paper • 315k nodes • 1,800k edges • Evaluation criteria • Node Ratio • Edge Ratio
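
The transcript names the Node Ratio criterion without defining it. One plausible reading, stated here purely as an assumption, is the share of total combined score that the extracted subgraph's nodes capture; the function name `node_ratio` is hypothetical.

```python
def node_ratio(scores, subgraph_nodes):
    """Fraction of the total combined score held by the extracted nodes.

    Assumed definition: sum of scores inside the subgraph divided by the
    sum of scores over the whole graph.
    """
    total = sum(scores.values())
    return sum(scores[v] for v in subgraph_nodes) / total

# Combined scores for a three-node graph; the subgraph keeps nodes 0 and 1.
example_scores = {0: 0.5, 1: 0.3, 2: 0.2}
ratio = node_ratio(example_scores, {0, 1})
```

An Edge Ratio could be defined the same way over edge scores, given some way of scoring an edge.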

  24. Experimental Setup • We want to check • Do the goodness criteria make sense? • Does the “extract” alg. capture most of the important nodes/edges? • Efficiency

  25. Case Study: AND query

  26. 2_SoftAnd query [Figure labels: database, Statistic]

  27. Evaluation of “Extract” Alg. • 20 nodes • 90%+ preserved [Figure: Node Ratio vs. budget (b), curves for 2 and 3 query nodes]

  28. Running Time vs. Quality for Fast Ceps • ~90% quality • 6:1 speedup [Figure: quality vs. running time]

  29. Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: “Extract” Alg. • Q3: Efficiency Issue • Experimental Results • Conclusion

  30. Conclusion • Q1: How to measure the importance? • A1: RWR + K_SoftAnd • Q2: How to find connection subgraph? • A2: “Extract” Alg. • Q3: How to do it efficiently? • A3: Graph Partition (Fast Ceps) • ~90% quality • 6:1 speedup

  31. Q&A Thank you! htong@cs.cmu.edu
