Center-Piece Subgraphs: Problem definition and Fast Solutions. Hanghang Tong, Christos Faloutsos. Carnegie Mellon University
Center-Piece Subgraph (Ceps) • Given Q query nodes • Find Center-piece • Input of Ceps • Q query nodes • Budget b • k softAND coefficient • App. • Social Network • Law Enforcement • Gene Network • …
Challenges in Ceps • Q1: How to measure the importance? • Q2: How to extract connection subgraph? • Q3: How to do it efficiently?
Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: Extract Alg. • Q3: Efficiency Issue • Experimental Results • Conclusion
Ceps Overview • Individual Score Calculation • Measure importance wrt individual query • Combine Individual Scores • Measure importance wrt query set • “Extract” Alg. • … the connection subgraphs
Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: “Extract” Alg. • Q3: Efficiency Issue • Experimental Results • Conclusion
RWR: Individual Score Calculation • Goal • Individual importance score r(i,j) = r_{i,j} • For each node j w.r.t. each query i • How to • Random walk with restart • Steady-state prob.
An Illustrating Example • Starting from node 1 • Move randomly to a neighbor • With some probability p, return to node 1 • Score of node j = Prob(RW will finally stay at j) • [Figure: example graph with nodes 1 to 13]
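The walk described on this slide can be sketched in a few lines of Python. This is a minimal power-iteration sketch, not the authors' implementation: the function name `rwr_scores`, the restart probability of 0.15, the fixed iteration count, and the four-node path graph are all illustrative assumptions.

```python
def rwr_scores(adj, query, restart=0.15, iters=200):
    """Random walk with restart from `query`.

    adj: dict mapping node -> list of neighbours (each node needs >= 1).
    Returns a dict node -> approximate steady-state probability r(query, node).
    """
    nodes = list(adj)
    r = {v: 0.0 for v in nodes}
    r[query] = 1.0  # the particle starts at the query node
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            # With probability (1 - restart), move to a uniformly random neighbour.
            share = (1.0 - restart) * r[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        # With probability `restart`, jump back to the query node.
        nxt[query] += restart
        r = nxt
    return r


# Illustrative graph (an assumption): the path 1 - 2 - 3 - 4, queried from node 1.
path_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = rwr_scores(path_graph, query=1)
```

Running this once per query node gives one score vector per query, which is the input the combination step needs.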
AND: Combine Scores • Q: How to combine scores? • A: Multiply • …= prob. 3 random particles coincide on node j
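As a small illustration of the multiply rule, the sketch below ranks nodes by the product of their individual scores. The score values and the names `and_score`, `r1`, `r2`, `r3` are made up for the example; real values would come from the random walk with restart.

```python
from math import prod


def and_score(individual_scores, node):
    """AND score of `node`: product of r(i, node) over all queries i."""
    return prod(r[node] for r in individual_scores)


# Hypothetical individual scores for three queries over nodes "a" and "b".
r1 = {"a": 0.30, "b": 0.10}
r2 = {"a": 0.20, "b": 0.40}
r3 = {"a": 0.25, "b": 0.01}

# Node "b" is very close to query 2 but far from query 3, so its product is small.
ranking = sorted(r1, key=lambda j: and_score([r1, r2, r3], j), reverse=True)
```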
K_SoftAnd: Combine Scores • Generalization (SoftAND): we want nodes close to k of the Q query nodes (k < Q) • Q: How to do that? • A: Prob(at least k out of Q particles meet each other at node j)
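One way to compute the "at least k out of Q" probability, assuming the Q particles move independently, is a short dynamic program over the queries. This is a sketch under that independence assumption, and the function name `k_softand` is illustrative.

```python
def k_softand(probs, k):
    """P(at least k of Q independent particles are at node j).

    probs: the individual scores r(i, j) for one node j, one per query i.
    """
    # dist[m] = probability that exactly m of the particles seen so far are at j.
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for m, w in enumerate(dist):
            nxt[m] += w * (1.0 - p)   # this particle is elsewhere
            nxt[m + 1] += w * p       # this particle is at j
        dist = nxt
    return sum(dist[k:])
```

With k equal to Q the result reduces to the plain product of the individual scores, so the AND query is the strictest case of this rule.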
K_SoftAnd: Relaxation of AND • Asking AND query? No answer! • Disconnected communities • Noise
AND query vs. K_SoftAnd query • [Figure: two panels, "AND Query" (scores ×1e-4) and "2_SoftAnd Query"]
Measuring Importance • Individual Scores • Random walk with restart • Steady-state prob. • Combining Scores • K_SoftAnd (AND, 2_SoftAnd) • Meeting prob.
Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: “Extract” Alg. • Q3: Efficiency • Experimental Results • Conclusion
“Extract” Alg. • Goal • Maximize total scores and • ‘Appropriate’ connections • How to… “Extract” Alg. • Dynamic programming • Greedy alg. • Pick up promising node • Find ‘best’ path • [Figure: example graph with numbered nodes]
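The greedy loop on this slide (pick a promising node, then connect it) can be sketched as follows. This is a simplified illustration, not the authors' “Extract” algorithm: it uses a fewest-hops BFS path as a stand-in for the ‘best’ path, and it assumes the budget counts every node in the subgraph, query nodes included. The names `bfs_path` and `greedy_extract` are hypothetical.

```python
from collections import deque


def bfs_path(adj, src, dst):
    """Fewest-hops path from src to dst as a list of nodes, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    if dst not in prev:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def greedy_extract(adj, score, queries, budget):
    """Grow a subgraph around the query nodes, at most `budget` nodes in total."""
    chosen = set(queries)
    # Visit candidates from highest to lowest combined score.
    for node in sorted(score, key=score.get, reverse=True):
        if len(chosen) >= budget:
            break
        if node in chosen:
            continue
        # Connect the candidate to every query node (stand-in for the 'best' path).
        paths = [bfs_path(adj, q, node) for q in queries]
        if any(p is None for p in paths):
            continue  # candidate is unreachable from some query node
        new_nodes = set().union(*paths)
        if len(chosen | new_nodes) <= budget:
            chosen |= new_nodes
    return chosen
```

For example, with two query nodes joined through one high-scoring hub and a budget of 3, the sketch returns the two queries plus the hub.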
Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: “Extract” Alg. • Q3: Efficiency • Experimental Results • Conclusion
Graph Partition: Efficiency Issue • Straightforward way • Solve Q linear systems • Cost linear in the number of edges • Observation • Skewed distribution of scores • How to… • Graph partition
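The slide names graph partitioning without giving details. One plausible reading, sketched here as an assumption, is to partition the graph offline and then score only the partitions that contain query nodes, since the skewed distribution suggests most of the score mass sits near the queries. The partition labels `part` are hypothetical and would come from whatever partitioner is used.

```python
def restrict_to_query_partitions(adj, part, queries):
    """Keep only nodes whose partition contains at least one query node.

    adj:  dict node -> list of neighbours
    part: dict node -> partition label (assumed to be precomputed)
    Edges that cross into a dropped partition are removed, so a kept node
    can end up with fewer neighbours than in the full graph.
    """
    keep = {part[q] for q in queries}
    return {
        u: [v for v in nbrs if part[v] in keep]
        for u, nbrs in adj.items()
        if part[u] in keep
    }
```

The individual scores would then be computed on the smaller graph, trading some accuracy for running time.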
Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: “Extract” Alg. • Q3: Efficiency Issue • Experimental Results • Conclusion
Experimental Setup • Dataset • DBLP/authorship • Author-Paper • 315k nodes • 1,800k edges • Evaluation Criteria • Node Ratio • Edge Ratio
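The slide lists Node Ratio and Edge Ratio without defining them. Assuming they measure the share of the total goodness score that the extracted subgraph retains, both could be computed with one helper like the sketch below. The definition and the name `score_ratio` are assumptions, not taken from the slides.

```python
def score_ratio(scores, kept):
    """Fraction of the total score carried by the items in `kept`.

    scores: dict item -> goodness score (items are nodes or edges)
    kept:   the items that appear in the extracted subgraph
    """
    total = sum(scores.values())
    return sum(scores[x] for x in kept) / total


# Hypothetical node scores; the subgraph keeps nodes "a" and "b".
example_ratio = score_ratio({"a": 0.6, "b": 0.3, "c": 0.1}, {"a", "b"})
```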
Experimental Setup • We want to check • Do the goodness criteria make sense? • Does the “Extract” alg. capture most of the important nodes/edges? • Efficiency
[Figure: result of a 2_SoftAnd query; visible labels: database, Statistic]
Evaluation of “Extract” Alg. • 20 nodes • 90%+ preserved • [Figure: Node Ratio vs. Budget (b), curves for 2 and 3 query nodes]
Running Time vs. Quality for Fast Ceps • ~90% quality • 6:1 speedup • [Figure: Quality vs. Running Time]
Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: “Extract” Alg. • Q3: Efficiency Issue • Experimental Results • Conclusion
Conclusion • Q1: How to measure the importance? • A1: RWR + K_SoftAnd • Q2: How to find connection subgraph? • A2: “Extract” Alg. • Q3: How to do it efficiently? • A3: Graph Partition (Fast Ceps) • ~90% quality • 6:1 speedup
Q&A Thank you! htong@cs.cmu.edu