Center-Piece Subgraphs: Problem Definition and Fast Solutions

Center-Piece Subgraphs: Problem Definition and Fast Solutions • Hanghang Tong • Christos Faloutsos • Carnegie Mellon University

Center-Piece Subgraph (Ceps) • Given Q query nodes • Find the center-piece subgraph • Input of Ceps • Q query nodes • Budget b • K_SoftAnd coefficient k • Applications • Social networks • Law enforcement • Gene networks • …

Challenges in Ceps • Q1: How to measure the importance? • Q2: How to extract connection subgraph? • Q3: How to do it efficiently?

Roadmap • Ceps Overview • Q1: Goodness Score Calculation • Q2: Extract Alg. • Q3: Efficiency Issue • Experimental Results • Conclusion

Ceps Overview • Individual Score Calculation • Measure importance wrt individual query • Combine Individual Scores • Measure importance wrt query set • “Extract” Alg. • … the connection subgraphs


RWR: Individual Score Calculation • Goal • Individual importance score r(i,j) = r_{i,j} • For each node j wrt each query i • How to • Random walk with restart • Steady-state prob.

An Illustrative Example • Starting from node 1 • Move randomly to a neighbor • With some probability p, return to node 1 • Score of node j = Prob(RW will finally stay at j) • [Figure: example graph with nodes 1 to 13]
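The walk in this example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `rwr_scores`, the default restart probability of 0.15, and the power-iteration stopping rule are all assumptions made for the sketch.

```python
import numpy as np

def rwr_scores(adj, query, restart_prob=0.15, tol=1e-12, max_iter=1000):
    """Steady-state probabilities of a random walk with restart.

    adj          : square adjacency matrix (assumed symmetric, non-negative)
    query        : index of the query node the walker restarts at
    restart_prob : the probability p of jumping back to the query node
                   (0.15 is an assumed default, not a value from the slides)
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    # Column-normalize: column u holds the transition probabilities out of u.
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    w = adj / col_sums
    e = np.zeros(n)
    e[query] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        # One step: follow an edge with prob. 1 - p, restart with prob. p.
        r_next = (1 - restart_prob) * (w @ r) + restart_prob * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Tiny path graph 0 - 1 - 2, walker restarting at node 0.
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
scores = rwr_scores(path_graph, query=0)
```

Entry `scores[j]` plays the role of r(i,j) for query i = 0. The scores sum to 1, and the far end of the path (node 2) scores lower than the query node.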

Individual Score Calculation

AND: Combine Scores • Q: How to combine scores? • A: Multiply • …= prob. 3 random particles coincide on node j
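Written out, the "multiply" rule reads as follows. This is a sketch that assumes the Q random particles move independently, so the probability that all of them sit on node j is the product of the individual scores; for the 3-query case on the slide it is r(1,j) · r(2,j) · r(3,j).

```latex
r(Q, j) \;=\; \prod_{i=1}^{Q} r(i, j)
```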


K_SoftAnd: Combine Scores • Generalization: K_SoftAnd • We want nodes close to k of the Q query nodes (k < Q) • Q: How to do that? • A: Prob(at least k out of Q particles will meet each other at j)
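One way to compute "at least k out of Q" from the individual scores is the standard recurrence for independent events. The sketch below assumes the Q particles are independent; the function name `k_softand` is hypothetical, and the recurrence is the textbook one, not necessarily the exact formula used by the authors.

```python
def k_softand(scores, k):
    """Probability that at least k of the independent events occur.

    scores : individual scores r(i, j) of one node j, one per query node
    k      : the K_SoftAnd coefficient, with 1 <= k <= len(scores)
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of queries")
    # at_least[m] = P(at least m of the events processed so far occur)
    at_least = [1.0] + [0.0] * k
    for p in scores:
        # Go downwards so at_least[m - 1] is still the previous round's value.
        for m in range(k, 0, -1):
            at_least[m] = p * at_least[m - 1] + (1 - p) * at_least[m]
    return at_least[k]

node_scores = [0.5, 0.5, 0.5]              # r(1,j), r(2,j), r(3,j)
and_score = k_softand(node_scores, 3)      # k = Q: plain product (AND)
soft_score = k_softand(node_scores, 2)     # 2_SoftAnd
or_score = k_softand(node_scores, 1)       # k = 1: OR
```

With k = Q the result reduces to the product of the scores (the AND query), and with k = 1 it reduces to 1 minus the product of (1 - score), which is the OR query.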

K_SoftAnd: Relaxation of AND • Asking an AND query? No answer! • Disconnected communities • Noise

AND query vs. K_SoftAnd query • [Figure panels: AND query; 2_SoftAnd query; axis scale x 1e-4]

1_SoftAnd query = OR query

Measuring Importance • Individual scores: random walk with restart (steady-state prob.) • Combining scores: K_SoftAnd, e.g. AND, 2_SoftAnd (meeting prob.)


“Extract” Alg. • Goal • Maximize total scores and • ‘Appropriate’ connections • How to: “Extract” Alg. • Dynamic programming • Greedy alg. • Pick up a promising node • Find the ‘best’ path • [Figure: example graphs, node labels omitted]
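The greedy loop (pick a promising node, then find a ‘best’ path to it) might look roughly like the sketch below. It is a simplified stand-in, not the authors' algorithm: the slide's dynamic-programming path search is replaced by Dijkstra's algorithm on -log(score), which finds the path with the largest product of individual scores, and the budget is assumed to cap the total node count including the query nodes. The names `best_path` and `extract` are hypothetical.

```python
import heapq
import math

def best_path(graph, score, source, target):
    """Path from source to target maximizing the product of node scores.

    Uses Dijkstra with cost -log(score[v]) for entering node v; this is
    valid because scores are assumed to lie in (0, 1].
    """
    dist, prev, done = {source: 0.0}, {}, set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v in graph[u]:
            if score[v] <= 0:
                continue  # unreachable under this query's walk
            nd = d - math.log(score[v])
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def extract(graph, individual, combined, queries, budget):
    """Greedy sketch: grow a subgraph around the queries, up to `budget` nodes."""
    chosen, edges = set(queries), set()
    candidates = sorted((v for v in graph if v not in chosen),
                        key=lambda v: combined[v], reverse=True)
    for dest in candidates:
        if len(chosen) >= budget:
            break
        if dest in chosen:
            continue
        new_nodes, new_edges = set(), set()
        for q in queries:
            # Connect the promising node to each query by its own best path.
            path = best_path(graph, individual[q], q, dest)
            if path is not None:
                new_nodes.update(path)
                new_edges.update(tuple(sorted(e)) for e in zip(path, path[1:]))
        if not new_nodes or len(chosen | new_nodes) > budget:
            continue  # unreachable, or the paths would exceed the budget
        chosen |= new_nodes
        edges |= new_edges
    return chosen, edges

# Line 0-1-2-3-4 with a dangling node 5 attached to node 2; queries 0 and 4.
graph = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
individual = {
    0: {0: 1.0, 1: 0.5, 2: 0.25, 3: 0.1, 4: 0.05, 5: 0.1},
    4: {4: 1.0, 3: 0.5, 2: 0.25, 1: 0.1, 0: 0.05, 5: 0.1},
}
combined = {v: individual[0][v] * individual[4][v] for v in graph}
nodes, edges = extract(graph, individual, combined, queries=[0, 4], budget=5)
```

In this toy graph the highest combined score belongs to node 2, so the sketch connects it to both queries and leaves out the dangling node 5.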


Graph Partition: Efficiency Issue • Straightforward way • Q linear systems • Cost linear in # of edges • Observation • Skewed dist. • How to… • Graph partition
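The idea of solving the Q linear systems on a smaller graph can be sketched as follows. This is an illustration under stated assumptions, not the Fast Ceps implementation: the partition labels are taken as a precomputed input from any graph partitioner, the restart probability of 0.15 is an assumed default, and the function name `rwr_on_query_partitions` is hypothetical. Nodes outside the partitions that contain a query node simply receive a score of 0.

```python
import numpy as np

def rwr_on_query_partitions(adj, labels, queries, restart_prob=0.15):
    """Solve one RWR linear system per query, restricted to the partitions
    that contain at least one query node.

    adj     : square adjacency matrix of the whole graph
    labels  : partition id of each node (assumed precomputed)
    queries : indices of the query nodes
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    # Keep only nodes whose partition holds a query node.
    keep = np.flatnonzero(np.isin(labels, labels[list(queries)]))
    sub = adj[np.ix_(keep, keep)]
    col_sums = sub.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    w = sub / col_sums  # column-normalized transitions inside the kept part
    position = {int(node): idx for idx, node in enumerate(keep)}
    system = np.eye(len(keep)) - (1 - restart_prob) * w
    scores = np.zeros((len(queries), adj.shape[0]))
    for row, q in enumerate(queries):
        e = np.zeros(len(keep))
        e[position[q]] = 1.0
        # r = p * (I - (1 - p) W)^{-1} e, solved only on the kept nodes.
        scores[row, keep] = restart_prob * np.linalg.solve(system, e)
    return scores

# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
adj = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0
labels = [0, 0, 0, 1, 1, 1]
scores = rwr_on_query_partitions(adj, labels, queries=[0])
```

The trade-off is the one the slides report later: scores for nodes in the discarded partitions are lost, in exchange for smaller linear systems.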

Experimental Setup • Dataset • DBLP/authorship • Author-Paper • 315k nodes • 1,800k edges • Evaluation criteria • Node Ratio • Edge Ratio

Experimental Setup • We want to check • Do the goodness criteria make sense? • Does the “extract” alg. capture most of the important nodes/edges? • Efficiency

Case Study: AND query

2_SoftAnd query • [Figure labels: database, Statistic]

Evaluation of “Extract” Alg. • 20 nodes • 90%+ preserved • [Figure: Node Ratio vs. budget (b), for 2 and 3 query nodes]

Running Time vs. Quality for Fast Ceps • ~90% quality • 6:1 speedup • [Figure: quality vs. running time]

Conclusion • Q1: How to measure the importance? • A1: RWR + K_SoftAnd • Q2: How to find connection subgraph? • A2: “Extract” Alg. • Q3: How to do it efficiently? • A3: Graph partition (Fast Ceps) • ~90% quality • 6:1 speedup

Q&A Thank you! htong@cs.cmu.edu
