Fast Random Walk with Restart and Its Applications

Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan. ICDM 2006, Dec. 18-22, Hong Kong.

Presentation Transcript


  1. Fast Random Walk with Restart and Its Applications • Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan • ICDM 2006, Dec. 18-22, Hong Kong

  2. Motivating Questions • Q: How to measure the relevance? • A: Random walk with restart • Q: How to do it efficiently? • A: This talk tries to answer!

  3. Random walk with restart [Figure: example graph with 12 nodes, numbered 1 to 12]

  4. Random walk with restart • Ranking vector • Nearby nodes, higher scores • More red, more relevant [Figure: the 12-node example graph, each node colored and labeled with its relevance score, ranging from 0.02 to 0.13]

  5. Automatic Image Caption • Q: Given images captioned {Sea, Sun, Sky, Wave} and {Cat, Forest, Grass, Tiger}, which keywords {?, ?, ?} describe an uncaptioned test image? • A: RWR! [Pan KDD2004]

  6. [Figure: graph with three kinds of nodes: Region, Image (including the Test Image), and Keyword (Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass)]

  7. [Figure: the same Region / Image / Keyword graph; the Test Image is captioned {Grass, Forest, Cat, Tiger}]

  8. Neighborhood Formulation • Q: What is the most related conference to ICDM? • A: RWR! [Sun ICDM2005] [Figure: bipartite graph of Conference and Author nodes]

  9. NF: example

  10. Center-Piece Subgraph (CePS) • Q: Given the query nodes (black) in the original graph, what is their center-piece subgraph? • A: RWR! [Tong KDD 2006]

  11. CePS: Example

  12. Other Applications • Content-based Image Retrieval [He] • Personalized PageRank [Jeh], [Widom], [Haveliwala] • Anomaly Detection (for node; link) [Sun] • Link Prediction [Getoor], [Jensen] • Semi-supervised Learning [Zhu], [Zhou] • …

  13. Roadmap • Background • RWR: Definitions • RWR: Algorithms • Basic Idea • FastRWR • Pre-Compute Stage • On-Line Stage • Experimental Results • Conclusion

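  14. Computing RWR: $\vec{r}_i = c\,\tilde{W}\,\vec{r}_i + (1-c)\,\vec{e}_i$ • $\vec{r}_i$: ranking vector (n x 1) • $\tilde{W}$: adjacency matrix (n x n) • $1-c$: restart probability • $\vec{e}_i$: starting vector (n x 1) [Figure: the 12-node example graph]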
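The RWR equation can be solved in closed form. The short derivation below is a sketch that assumes the notation of the FastRWR paper: $\tilde{W}$ is the normalized n x n adjacency matrix, $\vec{e}_i$ is the starting vector for query node $i$, $1-c$ is the restart probability with $0 < c < 1$, and the name $Q$ for $I - c\,\tilde{W}$ is used as in the later slides.

```latex
\begin{align}
\vec{r}_i &= c\,\tilde{W}\,\vec{r}_i + (1-c)\,\vec{e}_i \\
(I - c\,\tilde{W})\,\vec{r}_i &= (1-c)\,\vec{e}_i \\
\vec{r}_i &= (1-c)\,(I - c\,\tilde{W})^{-1}\,\vec{e}_i
           = (1-c)\,Q^{-1}\,\vec{e}_i,
\qquad Q = I - c\,\tilde{W}.
\end{align}
```

Up to the factor $1-c$, the ranking vector for query $i$ is therefore the $i$-th column of $Q^{-1}$, which is why the rest of the talk focuses on computing and storing $Q^{-1}$ cheaply.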

  15. [Equation]: the "Maxwell Equation" for the Web! • Beyond RWR [Chakrabarti] • P-PageRank [Haveliwala] • SM Learning [Zhou, Zhu] • RL in CBIR [He] • RWR [Pan, Sun] • PageRank [Haveliwala] • Fast RWR finds the root solution!

  16. Q: Given query i, how to solve it?

  17. OnTheFly • No pre-computation / light storage • Slow on-line response: O(mE) [Figure: the 12-node example graph before and after scoring]
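The OnTheFly approach can be sketched as a plain iteration of the RWR update. This is a minimal illustration, not the authors' code: the function name `rwr_on_the_fly`, the undirected edge-list input, and the column normalization (each node splits its score evenly among its neighbors) are assumptions made for the example.

```python
def rwr_on_the_fly(edges, n, query, c=0.9, iters=100):
    """Iterate r <- c * W r + (1 - c) * e_query on a column-normalized graph."""
    # Neighbor lists of an undirected, unweighted graph (assumed input format).
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    r = [0.0] * n
    r[query] = 1.0  # start with all mass on the query node
    for _ in range(iters):
        nxt = [0.0] * n
        for u in range(n):
            if not nbrs[u]:
                continue
            # Node u passes a c-fraction of its score evenly to its neighbors.
            share = c * r[u] / len(nbrs[u])
            for v in nbrs[u]:
                nxt[v] += share
        nxt[query] += 1.0 - c  # restart: return to the query node
        r = nxt
    return r


# Path graph 0 - 1 - 2 - 3, querying node 0.
scores = rwr_on_the_fly([(0, 1), (1, 2), (2, 3)], n=4, query=0)
```

Each pass touches every edge once, so m iterations cost O(mE) per query, which matches the slow on-line response noted on the slide.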

  18. PreCompute [Haveliwala] [Figure: the ranking vectors of all 12 nodes of the example graph are computed in advance and stored as the matrix R]

  19. PreCompute • Fast on-line response • Heavy pre-computation/storage cost: $O(n^3)$ pre-computation, $O(n^2)$ storage [Figure: the 12-node example graph before and after scoring]

  20. Q: How to balance on-line and off-line cost?

  21. Roadmap • Background • RWR: Definitions • RWR: Algorithms • Basic Idea • FastRWR • Pre-Compute Stage • On-Line Stage • Experimental Results • Conclusion

  22. Basic Idea [Figure: copies of the 12-node example graph in panels labeled "Find Community", "Combine" and "Fix the remaining", ending with the scored graph]

  23. Pre-computational stage • Q: Efficiently compute and store $Q^{-1}$ • A: A few small, instead of ONE BIG, matrix inversions

  24. On-Line Query Stage • Q: Efficiently recover one column of $Q^{-1}$ • A: A few, instead of MANY, matrix-vector multiplications

  25. Roadmap • Background • RWR: Definitions • RWR: Algorithms • Basic Idea • FastRWR • Pre-Compute Stage • On-Line Stage • Experimental Results • Conclusion

  26. Pre-compute Stage • P1: B_Lin Decomposition • P1.1 partition • P1.2 low-rank approximation • P2: Q matrices • P2.1 computing $Q_{1,i}^{-1}$ (for each partition) • P2.2 computing $\tilde{\Lambda}$ (for concept space)

  27. P1.1: partition • Within-partition links • Cross-partition links [Figure: the 12-node example graph before and after partitioning]

  28. P1.1: block-diagonal [Figure: the within-partition links of the example graph form a block-diagonal matrix]

  29. P1.2: LRA (low-rank approximation) for $\tilde{W}_2$: $\tilde{W}_2 \approx U S V$, with $|S| \ll |\tilde{W}_2|$ [Figure: the cross-partition links of the example graph]

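  30. [Figure: the normalized adjacency matrix drawn as the sum of a block-diagonal (within-partition) part and a low-rank (cross-partition) part]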

  31. P2.1 Computing $Q_{1,i}^{-1}$ (for each partition)

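  32. Comparing $Q_1^{-1}$ and $Q^{-1}$ • $Q_1^{-1}$ is block-diagonal with blocks $Q_{1,1}^{-1}, Q_{1,2}^{-1}, \ldots, Q_{1,k}^{-1}$ • Computing Time • 100,000 nodes; 100 partitions • Computing $Q_1^{-1}$ is 10,000x faster! • Storage Cost • 100x saving!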
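The two savings quoted on this slide follow from simple counting. The sketch below assumes equal-size partitions, a cubic cost for dense matrix inversion and dense storage of every block; the helper name `inversion_savings` is made up for the illustration.

```python
def inversion_savings(n, k):
    """Compare one n x n inversion with k inversions of (n/k) x (n/k) blocks."""
    block = n // k                        # nodes per partition (assumed equal)
    full_time = n ** 3                    # cost model: one big dense inversion
    block_time = k * block ** 3           # k small dense inversions
    full_store = n ** 2                   # entries in the full inverse
    block_store = k * block ** 2          # entries in the block-diagonal inverse
    return full_time // block_time, full_store // block_store


time_ratio, storage_ratio = inversion_savings(100_000, 100)
print(time_ratio, storage_ratio)  # 10000 100
```

Under these assumptions the time ratio is k squared and the storage ratio is k, so 100 partitions give a 10,000x faster inversion and a 100x smaller matrix to store.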

  33. Q: How to fix the green portions? [Figure: the approximate matrices, with the portions still to be fixed highlighted in green]

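  34. P2.2 Computing $\tilde{\Lambda} = (S^{-1} - c\,V Q_1^{-1} U)^{-1}$, where $Q_1^{-1}$ is block-diagonal with blocks $Q_{1,1}^{-1}, Q_{1,2}^{-1}, \ldots, Q_{1,k}^{-1}$ [Figure: the 12-node example graph]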

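  35. We have: $\tilde{W} \approx \tilde{W}_1 + U S V$ (communities + bridges) • SM Lemma says: $Q^{-1} \approx Q_1^{-1} + c\,Q_1^{-1} U \tilde{\Lambda} V Q_1^{-1}$, with $\tilde{\Lambda}$ the concept-space matrix computed in P2.2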
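How the SM (Sherman-Morrison) lemma turns the communities-plus-bridges split into an inverse can be written out as follows. This is a sketch that assumes the notation of the FastRWR paper: $\tilde{W}_1$ holds the within-partition links, $U S V$ is the low-rank approximation of the cross-partition links $\tilde{W}_2$, and $Q_1 = I - c\,\tilde{W}_1$.

```latex
\begin{align}
Q &= I - c\,\tilde{W} \approx I - c\,\tilde{W}_1 - c\,U S V = Q_1 - c\,U S V \\
(A + U C V)^{-1} &= A^{-1} - A^{-1} U \,(C^{-1} + V A^{-1} U)^{-1}\, V A^{-1}
  \qquad \text{(SM lemma)} \\
Q^{-1} &\approx Q_1^{-1} - Q_1^{-1} U
  \left(-\tfrac{1}{c} S^{-1} + V Q_1^{-1} U\right)^{-1} V Q_1^{-1}
  \qquad (A = Q_1,\; C = -c\,S) \\
&= Q_1^{-1} + c\,Q_1^{-1} U \tilde{\Lambda} V Q_1^{-1},
\qquad \tilde{\Lambda} = (S^{-1} - c\,V Q_1^{-1} U)^{-1}.
\end{align}
```

Only the small blocks of $Q_1$ and the small matrix $\tilde{\Lambda}$ (whose size is the rank of the approximation) ever need to be inverted.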

  36. Roadmap • Background • RWR: Definitions • RWR: Algorithms • Basic Idea • FastRWR • Pre-Compute Stage • On-Line Stage • Experimental Results • Conclusion

  37. On-Line Stage • Q: Given the query and the pre-computation, how to get the result? • A: SM lemma

  38. On-Line Query Stage [Figure: the query computation broken into steps q1 to q6]
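The two stages can be tried end to end on a toy graph. The sketch below is an illustration under strong assumptions, not the authors' implementation: the graph (two triangles joined by one bridge edge) and the partition are hard-coded, the cross-partition links are factored exactly as U S V over the two bridge nodes instead of by a real low-rank approximation, and all names (`precompute`, `query`, `build_toy`, the dense list-of-rows helpers) are hypothetical.

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (only for tiny matrices)."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def build_toy():
    """Two triangles {0,1,2} and {3,4,5} joined by the bridge edge 2-3."""
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    n, parts = 6, [[0, 1, 2], [3, 4, 5]]
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    w = [[0.0] * n for _ in range(n)]
    for u, v in edges:                     # column-normalized weights
        w[v][u] = 1.0 / deg[u]
        w[u][v] = 1.0 / deg[v]
    return w, parts

def precompute(w, parts, bridge, c):
    """Off-line stage: invert each partition block, then the small core matrix."""
    n, t = len(w), len(bridge)
    q1_inv = [[0.0] * n for _ in range(n)]
    for part in parts:                     # one small inversion per partition
        blk = [[float(i == j) - c * w[i][j] for j in part] for i in part]
        inv = inverse(blk)
        for a, i in enumerate(part):
            for b, j in enumerate(part):
                q1_inv[i][j] = inv[a][b]
    u = [[float(i == p) for p in bridge] for i in range(n)]   # n x t selector
    v = [[float(j == q) for j in range(n)] for q in bridge]   # t x n selector
    s = [[w[p][q] for q in bridge] for p in bridge]           # bridge weights
    s_inv = inverse(s)
    core = matmul(matmul(v, q1_inv), u)
    lam = inverse([[s_inv[a][b] - c * core[a][b] for b in range(t)] for a in range(t)])
    return q1_inv, u, lam, v

def query(pre, i, c):
    """On-line stage: a handful of matrix-vector products, no inversion."""
    q1_inv, u, lam, v = pre
    n = len(q1_inv)
    e = [[float(j == i)] for j in range(n)]                   # starting vector
    base = matmul(q1_inv, e)                                  # Q1^{-1} e_i
    fix = matmul(q1_inv, matmul(u, matmul(lam, matmul(v, base))))
    return [(1 - c) * (base[j][0] + c * fix[j][0]) for j in range(n)]

w, parts = build_toy()
scores = query(precompute(w, parts, bridge=[2, 3], c=0.9), i=0, c=0.9)
```

Because the bridge factorization is exact in this toy, the scores agree with a direct inversion of the full matrix; with a genuine low-rank approximation they would only be approximate, which is the quality versus speed trade-off the experiments measure.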

  39. Roadmap • Background • RWR: Definitions • RWR: Algorithms • Basic Idea • FastRWR • Pre-Compute Stage • On-Line Stage • Experimental Results • Conclusion

  40. Experimental Setup • Dataset • DBLP/authorship • Author-Paper • 315k nodes • 1,800k edges • Approx. Quality: Relative Accuracy • Application: Center-Piece Subgraph

  41. Query Time vs. Pre-Compute Time • Quality: 90%+ • On-line: up to 150x speedup • Pre-computation: two orders of magnitude saving [Plot: log query time vs. log pre-compute time]

  42. Query Time vs. Pre-Storage • Quality: 90%+ • On-line: up to 150x speedup • Pre-storage: three orders of magnitude saving [Plot: log query time vs. log storage]

  43. Roadmap • Background • RWR: Definitions • RWR: Algorithms • Basic Idea • FastRWR • Pre-Compute Stage • On-Line Stage • Experimental Results • Conclusion

  44. Conclusion • FastRWR • Reasonable quality preservation (90%+) • Up to 150x speed-up in query time • Orders-of-magnitude saving in pre-computation and storage • More in the paper • Variants of FastRWR and theoretical justification • Implementation details • Normalization, low-rank approximation, sparse • More experiments • Other datasets, other applications

  45. Q&A Thank you! htong@cs.cmu.edu www.cs.cmu.edu/~htong
