Distributed PageRank Computation Based on Iterative Aggregation-Disaggregation Methods. Yangbo Zhu, Shaozhi Ye and Xing Li, Tsinghua University, Beijing, China. ACM CIKM 2005, Bremen.
Outline • Quick Review of PageRank • Distributed PageRank Computation • Motivation • Basic Idea • Algorithm • Experiments • Conclusion and Future Work
PageRank - Background Ranking Web pages • Content-based methods • Link-based methods • PageRank [Page & Brin, 1998] • HITS [Kleinberg, 1998] • SALSA [Lempel & Moran, 2000]
PageRank - Intuition • A link from page A to page B means that the author of A recommends B. • A page is of high quality if it is • referred to by many other pages • referred to by pages of high quality
PageRank - Model • Random Surfer - Markov Chain
PageRank - Algorithm • Power method
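The random-surfer model and the power method can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `pagerank`, the damping factor 0.85, the uniform treatment of dangling pages, and the L1 stopping tolerance are all assumptions, and every link target is assumed to appear as a key of `out_links`.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-method PageRank on a graph given as {page: [pages it links to]}."""
    pages = sorted(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start from the uniform vector
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly (assumption).
        dangling = sum(rank[p] for p in pages if not out_links[p])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        # Each page passes its rank in equal shares along its out-links.
        for p in pages:
            for q in out_links[p]:
                new_rank[q] += damping * rank[p] / len(out_links[p])
        # Stop when the L1 distance between successive iterates is small.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

In this toy graph, page C is referred to by both A and B, so it ends up ranked above B.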
Motivation • Compass search engine confederation
Basic Idea • Divide and conquer • Make use of the natural block structure of web graphs
DPC Algorithm • Step 1 - Initialization Local nodes compute local PageRank vectors.
DPC Algorithm (cont.) • Step 2 - Aggregation Central node computes the NodeRank vector.
DPC Algorithm (cont.) • Step 3 - Disaggregation Local nodes compute extended local PageRank vectors. X: External nodes
DPC Algorithm (cont.) • Step 4 - Central node computes the L1 distance between the current global PageRank vector and the previous one.
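The four steps can be illustrated with a generic aggregation-disaggregation loop with Block Jacobi smoothing, run on a single machine. This is a sketch under stated assumptions and not the paper's exact DPC algorithm: it builds a dense Google matrix with damping 0.85, starts from a uniform vector instead of local PageRank vectors, does not model the external node X explicitly, and solves each block by a fixed number of fixed-point sweeps. All function names are hypothetical.

```python
def google_matrix(out_links, damping=0.85):
    """Dense column-stochastic Google matrix G, with G[i][j] = P(j -> i)."""
    n = len(out_links)
    G = [[(1.0 - damping) / n] * n for _ in range(n)]
    for j, targets in enumerate(out_links):
        if not targets:                       # dangling page: jump uniformly (assumption)
            for i in range(n):
                G[i][j] += damping / n
        for i in targets:
            G[i][j] += damping / len(targets)
    return G


def stationary(M, tol=1e-14, max_iter=10000):
    """Stationary vector of a column-stochastic matrix by the power method."""
    n = len(M)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        y = [v / total for v in y]
        if sum(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


def iad_step(G, blocks, x, sweeps=500):
    """One aggregation / disaggregation / Block Jacobi pass from estimate x."""
    n = len(blocks)
    mass = [sum(x[j] for j in B) for B in blocks]
    # Aggregation: n-by-n block-level chain weighted by the current estimate.
    A = [[sum(G[i][j] * x[j] for i in blocks[I] for j in blocks[J]) / mass[J]
          for J in range(n)] for I in range(n)]
    z = stationary(A)                         # block-level vector (cf. NodeRank)
    # Disaggregation: rescale each block of x so it carries its block-level mass.
    y = list(x)
    for J, B in enumerate(blocks):
        for j in B:
            y[j] = z[J] * x[j] / mass[J]
    # Block Jacobi smoothing: per block, solve u = G_II u + b_I by fixed-point sweeps.
    new_x = list(y)
    for B in blocks:
        inside = set(B)
        b = {i: sum(G[i][j] * y[j] for j in range(len(x)) if j not in inside)
             for i in B}
        u = {i: y[i] for i in B}
        for _ in range(sweeps):
            u = {i: b[i] + sum(G[i][j] * u[j] for j in B) for i in B}
        for i in B:
            new_x[i] = u[i]
    total = sum(new_x)
    return [v / total for v in new_x]


def iad_pagerank(G, blocks, tol=1e-10, max_iter=100):
    """Repeat iad_step until the L1 distance between iterates falls below tol."""
    x = [1.0 / len(G)] * len(G)
    for _ in range(max_iter):
        new_x = iad_step(G, blocks, x)
        if sum(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


# Two "sites" of three pages each; page 2 links across to page 3.
links = [[1, 2], [2], [0, 3], [4, 5], [5], [3]]
blocks = [[0, 1, 2], [3, 4, 5]]
G = google_matrix(links)
approx = iad_pagerank(G, blocks)
```

A useful sanity check for any such scheme is that the true PageRank vector is a fixed point: feeding `stationary(G)` into `iad_step` should return the same vector up to rounding.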
Advantages • DPC mainly consists of standard PageRank computation. • Small matrices fit into main memory. • Low communication overhead.
Experimental Setup • Simulation on a single Linux box. • Group web pages by sites. • For comparison • Classic power method • LPR-Ref-2 algorithm in [Wang, VLDB 2004]
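Grouping pages by site could look like the sketch below. It assumes that a "site" is identified by the host name of the page URL, which the slides do not specify; the function name `group_by_site` and the example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_site(urls):
    """Map each host name to the list of page indices it hosts, in input order."""
    sites = defaultdict(list)
    for index, url in enumerate(urls):
        host = urlsplit(url).hostname or ""   # empty string if the URL has no host
        sites[host].append(index)
    return dict(sites)


pages = ["http://a.example/x", "http://a.example/y", "http://b.example/"]
blocks_by_site = group_by_site(pages)
```

Each resulting list of indices plays the role of one block of the web graph.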
Data Sets • ST01/03 - crawled in 2001/2003 by the Stanford WebBase Project • CN04 - crawled in 2004 from web sites in China.
Evaluation Metrics • L1 distance • Kendall's τ-distance: a pair of pages (i, j) counts toward the distance if pages i and j are in different order in the two ranking lists.
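Both metrics can be sketched in a few lines. The normalization of Kendall's τ-distance by the total number of pairs and the choice to ignore tied pairs are assumptions; the slides only state when a pair counts as discordant.

```python
from itertools import combinations


def l1_distance(x, y):
    """Sum of absolute differences between two score vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y))


def kendall_tau_distance(x, y):
    """Fraction of page pairs (i, j) that x and y rank in opposite order."""
    n = len(x)
    discordant = sum(
        1 for i, j in combinations(range(n), 2)
        if (x[i] - x[j]) * (y[i] - y[j]) < 0   # ties are not counted (assumption)
    )
    return discordant / (n * (n - 1) / 2)


a = [0.5, 0.3, 0.2]
b = [0.3, 0.5, 0.2]
d_l1 = l1_distance(a, b)
d_tau = kendall_tau_distance(a, b)
```

Here only the pair of pages 0 and 1 swaps order, so one of the three pairs is discordant.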
Accuracy of the First Iteration • L1 • Kendall
Convergence Rate • Number of iterations for convergence
Conclusion • A distributed PageRank computation algorithm based on iterative aggregation-disaggregation (IAD) methods with Block Jacobi smoothing. • Experiments on real web graphs show that DPC outperforms LPR-Ref-2 [Wang, VLDB 2004] and converges 5 to 7 times faster than the power method.
Future Work • Implement DPC in a distributed system. Integrate it with the Compass search engine confederation. • How to update PageRank vectors efficiently within the DPC framework?
IAD Method - Notations • Aggregation matrix (n×N) • Disaggregation matrix (N×n)
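For orientation, one common way of writing these two matrices in the IAD literature is sketched below. It assumes N pages partitioned into n blocks G_1, ..., G_n, a column-stochastic transition matrix P, and a positive current estimate x. The symbols R, S(x), A(x) and G_I are assumptions introduced here, and the original slide's definitions may differ.

```latex
\begin{aligned}
R &\in \mathbb{R}^{n \times N}, &
R_{Ij} &= \begin{cases} 1 & \text{if } j \in G_I, \\ 0 & \text{otherwise,} \end{cases} \\
S(x) &\in \mathbb{R}^{N \times n}, &
S(x)_{jI} &= \begin{cases} x_j \Big/ \sum_{k \in G_I} x_k & \text{if } j \in G_I, \\ 0 & \text{otherwise,} \end{cases} \\
A(x) &= R \, P \, S(x) \in \mathbb{R}^{n \times n}, &
R \, S(x) &= I_n .
\end{aligned}
```

Under these definitions A(x) is the n×n block-level transition matrix, and S(x) depends on x, which is one way to see why the disaggregation step is not linear.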
DPC - Convergence Analysis • The global convergence of the IAD method is still an open problem. • The difficulty comes partly from the fact that the disaggregation step is nonlinear. • The paper proves the global convergence of the Block Jacobi method in the PageRank scenario when n > 2.
Experiments - Basic Facts • Distribution over number of pages hosted by sites of different size • Distribution over size of sites
Experiments - Communication Overhead • Pos(·) - Number of positive elements • L/U - Block strictly lower/upper triangular part of P • Methods compared: Power; LPR-Ref-2 / DPC