Extrapolation Methods for Accelerating PageRank Computations
Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, Gene H. Golub
Stanford University
Motivation
Example query "Giants": one user wants "The Official Site of the San Francisco Giants" as the top result; another wants "The Official Site of the New York Giants".
• Problem: Speed up PageRank
• Motivation: Personalization, "Freshness"
• Note: PageRank computations don't get faster as computers do.
Outline
• Definition of PageRank
• Computation of PageRank
• Convergence Properties
• Outline of Our Approach
• Empirical Results
Link Counts
[Diagram: Taher's Home Page and Sep's Home Page each have two inlinks. One is linked by 2 unimportant pages (CS361, DB Pub Server); the other is linked by 2 important pages (CNN, Yahoo!). Raw link counts cannot tell them apart.]
Definition of PageRank
• The importance of a page is given by the importance of the pages that link to it:

  x_i = Σ_{j ∈ B_i} x_j / N_j

  where x_i is the importance of page i, x_j is the importance of page j, N_j is the number of outlinks from page j, and B_i is the set of pages j that link to page i.
Definition of PageRank
[Diagram: example link graph over Sep, Taher, DB Pub Server, CNN, and Yahoo!, with edge weights (1/2, 1/2, 1, 1) and importance scores (0.05, 0.25, 0.1, 0.1, 0.1).]
PageRank Diagram
• Initialize all nodes to rank 1/n (here, 0.333 each).
• Propagate ranks across links (multiplying by link weights).
• After a while, the ranks converge (here, to 0.4, 0.4, 0.2).
Computing PageRank
• Initialize: x_i^(0) = 1/n for all pages i
• Repeat until convergence:

  x_i^(k+1) = Σ_{j ∈ B_i} x_j^(k) / N_j

  where N_j is the number of outlinks from page j and B_i is the set of pages j that link to page i.
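The update above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the 3-page link graph and the fixed iteration count are made-up choices:

```python
# Minimal sketch of the PageRank update x_i <- sum_{j in B_i} x_j / N_j.
# The 3-page link graph below is a hypothetical example.
outlinks = {0: [1, 2], 1: [2], 2: [0]}  # page j -> pages it links to
n = 3
x = [1.0 / n] * n  # initialize every page to rank 1/n

for _ in range(60):  # "repeat until convergence" (fixed count for simplicity)
    new_x = [0.0] * n
    for j, targets in outlinks.items():
        for i in targets:  # page j passes x_j / N_j to each page i it links to
            new_x[i] += x[j] / len(targets)
    x = new_x

print([round(v, 3) for v in x])  # converges to [0.4, 0.2, 0.4] for this graph
```

Because every page here has at least one outlink, the total rank is conserved at each step, so the result is a probability distribution over pages.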
Matrix Notation
In matrix form, the update is x^(k+1) = P^T x^(k), where P is the row-stochastic link matrix: P_ij = 1/N_i if page i links to page j, and 0 otherwise.

Find x that satisfies:

  P^T x = x

i.e., the PageRank vector x is an eigenvector of P^T with eigenvalue 1.
Power Method
• Initialize: x^(0) = v (e.g., the uniform vector)
• Repeat until convergence: x^(k+1) = P^T x^(k)
A side note
• PageRank doesn't actually use P^T. Instead, it uses A = cP^T + (1-c)E^T, where E is a rank-one teleportation matrix and c is the damping factor.
• So the PageRank problem is really: find x that satisfies Ax = x, not: find x that satisfies P^T x = x.
Power Method
• And the algorithm is really . . .
• Initialize: x^(0) = v
• Repeat until convergence: x^(k+1) = Ax^(k) = cP^T x^(k) + (1-c)E^T x^(k)
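A sketch of this damped iteration: when E = ev^T and x sums to 1, Ax = cP^T x + (1-c)v, so A never needs to be formed explicitly. The small graph, c = 0.85, and the tolerance below are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Sketch of the power method on A = c*P^T + (1-c)*E^T. With E = e v^T and
# x summing to 1, A x = c * P^T x + (1-c) * v, so A is never materialized.
P = np.array([[0.0, 0.5, 0.5],   # row i = out-link distribution of page i
              [0.0, 0.0, 1.0],   # (made-up 3-page graph)
              [1.0, 0.0, 0.0]])
c = 0.85                          # damping factor (a typical choice)
n = P.shape[0]
v = np.full(n, 1.0 / n)           # uniform teleportation vector

x = v.copy()
for _ in range(200):
    x_next = c * (P.T @ x) + (1 - c) * v
    if np.abs(x_next - x).sum() < 1e-12:  # L1 convergence test
        x = x_next
        break
    x = x_next

print(x.round(3))  # the PageRank vector: a probability distribution over pages
```

The loop only ever does a sparse matrix-vector product plus a vector add, which is why the power method scales to web-sized graphs.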
Outline
• Definition of PageRank
• Computation of PageRank
• Convergence Properties
• Outline of Our Approach
• Empirical Results
Power Method
Express x^(0) in terms of the eigenvectors of A:

  x^(0) = u1 + a2·u2 + a3·u3 + a4·u4 + a5·u5

After one iteration:

  x^(1) = u1 + a2·λ2·u2 + a3·λ3·u3 + a4·λ4·u4 + a5·λ5·u5

After two iterations:

  x^(2) = u1 + a2·λ2²·u2 + a3·λ3²·u3 + a4·λ4²·u4 + a5·λ5²·u5

After k iterations:

  x^(k) = u1 + a2·λ2^k·u2 + a3·λ3^k·u3 + a4·λ4^k·u4 + a5·λ5^k·u5

In the limit:

  x^(∞) = u1
Why does it work?
• Imagine our n × n matrix A has n distinct eigenvectors ui.
• Then you can write any n-dimensional vector as a linear combination of the eigenvectors of A:

  x^(0) = u1 + a2·u2 + a3·u3 + a4·u4 + a5·u5

  (the coefficient of u1 can be taken to be 1 by scaling).
Why does it work?
• From the last slide: x^(0) = u1 + a2·u2 + … + a5·u5
• To get the first iterate, multiply x^(0) by A.
• The first eigenvalue is 1; the others (λ2, …, λ5) are all less than 1 in magnitude.
• Therefore: x^(1) = Ax^(0) = u1 + a2·λ2·u2 + … + a5·λ5·u5
Power Method

  x^(0) = u1 + a2·u2 + … + a5·u5
  x^(1) = u1 + a2·λ2·u2 + … + a5·λ5·u5
  x^(2) = u1 + a2·λ2²·u2 + … + a5·λ5²·u5
Convergence

  x^(k) = u1 + a2·λ2^k·u2 + … + a5·λ5^k·u5

• The smaller λ2, the faster the convergence of the Power Method.
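This rate is easy to see numerically. A sketch, using a fully synthetic matrix built with known eigenvalues 1, 0.6, 0.2 (the eigenvectors and starting vector are also made up) purely to illustrate the claim:

```python
import numpy as np

# Synthetic illustration: build A = U diag(1, 0.6, 0.2) U^{-1}, so that
# lambda_2 = 0.6 by construction, then watch the power-method error shrink
# by a factor of about lambda_2 per iteration.
U = np.array([[1.0, 1.0, 0.0],   # columns are the eigenvectors u1, u2, u3
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
A = U @ np.diag([1.0, 0.6, 0.2]) @ np.linalg.inv(U)
u1 = U[:, 0]

x = U.sum(axis=1)                # x(0) = u1 + u2 + u3
errs = []
for _ in range(20):
    x = A @ x
    errs.append(np.linalg.norm(x - u1))  # distance to the fixed point u1

ratios = [errs[k + 1] / errs[k] for k in range(12, 19)]
print([round(r, 3) for r in ratios])  # each ratio is ~0.6 = lambda_2
```

Since x^(k) − u1 = 0.6^k·u2 + 0.2^k·u3 here, the λ3 term dies off much faster, and the late-iteration error ratio settles at λ2.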
Our Approach
• Estimate the components of the current iterate in the directions of the second two eigenvectors (u2, u3), and eliminate them.
Why this approach?
• For traditional problems: A is smaller, often dense; λ2 is often close to λ1, making the power method slow.
• In our problem: A is huge and sparse; more importantly, λ2 is small¹.
• Therefore, for our problem, the Power Method is actually much faster than other methods.

¹ "The Second Eigenvalue of the Google Matrix" (dbpubs.stanford.edu/pub/2003-20)
Using Successive Iterates
[Diagram: successive iterates x^(0), x^(1), x^(2) approach u1; the idea is to combine them into a new estimate x′ that lands on u1 directly.]
How do we do this?
• Assume x^(k) can be written as a linear combination of the first three eigenvectors (u1, u2, u3) of A.
• Compute an approximation to the {u2, u3} components, and subtract it from x^(k) to get x^(k)′.
Assume
• Assume x^(k) can be represented by the first 3 eigenvectors of A:

  x^(k) = u1 + a2·u2 + a3·u3
Linear Combination
• Let's take some linear combination of the next 3 iterates:

  y = β1·x^(k) + β2·x^(k+1) + β3·x^(k+2)
Rearranging Terms
• We can rearrange the terms to get:

  y = (β1 + β2 + β3)·u1 + (β1 + β2·λ2 + β3·λ2²)·a2·u2 + (β1 + β2·λ3 + β3·λ3²)·a3·u3

• Goal: Find β1, β2, β3 so that the coefficients of u2 and u3 are 0, and the coefficient of u1 is 1.
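A tiny numeric check of this elimination. Everything below is synthetic (λ2, λ3, the eigenvectors, and the coefficients a2, a3 are made-up values), and it assumes the λ's are known; the paper's actual algorithm instead estimates them from the iterates themselves:

```python
import numpy as np

# Check the "Rearranging Terms" slide numerically. Assume
#   x(k+j) = u1 + a2*l2^j*u2 + a3*l3^j*u3,  j = 0, 1, 2,
# and solve for beta1, beta2, beta3 so the u2 and u3 coefficients vanish
# while the u1 coefficient is 1. All values below are synthetic.
l2, l3 = 0.6, 0.3
u1, u2, u3 = np.eye(3)           # eigenvectors (standard basis, for clarity)
a2, a3 = 0.8, -0.5

x0 = u1 + a2 * u2 + a3 * u3                  # x(k)
x1 = u1 + a2 * l2 * u2 + a3 * l3 * u3        # x(k+1)
x2 = u1 + a2 * l2**2 * u2 + a3 * l3**2 * u3  # x(k+2)

# Conditions: beta1 + beta2      + beta3       = 1
#             beta1 + beta2*l2   + beta3*l2^2  = 0
#             beta1 + beta2*l3   + beta3*l3^2  = 0
V = np.array([[1.0, 1.0, 1.0],
              [1.0, l2, l2**2],
              [1.0, l3, l3**2]])
beta = np.linalg.solve(V, np.array([1.0, 0.0, 0.0]))

y = beta[0] * x0 + beta[1] * x1 + beta[2] * x2
print(np.allclose(y, u1))  # True: the combination recovers u1 exactly
```

Under the three-eigenvector assumption the elimination is exact; with real iterates it is only approximate, which is why the method follows up with a few more power-method steps.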
Summary
• We make an assumption about the current iterate.
• We solve for the dominant eigenvector as a linear combination of the next three iterates.
• We use a few iterations of the Power Method to "clean it up".
Outline
• Definition of PageRank
• Computation of PageRank
• Convergence Properties
• Outline of Our Approach
• Empirical Results
Results
• Quadratic Extrapolation speeds up convergence.
• Extrapolation was only used 5 times!
Results
• Extrapolation dramatically speeds up convergence for high values of c (c = .99).
Take-home message
• Speeds up PageRank by a fair amount, but not by enough for true Personalized PageRank.
• The ideas are useful for further speedup algorithms.
• Quadratic Extrapolation can be used for a whole class of problems.
The End
• Paper available at http://dbpubs.stanford.edu/pub/2003-16