Extrapolation Methods for Accelerating PageRank Computations
Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, Gene H. Golub
Stanford University
Motivation
Example query "Giants": one user wants "The Official Site of the San Francisco Giants" as the top result; another wants "The Official Site of the New York Giants".
• Problem: Speed up PageRank
• Motivation: Personalization, "Freshness"
• Note: PageRank computations don't get faster as computers do.
Outline
• Definition of PageRank
• Computation of PageRank
• Convergence Properties
• Outline of Our Approach
• Empirical Results
Link Counts
[Diagram: Taher's Home Page and Sep's Home Page each have two inlinks. One is linked by 2 unimportant pages (CS361, DB Pub Server); the other is linked by 2 important pages (CNN, Yahoo!). Raw link counts cannot tell them apart.]
Definition of PageRank
• The importance of a page is given by the importance of the pages that link to it:

  x_i = Σ_{j ∈ B_i} x_j / N_j

  where x_i is the importance of page i, x_j is the importance of page j, N_j is the number of outlinks from page j, and B_i is the set of pages j that link to page i.
Definition of PageRank
[Diagram: example link graph over Sep, Taher, DB Pub Server, CNN, and Yahoo!, with edge weights (1/2, 1/2, 1, 1) and importance scores (0.05, 0.25, 0.1, 0.1, 0.1).]
PageRank Diagram
• Initialize all nodes to rank 1/n (here, 0.333 each).
• Propagate ranks across links (multiplying by link weights).
• After a while, the ranks converge (here, to 0.4, 0.4, 0.2).
Computing PageRank
• Initialize: x_i^(0) = 1/n for all pages i
• Repeat until convergence:

  x_i^(k+1) = Σ_{j ∈ B_i} x_j^(k) / N_j

  where N_j is the number of outlinks from page j and B_i is the set of pages j that link to page i.
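The update above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the 3-page link graph and the fixed iteration count are made-up choices:

```python
# Minimal sketch of the PageRank update x_i <- sum_{j in B_i} x_j / N_j.
# The 3-page link graph below is a hypothetical example.
outlinks = {0: [1, 2], 1: [2], 2: [0]}  # page j -> pages it links to
n = 3
x = [1.0 / n] * n  # initialize every page to rank 1/n

for _ in range(60):  # "repeat until convergence" (fixed count for simplicity)
    new_x = [0.0] * n
    for j, targets in outlinks.items():
        for i in targets:  # page j passes x_j / N_j to each page i it links to
            new_x[i] += x[j] / len(targets)
    x = new_x

print([round(v, 3) for v in x])  # converges to [0.4, 0.2, 0.4] for this graph
```

Because every page here has at least one outlink, the total rank is conserved at each step, so the result is a probability distribution over pages.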
Matrix Notation
In matrix form, the update is x^(k+1) = P^T x^(k), where P is the row-stochastic link matrix: P_ij = 1/N_i if page i links to page j, and 0 otherwise.

Find x that satisfies:

  P^T x = x

i.e., the PageRank vector x is an eigenvector of P^T with eigenvalue 1.
Power Method
• Initialize: x^(0) = v (e.g., the uniform vector)
• Repeat until convergence: x^(k+1) = P^T x^(k)
A side note
• PageRank doesn't actually use P^T. Instead, it uses A = cP^T + (1-c)E^T, where E is a rank-one teleportation matrix and c is the damping factor.
• So the PageRank problem is really: find x that satisfies Ax = x, not: find x that satisfies P^T x = x.
Power Method
• And the algorithm is really . . .
• Initialize: x^(0) = v
• Repeat until convergence: x^(k+1) = Ax^(k) = cP^T x^(k) + (1-c)E^T x^(k)
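A sketch of this damped iteration: when E = ev^T and x sums to 1, Ax = cP^T x + (1-c)v, so A never needs to be formed explicitly. The small graph, c = 0.85, and the tolerance below are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Sketch of the power method on A = c*P^T + (1-c)*E^T. With E = e v^T and
# x summing to 1, A x = c * P^T x + (1-c) * v, so A is never materialized.
P = np.array([[0.0, 0.5, 0.5],   # row i = out-link distribution of page i
              [0.0, 0.0, 1.0],   # (made-up 3-page graph)
              [1.0, 0.0, 0.0]])
c = 0.85                          # damping factor (a typical choice)
n = P.shape[0]
v = np.full(n, 1.0 / n)           # uniform teleportation vector

x = v.copy()
for _ in range(200):
    x_next = c * (P.T @ x) + (1 - c) * v
    if np.abs(x_next - x).sum() < 1e-12:  # L1 convergence test
        x = x_next
        break
    x = x_next

print(x.round(3))  # the PageRank vector: a probability distribution over pages
```

The loop only ever does a sparse matrix-vector product plus a vector add, which is why the power method scales to web-sized graphs.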
Outline
• Definition of PageRank
• Computation of PageRank
• Convergence Properties
• Outline of Our Approach
• Empirical Results
Power Method
Express x^(0) in terms of the eigenvectors of A:

  x^(0) = u1 + a2·u2 + a3·u3 + a4·u4 + a5·u5

After one iteration:

  x^(1) = u1 + a2·λ2·u2 + a3·λ3·u3 + a4·λ4·u4 + a5·λ5·u5

After two iterations:

  x^(2) = u1 + a2·λ2²·u2 + a3·λ3²·u3 + a4·λ4²·u4 + a5·λ5²·u5

After k iterations:

  x^(k) = u1 + a2·λ2^k·u2 + a3·λ3^k·u3 + a4·λ4^k·u4 + a5·λ5^k·u5

In the limit:

  x^(∞) = u1
Why does it work?
• Imagine our n × n matrix A has n distinct eigenvectors ui.
• Then you can write any n-dimensional vector as a linear combination of the eigenvectors of A:

  x^(0) = u1 + a2·u2 + a3·u3 + a4·u4 + a5·u5

  (the coefficient of u1 can be taken to be 1 by scaling).
Why does it work?
• From the last slide: x^(0) = u1 + a2·u2 + … + a5·u5
• To get the first iterate, multiply x^(0) by A.
• The first eigenvalue is 1; the others (λ2, …, λ5) are all less than 1 in magnitude.
• Therefore: x^(1) = Ax^(0) = u1 + a2·λ2·u2 + … + a5·λ5·u5
Power Method

  x^(0) = u1 + a2·u2 + … + a5·u5
  x^(1) = u1 + a2·λ2·u2 + … + a5·λ5·u5
  x^(2) = u1 + a2·λ2²·u2 + … + a5·λ5²·u5
Convergence

  x^(k) = u1 + a2·λ2^k·u2 + … + a5·λ5^k·u5

• The smaller λ2, the faster the convergence of the Power Method.
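This rate is easy to see numerically. A sketch, using a fully synthetic matrix built with known eigenvalues 1, 0.6, 0.2 (the eigenvectors and starting vector are also made up) purely to illustrate the claim:

```python
import numpy as np

# Synthetic illustration: build A = U diag(1, 0.6, 0.2) U^{-1}, so that
# lambda_2 = 0.6 by construction, then watch the power-method error shrink
# by a factor of about lambda_2 per iteration.
U = np.array([[1.0, 1.0, 0.0],   # columns are the eigenvectors u1, u2, u3
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
A = U @ np.diag([1.0, 0.6, 0.2]) @ np.linalg.inv(U)
u1 = U[:, 0]

x = U.sum(axis=1)                # x(0) = u1 + u2 + u3
errs = []
for _ in range(20):
    x = A @ x
    errs.append(np.linalg.norm(x - u1))  # distance to the fixed point u1

ratios = [errs[k + 1] / errs[k] for k in range(12, 19)]
print([round(r, 3) for r in ratios])  # each ratio is ~0.6 = lambda_2
```

Since x^(k) − u1 = 0.6^k·u2 + 0.2^k·u3 here, the λ3 term dies off much faster, and the late-iteration error ratio settles at λ2.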
Our Approach
• Estimate the components of the current iterate in the directions of the second two eigenvectors (u2, u3), and eliminate them.
Why this approach?
• For traditional problems: A is smaller, often dense; λ2 is often close to λ1, making the power method slow.
• In our problem: A is huge and sparse; more importantly, λ2 is small¹.
• Therefore, for our problem, the Power Method is actually much faster than other methods.

¹ "The Second Eigenvalue of the Google Matrix" (dbpubs.stanford.edu/pub/2003-20)
Using Successive Iterates
[Diagram: successive iterates x^(0), x^(1), x^(2) approach u1; the idea is to combine them into a new estimate x′ that lands on u1 directly.]
How do we do this?
• Assume x^(k) can be written as a linear combination of the first three eigenvectors (u1, u2, u3) of A.
• Compute an approximation to the {u2, u3} components, and subtract it from x^(k) to get x^(k)′.
Assume
• Assume x^(k) can be represented by the first 3 eigenvectors of A:

  x^(k) = u1 + a2·u2 + a3·u3
Linear Combination
• Let's take some linear combination of the next 3 iterates:

  y = β1·x^(k) + β2·x^(k+1) + β3·x^(k+2)
Rearranging Terms
• We can rearrange the terms to get:

  y = (β1 + β2 + β3)·u1 + (β1 + β2·λ2 + β3·λ2²)·a2·u2 + (β1 + β2·λ3 + β3·λ3²)·a3·u3

• Goal: Find β1, β2, β3 so that the coefficients of u2 and u3 are 0, and the coefficient of u1 is 1.
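A tiny numeric check of this elimination. Everything below is synthetic (λ2, λ3, the eigenvectors, and the coefficients a2, a3 are made-up values), and it assumes the λ's are known; the paper's actual algorithm instead estimates them from the iterates themselves:

```python
import numpy as np

# Check the "Rearranging Terms" slide numerically. Assume
#   x(k+j) = u1 + a2*l2^j*u2 + a3*l3^j*u3,  j = 0, 1, 2,
# and solve for beta1, beta2, beta3 so the u2 and u3 coefficients vanish
# while the u1 coefficient is 1. All values below are synthetic.
l2, l3 = 0.6, 0.3
u1, u2, u3 = np.eye(3)           # eigenvectors (standard basis, for clarity)
a2, a3 = 0.8, -0.5

x0 = u1 + a2 * u2 + a3 * u3                  # x(k)
x1 = u1 + a2 * l2 * u2 + a3 * l3 * u3        # x(k+1)
x2 = u1 + a2 * l2**2 * u2 + a3 * l3**2 * u3  # x(k+2)

# Conditions: beta1 + beta2      + beta3       = 1
#             beta1 + beta2*l2   + beta3*l2^2  = 0
#             beta1 + beta2*l3   + beta3*l3^2  = 0
V = np.array([[1.0, 1.0, 1.0],
              [1.0, l2, l2**2],
              [1.0, l3, l3**2]])
beta = np.linalg.solve(V, np.array([1.0, 0.0, 0.0]))

y = beta[0] * x0 + beta[1] * x1 + beta[2] * x2
print(np.allclose(y, u1))  # True: the combination recovers u1 exactly
```

Under the three-eigenvector assumption the elimination is exact; with real iterates it is only approximate, which is why the method follows up with a few more power-method steps.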
Summary
• We make an assumption about the current iterate.
• We solve for the dominant eigenvector as a linear combination of the next three iterates.
• We use a few iterations of the Power Method to "clean it up".
Outline
• Definition of PageRank
• Computation of PageRank
• Convergence Properties
• Outline of Our Approach
• Empirical Results
Results
• Quadratic Extrapolation speeds up convergence.
• Extrapolation was only used 5 times!
Results
• Extrapolation dramatically speeds up convergence for high values of c (c = .99).
Take-home message
• Speeds up PageRank by a fair amount, but not by enough for true Personalized PageRank.
• The ideas are useful for further speedup algorithms.
• Quadratic Extrapolation can be used for a whole class of problems.
The End
• Paper available at http://dbpubs.stanford.edu/pub/2003-16