Using Adaptive Methods for Updating/Downdating PageRank
Gene H. Golub, Stanford University SCCM
Joint work with Sep Kamvar and Taher Haveliwala
Motivation • Problem: compute PageRank after the Web has changed slightly. • Motivation: "Freshness". • Note: since the Web is growing, PageRank computations don't get faster as computers do.
Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results
Link Counts • [Diagram comparing two pages: one linked by 2 unimportant pages, the other linked by 2 important pages. Nodes include Gene's Home Page, Martin's Home Page, Donald Rumsfeld, George W. Bush, Iain Duff's Home Page, and Yahoo!]
Definition of PageRank • The importance of a page is given by the importance of the pages that link to it: importance of page i = Σ over pages j that link to i of (importance of page j) / (number of outlinks from page j), i.e. xᵢ = Σⱼ→ᵢ xⱼ / Nⱼ.
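A minimal sketch of this recurrence on a made-up 3-page graph (the pages and links are invented for illustration, and there is no damping yet; that refinement comes later in the talk):

```python
# Hypothetical 3-page web graph: page -> pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}
rank = {p: 1.0 / len(links) for p in links}     # start uniform

for _ in range(50):                              # iterate the recurrence
    new_rank = {p: 0.0 for p in links}
    for j, outlinks in links.items():
        share = rank[j] / len(outlinks)          # x_j / N_j
        for i in outlinks:
            new_rank[i] += share                 # x_i = sum over j -> i of x_j / N_j
    rank = new_rank

print(rank)   # converges to the stationary importance scores
```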
Definition of PageRank • [Diagram: example graph over the pages Gene, Martin, SCCM, Yahoo!, and Duff, with edge weights 1/2, 1/2, 1, 1 and node ranks 0.05, 0.25, 0.1, 0.1, 0.1.]
PageRank Diagram • Initialize all nodes to rank 1/3. [Diagram: three nodes, each labeled 0.333.]
PageRank Diagram • Propagate ranks across links (multiplying by link weights). [Diagram values: 0.167, 0.333, 0.333, 0.167.]
PageRank Diagram • [Three more propagation steps; the ranks keep shifting across links on each iteration. Diagram values: 0.5/0.333/0.167, then 0.167/0.5/0.167/0.167, then 0.333/0.5/0.167.]
PageRank Diagram • After a while, the ranks settle to the stationary values (0.4, 0.4, 0.2).
Matrix Notation • The propagation step is a matrix-vector product: the rank vector is multiplied by Pᵀ, the transposed link matrix, where Pᵢⱼ = 1/deg(i) if page i links to page j. [Slide shows a worked 5-page example whose rank vector (.1, .3, .2, .3, .1) is reproduced by Pᵀ.]
Matrix Notation • Find x that satisfies: x = Pᵀx.
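As an illustration, a sketch that builds Pᵀ for the hypothetical 3-page graph from earlier and recovers the stationary x with a dense eigensolver (fine at this size; the real computation is a sparse power iteration):

```python
import numpy as np

# Row-stochastic link matrix P for the toy graph: P[i, j] = 1/deg(i)
# if page i links to page j.
P = np.array([
    [0.0, 0.5, 0.5],   # A links to B and C
    [0.0, 0.0, 1.0],   # B links to C
    [1.0, 0.0, 0.0],   # C links to A
])

# PageRank is the eigenvector of P^T for eigenvalue 1, normalized to sum to 1.
vals, vecs = np.linalg.eig(P.T)
x = np.real(vecs[:, np.argmax(np.real(vals))])
x = x / x.sum()

print(x)                          # the stationary rank vector
print(np.allclose(P.T @ x, x))    # True: x satisfies x = P^T x
```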
Eigenvalue Distribution • The matrix Pᵀ has several eigenvalues on the unit circle. This will make power-method-like algorithms less effective.
Rank-1 Correction • PageRank doesn't actually use Pᵀ. Instead, it uses A = cPᵀ + (1-c)Eᵀ. • E is a rank-1 matrix, and typically c = 0.85. • This ensures a unique solution and fast convergence. • For matrix A, λ₂ = c.¹ • ¹ From "The Second Eigenvalue of the Google Matrix" (http://dbpubs.stanford.edu/pub/2003-20)
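A common way to apply A without materializing the rank-1 term, assuming E = evᵀ with teleport distribution v (a standard choice, not spelled out on the slide): since eᵀx = 1 for a probability vector x, Eᵀx = v eᵀx = v, and the correction costs only a vector addition. A minimal sketch:

```python
import numpy as np

def google_matvec(PT, x, v, c=0.85):
    """Compute A x = c P^T x + (1-c) v for a probability vector x.

    Assumes E = e v^T, so E^T x = v whenever sum(x) = 1. Names are
    illustrative, not from the talk.
    """
    return c * (PT @ x) + (1 - c) * v
```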
Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results
Power Method • Initialize: x⁽⁰⁾ = v (e.g., the uniform vector). • Repeat until convergence: x⁽ᵏ⁺¹⁾ = Ax⁽ᵏ⁾.
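A minimal sketch of this loop, using the damped matvec from the previous sketch; the function name, tolerance, and L1 stopping test are illustrative choices, not the talk's exact criteria:

```python
import numpy as np

def pagerank_power(PT, v, c=0.85, tol=1e-8, max_iter=1000):
    """Power method sketch: x^(k+1) = A x^(k) with A = c P^T + (1-c) e v^T."""
    x = v.copy()                               # initialize x^(0) = v
    for _ in range(max_iter):
        x_new = c * (PT @ x) + (1 - c) * v     # one iteration of A
        if np.abs(x_new - x).sum() < tol:      # L1 change as stopping test
            return x_new
        x = x_new
    return x

# Usage on the toy graph: pagerank_power(P.T, np.full(3, 1/3))
```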
Power Method • Express x⁽⁰⁾ in terms of the eigenvectors of A: x⁽⁰⁾ = u₁ + a₂u₂ + a₃u₃ + a₄u₄ + a₅u₅.
Power Method • x⁽¹⁾ = Ax⁽⁰⁾ = u₁ + a₂λ₂u₂ + a₃λ₃u₃ + a₄λ₄u₄ + a₅λ₅u₅.
Power Method • x⁽²⁾ = Ax⁽¹⁾ = u₁ + a₂λ₂²u₂ + a₃λ₃²u₃ + a₄λ₄²u₄ + a₅λ₅²u₅.
Power Method • x⁽ᵏ⁾ = u₁ + a₂λ₂ᵏu₂ + a₃λ₃ᵏu₃ + a₄λ₄ᵏu₄ + a₅λ₅ᵏu₅.
Power Method • As k → ∞, the coefficients of u₂, …, u₅ go to 0, so x⁽ᵏ⁾ → u₁.
Why does it work? • Imagine our n×n matrix A has n linearly independent eigenvectors uᵢ. • Then you can write any n-dimensional vector as a linear combination of the eigenvectors of A: x⁽⁰⁾ = u₁ + a₂u₂ + a₃u₃ + … + aₙuₙ.
Why does it work? • From the last slide: x⁽⁰⁾ = u₁ + a₂u₂ + … + aₙuₙ. • To get the first iterate, multiply x⁽⁰⁾ by A: x⁽¹⁾ = Ax⁽⁰⁾ = u₁ + a₂λ₂u₂ + … + aₙλₙuₙ. • The first eigenvalue is λ₁ = 1, and the remaining |λᵢ| are all less than 1. • Therefore, the u₂, …, uₙ terms shrink with every iteration.
Power Method • Recap of the iterates: x⁽⁰⁾ = u₁ + a₂u₂ + a₃u₃ + … → x⁽¹⁾ = u₁ + a₂λ₂u₂ + a₃λ₃u₃ + … → x⁽²⁾ = u₁ + a₂λ₂²u₂ + a₃λ₃²u₃ + ….
Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results
Convergence • Since x⁽ᵏ⁾ = u₁ + a₂λ₂ᵏu₂ + a₃λ₃ᵏu₃ + …, the smaller |λ₂|, the faster the convergence of the Power Method.
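A tiny numeric illustration of this rate (ours, not from the talk), using a diagonal matrix whose eigenvalues we control directly:

```python
import numpy as np

# The power-method error decays like |lambda_2|^k, so a smaller
# lambda_2 means faster convergence.
for lam2 in (0.99, 0.85, 0.50):
    A = np.diag([1.0, lam2, 0.3])        # eigenvalues 1 > lam2 > 0.3
    x = np.array([1.0, 1.0, 1.0])        # a2 = a3 = 1 in the eigenbasis
    for _ in range(50):
        x = A @ x
    err = np.abs(x - np.array([1.0, 0.0, 0.0])).max()
    print(f"lambda2 = {lam2}: error after 50 iterations ~ {err:.1e}")
```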
Quadratic Extrapolation (Joint work with Kamvar and Haveliwala) • Estimate the components of the current iterate in the directions of the second and third eigenvectors, and eliminate them.
Facts that work in our favor • For traditional problems: A is smaller, often dense; λ₂ is often close to λ₁, making the power method slow. • In our problem: A is huge and sparse; more importantly, λ₂ is small.¹ • ¹ "The Second Eigenvalue of the Google Matrix" (dbpubs.stanford.edu/pub/2003-20)
How do we do this? • Assume x⁽ᵏ⁾ can be written as a linear combination of the first three eigenvectors (u₁, u₂, u₃) of A. • Compute an approximation to the {u₂, u₃} components, and subtract it from x⁽ᵏ⁾ to get x⁽ᵏ⁾′.
Sequence Extrapolation • A classical and important field in numerical analysis: techniques for accelerating the convergence of slowly convergent infinite series and integrals.
Example: Aitken Δ²-Process • Suppose Aₙ = A + aλⁿ + rₙ, where rₙ = bμⁿ + o(min{1, |μ|ⁿ}), with a, b, λ, μ all nonzero and |λ| > |μ|. • It can be shown that Sₙ = (AₙAₙ₊₂ − Aₙ₊₁²)/(Aₙ − 2Aₙ₊₁ + Aₙ₊₂) satisfies |Sₙ − A| / |Aₙ − A| = O((|μ|/|λ|)ⁿ) = o(1) as n → ∞.
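A direct transcription of the Δ² formula, applied to the classic slowly convergent Leibniz series for π as a worked example (the example series is ours, not the talk's):

```python
# Aitken's Delta^2 process applied to a scalar sequence A_n -> A:
# S_n = (A_n * A_{n+2} - A_{n+1}^2) / (A_n - 2*A_{n+1} + A_{n+2}).
def aitken(seq):
    return [
        (seq[n] * seq[n + 2] - seq[n + 1] ** 2)
        / (seq[n] - 2 * seq[n + 1] + seq[n + 2])
        for n in range(len(seq) - 2)
    ]

# Partial sums of 4 * (1 - 1/3 + 1/5 - ...) -> pi, which converge
# slowly and alternate around the limit.
partial, total, sign = [], 0.0, 1.0
for k in range(10):
    total += sign * 4.0 / (2 * k + 1)
    partial.append(total)
    sign = -sign

print(partial[-1])           # ~3.04: still far from pi
print(aitken(partial)[-1])   # much closer to pi
```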
In other words… Assuming a certain pattern for the series is helpful in accelerating convergence. We can apply this component-wise in order to get a better estimate of the eigenvector.
Another approach • Assume x⁽ᵏ⁾ can be represented by the first three eigenvectors of A: x⁽ᵏ⁾ = u₁ + a₂λ₂ᵏu₂ + a₃λ₃ᵏu₃.
Linear Combination • We take some linear combination of these 3 iterates: y = β₁x⁽ᵏ⁾ + β₂x⁽ᵏ⁺¹⁾ + β₃x⁽ᵏ⁺²⁾.
Rearranging Terms • We can rearrange the terms to get: y = (β₁ + β₂ + β₃)u₁ + a₂(β₁ + β₂λ₂ + β₃λ₂²)λ₂ᵏu₂ + a₃(β₁ + β₂λ₃ + β₃λ₃²)λ₃ᵏu₃. • Goal: find β₁, β₂, β₃ so that the coefficients of u₂ and u₃ are 0, and the coefficient of u₁ is 1.
Results • Quadratic Extrapolation speeds up convergence. • Extrapolation was used only 5 times.
Estimating the coefficients • Procedure 1: set β₁ = 1 and solve the resulting least-squares problem. • Procedure 2: use the SVD to compute the coefficients of the characteristic polynomial.
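A sketch of this estimation step, reconstructed from the published Quadratic Extrapolation algorithm (Kamvar, Haveliwala, Manning, and Golub, 2003) rather than from the slide itself: one coefficient is fixed to 1 and a small least-squares problem is solved for the others, as in Procedure 1. Variable names and the final renormalization are illustrative choices, not the authors' code.

```python
import numpy as np

def quadratic_extrapolation(x3, x2, x1, x0):
    """Given four successive power iterates x3 = x^(k-3), ..., x0 = x^(k),
    estimate and remove the u2, u3 components."""
    Y = np.column_stack([x2 - x3, x1 - x3])       # differences y^(k-2), y^(k-1)
    yk = x0 - x3                                   # y^(k)
    g1, g2 = np.linalg.lstsq(Y, -yk, rcond=None)[0]
    g3 = 1.0                                       # fixed coefficient
    b0, b1, b2 = g1 + g2 + g3, g2 + g3, g3
    x_star = b0 * x2 + b1 * x1 + b2 * x0           # u2, u3 components cancelled
    return x_star / np.abs(x_star).sum()           # renormalize in L1
```

In practice the extrapolation is applied only occasionally (the slides report 5 times), with ordinary power iterations in between to clean up the approximation error.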
Results • Extrapolation dramatically speeds up convergence for high values of c (c = 0.99).
Take-home message • Quadratic Extrapolation estimates the components of the current iterate in the directions of the second and third eigenvectors, and subtracts them off. • It achieves significant speedup, and its ideas are useful for further speedup algorithms.
Summary of this part • We make an assumption about the form of the current iterate. • We solve for the dominant eigenvector as a linear combination of the next three iterates. • We use a few iterations of the Power Method to "clean up" the result.
Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results
Basic Idea • When the PageRank of a page has converged, stop recomputing it.
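A sketch of that bookkeeping, reusing the damped matvec from earlier; a real implementation saves work by skipping the sparse-matrix rows of converged pages, which this simplified version does not do. The x0 parameter (an assumption of ours) lets an update run start from a previous PageRank vector:

```python
import numpy as np

def adaptive_pagerank(PT, v, x0=None, c=0.85, tol=1e-10, max_iter=1000):
    """Adaptive power method sketch: freeze entries that have converged."""
    x = v.copy() if x0 is None else x0.copy()
    converged = np.zeros(len(v), dtype=bool)
    for _ in range(max_iter):
        x_new = c * (PT @ x) + (1 - c) * v       # full matvec, for simplicity
        x_new[converged] = x[converged]          # don't recompute converged pages
        converged |= np.abs(x_new - x) < tol     # mark newly converged entries
        if converged.all():
            return x_new
        x = x_new
    return x
```

For the update scenario on the next slide, the previous PageRank vector would be passed as x0, so the entries for old pages converge, and are frozen, almost immediately.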
Updates • Use the previous PageRank vector as the start vector. • The speedup is not that great. • Why? The old pages converge quickly, but the new pages still take a long time to converge. • But if you use Adaptive PageRank, you save the computation on the old pages.
Outline • Definition of PageRank • Computation of PageRank • Convergence Properties • Outline of Our Approach • Empirical Results