Experiments with MATLAB
Google PageRank
Roger Jang (張智星)
CSIE Dept, National Taiwan University, Taiwan
jang@mirlab.org
http://mirlab.org/jang
PageRank Algorithm
• Facts about the PageRank algorithm
  • Developed by Google's founders, Larry Page and Sergey Brin, when they were graduate students at Stanford University
  • Determined entirely by the link structure of the WWW
  • Recomputed about once a month
  • Described as the world's largest matrix computation
• Ideas
  • Modeled as a random walk, i.e., a Markov chain (Markov process)
  • PageRank of a page: the limiting probability that a random surfer visits that page
  • A page has a high rank if other pages with high rank link to it
Connectivity Matrix G
• Notation
  • U: the set of all n web pages in the world (n > 4 billion by June 2004)
  • G: the n-by-n connectivity matrix, with
    $g_{ij} = 1$ if there is a hyperlink to page i from page j, and
    $g_{ij} = 0$ otherwise
• Facts
  • G is huge, but very sparse
  • The number of nonzeros in G is the total number of hyperlinks in U
[Figure: a tiny web of six pages, numbered 1 to 6]
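A minimal Python sketch of building G for a tiny web. The edge list `LINKS` is a hypothetical six-page example (it is not read off the slide's figure), and pages are numbered from 1 as on the slide, so page i maps to row `i - 1`. The dense list-of-lists form only suits tiny n; for a web-scale G one would store just the nonzeros, as the illustrative `sparse_connectivity` helper does.

```python
# Hypothetical six-page web: each pair is (j, i), meaning page j links to page i.
# Pages are numbered 1..n as on the slide. This edge list is an assumption for illustration.
LINKS = [(1, 2), (1, 6), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6), (4, 1), (5, 1), (6, 1)]


def connectivity_matrix(links, n):
    """Dense n-by-n G with G[i-1][j-1] = 1 iff page j links to page i."""
    G = [[0] * n for _ in range(n)]
    for j, i in links:
        G[i - 1][j - 1] = 1  # row = target page i, column = source page j
    return G


def sparse_connectivity(links):
    """Sparse alternative for huge n: keep only the (i, j) positions of the nonzeros."""
    return {(i, j) for j, i in links}


G = connectivity_matrix(LINKS, 6)
nnz = sum(sum(row) for row in G)  # one nonzero per distinct hyperlink
print(nnz, len(sparse_connectivity(LINKS)))  # 10 10
```

Both representations hold exactly one nonzero per hyperlink, which is why G stays sparse even when n is in the billions.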
Degrees of a Page
• Define the row and column sums of G:
  • $r_i = \sum_j g_{ij}$: in-degree of page i
  • $c_j = \sum_i g_{ij}$: out-degree of page j
[Figure: the same six-page web, pages 1 to 6]
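The two sums can be checked on a small example. This Python sketch assumes the same kind of hypothetical six-page edge list as before (an illustration, not the slide's figure) and computes $r_i$ as row sums and $c_j$ as column sums of G.

```python
# Hypothetical six-page web (illustrative assumption): (j, i) means page j links to page i.
LINKS = [(1, 2), (1, 6), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6), (4, 1), (5, 1), (6, 1)]
n = 6

G = [[0] * n for _ in range(n)]
for j, i in LINKS:
    G[i - 1][j - 1] = 1

# r_i = sum_j g_ij: row sums give in-degrees (links pointing to page i).
r = [sum(G[i][j] for j in range(n)) for i in range(n)]
# c_j = sum_i g_ij: column sums give out-degrees (links leaving page j).
c = [sum(G[i][j] for i in range(n)) for j in range(n)]

print("in-degree r: ", r)   # [3, 1, 1, 2, 1, 2]
print("out-degree c:", c)   # [2, 2, 3, 1, 1, 1]
```

Both vectors sum to the total number of links, since each hyperlink adds 1 to exactly one $r_i$ and one $c_j$.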
Transition Matrix A
• The jth column of A holds the probabilities of jumping from page j to each page
• Two types of transitions
  • Type 1: follow one of the links on page j (with prob. p)
  • Type 2: jump to a random page (with prob. 1-p)
• Combining both (for a page j with $c_j > 0$): $a_{ij} = p\,g_{ij}/c_j + (1-p)/n$
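A minimal Python sketch of forming A from G with the two transition types, using the formula $a_{ij} = p\,g_{ij}/c_j + (1-p)/n$. The helper name `transition_matrix` and the six-page edge list are hypothetical. The sketch assumes every page has at least one out-link and simply raises an error otherwise, rather than guessing how a page with $c_j = 0$ should be treated.

```python
def transition_matrix(G, p=0.85):
    """Build A with a_ij = p * g_ij / c_j + (1 - p) / n.

    Assumes every page has at least one out-link (c_j > 0), as the formula requires.
    """
    n = len(G)
    c = [sum(G[i][j] for i in range(n)) for j in range(n)]  # out-degrees
    if any(cj == 0 for cj in c):
        raise ValueError("a page has no out-links (c_j = 0); formula undefined")
    delta = (1 - p) / n  # Type 2: jump to any of the n pages at random
    return [[p * G[i][j] / c[j] + delta for j in range(n)] for i in range(n)]


# Hypothetical six-page web (illustrative assumption): (j, i) means page j links to page i.
LINKS = [(1, 2), (1, 6), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6), (4, 1), (5, 1), (6, 1)]
n = 6
G = [[0] * n for _ in range(n)]
for j, i in LINKS:
    G[i - 1][j - 1] = 1

A = transition_matrix(G, p=0.85)
# Each column j is a probability distribution over the next page.
for j in range(n):
    print(j + 1, round(sum(A[i][j] for i in range(n)), 12))
```

For example, page 1 links to pages 2 and 6, so $a_{21} = 0.85/2 + 0.15/6 = 0.45$.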
Transition Matrix A
• Facts
  • A is the transition probability matrix of the Markov chain.
  • Its elements are all strictly between 0 and 1, and its column sums are all equal to 1.
  • A comes from scaling each column of G by its column sum $c_j$ (and adding $(1-p)/n$ to every element).
  • Since G is very sparse, most of the elements of A are equal to $(1-p)/n$.
  • If $n = 4\times 10^9$ and $p = 0.85$, then $(1-p)/n = 3.75\times 10^{-11}$.
• Perron-Frobenius theorem: a nonzero solution of $x = Ax$ exists and is unique up to a scaling factor.
• If the scaling factor is chosen so that the elements of x sum to 1, then x is Google's PageRank.
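The listed facts and the arithmetic can be checked numerically. This Python sketch builds A for a hypothetical six-page web (an assumed edge list, with every page having an out-link) and then evaluates $(1-p)/n$ for the web-scale numbers quoted above.

```python
# Hypothetical six-page web (illustrative assumption): (j, i) means page j links to page i.
LINKS = [(1, 2), (1, 6), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6), (4, 1), (5, 1), (6, 1)]
n, p = 6, 0.85
G = [[0] * n for _ in range(n)]
for j, i in LINKS:
    G[i - 1][j - 1] = 1
c = [sum(G[i][j] for i in range(n)) for j in range(n)]  # out-degrees, all > 0 here
A = [[p * G[i][j] / c[j] + (1 - p) / n for j in range(n)] for i in range(n)]

entries = [a for row in A for a in row]
print(all(0 < a < 1 for a in entries))  # True: strictly between 0 and 1
print(all(abs(sum(A[i][j] for i in range(n)) - 1) < 1e-12 for j in range(n)))  # True
# Entries with g_ij = 0 equal exactly (1 - p) / n: here 36 - 10 = 26 of them.
print(sum(1 for i in range(n) for j in range(n) if G[i][j] == 0))  # 26

# The slide's arithmetic for the real web: n = 4 * 10**9, p = 0.85.
print(f"{(1 - p) / (4 * 10**9):.3e}")  # 3.750e-11
```

Even in this tiny web, 26 of 36 entries take the value $(1-p)/n$; at web scale almost all of them do.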
How to Compute PageRank
• Eigenvector method
  • Solve $x = Ax$: x is the eigenvector corresponding to eigenvalue 1
  • Fact: A always has an eigenvalue of 1
• Power method
  • Repeat $x = Ax$ until x converges
  • The only practical approach when n is large
  • Fact: 1 is the eigenvalue of A with the largest magnitude
  • Consequently, the limit of $A^k x$ as k grows does not depend on the starting vector x (provided x sums to 1)
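The power method can be sketched in a few lines of Python. This is a hypothetical `power_method` helper (not the deck's pagerank.m) that starts from the uniform vector and stops when two successive iterates differ by less than `tol` in the 1-norm. The six-page edge list is an assumed example in which every page has an out-link.

```python
# Hypothetical six-page web (assumption): (j, i) means page j links to page i.
LINKS = [(1, 2), (1, 6), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6), (4, 1), (5, 1), (6, 1)]
n, p = 6, 0.85

# Transition matrix a_ij = p * g_ij / c_j + (1 - p) / n (every page here has c_j > 0,
# and no link is listed twice).
c = [0] * n
for j, _ in LINKS:
    c[j - 1] += 1
A = [[(1 - p) / n] * n for _ in range(n)]
for j, i in LINKS:
    A[i - 1][j - 1] += p / c[j - 1]


def power_method(A, tol=1e-10, max_iter=1000):
    """Repeat x = A x from the uniform vector until successive iterates differ by < tol (1-norm)."""
    n = len(A)
    x = [1.0 / n] * n  # starting vector that sums to 1
    for _ in range(max_iter):
        x_new = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if sum(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x  # may not have met tol within max_iter


x = power_method(A)
print([round(v, 4) for v in x])
```

Each step costs one matrix-vector product, so with a sparse A the work per iteration grows with the number of links rather than with $n^2$.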
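Fact 1
• A always has an eigenvalue of 1
• Since every column of A sums to 1, $A^T e = e$ with $e = (1, 1, \dots, 1)^T$, so $A^T$ has 1 as an eigenvalue.
• So 1 is also an eigenvalue of A, since $\det(A - \lambda I) = \det\big((A - \lambda I)^T\big) = \det(A^T - \lambda I)$, i.e., A and $A^T$ have the same eigenvalues.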
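Fact 2
• A has 1 as its eigenvalue of maximum magnitude
  • Because the entries of A are nonnegative and each column sums to 1, $\|A\|_1 = 1$, so every eigenvalue satisfies $|\lambda| \le 1$.
  • Because every entry of A is positive, the Perron-Frobenius theorem also gives $|\lambda| < 1$ for every eigenvalue other than 1.
• $A^k x$ approaches the PageRank as long as k is large enough and x sums to 1.
  • Each step preserves the sum: with $e = (1, \dots, 1)^T$, $e^T A = e^T$ gives $e^T (A x) = e^T x$.
  • More precisely, $A^k \to \bar{x}\, e^T$, where $\bar{x}$ is the PageRank vector, so $A^k x \to \bar{x}\,(e^T x) = \bar{x}$.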
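A quick numerical illustration of Fact 2 in Python. The 3-by-3 matrix below is a made-up example with positive entries and columns summing to 1 (not a matrix from the deck); iterating from two different starting vectors that each sum to 1 lands on the same limit, and every iterate keeps sum 1.

```python
def mat_vec(A, x):
    """Return the product A x for a dense list-of-lists matrix."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def iterate(A, x, k):
    """Return A^k x by applying A k times."""
    for _ in range(k):
        x = mat_vec(A, x)
    return x


# Hypothetical positive, column-stochastic matrix (each column sums to 1).
A = [[0.1, 0.5, 0.3],
     [0.6, 0.2, 0.3],
     [0.3, 0.3, 0.4]]

x1 = iterate(A, [1.0, 0.0, 0.0], 100)  # start concentrated on page 1
x2 = iterate(A, [0.2, 0.3, 0.5], 100)  # a different start that also sums to 1
print(max(abs(a - b) for a, b in zip(x1, x2)))  # essentially 0
print(sum(x1), sum(x2))                          # both stay (numerically) 1
```

The gap between the two iterates shrinks geometrically because every eigenvalue other than 1 has magnitude below 1.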
Example
• A tiny web
[Figure: the tiny six-page web, pages 1 to 6]
• Transition matrix A
• When p = 0.85, we have the PageRank (via pagerank.m):
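For readers without MATLAB, here is a rough Python analogue of such an experiment. It is not a port of pagerank.m: `pagerank_sparse` is a hypothetical helper that runs the power method directly on the link list, using $(Ax)_i = p\sum_{j \to i} x_j/c_j + \frac{1-p}{n}\sum_j x_j$, so A is never formed and each iteration touches only the n pages and the links. The six-page edge list is an assumed example and may not match the slide's tiny web, so its ranks should not be compared with the deck's numbers.

```python
def pagerank_sparse(links, n, p=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration without forming A.

    Uses (A x)_i = p * sum_{j -> i} x_j / c_j + (1 - p) / n * sum(x),
    which equals A x for a_ij = p * g_ij / c_j + (1 - p) / n.
    Assumes no duplicate links and at least one out-link per page (c_j > 0).
    """
    c = [0] * n
    for j, _ in links:
        c[j - 1] += 1
    if 0 in c:
        raise ValueError("every page needs at least one out-link")
    x = [1.0 / n] * n
    for _ in range(max_iter):
        total = sum(x)
        x_new = [(1 - p) / n * total] * n  # Type 2 jumps: spread evenly over all pages
        for j, i in links:                 # Type 1 moves: follow each hyperlink j -> i
            x_new[i - 1] += p * x[j - 1] / c[j - 1]
        if sum(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x


# Hypothetical six-page web (illustrative assumption): (j, i) means page j links to page i.
LINKS = [(1, 2), (1, 6), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6), (4, 1), (5, 1), (6, 1)]
x = pagerank_sparse(LINKS, 6, p=0.85)
ranking = sorted(range(1, 7), key=lambda k: -x[k - 1])
print("PageRank:", [round(v, 4) for v in x])
print("pages by rank:", ranking)
```

Working from the link list is the same idea that makes the power method feasible for a web-sized, very sparse G.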