Lecture #11: PageRank (II)
CS492 Special Topics in Computer Science: Distributed Algorithms and Systems
Reminder: PageRank Algorithm
• $PR(A) = (1-d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right) = (1-d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$
• PR(A): PageRank of page A
• PR(Ti): PageRank of each page Ti that links to page A
• C(Ti): number of outbound links on page Ti
• d: damping factor (between 0 and 1)
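The formula can be read as a one-line function. A minimal sketch, assuming the in-linking pages are supplied as (PageRank, outbound-link count) pairs; the name `pagerank_of` and this input shape are illustrative choices, not part of the lecture.

```python
def pagerank_of(inlinks, d=0.85):
    """Apply PR(A) = (1-d) + d * sum(PR(Ti)/C(Ti)) for a single page A.

    inlinks: list of (PR(Ti), C(Ti)) pairs, one per page Ti that links to A.
    """
    return (1 - d) + d * sum(pr / c for pr, c in inlinks)


# Hypothetical values: two pages link to A, each with PageRank 1.0,
# with 2 and 1 outbound links respectively.
print(pagerank_of([(1.0, 2), (1.0, 1)]))
```

A page with no in-links receives only the baseline (1-d).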
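Simple Example
[Figure: directed graph on three pages A, B, and C. Per the equations on the next slide, A links to B and C, B links to C, and C links to A.]
• $PR(A) = (1-d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$
• Let d = 0.85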
How to calculate PageRank
PR(A) = 0.15 + 0.85 PR(C)
PR(B) = 0.15 + 0.85 (PR(A) / 2)
PR(C) = 0.15 + 0.85 (PR(A) / 2 + PR(B))
• Method 1: Solving the equations
  • Do the math
• Method 2: Iterative computation of PageRank
  • Huge size of the Web: hard to solve the equations
  • Iterative computation of PageRank values
Solve the equations
• Solve these equations
  • PR(A) = 0.15 + 0.85 PR(C)
  • PR(B) = 0.15 + 0.85 (PR(A) / 2)
  • PR(C) = 0.15 + 0.85 (PR(A) / 2 + PR(B))
• Answers
  • PR(A) = 1.16336913510458
  • PR(B) = 0.64443188241945
  • PR(C) = 1.19219898247598
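The three equations form a linear system, so "do the math" can be checked mechanically. A minimal sketch in plain Python, assuming the unknowns are ordered [PR(A), PR(B), PR(C)]; the helper `solve_linear` is a generic Gauss-Jordan routine written for this illustration, not something the lecture prescribes.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations apply to both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


d = 0.85
# Each row is one equation rewritten as
#   PR(X) - d * (incoming terms) = 1 - d
coeffs = [
    [1.0, 0.0, -d],      # PR(A) - d*PR(C)             = 1 - d
    [-d / 2, 1.0, 0.0],  # PR(B) - d*PR(A)/2           = 1 - d
    [-d / 2, -d, 1.0],   # PR(C) - d*PR(A)/2 - d*PR(B) = 1 - d
]
pr_a, pr_b, pr_c = solve_linear(coeffs, [1 - d] * 3)
print(pr_a, pr_b, pr_c)
```

The three values sum to 3, the number of pages, which is a quick sanity check for this form of the formula.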
Iterative Computation of PageRank
• Assign an initial PageRank value to every page
• Recalculate the PageRank of every page over several iterations
• Stop iterating when the PageRank values converge
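The three steps above can be sketched as follows for the A, B, C example. The initial value of 1.0, the convergence tolerance `tol`, and the function name `iterate_pagerank` are assumptions made for this sketch; the slides do not fix them.

```python
def iterate_pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """links maps each page to the list of pages it links to."""
    pr = {page: 1.0 for page in links}  # step 1: initial values (assumed 1.0)
    for iteration in range(1, max_iter + 1):
        # Step 2: recompute every page from the previous iteration's values.
        new_pr = {}
        for page in links:
            incoming = sum(pr[src] / len(out)
                           for src, out in links.items() if page in out)
            new_pr[page] = (1 - d) + d * incoming
        # Step 3: stop once no value moved by more than tol.
        if max(abs(new_pr[p] - pr[p]) for p in links) < tol:
            return new_pr, iteration
        pr = new_pr
    return pr, max_iter


# Example graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks, n_iter = iterate_pagerank(links)
print(ranks, n_iter)
```

The result should agree with the values obtained by solving the equations directly, up to the chosen tolerance.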
What does PageRank mean?
• Random surfer
  • is given a web page at random and keeps clicking on links (never hitting the back button)
  • eventually gets bored and starts on another random page
• PageRank
  • the probability that the random surfer visits a page
  • the proportion of time that the random surfer spends on each page
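The random-surfer reading can be checked by simulation. A rough sketch, assuming the surfer follows a link with probability d and otherwise jumps to a page chosen uniformly at random; `simulate_surfer`, the step count, and the fixed seed are illustrative choices. Because the formula used in this lecture yields values that sum to the number of pages, the visit shares are scaled by that number before comparing.

```python
import random


def simulate_surfer(links, d=0.85, steps=200_000, seed=0):
    """Return the fraction of steps the surfer spends on each page."""
    rng = random.Random(seed)
    pages = list(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < d and links[current]:
            current = rng.choice(links[current])  # keep clicking links
        else:
            current = rng.choice(pages)           # bored: start on a random page
    return {page: count / steps for page, count in visits.items()}


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
share = simulate_surfer(links)
# Scale the time shares by the number of pages to compare with PR values.
scaled = {page: len(links) * s for page, s in share.items()}
print(scaled)
```

The scaled shares should land close to the PageRank values of the example, with some sampling noise.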
What is the damping factor?
$PR(A) = (1-d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$
• Damping factor
  • (1-d): the probability, at each page, that the random surfer gets bored and requests another random page
  • The higher d is, the more likely the random surfer is to keep clicking links
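As a worked example of what d controls (a standard geometric-distribution argument, not taken from the slides): if the surfer gets bored independently at each page with probability 1-d, then the number of links K clicked before the next random jump satisfies

```latex
P(K = k) = d^{k}(1-d), \qquad
E[K] = \sum_{k=0}^{\infty} k \, d^{k}(1-d) = \frac{d}{1-d}
```

For d = 0.85 this gives E[K] = 0.85 / 0.15 ≈ 5.67 links per browsing session.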
Rank Sink Problem
[Figure: pages A, B, and C linked in a loop]
• What if we don't have the damping factor?
• The surfer has no way to escape the loop (A-B-C), so the loop acts as a rank sink
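The effect can be seen numerically. A small sketch, assuming the loop A → B → C → A plus a hypothetical outside page D that links into the loop (D is not on the slide; it is added only so there is rank to absorb). Setting d = 1 removes the damping term.

```python
def step(pr, links, d):
    """One synchronous update of every page's PageRank."""
    return {
        page: (1 - d) + d * sum(pr[src] / len(out)
                                for src, out in links.items() if page in out)
        for page in links
    }


def run(links, d, iterations=30):
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        pr = step(pr, links, d)
    return pr


# Loop A -> B -> C -> A, plus a hypothetical page D linking into the loop.
# Nothing links back out of the loop.
links = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}

no_damping = run(links, d=1.0)     # D drops to 0; the loop holds all the rank
with_damping = run(links, d=0.85)  # D keeps the baseline (1 - d)
print(no_damping, with_damping)
```

Without damping, D's rank drains into the loop and never returns, and the values inside the loop keep rotating instead of settling. With d = 0.85 every page retains at least (1-d).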
Dangling Link (Dead End)
• A dangling link points to a page with no outgoing links
• C→A and B→A are dangling links
• A cannot distribute its weight to the network
• How to fix
  • Method 1: Remove dangling links until all the PageRanks are calculated
  • Method 2: Make a random jump to any other page
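Method 2 can be sketched as follows, under one possible reading: a page with no outgoing links is treated as if it linked to every other page with equal weight. The function name `pagerank` and the fixed iteration count are assumptions for this illustration.

```python
def pagerank(links, d=0.85, iterations=200):
    pages = list(links)
    pr = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_pr = {page: 1 - d for page in pages}
        for src, out in links.items():
            # Method 2 (assumed reading): a dead-end page jumps to every
            # other page with equal probability.
            targets = out if out else [p for p in pages if p != src]
            for dst in targets:
                new_pr[dst] += d * pr[src] / len(targets)
        pr = new_pr
    return pr


# B and C link to A; A has no outgoing links.
links = {"A": [], "B": ["A"], "C": ["A"]}
ranks = pagerank(links)
print(ranks, sum(ranks.values()))
```

With the jump in place, A's weight flows back to B and C and the total stays at 3, the number of pages. Without it, the weight reaching A would be lost on every iteration.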