
Lecture #11: PageRank (II)

CS492 Special Topics in Computer Science: Distributed Algorithms and Systems.

pete

Presentation Transcript


  1. Lecture #11: PageRank (II)
     CS492 Special Topics in Computer Science: Distributed Algorithms and Systems

  2. Reminder: PageRank Algorithm
     • PR(A) = (1-d) + d( PR(T1)/C(T1) + ... + PR(Tn)/C(Tn) ) = (1-d) + d( Σ_{i=1}^{n} PR(Ti)/C(Ti) )
     • PR(A): PageRank of page A
     • PR(Ti): PageRank of each page Ti that links to page A
     • C(Ti): number of outbound links on page Ti
     • d: damping factor (between 0 and 1)

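  3. Simple Example
     [Figure: link graph over three pages A, B, and C; the equations on slide 4 correspond to the links A→B, A→C, B→C, and C→A]
     • PR(A) = (1-d) + d( Σ_{i=1}^{n} PR(Ti)/C(Ti) ); let d = 0.85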

  4. How to calculate PageRank
     PR(A) = 0.15 + 0.85 PR(C)
     PR(B) = 0.15 + 0.85 (PR(A) / 2)
     PR(C) = 0.15 + 0.85 (PR(A) / 2 + PR(B))
     • Method 1: Solving the equations
       • Do the math
     • Method 2: Iterative computation of PageRank
       • The Web is huge, so the equations are hard to solve directly
       • Compute the PageRank values iteratively instead

  5. Solve the equations
     • Solve these equations
       • PR(A) = 0.15 + 0.85 PR(C)
       • PR(B) = 0.15 + 0.85 (PR(A) / 2)
       • PR(C) = 0.15 + 0.85 (PR(A) / 2 + PR(B))
     • Answers
       • PR(A) = 1.16336913510458
       • PR(B) = 0.64443188241945
       • PR(C) = 1.19219898247598
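
As a check on the answers above, the three equations can be rearranged into a linear system and solved directly. The sketch below is illustrative only: `solve_linear` is a hypothetical helper name, and plain Gaussian elimination with partial pivoting is assumed here as a stand-in for "do the math"; the slides do not say which method was used.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the caller's lists are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


d = 0.85
# Unknowns are ordered [PR(A), PR(B), PR(C)]. Each row is one slide equation
# moved to the form "PR(X) - d * (...) = 1 - d".
coeffs = [
    [1.0, 0.0, -d],       # PR(A) - d*PR(C)             = 1 - d
    [-d / 2, 1.0, 0.0],   # PR(B) - d*PR(A)/2           = 1 - d
    [-d / 2, -d, 1.0],    # PR(C) - d*PR(A)/2 - d*PR(B) = 1 - d
]
rhs = [1 - d] * 3
pr_a, pr_b, pr_c = solve_linear(coeffs, rhs)
print(pr_a, pr_b, pr_c)
```

The three values sum to 3, the number of pages, which is a quick sanity check for this form of the formula.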

  6. Iterative Computation of PageRank
     • Assign an initial PageRank value to every page
     • Recompute the PageRank of every page over several iterations
     • Stop iterating when the PageRank values converge
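
The three steps above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the function name `iterate_pagerank`, the initial value of 1.0, the tolerance, and the convergence test (largest per-page change) are all assumptions, and the link structure is the one implied by the three example equations (A→B, A→C, B→C, C→A).

```python
def iterate_pagerank(links, d=0.85, tol=1e-12, max_iter=1000):
    """Iterate PR(p) = (1-d) + d * sum(PR(q)/C(q)) until values stop changing.

    links maps each page to the list of pages it links to.
    Returns the final PageRank dict and the number of iterations used.
    """
    pages = sorted(links)
    pr = {p: 1.0 for p in pages}  # step 1: initial values (assumed to be 1.0)
    for iteration in range(1, max_iter + 1):
        new_pr = {}
        for p in pages:
            # Sum the shares contributed by every page q that links to p.
            inbound = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new_pr[p] = (1 - d) + d * inbound
        # step 3: converged when no page changed by more than tol
        delta = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if delta < tol:
            return pr, iteration
    return pr, max_iter


# Link structure implied by the example equations.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks, n_iter = iterate_pagerank(links)
print(ranks, n_iter)
```

Each pass uses only the values from the previous pass, so the result does not depend on the order in which pages are visited.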

  7. What does PageRank mean?
     • Random surfer
       • starts on a web page chosen at random and keeps clicking on links (never hitting the back button)
       • eventually gets bored and starts again on another random page
     • PageRank
       • the probability that the random surfer visits a page
       • the proportion of time that the random surfer spends on each page
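
The random-surfer reading can be tried out with a small simulation. The sketch below is an assumption-laden illustration: `simulate_surfer`, the step count, and the seed are made up for this example, and the graph is the one implied by the example equations on slide 4. It counts the fraction of steps spent on each page; multiplying a fraction by the number of pages gives a value on the same scale as the PR values from the formula.

```python
import random


def simulate_surfer(links, d=0.85, steps=200_000, seed=0):
    """Estimate the fraction of time a random surfer spends on each page."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)  # start on a random page
    for _ in range(steps):
        visits[current] += 1
        if links[current] and rng.random() < d:
            # Keep clicking: follow one outgoing link chosen uniformly.
            current = rng.choice(links[current])
        else:
            # Bored (or nowhere to go): restart on a random page.
            current = rng.choice(pages)
    return {p: visits[p] / steps for p in pages}


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
fractions = simulate_surfer(links)
scaled = {p: len(links) * f for p, f in fractions.items()}
print(fractions)
print(scaled)  # comparable to the PR(A), PR(B), PR(C) values of the formula
```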

  8. What is the damping factor?
     PR(A) = (1-d) + d( Σ_{i=1}^{n} PR(Ti)/C(Ti) )
     • Damping factor
       • (1-d): the probability that, at each page, the random surfer gets bored and requests another random page
       • The higher d is, the more likely the random surfer is to keep clicking links
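
One way to make the role of d concrete is to ask how many pages the surfer views before getting bored. The short derivation below assumes the surfer makes an independent decision at every page, with probability (1-d) of giving up; the symbol K (pages viewed in one run) is introduced here for illustration.

```latex
P(K = k) = d^{\,k-1}(1-d), \qquad k = 1, 2, 3, \dots
\qquad\Longrightarrow\qquad
\mathbb{E}[K] = \sum_{k=1}^{\infty} k\, d^{\,k-1}(1-d) = \frac{1}{1-d}

\text{For } d = 0.85:\quad \mathbb{E}[K] = \frac{1}{0.15} \approx 6.67 \text{ pages per run.}
```

Under this assumption, raising d lengthens the average run of link-following, and d approaching 1 means the surfer almost never restarts.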

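  9. Rank Sink Problem
     [Figure: link graph in which pages A, B, and C form a loop]
     • What if we don't have the damping factor?
     • The surfer has no way to escape the loop (A-B-C), so the loop acts as a rank sink: it accumulates rank but never passes any on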
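
The effect can be reproduced on a small graph. The sketch below is hypothetical: the slide's figure is not available, so a page D that links into the loop A→B→C→A is assumed here purely to show rank flowing in and never out. With d = 1 (no damping), D ends up with zero rank and the loop holds all of it; with d = 0.85, D keeps a nonzero share.

```python
def step(pr, links, d):
    """One synchronous update of PR(p) = (1-d) + d * sum(PR(q)/C(q))."""
    return {
        p: (1 - d) + d * sum(pr[q] / len(links[q]) for q in links if p in links[q])
        for p in links
    }


def run(links, d, iterations=50):
    pr = {p: 1.0 for p in links}
    for _ in range(iterations):
        pr = step(pr, links, d)
    return pr


# Assumed graph: D feeds the loop, and nothing leaves the loop.
links = {"D": ["A"], "A": ["B"], "B": ["C"], "C": ["A"]}

no_damping = run(links, d=1.0)
damped = run(links, d=0.85)

loop_total = no_damping["A"] + no_damping["B"] + no_damping["C"]
print(no_damping, loop_total)
print(damped)
```

Without damping the values inside the loop also keep rotating from page to page rather than settling, which is a second reason the undamped iteration is unsatisfactory on such a graph.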

  10. Dangling Link (Dead End)
     • A dangling link is a link that points to a page with no outgoing links
     • C→A and B→A are dangling links
     • A cannot distribute its weight to the rest of the network
     • How to fix
       • Method 1: Remove dangling links until all the PageRanks are calculated
       • Method 2: Make a random jump from the dead-end page to any other page
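
A minimal sketch of Method 2, under one possible reading: a page with no outgoing links is treated as if it linked to every other page, so the surfer jumps away from it uniformly at random. The graph contains only the two links named on the slide (B→A and C→A); the helper names `pagerank` and `patch_dangling` and the fixed iteration count are assumptions made for this illustration.

```python
def pagerank(links, d=0.85, iterations=200):
    """Fixed number of synchronous updates of PR(p) = (1-d) + d * sum(PR(q)/C(q))."""
    pr = {p: 1.0 for p in links}
    for _ in range(iterations):
        pr = {
            # Pages with no outlinks never pass the "p in links[q]" filter,
            # so they contribute nothing and no division by zero occurs.
            p: (1 - d) + d * sum(pr[q] / len(links[q]) for q in links if p in links[q])
            for p in links
        }
    return pr


def patch_dangling(links):
    """Give every page without outlinks a link to every other page."""
    return {
        p: (outs if outs else [q for q in links if q != p])
        for p, outs in links.items()
    }


# A is a dead end: B and C link to it, and it links to nothing.
links = {"A": [], "B": ["A"], "C": ["A"]}

leaky = pagerank(links)                  # A's weight is lost each iteration
fixed = pagerank(patch_dangling(links))  # A hands its weight back to B and C

print(leaky, sum(leaky.values()))
print(fixed, sum(fixed.values()))
```

In the unpatched graph the total rank falls well below the number of pages, because whatever reaches A disappears; after patching, the total returns to 3.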

  11. References
     [PBMW] L. Page, S. Brin, R. Motwani, T. Winograd, "The PageRank citation ranking: bringing order to the web," WWW 1998.
     [BP98] S. Brin, L. Page, "The anatomy of a large-scale hypertextual Web search engine," Computer Networks and ISDN Systems, Vol. 30, 1998.
     [BGS05] M. Bianchini, M. Gori, F. Scarselli, "Inside PageRank," ACM Transactions on Internet Technology, Vol. 5, No. 1, Feb. 2005.
     [LM04] A. N. Langville, C. Meyer, "Deeper inside PageRank," Internet Mathematics, Vol. 1, No. 3, 2004.
     [K99] J. Kleinberg, "Authoritative sources in a hyperlinked environment," Journal of the ACM 46:5 (1999).
