The PageRank Citation Ranking: Bringing Order to the Web

The PageRank Citation Ranking: Bringing Order to the Web. Larry Page et al., Stanford University, Technical Report, 1998. Presented by: Ratiya Komalarachun. Contents: Motivation, Related Work, Background Knowledge, PageRank & Random Surfer Model, Implementation, Applications, Conclusion.

Presentation Transcript


  1. The PageRank Citation Ranking: Bringing Order to the Web. Larry Page et al., Stanford University, Technical Report 1998. Presented by: Ratiya Komalarachun

  2. Contents • Motivation • Related Work • Background Knowledge • PageRank & Random Surfer Model • Implementation • Applications • Conclusion

  3. Motivation • The web is heterogeneous and unstructured • There is no quality control on the web • There is a commercial incentive to manipulate rankings

  4. Related Work • Academic citation analysis • Link-based analysis • Clustering methods based on link structure • The Hubs & Authorities model, based on an eigenvector calculation

  5. Hubs & Authorities Model (figure: hubs linking to authorities)

  6. Hubs & Authorities Model • Mutually reinforcing relationship: “A good hub is a page that points to many good authorities” • “A good authority is a page that is pointed to by many good hubs”
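The mutual reinforcement can be sketched as an iterative update. What follows is a minimal illustration of the hubs & authorities (HITS) iteration and is not part of the PageRank report. The function name `hits`, the toy graph, the fixed iteration count and the L2 rescaling step are all assumptions chosen for illustration.

```python
import math


def hits(graph, iterations=50):
    """Hypothetical hubs & authorities sketch.

    graph maps each page to the list of pages it links to.
    Returns (hub, authority) score dicts, each rescaled to unit L2 norm.
    """
    pages = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by many good hubs.
        auth = {p: 0.0 for p in pages}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        # A good hub points to many good authorities.
        hub = {p: sum(auth[v] for v in graph.get(p, [])) for p in pages}
        # Rescale both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        h_norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        auth = {p: x / a_norm for p, x in auth.items()}
        hub = {p: x / h_norm for p, x in hub.items()}
    return hub, auth


# Toy graph (hypothetical): three hub-like pages pointing at two candidate authorities.
hub, auth = hits({"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]})
```

In this toy graph `a1` gets the highest authority score because every hub points to it. `h1` and `h2` get the highest hub scores because each of them points to both authorities.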

  7. Link Structure of the Web • Forward links (outedges) • Backlinks (inedges) • Backlinks as an approximation of a page's importance / quality
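Forward links and backlinks are two views of the same edge list. The sketch below assumes the crawl is available as (source, target) pairs; `link_structure` and the toy edges are hypothetical. `N[v]` plays the role of N_v (number of forward links from v), and `B[u]` plays the role of B_u (the pages pointing to u).

```python
from collections import defaultdict


def link_structure(edges):
    """Build forward-link counts N_v and backlink sets B_u from (source, target) pairs."""
    forward = defaultdict(set)    # page -> pages it links to (outedges)
    backlinks = defaultdict(set)  # page -> pages that link to it (inedges)
    for src, dst in edges:
        forward[src].add(dst)
        backlinks[dst].add(src)
    n_links = {page: len(targets) for page, targets in forward.items()}
    return n_links, dict(backlinks)


# Hypothetical four-page web.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("C", "A")]
N, B = link_structure(edges)
# Backlink count as a crude importance estimate: C has the most backlinks.
by_backlinks = sorted(B, key=lambda page: len(B[page]), reverse=True)
```

Plain backlink counting treats every backlink equally. PageRank refines this by weighting each backlink by the rank of the page it comes from.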

  8. PageRank • A page has high rank if the sum of the ranks of its backlinks is high • Backlinks from important pages convey more importance to a page • Problems: dangling links and rank sinks

  9. Dangling Links

  10. PageRank Calculation • $R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}$, where: R(u) = rank of page u; R(v) = rank of page v; c < 1 is a factor used for normalization; N_v = number of forward links from page v; B_u = the set of pages that point to u
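The simplified formula can be computed by repeatedly applying it until the ranks stop changing. The following sketch assumes that every page has at least one forward link (no dangling links) and that every link target is itself a key of the graph. The function name, the uniform starting ranks and the toy graph are assumptions.

```python
def simplified_pagerank(graph, iterations=100):
    """Iterate R(u) = c * sum_{v in B_u} R(v) / N_v.

    graph maps each page to its list of forward links. Assumes no dangling
    links and that every link target is also a key of graph.
    """
    pages = list(graph)
    rank = {p: 1.0 / len(pages) for p in pages}  # uniform starting ranks
    for _ in range(iterations):
        new = {p: 0.0 for p in pages}
        for v, targets in graph.items():
            share = rank[v] / len(targets)  # R(v) / N_v
            for u in targets:
                new[u] += share
        c = 1.0 / sum(new.values())  # normalization: keep total rank at 1
        rank = {p: c * x for p, x in new.items()}
    return rank


# Hypothetical three-page web: A -> B, A -> C, B -> C, C -> A.
ranks = simplified_pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

For this graph the ranks settle at about A = 0.4, B = 0.2 and C = 0.4. A receives all of C's rank, B receives half of A's, and C receives the other half of A's plus all of B's. No rank is lost in this graph, so c stays at 1.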

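  11. PageRank Calculation (figure: simplified worked example; a page with rank 100 and two forward links passes 50 along each, a page with rank 9 and three forward links passes 3 along each, and a page receiving 50 + 3 has rank 53)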

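  12. Rank Sink (figure: a loop of pages acting as a rank sink) • A cycle of pages that is pointed to by some incoming link but has no links leading out of the cycle • Problem: during iteration the loop accumulates rank but never passes any of it to pages outside the loop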
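A small simulation makes the rank sink visible. The sketch below uses the simplified update with no escape term; the helper `propagate` and the graph are hypothetical. It starts from equal ranks, and over the iterations the rank held by A and B drains into the loop between P and Q and never comes back.

```python
def propagate(graph, rank, steps):
    """Apply R(u) = sum_{v in B_u} R(v) / N_v repeatedly, with no escape term."""
    for _ in range(steps):
        new = {p: 0.0 for p in graph}
        for v, targets in graph.items():
            for u in targets:
                new[u] += rank[v] / len(targets)
        rank = new
    return rank


# Hypothetical web: A links into the P <-> Q loop, which has no links out.
graph = {"A": ["B", "P"], "B": ["A"], "P": ["Q"], "Q": ["P"]}
start = {p: 0.25 for p in graph}
final = propagate(graph, start, 100)
sink_share = final["P"] + final["Q"]  # nearly all of the rank ends up here
```

Every step, half of A's rank flows into the loop, and nothing flows back out. After 100 steps, A and B hold essentially no rank.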

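  13. Escape Term • Solution: add a rank source E, so that $R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v} + cE(u)$, where c is maximized and $\|R\|_1 = 1$ • E(u) is some vector over the web pages, e.g. uniform over all pages, or concentrated on a single favorite page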
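Adding the rank source to the same sink-shaped graph keeps rank from draining away completely. The sketch below recomputes c on every step so that the ranks sum to 1. The uniform E with a total weight of 0.15, the function name and the graph are assumptions for illustration.

```python
def pagerank_with_escape(graph, E, iterations=200):
    """Iterate R(u) = c * (sum_{v in B_u} R(v) / N_v + E(u)), rescaling so ||R||_1 = 1."""
    rank = {p: 1.0 / len(graph) for p in graph}
    for _ in range(iterations):
        new = {p: E[p] for p in graph}  # rank source E(u)
        for v, targets in graph.items():
            for u in targets:
                new[u] += rank[v] / len(targets)
        c = 1.0 / sum(new.values())  # c fixed by the normalization ||R||_1 = 1
        rank = {p: c * x for p, x in new.items()}
    return rank


# Hypothetical sink-shaped web: A links into the P <-> Q loop.
graph = {"A": ["B", "P"], "B": ["A"], "P": ["Q"], "Q": ["P"]}
uniform_E = {p: 0.15 / len(graph) for p in graph}  # assumed total escape weight 0.15
ranks = pagerank_with_escape(graph, uniform_E)
```

With this E, A and B keep a noticeable share of the rank (roughly 0.10 and 0.075) instead of dropping to zero. Putting all of E's weight on one page gives the "favorite page" variant instead.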

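  14. Matrix Notation • Let A be the square matrix over web pages with $A_{u,v} = 1/N_v$ if page v links to page u, and 0 otherwise, so that $(AR)(u) = \sum_{v \in B_u} R(v)/N_v$ • Since $\|R\|_1 = 1$, the escape-term equation can be written as $R = c(A + E \times \mathbf{1})R$, where $\mathbf{1}$ is the all-ones row vector • R is the dominant eigenvector of $(A + E \times \mathbf{1})$, with eigenvalue $1/c$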
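The step from the per-page equation to matrix form can be written out explicitly. This derivation assumes the orientation $A_{u,v} = 1/N_v$ when page v links to page u (and 0 otherwise), chosen so that AR reproduces the backlink sum. It also assumes $R \ge 0$ with $\|R\|_1 = 1$, so that $\mathbf{1}R = 1$ for the all-ones row vector $\mathbf{1}$.

```latex
\begin{aligned}
R(u) &= c \sum_{v \in B_u} \frac{R(v)}{N_v} + c\,E(u)
  && \text{(escape-term definition)} \\
R &= c\,(AR + E)
  && \text{since } (AR)(u) = \textstyle\sum_{v \in B_u} R(v)/N_v \\
R &= c\,(AR + E\,\mathbf{1}R)
  && \text{since } \mathbf{1}R = \|R\|_1 = 1 \\
R &= c\,(A + E \times \mathbf{1})\,R
  \;\Longrightarrow\; (A + E \times \mathbf{1})\,R = \tfrac{1}{c}\,R
\end{aligned}
```

So R is an eigenvector of $(A + E \times \mathbf{1})$ with eigenvalue $1/c$.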

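  15. Computing PageRank • Initialize a rank vector over the web pages • Loop: compute new ranks as the sum of normalized backlink ranks; compute the normalizing factor; add the escape term; compute the control parameter (the change since the previous iteration) • Repeat while the control parameter is above a threshold, and stop when converged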
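The loop can be sketched as follows. This version keeps dangling pages in the graph, so some rank is lost on each step. The normalizing factor `d` is that lost rank, and it is added back in proportion to E. The function name, the uniform starting vector, the threshold `eps` and the toy graph are assumptions for illustration, and E is assumed to sum to 1.

```python
def compute_pagerank(graph, E, eps=1e-10, max_iter=1000):
    """Power iteration with an escape term.

    graph: page -> list of forward links (empty for dangling pages).
    E: page -> escape weight, assumed to sum to 1.
    """
    pages = list(graph)
    R = {p: 1.0 / len(pages) for p in pages}  # initial vector over web pages
    for _ in range(max_iter):
        # New ranks: sum of normalized backlink ranks, R(v) / N_v.
        new = {p: 0.0 for p in pages}
        for v, targets in graph.items():
            for u in targets:
                new[u] += R[v] / len(targets)
        # Normalizing factor: rank lost this step (here, at dangling pages).
        d = sum(R.values()) - sum(new.values())
        # Add the escape term, redistributing the lost rank according to E.
        for p in pages:
            new[p] += d * E[p]
        # Control parameter: L1 change since the previous iteration.
        delta = sum(abs(new[p] - R[p]) for p in pages)
        R = new
        if delta < eps:  # stop when converged
            break
    return R


# Hypothetical web where D is a dangling page (no forward links).
graph = {"A": ["B", "D"], "B": ["A"], "D": []}
E = {p: 1.0 / len(graph) for p in graph}
ranks = compute_pagerank(graph, E)
```

For this graph the ranks converge to A = 0.4, B = 0.3 and D = 0.3. The rank that reaches D is spread back over all three pages through E, so the total stays at 1.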

  16. Random Surfer Model • PageRank corresponds to the standing probability distribution of a random walk (the "random surfer") on the web graph • E(u) models a surfer who “gets bored periodically and jumps to a different page” instead of being kept in a loop forever
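The random surfer can be simulated directly. The fraction of steps spent on each page approximates its rank. In this sketch the jump probability of 0.15 and uniform jumps (corresponding to a uniform E) are assumptions. A surfer on a page with no forward links is also assumed to jump. The function name, the seed and the graph are hypothetical.

```python
import random


def random_surfer(graph, steps=100_000, jump_prob=0.15, seed=0):
    """Estimate visit frequencies of a surfer who sometimes jumps to a random page."""
    rng = random.Random(seed)
    pages = list(graph)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        links = graph[current]
        if not links or rng.random() < jump_prob:
            current = rng.choice(pages)  # "gets bored" (or is stuck): jump anywhere
        else:
            current = rng.choice(links)  # follow a random forward link
    return {p: n / steps for p, n in visits.items()}


# Hypothetical three-page web: A -> B, A -> C, B -> C, C -> A.
freq = random_surfer({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

The random jumps keep the surfer from being trapped in a loop. This is the same role that the escape term E plays in the PageRank equation.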

  17. Implementation • Computing resources: 24 million pages, 75 million URLs, processing 550 pages/sec • Memory and disk storage: weight vector of 4-byte floats held in memory; matrix A accessed linearly from disk
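The sizes on this slide can be checked with back-of-the-envelope arithmetic. The estimate below assumes one 4-byte float per entry. The slide does not say whether the weight vector is indexed by pages or by URLs, so both counts are shown. The helper name is hypothetical.

```python
def weight_vector_bytes(num_entries, bytes_per_weight=4):
    """Memory for one rank weight per entry, stored as 4-byte floats."""
    return num_entries * bytes_per_weight


for label, count in [("pages", 24_000_000), ("URLs", 75_000_000)]:
    size = weight_vector_bytes(count)
    print(f"{count:,} {label}: {size:,} bytes = {size / 2**20:.1f} MiB")
```

Either way, one vector is at most about 300 MB (roughly 286 MiB). That is small enough to keep in memory while the much larger link matrix is streamed.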

  18. Implementation • Assign each URL a unique integer ID • Sort the link structure and remove dangling links • Assign initial ranks • Iterate until convergence • Add the dangling links back and recompute
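Removing dangling links is not a single pass. Once a dangling page is dropped, a page whose only forward links pointed to it becomes dangling in turn, so the removal is repeated. The sketch below assumes the graph is a dict mapping every page ID to its list of forward links. The function name and the toy graph are hypothetical.

```python
def remove_dangling(graph):
    """Repeatedly drop pages with no forward links, and the links pointing to them.

    graph: page -> list of forward links; every link target is assumed to be a key.
    Returns the pruned graph and the removed pages in removal order.
    """
    graph = {p: list(targets) for p, targets in graph.items()}
    removed = []
    while True:
        dangling = {p for p, targets in graph.items() if not targets}
        if not dangling:
            return graph, removed
        removed.extend(sorted(dangling))
        graph = {
            p: [t for t in targets if t not in dangling]
            for p, targets in graph.items()
            if p not in dangling
        }


# Hypothetical web: D has no forward links; once D is gone, C has none either.
pruned, removed = remove_dangling({"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": []})
```

The pages recorded in `removed` are the ones that would be added back after the main iteration.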

  19. Convergence Properties • Based on the theory of random walks on graphs • Converges in roughly O(log |V|) iterations because the web graph G is rapidly mixing

  20. Convergence Properties

  21. Searching with PageRank • Title-based search, ranked by PageRank • Comparison with AltaVista

  22. Sample Results

  23. Some Applications • Estimating web traffic • Backlink prediction • User navigation

  24. Conclusion • PageRank is a global ranking of web pages based on the web's graph structure • PageRank uses backlink information to bring order to the web • PageRank can separate out representative pages as cluster centers • PageRank has a wide variety of applications
