The PageRank Citation Ranking: Bringing Order to the Web. Larry Page et al. Stanford University, Technical Report 1998 Presented by: Ratiya Komalarachun. Contents. Motivation Related work Background Knowledge Page Rank & Random Surfer Model Implementation Application Conclusion.
The PageRank Citation Ranking: Bringing Order to the Web Larry Page et al. Stanford University, Technical Report 1998 Presented by: Ratiya Komalarachun
Contents • Motivation • Related work • Background Knowledge • Page Rank & Random Surfer Model • Implementation • Application • Conclusion
Motivation • Web: heterogeneous and unstructured • No quality control on the web • Commercial interest in manipulating rankings
Related Work • Academic citation analysis • Link based analysis • Clustering methods of link structure • Hubs & Authorities Model based on an eigenvector calculation
Hubs & Authorities Model [Figure: hubs pointing to authorities]
Hubs & Authorities Model • Mutually reinforcing relationship “A good hub is a page that points to many good authorities” “A good authority is a page that is pointed to by many good hubs”
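The mutually reinforcing relationship can be sketched as an iteration over two score vectors. This is a minimal sketch, not the slides' own code: the function name `hits`, the toy graph `toy_links`, the fixed iteration count, and the Euclidean normalization are all assumptions made for illustration.

```python
from math import sqrt

def hits(links, iterations=50):
    """Iterate hub and authority scores on a small directed graph.

    `links` maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages pointing to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages it points to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        a_norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical toy graph: two hub-like pages pointing at two authority-like pages.
toy_links = {"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}
hub, auth = hits(toy_links)
```

On this toy graph, `a1` ends up with the higher authority score because both hubs point to it, and `h1` ends up with the higher hub score because it points to both authorities.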
Link Structure of the Web • Forward links (outedges) • Backlinks (inedges) • Approximation of importance / quality
PageRank • A page has high rank if the sum of the ranks of its backlinks is high • Backlinks coming from important pages convey more importance to a page • Problem: Dangling Links, Rank Sink
PageRank Calculation $R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}$ Given: R(u) = rank of page u, R(v) = rank of page v, c < 1 (used for normalization), N_v = number of links from v, B_u = the set of pages that point to u
PageRank Calculation [Figure: worked example of rank propagating along the links between pages]
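One application of the simplified rule from the PageRank paper, R(u) = c · Σ R(v)/N_v over the pages v in B_u, can be sketched as follows. The function name `simplified_rank_step`, the three-page graph, and the choice c = 1 are assumptions for illustration only.

```python
def simplified_rank_step(links, rank, c=1.0):
    """Apply R(u) = c * sum over v in B_u of R(v) / N_v once.

    `links` maps each page v to its forward links; `rank` maps pages to R.
    """
    new_rank = {u: 0.0 for u in rank}
    for v, targets in links.items():
        if not targets:
            continue  # a page with no forward links passes on nothing
        share = rank[v] / len(targets)  # R(v) / N_v
        for u in targets:
            new_rank[u] += c * share
    return new_rank

# Hypothetical three-page graph and a rank assignment that is already steady.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = {"A": 0.4, "B": 0.2, "C": 0.4}
next_rank = simplified_rank_step(toy_links, rank)
```

Here A splits its 0.4 between B and C, B passes its 0.2 to C, and C passes its 0.4 to A, so `next_rank` equals `rank`: the assignment is a fixed point of the rule.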
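Rank Sink [Figure: a loop of pages, each with rank 0.6] • A cycle of pages that is pointed to by some incoming link but has no links leading out • Problem: the cycle accumulates rank but never passes any rank on to pages outside it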
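The effect of a rank sink can be seen by repeating the simplified update on a small graph. This is a sketch under assumptions: the four-page graph `sink_links` (a cycle A, B fed by a link from X) and the helper `step` are hypothetical, and no escape term is applied.

```python
def step(links, rank):
    """One simplified update: each page splits its rank among its forward links."""
    new_rank = {p: 0.0 for p in rank}
    for v, targets in links.items():
        for u in targets:
            new_rank[u] += rank[v] / len(targets)
    return new_rank

# X and Y link to each other, X also links into the cycle A <-> B,
# and nothing links back out of that cycle.
sink_links = {"X": ["Y", "A"], "Y": ["X"], "A": ["B"], "B": ["A"]}
rank = {p: 0.25 for p in sink_links}
for _ in range(100):
    rank = step(sink_links, rank)

sink_share = rank["A"] + rank["B"]     # rank trapped inside the cycle
outside_share = rank["X"] + rank["Y"]  # rank left for the other pages
```

Total rank is conserved, but after enough iterations almost all of it sits in the cycle and the pages outside it are left with almost none.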
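Escape Term • Solution: Rank Source • $R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v} + c\,E(u)$ • E(u) is a vector over the web pages that acts as a source of rank, e.g. uniform over all pages or concentrated on a favorite page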
Matrix Notation • R is the dominant eigenvector, and c the dominant eigenvalue, of the matrix form of the rank equation, because c is maximized
Computing PageRank • Initialize a rank vector over the web pages • Loop: compute the new ranks as the sum of normalized backlink ranks; compute the normalizing factor; add the escape term; compute the control parameter • Stop when the control parameter shows that the ranks have converged
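The loop above can be sketched in plain Python. This is an illustrative sketch, not the original implementation: the names `pagerank`, `source`, `epsilon`, and `max_iter`, the toy graph, and the uniform source vector are assumptions. The normalizing factor is taken here to be the rank lost in one step (for example through pages with no forward links), and the control parameter to be the L1 change between successive rank vectors, following the algorithm in the PageRank paper.

```python
def pagerank(links, source, epsilon=1e-10, max_iter=1000):
    """Iterate ranks until the L1 change between rounds is at most epsilon.

    `links` maps each page to its forward links; `source` is the escape
    vector E over the same pages and is assumed to sum to 1.
    """
    pages = list(source)
    rank = dict(source)  # initialize the vector over web pages
    for _ in range(max_iter):
        # New ranks: sum of normalized backlink ranks.
        new_rank = {p: 0.0 for p in pages}
        for v, targets in links.items():
            for u in targets:
                new_rank[u] += rank[v] / len(targets)
        # Normalizing factor: rank lost in this step.
        d = sum(rank.values()) - sum(new_rank.values())
        # Add the escape term, redistributing the lost rank according to E.
        for p in pages:
            new_rank[p] += d * source[p]
        # Control parameter: how much the ranks moved.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta <= epsilon:
            break  # stop when converged
    return rank

# Hypothetical graph in which D has no forward links.
toy_links = {"A": ["B", "D"], "B": ["C"], "C": ["A"], "D": []}
uniform = {p: 0.25 for p in toy_links}
ranks = pagerank(toy_links, uniform)
```

On this graph the rank that would otherwise vanish at D is handed back through the source vector, so the ranks keep summing to 1 and A ends up with the largest share.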
Random Surfer Model • PageRank vs. Random Surfer Model • E(u) models a random surfer who periodically gets bored and jumps to a different page, so the surfer is not kept in a loop forever
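The random surfer can be simulated directly by counting page visits. This is a rough sketch under assumptions: the function name `random_surfer`, the boredom probability of 0.15, the uniform jump target, the fixed seed, and the toy graph are illustrative choices that do not come from the slides.

```python
import random

def random_surfer(links, steps=200_000, boredom=0.15, seed=0):
    """Estimate visit frequencies of a surfer who sometimes jumps at random."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    page = rng.choice(pages)
    for _ in range(steps):
        visits[page] += 1
        targets = links[page]
        if not targets or rng.random() < boredom:
            page = rng.choice(pages)    # bored (or stuck): jump to a random page
        else:
            page = rng.choice(targets)  # follow a random forward link
    return {p: n / steps for p, n in visits.items()}

# Hypothetical graph with a cycle A <-> B that has no links leading out.
loop_links = {"X": ["Y", "A"], "Y": ["X"], "A": ["B"], "B": ["A"]}
freq = random_surfer(loop_links)
```

Because of the random jumps, the surfer keeps returning to X and Y instead of circling between A and B forever, so every page receives a nonzero share of the visits.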
Implementation • Computing resources: 24 million pages; 75 million URLs; processing at 550 pages/sec • Memory and disk storage: weight vector (one 4-byte float per URL); matrix A (linear access)
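The size of the weight vector follows from the figures on this slide. The short calculation below assumes one 4-byte weight per URL and decimal megabytes; the variable names are illustrative.

```python
num_urls = 75_000_000   # URLs, as stated on the slide
bytes_per_weight = 4    # one 4-byte float per URL

# Total size of one weight vector, in decimal megabytes.
weight_vector_mb = num_urls * bytes_per_weight / 1_000_000
print(f"weight vector: {weight_vector_mb:.0f} MB")  # prints "weight vector: 300 MB"
```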
Implementation • Assign a unique integer ID to each URL • Sort the link structure and remove dangling links • Assign initial ranks • Iterate until convergence • Add back the dangling links and recompute
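The dangling-link removal step can be sketched as a repeated filter, since removing one dangling link may leave another page without forward links. This is a sketch under assumptions: the function name `remove_dangling`, the returned list of removed links (kept so they can be added back later), and the toy graph are hypothetical.

```python
def remove_dangling(links):
    """Repeatedly drop links that point to pages with no forward links.

    Returns the pruned link structure and the removed (source, target) pairs.
    """
    links = {page: list(targets) for page, targets in links.items()}
    removed = []
    while True:
        # A target is dangling if it has no forward links of its own.
        dangling = {t for targets in links.values() for t in targets
                    if not links.get(t)}
        if not dangling:
            return links, removed
        kept = {}
        for page, targets in links.items():
            removed += [(page, t) for t in targets if t in dangling]
            kept[page] = [t for t in targets if t not in dangling]
        links = kept

# Hypothetical graph: D has no forward links, so C -> D is dangling;
# once that link is gone, C has no forward links either.
toy_links = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": []}
pruned, removed_links = remove_dangling(toy_links)
```

Two passes are needed here: the first removes C → D, and the second removes A → C, leaving only the cycle between A and B for the rank iteration.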
Convergence Properties • Analyzed using the theory of random walks on graphs • Converges in roughly O(log |V|) iterations, because the web graph G is rapidly mixing
Searching with PageRank • Using title search • Comparing with Altavista
Some Applications • Estimate web traffic • Backlink predictor • User Navigation
Conclusion • PageRank is a global ranking based on the web's graph structure • PageRank uses backlink information to bring order to the web • PageRank can separate out representative pages as cluster centers • A great variety of applications