Study of Page Rank and HITS. By: Ankit Sethi FSU. How do Search Engines work. Motive . Goals of Study: Getting aware about Page Rank and HITS- two popular algorithms used by search engines to rank webpages.
Study of Page Rank and HITS By: Ankit Sethi FSU
Motive • Goals of Study: • To become familiar with Page Rank and HITS, two popular algorithms used by search engines to rank webpages. • To provide a future direction in the area of efficient algorithms for ranking web pages.
Do we really need Page Ranking? • The name “Page Rank” comes from Larry Page. • In earlier days, search engines favored the pages with the highest keyword density. But is that good? • The number and importance of links pointing to a given webpage determine its Page Rank
Importance of Page Rank • Page rank algorithm computes a web page’s importance • Larry Page concluded that the pages with the highest number of links to them are most important.
Link Structure of the Web • Backlinks and Forward links: • A and B are C’s backlinks • C is A and B’s forward link • Generally, a webpage is important if it has a lot of backlinks, but it’s not always true!
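The backlink and forward-link relationship on this slide can be sketched in a few lines of Python. This is only an illustration: the `forward_links` dictionary and the `backlinks` helper are hypothetical names, and the graph simply mirrors the A, B, C example above.

```python
# Hypothetical link graph: each page maps to the pages it links to (its forward links).
forward_links = {
    "A": ["C"],
    "B": ["C"],
    "C": [],
}


def backlinks(forward):
    """Invert a forward-link map so each page lists the pages that link to it."""
    back = {page: [] for page in forward}
    for source, targets in forward.items():
        for target in targets:
            back.setdefault(target, []).append(source)
    return back


back = backlinks(forward_links)
print(back["C"])  # ['A', 'B']
```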
Simplified Algorithm of Page Rank Example 1 Suppose there are 4 webpages: A, B, C and D, each starting with a Page Rank of 0.25. Assume B, C and D each link only to A. Each link then transfers 0.25 PR to A. PR(A) = PR(B) + PR(C) + PR(D) = 0.75 Remember, 0.25 is just an assumed starting value (1/N with N = 4)!
Page Rank Algo Contd.. Example 2 • Now, let's say page B links to pages C and A, page C links to page A, and page D links to all 3 other pages • PR(A) = PR(B)/2 + PR(C)/1 + PR(D)/3 • Page B contributes 0.25/2 = 0.125 each to page A and page C, page C still contributes 0.25 to A, and page D contributes 0.25/3 = 0.083 to A. So Page Rank of A = 0.125 + 0.25 + 0.083 = 0.458
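The two examples above can be checked with a short Python sketch of the simplified rule. The function name `simple_pagerank_step` is made up for illustration, and the sketch assumes page A has no outbound links, which the slides do not state.

```python
def simple_pagerank_step(links, ranks):
    """One pass of the simplified rule: every page splits its current
    rank evenly among the pages it links to."""
    new_ranks = {page: 0.0 for page in links}
    for source, targets in links.items():
        for target in targets:
            new_ranks[target] += ranks[source] / len(targets)
    return new_ranks


# Every page starts with the assumed value 0.25.
initial = {page: 0.25 for page in "ABCD"}

# Example 1: B, C and D each link only to A (A's own links are an assumption: none).
example1 = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"]}
# Example 2: B links to A and C, C links to A, D links to A, B and C.
example2 = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}

pr1 = simple_pagerank_step(example1, initial)["A"]
pr2 = simple_pagerank_step(example2, initial)["A"]
print(round(pr1, 3), round(pr2, 3))  # 0.75 0.458
```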
Page Rank: Damping Factor • Damping Factor d: Probability that a user keeps clicking links; with probability 1-d the user stops clicking and requests another random page. It is usually set to 0.85 or 85%. PR(A) = (1-d)/N + d(PR(B)/L(B) + PR(C)/L(C) + PR(D)/L(D) + ……) Where N: number of documents in the collection, and L(X): number of outbound links on page X
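As a rough sketch, the damped formula can be applied repeatedly until the values settle. The code below is one possible way to do that, not a description of any search engine's implementation; the `pagerank` function, the fixed iteration count and the 4-page graph are all assumptions chosen for illustration. Every page in the sample graph has at least one outbound link, so the sketch does not handle pages without links.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(p) = (1-d)/N + d * sum(PR(q)/L(q)) over pages q linking to p."""
    n = len(links)
    ranks = {page: 1.0 / n for page in links}  # start from a uniform 1/N
    for _ in range(iterations):
        new_ranks = {}
        for page in links:
            # Sum the share each linking page passes on to this page.
            incoming = sum(
                ranks[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) / n + d * incoming
        ranks = new_ranks
    return ranks


# Hypothetical graph: page -> pages it links to.
sample_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(sample_links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 4))
```

In this sample graph, page D has no backlinks, so its value stays at the baseline (1-d)/N, while page C, which collects links from A, B and D, ends up with the highest value.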
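Page Rank: Matrix form • R is the solution of the equation R = ((1-d)/N)·1 + d·M·R, where R is the column vector of Page Rank values, 1 is the all-ones column vector, and M is the matrix with entries M(i, j) = l(pi, pj) • Adjacency function l(pi, pj) = 0 if page pj doesn’t link to pi, else 1/L(pj), so the entries for each page pj that has outbound links sum to 1.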
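To make the matrix form concrete, here is a minimal pure-Python sketch that builds the link matrix with entries 1/L(pj) and iterates the equation until the vector stops changing. The names `link_matrix` and `solve_rank_vector`, the tolerance and the sample graph are assumptions made for this illustration.

```python
def link_matrix(pages, links):
    """Build M where M[i][j] = 1/L(pj) if page pj links to page pi, else 0."""
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    m = [[0.0] * n for _ in range(n)]
    for source, targets in links.items():
        for target in targets:
            m[index[target]][index[source]] = 1.0 / len(targets)
    return m


def solve_rank_vector(m, d=0.85, tol=1e-12, max_iter=1000):
    """Iterate R <- (1-d)/N + d*M*R until successive vectors differ by less than tol."""
    n = len(m)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        new_r = [
            (1 - d) / n + d * sum(m[i][j] * r[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


# Hypothetical graph in which every page has at least one outbound link.
pages = ["A", "B", "C", "D"]
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
M = link_matrix(pages, links)
R = solve_rank_vector(M)
print([round(value, 4) for value in R])
```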
Page Rank Summary • Query Independent • A global ranking of all web pages based on their locations in the web graph structure • Uses information which is external to the web pages – backlinks • Backlinks from important pages are more significant than backlinks from average pages
Interesting Points about Page Rank • Aaron Wall said: “Page Rank is certainly important to driving indexing, but it nowhere near as important as it once was in terms of delivering top rankings…” • Do sites have Page Rank? • No, they don’t. It applies to individual pages • It is just one of several algorithms used by Google for ranking webpages!
HITS Algorithm • Stands for Hyperlink-Induced Topic Search. • Used by Twitter 1. Authorities are pages containing useful information • course home pages • home pages of auto manufacturers 2. Hubs are pages that link to authorities • course bulletin • list of US auto manufacturers
HITS Contd.. • A good hub links to many good authorities • A good authority is linked from many good hubs • Authority Update: Update each node’s authority score to be equal to the sum of the hub scores of all nodes that point to it. • Hub Update: Update each node’s hub score to be equal to the sum of the authority scores of all nodes that it points to.
Hub Score and Authority Score • Start with each node having a hub score and authority score of 1. • Run the Authority update rule • Run the Hub update rule • Normalize the values: divide each authority score by the square root of the sum of squares of all authority scores, and each hub score by the square root of the sum of squares of all hub scores. • Repeat these steps until the scores stop changing noticeably
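The steps listed above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the `hits` function, the fixed number of iterations and the small graph of two hub-like and two authority-like pages are all hypothetical, and the graph is assumed to contain at least one link so the normalization never divides by zero.

```python
import math


def hits(links, iterations=20):
    """Run the authority update, the hub update and the normalization repeatedly."""
    hub = {page: 1.0 for page in links}
    auth = {page: 1.0 for page in links}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages pointing to this page.
        auth = {
            page: sum(hub[src] for src, targets in links.items() if page in targets)
            for page in links
        }
        # Hub update: sum of (new) authority scores of the pages this page points to.
        hub = {page: sum(auth[t] for t in links[page]) for page in links}
        # Normalize each score set by the square root of its sum of squares.
        auth_norm = math.sqrt(sum(v * v for v in auth.values()))
        hub_norm = math.sqrt(sum(v * v for v in hub.values()))
        auth = {page: v / auth_norm for page, v in auth.items()}
        hub = {page: v / hub_norm for page, v in hub.items()}
    return hub, auth


# Hypothetical graph: H1 and H2 act as hubs, A1 and A2 as authorities.
graph = {"H1": ["A1", "A2"], "H2": ["A1"], "A1": [], "A2": []}
hub_scores, auth_scores = hits(graph)
print(max(auth_scores, key=auth_scores.get))  # A1
print(max(hub_scores, key=hub_scores.get))    # H1
```

A1 is linked from both hubs, so it gets the highest authority score, and H1 links to both authorities, so it gets the highest hub score.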
Interesting Points about HITS! • Query Dependent • Executed at query time, and not at indexing time • Not widely used • Computes two scores per document