Page Ranking Techniques in Search Engines
Introduction • Need: The need for search engines is increasing. Search results should be ordered by relevancy and importance. • What is page ranking?
Algorithms • HITS (Hyperlink-Induced Topic Search), e.g. AltaVista • PageRank, e.g. Google
Definition: PageRank. We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor, which can be set between 0 and 1. We usually set d to 0.85. ... C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Ref: Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", http://www-db.stanford.edu/~backrub/google.html
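The formula above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used by any search engine: the function name `pagerank`, the `links` dictionary layout, and the `tol` and `max_iter` stopping parameters are assumptions made for this example.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=1000):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) until the values settle.

    links maps each page name to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    pr = {page: 1.0 for page in pages}  # initial guess for every page
    for _ in range(max_iter):
        new_pr = {page: 1.0 - d for page in pages}
        for source, targets in links.items():
            for target in targets:
                # len(targets) plays the role of C(source), the outgoing link count.
                new_pr[target] += d * pr[source] / len(targets)
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:
            break
    return pr


# Two pages that link only to each other.
ranks = pagerank({"A": ["B"], "B": ["A"]})
print(ranks)
```

Each pass recomputes every page from the previous pass's values, so the order in which pages are visited does not affect the result.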
How to use the formula. E.g. two pages, A and B, pointing to each other.
Start with PR(A) = PR(B) = 1:
PR(A) = (1-d) + d * (PR(B)/C(B)) = (1-0.85) + 0.85 * (1/1) = 1
PR(B) = (1-d) + d * (PR(A)/C(A)) = (1-0.85) + 0.85 * (1/1) = 1
Let's start with PR(A) = PR(B) = 10. After the 1st iteration:
PR(A) = (1-d) + d*(PR(B)/C(B)) = 0.15 + 0.85 * (10/1) = 8.65
PR(B) = (1-d) + d*(PR(A)/C(A)) = 0.15 + 0.85 * (8.65/1) = 7.50
After the 2nd iteration:
PR(A) = (1-d) + d*(PR(B)/C(B)) = 0.15 + 0.85 * (7.50/1) = 6.527
PR(B) = (1-d) + d*(PR(A)/C(A)) = 0.15 + 0.85 * (6.527/1) = 5.698
And so on... until when?
Answer: Iterations should be repeated until the PR values converge. In this example, that means until PR(A) = PR(B) = 1. Thus we can start with any PR values and repeat the iterations until the values converge, i.e. no longer change by much.
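The two-page iteration can be replayed with a short Python sketch. The function name `iterate_two_pages` and the stopping tolerance `tol` are assumptions made for this illustration; the update order follows the worked example, where PR(B) is computed from the freshly updated PR(A).

```python
def iterate_two_pages(start=10.0, d=0.85, tol=1e-6, max_iter=1000):
    """Two pages A and B link only to each other, so C(A) = C(B) = 1."""
    pr_a = pr_b = start
    history = []
    for _ in range(max_iter):
        new_a = (1 - d) + d * pr_b / 1
        new_b = (1 - d) + d * new_a / 1  # uses the updated PR(A), as in the slides
        history.append((new_a, new_b))
        # "Converged" here means neither value moved by more than tol.
        converged = abs(new_a - pr_a) < tol and abs(new_b - pr_b) < tol
        pr_a, pr_b = new_a, new_b
        if converged:
            break
    return history


history = iterate_two_pages()
for step, (pr_a, pr_b) in enumerate(history[:2], start=1):
    print(f"iteration {step}: PR(A) = {pr_a:.3f}, PR(B) = {pr_b:.3f}")
final_a, final_b = history[-1]
print(f"stopped after {len(history)} iterations: "
      f"PR(A) = {final_a:.3f}, PR(B) = {final_b:.3f}")
```

The first two entries of `history` reproduce the hand calculation up to rounding, and the last entry sits close to 1 for both pages whatever starting value is passed in.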
Difference between the result of the PR calculation and the Google toolbar values.
Examples. Assumption: We'll take the initial PR value of each page as 1.0.
Example 1. Two pages, A and B, with no links between them.
PR(A) = (1-d) + d (0) = 0.15
PR(B) = (1-d) + d (0) = 0.15
To practice PageRank examples, use the calculator: www.webworkshop.net/pagerank_calculator.php?lnks=2,10,15&iblprs=0.15,0.15,0.15,0.15&pgnms=&pgs=2&initpr=1&its=100&type=simple
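Example 2. Page B links to page A. After the first iteration (using the initial PR(B) = 1):
PR(A) = (1-d) + d (PR(B) / C(B)) = 0.15 + 0.85 (1/1) = 1
PR(B) = (1-d) + d (0) = 0.15
On later iterations PR(B) stays at 0.15, so PR(A) settles at 0.15 + 0.85 (0.15/1) = 0.2775.
Dangling links are links that go to pages that don't have any outbound links. Orphan pages are those which don't have any inbound links.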
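The two definitions above can be checked mechanically on a link graph. The following is a small sketch; the function name `classify_links` and the dictionary input format are assumptions made for this example, and the sample graph is a single link from B to A, which is the direction the formula in Example 2 implies.

```python
def classify_links(links):
    """Return (dangling_links, orphan_pages) for a link graph.

    links maps each page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    has_outbound = {page for page, targets in links.items() if targets}
    has_inbound = {target for targets in links.values() for target in targets}

    # A dangling link points at a page that has no outbound links.
    dangling_links = sorted(
        (source, target)
        for source, targets in links.items()
        for target in targets
        if target not in has_outbound
    )
    # An orphan page has no inbound links.
    orphan_pages = sorted(pages - has_inbound)
    return dangling_links, orphan_pages


dangling, orphans = classify_links({"B": ["A"]})
print("dangling links:", dangling)
print("orphan pages:", orphans)
```

For this graph the link from B to A is dangling, because A links nowhere, and B is an orphan, because nothing links to it.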
Example 3. From here onwards, the final PR value after a sufficient number of iterations is shown inside each page.
First diagram: A 1.0, B 1.0, C 1.0
Second diagram: A 1.0, B 1.0, C 1.0
Example 4. Observation: We can channel a large proportion of a site's PR to a particular page.
A 1.85, B 0.575, C 0.575
Example 5. Observation: We can reduce PR leak by strengthening the internal link structure.
First diagram: A 1.0, B 0.575, C 0.575, External Site 1 1.0, External Site 2 0.638
Second diagram: A 2.6, B 1.255, C 1.255, External Site 1 1.0, External Site 2 1.215
Example 5, continued.
A 2.146, B 1.549, C 1.720, External Site 1 1.0, External Site 2 1.215
How to increase PR? • Adding spam pages. • Joining forums. • Submitting to search engine directories. • Reciprocal links. • Content.
Adding spam pages.
A 331.0, B 281.6, Spam 1 0.39, Spam 2 0.39, ..., Spam 1000 0.39
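The spam-page figures can be approximated with a short Python sketch. The link structure is not recoverable from the slide, so the one used here is an assumption inferred from the values: A links to B, B links to each of the 1000 spam pages, and every spam page links back to A. The helper name `pagerank` and its stopping parameters are likewise assumptions for this illustration.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=5000):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) until the values settle."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pr = {page: 1.0 for page in pages}
    for _ in range(max_iter):
        new_pr = {page: 1.0 - d for page in pages}
        for source, targets in links.items():
            # Each outgoing link carries an equal share of the source's PR.
            share = d * pr[source] / len(targets) if targets else 0.0
            for target in targets:
                new_pr[target] += share
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:
            break
    return pr


# Assumed structure: A -> B, B -> every spam page, every spam page -> A.
spam_pages = [f"Spam {i}" for i in range(1, 1001)]
links = {"A": ["B"], "B": spam_pages}
links.update({spam: ["A"] for spam in spam_pages})

ranks = pagerank(links)
print(round(ranks["A"], 1), round(ranks["B"], 1), round(ranks["Spam 1"], 2))
```

Under this assumed structure the results land close to the values on the slide: about 331 for A, about 281.6 for B and about 0.39 for each spam page. The 1000 low-value pages each pass almost all of their PR to A, which is why A's value grows so large.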
Conclusion. The formula for calculating PageRank may look difficult, but it is easy to understand. When a simple calculation is applied hundreds of times, however, the results can seem complicated, and we cannot easily predict the outcome of the iterations. More practice yields more observations. PageRank is an important factor in Google ranking, but it is only one of several important factors; for example, Google nowadays pays a lot of attention to a link's anchor text when deciding the relevancy of the target page. Because PageRank is still one of the important factors, one should be well aware of it while designing a website.
References. • http://www.webworkshop.net/pagerank.html • http://www.iprcom.com/papers/pagerank/ • http://www-db.stanford.edu/~backrub/google.html • http://www.google.com/intl/en/technology/ • http://www.google-watch.org/pagerank.html