HITS: Hypertext-Induced Topic Selection
Büşra İpek, Selime Işık
OUTLINE
• Introduction
• PageRank Algorithm
• HITS Algorithm
• HITS Example
• HITS vs PageRank
• Conclusion
Search Engines
1. Crawler: retrieves the contents of web pages
2. Indexer: stores and indexes information on the retrieved pages
3. Ranker: determines the importance of the web pages returned
4. Retrieval Engine: performs lookups on index tables
Ranking
• Today’s search engines may return millions of pages for a single query
• It is not possible for a user to preview all the returned results
• So, ranking is helpful
Rankers
Rankers are classified into two groups:
1. Content-based rankers
  • number of matched terms
  • frequency of terms
  • location of terms
2. Connectivity-based rankers
  • links that point to the page
Link Analysis
There are two famous link analysis methods:
1. PageRank Algorithm
2. HITS Algorithm
PageRank
• Originally formulated by Sergey Brin and Larry Page
• Does not rank web sites as a whole; a rank is determined for each page individually according to its authoritativeness
• If an authoritative web page A links to page B, then B is also considered authoritative
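PageRank (2)
• Recursive formula
• Page rank is initially 1 for all nodes
• The calculation is repeated until the difference between two successive calculations is very small
PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where T1, ..., Tn are the pages that link to A, C(T) is the number of outgoing links of page T, and d is the damping factor.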
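The recursive formula can be tried out on a tiny graph. The following is a minimal sketch, not a production implementation: the function name `pagerank`, the damping value d = 0.85, the tolerance, and the three-page example web are all assumptions made for illustration.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) until the values settle.

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    pr = {p: 1.0 for p in pages}                   # page rank initially 1 for all nodes
    out_count = {p: len(links[p]) for p in pages}  # C(T): number of outgoing links
    for _ in range(max_iter):
        new_pr = {}
        for page in pages:
            # Sum PR(T) / C(T) over every page T that links to this page.
            incoming = sum(pr[t] / out_count[t] for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:  # two successive calculations barely differ
            break
    return pr


# Hypothetical three-page web: A -> B, A -> C, B -> C, C -> A
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example_links)
```

In this example page C, which is linked to by both other pages, ends up with the highest rank, and B, which only receives half of A's vote, ends up with the lowest.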
HITS
• Kleinberg's hypertext-induced topic selection (HITS) algorithm was also developed for ranking documents based on the link information among a set of documents.
Authorities and hubs
• The algorithm identifies two types of pages:
  - Authority: pages that provide important, trustworthy information on a given topic
  - Hub: pages that contain links to authorities
• Authorities and hubs exhibit a mutually reinforcing relationship: a better hub points to many good authorities, and a better authority is pointed to by many good hubs
Authorities and hubs (2)
[Figure: node 1 with incoming links from nodes 2, 3, 4 and outgoing links to nodes 5, 6, 7]
a(1) = h(2) + h(3) + h(4)
h(1) = a(5) + a(6) + a(7)
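The two equations for node 1 can be checked with a few lines of Python. This is only a sketch: the edge list is read off the equations themselves (2, 3, 4 link to 1; 1 links to 5, 6, 7), and the starting value of 1.0 for every score is an illustrative assumption.

```python
# Edges as (source, target) pairs, inferred from the equations for node 1.
edges = [(2, 1), (3, 1), (4, 1), (1, 5), (1, 6), (1, 7)]
nodes = sorted({n for edge in edges for n in edge})

hub = {n: 1.0 for n in nodes}        # illustrative starting values (assumption)
authority = {n: 1.0 for n in nodes}


def authority_score(page):
    """Sum the hub scores of all pages that link to `page`."""
    return sum(hub[src] for src, dst in edges if dst == page)


def hub_score(page):
    """Sum the authority scores of all pages that `page` links to."""
    return sum(authority[dst] for src, dst in edges if src == page)


a1 = authority_score(1)  # h(2) + h(3) + h(4)
h1 = hub_score(1)        # a(5) + a(6) + a(7)
```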
Definitions
• Authority: pages that provide important, trustworthy information on a given topic
• Hub: pages that contain links to authorities
• Indegree: number of incoming links to a given node, used to measure authoritativeness
• Outdegree: number of outgoing links from a given node, used here to measure hubness
HITS Algorithm
• Hubs point to lots of authorities.
• Authorities are pointed to by lots of hubs.
• Together they form a bipartite graph.
[Figure: bipartite graph with hubs on one side pointing to authorities on the other]
Step By Step HITS - 1
• Determine a base set S
• Let the set of documents returned by a standard search engine be called the root set R
• Initialize S to R
Step By Step HITS - 2
• Add to S all pages pointed to by any page in R
• Add to S all pages that point to any page in R
• Maintain for each page p in S:
  - Authority score: a_p (vector a)
  - Hub score: h_p (vector h)
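The construction of the base set S from the root set R can be sketched as follows. The function name `build_base_set`, the dictionary representation of the link graph, and the example pages are hypothetical; a real system would obtain the in-links and out-links from a search engine's index rather than from an in-memory dictionary.

```python
def build_base_set(root_set, links):
    """Expand root set R into base set S.

    links maps each page to the list of pages it links to.
    """
    root = set(root_set)
    base = set(root)  # initialize S to R
    for source, targets in links.items():
        for target in targets:
            if source in root:
                base.add(target)  # page pointed to by a page in R
            if target in root:
                base.add(source)  # page that points to a page in R
    return base


# Hypothetical link graph; the root set R is {1, 2}.
example_links = {1: [5, 6], 2: [6], 8: [1], 9: [10], 10: [11]}
base_set = build_base_set({1, 2}, example_links)
```

Pages 5 and 6 enter S because pages in R link to them, and page 8 enters because it links to a page in R. Pages 9, 10, and 11 have no direct link to or from R and stay out.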
Step By Step HITS - 3
• For each node, initialize a_p and h_p to 1/n, where n is the number of pages in S
• In each iteration, calculate the authority weight for each node in S: a_p is the sum of h_q over all pages q in S that link to p
Step By Step HITS - 4
• In each iteration, calculate the hub weight for each node in S: h_p is the sum of a_q over all pages q in S that p links to
• Note: The hub weights are computed from the current authority weights, which were computed from the previous hub weights.
Step By Step HITS - 5
• After new weights are computed for all nodes, the authority weights and the hub weights are normalized.
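Steps 3 to 5 can be put together in a short Python sketch. Several details are assumptions rather than something the slides specify: the function name `hits`, the fixed number of iterations, the example edges, and in particular the choice of normalization (here each vector is scaled to unit Euclidean length, as in Kleinberg's paper).

```python
import math


def hits(edges, iterations=50):
    """Run the HITS updates on a list of (source, target) link pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    authority = {p: 1.0 / n for p in nodes}  # step 3: initialize to 1/n
    hub = {p: 1.0 / n for p in nodes}

    for _ in range(iterations):
        # Step 3: authority weight = sum of hub weights of pages linking in.
        authority = {p: 0.0 for p in nodes}
        for src, dst in edges:
            authority[dst] += hub[src]

        # Step 4: hub weight = sum of the *current* authority weights
        # of the pages linked to.
        hub = {p: 0.0 for p in nodes}
        for src, dst in edges:
            hub[src] += authority[dst]

        # Step 5: normalize (assumption: unit Euclidean length).
        a_norm = math.sqrt(sum(v * v for v in authority.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        authority = {p: v / a_norm for p, v in authority.items()}
        hub = {p: v / h_norm for p, v in hub.items()}

    return authority, hub


# Hypothetical graph: 1 -> 3, 2 -> 3, 2 -> 4
authority, hub = hits([(1, 3), (2, 3), (2, 4)])
```

In this example page 3 receives the highest authority weight because both hubs point to it, and page 2 is the better hub because it points to both authorities.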
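Convergence of HITS Algorithm
• Let $A$ be the adjacency matrix of S
• $A_{ij} = 1$ for $i \in S$, $j \in S$ if and only if $i \to j$
• Authority and hub: $a_k = \varphi_k A^T h_{k-1}$; $h_k = \psi_k A a_k$, where $\varphi_k$ and $\psi_k$ are normalization constants
• Combination of both formulas gives:
$a_k = \varphi_k \psi_{k-1} A^T A a_{k-1}$ for $k > 1$
$h_k = \psi_k \varphi_k A A^T h_{k-1}$ for $k > 0$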
Convergence of HITS Algorithm - 2
• The algorithm converges to a fixed point if iterated indefinitely, and the resulting authority and hub vectors satisfy $a^* = (1/\mu^*) A^T A a^*$; $h^* = (1/\mu^*) A A^T h^*$
• The authority vector converges to $a^*$, the principal eigenvector of $A^T A$
• The hub vector converges to $h^*$, the principal eigenvector of $A A^T$
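The fixed-point condition for the authority vector can be checked numerically. The sketch below repeats the combined authority update (multiply by A^T A, then normalize) on a small graph and measures how far the result is from satisfying A^T A a* = µ* a*. The four-page graph, the number of iterations, and the unit-length normalization are assumptions chosen for illustration.

```python
import math

# Hypothetical 4-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 3 -> 2
n = 4
edges = [(0, 1), (0, 2), (1, 2), (3, 2)]
A = [[0.0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = 1.0  # A[i][j] = 1 if and only if i links to j

# M = A^T A: M[i][j] counts the pages that link to both i and j.
M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


a = [1.0 / n] * n
for _ in range(100):
    a = mat_vec(M, a)                        # a_k is proportional to A^T A a_{k-1}
    norm = math.sqrt(sum(x * x for x in a))
    a = [x / norm for x in a]                # normalize to unit length

Ma = mat_vec(M, a)
mu = sum(x * y for x, y in zip(a, Ma))       # eigenvalue estimate (a has unit length)
residual = max(abs(Ma[i] - mu * a[i]) for i in range(n))
```

After the loop, `residual` is close to zero, so `a` is (numerically) an eigenvector of A^T A with eigenvalue `mu`. The same check applies to the hub vector with A A^T in place of A^T A.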
The Pseudocode of HITS
HITS Example
• Root set R = {1, 2, 3, 4}
• Extend it to form the base set S
HITS Example Results
[Figure: authority and hubness weights for nodes 1 to 15]
HITS vs PageRank
• HITS emphasizes mutual reinforcement between authority and hub web pages, while PageRank does not attempt to capture the distinction between hubs and authorities. It ranks pages by authority alone.
• HITS is applied to the local neighborhood of pages surrounding the results of a query, whereas PageRank is applied to the entire web
• HITS is query-dependent, but PageRank is query-independent
HITS vs PageRank (2)
• Both HITS and PageRank correspond to matrix computations.
• Both can be unstable: changing a few links can lead to quite different rankings.
• PageRank doesn't handle pages with no out-edges very well, because they decrease the overall PageRank
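The effect of a page with no out-edges can be seen with the formula from the PageRank (2) slide. The sketch below compares the total PageRank of two tiny graphs; the function names, the damping value d = 0.85, and both two-page graphs are assumptions made for illustration.

```python
def pagerank_step(pr, links, d=0.85):
    """One pass of PR(A) = (1 - d) + d * sum(PR(T) / C(T))."""
    new_pr = {}
    for page in links:
        # Only pages that actually link to `page` contribute, so a page
        # with no out-links never appears as a divisor here.
        incoming = sum(pr[t] / len(links[t]) for t in links if page in links[t])
        new_pr[page] = (1 - d) + d * incoming
    return new_pr


def total_rank(links, steps=100):
    """Start every page at 1, iterate, and return the sum of all ranks."""
    pr = {p: 1.0 for p in links}
    for _ in range(steps):
        pr = pagerank_step(pr, links)
    return sum(pr.values())


closed = {"A": ["B"], "B": ["A"]}   # every page has an out-link
dangling = {"A": ["B"], "B": []}    # B has no out-links

closed_total = total_rank(closed)      # stays at 2
dangling_total = total_rank(dangling)  # falls well below 2
```

In the second graph the rank that flows into B is never passed on, so the total drops from 2 to about 0.43. Practical implementations usually compensate for this, for example by redistributing the rank of such pages across all pages.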
Conclusion
• HITS is a general algorithm for calculating authority and hub scores in order to rank the retrieved data
• The basic aim of the algorithm is to induce a subgraph of the Web from the set of pages found by a search on a given topic (query).
• The results demonstrate that it is good at calculating the authority and hubness of nodes.
THANK YOU
ANY QUESTIONS?