Web Information retrieval (Web IR)

Handout #12: Combinational Ranking
Ali Mohammad Zareh Bidoki
ECE Department, Yazd University
alizareh@yaduni.ac.ir

Ranking Algorithm Problems
• Rich-get-richer effect (connectivity-based algorithms)
• Low precision (at most 0.30)
• Each ranking algorithm performs well only in some situations

Combinational Ranking
• Content + connectivity + ???
• How can these features be combined?
• R = f(query, content, connectivity)
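The slide leaves f unspecified. As a minimal sketch, assume f is a linear interpolation of a query-dependent content score and a connectivity score, both already normalized to [0, 1]. The weight `lam` and the function names are hypothetical and are not part of any method described in these notes.

```python
from typing import Dict, List, Tuple

def combined_score(content: float, connectivity: float, lam: float = 0.7) -> float:
    """Hypothetical combination R = lam * content + (1 - lam) * connectivity."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * content + (1.0 - lam) * connectivity

def rank(pages: Dict[str, Tuple[float, float]], lam: float = 0.7) -> List[str]:
    """pages maps page id -> (content score for the query, connectivity score)."""
    scores = {p: combined_score(c, k, lam) for p, (c, k) in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)

pages = {"a": (0.9, 0.1), "b": (0.5, 0.8), "c": (0.2, 0.9)}
print(rank(pages, lam=0.7))  # content-heavy weighting: ['a', 'b', 'c']
print(rank(pages, lam=0.3))  # connectivity-heavy weighting: ['b', 'c', 'a']
```

The same three pages come out in a different order depending on `lam`, which is exactly the combination problem the slide raises; the learning-to-rank approach at the end of this handout learns such weights from labeled data instead of fixing them by hand.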

Relevance Propagation Model (by Shakery)
• A hyper-relevance score h(p) is computed for each document p.
• WI and WO are weighting functions for in-link and out-link pages, respectively.
• S(p) is the similarity between query q and page p (self-relevance).

Three Iterative Models
• Weighted In-Link
• Weighted Out-Link
• Uniform Out-Link

Weighted In-Link
• This model of user behavior is similar to the random surfer model, except that it is query-dependent. The probability that the random surfer visits a page is the page's hyper-relevance score.

Weighted Out-Link
• In this model, given a page, the user reads its content with probability alpha and follows its outgoing links with probability (1-alpha). The pages linked from a page do not all have the same impact on its weight.

Uniform Out-Link
• In this special case, it is assumed that at each page the user reads the content of the page and, with probability (1-alpha), also reads all the pages it links to.
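The three models are described above only in words, and the original update formulas are not reproduced in these notes. The sketch below is one assumed reading of those descriptions, not necessarily Shakery's exact equations. Weighted In-Link is treated as a query-biased random surfer that jumps to page p with probability proportional to S(p) and otherwise follows a link chosen in proportion to the target's S. Weighted Out-Link uses h(p) = alpha·S(p) + (1-alpha)·Σ w(p,q)·h(q) with w(p,q) proportional to S(q). Uniform Out-Link uses h(p) = S(p) + (1-alpha)·Σ h(q) over every page q linked from p. The function `propagate` and its parameters are hypothetical.

```python
from typing import Dict, List

def propagate(S: Dict[str, float], out_links: Dict[str, List[str]], model: str,
              alpha: float = 0.5, max_iter: int = 200, tol: float = 1e-10) -> Dict[str, float]:
    """Iterate one (assumed) hyper-relevance model until the scores stop changing.

    S         -- self-relevance S(p) of every page in the working set
    out_links -- page -> pages it links to; links leaving the working set are ignored
    model     -- "weighted_in", "weighted_out" or "uniform_out"
    """
    pages = list(S)
    out = {p: [q for q in out_links.get(p, []) if q in S] for p in pages}
    inn: Dict[str, List[str]] = {p: [] for p in pages}
    for p in pages:
        for q in out[p]:
            inn[q].append(p)
    out_rel = {p: sum(S[q] for q in out[p]) for p in pages}  # total S over p's link targets
    total = sum(S.values())
    if total <= 0:
        raise ValueError("at least one page needs positive self-relevance")
    jump = {p: S[p] / total for p in pages}  # random jumps land in proportion to S(p)
    h = dict(jump) if model == "weighted_in" else dict(S)

    for _ in range(max_iter):
        if model == "weighted_in":
            # Mass on pages with no relevant link targets cannot follow a link, so it jumps.
            stuck = sum(h[q] for q in pages if out_rel[q] == 0)
            new = {p: alpha * jump[p] + (1 - alpha) * (
                       sum(h[q] * S[p] / out_rel[q] for q in inn[p] if out_rel[q] > 0)
                       + stuck * jump[p])
                   for p in pages}
        elif model == "weighted_out":
            # Read p with prob. alpha; otherwise follow a link chosen in proportion to S(target).
            new = {p: alpha * S[p] + (1 - alpha) * (
                       sum(h[q] * S[q] for q in out[p]) / out_rel[p] if out_rel[p] > 0 else 0.0)
                   for p in pages}
        elif model == "uniform_out":
            # Always read p; with prob. (1 - alpha) also read every linked page, each with weight 1.
            new = {p: S[p] + (1 - alpha) * sum(h[q] for q in out[p]) for p in pages}
        else:
            raise ValueError(f"unknown model: {model}")
        delta = max(abs(new[p] - h[p]) for p in pages)
        h = new
        if delta < tol:
            break
    return h

S = {"a": 0.2, "b": 0.5, "c": 0.3}                 # toy self-relevance scores
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # toy link graph
for m in ("weighted_in", "weighted_out", "uniform_out"):
    scores = propagate(S, links, m, alpha=0.6)
    print(m, sorted(scores, key=scores.get, reverse=True))
```

Under this reading, Weighted In-Link scores always sum to 1, like a random-surfer distribution. With uniform weights the iteration is only guaranteed to converge when (1-alpha) times the largest out-degree is below 1, which holds in the toy graph (0.4 × 2 = 0.8); otherwise the loop simply stops after `max_iter` rounds.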

Algorithm Implementation
• The algorithm is run on a working set.
• Working set construction:
  • First, find the top 100,000 pages with the highest content similarity to the query.
  • From these 100,000 pages, select a small number (about 200) of the most similar pages as the core set.
  • Then expand the core set into the working set by adding the pages among the 100,000 that point to pages in the core set or are pointed to by pages in the core set.
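The working-set construction can be sketched as follows, assuming the query similarities and the link map are already available in memory. The function and parameter names are hypothetical; the default sizes are the ones given above.

```python
from typing import Dict, List, Set

def build_working_set(similarity: Dict[str, float], out_links: Dict[str, List[str]],
                      pool_size: int = 100_000, core_size: int = 200) -> Set[str]:
    """Hypothetical helper: similarity maps page -> content similarity to the query."""
    ranked = sorted(similarity, key=similarity.get, reverse=True)
    pool = set(ranked[:pool_size])   # step 1: top pages by content similarity
    core = set(ranked[:core_size])   # step 2: the most similar pages form the core set
    working = set(core)
    for p in core:                   # step 3a: pool pages pointed to by the core set
        working.update(q for q in out_links.get(p, []) if q in pool)
    for p in pool - core:            # step 3b: pool pages that point into the core set
        if any(q in core for q in out_links.get(p, [])):
            working.add(p)
    return working

sim = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5, "f": 0.4}
links = {"a": ["c"], "c": ["d"], "d": ["b"], "f": ["a"], "e": ["z"]}
print(sorted(build_working_set(sim, links, pool_size=5, core_size=2)))
```

In the toy example, "f" links into the core set but is excluded because it is not among the top `pool_size` pages, which mirrors the rule that expansion only draws from the initial candidate pool.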

Algorithm Properties
• The algorithm is:
  • Online (?)
  • Recursive
  • Query-dependent
• Experiments on TREC show that Weighted In-Link outperforms the other two models.

Frequency Propagation (by Song)
• Instead of propagating scores, the frequencies of query terms are propagated.
• It can be used online.
• Propagation follows the site structure.

Propagation Formula
• f_t(p) is the frequency of term t in page p.
• f'_t(p) is the frequency of term t in page p after propagation.
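The propagation formula itself is not reproduced in these notes. As an assumed stand-in, the sketch mixes a page's own term frequency with the average frequency among its children in the site tree: f'_t(p) = (1 - alpha)·f_t(p) + alpha·(mean of f_t(c) over the children c of p). The mixing weight alpha, the one-level scope, and all names are hypothetical.

```python
from typing import Dict, List

TermFreqs = Dict[str, float]  # term -> frequency within one page

def propagate_term_freq(tf: Dict[str, TermFreqs], children: Dict[str, List[str]],
                        alpha: float = 0.3) -> Dict[str, TermFreqs]:
    """Hypothetical one-level propagation of term frequencies from child pages to parents."""
    result: Dict[str, TermFreqs] = {}
    for p, own in tf.items():
        kids = [c for c in children.get(p, []) if c in tf]
        if not kids:                      # leaf pages keep their original frequencies
            result[p] = dict(own)
            continue
        terms = set(own)
        for c in kids:
            terms.update(tf[c])
        result[p] = {t: (1 - alpha) * own.get(t, 0.0)
                        + alpha * sum(tf[c].get(t, 0.0) for c in kids) / len(kids)
                     for t in terms}
    return result

tf = {"/": {"ir": 1.0}, "/a": {"ir": 4.0, "rank": 2.0}, "/b": {"rank": 6.0}}
site_tree = {"/": ["/a", "/b"]}
print(propagate_term_freq(tf, site_tree, alpha=0.5)["/"])  # ir -> 1.5, rank -> 2.0
```

The propagated values f'_t(p) would then be used in place of f_t(p) by an ordinary content-scoring function.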

Overall Framework for Propagation
• SS performs best.
• ST and HT-WI perform similarly.

Combinational Ranking Algorithms Based on Learning (Learning to Rank)

Combination Framework
• Training set (labels are relevance judgments or click orders):
  q1: {(x11, 4), (x12, 3), …, (x1m, 0)}
  q2: {(x21, 3), (x22, 2), …, (x2m, 1)}
  …
  qn: {(xn1, 4), (xn2, 3), …, (xnm, 2)}
• Learning system: learns the ranking model g(x, w) from the training set
• Test set: (x1, ?), (x2, ?), …
• Ranking system: applies g to the test set and outputs (x1, g(x1, w)), (x2, g(x2, w)), (x3, g(x3, w)), …
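A toy instance of this framework, assuming a linear model g(x, w) = w·x trained pointwise with squared loss by batch gradient descent. The features (content score, link score, constant bias) and labels are made up for illustration and were chosen so that label = 2·content + 2·link exactly, so training should recover w close to (2, 2, 0). Real learning-to-rank systems use richer models and losses.

```python
from typing import Dict, List, Tuple

Vector = List[float]

def g(x: Vector, w: Vector) -> float:
    """Assumed linear ranking model g(x, w) = w . x."""
    return sum(xi * wi for xi, wi in zip(x, w))

def train(training_set: Dict[str, List[Tuple[Vector, float]]],
          lr: float = 0.1, epochs: int = 2000) -> Vector:
    """Learning system: minimize the mean squared error between g(x, w) and the label."""
    data = [pair for docs in training_set.values() for pair in docs]
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in data:
            err = g(x, w) - y
            for i in range(dim):
                grad[i] += 2.0 * err * x[i] / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def rank(test_set: Dict[str, Vector], w: Vector) -> List[str]:
    """Ranking system: score each unlabeled test document with g and sort descending."""
    return sorted(test_set, key=lambda d: g(test_set[d], w), reverse=True)

# features: (content score, link score, constant bias); labels are relevance grades 0..4
training_set = {
    "q1": [([1.0, 1.0, 1.0], 4), ([1.0, 0.5, 1.0], 3), ([0.0, 0.0, 1.0], 0)],
    "q2": [([0.5, 0.5, 1.0], 2), ([0.0, 1.0, 1.0], 2), ([0.5, 0.0, 1.0], 1)],
}
w = train(training_set)
test_set = {"x1": [0.8, 0.7, 1.0], "x2": [0.2, 0.1, 1.0], "x3": [0.5, 0.4, 1.0]}
print([round(v, 3) for v in w], rank(test_set, w))
```

Pooling the documents of all queries into one regression is what makes this pointwise; pairwise and listwise methods instead compare documents within the same query.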

Three Learning Categories
• Pointwise
• Pairwise
• Listwise
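The three categories differ in what the loss looks at: a single document, a pair of documents, or the whole ranked list for a query. The sketch below uses one representative loss per category, chosen as common examples rather than taken from these slides: squared error (pointwise), a RankNet-style logistic pair loss (pairwise), and the ListNet top-one cross-entropy (listwise). All function names are illustrative.

```python
import math
from typing import List

def pointwise_loss(scores: List[float], labels: List[float]) -> float:
    """Each document is scored on its own: mean squared error against its label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)

def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def pairwise_loss(scores: List[float], labels: List[float]) -> float:
    """RankNet-style logistic loss over every pair where document i should rank above j."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                total += _softplus(-(scores[i] - scores[j]))
                pairs += 1
    return total / pairs if pairs else 0.0

def _softmax(v: List[float]) -> List[float]:
    m = max(v)
    e = [math.exp(x - m) for x in v]
    z = sum(e)
    return [x / z for x in e]

def listwise_loss(scores: List[float], labels: List[float]) -> float:
    """ListNet top-one loss: cross-entropy between softmax(labels) and softmax(scores)."""
    p_true, p_pred = _softmax(labels), _softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))

labels = [2.0, 1.0, 0.0]            # graded relevance of three documents for one query
good, bad = [2.0, 1.0, 0.0], [0.0, 1.0, 2.0]
for loss in (pointwise_loss, pairwise_loss, listwise_loss):
    print(loss.__name__, round(loss(good, labels), 4), round(loss(bad, labels), 4))
```

In each case the correctly ordered scores receive a lower loss than the reversed ones; what changes is whether the loss sees absolute values, relative order within pairs, or the distribution over the whole list.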
