Multiple Intents Re-ranking By: Yossi Azar, Iftah Gamzu, Xiaoxin Yin, pp. 669-678, in Proceedings of STOC 2009. Presented By: Bhawana Goel
Web search and Ranking • Ranking of search results on the basis of: • Hyperlink structure of the web • Content of the web page • User’s location • Not much research on user’s “intent”
Intent • Same query different intents • “computer science at A&M” • Information about computer science department at A&M • Information about admission to computer science department at A&M
Problem Statement • 20% of web queries are ambiguous • Different user types with different intents • Goal is to minimize the average effort of browsing through the search results • Re-rank the web results
Optimal ordering? • Minimize average effort for all user types [Figure: alternative orderings of results 1, 2, 3 for different user types]
Types of Intents • Navigational • First result is relevant • Informational • All the results are relevant • Complex • First and third results are relevant
Overview • Each user type has its own profile vector over its subset of relevant pages, e.g. <1,0,…,0>, <0,0,…,1>, <1,1,…,1> • The entries of the vector correspond to positions (the first, second, … relevant page the user reaches), not to particular pages • The order of the result pages is not fixed by the vector; it is determined by the search engine • The vector depicts the intention, i.e. the type of query need • It also depicts the proportion of users: <1,0,0> stands for one user, <100,0,0> for 100 users
Calculation of user effort (relevant pages at positions 2, 4 and 5 of the ranking) • Navigational <1,0,0>: 2*1 = 2 • Informational <1,1,1>: 2*1 + 4*1 + 5*1 = 11 • Complex <0.4,0.4,0.2>: 2*0.4 + 4*0.4 + 5*0.2 = 3.4 [Figure: profile vectors and an example ranking]
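The three calculations on this slide are the same dot product of positions and profile weights. A minimal sketch, assuming the relevant pages sit at positions 2, 4 and 5 as the arithmetic implies; the function name `user_effort` is made up for illustration:

```python
def user_effort(positions, profile):
    """Effort of one user type.

    positions[i] is the rank at which the user's (i+1)-th relevant page
    appears; profile[i] is the matching entry of the profile vector.
    """
    if len(positions) != len(profile):
        raise ValueError("positions and profile must have the same length")
    return sum(p * w for p, w in zip(positions, profile))


positions = [2, 4, 5]  # assumed from the slide's arithmetic

navigational = user_effort(positions, [1, 0, 0])
informational = user_effort(positions, [1, 1, 1])
complex_intent = user_effort(positions, [0.4, 0.4, 0.2])

print(navigational, informational, round(complex_intent, 2))
```

The complex profile uses floating-point weights, so its result is rounded before printing.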
Problem formulation • Form a weighted hypergraph • Vertices = web results • Hyperedges = user types • Weights = user profiles • Overhead of a hyperedge = (positions of its vertices) * (weight vector), e.g. (1,2,3)*<1,0,0> = 1 and (2,4,5)*<15,20,25> = 235 [Figure: example hypergraph with hyperedges e1 and e2 over five ranked results]
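The hypergraph view can be sketched as a function that takes an ordering of the results and sums the overhead of every hyperedge. This is an illustration only: the page labels `a` to `e` and the assignment of pages to the two hyperedges are hypothetical, chosen so that the overheads match the slide's two examples (1 and 235).

```python
def ordering_cost(ordering, hyperedges):
    """Total overhead of an ordering.

    ordering:   list of pages, best rank first.
    hyperedges: list of (pages, weights); the i-th weight is multiplied by
                the rank at which the i-th page of that hyperedge appears.
    """
    rank = {page: i + 1 for i, page in enumerate(ordering)}
    total = 0
    for pages, weights in hyperedges:
        times = sorted(rank[p] for p in pages)  # ranks in order of appearance
        total += sum(t * w for t, w in zip(times, weights))
    return total


ordering = ["a", "b", "c", "d", "e"]          # hypothetical page labels
navigational_edge = ({"a", "b", "c"}, [1, 0, 0])    # ranks 1, 2, 3
weighted_edge = ({"b", "d", "e"}, [15, 20, 25])     # ranks 2, 4, 5

cost_first = ordering_cost(ordering, [navigational_edge])
cost_second = ordering_cost(ordering, [weighted_edge])
cost_total = ordering_cost(ordering, [navigational_edge, weighted_edge])
print(cost_first, cost_second, cost_total)
```

The re-ranking problem is then to find the ordering that minimizes this total.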
Special Cases • All user profiles are of the form <1,0,…,0> • This is the min-sum set cover problem • It is NP-hard • The greedy algorithm has an approximation ratio of 4: greedily pick the element that covers the largest number of uncovered sets [Figure: greedy ordering example over elements A, B, C, F, G, I]
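The greedy rule for min-sum set cover can be sketched as follows. The input sets and the tie-breaking rule (smallest label first) are assumptions for the example, not taken from the slide's figure.

```python
def greedy_min_sum_set_cover(sets):
    """Greedy ordering for min-sum set cover.

    A set counts as covered at the first step t at which one of its
    elements is picked; the cost is the sum of these t over all sets.
    Returns (order of picked elements, total cost).
    """
    uncovered = [set(s) for s in sets]
    if any(not s for s in uncovered):
        raise ValueError("every set must contain at least one element")
    order, total_cost = [], 0
    while uncovered:
        # Count how many uncovered sets each element would cover.
        counts = {}
        for s in uncovered:
            for x in s:
                counts[x] = counts.get(x, 0) + 1
        # Pick the element covering the most sets; ties go to the smallest label.
        best = max(sorted(counts), key=counts.get)
        order.append(best)
        step = len(order)
        total_cost += step * sum(1 for s in uncovered if best in s)
        uncovered = [s for s in uncovered if best not in s]
    return order, total_cost


example_sets = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "F"}]  # hypothetical
order, cost = greedy_min_sum_set_cover(example_sets)
print(order, cost)
```

In the re-ranking setting each set is a user type with profile <1,0,…,0> and each element is a web result.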
Special Cases • All user profiles are of the form <0,0,…,1> • This is the minimum-latency set cover problem • It is NP-hard • It has an e-approximation algorithm
Case 1: Non-increasing weight vectors • Generalization of the min-sum set cover problem • Greedy weight reduction algorithm • Approximation ratio of 4 [Figure: example hyperedges over vertices A to G with weight vectors (2,2,0), (3,0) and (4,1,0)]
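A minimal sketch of a greedy weight reduction step, under one reading of the slides: the weight of a vertex is the sum, over the hyperedges containing it, of the next unused entry of each hyperedge's weight vector, and the vertex with the largest weight is picked. That reading, the tie-breaking (alphabetical) and the example hyperedges are assumptions; the slide's figure could not be recovered.

```python
def greedy_weight_reduction(hyperedges):
    """Order vertices by repeatedly picking the one with maximum weight.

    hyperedges: list of (vertices, weights) with len(weights) == len(vertices).
    Assumed rule: a hyperedge with j vertices already picked contributes
    weights[j] to each of its remaining vertices.
    """
    remaining = sorted(set().union(*(verts for verts, _ in hyperedges)))
    picked_in_edge = [0] * len(hyperedges)
    order = []
    while remaining:
        def gain(v):
            return sum(w[picked_in_edge[i]]
                       for i, (verts, w) in enumerate(hyperedges) if v in verts)

        best = max(remaining, key=gain)  # first maximum, so ties are alphabetical
        order.append(best)
        remaining.remove(best)
        for i, (verts, _) in enumerate(hyperedges):
            if best in verts:
                picked_in_edge[i] += 1
    return order


# Hypothetical hyperedges reusing the weight vectors shown on the slide.
edges = [
    ({"A", "B", "C"}, [2, 2, 0]),
    ({"A", "F"}, [3, 0]),
    ({"D", "F", "G"}, [4, 1, 0]),
]
greedy_order = greedy_weight_reduction(edges)
print(greedy_order)
```

With all vectors of the form <1,0,…,0> this rule reduces to the min-sum set cover greedy: a vertex's weight is the number of still-uncovered hyperedges it belongs to.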
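Greedy algorithm in General Case • Greedy weight reduction algorithm does not work in the general case • Approximation ratio is unbounded • Example: k hyperedges with weight vector <1,0> and one hyperedge with weight vector <0,w>, where $w = k^2$ • $\mathrm{OPT} = 2w + (3 + 4 + \dots + (k+2)) \approx k^2$ • $\mathrm{ALG} = (1 + 2 + \dots + k) + (k+2)\,w \approx k^3$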
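The orders of magnitude on this slide follow from closed forms of the two sums. A short worked derivation, using the slide's own expressions for OPT and ALG and its choice of w equal to k squared:

```latex
\begin{align*}
\mathrm{OPT} &= 2w + \sum_{i=3}^{k+2} i
              = 2k^2 + \frac{k(k+5)}{2} = \Theta(k^2), \\
\mathrm{ALG} &= \sum_{i=1}^{k} i + (k+2)\,w
              = \frac{k(k+1)}{2} + (k+2)\,k^2 = \Theta(k^3), \\
\frac{\mathrm{ALG}}{\mathrm{OPT}} &= \Theta(k) \to \infty
  \quad \text{as } k \to \infty .
\end{align*}
```

The gap arises because the greedy rule sees weight 0 on the first entry of <0,w> and therefore postpones that hyperedge until all k hyperedges with <1,0> are served.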
Case 2: Arbitrary weight vectors: Harmonic interpolation algorithm • Greedy algorithm takes only local maxima into account • Apply the greedy algorithm on harmonically interpolated weight vectors • Interpolation provides knowledge about the future weight reduction potential of hyperedges • Example: k hyperedges with <1,0> and one hyperedge whose vector <0,w> is interpolated to <w/2,w> • ALG = 2w/2 + (3 + 4 + … + (k+2))
Harmonic Interpolation Algorithm • Phase I: 1. Calculate the harmonic interpolation of the weight vector of every hyperedge e ∈ E • Phase II (greedy weight reduction algorithm): 2. Calculate the weight of each vertex according to the interpolated weight vectors 3. Select the vertex with maximum weight
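Phase I can be sketched as below. The slides only show the interpolation for vectors with a single non-zero entry (<0,w> becomes <w/2,w>); the rule used here for general vectors, taking at each position the maximum over later entries divided by their distance, is an assumption that agrees with those examples.

```python
from fractions import Fraction


def harmonic_interpolation(weights):
    """Assumed rule: new[i] = max over j >= i of weights[j] / (j - i + 1).

    Fractions keep values such as w/3 exact.
    """
    n = len(weights)
    return [
        max(Fraction(weights[j]) / (j - i + 1) for j in range(i, n))
        for i in range(n)
    ]


two_entry = harmonic_interpolation([0, 8])        # <0,w> with w = 8
indicator = harmonic_interpolation([0, 0, 6, 0])  # w = 6 in position 3
unchanged = harmonic_interpolation([1, 0])        # non-increasing input
print(two_entry, indicator, unchanged)
```

Non-increasing vectors are left unchanged by this rule, so Phase II then behaves like the plain greedy weight reduction algorithm on them.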
Analysis of harmonic interpolation algorithm • Use indicator vectors: <0,0,…,w,…,0,0> • Only one entry is non-zero • Harmonic interpolation of an indicator vector with w in position j: <w/j,…,w/2,w,0,…,0> • Notation • (e,i): a potential pair • w(e,i): weight of the potential pair • t(e,i): the time when (e,i) is covered • Penalty of a step = remaining harmonic weight / weight covered • Objective: minimize $\sum_{t \ge 1} \sum_{(e,i)\,:\,t(e,i) = t} w(e,i) \cdot t$
Optimal solution histogram • Create a histogram with number of columns = number of potential pairs, width of a column = w(e,i), and height of the column = t(e,i) • Axes: potential pairs (horizontal), time (vertical) • It is monotonically increasing
Histogram for algorithmic solution • Histogram with number of columns = number of potential pairs, width of a column = ŵ(e,i), and height of the column = penalty of the step • It is not monotonic
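Approximation Ratio • Shrink the width of the ALG histogram by a factor of 2H_r and its height by a factor of 2 (H_r is the r-th harmonic number) • The new histogram fits completely inside the optimal solution histogram • Hence ALG/(4H_r) ≤ OPT, i.e. ALG ≤ 4H_r · OPT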
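The area argument can be written out in one line. This is a sketch that takes the area of each histogram as the cost of the corresponding solution, which is how the slides use them; the full proof is in the paper.

```latex
\[
\frac{\mathrm{ALG}}{4H_r}
  = \frac{1}{2H_r} \cdot \frac{1}{2} \cdot \operatorname{area}(\text{ALG histogram})
  \;\le\; \operatorname{area}(\text{OPT histogram})
  = \mathrm{OPT},
\qquad
H_r = \sum_{i=1}^{r} \frac{1}{i} \le 1 + \ln r .
\]
```

Containment of the shrunken histogram gives the inequality, so ALG is at most 4H_r times OPT, which is an O(log r) approximation ratio.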
Conclusion • O(log r)-approximation in the general case using harmonic interpolation and the greedy algorithm • Intents of all user types are taken into account • A better solution exists: • In the general case, a randomized 485-approximation algorithm by Nikhil Bansal et al. • Based on a stronger LP relaxation • Randomized rounding