Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results. Sandeep Pandey 1, Sourashis Roy 2, Christopher Olston 1, Junghoo Cho 2, Soumen Chakrabarti 3. 1 Carnegie Mellon, 2 UCLA, 3 IIT Bombay.
Popularity as a Surrogate for Quality • Search engines want to measure the “quality” of pages • Quality is hard to define and measure • Various “popularity” measures are used in ranking • e.g., in-links, PageRank, user traffic
Relationship Between Popularity and Quality • Popularity: depends on the number of users who “like” a page • relies on both quality and awareness of the page • [Figure labels: users aware of page p; users who like page p] • Popularity is different from quality • But strongly correlated when awareness is large
Problem • Popularity/quality correlation is weak for young pages • Even a high-quality page may not (yet) be popular due to lack of user awareness • Plus, the process of gaining popularity is inhibited by the “entrenchment effect” • [Cho et al. WWW’04], [Chakrabarti et al. SODA’05], [Mowshowitz et al. Communication’02] and many others
Entrenchment Effect • Search engines show entrenched (already-popular) pages at the top • Users discover pages via search engines; tend to focus on top results • [Figure: ranked result list, labeled “user attention”, “entrenched pages”, and “new unpopular pages”]
Outline • Problem introduction • Key idea: Mitigate entrenchment by introducing randomness into ranking • Randomized Rank Promotion Scheme • Model of ranking and popularity evolution • Evaluation • Summary
Alternative Approaches to Counteract the Entrenchment Effect • Weight links to young pages more • [Baeza-Yates et al. SPIRE ’02] • Proposed an age-based variant of PageRank • Extrapolate quality based on increase in popularity • [Cho et al. SIGMOD ’05] • Proposed an estimate of quality based on the derivative of popularity
Our Approach: Randomized Rank Promotion • Select random (young) pages to promote to good rank positions • Rank position to promote to is chosen at random • [Figure: ranked list of positions 1 to 501, with the page at position 500 promoted to position 2]
Our Approach: Randomized Rank Promotion • Consequence: Users visit promoted pages; improves ability to estimate quality via popularity • Compared with previous approaches: • Does not rely on temporal measurements (+) • Sub-optimal (-)
Exploration/Exploitation Tradeoff • Exploit known high-quality pages by assigning good rank positions • Explore quality of new pages by promoting them in rank • Existing search engines only exploit (to our knowledge)
Possible Objectives for Rank Promotion • Fairness • Give each page an equal chance to become popular • Incentive for search engines to be fair? • Quality (our choice) • Maximize quality of search results seen by users (in aggregate) • Quality of page p: extent to which users “like” p • Q(p) ∈ [0,1]
Quality-Per-Click Metric (QPC) • V(p,t): number of visits made to page p at time t through the search engine • QPC: average quality of pages viewed by users, amortized over time
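A minimal sketch of how QPC could be computed over a finite window, assuming it is the visit-weighted average of Q(p) across all pages and time steps. The slide only gives the verbal definition, so the function name `quality_per_click`, the input layout (one dict of visit counts per time step), and the finite-window approximation are all illustrative assumptions.

```python
from typing import Mapping, Sequence


def quality_per_click(
    visits: Sequence[Mapping[str, float]],
    quality: Mapping[str, float],
) -> float:
    """Visit-weighted average quality over a finite window of time steps.

    visits[t][p] plays the role of V(p,t); quality[p] plays the role of Q(p).
    """
    total_visits = 0.0
    weighted_quality = 0.0
    for step in visits:
        for page, count in step.items():
            total_visits += count
            weighted_quality += count * quality[page]
    if total_visits == 0:
        return 0.0  # assumption: report 0 when nobody clicked anything
    return weighted_quality / total_visits


# Hypothetical data: two pages observed over two time steps.
visits = [{"a": 8, "b": 2}, {"a": 6, "b": 4}]
quality = {"a": 0.5, "b": 1.0}
qpc = quality_per_click(visits, quality)
```

In this toy data most clicks go to the lower-quality page "a", so QPC sits closer to 0.5 than to 1.0; shifting visits toward "b" would raise it.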
Outline • Problem introduction • Key idea: Mitigate entrenchment by introducing randomness into ranking • Randomized Rank Promotion Scheme • Model of ranking and popularity evolution • Evaluation • Summary
Desiderata for Randomized Rank Promotion • Want ability to: • Control exploration/exploitation tradeoff • “Select” certain pages as candidates for promotion • “Protect” certain pages from demotion
Randomized Rank Promotion Scheme • Page set W is split into a promotion pool Wm and a remainder W-Wm • Promotion pool Wm: random ordering gives list Lm • Remainder W-Wm: ordering by popularity gives list Ld
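Randomized Rank Promotion Scheme • Merge the promotion list Lm and the remainder list Ld into one ranking • The first k-1 positions are taken from Ld • Each later position is taken from Lm with probability r, otherwise from Ld (probability 1-r) • [Figure: example merge with k = 3, r = 0.5]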
Parameters • Promotion pool (Wm) • Uniform rank promotion: give an equal chance to each page • Selective rank promotion: exclusively target zero-awareness pages • Start rank (k) • rank to start randomization from • Degree of randomization (r) • controls the tradeoff between exploration and exploitation
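The scheme and its three parameters can be sketched as follows. This is one reading of the slides, not the authors' implementation: the first k-1 positions are taken from the popularity-ordered remainder, and each later position is filled from the shuffled promotion list with probability r, otherwise from the remainder. The function name `promote_ranking`, the fallback when one list runs out, and the way the pool is passed in are assumptions made for illustration; choosing the pool (uniform or selective) is left to the caller.

```python
import random
from typing import Dict, List, Optional, Set


def promote_ranking(
    popularity: Dict[str, float],
    pool: Set[str],
    k: int,
    r: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Merge a shuffled promotion list (Lm) into a popularity-ordered list (Ld).

    Assumes k >= 1 and 0 <= r <= 1.
    """
    if rng is None:
        rng = random.Random()

    # Lm: promotion pool in random order.
    promoted = [page for page in popularity if page in pool]
    rng.shuffle(promoted)

    # Ld: everything else, most popular first.
    remainder = sorted(
        (page for page in popularity if page not in pool),
        key=lambda page: popularity[page],
        reverse=True,
    )

    # Positions 1 .. k-1 are protected: always taken from Ld.
    result = remainder[: k - 1]
    rest = remainder[k - 1:]

    i = j = 0
    while i < len(promoted) or j < len(rest):
        if i >= len(promoted):
            take_promoted = False      # Lm exhausted (assumed fallback)
        elif j >= len(rest):
            take_promoted = True       # Ld exhausted (assumed fallback)
        else:
            take_promoted = rng.random() < r
        if take_promoted:
            result.append(promoted[i])
            i += 1
        else:
            result.append(rest[j])
            j += 1
    return result


# Hypothetical example: two zero-popularity pages form the promotion pool.
pages = {"a": 0.9, "b": 0.7, "c": 0.5, "x": 0.0, "y": 0.0}
ranking = promote_ranking(pages, pool={"x", "y"}, k=3, r=0.5,
                          rng=random.Random(0))
```

With k = 3 the two most popular pages keep positions 1 and 2 regardless of r; setting r = 0 reduces the result to plain popularity ordering followed by the pool, which is the pure-exploitation end of the tradeoff.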
Tuning the Parameters • Objective: maximize quality-per-click (QPC) • Two ways to tune • Real-world experiment • Analytical modeling
Outline • Problem introduction • Key idea: Mitigate entrenchment by introducing randomness into ranking • Randomized Rank Promotion Scheme • Model of ranking and popularity evolution • Evaluation • Summary
Popularity Evolution Cycle • Popularity P(p,t) → Rank R(p,t), via FPR(P(p,t)) • Rank R(p,t) → Visit rate V(p,t), via FRV(R(p,t)) • Visit rate V(p,t) → Awareness A(p,t), via FVA(V(p,t)) • Awareness A(p,t) → Popularity P(p,t), via FAP(A(p,t))
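The cycle above can be sketched as a small deterministic simulation. Every concrete functional form here is an assumption chosen for illustration, since the slides only name the four mappings: popularity is taken as awareness times quality (FAP), rank is descending popularity order (FPR), visits fall off as a power law in rank (FRV), and awareness grows as if each visit came from a uniformly random user (FVA). Page creation and retirement are omitted, and the name `simulate_cycle` and its parameters are hypothetical.

```python
from typing import List


def simulate_cycle(
    quality: List[float],
    steps: int,
    users: float = 100.0,
    visits_per_step: float = 50.0,
) -> List[float]:
    """Iterate popularity -> rank -> visits -> awareness -> popularity."""
    n = len(quality)
    awareness = [0.0] * n

    # Assumed F_RV: share of visits at rank i is proportional to i ** -1.5.
    weights = [(i + 1) ** -1.5 for i in range(n)]
    total_weight = sum(weights)

    for _ in range(steps):
        # Assumed F_AP: popularity = awareness * quality.
        popularity = [a * q for a, q in zip(awareness, quality)]
        # F_PR: rank pages by descending popularity (ties keep input order).
        order = sorted(range(n), key=lambda p: popularity[p], reverse=True)
        for rank, page in enumerate(order):
            v = visits_per_step * weights[rank] / total_weight
            # Assumed F_VA: expected fraction of unaware users reached by
            # v visits, each from a uniformly random user.
            newly_aware = 1.0 - (1.0 - 1.0 / users) ** v
            awareness[page] += (1.0 - awareness[page]) * newly_aware

    return [a * q for a, q in zip(awareness, quality)]


# Hypothetical example: a mediocre page listed first, a better page second.
qualities = [0.4, 0.9]
early = simulate_cycle(qualities, steps=5)
late = simulate_cycle(qualities, steps=50)
```

Under these assumptions popularity can never exceed quality, and it approaches quality only as awareness approaches 1, which matches the earlier point that the two are strongly correlated when awareness is large.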
Deriving Popularity Evolution Curve • Next step: derive a formula for the popularity evolution curve (popularity P(p,t) versus time t) • Assumptions • Number of pages is constant • Pages are created and retired according to a Poisson process with a given rate parameter • Quality distribution of pages is stationary
Deriving Popularity Evolution Curve (Detail) • Steady-state analysis yields the popularity evolution curve [equation not preserved]
Use Popularity Evolution Model to Tune Parameters • Model of popularity evolution process (see paper) • Complex dynamic process • To study, we combine approximate analysis with simulation • Next step: use model to tune rank promotion scheme • Parameters: k, r and Wm • Objective: maximize QPC
Tuning: Promotion Pool (Wm) • [Plot comparing no promotion, uniform promotion, and selective promotion, with k=1 and r=0.2]
Tuning: k and r • k: start rank • r: degree of randomization
Tuning: k and r • Maximize QPC (quality-per-click) • Avoid excessive “junk” • Preserve #1 result for navigational searches
Model of the Web • Web = collection of multiple disjoint topic-specific communities (e.g., “Linux”, “Squash”, etc.) • A community is made up of a set of pages, interested users and related queries
Summary • Entrenchment effect hurts search result quality • Solution: randomized rank promotion • Model of Web evolution and QPC metric • Used to tune and evaluate randomized rank promotion • Results: • New high-quality pages become popular much faster • Aggregate search result quality significantly improved
THE END • Paper available at: www.cs.cmu.edu/~spandey