
Sandeep Pandey 1 , Sourashis Roy 2 , Christopher Olston 1 , Junghoo Cho 2 , Soumen Chakrabarti 3

Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results. Sandeep Pandey 1, Sourashis Roy 2, Christopher Olston 1, Junghoo Cho 2, Soumen Chakrabarti 3. 1 Carnegie Mellon, 2 UCLA, 3 IIT Bombay.



Presentation Transcript


  1. Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results • Sandeep Pandey 1, Sourashis Roy 2, Christopher Olston 1, Junghoo Cho 2, Soumen Chakrabarti 3 • 1 Carnegie Mellon, 2 UCLA, 3 IIT Bombay

  2. Popularity as a Surrogate for Quality • Search engines want to measure the “quality” of pages • Quality is hard to define and measure • Various “popularity” measures are used in ranking • e.g., in-links, PageRank, user traffic

  3. Relationship Between Popularity and Quality • Popularity: depends on the number of users who “like” a page • relies on both quality and awareness of the page • Popularity is different from quality • But strongly correlated when awareness is large [Figure labels: users aware of page p; users who like page p]
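
One compact way to write the relationship on this slide is sketched below. The slide itself gives no formula, so the product form is an assumption: A(p,t) is taken to be the fraction of users aware of page p at time t, and Q(p) in [0,1] the fraction of aware users who like it.

```latex
P(p,t) = A(p,t) \cdot Q(p)
```

Under this assumption, P(p,t) approaches Q(p) as A(p,t) approaches 1, which is consistent with the last bullet: popularity tracks quality closely only once awareness is large.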

  4. Problem • Popularity/quality correlation is weak for young pages • Even if of high quality, a young page may not (yet) be popular due to lack of user awareness • Plus, the process of gaining popularity is inhibited by the “entrenchment effect” • [Cho et al. WWW’04], [Chakrabarti et al. SODA’05], [Mowshowitz et al. Communication’02] and many others

  5. Entrenchment Effect • Search engines show entrenched (already-popular) pages at the top • Users discover pages via search engines; tend to focus on top results [Figure: ranked result list; user attention concentrates on the entrenched pages at the top, while new unpopular pages sit at the bottom]

  6. Outline • Problem introduction • Key idea: Mitigate entrenchment by introducing randomness into ranking • Randomized Rank Promotion Scheme • Model of ranking and popularity evolution • Evaluation • Summary

  7. Alternative Approaches to Counteract Entrenchment Effect • Weight links to young pages more • [Baeza-Yates et al. SPIRE ’02] • Proposed an age-based variant of PageRank • Extrapolate quality based on increase in popularity • [Cho et al. SIGMOD ’05] • Proposed an estimate of quality based on the derivative of popularity

  8. Our Approach: Randomized Rank Promotion • Select random (young) pages to promote to good rank positions • Rank position to promote to is chosen at random [Figure: result list before and after promotion; a page from rank 500 is moved near the top]

  9. Our Approach: Randomized Rank Promotion • Consequence: Users visit promoted pages; improves ability to estimate quality via popularity • Compared with previous approaches: • Does not rely on temporal measurements (+) • Sub-optimal (-)

  10. Exploration/Exploitation Tradeoff • Exploration/Exploitation tradeoff • exploit known high-quality pages by assigning good rank positions • explore quality of new pages by promoting them in rank • Existing search engines only exploit (to our knowledge)

  11. Possible Objectives for Rank Promotion • Fairness • Give each page an equal chance to become popular • Incentive for search engines to be fair? • Quality (our choice) • Maximize quality of search results seen by users (in aggregate) • Quality of page p: extent to which users “like” p • Q(p) ∈ [0,1]

  12. Quality-Per-Click Metric (QPC) • V(p,t): number of visits made to page p at time t through the search engine • QPC: average quality of pages viewed by users, amortized over time
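
The slide's formula for QPC is not preserved in this transcript. The sketch below shows one reading of the verbal definition, assuming a finite time horizon and a visit-weighted average: the sum of V(p,t) * Q(p) over all pages and time steps, divided by the total number of visits. The function name and the nested-list layout are illustrative choices, not taken from the slides.

```python
from typing import Sequence


def quality_per_click(visits: Sequence[Sequence[float]],
                      quality: Sequence[float]) -> float:
    """Visit-weighted average quality over a finite horizon (assumed form).

    visits[t][p] plays the role of V(p,t); quality[p] plays the role of Q(p).
    """
    total_quality = 0.0
    total_visits = 0.0
    for visits_at_t in visits:
        for v, q in zip(visits_at_t, quality):
            total_quality += v * q
            total_visits += v
    if total_visits == 0:
        raise ValueError("no visits recorded")
    return total_quality / total_visits


# Two pages, two time steps: page 0 has Q = 0.5, page 1 has Q = 1.0.
visits = [[8, 2], [6, 4]]
quality = [0.5, 1.0]
qpc = quality_per_click(visits, quality)  # (4 + 2 + 3 + 4) / 20 = 0.65
```

Shifting visits toward the higher-quality page raises the value, which is the behavior the tuning slides aim to maximize.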

  13. Outline • Problem introduction • Key idea: Mitigate entrenchment by introducing randomness into ranking • Randomized Rank Promotion Scheme • Model of ranking and popularity evolution • Evaluation • Summary

  14. Desiderata for Randomized Rank Promotion • Want ability to: • Control exploration/exploitation tradeoff • “Select” certain pages as candidates for promotion • “Protect” certain pages from demotion [Figure: result list before and after promotion, as on slide 8]

  15. Randomized Rank Promotion Scheme • The set of pages W is split into a promotion pool Wm and a remainder W-Wm • Promotion pool Wm: random ordering, giving list Lm • Remainder W-Wm: ordered by popularity, giving list Ld
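
The split on this slide can be sketched as follows. This is a minimal illustration, assuming popularity is available as a page-to-score mapping and that the promotion pool Wm has already been chosen; `build_lists` and its arguments are hypothetical names, not from the slides.

```python
import random
from typing import Dict, List, Optional, Set, Tuple


def build_lists(popularity: Dict[str, float],
                pool: Set[str],
                rng: Optional[random.Random] = None) -> Tuple[List[str], List[str]]:
    """Return (l_m, l_d): the pool in random order, the rest by popularity."""
    if rng is None:
        rng = random.Random()
    # Lm: pages in the promotion pool Wm, shuffled uniformly at random.
    # Sorting first makes the result reproducible for a seeded rng.
    l_m = sorted(page for page in popularity if page in pool)
    rng.shuffle(l_m)
    # Ld: remaining pages W - Wm, most popular first.
    rest = [page for page in popularity if page not in pool]
    l_d = sorted(rest, key=lambda page: popularity[page], reverse=True)
    return l_m, l_d


popularity = {"a": 0.9, "b": 0.4, "c": 0.0, "d": 0.7, "e": 0.0}
l_m, l_d = build_lists(popularity, pool={"c", "e"}, rng=random.Random(0))
```

Here `l_d` is `["a", "d", "b"]`, and `l_m` holds "c" and "e" in an order that depends on the random generator.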

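  16. Randomized Rank Promotion Scheme • Merge the promotion list Lm and the remainder list Ld into one result list • The first k-1 positions are taken from Ld • Each later position is filled from Lm with probability r, or from Ld with probability 1-r • Example shown: k = 3, r = 0.5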
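
A minimal sketch of the merge with start rank k and degree of randomization r is given below. The handling of an exhausted list (append whatever is left of the other list) is an assumption, since the slide does not cover that case; the function name is hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def randomized_merge(l_d: Sequence[T], l_m: Sequence[T], k: int, r: float,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Merge remainder list l_d and promotion list l_m into one ranking."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    if rng is None:
        rng = random.Random()
    # Positions 1 .. k-1 are protected: they always come from l_d.
    result = list(l_d[:k - 1])
    remainder = list(l_d[k - 1:])
    promoted = list(l_m)
    i = j = 0
    # From position k on, flip a biased coin for each slot.
    while i < len(remainder) and j < len(promoted):
        if rng.random() < r:
            result.append(promoted[j])
            j += 1
        else:
            result.append(remainder[i])
            i += 1
    # Assumed behavior: once one list runs out, append the rest of the other.
    result.extend(remainder[i:])
    result.extend(promoted[j:])
    return result


ranking = randomized_merge([1, 2, 3, 4], ["x", "y"], k=3, r=0.5,
                           rng=random.Random(42))
```

With r = 0 the result is plain popularity order followed by the pool, and with r = 1 every pool page is placed directly after the k-1 protected results. Setting k = 2 keeps the top result fixed, which matches the goal of preserving the #1 result for navigational searches.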

  17. Parameters • Promotion pool (Wm) • Uniform rank promotion: give an equal chance to each page • Selective rank promotion: exclusively target zero-awareness pages • Start rank (k) • rank to start randomization from • Degree of randomization (r) • controls the tradeoff between exploration and exploitation
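
The two promotion-pool rules can be sketched as below. The slide only says that uniform promotion gives each page an equal chance, so the per-page inclusion probability `include_prob` is a hypothetical parameter introduced for illustration; awareness is assumed to be available as a page-to-fraction mapping.

```python
import random
from typing import Dict, Iterable, Optional, Set


def uniform_pool(pages: Iterable[str], include_prob: float,
                 rng: Optional[random.Random] = None) -> Set[str]:
    """Uniform rule (assumed form): each page joins Wm with equal probability."""
    if rng is None:
        rng = random.Random()
    return {page for page in pages if rng.random() < include_prob}


def selective_pool(awareness: Dict[str, float]) -> Set[str]:
    """Selective rule: Wm holds exactly the pages with zero awareness."""
    return {page for page, a in awareness.items() if a == 0.0}


awareness = {"a": 0.8, "b": 0.0, "c": 0.3, "d": 0.0}
pool_selective = selective_pool(awareness)   # {"b", "d"}
pool_uniform = uniform_pool(awareness, include_prob=0.5,
                            rng=random.Random(1))
```

The selective rule leaves every page that already has some awareness outside the pool, so established pages can only be displaced by pages nobody has seen yet.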

  18. Tuning the Parameters • Objective: maximize quality-per-click (QPC) • Two ways to tune • Real-world experiment • Analytical modeling

  19. Outline • Problem introduction • Key idea: Mitigate entrenchment by introducing randomness into ranking • Randomized Rank Promotion Scheme • Model of ranking and popularity evolution • Evaluation • Summary

  20. Popularity Evolution Cycle • Popularity P(p,t) • Awareness A(p,t) • Rank R(p,t) • Visit rate V(p,t)

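  21. Popularity Evolution Cycle • Popularity P(p,t) → Rank R(p,t), via FPR(P(p,t)) • Rank R(p,t) → Visit rate V(p,t), via FRV(R(p,t)) • Visit rate V(p,t) → Awareness A(p,t), via FVA(V(p,t)) • Awareness A(p,t) → Popularity P(p,t), via FAP(A(p,t))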
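
A toy simulation of the cycle is sketched below. The slides name the four functions but this transcript does not give their forms, so every form here is an assumption made for illustration: rank is obtained by sorting on popularity, visits fall off as a power law in rank (exponent 1.5), each visit comes from a user drawn uniformly from `num_users`, and popularity is awareness times quality. No rank promotion is applied.

```python
from typing import List


def rank_from_popularity(popularity: List[float]) -> List[int]:
    """F_PR (assumed): rank 1 goes to the most popular page; ties by index."""
    order = sorted(range(len(popularity)), key=lambda p: -popularity[p])
    ranks = [0] * len(popularity)
    for position, page in enumerate(order, start=1):
        ranks[page] = position
    return ranks


def visits_from_rank(ranks: List[int], total_visits: float,
                     exponent: float = 1.5) -> List[float]:
    """F_RV (assumed): visit share proportional to rank ** -exponent."""
    weights = [rank ** -exponent for rank in ranks]
    norm = sum(weights)
    return [total_visits * w / norm for w in weights]


def awareness_after_visits(awareness: List[float], visits: List[float],
                           num_users: int) -> List[float]:
    """F_VA (assumed): each visit is by a uniformly random user."""
    return [1 - (1 - a) * (1 - 1 / num_users) ** v
            for a, v in zip(awareness, visits)]


def popularity_from_awareness(awareness: List[float],
                              quality: List[float]) -> List[float]:
    """F_AP (assumed): popularity = awareness * quality."""
    return [a * q for a, q in zip(awareness, quality)]


def simulate(quality: List[float], awareness: List[float], steps: int,
             total_visits: float = 100.0,
             num_users: int = 1000) -> List[List[float]]:
    """Run the cycle and return the popularity vector at each step."""
    awareness = list(awareness)
    history = []
    for _ in range(steps):
        popularity = popularity_from_awareness(awareness, quality)
        ranks = rank_from_popularity(popularity)
        visits = visits_from_rank(ranks, total_visits)
        awareness = awareness_after_visits(awareness, visits, num_users)
        history.append(popularity)
    return history


# Page 0: entrenched, mediocre quality. Page 1: new, high quality, unknown.
history = simulate(quality=[0.4, 0.9], awareness=[0.5, 0.0], steps=50)
```

Plotting `history` per page gives a popularity-versus-time curve for each page; in this toy setup the new page gains popularity only slowly while it sits below the entrenched one.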

  22. Deriving Popularity Evolution Curve • Next step: derive formula for popularity evolution curve • Assumptions • Number of pages constant • Pages are created and retired according to a Poisson process with a given rate parameter • Quality distribution of pages is stationary [Figure: popularity P(p,t) plotted against time (t)]

  23. Deriving Popularity Evolution Curve (detail) • Doing the steady state analysis, we get [Equation not included in this transcript; see paper]

  24. Use Popularity Evolution Model to Tune Parameters • Model of popularity evolution process (see paper) • Complex dynamic process • To study, we combine approximate analysis with simulation • Next step: use model to tune rank promotion scheme • Parameters: k, r and Wm • Objective: maximize QPC

  25. Tuning: Promotion Pool (Wm) • Schemes compared: no promotion, uniform promotion, selective promotion • Setting: k=1 and r=0.2 [Figure: plot comparing the three schemes]

  26. Tuning: k and r • k: start rank • r: degree of randomization

  27. Tuning: k and r • Maximize QPC (quality-per-click) • Avoid excessive “junk” • Preserve #1 result for navigational searches

  28. Model of the Web • Web = collection of multiple disjoint topic-specific communities (e.g., “Linux”, “Squash”, etc.) • A community is made up of a set of pages, interested users and related queries [Figure labels: Linux, Squash]

  29. Robustness Across Different Web Communities

  30. Summary • Entrenchment effect hurts search result quality • Solution: Randomized rank promotion • Model of Web evolution and QPC metric • Used to tune & evaluate randomized rank promotion • Results: • New high-quality pages become popular much faster • Aggregate search result quality significantly improved

  31. THE END • Paper available at: www.cs.cmu.edu/~spandey
