
Quicklink Selection for Navigational Query Results

Deepayan Chakrabarti (deepay@yahoo-inc.com), Ravi Kumar (ravikuma@yahoo-inc.com), Kunal Punera (kpunera@yahoo-inc.com)


Presentation Transcript


  1. Quicklink Selection for Navigational Query Results • Deepayan Chakrabarti (deepay@yahoo-inc.com), Ravi Kumar (ravikuma@yahoo-inc.com), Kunal Punera (kpunera@yahoo-inc.com)

  2. What are quicklinks? [Figure labels: Result Website; Quicklinks]

  3. Quicklinks • Quicklinks = URLs within the search result website • Enable fast navigation to important parts of the website • Which URLs should be QLs? [Figure labels: Result Website; Quicklinks]

  4. Quicklink Selection • Some obvious strategies don’t work very well • Top clicked URLs in search engine • URL may have low relevance in the QL context: lib.utexas.edu/maps is popular for searches on “maps” but not for searches on “Univ. of Texas” • URL may be too specific: automobiles.honda.com/civic-hybrid/exterior-photos.aspx for honda.com • URL popularity may be time-sensitive: nytimes.com/election-guide/2008/ for nytimes.com

  5. Quicklink Selection • Some obvious strategies don’t work very well • Top clicked URLs in search engine • Top visited URLs in toolbar data • May not relate to search activity, e.g., for nytimes.com: • #3 is nytimes.com/mem/emailthis.html • #6 is nytimes.com/auth/login • #8 is nytimes.com/gst/regi.html

  6. Quicklink Selection • Some obvious strategies don’t work very well • Top clicked URLs in search engine • Top visited URLs in toolbar data • Top URLs from analysis of hyperlink graph • Ignores preferences of search users • Toolbar data is more representative • Heavily tagged URLs (e.g., del.icio.us/digg) • Low coverage: Too few websites

  7. Quicklink Selection • Need a combined approach (this paper) • Search logs • Toolbar data • Web-server logs • Website hyperlink graph • User tags

  8. Related Work • Sitemap generation [Perkowitz+/00] • Detection of hard-to-find URLs [Srikant+/01] • Improving website navigability [Doerr+/07] • Mining Web usage patterns [Buchner/99, Cadez+/03] • BrowseRank [Liu+/08] • Post-search browsing behavior [Bilenko+/08] • We focus on QLs in the context of search

  9. Outline • Motivation and Related Work • Problem Formulation • Proposed Solution • Experiments • Conclusions

  10. Problem Formulation • Which k URLs should be QLs? “The greatest good for the greatest number” • QLs save clicks • Maximize the total number of clicks saved using at most k QLs • But when exactly is a click “saved”?

  11. Problem Formulation • When does a QL get clicked by the user? [Figure: graph of click trails (toolbar data); labels: nasa.gov; Hubble telescope Photos; “Say we pick this node as a QL”]

  12. Problem Formulation [Figure: graph of click trails (toolbar data); labels: nasa.gov; Hubble telescope Photos; “Say we pick this node as a QL”] • Assumption: The user recognizes if SearchResult → QL → Destination

  13. Problem Formulation [Figure: graph of click trails (toolbar data) from nasa.gov; one node is picked as a QL; some trails through it are marked “saves 1 click each”] • Assumption: The user recognizes if SearchResult → QL → Destination

  14. Problem Formulation [Figure: graph of click trails (toolbar data) from nasa.gov; one node is picked as a QL; trails are marked “saves 1 click each”, “saves 2 clicks each”, or “saves 0”] • Total savings = 1*3 + 2*2 = 7 clicks • Assumption: The user recognizes if SearchResult → QL → Destination
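The click count on slide 14 can be reproduced with a small sketch. This is an illustration, not the authors' code: the trails and page names below are made up, and it assumes a trail is the ordered list of pages visited starting at the result website, so that a quicklink at position i on a trail saves i clicks.

```python
from typing import Sequence


def clicks_saved(trail: Sequence[str], quicklink: str) -> int:
    """Clicks saved on one trail if `quicklink` is shown on the results page.

    Assumption: trail[0] is the result website's landing page, so jumping
    straight to the page at position i skips i clicks. Trails that never
    visit the quicklink save nothing.
    """
    return trail.index(quicklink) if quicklink in trail else 0


# Hypothetical toolbar trails rooted at nasa.gov (page names are invented).
trails = [
    ["nasa.gov", "hubble", "photos"],
    ["nasa.gov", "hubble", "news"],
    ["nasa.gov", "hubble"],
    ["nasa.gov", "missions", "hubble", "photos"],
    ["nasa.gov", "missions", "hubble", "videos"],
    ["nasa.gov", "about"],
    ["nasa.gov", "missions", "mars"],
]

total = sum(clicks_saved(t, "hubble") for t in trails)
print(total)  # 3 trails save 1 click, 2 trails save 2 clicks: 1*3 + 2*2 = 7
```

Trails that do not pass through the chosen page contribute 0, matching the “saves 0” labels on the slide.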

  15. Problem Formulation • However… • Unknown pages might become QLs [Figure: lyrics.com with pages A, B, C, …, Z; these could become the “best” QLs]

  16. Problem Formulation • However… • Unknown pages might become QLs • Automatic-redirect pages might become QLs: • nytimes.com forces logging in • aaa.com forces zipcode entry • We need QLs that are “noticeable” in a search context

  17. Problem Formulation • How can we estimate noticeability? • Via Search click-logs • Noticeability of a URL u: • User notices a useful QL with probability α(u) [Formula labels: fraction of search clicks for u on website; tuning param (≈ 2)]

  18. Problem Formulation • Expected savings for trails from nasa.gov with two QLs (# trails × prob × # clicks): 2 × α1 × 2; 1 × α1 × 1; 2 × (1-α2)α1 × 1; 2 × α2 × 2; trails through neither QL1 nor QL2 save 0 • Total = 5α1 + 4α2 + 2(1-α2)α1 • Assumption: The user picks the best QL that he/she notices

  19. Problem Formulation • Expected savings for trails from nasa.gov with two QLs (# trails × prob × # clicks): 2 × α1 × 2; 1 × α1 × 1; 2 × (1-α2)α1 × 1; 2 × α2 × 2; trails through neither QL1 nor QL2 save 0 • Total = 5α1 + 4α2 + 2(1-α2)α1 • If only QL1 is perfectly noticeable (α1=1, α2=0): Total = 7 clicks (as if 1 QL only) • If both QLs are perfectly noticeable (α1=1, α2=1): Total = 9 clicks
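The two special cases on slide 19 can be checked numerically. The sketch below is a hedged illustration: the grouping of trails and the function name `expected_savings` are assumptions introduced here, and it encodes the stated assumption that the user picks the best QL he or she notices, with each QL noticed independently.

```python
from typing import Dict, List, Tuple

# Each entry: (number of identical trails, {quicklink: clicks it would save}).
# The numbers mirror the nasa.gov example; the grouping is an assumption.
TrailGroup = Tuple[int, Dict[str, int]]


def expected_savings(groups: List[TrailGroup], alpha: Dict[str, float]) -> float:
    """Expected clicks saved when QL q is noticed with probability alpha[q]."""
    total = 0.0
    for count, savings in groups:
        p_no_better_noticed = 1.0
        # Walk the quicklinks on this trail from most to least useful.
        for ql, saved in sorted(savings.items(), key=lambda kv: -kv[1]):
            total += count * saved * alpha[ql] * p_no_better_noticed
            p_no_better_noticed *= 1.0 - alpha[ql]
    return total


groups = [
    (2, {"QL1": 2}),            # 2 trails, QL1 saves 2 clicks
    (1, {"QL1": 1}),            # 1 trail, QL1 saves 1 click
    (2, {"QL1": 1, "QL2": 2}),  # 2 trails, QL2 is the better choice
]

print(expected_savings(groups, {"QL1": 1.0, "QL2": 0.0}))  # 7.0
print(expected_savings(groups, {"QL1": 1.0, "QL2": 1.0}))  # 9.0
```

The third group is where the (1-α2)α1 factor comes from: QL1 is only used on those trails when the more useful QL2 goes unnoticed.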

  20. Problem Formulation • Which k URLs should be QLs? • Maximize the expected number of clicks saved using at most k QLs • while incorporating “noticeability”

  21. Outline • Motivation and Related Work • Problem Formulation • Proposed Solution • Experiments • Conclusions

  22. Algorithms • Maximize expected number of saved clicks using k QLs → NP-hard • Theorem: This objective is non-decreasing submodular • Non-negative • Adding QLs never hurts • “Diminishing returns”: the marginal improvement from adding a URL u to a set S is at least its marginal improvement to any superset S’

  23. Algorithms • Greedy algorithm: Iteratively pick QLs that increase the number of saved clicks the most • Within a factor (1-1/e) of OPT [Nemhauser+/’78]
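For reference, the two properties named in the theorem can be written out as follows. The symbol f for the expected-savings objective is notation introduced here, not taken from the slides.

```latex
% Non-decreasing (monotone): adding QLs never hurts
f(S) \le f(S') \quad \text{for all } S \subseteq S'

% Submodular (diminishing returns)
f(S \cup \{u\}) - f(S) \;\ge\; f(S' \cup \{u\}) - f(S')
\quad \text{for all } S \subseteq S' \text{ and } u \notin S'
```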
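A minimal sketch of the greedy step, under stated assumptions: the trail groups, URLs, and α values below are invented for illustration, and `expected_savings` is a hypothetical helper that assumes the user picks the best noticed QL on each trail. It is not the authors' implementation.

```python
from typing import Dict, List, Set, Tuple

TrailGroup = Tuple[int, Dict[str, int]]  # (trail count, {url: clicks saved})


def expected_savings(groups: List[TrailGroup], alpha: Dict[str, float],
                     chosen: Set[str]) -> float:
    """Expected clicks saved if only the URLs in `chosen` are shown as QLs."""
    total = 0.0
    for count, savings in groups:
        p_unnoticed = 1.0
        usable = [(s, u) for u, s in savings.items() if u in chosen]
        for saved, url in sorted(usable, reverse=True):  # best QL first
            total += count * saved * alpha[url] * p_unnoticed
            p_unnoticed *= 1.0 - alpha[url]
    return total


def greedy_quicklinks(groups: List[TrailGroup], alpha: Dict[str, float],
                      k: int) -> List[str]:
    """Pick up to k URLs, each time adding the one with the largest gain."""
    candidates = {u for _, savings in groups for u in savings}
    chosen: List[str] = []
    for _ in range(k):
        current = expected_savings(groups, alpha, set(chosen))
        best_url, best_gain = None, 0.0
        for url in sorted(candidates - set(chosen)):
            gain = expected_savings(groups, alpha, set(chosen) | {url}) - current
            if gain > best_gain:
                best_url, best_gain = url, gain
        if best_url is None:  # no remaining URL improves the objective
            break
        chosen.append(best_url)
    return chosen


# Hypothetical data: URLs and noticeability values are made up.
groups = [
    (2, {"hubble": 2}),
    (1, {"hubble": 1}),
    (2, {"hubble": 1, "photos": 2}),
    (1, {"about": 1}),
]
alpha = {"hubble": 0.9, "photos": 0.5, "about": 0.8}

print(greedy_quicklinks(groups, alpha, 2))  # ['hubble', 'photos']
```

Each round re-evaluates the objective for every remaining candidate, which is simple but quadratic in the number of candidates; the (1-1/e) guarantee cited on the slide is what makes this simple loop a reasonable choice for a monotone submodular objective.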

  24. Algorithms • However… • Inhomogeneous results: QLs for ea.com are • fifa08.ea.com and battlefield.ea.com (two games made by EA) • 6 webpages deep inside thesim2.ea.com • Redundant results: QLs for senate.gov include • obama.senate.gov (parent URL makes the child URLs redundant) • obama.senate.gov/about • obama.senate.gov/contact • obama.senate.gov/votes

  25. Algorithms • Both can be specified as pairwise constraints on URLs allowed to belong to a QL set • Pairwise-constrained QL selection is NP-hard • Two-step process: • Heuristically find a large subset of trails that form a tree • Enforce constraints on tree • Dynamic program → optimal on tree

  26. Outline • Motivation and Related Work • Problem Formulation • Proposed Solution • Experiments • Conclusions

  27. Experiments • Baseline Methods • TopClicked: • URL score = # search clicks on URL • TopVisited: • URL score = # occurrences on toolbar trails • PageRank: • Build a weighted graph on URLs, where weight(i,j) = # trails using the i→j edge • URL score = PageRank on this graph

  28. Experiments • Live Traffic dataset • Computed CTRs on QLs currently displayed by Yahoo! (1043 website subset) • Measure: • Pick two equal-sized subsets of QLs • Use sum-of-scores and sum-of-CTRs to predict the better subset • Measure how often the predictions match
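The TopVisited and PageRank baselines can be sketched from toolbar trails alone (TopClicked would additionally need search click logs). The damping factor, iteration count, and uniform handling of pages with no outgoing trail edge are assumptions of this sketch; the slides do not specify them.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def top_visited(trails: List[List[str]]) -> Counter:
    """URL score = number of occurrences on toolbar trails."""
    return Counter(url for trail in trails for url in trail)


def trail_pagerank(trails: List[List[str]], damping: float = 0.85,
                   iterations: int = 50) -> Dict[str, float]:
    """PageRank on a graph where weight(i, j) = number of trails using i -> j."""
    weight = defaultdict(Counter)
    nodes = set()
    for trail in trails:
        nodes.update(trail)
        for src, dst in zip(trail, trail[1:]):
            weight[src][dst] += 1
    ordered = sorted(nodes)
    n = len(ordered)
    rank = {u: 1.0 / n for u in ordered}
    for _ in range(iterations):
        nxt = {u: (1.0 - damping) / n for u in ordered}
        for src in ordered:
            out = sum(weight[src].values())
            if out == 0:
                # Assumption: pages with no outgoing edge spread rank uniformly.
                for u in ordered:
                    nxt[u] += damping * rank[src] / n
            else:
                for dst, w in weight[src].items():
                    nxt[dst] += damping * rank[src] * w / out
        rank = nxt
    return rank


# Hypothetical trails; page names are invented.
trails = [["home", "a", "b"], ["home", "a"], ["home", "c"]]
visits = top_visited(trails)
ranks = trail_pagerank(trails)
print(visits.most_common(1))
print(sorted(ranks, key=ranks.get, reverse=True))
```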
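The measure on slide 28 can be sketched as below. How the subset pairs are drawn and how ties are handled is not stated on the slide, so the random sampling, the tie skipping, and the name `agreement_rate` are assumptions; the scores and CTRs are made-up numbers.

```python
import random
from typing import Dict


def agreement_rate(scores: Dict[str, float], ctrs: Dict[str, float],
                   size: int, n_pairs: int = 1000, seed: int = 0) -> float:
    """Fraction of subset pairs where sum-of-scores and sum-of-CTRs
    agree on which subset is better."""
    rng = random.Random(seed)
    urls = sorted(scores)
    agree = compared = 0
    for _ in range(n_pairs):
        a = rng.sample(urls, size)
        b = rng.sample(urls, size)
        score_diff = sum(scores[u] for u in a) - sum(scores[u] for u in b)
        ctr_diff = sum(ctrs[u] for u in a) - sum(ctrs[u] for u in b)
        if score_diff == 0 or ctr_diff == 0:
            continue  # assumption: ties carry no signal and are skipped
        compared += 1
        agree += (score_diff > 0) == (ctr_diff > 0)
    return agree / compared if compared else float("nan")


# Hypothetical per-QL scores from some method and observed CTRs.
scores = {"q1": 8.0, "q2": 4.0, "q3": 2.0, "q4": 1.0}
ctrs = {"q1": 0.30, "q2": 0.05, "q3": 0.10, "q4": 0.02}
print(agreement_rate(scores, ctrs, size=2))
```

A method whose scores rank subsets exactly as the live CTRs do would reach an agreement rate of 1.0.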

  29. Live Traffic Data Experiments [Figure: fraction of subset-pairs where predictions agree with live traffic, plotted against subset size] • Result: QL-ALG > TopVisited > PageRank > TopClicked

  30. Experiments • Tree-structured trails • Most dropped trails are very short • Tree-structured trails improve accuracy [Figure: distribution of dropped trails; x-axis: length of trail (1 to 10000); y-axis: number of trails dropped] [Figure: Live Traffic prediction quality comparison]

  31. Outline • Motivation and Related Work • Problem Formulation • Proposed Solution • Experiments • Conclusions

  32. Conclusions • Proposed a formulation for the QL selection problem • Both toolbar and search logs are used intuitively • Proposed two algorithms: • Greedy: (1-1/e)-optimal • Tree-structured: empirically better • Improvement of 22% over competing baselines
