Tagging with Queries: How and Why?

Ioannis Antonellis (antonell@cs.stanford.edu), Hector Garcia-Molina (hector@cs.stanford.edu), Jawed Karim (jawed@cs.stanford.edu)


Presentation Transcript


  1. Tagging with Queries: How and Why?
     Ioannis Antonellis (antonell@cs.stanford.edu), Hector Garcia-Molina (hector@cs.stanford.edu), Jawed Karim (jawed@cs.stanford.edu)

  2. Content on the Web
     [Figure labels: Back Link Text, Search Queries, Page Text, Forward Link Text; example terms: Cnn, Obama, Critics, news]
     Stanford Infolab

  3. How?
     • Basic observation: the HTTP referrer field contains the search query

  4. How?

  5. How?
     • Basic observation: the HTTP referrer field contains the search query
       1) Extract queries from the web access log

  6. Web Access Log
     a997c1950718d75c03f22ca8715e50b3 [28/Feb/2007:23:45:47 -0800] /group/svsa/cgi-bin/www/officers.php http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=HPIB,HPIB:2006-47,HPIB:en&q=sexy+random+facts
     a64344ffd6638d0f6fb2a0284f98b28b [28/Feb/2007:23:45:49 -0800] /group/King/ "http://www.google.com.au/search?hl=en&q=Martin+Luther+King&meta="
     413fa663474b2288c1661882e7e62aea [28/Feb/2007:23:46:02 -0800] /group/pandegroup/folding/results.html "http://www.google.com/search?sourceid=navclient-menuext&ie=UTF-8&q=RESULTS"
     3d2edd4dfa7778da92875ee67a319433 [28/Feb/2007:23:46:03 -0800] /group/vpge/sgsi/entrepreneurship/ "http://www.google.com/search?hl=en&q=summer+institute+of+entrepreneurship"
     ac49793239a6c490023e460fd4863a48 [28/Feb/2007:23:46:06 -0800] / "http://www.google.com/search?sourceid=navclient&hl=ko&ie=UTF-8&rlz=1T4SUNA_ko___KR209&q=stanford"
     1c9893680
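
The extraction step (pulling the search query out of the referrer of each log entry) could look roughly like the Python sketch below. It assumes each log line has the four fields visible on the slide (an anonymized client ID, a bracketed timestamp, the requested path, and an optionally quoted referrer URL) and that the search engine passes the query in a `q` parameter, as in the Google referrers shown. The names `LOG_LINE` and `extract_query_tag` are hypothetical and do not come from the presentation.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed layout: <client-id> [<timestamp>] <path> <referrer>,
# where the referrer may or may not be wrapped in double quotes.
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \[(?P<time>[^\]]+)\] (?P<path>\S+) "?(?P<referrer>[^"\s]+)"?$'
)

def extract_query_tag(line):
    """Return (requested_path, search_query), or None if no query is found."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None  # truncated or malformed entry
    # parse_qs decodes '+' and percent-escapes in the query string.
    params = parse_qs(urlparse(match.group("referrer")).query)
    queries = params.get("q")  # assumption: the query lives in the 'q' parameter
    if not queries:
        return None
    return match.group("path"), queries[0]

line = ('a64344ffd6638d0f6fb2a0284f98b28b [28/Feb/2007:23:45:49 -0800] '
        '/group/King/ "http://www.google.com.au/search?hl=en&q=Martin+Luther+King&meta="')
print(extract_query_tag(line))  # ('/group/King/', 'Martin Luther King')
```

Other search engines use different parameter names, so a fuller version would need a per-engine mapping; that is left out here.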

  7. How?
     • Basic observation: the HTTP referrer field contains the search query
       1) Extract queries from the web access log
       2) Embed JavaScript code in web pages that captures search queries

  8. Embeddable code

  9. How?
     • Basic observation: the HTTP referrer field contains the search query
       1) Extract queries from the web access log
       2) Embed JavaScript code in web pages and capture search queries
     • Convince the server administrator/page owner

  11. Query tags

  12. Information value of Query Tags
      [Figure label: WebBase]
      • Datasets:
        • Stanford Query Logs: 360,000 URLs, 900,000 query tags
        • Delicious@Stanford: 3,000 URLs, 5,500 tags

  13. Experiments - Summary
      • URL coverage
      • Query vs Delicious tags
      • Query/Delicious tags vs page text

  14. URL coverage
      • Query logs provide tags for ~110 times more URLs than Delicious
      • 13% of Delicious URLs (380 URLs) are tagged only by Delicious
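
The two coverage figures can be read as simple set operations over the URLs tagged by each source. The Python sketch below illustrates that reading; the function name `coverage` and the toy URL sets are hypothetical and are not data from the study. As a sanity check on the slide itself, 380 out of the roughly 3,000 Delicious URLs in the dataset is about 12.7%, which is consistent with the rounded 13%.

```python
def coverage(query_urls, delicious_urls):
    """Compare two sets of tagged URLs."""
    # URLs that have Delicious tags but never appear with a query tag.
    only_delicious = delicious_urls - query_urls
    return {
        # How many times more URLs the query logs cover.
        "url_ratio": len(query_urls) / len(delicious_urls),
        # Share of Delicious URLs with no query tags at all.
        "delicious_only_fraction": len(only_delicious) / len(delicious_urls),
    }

# Hypothetical toy data, not taken from the study.
query_urls = {"/a", "/b", "/c", "/d", "/e", "/f"}
delicious_urls = {"/a", "/x"}
stats = coverage(query_urls, delicious_urls)
print(stats)
```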

  15. Query Tags
      • Query logs provide 42 query tags per URL on average

  16. Delicious Tags
      • Delicious provides 3 tags per URL on average

  17. Tags for common URLs
      • Query logs provide 250 query tags per URL on average for common URLs
      • Delicious provides 5 tags per URL on average for common URLs
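
A per-URL average of this kind, overall and restricted to the URLs common to both sources, could be computed roughly as in the Python sketch below. It assumes a tag is a distinct string attached to a URL; whether the slides count distinct tags or every occurrence is not stated. The helper names and the toy pairs are hypothetical.

```python
from collections import defaultdict

def tags_per_url(pairs):
    """Group (url, tag) pairs into a dict mapping url -> set of distinct tags."""
    tags = defaultdict(set)
    for url, tag in pairs:
        tags[url].add(tag)
    return dict(tags)

def average_tags(tags, urls=None):
    """Mean number of distinct tags per URL, optionally restricted to `urls`."""
    selected = tags.keys() if urls is None else [u for u in urls if u in tags]
    counts = [len(tags[u]) for u in selected]
    return sum(counts) / len(counts) if counts else 0.0

# Hypothetical toy data, not taken from the study.
query_pairs = [("/a", "stanford"), ("/a", "stanford university"),
               ("/a", "stanford"), ("/b", "folding results")]
delicious_pairs = [("/a", "university"), ("/c", "reference")]

query_tags = tags_per_url(query_pairs)
delicious_tags = tags_per_url(delicious_pairs)
common = query_tags.keys() & delicious_tags.keys()  # URLs tagged by both sources

print(average_tags(query_tags))              # average over all query-tagged URLs
print(average_tags(query_tags, common))      # average over common URLs only
print(average_tags(delicious_tags, common))
```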

  18. Query Tags vs Page Text
      • For every URL, 1 out of 3 query tags is not present in the page text

  19. Delicious Tags vs Page Text
      • For every URL, 1 out of 2 Delicious tags is not present in the page text

  20. Tags for common URLs
      • For common URLs, 1 out of 2 query/Delicious tags is not present in the page text
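
The comparison against page text amounts to asking, for each URL, what fraction of its tags never occurs in the page. The Python sketch below shows one possible way to measure that. The matching rule is an assumption: a tag counts as present only if it appears as a case-insensitive substring of the page text, and the presentation does not say which rule was actually used. The function name `missing_fraction` and the sample data are hypothetical.

```python
def missing_fraction(tags, page_text):
    """Fraction of tags that do not occur in the page text.

    Assumed rule: case-insensitive substring match of the whole tag.
    """
    if not tags:
        return 0.0
    text = page_text.lower()
    missing = [t for t in tags if t.lower() not in text]
    return len(missing) / len(tags)

# Hypothetical toy data, not taken from the study.
page_text = "Results of the Folding project at Stanford"
tags = ["folding results", "stanford", "protein simulation"]
print(missing_fraction(tags, page_text))
```

A word-level rule (a tag is present if each of its words appears somewhere on the page) would give lower fractions for multi-word queries, so the choice of rule matters when comparing against the reported numbers.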

  21. Conclusions
      Query tags:
      • Can be extracted in a distributed fashion
      • Are a promising new source of information
      • Can provide substantially many new tags for a large fraction of the Web

  22. Thank You! (DEMO) http://tags.stanford.edu

