
Kevin C. Chang


Presentation Transcript


  1. Kevin C. Chang

  2. About the collaboration: Cazoodle. Coming next week: Vacation Rental Search.

  3. How do you greet people in your culture? What have you been searching lately?

  4. What have you been searching lately? • The university and areas of Kevin Chang? • The email of Marc Snir? • Customer service phone number of Amazon? • What profs are doing databases at UIUC? • The papers and presentations of SIGMOD 2007? • Due date of SIGMOD 2008? • Sale price of “Canon PowerShot A400”? • “Hamlet” books available at bookstores?

  5. Huge Supermarket! The Web is a Big Library.

  6. Queries can be anything, too! (Figure label: Search Engine)

  7. Are there certain “regularities” to exploit?

  8. Let’s try out…

  9. Survey 1: How likely does a query follow a pattern? 9 out of 10 samples share a pattern with others!

  10. Survey 2: How likely do queries in a domain follow patterns w.r.t. pre-specified attributes? Over 28,000 manually labeled queries: in some domains, more than 90% of queries are patterned.

  11. Survey 3: How many patterns are there? Hundreds of patterns are needed to cover 80% of queries.

  12. Simple concept: What is a Query Template? • (this paper) Sequence of keywords and attributes • #celebrity affairs • #category jobs in #location • #movie showtimes in #zipcode • … • (In general) Patterns that can be induced from queries • e.g., regular expressions.

  13. How would such templates be useful?

  14. We advocate Rich Query Interpretation. t = “#category jobs in #location” for Job; q = “accounting jobs in chicago”. By matching query q to template t: • 1) Intent Classifier: recognize intended domain. q ∈ Job • 2) Query Parser: recognize associated attributes. → #category = “accounting”, #location = “chicago”
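
The matching step on this slide can be illustrated with a minimal sketch. It assumes a template is compiled into a regular expression whose named groups are the attributes; the vocabulary values, the `compile_template` and `interpret` helpers, and the lower-casing of queries are illustrative assumptions, not the authors' implementation.

```python
import re

# Hypothetical toy vocabulary for a Job domain; the slides do not list actual values.
VOCAB = {
    "category": {"accounting", "marketing"},
    "location": {"chicago", "boston"},
}


def compile_template(template, vocab):
    """Turn a template such as '#category jobs in #location' into a regex."""
    parts = []
    for token in template.split():
        if token.startswith("#"):
            name = token[1:]
            # Try longer vocabulary values first so they win over shorter prefixes.
            values = sorted(vocab[name], key=len, reverse=True)
            alternatives = "|".join(re.escape(v) for v in values)
            parts.append(f"(?P<{name}>{alternatives})")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts))


def interpret(query, domain, template, vocab):
    """Return (domain, attributes) if the query instantiates the template, else None."""
    match = compile_template(template, vocab).fullmatch(query.strip().lower())
    if match is None:
        return None
    return domain, match.groupdict()


result = interpret("accounting jobs in chicago", "Job",
                   "#category jobs in #location", VOCAB)
print(result)
```

A successful match plays both roles named on the slide at once: the returned domain is the intent classification, and the named groups are the parsed attributes. A query that does not instantiate the template yields `None`.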

  15. Rich query interpretation is useful. Tailored responses by query patterns: • Finding results directly → No longer 10 blue links. • Ranking results → Relevant to attributes desired. • Dispatching verticals → Bring verticals into search. • Matching ads → More likely to click.

  16. Query: Finding flights

  17. Query: Finding movie showtimes

  18. Query: Finding weather

  19. But many more patterns can be leveraged!

  20. Now, how to systematically discover such templates?

  21. Problem: Query Template Discovery • Given: • Query log L • e.g., we use MSN query log 2006. • Domain schema D • e.g., (#category, #location, #title) with vocabulary. • Incomplete schema can be handled, too. • Seed knowledge (queries, sites, templates, or mix) • E.g., 5 queries; or 2 sites; or 2 templates. • Output: “Good” templates T* = {t1, t2, …} • t1 = #location jobs • t2 = #location #category positions • ……..

  22. Step 1: Define quality metrics.

  23. How to measure quality of templates? • Some templates are more “popular.” • “#city1 #city2”, “#make #model” • Some templates are more “accurate.” • “#city1 #city2 flights”, “#location #make used cars” → Precision and Recall (formulas not preserved in the transcript).
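
Since the slide's formulas did not survive, the following sketch falls back on the standard information-retrieval definitions of precision, recall, and F-measure, applied to the set of queries a template matches. This is an assumption about what the slide showed, and the toy queries and labels are made up for illustration.

```python
def template_quality(matched, in_domain):
    """Precision, recall and F-measure of one template.

    matched   -- set of queries that instantiate the template
    in_domain -- set of queries labeled as belonging to the target domain
    """
    hits = len(matched & in_domain)
    precision = hits / len(matched) if matched else 0.0
    recall = hits / len(in_domain) if in_domain else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure


# Hypothetical toy log: the queries considered flight-related.
flight_queries = {"chicago boston", "boston denver", "denver austin",
                  "chicago boston flights"}

# Hypothetical sets of queries instantiating each template.
matched_by = {
    "#city1 #city2": {"chicago boston", "boston denver", "denver austin",
                      "austin dallas", "dallas houston"},
    "#city1 #city2 flights": {"chicago boston flights"},
}

quality = {t: template_quality(m, flight_queries) for t, m in matched_by.items()}
for template, (p, r, f) in quality.items():
    print(f"{template}: P={p:.2f} R={r:.2f} F={f:.2f}")
```

In this toy data the "popular" template covers more in-domain queries (higher recall) but also matches unrelated ones, while the "accurate" template matches only in-domain queries (higher precision) but covers few of them.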

  24. Step 2: From seeds, infer templates with good quality.

  25. 1) Can P and R be “inferred” (or estimated)? • Probabilistic Recall (formula not preserved in the transcript) • Probabilistic Precision (formula not preserved in the transcript)

  26. 2) What relationships can we use to infer? From the log: QST (“Quest”) graph. • Queries Q: q1: jobs in chicago; q2: jobs in boston; q3: jobs in microsoft; q4: jobs in motorola; q5: marketing jobs in motorola; q6: 401k plans; q7: illinois employment statistics • Sites S: s1: monster.com; s2: motorola.com; s3: us401k.com • Templates T: t1: jobs in #location; t2: jobs in #company; t3: #category jobs in #company; t4: #location employment statistics (Figure: graph with weighted edges connecting queries to sites and templates; the individual edge weights are not recoverable from the transcript.)
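
A minimal sketch of how such a graph could be assembled from the slide's example queries, sites, and templates. It assumes that query-site edges come from clicks and query-template edges from instantiation; the click counts, the single-word vocabulary, and the helper names are hypothetical, since the slide's edge weights could not be recovered.

```python
from collections import defaultdict

QUERIES = {
    "q1": "jobs in chicago", "q2": "jobs in boston", "q3": "jobs in microsoft",
    "q4": "jobs in motorola", "q5": "marketing jobs in motorola",
    "q6": "401k plans", "q7": "illinois employment statistics",
}
TEMPLATES = {
    "t1": "jobs in #location", "t2": "jobs in #company",
    "t3": "#category jobs in #company", "t4": "#location employment statistics",
}
# Hypothetical single-word vocabulary covering the example queries.
VOCAB = {
    "location": {"chicago", "boston", "illinois"},
    "company": {"microsoft", "motorola"},
    "category": {"marketing"},
}
# Hypothetical (query, site) click counts; NOT the weights from the slide.
CLICKS = {
    ("q1", "monster.com"): 10, ("q2", "monster.com"): 5,
    ("q4", "motorola.com"): 4, ("q5", "motorola.com"): 2,
    ("q5", "monster.com"): 1, ("q6", "us401k.com"): 12,
}


def instantiates(query, template, vocab):
    """Token-by-token check that a query is an instance of a template."""
    q_tokens, t_tokens = query.split(), template.split()
    if len(q_tokens) != len(t_tokens):
        return False
    for q_tok, t_tok in zip(q_tokens, t_tokens):
        if t_tok.startswith("#"):
            if q_tok not in vocab[t_tok[1:]]:
                return False
        elif q_tok != t_tok:
            return False
    return True


def build_graph(queries, templates, vocab, clicks):
    """Undirected weighted graph stored as node -> {neighbor: weight}."""
    graph = defaultdict(dict)
    for (qid, site), count in clicks.items():
        graph[qid][site] = count
        graph[site][qid] = count
    for qid, text in queries.items():
        for tid, template in templates.items():
            if instantiates(text, template, vocab):
                graph[qid][tid] = 1
                graph[tid][qid] = 1
    return graph


graph = build_graph(QUERIES, TEMPLATES, VOCAB, CLICKS)
for tid in sorted(TEMPLATES):
    print(tid, "<-", sorted(graph[tid]))
```

Queries sit in the middle of the graph: a site and a template are related only through the queries that lead to the one and instantiate the other.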

  27. 3) How to infer on this graph? Duality of Random Walk: When we walk back and forth, we are inferring precision and recall, respectively. • R(t) is forward random walk from seeds. • P(t) is backward random walk to seeds.

  28. Recall is a forward random walk from seeds. → Recall is just like (personalized) PageRank. (Figure labels: D, R0(x), x, Iq, Iqt, q, t.)

  29. Precision is a backward random walk to seeds. → Precision is harmonic energy minimization. (Figure labels: D, P0(x), x, It, Iqt, q, t.)
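
To make the PageRank analogy concrete, here is a generic personalized PageRank by power iteration on a small weighted graph. It is a sketch of the textbook algorithm, not the authors' exact recall estimator; the restart probability, the iteration count, and the toy graph are assumptions, and every node is assumed to have at least one neighbor.

```python
def personalized_pagerank(graph, seeds, restart=0.15, iterations=50):
    """Random walk with restart to the seed nodes.

    graph -- node -> {neighbor: weight}; every neighbor must also be a key
    seeds -- set of nodes the walk restarts from
    """
    nodes = list(graph)
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(seed_mass)
    for _ in range(iterations):
        # With probability `restart`, jump back to a seed.
        nxt = {n: restart * seed_mass[n] for n in nodes}
        # Otherwise follow an edge chosen in proportion to its weight.
        for node, neighbors in graph.items():
            total = sum(neighbors.values())
            for neighbor, weight in neighbors.items():
                nxt[neighbor] += (1 - restart) * score[node] * weight / total
        score = nxt
    return score


# Hypothetical toy graph: two job queries share a template and a site,
# while an unrelated query and its site form a separate component.
toy_graph = {
    "q1": {"t1": 1, "s1": 10},
    "q2": {"t1": 1, "s1": 5},
    "q6": {"s3": 12},
    "t1": {"q1": 1, "q2": 1},
    "s1": {"q1": 10, "q2": 5},
    "s3": {"q6": 12},
}

scores = personalized_pagerank(toy_graph, seeds={"q1"})
for node in sorted(scores, key=scores.get, reverse=True):
    print(f"{node}: {scores[node]:.3f}")
```

Starting from the seed query q1, score flows forward to the template t1 and on to q2, which instantiates the same template, while the disconnected q6 and s3 receive nothing.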

  30. Experimental results • Quest is effective in finding templates by inferred P and R, achieving 90% on actual F-measures. • Top results:

  31. And they did the real work… Ganesh Agarwal, Govind Kabra. Thank You!
