Kevin C. Chang

About the collaboration: Cazoodle. Coming next week: Vacation Rental Search.

How do you greet people in your culture? What have you been searching for lately?

What have you been searching for lately?
• The university and areas of Kevin Chang?
• The email of Marc Snir?
• Customer service phone number of Amazon?
• What profs are doing databases at UIUC?
• The papers and presentations of SIGMOD 2007?
• Due date of SIGMOD 2008?
• Sale price of “Canon PowerShot A400”?
• “Hamlet” books available at bookstores?

Huge Supermarket! The Web is a Big Library.

Queries can be anything, too! Search Engine

Are there certain “regularities” to exploit?

Let’s try out…

Survey 1: How likely does a query follow a pattern? 9 out of 10 samples share a pattern with others!

Survey 2: How likely do queries in a domain follow patterns w.r.t. pre-specified attributes? Over 28,000 manually labeled queries: Some domains have as high as 90+% patterned queries.

Survey 3: How many patterns are there? Hundreds of patterns are needed to cover 80% of queries.

Simple concept: What is a query template?
• (This paper) A sequence of keywords and attributes:
  • #celebrity affairs
  • #category jobs in #location
  • #movie showtimes in #zipcode
  • …
• (In general) Patterns that can be induced from queries, e.g., regular expressions.

How would such templates be useful?

We advocate Rich Query Interpretation.
t = “#category jobs in #location” for Job
q = “accounting jobs in chicago”
By matching query q to template t:
• 1) Intent Classifier: recognize the intended domain. → q ∈ Job
• 2) Query Parser: recognize the associated attributes. → #category = “accounting”, #location = “chicago”
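
The matching step above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names `compile_template` and `interpret`, the tiny vocabulary, and the choice to compile templates into regular expressions are all assumptions made for the example.

```python
import re

# Hypothetical attribute vocabulary; a real domain schema would be far larger.
VOCAB = {
    "category": ["accounting", "marketing"],
    "location": ["chicago", "boston", "new york"],
}

# (domain, template) pairs; "Job" is the domain used on the slide.
TEMPLATES = [("Job", "#category jobs in #location")]


def compile_template(template, vocab):
    """Turn a template such as '#category jobs in #location' into a regex."""
    parts = []
    for token in template.split():
        if token.startswith("#"):
            name = token[1:]
            # Try longer values first so multi-word values are not cut short.
            values = sorted(vocab[name], key=len, reverse=True)
            alternatives = "|".join(re.escape(v) for v in values)
            parts.append("(?P<%s>%s)" % (name, alternatives))
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts))


def interpret(query, templates=TEMPLATES, vocab=VOCAB):
    """Return (domain, attributes) for the first matching template, else None."""
    text = query.lower().strip()
    for domain, template in templates:
        match = compile_template(template, vocab).fullmatch(text)
        if match:
            return domain, match.groupdict()
    return None


result = interpret("accounting jobs in chicago")
```

Here a single match yields both outputs named on the slide: the domain (intent classification) and the attribute bindings (query parsing).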

Rich query interpretation is useful. Tailored responses by query patterns:
• Finding results directly → No longer 10 blue links.
• Ranking results → Relevant to attributes desired.
• Dispatching verticals → Bring verticals into search.
• Matching ads → More likely to click.

Query: Finding flights

Query: Finding movie showtimes

Query: Finding weather

But many more patterns can be leveraged!

Now, how to systematically discover such templates?

Problem: Query Template Discovery
• Given:
  • Query log L, e.g., we use the MSN query log from 2006.
  • Domain schema D, e.g., (#category, #location, #title) with vocabulary. An incomplete schema can be handled, too.
  • Seed knowledge (queries, sites, templates, or a mix), e.g., 5 queries, or 2 sites, or 2 templates.
• Output: “Good” templates T* = {t1, t2, …}
  • t1 = #location jobs
  • t2 = #location #category positions
  • …
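
One way to picture where candidate templates come from is to replace vocabulary words in logged queries with their attribute names. The sketch below is an assumption for illustration only: the function `candidate_templates`, the sample vocabulary, and the single-word matching are not taken from the talk.

```python
from itertools import product

# Hypothetical schema vocabulary (single-word values only, to keep the sketch short).
VOCAB = {
    "location": {"chicago", "boston"},
    "company": {"motorola", "microsoft"},
    "category": {"marketing", "accounting"},
}


def candidate_templates(query, vocab=VOCAB):
    """Enumerate templates obtained by replacing vocabulary words with #attributes."""
    options = []
    for word in query.lower().split():
        choices = [word]  # keeping the word as a plain keyword is always allowed
        for attribute, values in vocab.items():
            if word in values:
                choices.append("#" + attribute)
        options.append(choices)

    templates = set()
    for combination in product(*options):
        # Keep only combinations that contain at least one attribute.
        if any(token.startswith("#") for token in combination):
            templates.add(" ".join(combination))
    return templates


candidates = candidate_templates("marketing jobs in motorola")
```

Every logged query contributes its candidates, and the later steps decide which of them are “good.”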

Step 1: Define quality metrics.

How to measure quality of templates?
• Some templates are more “popular.”
  • “#city1 #city2”, “#make #model”
• Some templates are more “accurate.”
  • “#city1 #city2 flights”, “#location #make used cars”
→ Two metrics: precision and recall.
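
A minimal sketch of the two metrics, assuming the usual set-based definitions: precision is the fraction of queries matched by a template that belong to the domain, and recall is the fraction of domain queries that the template matches. The deck's exact formulation is not reproduced here, and the sample query sets are made up for the example.

```python
def precision_recall(matched, relevant):
    """Set-based precision and recall of a template.

    matched:  queries that instantiate the template
    relevant: queries that truly belong to the domain
    """
    matched, relevant = set(matched), set(relevant)
    hits = len(matched & relevant)
    precision = hits / len(matched) if matched else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical job-domain queries and the queries matched by "jobs in #location".
job_queries = {
    "jobs in chicago",
    "jobs in boston",
    "jobs in motorola",
    "marketing jobs in motorola",
}
matched_by_template = {"jobs in chicago", "jobs in boston"}

p, r = precision_recall(matched_by_template, job_queries)
f = f_measure(p, r)
```

In this toy case the template is fully accurate (every match is a job query) but covers only half of the domain.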

Step 2: From seeds, infer templates with good quality.

1) Can P and R be “inferred” (or estimated)?
• Probabilistic recall
• Probabilistic precision

2) What relationships can we use to infer? The QST (“Quest”) graph, built from the query log, links queries, sites, and templates.
Queries Q
• q1: jobs in chicago
• q2: jobs in boston
• q3: jobs in microsoft
• q4: jobs in motorola
• q5: marketing jobs in motorola
• q6: 401k plans
• q7: illinois employment statistics
Sites S
• s1: monster.com
• s2: motorola.com
• s3: us401k.com
Templates T
• t1: jobs in #location
• t2: jobs in #company
• t3: #category jobs in #company
• t4: #location employment statistics

3) How to infer on this graph? Duality of Random Walk: When we walk back and forth, we are inferring precision and recall, respectively.
• R(t) is a forward random walk from the seeds.
• P(t) is a backward random walk to the seeds.

Recall is a forward random walk from the seeds. → Recall is just like (personalized) PageRank.
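
To make the PageRank analogy concrete, here is a generic personalized PageRank on a toy query/site/template graph. It is a sketch under stated assumptions, not the Quest algorithm itself: the restart probability `alpha`, the unit edge weights, and the helper names `undirected` and `personalized_pagerank` are all hypothetical.

```python
def undirected(edge_list):
    """Build a symmetric adjacency map {node: {neighbor: weight}}."""
    graph = {}
    for u, v, w in edge_list:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def personalized_pagerank(graph, seeds, alpha=0.15, iterations=50):
    """Random walk that restarts at the seed nodes with probability alpha."""
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    score = dict(restart)
    for _ in range(iterations):
        nxt = {n: alpha * restart[n] for n in graph}
        for u, neighbors in graph.items():
            total = sum(neighbors.values())
            for v, w in neighbors.items():
                # Push u's mass forward along its edges, in proportion to weight.
                nxt[v] += (1.0 - alpha) * score[u] * w / total
        score = nxt
    return score


# Toy graph with assumed unit weights: two job queries share a template and a
# site; an off-domain query is attached only to its own site.
graph = undirected([
    ("q1: jobs in chicago", "t1: jobs in #location", 1.0),
    ("q2: jobs in boston", "t1: jobs in #location", 1.0),
    ("q1: jobs in chicago", "s1: monster.com", 1.0),
    ("q2: jobs in boston", "s1: monster.com", 1.0),
    ("q6: 401k plans", "s3: us401k.com", 1.0),
])
scores = personalized_pagerank(graph, seeds={"q1: jobs in chicago"})
```

Mass that starts at the seed query flows forward to the template and site it touches, and from there to the other job query, while the disconnected 401k nodes receive none.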

Precision is a backward random walk to the seeds. → Precision is harmonic energy minimization.
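
The harmonic view can be illustrated with a small label-propagation sketch: labeled nodes keep their values, and every other node repeatedly takes the weighted average of its neighbors, which is the fixed point that a harmonic function satisfies. Everything specific here is an assumption for illustration: clamping a known off-domain query to 0, the edge weights, the node names, and the helper names `undirected` and `harmonic_scores`.

```python
def undirected(edge_list):
    """Build a symmetric adjacency map {node: {neighbor: weight}}."""
    graph = {}
    for u, v, w in edge_list:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def harmonic_scores(graph, labeled, iterations=200):
    """Clamp labeled nodes; set every other node to its neighbors' weighted mean."""
    score = {n: labeled.get(n, 0.5) for n in graph}
    for _ in range(iterations):
        for u, neighbors in graph.items():
            if u in labeled:
                continue  # labeled nodes never change
            total = sum(neighbors.values())
            score[u] = sum(w * score[v] for v, w in neighbors.items()) / total
    return score


# Assumed toy graph: q1 and q2 are in-domain seeds (1.0), q6 is a known
# off-domain query (0.0), q3 is unlabeled, and t_a / t_b are two templates.
graph = undirected([
    ("q1", "t_a", 1.0), ("q2", "t_a", 1.0), ("q3", "t_a", 1.0),
    ("q1", "t_b", 3.0), ("q6", "t_b", 1.0), ("q3", "t_b", 1.0),
])
labeled = {"q1": 1.0, "q2": 1.0, "q6": 0.0}
precision = harmonic_scores(graph, labeled)
```

The template that touches only in-domain and unlabeled queries (`t_a`) ends up with a higher score than the one that also matches the off-domain query (`t_b`).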

Experimental results • Quest is effective in finding templates by inferred P and R, achieving 90% on actual F-measures. • Top results:

And they did the real work: Ganesh Agarwal and Govind Kabra. Thank you!
