Automatic Term Mismatch Diagnosis for Selective Query Expansion

Le Zhao and Jamie Callan • Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA • Presented at SIGIR 2012, Portland, OR

Main Points • An important problem – term mismatch & a traditional solution • New diagnostic intervention approach • Simulated user studies • Diagnosis & intervention effectiveness

Term Mismatch Problem • Average term mismatch rate: 30-40% [Zhao10] • A common cause of search failure [Harman03, Zhao10] • Frequent user frustration [Feild10] • Here: 50%-300% gain in retrieval accuracy • Slide callouts: "Relevant docs not returned"; "Web, short queries, stemmed, inlinks included"

Term Mismatch Problem • Example query (TREC 2006 Legal discovery task): approval of (cigarette company) logos on television watched by children • Slide callouts: "Highest mismatch rate"; "High mismatch rate for all query terms in this query"

The Traditional Solution: Boolean Conjunctive Normal Form (CNF) Expansion • Keyword query: approval of logos on television watched by children • Manual CNF (TREC Legal track 2006): (approval OR guideline OR strategy) AND (logos OR promotion OR signage OR brand OR mascot OR marque OR mark) AND (television OR TV OR cable OR network) AND (watched OR view OR viewer) AND (children OR child OR teen OR juvenile OR kid OR adolescent) • Expressive & compact (1 CNF == 100s of alternatives) • Highly effective (this work: 50-300% over base keyword)
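
To make the "1 CNF == 100s of alternatives" point concrete, here is a minimal Python sketch that stores the manual CNF query above as a list of OR-clauses and counts the conjunctive keyword queries it stands for. The list-of-lists representation and the function names are assumptions made for illustration; they are not taken from the talk.

```python
from itertools import product
from math import prod

# Manual CNF query from the slide: each inner list is one OR-clause
# (the original query term followed by its expansion terms).
cnf = [
    ["approval", "guideline", "strategy"],
    ["logos", "promotion", "signage", "brand", "mascot", "marque", "mark"],
    ["television", "TV", "cable", "network"],
    ["watched", "view", "viewer"],
    ["children", "child", "teen", "juvenile", "kid", "adolescent"],
]


def count_alternatives(cnf):
    """Number of plain AND-queries obtained by picking one term per clause."""
    return prod(len(clause) for clause in cnf)


def enumerate_alternatives(cnf):
    """Lazily yield every AND-query (one term per clause) covered by the CNF."""
    return product(*cnf)


print(count_alternatives(cnf))            # 3 * 7 * 4 * 3 * 6 = 1512
print(next(enumerate_alternatives(cnf)))  # the original keyword query
```

For this query the count is 1512, so a single compact CNF expression replaces well over a thousand separate conjunctive keyword queries.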

The Potential • Query: approval logos television watched children • 50-300%
Term        Recall   Expansion terms            Recall after expansion
approval    6.49%    +guideline +strategy       12.8%
logos       14.1%    +promotion +signage ...    19.7%
television  21.3%    +tv +cable +network        22.4%
watched     10.4%    +view +viewer              19.5%
children    18.0%    +child +teen +kid ...      19.3%
Overall     2.04%                               8.74% ?
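
The recall figures on this slide come from the actual test collection. The Python sketch below only illustrates the mechanism behind them: under Boolean matching a relevant document is returned only if every clause of the query is satisfied, so adding OR-alternatives to a clause can only keep or raise recall. The four toy documents and the function names are invented for illustration and are unrelated to the numbers on the slide.

```python
def matches(cnf, doc_terms):
    """A document matches a CNF query if every clause has at least one term in it."""
    return all(any(term in doc_terms for term in clause) for clause in cnf)


def recall(cnf, relevant_docs):
    """Fraction of the relevant documents that the CNF query matches."""
    if not relevant_docs:
        return 0.0
    return sum(matches(cnf, doc) for doc in relevant_docs) / len(relevant_docs)


# Toy relevant documents, each reduced to a set of terms (made up for this sketch).
relevant_docs = [
    {"approval", "logos", "television"},
    {"guideline", "logos", "tv"},
    {"strategy", "brand", "cable"},
    {"approval", "mascot", "network"},
]

keyword_query = [["approval"], ["logos"], ["television"]]
expanded_query = [
    ["approval", "guideline", "strategy"],
    ["logos", "brand", "mascot"],
    ["television", "tv", "cable", "network"],
]

print(recall(keyword_query, relevant_docs))   # only the first toy document matches
print(recall(expanded_query, relevant_docs))  # every toy document matches
```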

CNF Expansion • Widely used in practice • Librarians [Lancaster68, Harter86] • Lawyers [Lawlor62, Blair85, Baron07] • Search experts [Clarke95, Hearst96, Mitra98] • Less well studied in research • Users do not create effective free-form Boolean queries ([Hearst09] cites many studies). • Question: How to guide user effort in productive directions? • restricting to CNF expansion (to address the mismatch problem) • focusing on problem terms when expanding • Ad: WikiQuery [Open Source IR Workshop]

Diagnostic Intervention • Query: approval of logos on television watched by children • Goal • Least amount of user effort → near optimal performance • E.g. expand 2 terms → 90% of total improvement • Diagnosis, two alternatives: low P(t | R) terms, or high idf (rare) terms • Expansion in CNF, one example per diagnosis: • (approval OR guideline OR strategy) AND logos AND television AND (watch OR view OR viewer) AND children • (approval OR guideline OR strategy) AND logos AND (television OR tv OR cable OR network) AND watch AND children
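
The diagnose-then-expand loop on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the per-term scores below reuse the single-term recall figures from "The Potential" slide as stand-ins for predicted P(t | R), and the function names are hypothetical.

```python
def select_terms_to_expand(scores, k, lowest_first=True):
    """Pick the k most problematic terms.

    lowest_first=True  ranks by low predicted P(t | R);
    lowest_first=False ranks by a high score, e.g. idf for the rareness baseline.
    """
    ranked = sorted(scores, key=scores.get, reverse=not lowest_first)
    return ranked[:k]


def build_cnf(query_terms, expansions, selected):
    """Expand only the selected terms; the other terms stay single-term clauses."""
    return [[t] + (expansions.get(t, []) if t in selected else []) for t in query_terms]


def to_string(cnf):
    """Render a CNF query in the notation used on the slides."""
    parts = ["(" + " OR ".join(c) + ")" if len(c) > 1 else c[0] for c in cnf]
    return " AND ".join(parts)


query_terms = ["approval", "logos", "television", "watch", "children"]
# Stand-in scores (assumption): single-term recall values from an earlier slide.
predicted_recall = {"approval": 0.0649, "logos": 0.141, "television": 0.213,
                    "watch": 0.104, "children": 0.180}
expansions = {"approval": ["guideline", "strategy"],
              "television": ["tv", "cable", "network"],
              "watch": ["view", "viewer"]}

selected = select_terms_to_expand(predicted_recall, k=2)
print(selected)
print(to_string(build_cnf(query_terms, expansions, selected)))
```

With these stand-in scores the two lowest-scoring terms are "approval" and "watch", so only those two clauses receive expansion terms.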

Diagnostic Intervention • Query: approval of logos on television watched by children • Goal • Least amount of user effort → near optimal performance • E.g. expand 2 terms → 90% of total improvement • Diagnosis, two alternatives: low P(t | R) terms, or high idf (rare) terms • Expansion as bag of word, one example per diagnosis (in each bracket the 0.9 part is the original query and the 0.1 part is the expansion query): • [ 0.9 (approval logos television watch children) 0.1 (0.4 guideline 0.3 strategy 0.5 view 0.4 viewer) ] • [ 0.9 (approval cigar television watch children) 0.1 (0.4 guideline 0.3 strategy 0.5 tv 0.4 cable 0.2 network) ]
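
One way to read the bracketed bag-of-word queries is as a weighted mixture of the original query and an expansion query. The Python sketch below builds such a mixture for the first example. Two things are assumptions of this sketch rather than facts from the slide: the original terms are weighted uniformly, and the expansion weights are normalised to sum to 1 before mixing. The function name is hypothetical.

```python
def interpolate(original_terms, expansion_weights, lam=0.9):
    """Mix a uniform original-query model with a normalised expansion model.

    lam is the weight on the original query (0.9 on the slide);
    1 - lam goes to the expansion query.
    """
    weights = {}
    for term in original_terms:
        weights[term] = weights.get(term, 0.0) + lam / len(original_terms)
    total = sum(expansion_weights.values())
    for term, w in expansion_weights.items():
        weights[term] = weights.get(term, 0.0) + (1.0 - lam) * w / total
    return weights


original = ["approval", "logos", "television", "watch", "children"]
expansion = {"guideline": 0.4, "strategy": 0.3, "view": 0.5, "viewer": 0.4}

for term, weight in interpolate(original, expansion).items():
    print(f"{term:12s} {weight:.4f}")
```

Because every original term keeps most of the weight, the expansion terms only nudge the ranking; this is a softer intervention than adding OR-alternatives to a CNF clause.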

Diagnostic Intervention • Diagnosis methods • Baseline: rareness (high idf) • High predicted term mismatch, i.e. low predicted P(t | R) [Zhao10] • Intervention methods • Baseline: bag of word (Relevance Model [Lavrenko01]) • w/ manual expansion terms • w/ automatic expansion terms • CNF expansion (probabilistic Boolean ranking) • E.g. (approval OR guideline OR strategy) ANDP logos ANDP television ANDP (watch OR view OR viewer) ANDP children

Main Points • An important problem – term mismatch & a traditional solution • New diagnostic intervention approach • Evaluation: Simulated user studies • Diagnosis & intervention effectiveness

Diagnostic Intervention (We Hope to) • [Diagram] (child AND cigar) • (child → teen) • (child > cigar) • (child OR teen) AND cigar

We Ended up Using Simulation • Offline, full CNF: (child OR teen) AND (cigar OR tobacco) • Online simulation: (child AND cigar) • (child → teen) • (child > cigar) • (child OR teen) AND cigar

Diagnostic Intervention Datasets • Document sets • TREC 2007 Legal track, 7 million tobacco corp., train on 2006 • TREC 4 Ad hoc track, 0.5 million newswire, train on TREC 3 • CNF Queries • TREC 2007 by lawyers, TREC 4 by Univ. Waterloo [Clarke95] • 50 topics each, 2-3 keywords per query • Relevance Judgments • TREC 2007 sparse, TREC 4 dense • Evaluation measures • TREC 2007 statAP, TREC 4 MAP

Results – Diagnosis • P(t | R) vs. idf diagnosis • [Figure: diagnostic CNF expansion on TREC 4 and 2007; reference levels "No Expansion" and "Full Expansion"; annotation "8%-50%"]

Results – Expansion Intervention • CNF vs. bag-of-word expansion • Similar level of gain in top precision • 50% to 300% gain • [Figure: P(t | R) guided expansion on TREC 4 and 2007]

Conclusions • One of the most effective ways to engage user interactions • CNF queries gain 50-300% over keyword baseline. • Mismatch diagnosis → simple & effective interactions • Automatic diagnosis saves user effort by 33%. • Expansion in CNF is easier and better than in bag of word • Bag of word requires balanced expansion of all terms. • New research questions: • How to learn from manual CNF queries to improve automatic CNF expansion • How to get ordinary users to create effective CNF expansions (with the help of interfaces or search tools)

Acknowledgements • Chengtao Wen, Grace Hui Yang, Jin Young Kim, Charlie Clarke, SIGIR reviewers: helpful discussions & feedback • Charlie Clarke, Gordon Cormack, Ellen Voorhees, NIST: access to data • NSF grant IIS-1018317 • Opinions are solely the authors'.

END


Failure Analysis (vs. baseline) Diagnosis: • 4 topics: wrong P(t | R) prediction, lower MAP Intervention: • 3 topics: right diagnosis, but lower MAP • 2 of the 3: no manual expansion for the selected term • Users do not always recognize which terms need help. • 1 of the 3: wrong expansion terms by expert • “apatite rocks” in nature, not “apatite” chemical • CNF expansion can be difficult w/o looking at retrieval results.

Failure Analysis -- Comparing diagnosis methods: P(t | R) vs. idf • [Figure legend: query; query with unexpanded term(s); "User didn't expand"; "Wrong expansion"]

Term Mismatch Diagnosis • Predicting term recall - P(t | R) [Zhao10] • Query dependent features (model causes of mismatch) • Synonyms of term t based on query q's context • How likely these synonyms are to occur in place of t • Whether t is an abstract term • How rarely t occurs in the collection C • Regression prediction: fi(t, q, C) → P(t | R) • Used in term weighting for long queries • Lower predicted P(t | R) → higher likelihood of mismatch → t more problematic
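
For reference, the quantity being predicted and the rareness baseline can both be computed directly when documents and relevance judgments are available. The Python sketch below measures term recall P(t | R) on a toy relevant set and a smoothed idf on a toy collection. It computes the values from judgments and does not implement the regression predictor described on the slide. The toy documents, the function names, and the particular idf smoothing are assumptions of this sketch.

```python
import math


def term_recall(term, relevant_docs):
    """P(t | R): fraction of the relevant documents that contain the term."""
    if not relevant_docs:
        return 0.0
    return sum(term in doc for doc in relevant_docs) / len(relevant_docs)


def idf(term, collection):
    """Smoothed inverse document frequency (one common variant, chosen for the sketch)."""
    df = sum(term in doc for doc in collection)
    return math.log((len(collection) + 1) / (df + 1))


# Toy data: each document is reduced to a set of terms (made up for illustration).
relevant_docs = [
    {"child", "cigar"},
    {"teen", "cigar"},
    {"kid", "tobacco"},
    {"child", "cigar", "tobacco"},
]
collection = relevant_docs + [{"child", "school"}, {"tax", "law"}, {"child", "toy"}]

for term in ("child", "cigar"):
    p = term_recall(term, relevant_docs)
    print(term, "P(t|R) =", p, "mismatch =", 1 - p, "idf =", round(idf(term, collection), 3))
```

In this toy data "child" is the more frequent term in the collection (lower idf) yet has the lower term recall, which shows how a rareness-based diagnosis and a recall-based diagnosis can pick different terms.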

Online or Offline Study? • Controlling confounding variables • Quality of expansion terms • User’s prior knowledge of the topic • Interaction effectiveness & effort • Enrolling many users • Offline simulations can avoid all these and still make reasonable observations

Simulation Assumptions • Full expansion to simulate partial expansions • 3 assumptions about user expansion process • Independent expansion of query terms • A1: same set of expansion terms for a given query term, no matter which subset of query terms gets expanded • A2: same sequence of expansion terms, no matter … • A3: Re-constructing keyword query from CNF • Procedure to ensure vocabulary faithful to that of the original keyword description • Highly effective CNF queries ensure reasonable kw baseline
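
A minimal Python sketch of the simulation idea: start from the full manual CNF query, then derive both a keyword baseline and a partially expanded query for whichever terms the diagnosis selects. Treating the first term of each clause as the original keyword is a simplification made for this sketch; the slides do not spell out the actual reconstruction procedure behind A3. The function name and the `max_expansions` parameter are hypothetical.

```python
def simulate_partial_expansion(full_cnf, selected, max_expansions=None):
    """Derive a keyword query and a partial CNF from a full CNF query.

    full_cnf: list of clauses; clause[0] is treated as the original keyword
              (an assumption of this sketch).
    selected: keywords the simulated user is asked to expand.
    max_expansions: keep only the first n expansion terms per selected clause,
              mirroring the fixed set and order in assumptions A1 and A2.
    """
    keyword_query = [clause[0] for clause in full_cnf]
    partial = []
    for clause in full_cnf:
        head, rest = clause[0], clause[1:]
        if head in selected:
            kept = rest if max_expansions is None else rest[:max_expansions]
            partial.append([head] + kept)
        else:
            partial.append([head])
    return keyword_query, partial


# Example from the simulation slides: (child OR teen) AND (cigar OR tobacco)
full_cnf = [["child", "teen"], ["cigar", "tobacco"]]
keyword_query, partial = simulate_partial_expansion(full_cnf, selected={"child"})
print(keyword_query)  # keyword baseline: child AND cigar
print(partial)        # partial CNF: (child OR teen) AND cigar
```

Because the expansion terms always come from the same full CNF, the quality of the expansion terms is held fixed and only the choice of which terms to expand varies between diagnosis methods.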

Results – Level of Expansion • More expansion per query term, better retrieval • Result of expansion terms being effective • Queries with significant gain in retrieval after expanding more than 4 terms: • Topic 84, cigarette sales in James Bond movies

Online simulation • [Diagram labels: Most infrequent; Online simulation; Offline; Full CNF; Query] • (child OR youth) AND (cigar OR tobacco) • (child AND cigar) • (child > cigar) • (child OR youth) AND cigar • (child → youth)
