Term Necessity Prediction P(t | Rq)
Le Zhao and Jamie Callan
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Oct 27, CIKM 2010

Main Points
• Necessity is as important as idf (theory)
• Explains behavior of IR models (practice)
• Can be predicted
• Performance gain
Definition of Necessity P(t | Rq)
• Directly calculated given relevance judgements for query q: the fraction of documents relevant to q that contain term t
• Example (Venn diagram of the collection, the relevant documents for q, and the documents containing t): P(t | Rq) = 0.4
• Necessity == term recall == 1 - mismatch
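A minimal sketch (not the authors' code) of this definition: given relevance judgements for a query, term necessity is simply the fraction of relevant documents that contain the term. The names `relevant_docs` and `term_necessity` are illustrative.

```python
def term_necessity(term, query_id, relevant_docs):
    """Fraction of documents relevant to `query_id` that contain `term`."""
    rel = relevant_docs[query_id]
    if not rel:
        return 0.0
    containing = sum(1 for doc_terms in rel if term in doc_terms)
    return containing / len(rel)

# Example: 2 of 5 relevant documents contain "prognosis",
# so necessity is 0.4 and mismatch is 1 - 0.4 = 0.6.
relevant_docs = {"206": [{"third", "party", "politics"},
                         {"prognosis", "third", "party"},
                         {"viability", "party"},
                         {"third", "party", "election"},
                         {"prognosis", "party", "candidate"}]}
print(term_necessity("prognosis", "206", relevant_docs))  # 0.4
```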
Why Necessity? Roots in Probabilistic Models
• Binary Independence Model [Robertson and Spärck Jones 1976]: "Relevance Weight", "Term Relevance"
• The term weight factors into a necessity-odds part and an idf (sufficiency) part (see the formula below)
• P(t | R) is effectively the only part of the model that is about relevance
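For reference, the BIM relevance weight of Robertson and Spärck Jones (1976), with p = P(t | R) and q = P(t | non-R), and its decomposition into the two parts the slide labels:

```latex
\[
  w_t \;=\; \log \frac{p\,(1-q)}{(1-p)\,q}
      \;=\; \underbrace{\log \frac{p}{1-p}}_{\text{necessity odds}}
      \;+\; \underbrace{\log \frac{1-q}{q}}_{\approx\,\text{idf (sufficiency)}}
\]
```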
Without Necessity
• The emphasis problem for idf-only term weighting
• Emphasizes high-idf terms in the query: "prognosis/viability of a political third party in U.S." (Topic 206), as the toy illustration below shows
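A toy illustration of the emphasis problem (the document frequencies below are made up for illustration, not TREC statistics): with idf-only weighting, the rarest terms dominate the query even though they rarely occur in the relevant documents.

```python
import math

# Hypothetical document frequencies for the Topic 206 terms in a 1M-document collection.
N = 1_000_000
df = {"prognosis": 800, "viability": 3_000, "political": 120_000,
      "third": 200_000, "party": 150_000}

idf = {t: math.log(N / n) for t, n in df.items()}
for term, weight in sorted(idf.items(), key=lambda x: -x[1]):
    print(f"{term:10s} idf = {weight:.2f}")
# "prognosis" and "viability" get by far the largest weights, so documents matching
# them float to the top even when they miss "third party" entirely.
```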
Ground Truth (TREC 4, topic 206)
[figure of the ground-truth term necessities, with the emphasized terms marked; not reproduced]
Indri Top Results
1. (ZF32-220-147) Recession concerns lead to a discouraging prognosis for 1991
2. (AP880317-0017) Politics … party … Robertson's viability as a candidate
3. (WSJ910703-0174) political parties …
4. (AP880512-0050) there is no viable opposition …
5. (WSJ910815-0072) A third of the votes
6. (WSJ900710-0129) politics, party, two thirds
7. (AP880729-0250) third ranking political movement …
8. (AP881111-0059) political parties
9. (AP880224-0265) prognosis for the Sunday school
10. (ZF32-051-072) third party provider
(Google and Bing also have top-10 false positives; emphasis is a problem for large search engines too!)
Without Necessity
• The emphasis problem for idf-only term weighting
• Emphasizes high-idf terms in the query: "prognosis/viability of a political third party in U.S." (Topic 206)
• False positives throughout the ranked list, especially detrimental at top ranks
• Ignoring term recall hurts precision at all recall levels (true for BIM, and also for BM25 and language models that use tf)
• How significant is the emphasis problem?
Failure Analysis of 44 Topics from TREC 6-8
• RIA workshop 2003 (7 top research IR systems, >56 expert*weeks)
• Remedies pointed to by the analysis: necessity term weighting, necessity-guided expansion, & bigrams, & term restriction using doc fields
• Basis of all of these: term necessity prediction
Given True Necessity
• +100% over BIM, in precision at all recall levels [Robertson and Spärck Jones 1976]
• +30-80% over Language Model and BM25, in MAP (this work)
• For a new query without relevance judgements, necessity must be predicted
• Predictions don't need to be very accurate to show a performance gain
How Necessary are Words? (Examples from TREC 3 topics)
Mismatch Statistics
• Mismatch varies widely across terms [histograms for TREC 3 title and TREC 9 description queries; not reproduced]
• Not constant, so it needs to be predicted
Mismatch Statistics (2)
• Mismatch varies for the same term across different queries [figure: recurring words in TREC 3 topics; not reproduced]
• Query-dependent features are needed (1/3 of term occurrences have necessity variation > 0.1)
Prior Prediction Approaches
• Croft/Harper combination match (1979)
  - treats P(t | R) as a tuned constant (see the formula below)
  - when > 0.5, rewards docs that match more query terms
• Greiff's (1998) exploratory data analysis
  - used idf to predict the overall term weight
  - improved over BIM
• Metzler's (2008) generalized idf
  - used idf to predict P(t | R)
  - improved over BIM
• Years of relying on the simple idf feature, with limited success
• Missing piece: P(t | R) = term necessity = term recall
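For completeness, the Croft/Harper simplification: assuming P(t | R) is the same constant p for every query term, the BIM weight reduces to a per-term constant plus an idf-like factor, so when p > 0.5 each additional matched query term adds a positive bonus:

```latex
\[
  w_t \;=\; \underbrace{\log \frac{p}{1-p}}_{\text{constant } C,\; >0 \iff p > 0.5}
      \;+\; \log \frac{N - n_t}{n_t}
\]
```

Here N is the collection size and n_t is the document frequency of t.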
Factors that Affect Necessity
What causes a query term to not appear in relevant documents?
• Topic Centrality (Concept Necessity): e.g., "Laser research related or potentially related to US defense", "Welfare laws propounded as reforms"
• Synonyms: e.g., movie == film == …
• Abstractness: e.g., "Ailments" in the vitamin query, "Dog Maulings", "Christian Fundamentalism"
• The worst case is a rare & abstract term, e.g. "prognosis"
Features
• We need to identify synonyms/searchonyms of a query term, in a query-dependent way
• Use thesauri?
  - Biased (not collection dependent)
  - Static (not query dependent)
  - Not promising, not easy
• Instead: term-term similarity in concept space, via local LSI (Latent Semantic Indexing)
  - LSI of the (e.g. 200) top ranked documents
  - keep (e.g. 150) dimensions
(a minimal sketch of local LSI follows)
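A minimal sketch of local LSI, not the authors' Lemur-based implementation: take the top-ranked documents from an initial retrieval, build a term-document matrix over them, and keep a truncated SVD. Function and variable names are illustrative; the 200-document / 150-dimension defaults follow the slide.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def local_lsi_term_vectors(top_docs, n_dims=150):
    """Map each term to its vector in the reduced concept space of `top_docs`.

    `top_docs` holds the text of the (e.g. 200) top-ranked documents retrieved
    for the query by an initial run.
    """
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(top_docs)                      # docs x terms
    svd = TruncatedSVD(n_components=min(n_dims, X.shape[1] - 1))
    svd.fit(X)
    # With X = U S V^T (documents as rows), terms live in the rows of V S.
    term_matrix = (svd.components_ * svd.singular_values_[:, None]).T
    terms = vectorizer.get_feature_names_out()
    return dict(zip(terms, term_matrix))
```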
Features
• Topic Centrality: length of the term vector after dimension reduction (local LSI)
• Synonymy (Concept Necessity): average similarity score of the top 5 most similar terms
• Replaceability: adjusts the synonymy measure by how many new documents the synonyms match
• Abstractness: users modify abstract terms with concrete ones, e.g. "effects on the US educational program", "prognosis of a political third party"
(a sketch of the first three features follows)
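A sketch, under the same assumptions as the local-LSI snippet above, of how the first three features could be computed; the abstractness feature, which looks at whether the term is modified by more concrete terms in the query, is omitted here. Names such as `doc_term_sets` are illustrative.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def necessity_features(term, term_vectors, doc_term_sets, k=5):
    """Topic centrality, synonymy, and replaceability for one query term.

    `term_vectors` comes from local_lsi_term_vectors() above; `doc_term_sets`
    is the set of terms in each top-ranked document.
    """
    v = term_vectors[term]
    centrality = float(np.linalg.norm(v))          # vector length after dim. reduction

    # Top-k most similar terms in the local concept space.
    sims = sorted(((cosine(v, u), w) for w, u in term_vectors.items() if w != term),
                  reverse=True)[:k]
    synonymy = sum(s for s, _ in sims) / max(len(sims), 1)

    # Replaceability: how many *new* documents the near-synonyms match,
    # beyond those the term itself already matches.
    docs_with_term = {i for i, ts in enumerate(doc_term_sets) if term in ts}
    docs_with_syn = {i for i, ts in enumerate(doc_term_sets)
                     if any(w in ts for _, w in sims)}
    replaceability = len(docs_with_syn - docs_with_term) / max(len(doc_term_sets), 1)

    return centrality, synonymy, replaceability
```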
Experiments
• Necessity prediction error
  - A regression problem; model: RBF kernel regression, M: <f1, f2, …, f5> → P(t | R) (a sketch follows)
• Necessity for term weighting
  - End-to-end retrieval performance
  - How to weight terms by their necessity
  - In BM25: through the Binary Independence Model weight
  - In language models: via the relevance model Pm(t | R), multinomial [Lavrenko and Croft 2001]
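A minimal sketch of the regression step. The talk's acknowledgements credit SVM-light; scikit-learn's SVR with an RBF kernel is used here as a stand-in, and all variable names and hyperparameter values are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

def train_necessity_model(X_train, y_train):
    """Fit an RBF-kernel regressor from feature vectors <f1..f5> of training
    query terms to each term's true necessity P(t | R) computed from judgments."""
    model = SVR(kernel="rbf", C=1.0, epsilon=0.05)
    model.fit(X_train, y_train)
    return model

def predict_necessity(model, X_terms):
    """Predict necessity for new query terms, clipped to [0, 1] since the
    target is a probability."""
    return np.clip(model.predict(X_terms), 0.0, 1.0)
```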
Necessity Prediction Example (trained on TREC 3, tested on TREC 4)
[example figure not reproduced]
Necessity Prediction Error
• Measured as L1 loss: the lower, the better [results figure not reproduced]
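The L1 loss here is presumably the average absolute difference between predicted and true necessity over all judged query terms, i.e.:

```latex
\[
  \mathrm{L1} \;=\; \frac{1}{\sum_{q}|q|} \sum_{q} \sum_{t \in q}
      \bigl|\, \hat{P}(t \mid R_q) - P(t \mid R_q) \,\bigr|
\]
```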
Predicted Necessity Weighting
• 10-25% gain from necessity weighting
• 10-20% gain in top precision
[results table not reproduced]
Predicted Necessity Weighting (ctd.)
[additional results not reproduced]
vs. Relevance Model
• Relevance Model (unsupervised expansion): #weight( 1-λ #combine( t1 t2 ) λ #weight( w1 t1 w2 t2 w3 t3 … ) ), with wi ≈ P(ti | R)
• Weight-only ≈ Expansion
• Supervised > Unsupervised (5-10%)
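For concreteness, a small helper (illustrative, not from the talk) that renders the "weight only" variant as an Indri query: just the original query terms, weighted by their predicted necessities, in contrast to the relevance-model template above that interpolates with weighted expansion terms.

```python
def weight_only_query(terms, necessities):
    """Indri #weight query over the original query terms, weighted by
    their predicted necessities ("weight only" in the slide)."""
    body = " ".join(f"{w:.3f} {t}" for t, w in zip(terms, necessities))
    return f"#weight( {body} )"

print(weight_only_query(["prognosis", "third", "party"], [0.40, 0.90, 0.95]))
# #weight( 0.400 prognosis 0.900 third 0.950 party )
```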
Take Home Messages • Necessity is as important as idf (theory) • Explains behavior of IR models (practice) • Effective features can predict necessity • Performance gain
Acknowledgements
• Reviewers from multiple venues
• Ni Lao, Frank Lin, Yiming Yang, Stephen Robertson, Bruce Croft, Matthew Lease: discussions & references
• David Fisher, Mark Hoy: maintaining the Lemur toolkit
• Andrea Bastoni and Lorenzo Clemente: maintaining the LSI code for the Lemur toolkit
• SVM-light, Stanford parser
• TREC: all the data
• NSF Grants IIS-0707801 and IIS-0534345
Feedback: Le Zhao (lezhao@cs.cmu.edu)