
Relevance Information: A Loss of Entropy but a Gain for IDF?


Presentation Transcript


  1. Relevance Information: A Loss of Entropy but a Gain for IDF? Arjen P. de Vries, arjen@acm.org; Thomas Roelleke, thor@dcs.qmul.ac.uk. SIGIR 2005

  2. Motivation • How should relevance information be incorporated in systems using TF*IDF term weighting? • TF*IDF combines frequent occurrence with term discriminativeness • Adding relevance information to a retrieval system corresponds to a loss of entropy; how does this affect IDF (the measure of term discriminativeness)?

  3. Overview • PART I • IDF, relevance, and BIR • PART II • Alternative estimation for IDF

  4. IDF • A robust summary statistic of term occurrence that helps identify ‘good’ terms • Follows naturally from the Binary Independence Retrieval model (BIR): it is the ranking that results when no relevance information is available • Related to the occurrence probability P(t|C)

  5. BIR • Binary term presence/absence • Rank documents by their probability of relevance • Ranking by the odds of relevance avoids estimating some probabilities without affecting the ranking

  6. BIR • Without relevance information, h(t,C) = – log n/(N – n) • (Almost) the discriminativeness of term t in collection C!
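
To make the “almost” concrete, here is a minimal Python sketch (function names are mine, not from the talk): for rare terms n ≪ N, so n/(N – n) ≈ n/N, and the BIR weight is essentially IDF.

```python
import math

def bir_weight(n, N):
    # BIR term weight without relevance information:
    # h(t, C) = -log(n / (N - n))
    return -math.log(n / (N - n))

def idf(n, N):
    # Classic IDF: -log P(t|C), with P(t|C) = n / N
    return -math.log(n / N)

N = 100_000
for n in (10, 1_000, 40_000):
    print(n, round(bir_weight(n, N), 3), round(idf(n, N), 3))
# Rare terms: nearly identical weights; frequent terms: they diverge.
```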

  7. BIR and IDF • View IDF as a term statistic in a set of documents, R or ¬R • Then, the BIR probability estimate known as F4 corresponds to IDF(t,¬R) – IDF(t,R) + IDF(¬t,R) – IDF(¬t,¬R) • IDF(t,R) can be interpreted as the discriminativeness of term presence among the relevant documents, and so on
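
A small numeric check of this decomposition (Python; the counts are invented for illustration, and the F4 variant shown is the unsmoothed one):

```python
import math

def idf(k, M):
    # IDF of an event observed k times in a set of M documents
    return -math.log(k / M)

def f4(r, R, n, N):
    # Unsmoothed F4 weight: r relevant documents contain t,
    # n documents contain t overall, R relevant, N total.
    return math.log((r / (R - r)) / ((n - r) / (N - R - n + r)))

r, R, n, N = 8, 10, 100, 10_000
lhs = f4(r, R, n, N)
rhs = (idf(n - r, N - R)             # IDF(t, ¬R)
       - idf(r, R)                   # IDF(t, R)
       + idf(R - r, R)               # IDF(¬t, R)
       - idf(N - R - n + r, N - R))  # IDF(¬t, ¬R)
assert abs(lhs - rhs) < 1e-9
```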

  8. BIR and IDF • In practice, the ‘complement method’ gives ¬R = C\R ≈ C; so, usually, updating IDF under relevance information corresponds to subtracting IDF(t,R)! • The BIR modifies h(t,C) most for terms that are rare in the relevant set, since such terms do not help identify good documents

  9. Implication for TF*IDF systems • A system using IDF(t,C) uses presence weighting only, implicitly assuming that term t occurs in all relevant documents (such that IDF(t,R) = – log R/R = 0) • Systems using TF*IDF term weighting can thus incorporate relevance feedback (RFB) in accordance with the binary independence retrieval model
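
A minimal sketch of that update under the complement approximation from slide 8 (the function names and counts are mine):

```python
import math

def idf(k, M):
    return -math.log(k / M)

def idf_with_feedback(n, N, r, R):
    # Complement method (¬R = C\R ≈ C): the relevance-updated
    # term weight is approximately IDF(t, C) - IDF(t, R).
    return idf(n, N) - idf(r, R)

N, R = 10_000, 10
print(idf_with_feedback(100, N, 10, R))  # t in every relevant doc:
                                         # IDF(t,R) = 0, full weight kept
print(idf_with_feedback(100, N, 2, R))   # t rare among relevant docs:
                                         # weight is reduced
```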

  10. Estimation IDF • Recall that IDF(t,C) = – log P(t|C), with P(t|C) the occurrence probability of t in C • Assuming the events d are disjoint and exhaustive, P(t|C) = Σd P(t|d) P(d|C) = n/N (binary P(t|d), uniform P(d|C) = 1/N) • Q: Is this the best estimation method? • Notice that, in the BIR formulation, the sets R and ¬R have very different cardinality…

  11. Estimation TF • For TF weights, we know that estimating P(t|d) by a Poisson approximation (e.g., as applied in BM25) or by lifting (e.g., as applied in Inquery) leads to superior retrieval results • The motivation for these different estimates is to better handle the influence of varying document lengths
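
For reference, a sketch of BM25’s length-normalised, saturating TF component (the standard k1/b form; the exact constants used in the talk are not in this transcript):

```python
def bm25_tf(tf, dl, avdl, k1=1.2, b=0.75):
    # Saturates as tf grows; normalises by document length dl
    # relative to the collection average avdl.
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avdl))

print(bm25_tf(3, dl=120, avdl=300))
```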

  12. Poisson Estimate • The ‘Poisson estimate’ approximates the (Poisson-based) probability that term t occurs at least once: P ≈ 1 – e^(–λ(t)), since a Poisson variable with rate λ(t) is zero with probability e^(–λ(t))
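
A sketch of the resulting IDFp. The parametrisation of the rate λ via the tuning constant K from slide 20 is my reading of the slides, so treat λ = n/K as an assumption:

```python
import math

def idf_poisson(n, K):
    # IDF_p(t) = -log P(t occurs at least once)
    #          = -log(1 - e^(-lambda)), with rate lambda = n / K
    # (lambda = n/K is an assumed parametrisation; K = N would be
    # the natural baseline, making lambda the fraction of documents
    # containing t)
    lam = n / K
    return -math.log(1.0 - math.exp(-lam))

N = 10_000
print(idf_poisson(100, N), -math.log(100 / N))  # nearly equal: rare term
```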

  13. Poisson estimate vs. n/N (plot)

  14. Again, for small n (plot)

  15. |Poisson – Estimate| (plot)
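
A numeric counterpart of these plots (again assuming rate λ = n/N, i.e. K = N): the Poisson estimate tracks n/N for small n and flattens out for frequent terms.

```python
import math

N = 100_000
for n in (1, 10, 100, 1_000, 10_000, 50_000):
    p_freq = n / N                  # the n/N estimate
    p_pois = 1 - math.exp(-n / N)   # the Poisson estimate
    print(f"n={n:>6}  n/N={p_freq:.5f}  "
          f"Poisson={p_pois:.5f}  |diff|={abs(p_freq - p_pois):.5f}")
```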

  16. Experimental Setup • Ad-hoc retrieval • TREC-7 and TREC-8 (topics 351-400) • No stemming • Routing • LA Times articles for training (1989/1990) • Remainder for testing (1991-1994) • BM25 constants:

  17. Results: IDF vs. IDFp (plots: IDF, IDFp)

  18. IDF vs. IDFp • For the short T (title-only) queries, the user carefully selects the most discriminative terms with respect to relevance • The longer TD and TDN queries (adding the topic description and narrative), however, also contain noisy, non-discriminative terms

  19. IDF vs. IDFp • IDFp orders terms by their discriminativeness in the same order as IDF, but reduces the influence of the non-discriminative terms on the ranking • It differentiates more between rare terms and less between frequent terms • As a result, the effect of the Poisson-based estimation is much stronger for the longer queries

  20. TF*IDF vs. TF*IDFp • Estimation with IDFp results in better mean average precision than the ‘traditional’ estimate • A strong emphasis on discriminativeness (the Poisson approximation IDFp with large values of K) improves effectiveness • Best overall performance for K=N/10

  21. Routing experiment • The TF*IDFp results without feedback are better than all TF*IDF results • But the TF*IDFp results without feedback are also better than all TF*IDFp results with feedback • Finally, the TF*IDF results improve only marginally with feedback • Is the LA Times training data not representative?

  22. Conclusions • PART I • IDF and the Binary Independence Retrieval model are very closely related • Relevance information can be incorporated in TF*IDF by revising IDF • PART II • Different estimation of the occurrence probability in IDF leads to improved retrieval effectiveness

  23. Open Questions • Can we derive the choice K=N/10 analytically? • Is the observed improvement in effectiveness really due to a better (frequentist) model of the occurrence probability, or is it a qualitative argument for informativeness? • More questions in the audience?!
