
Incorporating Non-Relevance Information in the Estimation of Query Models




  1. 2008 TREC: Incorporating Non-Relevance Information in the Estimation of Query Models. Edgar Meij, Wouter Weerkamp, Jiyin He, Maarten de Rijke. Presented by Chen, Yi-wen, Dept. of Computer Science & Information Engineering, National Taiwan Normal University. 2012/5/15

  2. Introduction • We introduce a new model which incorporates information from relevant and non-relevant documents to improve the estimation of query models.

  3. Retrieval Framework • The initial query model is estimated as a uniform distribution over the query terms.
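As a rough, hypothetical illustration of such a query-model retrieval framework (not the authors' implementation), the sketch below scores a document by a rank-equivalent KL-divergence with a Dirichlet-smoothed document model; the function names, the smoothing choice, and the smoothing parameter are all assumptions:

```python
import math

def kl_score(query_model, doc_tf, doc_len, coll_prob, mu=1000.0):
    """Rank-equivalent KL-divergence score: sum_t P(t|Q) * log P(t|D).
    The document model is Dirichlet-smoothed with the collection model
    (a common choice; the paper may smooth differently)."""
    score = 0.0
    for t, p_q in query_model.items():
        p_d = (doc_tf.get(t, 0) + mu * coll_prob.get(t, 1e-9)) / (doc_len + mu)
        score += p_q * math.log(p_d)
    return score

# Initial query model: maximum likelihood on the query terms,
# i.e. uniform when every term occurs exactly once.
query = ["query", "model", "estimation"]
query_model = {t: query.count(t) / len(query) for t in set(query)}
```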

  4. Modeling Non-Relevance Kraaij (2004) defines the NLLR (Normalized Log-Likelihood Ratio) measure as being equivalent to determining the negative KL-divergence for document retrieval:

NLLR(Q, D) = H(\theta_Q, \theta_C) - H(\theta_Q, \theta_D) = \sum_{t \in V} P(t \mid \theta_Q) \log \frac{P(t \mid \theta_D)}{P(t \mid \theta_C)}   (5)

where H(\theta_Q, \theta_D) = -\sum_{t \in V} P(t \mid \theta_Q) \log P(t \mid \theta_D) is the cross-entropy between two multinomial language models. Eq. 5 can be interpreted as the relationship between two language models Q and D, normalized by a third language model C.
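A minimal sketch of Eq. 5 as reconstructed above; the names `cross_entropy` and `nllr` are hypothetical, and the language models are plain term-to-probability dicts:

```python
import math

def cross_entropy(p_q, p_m, eps=1e-12):
    """H(theta_Q, theta_M) = -sum_t P(t|theta_Q) * log P(t|theta_M)."""
    return -sum(p * math.log(p_m.get(t, eps)) for t, p in p_q.items())

def nllr(p_q, p_d, p_c):
    """Eq. 5: NLLR(Q, D) = H(theta_Q, theta_C) - H(theta_Q, theta_D)."""
    return cross_entropy(p_q, p_c) - cross_entropy(p_q, p_d)
```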

  5. Modeling Non-Relevance To incorporate explicit non-relevance information, we keep Eq. 5 but replace the collection model \theta_C with a non-relevance model \theta_{\bar{R}} estimated from the judged non-relevant documents:

NLLR(Q, D) = H(\theta_Q, \theta_{\bar{R}}) - H(\theta_Q, \theta_D)

When no non-relevance information is available, the collection is used as the non-relevance model, i.e. \theta_{\bar{R}} = \theta_C (Eq. 7).
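Continuing the sketch above (and reusing `cross_entropy` from it), the substitution amounts to estimating a non-relevance model from the judged non-relevant documents and using it in place of the collection model. Simple maximum likelihood is assumed here, which may differ from the paper's estimator:

```python
from collections import Counter

def ml_model(docs):
    """Maximum-likelihood model over tokenized documents; a simple
    stand-in for the paper's non-relevance estimator."""
    counts = Counter(t for doc in docs for t in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def nllr_nonrel(p_q, p_d, nonrel_docs, p_c):
    """NLLR with theta_C replaced by theta_nonrel (met6); fall back to
    the collection model when no judgements exist (met9, Eq. 7)."""
    p_nr = ml_model(nonrel_docs) if nonrel_docs else p_c
    return cross_entropy(p_q, p_nr) - cross_entropy(p_q, p_d)
```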

  6. Runs Five sets of judgements are provided per topic: (A) no relevance judgements; (B) 3 relevant documents; (C) 3 relevant and 3 non-relevant documents; (D) 10 judged documents (division unknown); (E) a large set of judgements (division and number unknown). We compare our model using explicit non-relevance information to the same model using the collection as a non-relevance model, by submitting two distinct runs: met6, using the set of non-relevant documents, and met9, using only the collection (i.e. \theta_{\bar{R}} = \theta_C, viz. Eq. 7). • Preprocessing: standard stopword removal and stemming using a Porter stemmer (see the sketch below). • Due to a small bug in the implementation, however, one parameter was not properly normalized, causing a slight difference in the retrieval results for met6 on set B.
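The preprocessing step on this slide could look like the following sketch; NLTK is an assumption here, not necessarily the toolkit the authors used:

```python
# Requires: pip install nltk, then nltk.download("stopwords")
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop = set(stopwords.words("english"))

def preprocess(text):
    """Standard stopword removal followed by Porter stemming."""
    return [stemmer.stem(t) for t in text.lower().split() if t not in stop]
```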

  7. Runs [Baseline results table; values not preserved in the transcript]

  8. Runs When we zoom out and look at the full range of available topics (Table 2), we observe that both models improve statMAP over the baseline (set A). When the feedback set is small, met9 improves statMAP more effectively than met6, i.e., the background model performs better than the judged non-relevant documents. This difference suggests that the amount of non-relevance information needs to reach a certain size before it outperforms the estimation on the collection.

  9. Runs Figure 1 shows a per-topic breakdown of the difference in MAP between the two submitted runs. Most topics benefit more from the collection-based estimates. We have to conclude that, for the TREC Terabyte topics, estimation on the collection yields the highest retrieval performance and is thus a better estimate of non-relevance than the judged non-relevant documents.

  10. Runs Given a limited number of non-relevant documents (sets C and D), our model especially improves early precision, although not significantly. A larger set of non-relevant documents (set E) decreases overall retrieval effectiveness. From Figure 2a we observe that set E outperforms the other sets only at the very ends of the graph.

  11. Conclusion • For met6, a larger number of judged non-relevant documents improves retrieval effectiveness the most. • The collection is a viable and sufficient alternative as a non-relevance model.

  12. Future Work • First, we intend to investigate the impact of the available judged (non-)relevant documents and their properties with respect to the estimates on the collection. • Second, given the relevance assessments, we will try to find better ways of estimating the true distribution of the (non-)relevant information within our framework.
