
Océ at CLEF 2003





  1. Océ at CLEF 2003. Roel Brand, Marvin Brünner, Samuel Driessen, Jakob Klok, Pascha Iljin

  2. Outline • Océ mission • Participation in 2001, 2002 • Participation in 2003: three models • Results • Conclusions • Remark on evaluation measures

  3. Mission: to enable people to share information by offering products and services for the reproduction, presentation, distribution and management of documents. Océ-Technologies B.V. • active in approximately 80 countries • 23,000 people worldwide • Research: >2,000 employees

  4. Participation in 2001, 2002 • 2001: Dutch mono-lingual task • 2002: all mono-lingual tasks, several cross-lingual tasks, and the multi-lingual task

  5. Participation in 2003 • Mono-lingual tasks • 3 ranking models: • BM25 • probabilistic • statistical

  6. Query construction for the BM25 and probabilistic models: topic (title + description) → parsing → stop word removal → query
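
Read as a processing chain, this slide can be sketched in a few lines of Python. This is only an illustration under simple assumptions (regex tokenisation, a placeholder stop word list); the actual Océ parser and stop list are not given in the slides.

  import re

  STOP_WORDS = {"de", "het", "een", "en", "van", "the", "a", "of", "and"}  # placeholder list

  def build_query(title, description):
      # topic (title + description) -> parsing -> stop word removal -> query
      tokens = re.findall(r"\w+", (title + " " + description).lower())
      return [t for t in tokens if t not in STOP_WORDS]
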

  7. Query construction for the statistical model: topic (title + description) → parsing → stop word removal + compound splitting and morphological variations → query
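
The statistical run additionally expands query terms with compound parts and morphological variants. A deliberately naive sketch, assuming a tiny hand-made lexicon and a single split point; a real Dutch compound splitter would rely on a full lexicon and corpus statistics.

  LEXICON = {"fiets", "pad", "belasting", "verhoging"}  # illustrative lexicon only

  def split_compound(word):
      # accept a split only if both parts are known words
      for i in range(2, len(word) - 1):
          if word[:i] in LEXICON and word[i:] in LEXICON:
              return [word[:i], word[i:]]
      return []

  def expand(term):
      variants = {term, *split_compound(term)}
      if term.endswith("en"):      # crude morphological variation (plural/infinitive)
          variants.add(term[:-2])
      return variants

  # expand("fietspad") -> {"fietspad", "fiets", "pad"}
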

  8. Indexing: parsing; stop words are not removed
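
A minimal sketch of an indexing step that matches this description: documents are parsed into tokens and every token, including stop words, goes into the inverted index. The data structures are illustrative, not Océ's implementation.

  import re
  from collections import defaultdict

  def build_index(docs):
      # docs: {doc_id: text}; returns {term: {doc_id: term_frequency}}
      index = defaultdict(lambda: defaultdict(int))
      for doc_id, text in docs.items():
          for token in re.findall(r"\w+", text.lower()):  # parsing only, no stop word removal
              index[token][doc_id] += 1
      return index
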

  9. Ranking functions • BM25: k1 & b parameters set to the best match for the 2002 Dutch data • probabilistic: urn model, coordination level ranking • statistical: a set of clues, degree of significance
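
For reference, the Okapi BM25 formula that the k1 and b parameters belong to can be written down as follows. The defaults shown (k1=1.2, b=0.75) are the usual textbook values, not necessarily the ones Océ found to be the best match for the 2002 Dutch data, and BM25 has several minor variants.

  import math

  def bm25(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
      # doc_tf: term frequencies of one document; df: document frequency per term
      score = 0.0
      for term in query_terms:
          tf = doc_tf.get(term, 0)
          if tf == 0 or term not in df:
              continue
          idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
          score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
      return score
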

  10. Results

  11. Conclusions • the BM25 model outperforms the probabilistic one • mathematical correctness is not the best guideline • for a better retrieval model, ‘knowledge’ about the data collection, the topics and the assessments is needed

  12. Remark on evaluation measures • each run submits the top T=1000 docs; only the top N are read; with M participants there are at most N*M relevance judgements per query • Dutch data from 2001: 1224 relevant documents for 50 queries => about 25 per query; 16774 relevance judgements for 50 queries => about 335 per query • about 60-70% of the docs in the top 1000: unknown ?! = irrelevant ?! • a proposal: read all T docs (T=100? 200?)
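
The arithmetic on this slide checks out; a few lines reproduce the per-query figures and the unjudged fraction of a 1000-document run (the counts are the ones quoted above).

  T, queries = 1000, 50
  relevant, judged = 1224, 16774       # Dutch 2001 data

  print(relevant / queries)            # ~24.5 relevant documents per query
  print(judged / queries)              # ~335 relevance judgements per query
  print(1 - judged / queries / T)      # ~0.66, i.e. roughly 60-70% of a top-1000 run is unjudged
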
