Océ at CLEF 2003
Roel Brand, Marvin Brünner, Samuel Driessen, Jakob Klok, Pascha Iljin
Outline
• Océ mission
• Participation in 2001, 2002
• Participation in 2003: three models
• Results
• Conclusions
• Remark on evaluation measures
Mission: To enable people to share information by offering products and services for the reproduction, presentation, distribution and management of documents.
Océ-Technologies B.V.
• active in approximately 80 countries
• 23,000 people worldwide
• Research: more than 2,000 employees
Participation in 2001, 2002
• 2001: Dutch mono-lingual task
• 2002: all mono-lingual tasks, several cross-lingual tasks, and the multi-lingual task
Participation in 2003
• mono-lingual tasks
• three ranking models: BM25, probabilistic, statistical
Query construction
• BM25 and probabilistic models: topic (title + description) → parsing → stop word removal → query
• statistical model: topic (title + description) → parsing → stop word removal + compound splitting and morphological variations → query
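A minimal Python sketch of this pipeline, assuming a toy stop word list and a hypothetical compound-split table (neither reflects Océ's actual resources):

STOP_WORDS = {"de", "het", "een", "en", "van"}  # tiny Dutch sample, illustration only
COMPOUND_SPLITS = {"voetbalclub": ["voetbal", "club"]}  # hypothetical split table

def parse(title: str, description: str) -> list[str]:
    # concatenate title + description, lowercase, keep alphabetic tokens
    text = (title + " " + description).lower()
    return [tok for tok in text.split() if tok.isalpha()]

def build_query(title: str, description: str, statistical: bool = False) -> list[str]:
    terms = [t for t in parse(title, description) if t not in STOP_WORDS]
    if statistical:
        # the statistical model additionally splits compounds; handling of
        # morphological variations would slot in here as well
        terms = [part for t in terms for part in COMPOUND_SPLITS.get(t, [t])]
    return terms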
Indexing
• parsing
• stop words are not removed
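A minimal sketch of such an index, with parsing reduced to lowercasing and whitespace tokenization for illustration:

from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # note: stop words are deliberately kept, as stated above
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index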
Ranking functions
• BM25: the k1 and b parameters set to the best match for the 2002 Dutch data
• probabilistic: urn model, coordination level ranking
• statistical: a set of clues, each with a degree of significance
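For reference, a self-contained sketch of the standard Okapi BM25 scoring function with its k1 and b parameters; the defaults below are the conventional ones, not the values Océ tuned on the 2002 Dutch data:

import math

def bm25(query_terms, tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    # tf: term -> frequency in this document; df: term -> document frequency
    score = 0.0
    for term in query_terms:
        f = tf.get(term, 0)
        if f == 0:
            continue
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        norm = f + k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * f * (k1 + 1) / norm
    return score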
Conclusions
• the BM25 model outperforms the probabilistic one
• mathematical correctness is not the best guideline
• a better retrieval model needs 'knowledge' about the data collection, the topics and the assessments
Remark on evaluation measures
• pooling: each run submits its top T = 1000 docs; only the top N are read; with M participants there are at most N*M relevance judgements per query
• Dutch data from 2001: 1224 relevant documents for 50 queries => about 25 per query
• 16774 relevance judgements for 50 queries => about 335 per query
• about 60-70% of the docs in each top 1000 are unjudged: unknown ?! treated as irrelevant ?!
• a proposal: read all T docs (T = 100? 200?)
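A back-of-the-envelope check of the figures quoted above, assuming the per-query numbers are simple averages:

T = 1000          # pool depth per submitted run
queries = 50
relevant = 1224   # relevant docs for the 50 Dutch queries (2001)
judged = 16774    # relevance judgements for the same queries

print(round(relevant / queries))   # ~25 relevant docs per query
print(round(judged / queries))     # ~335 judgements per query
print(1 - judged / queries / T)    # ~0.66: share of each top 1000 never judged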