Océ at CLEF 2003
Roel Brand, Marvin Brünner, Samuel Driessen, Jakob Klok, Pascha Iljin
Outline
• Océ mission
• Participation in 2001, 2002
• Participation in 2003: three models
• Results
• Conclusions
• Remark on evaluation measures
Mission: To enable people to share information by offering products and services for the reproduction, presentation, distribution and management of documents.
Océ-Technologies B.V.
• active in approximately 80 countries
• 23,000 people worldwide
• Research: more than 2,000 employees
Participation in 2001, 2002
• 2001: Dutch mono-lingual task
• 2002: all mono-lingual tasks, several cross-lingual tasks, and the multi-lingual task
Participation in 2003
• mono-lingual tasks
• three ranking models: BM25, probabilistic, statistical
Query construction
• BM25 and probabilistic models: topic (title + description) → parsing → stop word removal → query
• statistical model: topic (title + description) → parsing → stop word removal + compound splitting and morphological variations → query
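A minimal Python sketch of this pipeline, assuming a toy stop word list and a hypothetical compound-split table (neither reflects Océ's actual resources):

STOP_WORDS = {"de", "het", "een", "en", "van"}  # tiny Dutch sample, illustration only
COMPOUND_SPLITS = {"voetbalclub": ["voetbal", "club"]}  # hypothetical split table

def parse(title: str, description: str) -> list[str]:
    # concatenate title + description, lowercase, keep alphabetic tokens
    text = (title + " " + description).lower()
    return [tok for tok in text.split() if tok.isalpha()]

def build_query(title: str, description: str, statistical: bool = False) -> list[str]:
    terms = [t for t in parse(title, description) if t not in STOP_WORDS]
    if statistical:
        # the statistical model additionally splits compounds; handling of
        # morphological variations would slot in here as well
        terms = [part for t in terms for part in COMPOUND_SPLITS.get(t, [t])]
    return terms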
Indexing
• parsing
• stop words are not removed
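A minimal sketch of such an index, with parsing reduced to lowercasing and whitespace tokenization for illustration:

from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # note: stop words are deliberately kept, as stated above
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index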
Ranking functions
• BM25: the k1 and b parameters set to the best match for the 2002 Dutch data
• probabilistic: urn model, coordination level ranking
• statistical: a set of clues, each with a degree of significance
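For reference, a self-contained sketch of the standard Okapi BM25 scoring function with its k1 and b parameters; the defaults below are the conventional ones, not the values Océ tuned on the 2002 Dutch data:

import math

def bm25(query_terms, tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    # tf: term -> frequency in this document; df: term -> document frequency
    score = 0.0
    for term in query_terms:
        f = tf.get(term, 0)
        if f == 0:
            continue
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        norm = f + k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * f * (k1 + 1) / norm
    return score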
Conclusions
• the BM25 model outperforms the probabilistic one
• mathematical correctness is not the best guideline
• a better retrieval model needs 'knowledge' about the data collection, the topics and the assessments
Remark on evaluation measures
• pooling: each run submits its top T = 1000 docs; only the top N are read; with M participants there are at most N*M relevance judgements per query
• Dutch data from 2001: 1224 relevant documents for 50 queries => about 25 per query
• 16774 relevance judgements for 50 queries => about 335 per query
• about 60-70% of the docs in each top 1000 are unjudged: unknown ?! treated as irrelevant ?!
• a proposal: read all T docs (T = 100? 200?)
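A back-of-the-envelope check of the figures quoted above, assuming the per-query numbers are simple averages:

T = 1000          # pool depth per submitted run
queries = 50
relevant = 1224   # relevant docs for the 50 Dutch queries (2001)
judged = 16774    # relevance judgements for the same queries

print(round(relevant / queries))   # ~25 relevant docs per query
print(round(judged / queries))     # ~335 judgements per query
print(1 - judged / queries / T)    # ~0.66: share of each top 1000 never judged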