CLEF 2008 Multilingual Question Answering Track • UNED: Anselmo Peñas, Valentín Sama, Álvaro Rodrigo • CELCT: Danilo Giampiccolo, Pamela Forner
QA 2008 Task and Exercises • QA Main task (6th edition) • Pilot: QA WSD, English newswire collections with Word Sense Disambiguation • Answer Validation Exercise – AVE (3rd edition) • QA on Speech Transcripts – QAST (2nd edition)
Main Task QA 2008 Organizing Committee • CELCT (D. Giampiccolo, P. Forner): Italian • UNED (A. Peñas): Spanish • U. Groningen (G. Bouma): Dutch • U. Limerick (R. Sutcliffe): English • DFKI (B. Sacaleanu): German • ELDA/ELRA (N. Moreau): French • Linguateca (P. Rocha): Portuguese • Bulgarian Academy of Sciences (P. Osenova): Bulgarian • IASI (C. Forascu): Romanian • U. Basque Country (I. Alegria): Basque • ILSP (P. Prokopidis): Greek
200 questions • FACTOID • (loc, mea, org, per, tim, cnt, obj, oth) • DEFINITION • (per, org, obj, oth) • CLOSED LIST • Who were the components of The Beatles? • Who were the last three presidents of Italy? • LINKED QUESTIONS • Who was called the “Iron-Chancellor”? • When was he born? • Who was his first wife? • Temporal restrictions: by date, by period, by event • NIL questions (no known answer in the collection)
43 Activated Language Combinations (at least one registered participant)
List of Participants (random order) Bulgaria
Groups per year and target collection • Natural selection? • Task change • Above 20 groups
2008 participation: comparative evaluation? • Weakness from an evaluation perspective: 4 languages with no comparison between different groups • Breakout session
Results depend on question type • Definitions • Almost solved for several systems (80%-95%) • Factoids • 50%-65% for several systems • Temporal restrictions • Same level of difficulty as factoids for some systems • Closed lists • Still very difficult • Linked questions • Still very difficult • Wikipedia now provides more answers
Conclusion • Same task as 2007 • Same level of participation (slightly better) • 11 target languages (9 with participation) • 43 activated subtasks • 21 participants • 51 runs • Same results (slightly better)
Future direction • Fewer participants per language • Poor comparison • Change methodology: one task for all • Criticisms of QA over Wikipedia • Easier to find questions with IR • No user model • Change collection • QA proposal for 2009 • SC and breakout