ResPubliQA 2010: QA on European Legislation
Anselmo Peñas, UNED, Spain
Pamela Forner, CELCT, Italy
Richard Sutcliffe, U. Limerick, Ireland
Alvaro Rodrigo, UNED, Spain
http://celct.isti.cnr.it/ResPubliQA/
Outline
• The Multiple Language Question Answering Track at CLEF – a bit of History
• ResPubliQA this year
• What is new
• Participation, Runs and Languages
• Assessment and Metrics
• Results
• Conclusions
ResPubliQA 2010, 22 September, Padua, Italy
Multiple Language Question Answering at CLEF
Started in 2003: eighth year
• Era I: 2003-2006 – Ungrouped, mainly factoid questions asked against monolingual newspapers; exact answers returned
• Era II: 2007-2008 – Grouped questions asked against newspapers and Wikipedia; exact answers returned
• Era III: 2009-2010 – ResPubliQA: ungrouped questions against multilingual, parallel-aligned EU legislative documents; passages returned
ResPubliQA 2010 – Second Year
But also some novelties…
• Key points:
• same set of questions in all languages
• same document collections: parallel-aligned documents
• Same objectives:
• to move towards a domain of potential users
• to allow the direct comparison of performances across languages
• to allow QA technologies to be evaluated against IR approaches
• to promote the use of validation technologies
What’s new
• New task (Answer Selection)
• New document collection (EuroParl)
• New question types
• Automatic evaluation
The Tasks (NEW)
• Paragraph Selection (PS): extract a relevant paragraph of text that completely satisfies the information need expressed by a natural language question
• Answer Selection (AS): demarcate the shorter string of text corresponding to the exact answer, supported by the entire paragraph
The Collections (NEW)
• Subset of JRC-Acquis (10,700 docs per language)
• EU treaties, EU legislation, agreements and resolutions
• Between 1950 and 2006
• Parallel-aligned at the document level (not always at the paragraph level)
• XML-TEI.2 encoding
• Small subset of EUROPARL (~150 docs per language)
• Proceedings of the European Parliament
• Translations into Romanian from January 2009
• Debates (CRE) from 2009 and Texts Adopted (TA) from 2007
• Parallel-aligned at the document level (not always at the paragraph level)
• XML encoding
EuroParl Collection
The specific fragments of JRC-Acquis and EuroParl used by ResPubliQA are available at http://celct.isti.cnr.it/ResPubliQA/Downloads
• compatible with the Acquis domain
• widens the scope of the questions
• Unfortunately:
• small number of texts
• documents are not fully translated
Questions
• Two new question categories:
• OPINION: What did the Council think about the terrorist attacks on London?
• OTHER: What is the e-Content program about?
• Reason and Purpose categories merged together: Why was Perwiz Kambakhsh sentenced to death?
• Plus the existing Factoid, Definition, and Procedure categories
ResPubliQA Campaigns
More participants and more submissions
ResPubliQA 2010 Participants
• 13 participants
• 8 countries
• 4 new participants
Submissions by Task and Language
System Output
• Two options:
• give an answer (paragraph or exact answer)
• return NOA as the response = no answer is given; the system is not confident about the correctness of its answer
• Objective:
• avoid returning an incorrect answer
• reduce only the portion of wrong answers
Evaluation Measure

c@1 = (nR + nU · (nR / n)) / n

• nR: number of questions correctly answered
• nU: number of questions unanswered
• n: total number of questions (200 this year)

If nU = 0, then c@1 = nR/n, i.e. plain accuracy.
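The c@1 computation above can be sketched in a few lines; the function name is ours, but the arithmetic follows the definition on this slide:

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """c@1 credits unanswered questions in proportion to the
    system's accuracy on the questions it did answer, so declining
    to answer beats answering wrongly."""
    accuracy = n_correct / n_total
    return (n_correct + n_unanswered * accuracy) / n_total

# With no unanswered questions, c@1 reduces to plain accuracy:
# c_at_1(100, 0, 200) == 100 / 200
```

Leaving 40 of 200 questions unanswered with 80 correct, for instance, yields c@1 = (80 + 40 · 0.4) / 200 = 0.48, higher than the 0.40 a system would score by answering all 200 with the same 80 correct.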
Assessment
31% of the answers were automatically marked as correct
Two steps:
• Automatic evaluation
• responses automatically compared against the manually produced Gold Standard
• answers that exactly match the Gold Standard are given the correct value (R)
• correctness requires an exact match of the document identifier, the paragraph identifier, and the text retrieved by the system against the Gold Standard
• Manual assessment
• non-matching paragraphs/answers judged by human assessors
• anonymous and simultaneous for the same question
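The automatic step above amounts to an exact-match filter that sends everything else to the human assessors. A minimal sketch, assuming a simple dictionary representation (the field names are illustrative, not the campaign's actual submission schema):

```python
def auto_evaluate(response: dict, gold: dict):
    """Mark a response Right ("R") only if document id, paragraph id,
    and retrieved text all exactly match the gold-standard entry.
    Anything else is left undecided (None) for manual assessment.
    Field names here are hypothetical."""
    exact = all(response[key] == gold[key]
                for key in ("doc_id", "par_id", "text"))
    return "R" if exact else None
```

Requiring all three fields to match keeps the automatic judgment conservative: a paragraph that is correct but retrieved from a different (parallel) document still goes to a human rather than being auto-rejected.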
Assessment for Paragraph Selection (PS)
• Binary assessment:
• Right (R)
• Wrong (W)
• NOA answers:
• automatically filtered and marked as U (Unanswered)
• discarded candidate answers were also evaluated:
• NoA R: NoA, but the candidate answer was correct
• NoA W: NoA, and the candidate answer was incorrect
• NoA Empty: NoA, and no candidate answer was given
• Evaluators were guided by the initial “gold” paragraph, but only as a hint
Assessment for Answer Selection (AS)
• R (Right): the answer string is an exact, correct answer supported by the returned paragraph
• X (ineXact): the answer string contains either part of a correct answer present in the returned paragraph, or all of the correct answer plus unnecessary additional text
• M (Missed): the answer string does not contain a correct answer even in part, but the returned paragraph does contain one
• W (Wrong): the answer string does not contain a correct answer, and neither does the returned paragraph; or it contains an unsupported answer
Monolingual Results for PS
Improvement in Performance
Monolingual PS Task:
Cross-language Results for PS
• In comparison to ResPubliQA 2009:
• more cross-language runs (+2)
• improvement in the best performance: from c@1 = 0.18 to 0.36
Results for the AS Task
Conclusions
• Successful continuation of ResPubliQA 2009
• AS task: few groups and poor results
• Overall improvement of results
• New document collection and new question types
• The c@1 evaluation metric encourages the use of a validation module
More on System Analyses and Approaches
MLQA’10 Workshop on Wednesday, 14:30 – 18:00