AQA: a multilingual Anaphora annotation scheme for Question Answering

CBA 2008 Corpus-Based Approaches to Coreference Resolution in Romance Languages. AQA: a multilingual Anaphora annotation scheme for Question Answering. E. Boldrini, P. Martínez Barco, B. Navarro Colorado, M. Puchol Blasco, C. Vargas Sierra. [eboldrini/patricio/borja/marcel/]@dlsi.ua.es

Presentation Transcript


  1. CBA 2008 Corpus-Based Approaches to Coreference Resolution in Romance Languages AQA: a multilingual Anaphora annotation scheme for Question Answering • E. Boldrini, P. Martínez Barco, B. Navarro Colorado, M. Puchol Blasco, C. Vargas Sierra [eboldrini/patricio/borja/marcel/]@dlsi.ua.es chelo.vargas@ua.es

  2. Outline
      • Introduction
      • Corpus
      • Principles
      • Previous work
      • Problematic cases
      • Evaluation
      • Conclusion

  3. Introduction: interaction
      • AQA: a multilingual annotation scheme for anaphora resolution that can be used in machine learning to improve QA systems
      • To understand and annotate the way anaphora is used in each language
      • To be able to detect the antecedent of each anaphora and find the correct answer
      • INTERACTION between the user and the system

  5. Introduction: languages
      • Languages: Italian, Spanish, English
      • Advantages: the system can take part successfully in competitions where the question is asked in one language and the answer is returned in another
      • Disadvantages: the languages have different characteristics
      <t>
        <q id="q065"> ¿Qué medio de transporte se utilizó en la Expedición Kon-tiki? </q>
        <q id="q066"> ¿Cuántas personas <link rel="dir" status="ok" type="pron" ref="" ant="a" refq="q065">la</link> tripulaban? </q>
      </t>
      <t>
        <q id="q265"> Quale mezzo di trasporto venne usato nella spedizione Kon-Tiki? </q>
        <q id="q266"> Quanti membri d'equipaggio aveva <link rel="dir" status="ok" type="elips" ref="" ant="a" refq="q265">0</link>? </q>
      </t>
      <t>
        <q id="q465"> What transport was used in the Kon-Tiki Expedition? </q>
        <q id="q466"> How many people crewed <link rel="dir" status="no" type="pron" ref="" ant="q" refq="q465">it</link>? </q>
      </t>

  6. Corpus
      • Corpus for CLEF 2008 in English, Italian and Spanish
      • 200 questions per language
      • Topic-related questions
      • Categories of questions: factoid, definition, and list

  8. Principles: annotated elements
      • Each group has a topic
      <t>
        <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
        <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
        <subt>
          <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
        </subt>
      </t>

  9. Principles: annotated elements
      • If there is a subtopic, we mark it
      <t>
        <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
        <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
        <subt>
          <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
        </subt>
      </t>

  11. Principles: annotated elements
      • Each question (question/answer pair) has a number
      <t>
        <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
        <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
        <subt>
          <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
        </subt>
      </t>

  12. Principles: annotated elements
      • Each anaphora has a number, the same as that of its antecedent
      <t>
        <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
        <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
        <subt>
          <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
        </subt>
      </t>
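
The shared number can be followed mechanically from an anaphora to its antecedent. The following is a minimal sketch, not the authors' tooling: it assumes the annotation is well-formed XML, and the names TOPIC and resolve_links are hypothetical. It pairs each link element with the de element whose id matches the link's ref attribute.

```python
import xml.etree.ElementTree as ET

# Annotated topic copied from the slides (whitespace trimmed).
TOPIC = """<t>
<q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q>
<q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q>
<subt>
<q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q>
</subt>
</t>"""


def resolve_links(topic_xml):
    """Return (anaphora text, antecedent text) pairs in document order.

    Hypothetical helper: the antecedent is None when ref is empty,
    which the slides use when the antecedent is in an answer.
    """
    root = ET.fromstring(topic_xml)
    # Map each discourse entity id (e.g. "n16") to its surface text.
    entities = {de.get("id"): "".join(de.itertext()) for de in root.iter("de")}
    pairs = []
    for link in root.iter("link"):
        anaphora = "".join(link.itertext())
        pairs.append((anaphora, entities.get(link.get("ref"))))
    return pairs


pairs = resolve_links(TOPIC)
```

Here pairs links "this battle" to "the battle of Brunete" and "she" to "Gerda Taro".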

  14. Principles: annotated elements
      • We indicate whether the antecedent is in the question or in the answer
      <t>
        <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
        <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
        <subt>
          <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
        </subt>
      </t>

  15. Principles: annotated elements
      • We indicate whether the antecedent is in the question or in the answer
      <t>
        <q id="q482"> Which city is the headquarters of the China's Eastern Fleet? </q>
        <q id="q483"> How far from China's capital city is <link rel="dir" status="ok" ant="a" refq="q482" type="pron" ref="">it</link>? </q>
        <q id="q484"> What was <link rel="indir" status="ok" ant="a" refq="q482" type="dd" ref="">its population</link> in 2002? </q>
      </t>
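
A QA system would need to treat the two cases differently: an antecedent in a question can be read from the text, while one in an answer must first be retrieved. The sketch below is only an illustration under that assumption. The function name antecedent_sources is hypothetical, and the reading of ant="a" as "answer" and ant="q" as "question" is taken from the examples on these slides.

```python
import xml.etree.ElementTree as ET

# Annotated topic copied from the slides (whitespace trimmed).
TOPIC = """<t>
<q id="q482">Which city is the headquarters of the China's Eastern Fleet?</q>
<q id="q483">How far from China's capital city is <link rel="dir" status="ok" ant="a" refq="q482" type="pron" ref="">it</link>?</q>
<q id="q484">What was <link rel="indir" status="ok" ant="a" refq="q482" type="dd" ref="">its population</link> in 2002?</q>
</t>"""


def antecedent_sources(topic_xml):
    """List (anaphora text, antecedent location, question id) per link."""
    root = ET.fromstring(topic_xml)
    sources = []
    for link in root.iter("link"):
        # ant="a": antecedent is in the answer to question refq;
        # any other value is treated here as "in the question".
        where = "answer" if link.get("ant") == "a" else "question"
        sources.append(("".join(link.itertext()), where, link.get("refq")))
    return sources


sources = antecedent_sources(TOPIC)
```

Both anaphoras in this topic point to the answer of q482, so a system would have to answer q482 before it could interpret q483 or q484.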

  17. Principles: annotated elements
      • We indicate the number of the question or answer in which the antecedent is located
      <t>
        <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
        <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
        <subt>
          <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
        </subt>
      </t>

  19. Principles: annotated elements
      • We select the type of anaphora
      <t>
        <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
        <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
        <subt>
          <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
        </subt>
      </t>

  20. Principles: annotated elements
      • We select the type of anaphora
      <t>
        <q id="q453"> In which country is <de id="n28">the Colditz Castle</de>? </q>
        <q id="q454"> Exactly in which state is <link rel="dir" status="ok" type="pron" ref="n28" ant="q" refq="q453">it</link>? </q>
        <q id="q455"> Who was the first who escaped from <link rel="dir" status="ok" type="adv" ref="n28" ant="q" refq="q453">there</link> ? </q>
      </t>

  21. Principles: annotated elements
      • We select the type of anaphora
      <t>
        <q id="q412"> Who published the Evangelium Vitae <de id="n6">encyclical</de>? </q>
        <q id="q413"> How many <link rel="dir" status="ok" ant="q" refq="q412" type="elips" ref="n6">0</link> did <link rel="dir" status="ok" ant="a" refq="q412" type="pron" ref="">he</link> publish? </q>
      </t>

  23. Principles: annotated elements
      • We select the type of relation
      <t>
        <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
        <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
        <subt>
          <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
        </subt>
      </t>

  24. Principles: annotated elements
      • We select the type of relation
      <t>
        <q id="q416"> Which islands are in <de id="n9">the Pelagie Islands</de>? </q>
        <q id="q417"> Which is <link rel="indir" status="ok" type="dd" ref="n9" ant="q" refq="q416">the biggest one</link>? </q>
      </t>

  26. Principles: annotated elements
      • We indicate whether or not the annotator has doubts
      <t>
        <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
        <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
        <subt>
          <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
        </subt>
      </t>
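
Taken together, the principles give every link element a small, fixed set of attributes, which makes an annotated file easy to sanity-check. The sketch below is an assumption-laden illustration, not an official AQA schema: the value sets in OBSERVED are only those that appear in the examples on these slides, and check_links is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Attribute values seen in the slide examples only; the real scheme
# may allow more (this is an assumption, not a published schema).
OBSERVED = {
    "rel": {"dir", "indir"},
    "status": {"ok", "no"},
    "type": {"pron", "dd", "elips", "adv"},
    "ant": {"q", "a"},
}


def check_links(topic_xml):
    """Return a list of 'attribute=value' strings that look suspicious."""
    root = ET.fromstring(topic_xml)
    question_ids = {q.get("id") for q in root.iter("q")}
    problems = []
    for link in root.iter("link"):
        for attr, allowed in OBSERVED.items():
            value = link.get(attr)
            if value not in allowed:
                problems.append(f"{attr}={value!r}")
        # refq should name a question that exists in the same topic.
        if link.get("refq") not in question_ids:
            problems.append(f"refq={link.get('refq')!r}")
    return problems


GOOD = (
    '<t><q id="q453">In which country is <de id="n28">the Colditz Castle</de>?</q>'
    '<q id="q454">Exactly in which state is <link rel="dir" status="ok" '
    'type="pron" ref="n28" ant="q" refq="q453">it</link>?</q></t>'
)
# A deliberately broken copy, invented here to exercise the check.
BAD = GOOD.replace('rel="dir"', 'rel="direct"')
```

A check of this kind would also catch typographic quotes around attribute values, because the XML parser rejects them before any attribute is inspected.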

  27. Previous work
      • UCREL (Fligelstone, 1992; Garside et al., 1997): first scheme for anaphora resolution
      • MUC: inclusion of the coreference task in MUC-6 and MUC-7
      • Last decade of the 20th century: anaphora resolution project for French (Popescu-Belis and Robba, 1997)
      • Martínez-Barco and Palomar (2001): an annotation scheme for dialogues applied to an anaphora resolution algorithm
      • MATE/GNOME (Poesio, 2004): meta-model

  28. Previous work: what we added
      • MATE/GNOME (Poesio, 2004): meta-model
      • A link element in the text that carries the information about the anaphora
      • Identification of the question/answer pair
      • Topic/subtopic
      • Antecedent in the question or in the answer
      • Status of the annotation
      • Applied to three languages
      • Applied to collections of questions

  29. Problematic cases
      • World knowledge
      • An antecedent contains another one
      • Collective nouns
      • Two antecedents, but separated
      • Doubtful position of the antecedent
      • An anaphora inside a discourse entity

  30. Problematic cases
      • World knowledge
      <t>
        <q id="q404"> Which was <de id="n2">the "gordo" in the 1995 Christmas</de>? </q>
        <q id="q405"> Which was <link rel="indir" status="no" type="dd" ref="n2" ant="q" refq="q404">the prize</link>? </q>
      </t>
      • An antecedent contains another one
      • Collective nouns
      • Two antecedents, but separated
      • Doubtful position of the antecedent
      • An anaphora inside a discourse entity

  31. Problematic cases
      • World knowledge
      • An antecedent contains another one
      <t>
        <q id="q427"> Who were <de id="n14">the founders of <de id="n15">Magnum Photos</de></de>? </q>
        <q id="q428"> In what year did <link rel="dir" status="ok" ant="q" refq="q427" type="pron" ref="n14">they</link> found <link rel="dir" status="ok" type="pron" ref="n15" ant="q" refq="q427">it</link>? </q>
      </t>
      • Collective nouns
      • Two antecedents, but separated
      • Doubtful position of the antecedent
      • An anaphora inside a discourse entity
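
When one antecedent contains another, the de elements are nested, so the text of the outer entity has to include the text of the inner one. A minimal sketch of how a consumer of the annotation might recover both strings follows. The helper name entity_texts is hypothetical, and well-formed XML is assumed.

```python
import xml.etree.ElementTree as ET

# Question q427 from the slides: n15 is nested inside n14.
Q427 = (
    '<q id="q427">Who were <de id="n14">the founders of '
    '<de id="n15">Magnum Photos</de></de>?</q>'
)


def entity_texts(question_xml):
    """Map each discourse entity id to its full surface text."""
    root = ET.fromstring(question_xml)
    # itertext() walks into child elements, so the outer entity's
    # text includes the nested entity's text.
    return {de.get("id"): "".join(de.itertext()) for de in root.iter("de")}


texts = entity_texts(Q427)
```

With this reading, "they" (ref n14) resolves to "the founders of Magnum Photos" and "it" (ref n15) resolves to "Magnum Photos".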

  32. Problematic cases
      • World knowledge
      • An antecedent contains another one
      <t>
        <q id="q432"> What is <de id="n18">the starring cast</de> of the film Beetlejuice? </q>
        <q id="q433"> Who of <link rel="dir" status="ok" type="pron" ref="n18" ant="q" refq="q432">them</link> is the main character? </q>
      </t>
      • Collective nouns
      • Two antecedents, but separated
      • Doubtful position of the antecedent
      • An anaphora inside a discourse entity

  33. Problematic cases
      <t>
        <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
        <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
        <subt>
          <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
        </subt>
      </t>
      • World knowledge
      • An antecedent contains another one
      • Collective nouns
      • Two antecedents, but separated
      • Doubtful position of the antecedent
      • An anaphora inside a discourse entity

  34. Problematic cases
      • World knowledge
      • An antecedent contains another one
      • Collective nouns
      • Two antecedents, but separated
      <t>
        <q id="q465"> What transport was used in the Kon-Tiki Expedition? </q>
        <q id="q466"> How many people crewed <link rel="dir" status="no" type="pron" ref="" ant="q" refq="q465">it</link>? </q>
      </t>
      • Doubtful position of the antecedent
      • An anaphora inside a discourse entity

  35. Problematic cases
      • World knowledge
      • An antecedent contains another one
      • Collective nouns
      • Two antecedents, but separated
      • Doubtful position of the antecedent
      <t>
        <q id="q434"> What is <de id="n19">a censer</de> ? </q>
        <q id="q435"> What name is given to <de id="n20"><link rel="dir" status="no" type="pron" ref="n19" ant="q" refq="q434">the one</link> of the Cathedral of Santiago de Compostela </de>? </q>
        <q id="q436"> How much does <link rel="dir" status="ok" type="pron" ref="n20" ant="q" refq="q434">it</link> weight? </q>
      </t>
      • An anaphora inside a discourse entity

  36. Evaluation
      • Annotation
          • 2 annotators
          • Blind annotation
      • Evaluation
          • Each language independently
          • Global results

  37. Evaluation: subdivision
      • Topic boundary
      • Anaphora detection
      • Anaphora attributes
      • Antecedent recognition

  38. Evaluation: topic boundary
      • Class N: new topic
      • Class S: same topic

  39. Evaluation: anaphora detection

  40. Evaluation: anaphora attributes (antecedent)

  41. Evaluation: anaphora attributes (type)

  42. Evaluation: anaphora attributes (relation)
      • Dir: direct relation
      • Indir: bridging relation

  43. Evaluation: antecedent recognition

  44. Evaluation: global results
      • Total agreement results
          • Spanish: 60/70 = 0.857
          • Italian: 60/69 = 0.869
          • English: 59/67 = 0.880
          • Average: 0.868
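
The agreement figures can be reproduced from the counts on the slide. The sketch below assumes that the reported values are truncated, not rounded, to three decimals, and that the average is taken over the truncated per-language values. Both are inferences from the numbers shown, not something the slides state, and the names COUNTS and truncate are hypothetical.

```python
import math

# (agreed items, total items) per language, as shown on the slide.
COUNTS = {"Spanish": (60, 70), "Italian": (60, 69), "English": (59, 67)}


def truncate(value, places=3):
    """Cut a non-negative number to a fixed number of decimals, no rounding."""
    factor = 10 ** places
    return math.floor(value * factor) / factor


# Per-language agreement ratio, truncated to three decimals.
ratios = {lang: truncate(agreed / total) for lang, (agreed, total) in COUNTS.items()}

# Unweighted mean of the three truncated ratios, truncated again.
average = truncate(sum(ratios.values()) / len(ratios))
```

With rounding instead of truncation the Italian and English figures would come out as 0.870 and 0.881, which is why truncation is assumed here.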

  45. Conclusion
      • Multilingual annotation scheme for anaphora resolution
      • For the improvement of QA systems: the system can detect the antecedent of each anaphora and extract the correct answer
      • For a true interaction between the system and the user
      • Simple but complete
      • Positive results of the evaluation

  46. Future work
      • Integration of other languages
      • Application of the annotation scheme to other corpora

  47. Evaluation: measure used
      • Kappa
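
The slides name only "Kappa" as the measure. With two annotators, a common choice is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The sketch below assumes that variant. The function name cohen_kappa and the N/S label sequences are invented for illustration and are not data from the evaluation.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented topic-boundary labels (N = new topic, S = same topic).
annotator_1 = ["N", "S", "S", "N", "S", "S", "N", "S"]
annotator_2 = ["N", "S", "S", "N", "S", "N", "N", "S"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this invented example the annotators agree on 7 of 8 items, chance agreement is 0.5, and kappa is 0.75.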
