CBA 2008 Corpus-Based Approaches to Coreference Resolution in Romance Languages. AQA: a multilingual Anaphora annotation scheme for Question Answering. E. Boldrini, P. Martínez Barco, B. Navarro Colorado, M. Puchol Blasco, C. Vargas Sierra. [eboldrini/patricio/borja/marcel/]@dlsi.ua.es
CBA 2008 Corpus-Based Approaches to Coreference Resolution in Romance Languages AQA: a multilingual Anaphora annotation scheme for Question Answering • E. Boldrini, P. Martínez Barco, B. Navarro Colorado, M. Puchol Blasco, C. Vargas Sierra [eboldrini/patricio/borja/marcel/]@dlsi.ua.es chelo.vargas@ua.es
Outline
• Introduction
• Corpus
• Principles
• Previous work
• Problematic cases
• Evaluation
• Conclusion
Introduction: interaction
• AQA: a multilingual annotation scheme for anaphora resolution that can be applied in machine learning to improve QA systems
• To understand and annotate the way anaphora is used in each language
• To be able to detect the antecedent of each anaphora and find the correct answer
• INTERACTION between the user and the system
Introduction: languages
• Languages: Italian, Spanish, English
• Advantages: makes it possible to participate in competitions in which the question is formulated in one language and the system returns the answer in another
• Disadvantages: the languages have different characteristics
<t>
  <q id="q065"> ¿Qué medio de transporte se utilizó en la Expedición Kon-tiki? </q>
  <q id="q066"> ¿Cuántas personas <link rel="dir" status="ok" type="pron" ref="" ant="a" refq="q065">la</link> tripulaban? </q>
</t>
<t>
  <q id="q265"> Quale mezzo di trasporto venne usato nella spedizione Kon-Tiki? </q>
  <q id="q266"> Quanti membri d'equipaggio aveva <link rel="dir" status="ok" type="elips" ref="" ant="a" refq="q265">0</link>? </q>
</t>
<t>
  <q id="q465"> What transport was used in the Kon-Tiki Expedition? </q>
  <q id="q466"> How many people crewed <link rel="dir" status="no" type="pron" ref="" ant="q" refq="q465">it</link>? </q>
</t>
Corpus
• Corpus for CLEF 2008 in English, Italian and Spanish
• 200 questions per language
• Topic-related questions
• Categories of questions: factoid, definition, and list
Principles: annotated elements
• Each group has a topic
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
Principles: annotated elements
• If there is a subtopic, we mark it
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
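The topic/subtopic nesting shown above can be read back with a standard XML parser. The following is only a minimal sketch: the sample is the example from the slide, while the function name `questions_by_level` is hypothetical and not part of the AQA scheme.

```python
import xml.etree.ElementTree as ET

# Example topic group copied from the slide; the parsing code is illustrative only.
SAMPLE = """
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
"""


def questions_by_level(topic):
    """Return (topic question ids, subtopic question ids) for one <t> element."""
    top = [q.get("id") for q in topic.findall("q")]       # direct children of <t>
    sub = [q.get("id") for q in topic.findall("subt/q")]  # questions inside <subt>
    return top, sub


topic = ET.fromstring(SAMPLE.strip())
top_ids, sub_ids = questions_by_level(topic)
print("topic:", top_ids)
print("subtopic:", sub_ids)
```

Here q429 and q430 belong to the main topic, and q431 is the only question inside the subtopic.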
Principles: annotated elements
• Each question (question/answer pair) has a number
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
Principles: annotated elements
• Each anaphora has a number, the same as that of its antecedent
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
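Because the `ref` value of a `link` matches the `id` of a `de` element, an anaphor can be paired with its antecedent by a simple lookup. The sketch below assumes exactly that reading of the example; `resolve_links` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Example topic group copied from the slide; the lookup logic is an illustration.
SAMPLE = """
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
"""


def resolve_links(topic):
    """Pair each anaphor with the text of the <de> whose id equals its ref."""
    # Map every discourse-entity id to its full text (including nested markup).
    entities = {de.get("id"): "".join(de.itertext()).strip() for de in topic.iter("de")}
    pairs = []
    for link in topic.iter("link"):
        anaphor = "".join(link.itertext()).strip()
        # An empty ref (antecedent in an answer) yields None here.
        pairs.append((anaphor, entities.get(link.get("ref"))))
    return pairs


pairs = resolve_links(ET.fromstring(SAMPLE.strip()))
for anaphor, antecedent in pairs:
    print(f"{anaphor!r} -> {antecedent!r}")
```

Links with an empty `ref` cannot be resolved this way, since their antecedent is not marked up in any question.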
Principles: annotated elements
• We indicate whether the antecedent is in the question or in the answer
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
Principles: annotated elements
• We indicate whether the antecedent is in the question or in the answer
<t>
  <q id="q482"> Which city is the headquarters of the China's Eastern Fleet? </q>
  <q id="q483"> How far from China's capital city is <link rel="dir" status="ok" ant="a" refq="q482" type="pron" ref="">it</link>? </q>
  <q id="q484"> What was <link rel="indir" status="ok" ant="a" refq="q482" type="dd" ref="">its population</link> in 2002? </q>
</t>
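In the example above both anaphors point at the answer to q482 (`ant="a"`), so their `ref` is empty. A minimal sketch of how a consumer might read the `ant` and `refq` attributes follows; the function name `antecedent_sources` and the labels "question" and "answer" are illustrative choices, not part of the scheme.

```python
import xml.etree.ElementTree as ET

# Example copied from the slide; only the attribute-reading code is new.
SAMPLE = """
<t>
  <q id="q482"> Which city is the headquarters of the China's Eastern Fleet? </q>
  <q id="q483"> How far from China's capital city is <link rel="dir" status="ok" ant="a" refq="q482" type="pron" ref="">it</link>? </q>
  <q id="q484"> What was <link rel="indir" status="ok" ant="a" refq="q482" type="dd" ref="">its population</link> in 2002? </q>
</t>
"""

# Assumed reading of the ant codes, based on the slide text.
WHERE = {"q": "question", "a": "answer"}


def antecedent_sources(topic):
    """Return (anaphor, antecedent location, question/answer pair id) per link."""
    return [
        (link.text, WHERE[link.get("ant")], link.get("refq"))
        for link in topic.iter("link")
    ]


sources = antecedent_sources(ET.fromstring(SAMPLE.strip()))
for anaphor, where, pair_id in sources:
    print(f"{anaphor!r}: antecedent is in the {where} of {pair_id}")
```

A QA system would have to answer q482 first before it could resolve either anaphor.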
Principles: annotated elements
• We indicate the number of the question or answer in which the antecedent is located
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
Principles: annotated elements
• We select the type of anaphora
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
Principles: annotated elements
• We select the type of anaphora
<t>
  <q id="q453"> In which country is <de id="n28">the Colditz Castle</de>? </q>
  <q id="q454"> Exactly in which state is <link rel="dir" status="ok" type="pron" ref="n28" ant="q" refq="q453">it</link>? </q>
  <q id="q455"> Who was the first who escaped from <link rel="dir" status="ok" type="adv" ref="n28" ant="q" refq="q453">there</link> ? </q>
</t>
Principles: annotated elements
• We select the type of anaphora
<t>
  <q id="q412"> Who published the Evangelium Vitae <de id="n6">encyclical</de>? </q>
  <q id="q413"> How many <link rel="dir" status="ok" ant="q" refq="q412" type="elips" ref="n6">0</link> did <link rel="dir" status="ok" ant="a" refq="q412" type="pron" ref="">he</link> publish? </q>
</t>
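The examples use four `type` codes: `pron`, `dd`, `adv` and `elips`. The sketch below tallies them for one topic group. The expanded names in `TYPE_NAMES` are an assumed reading of those codes (the slides do not spell them out), and `count_types` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Example copied from the slide: an elided noun ("0") and a pronoun ("he").
SAMPLE = """
<t>
  <q id="q412"> Who published the Evangelium Vitae <de id="n6">encyclical</de>? </q>
  <q id="q413"> How many <link rel="dir" status="ok" ant="q" refq="q412" type="elips" ref="n6">0</link> did <link rel="dir" status="ok" ant="a" refq="q412" type="pron" ref="">he</link> publish? </q>
</t>
"""

# ASSUMPTION: readable names guessed from the codes used in the examples.
TYPE_NAMES = {
    "pron": "pronoun",
    "dd": "definite description",
    "adv": "adverb",
    "elips": "ellipsis",
}


def count_types(topic):
    """Count links per anaphora type, falling back to the raw code if unknown."""
    return Counter(
        TYPE_NAMES.get(link.get("type"), link.get("type"))
        for link in topic.iter("link")
    )


counts = count_types(ET.fromstring(SAMPLE.strip()))
print(dict(counts))
```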
Principles: annotated elements
• We select the type of relation
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
Principles: annotated elements
• We select the type of relation
<t>
  <q id="q416"> Which islands are in <de id="n9">the Pelagie Islands</de>? </q>
  <q id="q417"> Which is <link rel="indir" status="ok" type="dd" ref="n9" ant="q" refq="q416">the biggest one</link>? </q>
</t>
Principles: annotated elements
• We flag whether or not the annotator has doubts
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
Previous work
• UCREL (Fligelstone, 1992; Garside et al., 1997): first scheme for anaphora resolution
• MUC: inclusion of the coreference task in MUC-6 and MUC-7
• Last decade of the 20th century: anaphora resolution project for French (Popescu-Belis and Robba, 1997)
• Martínez-Barco and Palomar (2001): an annotation scheme for dialogues, applied to an anaphora resolution algorithm
• MATE/GNOME (Poesio, 2004): meta-model
Previous work: what we added
• MATE/GNOME (Poesio, 2004): meta-model
• A link element in the text that carries the information about the anaphora
• Identification of the question/answer pair
• Topic/subtopic
• Antecedent in the question or in the answer
• Status of the annotation
• Applied to three languages
• Applied to collections of questions
Problematic cases
• World knowledge
• An antecedent contains another one
• Collective nouns
• Two antecedents, but separated
• Doubtful position of the antecedent
• An anaphora inside a discourse entity
Problematic cases
• World knowledge
<t>
  <q id="q404"> Which was <de id="n2">the "gordo" in the 1995 Christmas</de>? </q>
  <q id="q405"> Which was <link rel="indir" status="no" type="dd" ref="n2" ant="q" refq="q404">the prize</link>? </q>
</t>
• An antecedent contains another one
• Collective nouns
• Two antecedents, but separated
• Doubtful position of the antecedent
• An anaphora inside a discourse entity
Problematic cases
• World knowledge
• An antecedent contains another one
<t>
  <q id="q427"> Who were <de id="n14">the founders of <de id="n15">Magnum Photos</de></de>? </q>
  <q id="q428"> In what year did <link rel="dir" status="ok" ant="q" refq="q427" type="pron" ref="n14">they</link> found <link rel="dir" status="ok" type="pron" ref="n15" ant="q" refq="q427">it</link>? </q>
</t>
• Collective nouns
• Two antecedents, but separated
• Doubtful position of the antecedent
• An anaphora inside a discourse entity
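When one antecedent contains another, the `de` elements are nested, and a consumer has to keep both spans apart. The following is a small sketch of one way to detect the nesting and recover each span's text; `nested_entities` and `entity_text` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Example copied from the slide: n15 ("Magnum Photos") sits inside n14.
SAMPLE = """
<t>
  <q id="q427"> Who were <de id="n14">the founders of <de id="n15">Magnum Photos</de></de>? </q>
  <q id="q428"> In what year did <link rel="dir" status="ok" ant="q" refq="q427" type="pron" ref="n14">they</link> found <link rel="dir" status="ok" type="pron" ref="n15" ant="q" refq="q427">it</link>? </q>
</t>
"""


def nested_entities(topic):
    """Return (outer id, inner id) for every <de> nested inside another <de>."""
    pairs = []
    for outer in topic.iter("de"):
        for inner in outer.iter("de"):  # iter() also yields outer itself
            if inner is not outer:
                pairs.append((outer.get("id"), inner.get("id")))
    return pairs


def entity_text(topic):
    """Map each <de> id to its full text, including text of nested elements."""
    return {de.get("id"): "".join(de.itertext()).strip() for de in topic.iter("de")}


topic = ET.fromstring(SAMPLE.strip())
pairs = nested_entities(topic)
texts = entity_text(topic)
print(pairs)
print(texts)
```

The outer entity keeps the inner one's words in its own text, so "they" resolves to the whole phrase and "it" only to "Magnum Photos".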
Problematic cases
• World knowledge
• An antecedent contains another one
• Collective nouns
<t>
  <q id="q432"> What is <de id="n18">the starring cast</de> of the film Beetlejuice? </q>
  <q id="q433"> Who of <link rel="dir" status="ok" type="pron" ref="n18" ant="q" refq="q432">them</link> is the main character? </q>
</t>
• Two antecedents, but separated
• Doubtful position of the antecedent
• An anaphora inside a discourse entity
Problematic cases
• World knowledge
• An antecedent contains another one
• Collective nouns
• Two antecedents, but separated
<t>
  <q id="q429"> Between what days was <de id="n16">the battle of Brunete</de>? </q>
  <q id="q430"> Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published? </q>
  <subt>
    <q id="q431"> Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident? </q>
  </subt>
</t>
• Doubtful position of the antecedent
• An anaphora inside a discourse entity
Problematic cases
• World knowledge
• An antecedent contains another one
• Collective nouns
• Two antecedents, but separated
• Doubtful position of the antecedent
<t>
  <q id="q465"> What transport was used in the Kon-Tiki Expedition? </q>
  <q id="q466"> How many people crewed <link rel="dir" status="no" type="pron" ref="" ant="q" refq="q465">it</link>? </q>
</t>
• An anaphora inside a discourse entity
Problematic cases
• World knowledge
• An antecedent contains another one
• Collective nouns
• Two antecedents, but separated
• Doubtful position of the antecedent
• An anaphora inside a discourse entity
<t>
  <q id="q434"> What is <de id="n19">a censer</de> ? </q>
  <q id="q435"> What name is given to <de id="n20"><link rel="dir" status="no" type="pron" ref="n19" ant="q" refq="q434">the one</link> of the Cathedral of Santiago de Compostela </de>? </q>
  <q id="q436"> How much does <link rel="dir" status="ok" type="pron" ref="n20" ant="q" refq="q434">it</link> weight? </q>
</t>
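Several of the problematic cases carry `status="no"`, the value the examples use when the annotator is unsure. A minimal sketch of how such links could be collected for later review is given below; `doubtful_links` is a hypothetical helper, and the sample is the censer example from the slide.

```python
import xml.etree.ElementTree as ET

# Example copied from the slide: the first link is marked status="no".
SAMPLE = """
<t>
  <q id="q434"> What is <de id="n19">a censer</de> ? </q>
  <q id="q435"> What name is given to <de id="n20"><link rel="dir" status="no" type="pron" ref="n19" ant="q" refq="q434">the one</link> of the Cathedral of Santiago de Compostela </de>? </q>
  <q id="q436"> How much does <link rel="dir" status="ok" type="pron" ref="n20" ant="q" refq="q434">it</link> weight? </q>
</t>
"""


def doubtful_links(topic):
    """Return (anaphor text, ref) for every link the annotator marked as doubtful."""
    return [
        (link.text, link.get("ref"))
        for link in topic.iter("link")
        if link.get("status") == "no"
    ]


topic = ET.fromstring(SAMPLE.strip())
doubtful = doubtful_links(topic)
# The entity n20 contains the doubtful link, so its text includes the anaphor.
n20_text = "".join(topic.find(".//de[@id='n20']").itertext()).strip()
print(doubtful)
print(n20_text)
```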
Evaluation
• Annotation
  • 2 annotators
  • Blind annotation
• Evaluation
  • Each language independently
  • Global results
Evaluation: subdivision
• Topic boundary
• Anaphora detection
• Anaphora attributes
• Antecedent recognition
Evaluation: topic boundary
• Class N: new topic
• Class S: same topic
Evaluation: anaphora detection
Evaluation: anaphora attributes (antecedent)
Evaluation: anaphora attributes (type)
Evaluation: anaphora attributes (relation)
• Dir: direct relation
• Indir: bridging relation
Evaluation: antecedent recognition
Evaluation: global results
• Total agreement results
• Spanish: 60/70 = 0.857
• Italian: 60/69 = 0.869
• English: 59/67 = 0.880
• Average: 0.868
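The figures above can be reproduced from the reported counts. One observation, inferred from the numbers rather than stated on the slide: the values match truncation to three decimals rather than rounding (60/69 is about 0.8696), and the average matches the mean of the truncated values. The sketch below makes that assumption explicit; `truncate_thousandths` is a hypothetical helper.

```python
from fractions import Fraction

# Agreement counts as reported on the slide: (agreed, total) per language.
COUNTS = {"Spanish": (60, 70), "Italian": (60, 69), "English": (59, 67)}


def truncate_thousandths(value):
    """Truncate (not round) a non-negative Fraction to whole thousandths."""
    return int(value * 1000)


# ASSUMPTION: per-language values are truncated to three decimals.
per_language = {
    lang: truncate_thousandths(Fraction(agreed, total))
    for lang, (agreed, total) in COUNTS.items()
}
# ASSUMPTION: the average is taken over the truncated values, then truncated again.
average = truncate_thousandths(
    Fraction(sum(per_language.values()), 1000 * len(per_language))
)

for lang, thousandths in per_language.items():
    print(f"{lang}: 0.{thousandths:03d}")
print(f"Average: 0.{average:03d}")
```

With rounding instead of truncation, the Italian and English values would come out as 0.870 and 0.881.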
Conclusion
• Multilingual annotation scheme for anaphora resolution
• For the improvement of QA systems: the system can detect the antecedent of each anaphora and extract the correct answer
• For a true interaction between the system and the user
• Simple but complete
• Positive results of the evaluation
Future work
• Integration of other languages
• Application of the annotation scheme to other corpora
Evaluation: measure used
• Kappa
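The slide names only "Kappa". With two annotators, a common choice is Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The sketch below assumes that variant and applies it to the topic-boundary classes N and S. The label sequences are hypothetical and do not come from the AQA evaluation.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions, summed.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL topic-boundary labels (N = new topic, S = same topic).
annotator_1 = ["N", "S", "S", "N", "S", "S", "N", "S", "S", "S"]
annotator_2 = ["N", "S", "S", "N", "S", "N", "N", "S", "S", "S"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this made-up sample the annotators agree on 9 of 10 items, chance agreement is 0.54, and kappa is (0.9 - 0.54) / (1 - 0.54).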