Incorporating Contextual Cues in Trainable Models for Coreference Resolution

14 April 2003
Ryu Iida
Computational Linguistic Laboratory, Graduate School of Information Science, Nara Institute of Science and Technology

Background
Two approaches to coreference resolution
• Rule-based approach [Mitkov 97, Baldwin 95, Nakaiwa 96, Okumura 95, Murata 97]
  • Many attempts to encode linguistic cues into rules
  • Significantly influenced by Centering Theory [Grosz 95, Walker et al. 94, Kameyama 86]
  • Best performance achieved at MUC (Message Understanding Conference): precision roughly 70%, recall roughly 60%
  • Problem: further manual refinement is needed, but it will be prohibitively costly
• Corpus-based machine learning approach [Aone and Bennett 95, Soon et al. 01, Ng and Cardie 02, Seki 02]
  • Cost effective
  • Has achieved performance comparable to the best-performing rule-based systems
  • Problem: previous work tends to lack an appropriate reference to theoretical linguistic work on coherence and coreference

Background • Challenging issue • Achieving a good union between theoretical linguistic findings and corpus-based empirical methods

Outline of this Talk
• Background
• Problems with previous statistical approaches
• Two methods
  • Centering features
  • Tournament-based search model
• Experiments
• Conclusions

Statistical approaches [Soon et al. ‘01, Ng and Cardie ‘02] • Reach a level of performance comparable to state-of-the-art rule-based systems • Recast the task of anaphora resolution as a sequence of classification problems

Statistical approaches [Soon et al. 01, Ng and Cardie 02]
• The task is to classify pairs of noun phrases as positive or negative
  • Positive instance: the pair of an anaphor and its antecedent
  • Negative instances: pairs of the anaphor and each NP located between the anaphor and the antecedent
Example [MUC-6]: A federal judge in Pittsburgh issued a temporary restraining order preventing Trans World Airlines from buying additional shares of USAir Group Inc. The order, requested in a suit filed by USAir, dealt another blow to TWA's bid to buy the company for $52 a share.
  Candidate                      Anaphor   Output class
  USAir Group Inc (antecedent)   USAir     positive
  order                          USAir     negative
  suit                           USAir     negative
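The instance-creation scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the original systems' code: the function name `make_training_instances` and the list-of-strings representation of NPs are assumptions made for the example.

```python
from typing import List, Tuple

Instance = Tuple[str, str, str]  # (candidate, anaphor, output class)


def make_training_instances(nps: List[str], antecedent_idx: int,
                            anaphor_idx: int) -> List[Instance]:
    """Build pairwise training instances for one anaphor.

    `nps` holds the noun phrases in textual order (hypothetical representation).
    The antecedent yields one positive instance; every NP located between the
    antecedent and the anaphor yields one negative instance.
    """
    if not 0 <= antecedent_idx < anaphor_idx < len(nps):
        raise ValueError("antecedent must precede the anaphor")
    anaphor = nps[anaphor_idx]
    instances = [(nps[antecedent_idx], anaphor, "positive")]
    for i in range(antecedent_idx + 1, anaphor_idx):
        instances.append((nps[i], anaphor, "negative"))
    return instances


# The MUC-6 example from the slide: "USAir" refers back to "USAir Group Inc".
nps = ["USAir Group Inc", "order", "suit", "USAir"]
instances = make_training_instances(nps, antecedent_idx=0, anaphor_idx=3)
for candidate, anaphor, label in instances:
    print(f"{label:8s} {candidate} / {anaphor}")
```

NPs that precede the antecedent produce no instances under this scheme, which is why only three pairs come out of the example.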

Statistical approaches [Soon et al. 01, Ng and Cardie 02]
• Feature set [Ng and Cardie 02]: POS, DEMONSTRATIVE, STRING_MATCH, NUMBER, GENDER, SEM_CLASS, DISTANCE, SYNTACTIC ROLE
• Training (C4.5) on the labeled instances yields the model (a decision tree)
[Figure: each training instance pairs features of the antecedent candidate with features of the anaphor (e.g. Organization:1, Prp_noun:1, Person:1, Pronoun:0, STR_MATCH:0, SENT_DIST:0) and carries a positive or negative class, as in the USAir Group Inc / USAir (positive), order / USAir (negative) and suit / USAir (negative) pairs]

Statistical approaches [Soon et al. 01, Ng and Cardie 02]
Test Phase [Ng and Cardie 02]
• Extract NPs as candidates
• Input each pair of the given anaphor and one of these candidates to the decision tree
• Select the best-scored candidate as the output
[Figure: candidates NP1 to NP8 scored by the decision tree for the anaphor (scores shown: -2.0, -1.1, -0.4, -1.0, -3.5, -0.3, -2.5 and 1.5); the candidate scoring 1.5 is selected as the antecedent]
• Precision 78.0%, Recall 64.2%
• Slightly better than the best-performing rule-based model at MUC-7
We refer to Ng and Cardie's model as the baseline of our empirical evaluation
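The selection step of this test phase can be sketched as follows. It is a rough illustration only: the scoring function stands in for the decision tree, and the assignment of the figure's scores to particular NPs is an assumption, since the slide layout does not survive in the transcript.

```python
from typing import Callable, Optional, Sequence


def select_best_candidate(anaphor: str, candidates: Sequence[str],
                          score: Callable[[str, str], float]) -> Optional[str]:
    """Score every (candidate, anaphor) pair and return the best-scored candidate.

    `score` is a placeholder for the trained classifier's confidence value.
    Returns None when there are no candidates.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: score(candidate, anaphor))


# Illustrative scores only: which NP receives which value is assumed here.
toy_scores = {
    "NP1": -2.0, "NP2": -1.1, "NP3": -0.4, "NP4": -1.0,
    "NP5": -3.5, "NP6": -0.3, "NP7": 1.5, "NP8": -2.5,
}


def toy_score(candidate: str, anaphor: str) -> float:
    """Look up a fixed score; a real model would compute it from features."""
    return toy_scores[candidate]


best = select_best_candidate("ANP", list(toy_scores), toy_score)
print(best)
```

Each candidate is judged in isolation against the anaphor here; candidates are never compared with one another directly.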

A drawback of the previous statistical models
The previous models do not capture local context appropriately
Example [Kameyama 98]: Sarah went downstairs and received another curious shock, for when Glendora flapped into the dining room in her home made moccasins, Sarah asked her when she had brought coffee to her room, and Glendora said she hadn't.
  Candidate               Anaphor   Class
  Sarah                   she       negative
  Glendora (antecedent)   she       positive
• Features of both candidates: POS: Noun, Prop_Noun: Yes, Pronoun: No, NE: PERSON, SEM_CLASS: Person, SENT_DIST: 0
• Positive and negative instances may have identical feature vectors

Two methods

Two methods
• Use more sophisticated linguistic cues: centering features
  • Augment the feature set with new features, inspired by Centering Theory, that implement local contextual factors
• Improve the search algorithm: tournament model
  • A new model that makes pairwise comparisons between candidates

Centering Features
Sarah went downstairs and received another curious shock, …… she hadn't.
  Candidate               Anaphor   Class      Centering state
  Sarah                   she       negative   CHAIN(Cb = Cp = Sarah)
  Glendora (antecedent)   she       positive   CHAIN(Cb = Cp = Glendora)
• Both candidates have the same features: POS: Noun, Prop_Noun: Yes, Pronoun: No, NE: PERSON, SEM_CLASS: Person, SENT_DIST: 0
• The problem is that the current feature set cannot tell these two candidates apart
• Introduce extra devices such as the forward-looking center list
• Encode the state transitions on them (here, from CHAIN(Cb = Cp = Sarah) to CHAIN(Cb = Cp = Glendora)) into a set of additional features

Two methods • Use more sophisticated linguistic cues: centering features • We augment the feature set with a set of new features inspired by Centering theory that implement local contextual factors • Improve the search algorithm: tournament model • We propose a new model which makes pair-wise comparisons between antecedent candidates

Tournament model
• The question we want to answer is which candidate is more likely to be coreferent, Sarah or Glendora
• Conduct a tournament consisting of a series of matches in which candidates compete with each other
• The winner of each match is determined by a pairwise comparison between candidates, cast as a binary classification problem
• The most likely candidate is selected through a single-elimination tournament of matches
Example: Sarah went downstairs and received another curious shock, for when Glendora flapped into the dining room in her home made moccasins, Sarah asked her when she had brought coffee to her room, and Glendora said she hadn't.
[Figure: tournament over candidates including downstairs, dining room and Sarah; match results are marked 〇 (win) and × (loss)]

Tournament model
• Training Phase
  • In the tournament, the correct antecedent NP5 must prevail over each of the other four candidates
  • Extract four training instances
  • Induce a pairwise classifier from the set of extracted training instances
  • The classifier classifies a given pair of candidates as left or right
    • right: the right-hand side of the given pair wins (is more likely to be the antecedent)
Training instances
  Features           Class
  NP1   NP5   ANP    right
  NP4   NP5   ANP    right
  NP5   NP7   ANP    left
  NP5   NP8   ANP    left
[Figure: NP1 to NP8 in order from the beginning of the document, followed by the anaphor ANP; NP5 is the antecedent and three of the NPs are marked as coreferent]
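A minimal sketch of how the four training instances on this slide could be extracted. The function name and the tuple layout are assumptions for illustration; the candidate list contains only the five NPs that take part in the slide's tournament.

```python
from typing import List, Tuple

PairInstance = Tuple[str, str, str, str]  # (left NP, right NP, anaphor, class)


def make_tournament_instances(candidates: List[str], antecedent: str,
                              anaphor: str) -> List[PairInstance]:
    """Pair the true antecedent with every other candidate.

    `candidates` is in textual order. The earlier NP of each pair goes on the
    left; the class names the side on which the true antecedent sits.
    """
    position = candidates.index(antecedent)
    instances = []
    for i, other in enumerate(candidates):
        if i == position:
            continue
        if i < position:
            # The other candidate precedes the antecedent: antecedent is on the right.
            instances.append((other, antecedent, anaphor, "right"))
        else:
            # The other candidate follows the antecedent: antecedent is on the left.
            instances.append((antecedent, other, anaphor, "left"))
    return instances


pair_instances = make_tournament_instances(
    ["NP1", "NP4", "NP5", "NP7", "NP8"], antecedent="NP5", anaphor="ANP")
for left, right, anaphor, label in pair_instances:
    print(left, right, anaphor, label)
```

The output reproduces the four rows of the slide's training-instance table.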

Tournament model
• Test Phase
  1. The first match is arranged between the nearest candidates (NP7 and NP8)
  2. Each of the following matches is arranged in turn between the winner of the previous match (NP8) and a new challenger (NP5)
  3. The winner is then matched against the next challenger (NP4)
  4. This process is repeated until the last candidate has participated
  5. The model selects the candidate that prevails through the final round as the answer (here the antecedent, NP5)
[Figure: NP1 to NP8 in order from the beginning of the document, followed by the anaphor ANP; three of the NPs are marked as coreferent]
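The test-phase procedure can be sketched as a short loop. This is an illustration under stated assumptions: `run_tournament` is a hypothetical name, and `toy_classifier` is a stand-in that simply favours NP5 so that the run mirrors the slides, where a trained pairwise classifier would be used instead.

```python
from typing import Callable, List, Optional, Tuple

Match = Tuple[str, str, str]  # (left NP, right NP, outcome)


def run_tournament(candidates: List[str], anaphor: str,
                   classify: Callable[[str, str, str], str]
                   ) -> Tuple[Optional[str], List[Match]]:
    """Single-elimination search starting from the candidates nearest the anaphor.

    `candidates` is in textual order. `classify(left, right, anaphor)` must
    return "left" or "right", naming the side more likely to be the antecedent.
    """
    if not candidates:
        return None, []
    winner = candidates[-1]  # the candidate nearest to the anaphor
    matches: List[Match] = []
    for challenger in reversed(candidates[:-1]):
        # The challenger precedes the current winner in the text, so it is the left side.
        outcome = classify(challenger, winner, anaphor)
        if outcome not in ("left", "right"):
            raise ValueError(f"unexpected class: {outcome!r}")
        matches.append((challenger, winner, outcome))
        if outcome == "left":
            winner = challenger
    return winner, matches


def toy_classifier(left: str, right: str, anaphor: str) -> str:
    """Stand-in for the learned classifier: NP5 wins whenever it is on the left."""
    return "left" if left == "NP5" else "right"


winner, matches = run_tournament(
    ["NP1", "NP4", "NP5", "NP7", "NP8"], "ANP", toy_classifier)
print(winner, matches)
```

With n candidates the loop runs n - 1 matches, so the cost stays linear in the number of candidates even though every decision is a direct comparison between two of them.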

Experiments

Experiments
• Empirical evaluation on Japanese zero-anaphora resolution
  • Japanese does not normally use personal pronouns as anaphors
  • Instead, Japanese uses zero-pronouns
• Comparison among four models
  • Baseline model
  • Baseline model with Centering Features
  • Tournament model
  • Tournament model with Centering Features

Centering Features in Japanese
• Japanese anaphora resolution model [Nariyama 02]
  • An extension of Kameyama's work on applying Centering Theory to Japanese zero-anaphora resolution
  • Expands the original forward-looking center list into the Salience Reference List (SRL) to take broader contextual information into account
  • Makes more use of linguistic information
• In the experiments, we introduced two features to reflect the SRL-related contextual factors

Method
• Data
  • GDA-tagged Japanese newspaper article corpus
                                      GDA      MUC-6
    Texts                             2,176    60
    Sentences                         24,475   -
    Tags of anaphoric relation        14,743   8,946
    Tags of ellipsis (zero-anaphor)   5,966    0
  • As a preliminary test, only subject zero-anaphors are resolved: 2,155 instances in total
  • Five-fold cross-validation is conducted on that data set with support vector machines
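For readers unfamiliar with the evaluation protocol, the five-fold split over the 2,155 instances can be sketched as below. The helper `k_fold_indices` is hypothetical, the split is contiguous and unshuffled for simplicity, and the SVM training and scoring inside each fold are not shown; how the original experiments assigned instances to folds is not stated in the slides.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k contiguous folds over n items.

    Fold sizes differ by at most one; each item is used for testing exactly once.
    """
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n))
        yield train, test
        start = stop


folds = list(k_fold_indices(2155, k=5))
for fold_number, (train, test) in enumerate(folds, start=1):
    # A classifier would be trained on `train` and evaluated on `test` here.
    print(fold_number, len(train), len(test))
```

Since 2,155 divides evenly by five, every fold holds 431 test instances and 1,724 training instances.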

Feature set (see our paper for details)
• Features for simulating Ng and Cardie's feature set
  • POS
  • Pronoun
  • Particle
  • Named-Entity
  • Semantic class
  • Animacy
  • Selectional Restrictions
  • Distance between an anaphor and the candidate
  • Number of anaphoric relations
• Centering Features
  • Order in SRL
  • Heuristic rule of preference
• Features for capturing the relations between two candidates (introduced only in the tournament model, not in the baseline model)
  • Preference of SRL in two candidates
  • Preference of Animacy in two candidates
  • Distance between two candidates

Results
[Figure: results for the four models: Baseline model, Baseline model + Centering Features, Tournament model, Tournament model + Centering Features]

Results (1/3): the effect of incorporating centering features
• Baseline model: 64.0%
• Baseline model + Centering Features: 67.0%
Centering features were reasonably effective

Results (2/3)
• Baseline model: 64.0%
• Baseline model + Centering Features: 67.0%
• Tournament model: 70.8%
Introducing the tournament model significantly improved the performance regardless of the size of the training data

Results (3/3)
• Baseline model: 64.0%
• Baseline model + Centering Features: 67.0%
• Tournament model + Centering Features: 69.7%
• Tournament model: 70.8%
The most complex model (Tournament model + Centering Features) did not outperform the tournament model without centering features
Its rate of improvement as the data size grows is the best of all the models

Results after cleaning data (March 03)
• Tournament model: 72.5%
• Tournament model + Centering Features: 74.3%
The tournament model with centering features is more effective than the one without centering features

Conclusions • Our concern is achieving a good union between theoretical linguistic findings and corpus-based empirical methods • We presented a trainable coreference resolution model that is designed to incorporate contextual cues by means of centering features and a tournament-based search algorithm. These two improvements worked effectively in our experiments on Japanese zero-anaphora resolution.

Future Work • In Japanese zero-anaphora resolution, • Identification of relations between the topic and subtopics • Analysis of complex and quoted sentences • Refinement of the treatment of selectional restrictions

Tournament model
• Test Phase
A tournament consists of a series of matches in which candidates compete with each other
[Figure: matches among the candidates from NP1 to NP8, ordered from the beginning of the document up to the anaphor ANP; each match outcome is marked < or >, and NP5 wins as the antecedent]

Tournament model
• The question we want to answer is which candidate is more likely to be coreferent, Sarah or Glendora
• Implement a pairwise comparison between candidates as a binary classification problem
Example: Sarah went downstairs and received another curious shock, for when Glendora flapped into the dining room in her home made moccasins, Sarah asked her when she had brought coffee to her room, and Glendora said she hadn't.
[Figure: pairwise comparisons (marked <) among the candidates downstairs, dining room, Sarah and Glendora for the anaphor "she", with the transition from CHAIN(Cb = Cp = Sarah) to CHAIN(Cb = Cp = Glendora)]

Tournament model
• Training Phase
Sarah went downstairs and received another curious shock, for when Glendora flapped into the dining room in her home made moccasins, Sarah asked her when she had brought coffee to her room, and Glendora said she hadn't.
[Figure: NPs are extracted from the passage (downstairs, moccasins, Sarah, her, coffee, room, Glendora) and each is paired with the antecedent Glendora for the anaphor "she"; the output class of every training instance is <, with Glendora on the right-hand side]

Conclusions
• To incorporate linguistic cues into trainable approaches:
  • Add features that take into consideration linguistic cues such as Centering Theory: Centering Features
  • Propose a novel search model in which candidates are compared in terms of their likelihood of being the antecedent: Tournament model
• In the Japanese zero-anaphora resolution task, the Tournament model significantly outperforms an earlier machine learning approach [Ng and Cardie 02]
Incorporating linguistic cues in machine learning models is effective

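Data
• GDA-tagged Japanese newspaper article corpus
                                    GDA      MUC-6
  Texts                             2,176    60
  Sentences                         24,475   -
  Tags of anaphoric relation        14,743   8,946
  Tags of ellipsis (zero-anaphor)   5,966    0
• Example of GDA tagging (translated from Japanese; eq marks a coreferent NP, agt marks an ellipsis of the AGENT):
  One of <n id="tagid1">U.S. President Clinton</n>'s top domestic priorities, <n id="tagid2">the comprehensive crime bill</n>, suffered a setback in the House plenary session on the 11th, when a motion to proceed to debate and a vote was rejected by 225 to 210. As a result, <n eq="tagid2">the bill</n> has in effect been forced into either major revision or abandonment. <n eq="tagid1">The president</n> showed his anger at an emergency press conference and demanded that the bill be revived. <n eq="tagid1">The president</n> had aimed to <v agt="tagid1">score</v> points ahead of the midterm elections, but instead suffered a major blow.
• 2,155 examples extracted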

*Centering Features
• Centering Theory [Grosz 95, Walker et al. 94, Kameyama 86]
  • Part of an overall theory of discourse structure and meaning
  • Two levels of discourse coherence: global and local
  • Centering models the local-level component of attentional state
  • e.g. Intrasentential centering [Kameyama 97]
Example: Sarah went downstairs and received another curious shock, for when Glendora flapped into the dining room in her home made moccasins, Sarah asked her when she had brought coffee to her room, and Glendora said she hadn't.

*Centering Features in English [Kameyama 97]
Sarah went downstairs and received another curious shock, for when Glendora flapped into the dining room in her home made moccasins, Sarah asked her when she had brought coffee to her room, and Glendora said she hadn't.
Centering transitions annotated on the passage [Kameyama 97]:
• CHAIN(Cb = Cp = Sarah)
• ESTABLISH(Cb = Cp = Glendora)
• CHAIN(Cb = Glendora, Cp = Sarah)
• CHAIN(Cb = Cp = Glendora)
• CHAIN(Cb = NULL, Cp = Glendora)
• CHAIN(Cb = Cp = Glendora)

*Centering Features in English [Kameyama 97]
• The essence is that centering takes into account the preference between candidates
• Cb and Cp distinguish the two candidates
Sarah went downstairs and received another curious shock, …… she hadn't.
• Transition: CHAIN(Cb = Cp = Sarah) to CHAIN(Cb = Cp = Glendora)
Implement local contextual factors: centering features

*Tournament model
• Test Phase
A tournament consists of a series of matches in which candidates compete with each other
[Figure: tournament over the NPs extracted from the Sarah and Glendora passage (downstairs, shock, Glendora, room, moccasins, Sarah, her, coffee) for the anaphor "she"; each match is a pairwise comparison marked <, and the final winner is the antecedent]

Rule-based Approaches
• Encoding linguistic cues into rules manually
  • Thematic roles of the candidates
  • Order of the candidates
  • Semantic relation between anaphors and antecedents
  • etc.
• These approaches are influenced by Centering Theory [Grosz 95, Walker et al. 94, Kameyama 86]
• The Coreference Resolution Task of the Message Understanding Conference (MUC-6 / MUC-7)
  • Precision: roughly 70%
  • Recall: roughly 60%
Further manual refinement of rule-based models will be prohibitively costly

Statistical Approaches with Tagged-Corpus
• The statistical approaches have achieved a performance comparable to the best-performing rule-based systems
• They lack an appropriate reference to theoretical linguistic work on coherence and coreference
Making a good marriage between theoretical linguistic findings and corpus-based empirical methods

*Test Phase [Soon et al. 01]
A federal judge in Pittsburgh issued a temporary restraining order preventing Trans World Airlines from buying additional shares of USAir Group Inc. The order, requested in a suit filed by USAir, dealt another blow to TWA's bid to buy the company for $52 a share.
• Extract the NPs as candidates: judge, Pittsburgh, order, Trans World Airlines, share, USAir Group Inc, order, a suit
• Anaphor: USAir
• Antecedent: USAir Group Inc (marked 〇; the candidates order and a suit are marked ×)
• Precision 67.3%, Recall 58.6% on the MUC data set

Improving Soon's model
• [Ng and Cardie 02]
  • Expanding the feature set
    • 12 features ⇒ 53 features
    • POS, DEMONSTRATIVE, STRING_MATCH, NUMBER, GENDER, SEM_CLASS, DISTANCE, SYNTACTIC ROLE
  • Introducing a new search algorithm

Task of Coreference Resolution
• Two processes
  • Resolution of anaphors
  • Resolution of antecedents
• Applications
  • Machine Translation, IR, etc.
Example [MUC-6]: A federal judge in Pittsburgh issued a temporary restraining order preventing Trans World Airlines from buying additional shares of USAir Group Inc. The order, requested in a suit filed by USAir, dealt another blow to TWA's bid to buy the company for $52 a share.
(On the slide, NPs shown in the same color are coreferent; an antecedent and an anaphor are marked.)

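Future Work
• Evaluate some examples
  • The tournament model does not handle direct quotes
  • The proposed methods cannot deal with different discourse structures
Example (translated from Japanese): …… Mohammed, on his way to prison, left his wife with these words: "While I am in prison, you must not work outside the home." He meant that she should stay faithful. Perhaps he thought that, after all, there was no chance of being blessed with a new child while in prison.
[Figure: the SRL for this example]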

Centering Features of Japanese
• Add the likelihood of being an antecedent to the features
  • In Japanese, wa-marked NPs tend to be topics
  • Topics tend to be omitted
• Salience Reference List (SRL) [Nariyama 02]
  • Store NPs in the SRL from the beginning of the text
  • Overwrite the old entity if a new entity fills the same slot
  • Order of preference (most preferred first): Topic/φ (wa) > Focus (ga) > I-Obj (ni) > D-Obj (wo) > Others
• Example: …NP1-wa NP2-wo…。 …NP3-ga…、NP4-wa…。 …NP5-ni……(φ-ga)V。
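The store-and-overwrite behaviour of the SRL can be sketched with a small class. This is a simplified illustration, not Nariyama's formulation: the class name, the slot names and the direct particle-to-slot mapping are assumptions, and zero pronouns (φ) entering the Topic slot are not modelled.

```python
from typing import Dict, List

# Slots from most to least preferred, following the ordering on the slide.
SLOT_ORDER = ["topic", "focus", "i_obj", "d_obj", "others"]

# Assumed mapping from case particle to slot (simplification for this sketch).
PARTICLE_TO_SLOT = {"wa": "topic", "ga": "focus", "ni": "i_obj", "wo": "d_obj"}


class SalienceReferenceList:
    """One NP per slot; a new NP overwrites the old occupant of its slot."""

    def __init__(self) -> None:
        self.slots: Dict[str, str] = {}

    def add(self, np: str, particle: str) -> None:
        """Store `np` in the slot selected by its particle."""
        slot = PARTICLE_TO_SLOT.get(particle, "others")
        self.slots[slot] = np

    def ranked(self) -> List[str]:
        """Return the stored NPs from most to least preferred."""
        return [self.slots[slot] for slot in SLOT_ORDER if slot in self.slots]


# The slide's example, read from the beginning of the text up to the zero pronoun.
srl = SalienceReferenceList()
for np, particle in [("NP1", "wa"), ("NP2", "wo"), ("NP3", "ga"),
                     ("NP4", "wa"), ("NP5", "ni")]:
    srl.add(np, particle)
print(srl.ranked())
```

NP4 overwrites NP1 in the Topic slot, so in this sketch NP4 ends up as the most preferred candidate for the omitted subject (φ-ga).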

Evaluation of models
• Introduce a confidence measure
• The confidence coefficient is the value at the point in the tournament where the two candidates are nearest
[Figure: example tournament among the candidates President A, President B, armistice, this and action for the anaphor "he", with the values 0.9, 2.4, 3.2 and 3.8 shown on the matches]
