Illinois-Coref: The UI System in the CoNLL-2012 Shared Task
Kai-Wei Chang, Rajhans Samdani, Alla Rozovskaya, Mark Sammons, and Dan Roth

Coreference

Coreference resolution is the task of grouping all the mentions of entities into equivalence classes so that each class represents a discourse entity. In the example below, the mentions are colour-coded to indicate which mentions are co-referent (overlapping mentions have been omitted for clarity).

An American official announced that American President Bill Clinton met his Russian counterpart, Vladimir Putin, today. The president said that Russia was a great country.

CoNLL Shared Task 2012

The CoNLL Shared Task 2012 is an extension of last year's coreference task. We use the Illinois-Coref system from CoNLL-2011 as the basis for our current system. We improve over the baseline system by:
• Improving mention detection.
• Using separate models for coreference resolution on pronominal and non-pronominal mentions.
• Designing a latent structured learning protocol for Best-Link inference.

Improving on Mention Detection

The baseline system implements a high-recall, low-precision rule-based mention detector. Error analysis shows two main error types:

• Incorrect mention boundaries. There are two main causes of boundary errors:
  • Parse mistakes: a wrong attachment, or extra words added to a mention. For example, if the parser attaches the relative clause in "President Bush, who traveled to China yesterday" to a different noun, the system may predict "President Bush" as the mention.
    Solutions (we can only fix a subset of such errors):
    • Rule-based: fix mentions starting with a stop word and mentions ending with a punctuation mark.
    • Use the training data to learn patterns of inappropriate mention boundaries.
  • Annotation inconsistency: a punctuation mark, or an apostrophe used to mark the possessive form, is inconsistently added to the end of mentions.
    Solutions:
    • In the training phase, perform a relaxed matching between predicted mentions and gold mentions.
    • We cannot fix this problem in the test phase.
• Non-referential noun phrases: candidate noun phrases that are unlikely to refer to any entity in the real world (e.g., "the same time"). They are not annotated as mentions in OntoNotes.
    Solution: use statistics from the training data (see Sketch 1 below).
    • Count the number of times each candidate mention appears as a gold mention in the training data.
    • Remove candidate mentions that appear frequently in the training data but never as gold mentions.
    • We can also take the predicted head word and the words before and after the mention into account (e.g., this helps remove the noun "fact" in the phrase "in fact").

Pronoun Resolution

• The baseline uses an identical coreference model for both pronominal and non-pronominal mentions. However, useful features for coreference resolution on pronouns and non-pronouns usually differ:
  • Lexical features are more important for non-pronominal coreference resolution.
  • Gender features are more important for pronoun anaphora resolution.
• New system: use separate models. We train two separate classifiers with different feature sets for pronoun and non-pronoun coreference (see Sketch 2 below). The pronoun anaphora classifier includes features that identify the speaker and reflect the document type.

Learning Protocol for Best-Link

We investigate two learning protocols for Best-Link inference.

• Baseline: binary classification. The baseline system applies the strategy of (Bengtson and Roth, 2008) to learn a pairwise scoring function w on:
  • Positive examples: for each mention u, we construct a positive example (u, v), where v is the closest preceding mention in u's equivalence class.
  • Negative examples: all mention pairs (u, v), where v is a preceding mention of u and u and v are in different classes.
  • Drawbacks:
    • It suffers from a severe label imbalance problem.
    • It does not relate well to the Best-Link inference. For example, consider three mentions belonging to the same cluster, {m1: "President Bush", m2: "he", m3: "George Bush"}: the positive pair chosen by the baseline system is (m2, m3), while a better choice is (m1, m3).
• New system: latent structured learning. We consider a latent structured learning algorithm with:
  • Input: a mention v and its preceding mentions {u | u < v}.
  • Output: y(v), the set of antecedents that co-refer with v.
  • h(v): a latent structure that denotes the Best-Link decision.
  • A loss function that penalizes Best-Link decisions linking v to a mention outside its gold cluster.
  At each iteration, the algorithm takes the following steps (see Sketch 3 below):
  1. Pick a mention v and find the Best-Link decision u* that is consistent with the gold clustering.
  2. Find the Best-Link decision u' under the current model by solving a loss-augmented inference.
  3. Update the model by the difference between the feature vectors, φ(u', v) − φ(u*, v).

Best-Link Inference Procedure

The inference procedure takes as input a set of pairwise mention scores over a document and aggregates them into globally consistent cliques representing entities. For each mention, Best-Link considers the best mention on its left to connect to (best according to the pairwise score) and creates a link between them if the pairwise score is above some threshold (see Sketch 4 below).
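Sketch 1. A minimal Python sketch of the training-data statistics filter for non-referential noun phrases described above. The function names and the frequency threshold min_count are our own assumptions, not the Illinois-Coref implementation; extending the keys with the predicted head word and surrounding words would follow the same pattern.

from collections import Counter

def build_mention_statistics(documents):
    # Count how often each candidate surface string is proposed by the
    # rule-based detector and how often it occurs as a gold mention.
    candidate_counts, gold_counts = Counter(), Counter()
    for doc in documents:
        for m in doc["candidate_mentions"]:
            candidate_counts[m.lower()] += 1
        for m in doc["gold_mentions"]:
            gold_counts[m.lower()] += 1
    return candidate_counts, gold_counts

def is_non_referential(mention, candidate_counts, gold_counts, min_count=10):
    # Filter candidates that appear frequently in training but never as a
    # gold mention (e.g., "the same time"); min_count is an assumed threshold.
    key = mention.lower()
    return candidate_counts[key] >= min_count and gold_counts[key] == 0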
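Sketch 2. An illustrative sketch of routing mention pairs to separate pronoun and non-pronoun classifiers. The pronoun list and the score interface of the two models are assumptions; the poster specifies only that the two classifiers are trained with different feature sets.

PRONOUNS = {"he", "she", "it", "they", "him", "her", "them",
            "his", "hers", "its", "their", "himself", "herself"}

def is_pronoun(mention_text):
    return mention_text.lower() in PRONOUNS

def pairwise_score(u, v, pronoun_model, non_pronoun_model):
    # Choose the model whose feature set matches the anaphor's type:
    # gender/speaker/document-type features for pronouns,
    # lexical features for non-pronominal mentions.
    if is_pronoun(v):
        return pronoun_model.score(u, v)
    return non_pronoun_model.score(u, v)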
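Sketch 3. A minimal Python sketch of one online update in the latent structured learning protocol, following the three steps above. The feature map phi, the learning rate lr, the 0/1 loss augmentation delta, and the choice of u* as the highest-scoring gold-consistent antecedent are assumptions; the stated feature-vector difference φ(u', v) − φ(u*, v) is applied here as the equivalent perceptron-style step w += φ(u*, v) − φ(u', v).

import numpy as np

def latent_update(w, v, preceding, gold_antecedents, phi, delta=1.0, lr=1.0):
    # preceding: mentions u with u < v; gold_antecedents: those co-referring with v.
    if not gold_antecedents:
        return w
    # Step 1: best gold-consistent Best-Link decision u* (assumed argmax over gold).
    u_star = max(gold_antecedents, key=lambda u: w @ phi(u, v))
    # Step 2: loss-augmented inference over all preceding mentions;
    # links that leave the gold cluster receive an extra loss of delta.
    u_prime = max(preceding,
                  key=lambda u: w @ phi(u, v)
                  + (0.0 if u in gold_antecedents else delta))
    # Step 3: update the model by the difference of the feature vectors.
    if u_prime not in gold_antecedents:
        w = w + lr * (phi(u_star, v) - phi(u_prime, v))
    return w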
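Sketch 4. A sketch of Best-Link decoding as described in the inference procedure above: each mention links to its best-scoring preceding mention when the score clears a threshold, and links are aggregated into entity clusters. Union-find is our choice for the aggregation step, and the threshold value is an assumption.

def best_link_clusters(mentions, score, threshold=0.0):
    # mentions are in document order; score(u, v) is the pairwise score
    # for antecedent u and anaphor v.
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for j in range(1, len(mentions)):
        # Best preceding mention for mentions[j], linked only if above threshold.
        best_i = max(range(j), key=lambda i: score(mentions[i], mentions[j]))
        if score(mentions[best_i], mentions[j]) > threshold:
            parent[find(j)] = find(best_i)  # join the two entities

    clusters = {}
    for j in range(len(mentions)):
        clusters.setdefault(find(j), []).append(mentions[j])
    return list(clusters.values())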
Experiments and Results

We present the performance of the system on both the OntoNotes-4.0 and OntoNotes-5.0 datasets:
• Results on the CoNLL-12 DEV set with predicted mentions.
• Results on the CoNLL-11 DEV set with predicted mentions.
• Official scores on the CoNLL-12 TEST set (for the official submission, we train the system on both the TRAIN and DEV sets).

Discussions and Conclusions

• We described strategies for improving mention detection and proposed an online latent structured learning algorithm for coreference resolution.
• Using separate classifiers for coreference resolution on pronominal and non-pronominal mentions improves the system.
• The performance of our Chinese coreference system is not as good as that for English. This may be because we use the same features for English and Chinese, and the feature set was developed for the English corpus.
• Overall, the system improves over the baseline system by 5% MUC F1, 0.8% BCUB F1, and 1.7% CEAF F1.

Supported by ARL, and by DARPA, under the Machine Reading Program.