Coreference Resolution Using Semantic Relatedness Information from Automatically Discovered Patterns
Xiaofeng Yang, Jian Su. ACL 2007
Introduction
• Coreference resolution is the process of determining whether two expressions in natural language refer to the same entity in the world.
• Semantic relatedness
  • Noun phrases used to refer to the same entity should have a certain semantic relation
  • How do we determine semantic relatedness?
    • WordNet (Ng and Cardie, 2002, etc.)
    • Search for patterns (Hearst, 1998, etc.)
      • Pattern selection done in an ad hoc manner
      • Concerns: accuracy and breadth
• Objectives of this paper:
  • Automatically acquire and evaluate patterns
  • Mine patterns for semantic relatedness information
Some Examples
• Multiple cultivars of fruits such as apples are sometimes grafted on a single tree
• EU institutions and other bodies ...
• Disasters such as the earthquake and tsunami in Japan
Baseline Coreference System
• An instance i{NPi, NPj}, where
  • NPi = antecedent candidate
  • NPj = anaphor
• Training
  • For each NPj:
    • Create a single positive training instance for its closest antecedent
    • Add negative training instances for every intervening NP between NPj and its antecedent
• Testing
  • Process the input document from the first NP to the last
  • For each encountered NPj, create a test instance for each antecedent candidate
Example: NP sequence NPi-2, NPi-1, NPi, NPi+1, NPi+2, NPi+3, NPj, where NPi is the closest antecedent of NPj
• Training instances: (NPi, NPj) (+), (NPi+1, NPj) (−), (NPi+2, NPj) (−), (NPi+3, NPj) (−)
• Testing instances: (NPi-2, NPj), (NPi-1, NPj), (NPi, NPj), (NPi+1, NPj), (NPi+2, NPj), (NPi+3, NPj)
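A minimal sketch of this instance-creation scheme in Python (the Instance type and the closest-antecedent lookup are hypothetical helpers, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class Instance:
    antecedent: str   # NP_i, the antecedent candidate
    anaphor: str      # NP_j, the anaphor
    label: int        # +1 coreferent, -1 not, 0 unknown (test time)

def training_instances(nps, closest_antecedent):
    """nps: the document's NPs in textual order.
    closest_antecedent: maps an anaphor's index j to the index of its
    closest coreferent antecedent (absent for non-anaphors)."""
    instances = []
    for j, np_j in enumerate(nps):
        i = closest_antecedent.get(j)
        if i is None:
            continue
        # one positive instance for the closest antecedent ...
        instances.append(Instance(nps[i], np_j, +1))
        # ... and a negative one for every intervening NP
        for k in range(i + 1, j):
            instances.append(Instance(nps[k], np_j, -1))
    return instances

def test_instances(nps, j):
    """At test time, pair NP_j with every preceding candidate."""
    return [Instance(nps[k], nps[j], 0) for k in range(j)]
```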
Incorporate Non-Anaphors in Training
• Apply the learned classifier to all the non-anaphors in the training documents
• For each non-anaphor that is classified as (+):
  • Pair the non-anaphor with its false antecedent to create a negative example
• Add these negative examples to the original training set
• The resulting classifier is capable of
  • Antecedent identification
  • Non-anaphor identification
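A sketch of this augmentation step, reusing the hypothetical helpers above and assuming a trained classifier object with a predict method (an assumed interface):

```python
def mine_non_anaphor_negatives(nps, non_anaphor_indices, classifier):
    """Run the learned classifier over non-anaphors; whenever it wrongly
    proposes an antecedent, keep that pair as a new negative example."""
    extra_negatives = []
    for j in non_anaphor_indices:
        for inst in test_instances(nps, j):
            if classifier.predict(inst) == +1:   # a false antecedent
                inst.label = -1
                extra_negatives.append(inst)
    return extra_negatives
```

The classifier is then retrained on the union of the original training instances and these mined negatives.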
Acquiring Patterns
• Derive patterns that indicate a specific semantic relation
• Use the NP pairs in the training instances as seeds, except when:
  • NPi or NPj is a pronoun
  • NPi and NPj have the same head word
• An instance i{NPi, NPj} yields the seed pair (Ei : Ej)
  • e.g., i{"Bill Clinton", "the former president"} → ("Bill Clinton" : "president")
• S+ and S− : the sets of seed pairs derived from the positive and the negative training instances, respectively
Acquiring Patterns
• Could a seed pair belong to both S+ and S− at the same time?
• For each seed NP pair (Ei : Ej):
  • Search a large corpus for strings matching the regular expression "Ei * * * Ej" or "Ej * * * Ei"
• For each retrieved string:
  • Extract a surface pattern by replacing the expression Ei with a mark <#t1#> and Ej with <#t2#>
  • If the string is followed by a symbol, the symbol is also included in the pattern
Example: seed pair (Bill Clinton : president)
(S1) "Bill Clinton is elected President of the United States."
(S2) "The US President, Mr. Bill Clinton, today advised India to move towards nuclear nonproliferation and begin a dialogue with Pakistan to ..."
• The extracted patterns are
  • P1: <#t1#> is elected <#t2#>
  • P2: <#t2#> , Mr <#t1#>
• |(Ei, p, Ej)| = the number of strings matched by a pattern p instantiated with (Ei : Ej)
• Reference patterns = all the patterns derived from the positive seed pairs
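A rough sketch of the extraction step (the concrete regular expression, the at-most-three-token window, and the case-insensitive matching are assumptions; the slides only give the wildcard expression "Ei * * * Ej"):

```python
import re

def extract_patterns(sentences, e_i, e_j, max_gap=3):
    """Turn corpus matches of "Ei * * * Ej" (or "Ej * * * Ei") into
    surface patterns: the expressions become the marks <#t1#> / <#t2#>,
    and a symbol right after the match is kept in the pattern."""
    # a gap of at most max_gap intervening tokens (words or punctuation)
    gap = r"(?:\s*[\w'-]+|\s*[^\w\s]){0,%d}" % max_gap
    patterns = []
    for first, second in ((e_i, e_j), (e_j, e_i)):
        rx = re.compile(r"\b" + re.escape(first) + r"\b" + gap +
                        r"\s*\b" + re.escape(second) + r"\b[^\w\s]?", re.I)
        for sent in sentences:
            for m in rx.finditer(sent):
                p = m.group(0)
                p = re.sub(r"\b" + re.escape(e_i) + r"\b", "<#t1#>", p, flags=re.I)
                p = re.sub(r"\b" + re.escape(e_j) + r"\b", "<#t2#>", p, flags=re.I)
                patterns.append(p)
    return patterns
```

Run over (S1) and (S2) with the seed (Bill Clinton : president), this produces strings close to P1 and P2.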
Scoring Patterns
• Frequency
  • Frequency(p) = |{s | s ∈ S+, p ∈ PList(s)}|
• Reliability
  • Based on pointwise mutual information: pmi(x, y) = log P(x, y) / (P(x) P(y))
  • PMI between a pattern p and a (+) seed pair:
    pmi(p, (Ei : Ej)) = log [ (|(Ei, p, Ej)| / |(∗, ∗, ∗)|) / ( (|(Ei, ∗, Ej)| / |(∗, ∗, ∗)|) × (|(∗, p, ∗)| / |(∗, ∗, ∗)|) ) ]
  • The reliability score r(p) averages this pmi over the positive seed pairs, normalized by the maximum pmi (Espresso-style; Pantel and Pennacchiotti, 2006)
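The two scores might be computed as below; the count store (a mapping from (Ei, p, Ej) triples to corpus match counts, with "*" as a wildcard) is an assumed precomputed interface:

```python
import math

def frequency(p, positive_seeds, plist):
    """Number of positive seed pairs whose pattern list contains p."""
    return sum(1 for s in positive_seeds if p in plist[s])

def pmi(p, seed, count):
    """PMI between pattern p and a seed pair (Ei : Ej), from corpus
    match counts; "*" is a wildcard over expressions or patterns."""
    e_i, e_j = seed
    total = count[("*", "*", "*")]
    p_joint = count[(e_i, p, e_j)] / total      # P(pair and pattern)
    p_pair = count[(e_i, "*", e_j)] / total     # P(pair)
    p_pattern = count[("*", p, "*")] / total    # P(pattern)
    return math.log(p_joint / (p_pair * p_pattern))
```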
Pattern Features
• Directly use the reference patterns as a set of features
• To select the most effective patterns:
  • Rank the patterns according to their scores and choose the top ones
  • A pattern that also occurs frequently with (−) seed pairs may lead to many false-positive pairs during resolution
  • So additionally filter the patterns based on their accuracy (see the sketch below)
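A minimal sketch of this selection step; the cutoff values top_n and min_acc are illustrative choices, not taken from the paper:

```python
def select_pattern_features(patterns, score, accuracy, top_n=100, min_acc=0.5):
    """Drop patterns that match too many negative seed pairs (low
    accuracy), then keep the top_n highest-scoring survivors."""
    survivors = [p for p in patterns if accuracy[p] >= min_acc]
    return sorted(survivors, key=lambda p: score[p], reverse=True)[:top_n]
```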
Semantic Relatedness Feature
• A single feature reflecting how reliably an NP pair is semantically related
• Only the reference patterns among PList(Ei : Ej) are involved in computing the feature
• SRel(i{NPi, NPj}) = 1000 ∗ ∑ p ∈ PList(Ei:Ej) pmi(p, (Ei : Ej)) ∗ r(p), where
  • pmi(p, (Ei : Ej)) is the PMI between the pattern and the seed pair
  • r(p) is the reliability score of p
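Putting the pieces together, the feature might be computed as follows (reusing the pmi sketch above; reference_patterns, plist, and the reliability table r are assumed inputs):

```python
def srel(e_i, e_j, plist, reference_patterns, count, r):
    """Sum pmi * reliability over the reference patterns that the
    pair (Ei : Ej) actually matches, scaled by 1000."""
    shared = [p for p in plist[(e_i, e_j)] if p in reference_patterns]
    return 1000 * sum(pmi(p, (e_i, e_j), count) * r[p] for p in shared)
```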
Experimental Setup
• ACE-2 V1.0 corpus (NIST, 2003)
  • newswire (NWire), newspaper (NPaper), and broadcast news (BNews)
• For pattern extraction and feature computation, Wikipedia was used (220 million words)
• Raw text preprocessed by an NLP pipeline:
  • sentence boundary detection, POS tagging, text chunking, and named-entity recognition
• Two classifiers were learned separately, one for resolving pronouns and one for non-pronouns
• The pattern-based semantic information was applied only to non-pronoun resolution
Pattern Features
• Evaluated based on frequency only:
  • The top patterns capture the appositive structure "X, an/a/the Y"
  • This leads to the lowest precision
• Filtered by accuracy:
  • The top patterns with both high frequency and high accuracy are those for the copula structure "X is/was/are Y"
  • Low-accuracy features prone to false positives are eliminated
  • This yields the highest precision, with the lowest recall
• Scored by PMI reliability:
  • The top patterns cover both the appositive and the copula structures
  • This gives the highest recall, with a medium level of precision
Observations
• Pattern features only work well for NP pairs containing proper names
• Error analysis shows that a non-anaphor is often wrongly resolved to a false antecedent once the two NPs happen to satisfy a pattern feature, which hurts precision considerably
• Pattern-based semantic information seems more effective in the NWire domain than in the other two