Memory-based learning for noun phrase coreference resolution Veronique Hoste
Outline • Noun phrase coreference resolution • Definition • Why? • Problems • A memory-based learning approach
Definition (Hirst, 1981) Anaphora is the device of making in discourse an abbreviated reference [the ANAPHOR] to some entity [the ANTECEDENT or REFERENT] in the expectation that the perceiver will be able to disabbreviate the reference [RESOLUTION] and thereby determine the identity of the entity.
Example Kim Clijsters has won the Proximus Diamond Games in Antwerp. Belgium’s world number two secured her first title on home soil by making short work of defeating Italy’s Silvia Farina Elia. Clijsters broke Farina Elia’s second service game but her opponent broke back immediately, and it wasn’t until the eighth game that the Belgian broke again to lead 5-3, from which she served out to take the set. It was Clijsters’s sixth victory over the Italian. (Coreference chains: Kim Clijsters = Belgium’s world number two = her = Clijsters = the Belgian = she = Clijsters’s; Silvia Farina Elia = her opponent = Farina Elia = the Italian.)
Why? • Weakness in existing information extraction (IE) systems • An IE template asks: Who: ….. What: ….. Where: ….. When: ….. How: …..
Coreference resolution, a complex problem • Anaphora resolution draws on many knowledge sources: morphological and lexical knowledge, syntactic knowledge, semantic knowledge, discourse knowledge, real-world knowledge
Approaches • The past: mostly knowledge-based techniques (constraints and preferences) e.g. Lappin & Leass (1994), Baldwin (CogNIAC, 1996) • Recently: machine learning (C4.5) Redefine coreference resolution as a CLASSIFICATION task.
A classification based approach • Given two entities in a text, NP1 and NP2, classify the pair as coreferent or not coreferent. • E.g. • [Clijsters] broke [[Farina Elia]’s second service game] but [[her] opponent] broke back immediately. [her opponent] - [Farina Elia’s second service game] coref? - [Farina Elia] coref? - [Clijsters] coref?
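The right-to-left pairing scheme described above can be sketched as follows. The `make_pairs` helper and the hand-typed NP list are illustrative assumptions; in a real system the NPs would come from the preprocessing pipeline.

```python
def make_pairs(nps):
    """Pair every NP (potential anaphor) with all preceding NPs
    (candidate antecedents), working from right to left."""
    pairs = []
    for i in range(len(nps) - 1, 0, -1):    # rightmost NP first
        for j in range(i - 1, -1, -1):      # nearest candidate first
            pairs.append((nps[i], nps[j]))
    return pairs

# NPs in order of appearance, typed in by hand for this sketch
nps = ["Clijsters", "Farina Elia's second service game",
       "Farina Elia", "her opponent"]
for anaphor, antecedent in make_pairs(nps):
    print(anaphor, "-", antecedent, "coref?")
```

Each emitted pair becomes one classification decision for the learner.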
Getting started • Preprocessing pipeline: free text → tokenization → POS tagging → NP chunking → NER → nested NP extraction
Learner ingredients • Starting point: corpora annotated with coreferential chains • “About one month ago <COREF ID=“1”>American Airlines</COREF> sent <COREF ID=“2”> a delegation</COREF> to Brussels. <COREF ID=“3” TYPE=“IDENT” REF=“1”> The large airplane company </COREF> was interested in DAT and wished to discuss this interest with <COREF ID=“4”>the prime minister</COREF>. But <COREF ID=“5” TYPE=“IDENT” REF=“4”>Guy Verhofstadt</COREF> refused to see <COREF ID=“6” REF=“2”>the delegation</COREF>.”
Two data sets • ENGLISH: MUC-6 (2141/2091 corefs) and MUC-7 (2569/1728 corefs) • The only datasets which are publicly available • Extensively used for evaluation • Articles from WSJ and NYT • DUTCH: KNACK-2002 • First Dutch coreferentially annotated corpus • Articles from KNACK 2002 on different topics: politics, science, culture, …
Learner ingredients (ctd) • Training data to train and validate the machine learner • Procedure: n-fold cross-validation • partition the training data in n parts • repeat n times: take each part as test set and train on the remaining parts • Hold-out test data to test the resulting learner
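The n-fold procedure above, as a minimal sketch (the partitioning by stride is one arbitrary but valid way to split the data into n parts):

```python
def cross_validation_folds(data, n):
    """Partition data into n parts; each part serves once as the
    test set while the learner trains on the remaining parts."""
    folds = [data[i::n] for i in range(n)]   # n roughly equal parts
    for i in range(n):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

for train, test in cross_validation_folds(list(range(10)), 5):
    print(len(train), "training /", len(test), "test instances")
```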
Learner ingredients (ctd) • Creating instances • One instance for each pair of NPs • At the end of the instance: class values (both NPs are coreferential, not coreferential). E.g. [Clijsters] broke [[Farina Elia]’s second service game] but [[her] opponent] broke back immediately. [her opponent] - [Farina Elia’s second service game] not coreferential [her opponent] - [Farina Elia] coreferential [her opponent] - [Clijsters] not coreferential
Learner ingredients (ctd) • Instance: describes the characteristics of two NPs and their context • Features per instance: • local context: words + POS • string matching features (complete match, partial match) E.g. president Bush, George W. Bush • grammatical: - pronoun, demonstrative, definite, proper noun
Features (ctd) • grammatical (ctd): • number, gender • appositive • subject/object • semantic: • synonym, hypernym • alias • same named entity? • Distance in number of sentences and NPs
Task Build a small instance base for the following sentences. • work from right to left • link every NP (the potential anaphor) to all its preceding NPs (the candidate antecedents) • build for each pair a vector with the following features • feature 1+2: gender • feature 3+4: number • feature 5: exact match (binary) • feature 6: partial match (binary) • feature 7+8: pronoun/demonstrative/definite/proper • feature 9: synonyms/hypernyms (binary)
“About one month ago <COREF ID=“1”>American Airlines</COREF> sent <COREF ID=“2”> a delegation</COREF> to Brussels. <COREF ID=“3” TYPE=“IDENT” REF=“1”> The large air plane company </COREF> was interested in DAT and wished to discuss this interest with <COREF ID=“4”> prime minister Verhofstadt </COREF>. But <COREF ID=“5” TYPE=“IDENT” REF=“4”>Guy Verhofstadt</COREF> refused to see <COREF ID=“6” REF=“2”>the delegation</COREF>.”
Resulting instance base • NP pairs: the delegation - prime minister Verhofstadt / this interest / DAT / the large airplane company / Brussels / a delegation; Guy Verhofstadt - prime minister Verhofstadt; (…) • Example feature vectors: • the delegation - prime minister Verhofstadt → neutral, person, singular, singular, no, no, definite, proper, no → not coreferential • the delegation - a delegation → neutral, person, singular, singular, yes, yes, definite, indefinite, yes → coreferential • Guy Verhofstadt - prime minister Verhofstadt → male, person, singular, singular, no, yes, proper, proper, yes → coreferential • (…) • Feed these instances to the learning algorithm
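A toy version of the instance construction above. The feature extractors here are crude string heuristics invented for this sketch; a real system would derive these features from tagger and parser output.

```python
def make_instance(anaphor, antecedent):
    """Build a few of the features discussed above (exact match,
    partial match, definiteness) from raw NP strings."""
    def definiteness(np):
        first = np.split()[0].lower()
        if first == "the":
            return "definite"
        if first in ("a", "an"):
            return "indefinite"
        return "proper"
    exact = "yes" if anaphor.lower() == antecedent.lower() else "no"
    shared = set(anaphor.lower().split()) & set(antecedent.lower().split())
    partial = "yes" if shared else "no"
    return [exact, partial, definiteness(anaphor), definiteness(antecedent)]

print(make_instance("the delegation", "a delegation"))
# -> ['no', 'yes', 'definite', 'indefinite']
```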
Learning • TRAINING: • Input : set of training instances • Output: a coreference classifier • TESTING: • Input : new unseen instances • Output: classification
Memory-based learning • Background: performance in real-world tasks is based on remembering past events rather than creating rules or generalizations • Lazy (vs. eager) : MBL keeps all training data in memory and only abstracts at classification time by extrapolating a class from the most similar items in memory to the new test item
MBL components • memory-based learning component: During learning, the learning component adds new training instances to the memory without any abstraction or restructuring • similarity-based performance component: The classification of the most similar instance in memory is taken as classification for the new test instance
In other words ... • Given (x1, y1), (x2, y2), (x3, y3), …. (xn, yn) • Task at classification time is to find the closest xi for a new data point xq
Crucial components • A distance metric • The number of nearest neighbours to look at • A strategy of how to extrapolate from the nearest neighbours
Distance metrics When presenting a new instance for classification to the MBL learner, the learner looks in its memory in order to find all instances whose attributes are similar to the newly presented test instance.
Distance metrics • How far apart are xq and xi? • Most basic metric: Overlap Metric • Δ(xq, xi) = Σj=1..n δ(xqj, xij) where • δ(xqj, xij) = 0 if xqj = xij • δ(xqj, xij) = 1 if xqj ≠ xij
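The Overlap Metric above translates directly into code: the distance is simply the number of feature positions on which the two instances disagree.

```python
def overlap_distance(xq, xi):
    """Overlap Metric: 0 per matching feature value, 1 per mismatch,
    summed over all feature positions."""
    assert len(xq) == len(xi)
    return sum(1 for a, b in zip(xq, xi) if a != b)

print(overlap_distance(["neutral", "singular", "no"],
                       ["neutral", "plural", "yes"]))   # -> 2
```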
Feature weighting • Problem: some features will be more informative for the prediction of the class label than others • Solution: feature selection or feature weighting • information gain weighting • gain ratio weighting • chi-squared weighting
Information gain weighting • Expresses the average entropy reduction from a feature when its value is known • H(C) = -Σc∈C P(c) log2 P(c) • wi = H(C) - Σv∈Vi P(v) × H(C|v) • Problem: features with many possible values are favoured above features with fewer possible values
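A small implementation of the information gain weight defined above, run on a toy dataset of (feature value, class) pairs invented for illustration:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(C) = -sum over classes c of P(c) * log2 P(c)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(values, labels):
    """wi = H(C) minus the expected class entropy after splitting
    the data on the feature's values."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [c for x, c in zip(values, labels) if x == v]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

values = ["yes", "yes", "no", "no"]
labels = ["coref", "coref", "nocoref", "nocoref"]
print(information_gain(values, labels))   # perfectly predictive -> 1.0
```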
Gain ratio weighting • Normalized version of information gain • = information gain divided by the entropy of the feature values • wi = (H(C) - Σv∈Vi P(v) × H(C|v)) / si(i) • si(i) = -Σv∈Vi P(v) log2 P(v)
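The normalization above can be sketched by dividing the gain by the split info si(i), i.e. the entropy of the feature's own value distribution, which penalises many-valued features:

```python
from collections import Counter
from math import log2

def entropy(seq):
    """Entropy of any discrete distribution of symbols."""
    total = len(seq)
    return -sum((n / total) * log2(n / total)
                for n in Counter(seq).values())

def gain_ratio(values, labels):
    """Information gain divided by si(i), the entropy of the
    feature-value distribution itself."""
    total = len(labels)
    remainder = sum(
        (values.count(v) / total)
        * entropy([c for x, c in zip(values, labels) if x == v])
        for v in set(values)
    )
    gain = entropy(labels) - remainder
    split_info = entropy(values)          # si(i)
    return gain / split_info if split_info else 0.0

print(gain_ratio(["yes", "yes", "no", "no"],
                 ["coref", "coref", "nocoref", "nocoref"]))   # -> 1.0
```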
Chi-squared weighting • Given: contingency table consisting of all classes and feature values • Chi square: measures the difference between the expected values and the observed values in each of the cells of the table • χ² = Σij (Eij - Oij)² / Eij • Eij = (ni. × n.j) / n..
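The contingency-table computation above, as a short sketch: expected counts come from the row and column marginals, and χ² sums the squared deviations of the observed cell counts.

```python
from collections import Counter

def chi_squared(values, labels):
    """chi^2 over the feature-value x class contingency table:
    E_ij = n_i. * n_.j / n.., summed as (E_ij - O_ij)^2 / E_ij."""
    n = len(values)
    value_counts = Counter(values)            # row marginals n_i.
    class_counts = Counter(labels)            # column marginals n_.j
    observed = Counter(zip(values, labels))   # O_ij
    chi2 = 0.0
    for v in value_counts:
        for c in class_counts:
            expected = value_counts[v] * class_counts[c] / n
            chi2 += (expected - observed[(v, c)]) ** 2 / expected
    return chi2

values = ["yes", "yes", "no", "no"]
labels = ["coref", "coref", "nocoref", "nocoref"]
print(chi_squared(values, labels))   # fully dependent 2x2 table -> 4.0
```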
Crucial components • A distance metric • The number of nearest neighbours to look at • A strategy of how to extrapolate from the nearest neighbours
k • Nearest neighbours: the instances in memory which are nearest to the test item to be classified • The classification of these nearest neighbours is used as classification for the new test instance • Expressed by k • k = 1: the instances with the nearest distance to the test instance are used for classification
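A sketch of k = 1 retrieval in the sense used above: all stored instances at the single nearest distance are returned, so ties can yield more than one neighbour. The memory contents and the overlap helper are invented for illustration.

```python
def nearest_neighbours(memory, query, distance):
    """memory: list of (features, label) pairs. Return every stored
    instance whose distance to the query equals the minimum distance."""
    scored = [(distance(x, query), (x, y)) for x, y in memory]
    best = min(d for d, _ in scored)
    return [item for d, item in scored if d == best]

def overlap(a, b):
    """Overlap metric: count of mismatching feature positions."""
    return sum(x != y for x, y in zip(a, b))

memory = [(("sing", "def"), "coref"),
          (("plur", "def"), "nocoref"),
          (("sing", "indef"), "nocoref")]
print(nearest_neighbours(memory, ("plur", "indef"), overlap))
```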
Crucial components • A distance metric • The number of nearest neighbours to look at • A strategy of how to extrapolate from the nearest neighbours
Extrapolation from the nearest neighbours • Goal: decide which will be the class of a new test item • Approaches: • Majority voting: all nearest neighbours receive equal weight • Distance weighted voting: link the choice of classification to the distance between the nearest neighbours and the test item
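Both voting schemes above can be captured in one small function; the inverse-distance weight 1/(d+1) is one arbitrary choice of weighting that avoids division by zero at distance 0.

```python
from collections import defaultdict

def vote(neighbours, weighted=False):
    """neighbours: list of (distance, label) pairs.
    Majority voting gives every neighbour equal weight; distance
    weighted voting lets closer neighbours count more."""
    scores = defaultdict(float)
    for dist, label in neighbours:
        scores[label] += 1.0 / (dist + 1) if weighted else 1.0
    return max(scores, key=scores.get)

neighbours = [(0, "coref"), (2, "nocoref"), (2, "nocoref")]
print(vote(neighbours))                 # majority -> "nocoref"
print(vote(neighbours, weighted=True))  # weighted -> "coref"
```

Note how the two schemes can disagree: one exact-match neighbour outweighs two distant ones under distance weighting but loses the majority vote.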
Noise How will MBL handle many uninformative features?
Skewness E.g. 10% coreferential instances and 90% non-coreferential instances Does MBL suffer from skewed class distributions?