
Learning noun phrase coreference resolution



Presentation Transcript


  1. Learning noun phrase coreference resolution Memory-based vs. rule-induction Veronique Hoste CNTS Language Technology Group University of Antwerp

  2. Part I Machine Learning of Coreference Resolution

  3. Introduction • Coreference and the task of coreference resolution • Applications of coreference resolution • Machine Translation • Information extraction • Question answering • (…)

  4. ANAPHOR Anaphora is the device of making in discourse an abbreviated reference to some entity in the expectation that the perceiver will be able to disabbreviate the reference and thereby determine the identity of the entity. (Hirst, 1981)

  5. ANTECEDENT or REFERENT ANAPHOR Anaphora is the device of making in discourse an abbreviated reference to some entity in the expectation that the perceiver will be able to disabbreviate the reference and thereby determine the identity of the entity. (Hirst, 1981)

  6. ANTECEDENT or REFERENT ANAPHOR Anaphora is the device of making in discourse an abbreviated reference to some entity in the expectation that the perceiver will be able to disabbreviate the reference and thereby determine the identity of the entity. RESOLUTION

  7. In 1983 werden Alfred Heineken en zijn chauffeur ontvoerd. De kidnappers vroegen 43 miljoen gulden losgeld. Een bescheiden bedrag, vonden ze zelf. [In 1983, Alfred Heineken and his driver were kidnapped. The kidnappers demanded 43 million guilders in ransom. A modest sum, they themselves thought.]



  10. Coreference resolution, a complex problem: anaphora resolution draws on morphological and lexical knowledge, syntactic knowledge, semantic knowledge, discourse knowledge, and real-world knowledge.

  11. Morphological and lexical knowledge • Alfred Heineken: 3p singular, male, proper • zijn: 3p singular, male, pron. • Alfred Heineken en zijn chauffeur: 3p plural, common • De kidnappers: 3p plural, common • ze: 3p plural, pronoun In 1983 werden Alfred Heineken en zijn chauffeur ontvoerd. De kidnappers vroegen 43 miljoen gulden losgeld. Een bescheiden bedrag, vonden ze zelf.


  13. Semantic knowledge • Synonym recognition: • Donderdag gaven Stevaert en Picque elkaar de schuld voor het disfunctioneren van twee onbemande camera’s. Picque - bevoegd voor de erkenning van de flitspalen (…) [On Thursday, Stevaert and Picque blamed each other for the malfunctioning of two unmanned cameras. Picque - responsible for the approval of the speed cameras (…); onbemande camera’s and flitspalen are synonyms] • Hyponym recognition: • Zacarias Moussaoui is aangeklaagd voor de terreuraanvallen van 11 september. Hij kon door omstandigheden niet aan de kapingen deelnemen. [Zacarias Moussaoui has been charged for the terror attacks of September 11. Due to circumstances he could not take part in the hijackings; de kapingen refers back to de terreuraanvallen]

  14. Anaphora resolution can be considered one of the most difficult problems in natural language processing Why bother? Crucial importance in many text mining applications

  15. Introduction • Coreference and the task of coreference resolution • Applications of coreference resolution • Machine Translation • Information extraction • Question answering • (…)

  16. The lack of efficient coreference resolution ... … is a weakness in existing machine translation systems • The monkey ate the banana because it was hungry. • The monkey ate the banana because it was ripe. • The monkey ate the banana because it was lunch-time.

  17. The monkey ate the banana because it was hungry. De aap at de banaan omdat hij honger had. [masculine hij, referring to the monkey] The monkey ate the banana because it was ripe. De aap at de banaan omdat ze rijp was. [feminine ze, referring to the banana] The monkey ate the banana because it was lunch-time. De aap at de banaan omdat het etenstijd was. [neuter het, non-referential]

  18. Information extraction The lack of efficient coreference resolution … … is a weakness in existing information extraction systems Who: ….. What: ….. Where: ….. When: ….. How: …..

  19. De woordvoerder van het vorstenhuis meldde gisteren dat de koning in allerijl is opgenomen in het ziekenhuis. Deze ochtend bevestigden de dokters dat zijn toestand stabiel is. [The spokesperson of the royal house reported yesterday that the king had been rushed to hospital. This morning the doctors confirmed that his condition is stable.] Who: De koning What: In allerijl opgenomen; toestand stabiel

  20. Approaches • The field is still highly knowledge-based (constraints and preferences; centering and focusing theory), e.g. Lappin & Leass (1994), Baldwin (1996), Poesio et al. (2004) • Recently: machine learning (C4.5, Ripper, maximum entropy) in which coreference resolution is defined as a classification task E.g. De Verenigde Staten probeerden van [Pakistan en India] de belofte af te dwingen dat [ze] geen kernwapens zouden inzetten. [The United States tried to extract from [Pakistan and India] the promise that [they] would not deploy nuclear weapons.] [ze] - [de belofte] not coreferential [ze] - [Pakistan en India] coreferential [ze] - [De Verenigde Staten] not coreferential

  21. Part II Machine Learning of Coreference Resolution

  22. Strategy • Annotate data • Create positive and negative instances for the training data and train model on training data • Create test instances and evaluate learned model on testing data

  23. Building blocks for a machine learning experiment • Data: corpora annotated with coreferential information • A machine learning algorithm


  25. There’s no data like more data • (Banko and Brill, 2001)

  26. Corpora annotated with coreferential links • ENGLISH: MUC-6 (2141/2091 corefs) / MUC-7 (2569/1728 corefs) • Extensively used for evaluation • Articles from WSJ and NYT • DUTCH: KNACK-2002 • First Dutch coreferentially annotated corpus • Articles from KNACK 2002 on different topics: politics, science, culture, … • ca. 12,500 annotated NPs

  27. Which anaphora? • Identity, bound, ISA (identity of sense) and modality relations, as opposed to part-whole relations: “If the gas tank is empty, you should refuel the car.” • Between NPs • Personal, possessive and demonstrative pronouns • Non-lexicalized reflexive pronouns • Names and named entities • Definite NPs

  28. Example (KNACK-2002) (…) In de praktijk is er van autonomie of vrijheid in de beide Kashmirs geen sprake, want ze zijn sinds jaar en dag de twistappel tussen Pakistan en India. Die twee landen ontstonden in 1947 om een conflict tussen moslims en hindoes te vermijden. (…) De Verenigde Staten probeerden vruchteloos van Pakistan en India de belofte af te dwingen dat ze geen kernwapens zouden inzetten. Dat leidde zelfs tot economische sancties tegen beide landen. [(…) In practice there is no question of autonomy or freedom in either Kashmir, since they have long been the bone of contention between Pakistan and India. Those two countries came into being in 1947 to avoid a conflict between Muslims and Hindus. (…) The United States tried in vain to extract from Pakistan and India the promise that they would not deploy nuclear weapons. That even led to economic sanctions against both countries.]

  29. Example (KNACK-2002) Zacarias Moussaoui, de eerste persoon die door het Amerikaanse gerecht aangeklaagd is voor de terreuraanvallen van 11 september, pleit onschuldig bij zijn eerste verschijning voor de rechtbank. De Fransman van Marokkaanse afkomst wordt ervan verdacht de ‘twintigste vliegtuigkaper’ te zijn die door omstandigheden (hij zat in een Amerikaanse cel) niet aan de kapingen kon deelnemen. [Zacarias Moussaoui, the first person charged by the American courts over the terror attacks of September 11, pleads not guilty at his first appearance before the court. The Frenchman of Moroccan descent is suspected of being the ‘twentieth hijacker’ who, due to circumstances (he was in an American cell), could not take part in the hijackings.]

  30. <COREF ID="1">Zacarias Moussaoui</COREF>, <COREF ID="2" TYPE="IDENT" REF="1" MIN="persoon">de eerste persoon die door het Amerikaanse gerecht aangeklaagd is voor <COREF ID="3" MIN="terreuraanvallen">de terreuraanvallen van 11 september</COREF></COREF>, pleit onschuldig bij <COREF ID="4" TYPE="IDENT" REF="1">zijn</COREF> eerste verschijning voor de rechtbank. <COREF ID="5" TYPE="IDENT" REF="1" MIN="Fransman">De Fransman van Marokkaanse afkomst</COREF> wordt ervan verdacht <COREF ID="6" TYPE="MOD" REF="5" MIN="vliegtuigkaper">de ‘twintigste vliegtuigkaper’</COREF> te zijn die door omstandigheden (<COREF ID="6" TYPE="IDENT" REF="5">hij</COREF> zat in een Amerikaanse cel) niet aan <COREF ID="7" TYPE="IDENT" REF="3">de kapingen</COREF> kon deelnemen.

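The ID/REF pointers in annotations like the one above can be grouped into coreference chains with a small union-find pass. A minimal sketch, with the link list transcribed by hand from the example and the function names my own invention (not the annotation tooling actually used):

```python
def build_chains(links):
    """links: list of (coref_id, ref_id_or_None) pairs taken from the
    ID/REF attributes. Mentions connected by REF end up in one chain."""
    parent = {}

    def find(x):
        # Follow parent pointers up to the chain's representative.
        while parent.get(x, x) != x:
            x = parent[x]
        return x

    for cid, ref in links:
        parent.setdefault(cid, cid)
        if ref is not None:
            parent.setdefault(ref, ref)
            parent[find(cid)] = find(ref)  # union the two chains

    chains = {}
    for cid in parent:
        chains.setdefault(find(cid), set()).add(cid)
    return sorted(sorted(c) for c in chains.values())

# Links hand-transcribed from the KNACK-2002 example above:
links = [("1", None), ("2", "1"), ("3", None), ("4", "1"),
         ("5", "1"), ("6", "5"), ("7", "3")]
```

Applied to these links, the Moussaoui mentions (1, 2, 4, 5, 6) form one chain and the attacks/hijackings mentions (3, 7) another.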

  32. These annotated corpora are used as ... • …training data to train and validate the machine learner • Procedure: n-fold cross-validation • partition the training data in n parts • repeat n times: take each part in turn as test set and train on the remaining parts • … held-out test data to test the resulting learner
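The n-fold procedure in the bullets above can be sketched as follows (a generic illustration of the partitioning scheme, not the authors' actual experimental code):

```python
import random

def n_fold_cv(instances, n=10, seed=0):
    """Partition the data into n parts; yield (train, test) pairs in
    which each part serves exactly once as the test set."""
    data = instances[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::n] for i in range(n)]
    for i in range(n):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

Each instance lands in exactly one test fold, so the n test sets together cover the whole training corpus.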

  33. Preprocessing: free text → tokenization → POS tagging → NP chunking → NER → relation finding → instance construction

  34. The preprocessing steps applied to one sentence: • Free text: Google is een beursgenoteerd bedrijf. • Tokenization: Google is een beursgenoteerd bedrijf . • POS tagging: N(eigen) V(pv,tgw,ev) LID(onbep) ADJ N(soort) . • NP chunking: [NP Google] [VP is] [NP een beursgenoteerd bedrijf] . • NER: Google = I-ORG • Relation finding: [SBJ Google] is [PREDC een beursgenoteerd bedrijf] .

  35. Positive and negative instances • Positive: combination of the anaphor with each preceding element in the coreference chain. • Negative: combination of the anaphor with each preceding NP which is not part of the coreference chain (search scope: <= 20 sentences) • Problem of class skewness: far more negative than positive instances.

  36. In 1983 werden Alfred Heineken en zijn chauffeur ontvoerd. De kidnappers vroegen 43 miljoen gulden losgeld. Een bescheiden bedrag, vonden ze zelf. Instances for ZE: ze - een bescheiden bedrag NEG; ze - 43 miljoen gulden losgeld NEG; ze - de kidnappers POS; ze - Alfred Heineken en zijn chauffeur NEG; ze - zijn chauffeur NEG; ze - Alfred Heineken NEG; ze - 1983 NEG. Instances for ZIJN: zijn - Alfred Heineken POS; zijn - 1983 NEG.
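The instance-creation step illustrated above can be sketched as below. The `mentions`/`chains` structures are hypothetical inputs of my own, and this sketch applies the 20-sentence scope to positive pairs as well, which may differ from the exact setup used in the experiments:

```python
def make_instances(mentions, chains, max_sent_dist=20):
    """mentions: list of (mention_id, sentence_index) in document order.
    chains: list of sets of mention ids that corefer.
    Pair each anaphor with every preceding mention inside the search
    scope; label the pair POS if both belong to the same coreference
    chain, NEG otherwise."""
    chain_of = {m: k for k, chain in enumerate(chains) for m in chain}
    instances = []
    for j, (anaphor, s_j) in enumerate(mentions):
        for i in range(j - 1, -1, -1):
            antecedent, s_i = mentions[i]
            if s_j - s_i > max_sent_dist:
                break  # search scope exceeded
            same = (anaphor in chain_of and antecedent in chain_of
                    and chain_of[anaphor] == chain_of[antecedent])
            instances.append((anaphor, antecedent, "POS" if same else "NEG"))
    return instances
```

On a toy three-mention version of the Heineken example, "ze" pairs positively with "de kidnappers" and negatively with everything earlier, mirroring the instance list above.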

  37. Features in an instance • Positional features (e.g. dist_sent, dist_NP) • Local context features • Morphological and lexical features (e.g. i/j/ij-pron, j_demon, j_def, i/j/ij-proper, num_agree) • Syntactic features (e.g. i/j/ij_SBJ/OBJ/PREDC, appositive) • String-matching features (comp_match, part_match, alias, same_head) • Semantic features (synonym, hyperonym, same_NE, (linguistic) gender of antecedent and anaphor)

  38. ze - de kidnappers • Dist_sent: 0 • Dist_NP: 1 • i_pron: i_pron • j_pron: no • ij_pron: no • j_demon: no • j_def: j_def • i_proper: no • j_proper: no • ij_proper: no • num_agree: yes • i_SBJ/OBJ/PRED: i_SBJ • j_SBJ/OBJ/PRED: j_SBJ • appositive: no • comp_match: no • part_match: no • alias: no • same_head: no • synonym: no • hypernym: no • same_NE: no • (…)
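The string-matching features in the vector above could be approximated as in this sketch; the determiner list and the head-is-last-word heuristic are simplifying assumptions of mine, not the feature definitions actually used:

```python
def string_match_features(antecedent, anaphor):
    """Rough versions of comp_match, part_match and same_head."""
    stop = {"de", "het", "een", "the", "a", "an"}  # assumed determiner list
    a = [w for w in antecedent.lower().split() if w not in stop]
    b = [w for w in anaphor.lower().split() if w not in stop]
    return {
        "comp_match": a == b,                       # full match minus determiners
        "part_match": bool(set(a) & set(b)),        # any shared content word
        "same_head": bool(a) and bool(b) and a[-1] == b[-1],  # last word as head
    }
```

For the pair "Zacarias Moussaoui" / "Moussaoui", this gives part_match and same_head but not comp_match, matching the intuition behind the feature names.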

  39. Two-step procedure • First step: cross-validation • Application of learning algorithm on training set; 10-fold cross-validation • Search for informative features, optimal learning parameters • Undersampling of the negative class • Second step: testing • Reconstruction of coreference chains
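Undersampling of the negative class, mentioned in the first step, can be sketched as follows; the ratio parameter is illustrative, and the sampling scheme actually used in the experiments may differ:

```python
import random

def undersample_negatives(instances, ratio=1.0, seed=0):
    """Keep all POS instances; randomly keep at most
    ratio * (number of positives) NEG instances, reducing skewness."""
    pos = [x for x in instances if x[-1] == "POS"]
    neg = [x for x in instances if x[-1] == "NEG"]
    rng = random.Random(seed)
    keep = rng.sample(neg, min(len(neg), int(ratio * len(pos))))
    return pos + keep
```

With ratio=1.0 this balances the classes exactly; larger ratios keep more of the original negative majority.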

  40. Building blocks for a machine learning experiment • Data: corpora annotated with coreferential information • A machine learning algorithm

  41. The choice of learning method: the importance of algorithm bias • = the search heuristics a certain machine learning method uses and the way it represents the learned knowledge E.g. decision tree learners favor compact decision trees • No free lunch theorem (Wolpert and Macready, 1995) = no inductive algorithm is universally better than any other

  42. Memory-based learning • Background: performance in real-world tasks is based on remembering past events rather than creating rules or generalizations • Lazy (vs. eager) : MBL keeps all training data in memory and only abstracts at classification time by extrapolating a class from the most similar items in memory to the new test item • demo

  43. MBL components • memory-based learning component: During learning, the learning component adds new training instances to the memory without any abstraction or restructuring • similarity-based performance component: The classification of the most similar instance in memory is taken as classification for the new test instance

  44. In other words ... • Given (x1, y1), (x2, y2), (x3, y3), …. (xn, yn) • Task at classification time is to find the closest xi for a new data point xq

  45. Crucial components • A distance metric • The number of nearest neighbours to look at • A strategy of how to extrapolate from the nearest neighbours


  47. Distance metrics When presenting a new instance for classification to the MBL learner, the learner looks in its memory in order to find all instances whose attributes are similar to the newly presented test instance.

  48. Distance metrics • How far apart are xq and xi? • Most basic metric: Overlap Metric • looks at the number of matching and mismatching feature values in two instances • Δ(xq, xi) = Σ_{f=1..n} δ(xqf, xif) • where • δ(xqf, xif) = 0 if xqf = xif • δ(xqf, xif) = 1 if xqf ≠ xif
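The overlap metric and the nearest-neighbour step described earlier combine into a minimal classifier sketch (k = 1 only; feature weighting and the other TiMBL-style extensions discussed next are omitted):

```python
def overlap_distance(xq, xi):
    """Overlap metric: number of mismatching feature values."""
    return sum(1 for q, i in zip(xq, xi) if q != i)

def classify_1nn(memory, xq):
    """memory: list of (feature_tuple, label) training instances.
    Return the label of the instance nearest to xq under the
    overlap metric."""
    nearest = min(memory, key=lambda inst: overlap_distance(inst[0], xq))
    return nearest[1]
```

Note how lazy this is: learning stores instances verbatim, and all the work happens at classification time, exactly as slide 42 describes.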

  49. Feature weighting • Problem: some features will be more informative for the prediction of the class label than others. With many uninformative features and few informative ones, the overlap metric will strongly hinder classification performance • Solution: feature selection or feature weighting • information gain weighting • gain ratio weighting • chi-squared weighting

  50. Information gain weighting • H(C) = - Σ_{c ∈ C} P(c) log2 P(c) : the uncertainty of the learner about which class to predict for an instance • w_i = H(C) - Σ_{v ∈ V_i} P(v) × H(C|v) : the information gain of a feature, i.e. the difference in entropy between the situations with and without information about the values of that feature • Problem: features with many possible values are favoured above features with fewer possible values
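The two formulas above can be checked with a small implementation (an illustration of my own; TiMBL and similar learners compute these weights internally):

```python
import math
from collections import Counter

def entropy(labels):
    """H(C) = - sum over classes c of P(c) * log2 P(c)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """w_i = H(C) - sum over values v of P(v) * H(C|v), for one feature
    given its value per instance and the class label per instance."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain
```

A feature that splits the classes perfectly gets the full entropy as its weight; one that is independent of the class gets a weight of zero. The many-values problem on the slide is what gain ratio weighting then corrects for.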
