
TEXT PROCESSING 1


marlow

Presentation Transcript


  1. TEXT PROCESSING 1 Anaphora resolution Introduction to Anaphora Resolution

  2. Outline • A reminder: anaphora resolution, factors affecting the interpretation of anaphoric expressions • A brief history of anaphora resolution • First algorithms: Charniak, Winograd, Wilks • Pronouns: Hobbs • Salience: S-List, LRC • Early ML work • Definite descriptions: Vieira & Poesio • The MUC initiative – also: coreference evaluation methods • Soon et al

  3. Anaphora resolution: a specification of the problem

  4. Anaphora resolution: coreference chains

  5. Reminder: Factors that affect the interpretation of anaphoric expressions • Factors: • Morphological features (agreement) • Syntactic information • Salience • Lexical and commonsense knowledge • Distinction often made between CONSTRAINTS and PREFERENCES

  6. Agreement • GENDER is a strong CONSTRAINT for pronouns (in other languages: for other anaphors as well) • [Jane] blamed [Bill] because HE spilt the coffee (Ehrlich, Garnham et al., Arnold et al.) • NUMBER is also a strong constraint • [[Union] representatives] told [the CEO] that THEY couldn’t be reached
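A minimal sketch of agreement used as a hard filter on candidate antecedents, in Python. The `Mention` class, the feature codes, and the small pronoun table are assumptions made for illustration; they are not part of the slides. The next slide lists cases (e.g. "The Union ... THEY") that a strict filter like this gets wrong.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    text: str
    gender: str  # "m", "f", "n", or "unknown" (hypothetical feature codes)
    number: str  # "sg" or "pl"

# Hypothetical feature table for a few English pronouns.
PRONOUN_FEATURES = {
    "he": ("m", "sg"),
    "she": ("f", "sg"),
    "it": ("n", "sg"),
    "they": ("unknown", "pl"),
}

def agrees(pronoun: str, candidate: Mention) -> bool:
    """True if the candidate matches the pronoun in gender and number."""
    gender, number = PRONOUN_FEATURES[pronoun.lower()]
    # An unknown gender on either side is treated as compatible.
    gender_ok = "unknown" in (gender, candidate.gender) or gender == candidate.gender
    return gender_ok and number == candidate.number

def filter_candidates(pronoun: str, candidates: list) -> list:
    """Keep only the candidates that satisfy the agreement constraints."""
    return [c for c in candidates if agrees(pronoun, c)]

jane = Mention("Jane", "f", "sg")
bill = Mention("Bill", "m", "sg")
representatives = Mention("Union representatives", "unknown", "pl")
ceo = Mention("the CEO", "unknown", "sg")
```

With these toy mentions, `filter_candidates("HE", [jane, bill])` keeps only Bill, and `filter_candidates("THEY", [representatives, ceo])` keeps only the plural NP.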

  7. Some complexities • Gender: • [India] withdrew HER ambassador from the Commonwealth • “…to get a customer’s 1100 parcel-a-week load to its doorstep” • [actual error from LRC algorithm] • Number: • The Union said that THEY would withdraw from negotiations until further notice.

  8. Syntactic information • BINDING constraints • [John] likes HIM • EMBEDDING constraints • [[his] friend] • PARALLELISM preferences • Around 60% of pronouns occur in subject position; around 70% of those refer to antecedents in subject position • [John] gave [Bill] a book, and [Fred] gave HIM a pencil • Effect of syntax on SALIENCE (Next)

  9. Salience • In every discourse, certain entities are more PROMINENT

  10. Factors that affect prominence • Distance • Order of mention in the sentence • Entities mentioned earlier in the sentence more prominent • Type of NP (proper names > other types of NPs) • Number of mentions • Syntactic position (subj > other GF, matrix > embedded) • Semantic role (‘implicit causality’ theories) • Discourse structure

  11. Focusing theories • Hypothesis: One or more entities in the discourse are the FOCUS OF (LINGUISTIC) ATTENTION just like some entities in the visual space are the focus of VISUAL attention • Grosz 1977, Reichman 1985: ‘focus spaces’ • Sidner 1979, Sanford & Garrod 1981: ‘focused entities’ • Grosz et al 1981, 1983, 1995: Centering

  12. Lexical and commonsense knowledge • [The city council] refused [the women] a permit because they feared violence. • [The city council] refused [the women] a permit because they advocated violence. • Winograd (1974), Sidner (1979) BRISBANE – a terrific right rip from [Hector Thompson] dropped [Ross Eadie] at Sandgate on Friday night and won him the Australian welterweight boxing title. (Hirst, 1981)

  13. Problems to be resolved by an AR system: mention identification • Effect: recall • Typical problems: • Nested NPs (possessives) • [a city]’s [computer system] → [[a city]’s computer system] • Appositions: • [Madras], [India] → [Madras, [India]] • Attachments

  14. Problems for AR: Complex attachments • [The quality that’s coming out of [software from [India]]] • The quality that’s coming out of software from India is now exceeding the quality of software that’s coming out from the United States • scanning through millions of lines of computer code • ACE/bnews/devel/ABC19981001.1830.1257

  15. Problems for AR: agreement extraction • The committee are meeting / is meeting • The Union sent a representative. They …. • The doctor came to visit my father. SHE told him …

  16. Problems to be solved: anaphoricity determination • Expletives: • IT’s not easy to find a solution • Is THERE any reason to be optimistic at all? • Non-anaphoric definites

  17. Outline • A reminder: anaphora resolution, factors affecting the interpretation of anaphoric expressions • A brief history of anaphora resolution • First algorithms: Charniak, Winograd, Wilks • Pronouns: Hobbs • Salience: S-List, LRC • Early ML work • Definite descriptions: Vieira & Poesio • The MUC initiative, coreference evaluation methods • Soon et al 2001

  18. A brief history of early work on anaphora resolution • Algorithms heavily based on (hand-coded) commonsense knowledge: Charniak 72, Winograd 74, Hobbs 1974, Wilks 1975 • Syntax-based algorithms: Hobbs 1974 • Salience-based algorithms: Brennan et al, Strube & Hahn, Tetreault

  19. First approaches to anaphora resolution: Charniak, Winograd, Wilks • Heavy emphasis on inference • Hand-coded • No formal evaluation

  20. 1970s Example: SHRDLU (http://hci.stanford.edu/~winograd/shrdlu/) Terry Winograd. 1971. MIT Ph.D. Thesis. Person: PICK UP A BIG RED BLOCK. Computer: OK. (does it) Person: GRASP THE PYRAMID. Computer: I DON'T UNDERSTAND WHICH PYRAMID YOU MEAN. Person: FIND A BLOCK WHICH IS TALLER THAN THE ONE YOU ARE HOLDING AND PUT IT INTO THE BOX. Computer: BY "IT", I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING. Computer: OK. (does it) Person: WHAT DOES THE BOX CONTAIN? Computer: THE BLUE PYRAMID AND THE BLUE BLOCK. Person: WHAT IS THE PYRAMID SUPPORTED BY? Computer: THE BOX. Person: HOW MANY BLOCKS ARE NOT IN THE BOX? Computer: FOUR OF THEM. Person: IS AT LEAST ONE OF THEM NARROWER THAN THE ONE WHICH I TOLD YOU TO PICK UP? Computer: YES, THE RED CUBE. Terry Winograd: • This is one of the only dialogs it knows. • SHRDLU is too stupid to make mistakes. • Beautiful “Demo-ware”

  21. Anaphora in SHRDLU • First example of a HISTORY LIST algorithm • Uses a combination of agreement features and semantic constraints • Checks all possibilities and assigns a global score rather than simply finding the first match • Score incorporates a syntactic component: entities in subject position get a higher score than entities in object position, which are in turn ranked more highly than entities in adjunct position • Performance made more impressive by including solutions to a number of complex cases, such as reference to events (Why did you do it?) – often ad hoc

  22. Hobbs’ ‘Naïve Algorithm’ (Hobbs, 1974) • The reference algorithm for PRONOUN resolution (until Soon et al it was the standard baseline) • Interesting since Hobbs himself in the 1974 paper suggests that this algorithm is very limited (and proposes one based on semantics) • The first anaphora resolution algorithm to have an (informal) evaluation • Purely syntax based

  23. Hobbs’ ‘Naïve Algorithm’ (Hobbs, 1974) • Works off ‘surface parse tree’ • Starting from the position of the pronoun in the surface tree, • first go up the tree looking for an antecedent in the current sentence (left-to-right, breadth-first); • then go to the previous sentence, again traversing left-to-right, breadth-first. • And keep going back

  24. Hobbs’ algorithm: Intrasentential anaphora • Steps 2 and 3 deal with intrasentential anaphora and incorporate basic syntactic constraints: • [Tree diagram, with node X and path p marked: [S [NP John] [V likes] [NP him]]] • Also: John’s portrait of him

  25. Hobbs’ Algorithm: intersentential anaphora • [Tree diagram, with the candidate and node X marked: [S [NP Bill] [V is] [NP a good friend]] [S [NP John] [V likes] [NP him]]]
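The intersentential part of the search described above can be sketched in Python as follows. This is a simplified illustration, not Hobbs' full algorithm: it omits the intrasentential steps (the upward walk from the pronoun and the syntactic constraints of steps 2 and 3) and any agreement check. Representing parse trees as nested tuples `(label, child, ...)` with plain strings as words is an assumption made for the example.

```python
from collections import deque

def bfs_noun_phrases(tree):
    """Yield NP subtrees of `tree` in left-to-right, breadth-first order."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, str):  # a leaf (word): nothing to expand
            continue
        label, *children = node
        if label == "NP":
            yield node
        queue.extend(children)  # children are enqueued left to right

def words(tree):
    """Return the words covered by a (sub)tree, in order."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]

def propose_antecedents(previous_sentences):
    """Propose candidate NPs, most recent previous sentence first."""
    for tree in reversed(previous_sentences):
        for np in bfs_noun_phrases(tree):
            yield " ".join(words(np))

# "Bill is a good friend. John likes him."
previous = [("S", ("NP", "Bill"), ("VP", ("V", "is"), ("NP", "a", "good", "friend")))]
candidates = list(propose_antecedents(previous))
```

For the example sentence, the breadth-first order proposes the subject NP "Bill" before the deeper NP "a good friend", which is how a purely syntactic search ends up preferring subjects.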

  26. Evaluation • The first anaphora resolution algorithm to be evaluated in a systematic manner, and still often used as baseline (hard to beat!) • Hobbs, 1974: • 300 pronouns from texts in three different styles (a fiction book, a non-fiction book, a magazine) • Results: 88.3% correct without selectional restrictions (SR), 91.7% with SR • 132 ambiguous pronouns; 98 correctly resolved. • Tetreault 2001 (no selectional restrictions; all pronouns) • 1298 out of 1500 pronouns from 195 NYT articles (76.8% correct) • 74.2% correct intrasentential, 82% intersentential • Main limitations • Reference to propositions excluded • Plurals • Reference to events

  27. Salience-based algorithms • Common hypotheses: • Entities in discourse model are RANKED by salience • Salience gets continuously updated • Most highly ranked entities are preferred antecedents • Variants: • DISCRETE theories (Sidner, Brennan et al, Strube & Hahn): 1-2 entities singled out • CONTINUOUS theories (Alshawi, Lappin & Leass, Strube 1998, LRC): only ranking

  28. Salience-based algorithms • Sidner 1979: • Most extensive theory of the influence of salience on several types of anaphors • Two FOCI: discourse focus, agent focus • never properly evaluated • Brennan et al 1987 (see Walker 1989) • Ranking based on grammatical function • One focus (CB) • Strube & Hahn 1999 • Ranking based on information status (NP type) • S-List (Strube 1998): drop CB • LRC (Tetreault): incremental

  29. LRC • An update on Strube’s S-LIST algorithm (= Centering without centers) • Initial version augmented with various syntactic and discourse constraints

  30. LRC Algorithm (LRC) • Maintain a stack of entities ranked by grammatical function and sentence order (Subj > DO > IO) • Each sentence is represented by a Cf-list: list of salient entities ordered by grammatical function • While processing utterance’s entities (left to right) do: • Push entity onto temporary list (Cf-list-new), if pronoun, attempt to resolve first: • Search through Cf-list-new (l-to-r) taking the first candidate that meets gender, agreement constraints, etc. • If none found, search past utterance’s Cf-lists starting from previous utterance to beginning of discourse
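The loop on this slide can be sketched in Python as below. This is an illustrative reading of the slide, not Tetreault's implementation: the `Entity` class, the feature codes, the strict equality test in `compatible`, and the `GF_RANK` table (which extends the slide's Subj > DO > IO ordering with a catch-all rank) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Ranking from the slide (Subj > DO > IO); "other" is an added catch-all.
GF_RANK = {"subj": 0, "dobj": 1, "iobj": 2, "other": 3}

@dataclass
class Entity:
    text: str
    gf: str      # grammatical function: "subj", "dobj", "iobj", "other"
    gender: str  # hypothetical codes: "m", "f", "n"
    number: str  # "sg" or "pl"
    is_pronoun: bool = False
    antecedent: Optional["Entity"] = None

def compatible(pronoun, candidate):
    """Simplified agreement check: gender and number must be equal."""
    return pronoun.gender == candidate.gender and pronoun.number == candidate.number

def first_match(pronoun, cf_list):
    """Return the first compatible entity in a Cf-list, or None."""
    return next((e for e in cf_list if compatible(pronoun, e)), None)

def lrc(utterances):
    """Resolve pronouns left to right; each utterance is a list of entities."""
    history = []  # Cf-lists of past utterances, oldest first
    for utterance in utterances:
        cf_new = []
        for entity in utterance:  # entities in surface (left-to-right) order
            if entity.is_pronoun:
                # 1. Search the entities already seen in the current utterance.
                match = first_match(entity, cf_new)
                # 2. Otherwise search past Cf-lists, most recent utterance first.
                for cf_list in reversed(history):
                    if match is not None:
                        break
                    match = first_match(entity, cf_list)
                entity.antecedent = match
            cf_new.append(entity)
        # Store the finished utterance ranked by grammatical function.
        history.append(sorted(cf_new, key=lambda e: GF_RANK.get(e.gf, 3)))
    return utterances

# "John gave Bill a book. He liked it."
u1 = [Entity("John", "subj", "m", "sg"), Entity("Bill", "iobj", "m", "sg"),
      Entity("a book", "dobj", "n", "sg")]
he = Entity("He", "subj", "m", "sg", is_pronoun=True)
it = Entity("it", "dobj", "n", "sg", is_pronoun=True)
lrc([u1, [he, it]])
```

In this toy discourse "He" resolves to John because the previous utterance's Cf-list is searched in grammatical-function order, so the subject is tried before the indirect object Bill; "it" resolves to "a book".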

  31. Results

  32. Comparison with ML techniques of the time

  33. Carter’s Algorithm (1985) • The most systematic attempt to produce a system using both salience (Sidner’s theory) & commonsense knowledge (Wilks’s preference semantics) • Small-scale evaluation (around 100 hand-constructed examples) • Many ideas found their way into SRI’s Core Language Engine (Alshawi, 1992)

  34. Outline • A reminder: anaphora resolution, factors affecting the interpretation of anaphoric expressions • A brief history of anaphora resolution • First algorithms: Charniak, Winograd, Wilks • Pronouns: Hobbs • Salience: S-List, LRC • Modern work in anaphora resolution • Early ML work • The MUC initiative – also: coreference evaluation methods • Soon et al

  35. MODERN WORK IN ANAPHORA RESOLUTION • Availability of the first anaphorically annotated corpora circa 1993 (MUC6) made statistical methods possible

  36. STATISTICAL APPROACHES TO ANAPHORA RESOLUTION • UNSUPERVISED approaches • E.g., Cardie & Wagstaff 1999, Ng 2008 • SUPERVISED approaches • Early (NP type specific) • Soon et al: general classifier + modern architecture

  37. ANAPHORA RESOLUTION AS A CLASSIFICATION PROBLEM • Classify NP1 and NP2 as coreferential or not • Build a complete coreferential chain
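One common way to go from pairwise decisions to complete chains is to take the transitive closure of the pairs the classifier marked as coreferential. The Python sketch below does this with a small union-find structure; the function name and the assumption that mentions are unique, hashable identifiers are choices made for the illustration, and the slides do not say which chaining method a given system uses.

```python
def build_chains(mentions, coreferent_pairs):
    """Group mentions into chains, given pairs classified as coreferential."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links to the representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in coreferent_pairs:
        parent[find(a)] = find(b)  # merge the two chains

    chains = {}
    for m in mentions:  # keep mentions in their original (textual) order
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())

mentions = ["John", "he", "Mary", "his", "she"]
pairs = [("he", "John"), ("his", "he"), ("she", "Mary")]
chains = build_chains(mentions, pairs)
```

Here "his" ends up in the same chain as "John" even though that pair was never classified directly, which is the point of building the chain from pairwise links.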

  38. SOME KEY DECISIONS • ENCODING • I.e., what positive and negative instances to generate from the annotated corpus • E.g., treat all elements of the coref chain as positive instances, everything else as negative • DECODING • How to use the classifier to choose an antecedent • Some options: ‘sequential’ (stop at the first positive), ‘parallel’ (compare several options)
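The two decisions can be illustrated with a short Python sketch. `encode` follows the example encoding on the slide (same-chain pairs are positive, all other pairs negative); the two decoders contrast the ‘sequential’ and ‘parallel’ options. The `score` callable, the 0.5 threshold, and the toy scores are hypothetical stand-ins for a trained classifier.

```python
def encode(mentions, chain_of):
    """Generate (antecedent, anaphor, label) instances from annotated mentions.

    `mentions` are in textual order; `chain_of` maps a mention to its chain id
    (mentions missing from the map are treated as non-coreferent).
    """
    instances = []
    for j, anaphor in enumerate(mentions):
        for antecedent in mentions[:j]:
            chain = chain_of.get(anaphor)
            label = chain is not None and chain == chain_of.get(antecedent)
            instances.append((antecedent, anaphor, label))
    return instances

def decode_sequential(anaphor, candidates, score, threshold=0.5):
    """Stop at the first candidate (closest first) classified as positive."""
    for candidate in candidates:
        if score(candidate, anaphor) > threshold:
            return candidate
    return None

def decode_parallel(anaphor, candidates, score, threshold=0.5):
    """Compare all candidates and return the best-scoring positive one."""
    best = max(candidates, key=lambda c: score(c, anaphor), default=None)
    if best is not None and score(best, anaphor) > threshold:
        return best
    return None

# Toy classifier scores (hypothetical): both candidates are above threshold.
toy_scores = {("John", "he"): 0.9, ("Bill", "he"): 0.6}

def score(candidate, anaphor):
    return toy_scores.get((candidate, anaphor), 0.0)

closest_first = ["Bill", "John"]
```

With these scores the sequential decoder returns "Bill" (the closest positive candidate), while the parallel decoder returns "John" (the highest-scoring one), so the decoding choice alone can change the output of the same classifier.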

  39. Outline • A reminder: anaphora resolution, factors affecting the interpretation of anaphoric expressions • A brief history of anaphora resolution • First algorithms: Charniak, Winograd, Wilks • Pronouns: Hobbs • Salience: S-List, LRC • Modern work in anaphora resolution • Early ML work • The MUC initiative – also: coreference evaluation methods • Soon et al

  40. Early machine-learning approaches • Main distinguishing feature: concentrate on a single NP type • Both hand-coded and ML: • Aone & Bennett (pronouns) • Vieira & Poesio (definite descriptions) • Ge and Charniak (pronouns)

  41. Definite descriptions: Vieira & Poesio • A first attempt at going beyond pronouns while still doing a large-scale evaluation • Definite descriptions chosen because they require lexical and commonsense knowledge • Developing both a hand-coded and an ML decision tree (as in Aone and Bennett) • Vieira & Poesio 1996, Vieira 1998, Vieira & Poesio 2000

  42. Preliminary corpus study (Poesio and Vieira, 1998) • Annotators asked to classify about 1,000 definite descriptions from the ACL/DCI corpus (Wall Street Journal texts) into three classes: • DIRECT ANAPHORA: a house … the house • DISCOURSE-NEW: the belief that ginseng tastes like spinach is more widespread than one would expect • BRIDGING DESCRIPTIONS: the flat … the living room; the car … the vehicle

  43. Poesio and Vieira, 1998 • Results: • More than half of the definite descriptions are first-mention • Subjects didn’t always agree on the classification of an antecedent (bridging descriptions: ~8%)

  44. The Vieira / Poesio system for robust definite description resolution • Follows a SHALLOW PROCESSING approach (Carter, 1987; Mitkov, 1998): it only uses • Structural information (extracted from Penn Treebank) • Existing lexical sources (WordNet) • (Very little) hand-coded information

  45. Methods for resolving direct anaphors • DIRECT ANAPHORA: • the red car, the car, the blue car: premodification heuristics • segmentation: approximated with ‘loose’ windows
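As a rough illustration of what a premodification heuristic can look like, the Python sketch below accepts a same-head antecedent only if the definite description adds no premodifier that the antecedent lacks. This is one simple reading of the "the red car, the car, the blue car" example, not necessarily the exact rule used in the Vieira & Poesio system; the `(premodifiers, head)` representation is an assumption.

```python
def same_head_match(anaphor, antecedent):
    """Check a same-head candidate under a simple premodifier heuristic.

    Both arguments are (premodifiers, head) pairs. The heads must be equal,
    and the anaphor's premodifiers must be a subset of the antecedent's.
    """
    anaphor_mods, anaphor_head = anaphor
    antecedent_mods, antecedent_head = antecedent
    if anaphor_head != antecedent_head:
        return False
    return set(anaphor_mods) <= set(antecedent_mods)

red_car = (["red"], "car")
plain_car = ([], "car")
blue_car = (["blue"], "car")
```

Under this reading "the car" can pick up "the red car" as its antecedent, while "the blue car" cannot, because "blue" conflicts with what is known about the earlier car.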

  46. Methods for resolving discourse-new definite descriptions • DISCOURSE-NEW DEFINITES • the first man on the Moon, the fact that Ginseng tastes of spinach: a list of the most common functional predicates (fact, result, belief) and modifiers (first, last, only… ) • heuristics based on structural information (e.g., establishing relative clauses)
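The list-based part of this heuristic can be sketched in a few lines of Python. The word lists contain only the items named on the slide (the original lists are longer), and the function name and the assumption that the head noun is supplied by an upstream parser are illustrative.

```python
# Only the items named on the slide; the system's actual lists are longer.
FUNCTIONAL_PREDICATES = {"fact", "result", "belief"}
SPECIAL_MODIFIERS = {"first", "last", "only"}

def looks_discourse_new(tokens, head):
    """Flag a definite description as likely discourse-new.

    `tokens` are the words of the description; `head` is its head noun,
    assumed to be provided by a parser.
    """
    lowered = [t.lower() for t in tokens]
    has_functional_head = head.lower() in FUNCTIONAL_PREDICATES
    has_special_modifier = any(t in SPECIAL_MODIFIERS for t in lowered)
    return has_functional_head or has_special_modifier

moon = "the first man on the Moon".split()
ginseng = "the fact that Ginseng tastes of spinach".split()
```

Both slide examples are flagged (`looks_discourse_new(moon, "man")` through the modifier "first", `looks_discourse_new(ginseng, "fact")` through the functional head), while a plain description such as "the car" is not.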

  47. The (hand-coded) decision tree • Apply ‘safe’ discourse-new recognition heuristics • Attempt to resolve as same-head anaphora • Attempt to classify as discourse new • Attempt to resolve as bridging description. Search backward 1 sentence at a time and apply heuristics in the following order: • Named entity recognition heuristics – R=.66, P=.95 • Heuristics for identifying compound nouns acting as anchors – R=.36 • Access WordNet – R, P about .28

  48. The decision tree obtained via ML • Same features as for the hand-coded decision tree • Using ID3 classifier (non probabilistic decision tree) • Training instances: • Positive: closest annotated antecedent • Negative: all mentions in previous four sentences • Decoding: consider all possible antecedents in previous four sentences

  49. Automatically learned decision tree

  50. Overall Results • Evaluated on a test corpus of 464 definite descriptions • Overall results:
