TEXT PROCESSING 1 Anaphora resolution Introduction to Anaphora Resolution
Outline • A reminder: anaphora resolution, factors affecting the interpretation of anaphoric expressions • A brief history of anaphora resolution • First algorithms: Charniak, Winograd, Wilks • Pronouns: Hobbs • Salience: S-List, LRC • Early ML work • Definite descriptions: Vieira & Poesio • The MUC initiative – also: coreference evaluation methods • Soon et al
Reminder: Factors that affect the interpretation of anaphoric expressions • Factors: • Morphological features (agreement) • Syntactic information • Salience • Lexical and commonsense knowledge • Distinction often made between CONSTRAINTS and PREFERENCES
Agreement • GENDER: a strong CONSTRAINT for pronouns (in other languages, for other anaphors as well) • [Jane] blamed [Bill] because HE spilt the coffee (Ehrlich; Garnham et al.; Arnold et al.) • NUMBER: also a strong constraint • [[Union] representatives] told [the CEO] that THEY couldn’t be reached
Some complexities • Gender: • [India] withdrew HER ambassador from the Commonwealth • “…to get a customer’s 1,100 parcel-a-week load to its doorstep” • [actual error from the LRC algorithm] • Number: • The Union said that THEY would withdraw from negotiations until further notice.
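A minimal sketch of agreement checking used as a hard filter on candidate antecedents; the Mention class and its gender/number features are invented here for illustration and are not part of any particular system discussed in these slides.

```python
# Minimal sketch of agreement filtering; Mention and its attributes are
# hypothetical illustrations, not taken from any system described here.
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    gender: str   # 'masc', 'fem', 'neut', or 'unknown'
    number: str   # 'sing' or 'plur'

def agrees(pronoun: Mention, candidate: Mention) -> bool:
    """Hard CONSTRAINT: reject candidates that clash in gender or number.
    'unknown' is treated as compatible with anything."""
    gender_ok = ('unknown' in (pronoun.gender, candidate.gender)
                 or pronoun.gender == candidate.gender)
    number_ok = pronoun.number == candidate.number
    return gender_ok and number_ok

# "Jane blamed Bill because HE spilt the coffee"
he = Mention("he", "masc", "sing")
candidates = [Mention("Jane", "fem", "sing"), Mention("Bill", "masc", "sing")]
print([c.text for c in candidates if agrees(he, c)])   # ['Bill']
```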
Syntactic information • BINDING constraints • [John] likes HIM • EMBEDDING constraints • [[his] friend] • PARALLELISM preferences • Around 60% of pronouns occur in subject position; around 70% of those refer to antecedents in subject position • [John] gave [Bill] a book, and [Fred] gave HIM a pencil • Effect of syntax on SALIENCE (Next)
Salience • In every discourse, certain entities are more PROMINENT
Factors that affect prominence • Distance • Order of mention in the sentence • Entities mentioned earlier in the sentence are more prominent • Type of NP (proper names > other types of NPs) • Number of mentions • Syntactic position (subject > other grammatical functions; matrix > embedded) • Semantic role (‘implicit causality’ theories) • Discourse structure
Focusing theories • Hypothesis: One or more entities in the discourse are the FOCUS OF (LINGUISTIC) ATTENTION just like some entities in the visual space are the focus of VISUAL attention • Grosz 1977, Reichman 1985: ‘focus spaces’ • Sidner 1979, Sanford & Garrod 1981: ‘focused entities’ • Grosz et al 1981, 1983, 1995: Centering
Lexical and commonsense knowledge • [The city council] refused [the women] a permit because they feared violence. • [The city council] refused [the women] a permit because they advocated violence. (Winograd 1974, Sidner 1979) • BRISBANE – a terrific right rip from [Hector Thompson] dropped [Ross Eadie] at Sandgate on Friday night and won him the Australian welterweight boxing title. (Hirst, 1981)
Problems to be resolved by an AR system: mention identification • Effect: recall (mentions that are missed can never be resolved) • Typical problems: • Nested NPs (possessives): [a city]’s [computer system] vs. [[a city]’s computer system] • Appositions: [Madras], [India] vs. [Madras, [India]] • Attachments
Problems for AR: complex attachments • [The quality that’s coming out of [software from [India]]] • The quality that’s coming out of software from India is now exceeding the quality of software that’s coming out from the United States • scanning through millions of lines of computer code • ACE/bnews/devel/ABC19981001.1830.1257
Problems for AR: agreement extraction • The committee are meeting / is meeting • The Union sent a representative. They …. • The doctor came to visit my father. SHE told him …
Problems to be solved: anaphoricity determination • Expletives: • IT’s not easy to find a solution • Is THERE any reason to be optimistic at all? • Non-anaphoric definites
Outline • A reminder: anaphora resolution, factors affecting the interpretation of anaphoric expressions • A brief history of anaphora resolution • First algorithms: Charniak, Winograd, Wilks • Pronouns: Hobbs • Salience: S-List, LRC • Early ML work • Definite descriptions: Vieira & Poesio • The MUC initiative, coreference evaluation methods • Soon et al 2001
A brief history of early work on anaphora resolution • Algorithms heavily based on (hand-coded) commonsense knowledge: Charniak 1972, Winograd 1974, Hobbs 1974, Wilks 1975 • Syntax-based algorithms: Hobbs 1974 • Salience-based algorithms: Brennan et al, Strube & Hahn, Tetreault
First approaches to anaphora resolution: Charniak, Winograd, Wilks • Heavy emphasis on inference • Hand-coded • No formal evaluation
1970s Example: SHRDLU (http://hci.stanford.edu/~winograd/shrdlu/) • Terry Winograd, 1971, MIT Ph.D. Thesis
Person: PICK UP A BIG RED BLOCK.
Computer: OK. (does it)
Person: GRASP THE PYRAMID.
Computer: I DON'T UNDERSTAND WHICH PYRAMID YOU MEAN.
Person: FIND A BLOCK WHICH IS TALLER THAN THE ONE YOU ARE HOLDING AND PUT IT INTO THE BOX.
Computer: BY "IT", I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING.
Computer: OK. (does it)
Person: WHAT DOES THE BOX CONTAIN?
Computer: THE BLUE PYRAMID AND THE BLUE BLOCK.
Person: WHAT IS THE PYRAMID SUPPORTED BY?
Computer: THE BOX.
Person: HOW MANY BLOCKS ARE NOT IN THE BOX?
Computer: FOUR OF THEM.
Person: IS AT LEAST ONE OF THEM NARROWER THAN THE ONE WHICH I TOLD YOU TO PICK UP?
Computer: YES, THE RED CUBE.
• This is one of the only dialogs it knows • SHRDLU is too stupid to make mistakes • Beautiful “demo-ware”
Anaphora in SHRDLU • First example of a HISTORY LIST algorithm • Uses a combination of agreement features and semantic constraints • Checks all possibilities and assigns a global score rather than simply taking the first match • The score incorporates a syntactic component: entities in subject position score higher than entities in object position, which in turn rank above entities in adjunct position • Performance made more impressive by including solutions to a number of complex cases, such as reference to events (Why did you do it?) – often ad hoc
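A rough sketch of the history-list idea with a global score, assuming a made-up feature representation and scoring weights; it illustrates "score all candidates rather than take the first match", not SHRDLU's actual implementation.

```python
# Illustrative history-list resolution with a global score.
# The weights and data layout are invented; SHRDLU's real scoring differed.
GF_SCORE = {"subj": 3, "obj": 2, "adjunct": 1}   # subject > object > adjunct

def resolve(pronoun, history):
    """history: list of (text, features) for entities already mentioned,
    most recent last. Score every agreeing candidate and take the best,
    rather than stopping at the first match."""
    best, best_score = None, float("-inf")
    for recency, (text, feats) in enumerate(history):
        if feats["gender"] not in (pronoun["gender"], "unknown"):
            continue                                  # agreement as a hard filter
        if feats["number"] != pronoun["number"]:
            continue
        score = GF_SCORE.get(feats["gf"], 0) + 0.1 * recency  # small recency bonus
        if score > best_score:
            best, best_score = text, score
    return best

history = [
    ("the box",         {"gender": "neut", "number": "sing", "gf": "adjunct"}),
    ("a big red block", {"gender": "neut", "number": "sing", "gf": "obj"}),
]
print(resolve({"gender": "neut", "number": "sing"}, history))  # 'a big red block'
```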
Hobbs’ `Naïve Algorithm’ (Hobbs, 1974) • The reference algorithm for PRONOUN resolution (until Soon et al it was the standard baseline) • Interesting since Hobbs himself in the 1974 paper suggests that this algorithm is very limited (and proposes one based on semantics) • The first anaphora resolution algorithm to have an (informal) evaluation • Purely syntax based
Hobbs’ `Naïve Algorithm’ (Hobbs, 1974) • Works off ‘surface parse tree’ • Starting from the position of the pronoun in the surface tree, • first go up the tree looking for an antecedent in the current sentence (left-to-right, breadth-first); • then go to the previous sentence, again traversing left-to-right, breadth-first. • And keep going back
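A deliberately simplified sketch of the intersentential part of this search: the breadth-first, left-to-right walk over previous sentences' parse trees. The intrasentential steps and their syntactic constraints are omitted, so this is not the full algorithm; it assumes NLTK-style parse trees and a caller-supplied agreement test.

```python
# Highly simplified sketch of the intersentential part of Hobbs' search:
# walk previous sentences' parse trees breadth-first, left to right, and
# propose NP nodes as antecedents. The intrasentential steps, which enforce
# the syntactic (binding) constraints, are left out here.
from collections import deque
from nltk import Tree

def bfs_np_candidates(tree: Tree):
    """Yield NP subtrees in breadth-first, left-to-right order."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, Tree):
            if node.label() == "NP":
                yield node
            queue.extend(node)          # children, left to right

def hobbs_intersentential(previous_sentences, agrees):
    """previous_sentences: parse trees, most recent first.
    agrees: predicate checking number/gender compatibility with the pronoun."""
    for tree in previous_sentences:
        for np in bfs_np_candidates(tree):
            if agrees(np):
                return np
    return None

prev = Tree.fromstring("(S (NP Bill) (VP (V is) (NP (DT a) (JJ good) (NN friend))))")
antecedent = hobbs_intersentential([prev], agrees=lambda np: True)
print(" ".join(antecedent.leaves()))    # 'Bill'
```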
Hobbs’ algorithm: intrasentential anaphora • Steps 2 and 3 deal with intrasentential anaphora and incorporate basic syntactic constraints • [Figure: surface parse tree for “[John] likes HIM” (S dominating NP John, V likes, NP him), with the node X and path p of the search marked] • Also: [John]’s portrait of HIM
Hobbs’ algorithm: intersentential anaphora • [Figure: parse trees for the previous sentence “Bill is a good friend” (candidate antecedent NP: Bill) followed by the current sentence “John likes him”]
Evaluation • The first anaphora resolution algorithm to be evaluated in a systematic manner, and still often used as a baseline (hard to beat!) • Hobbs, 1974: • 300 pronouns from texts in three different styles (a fiction book, a non-fiction book, a magazine) • Results: 88.3% correct without selectional restrictions, 91.7% with • 132 ambiguous pronouns; 98 correctly resolved • Tetreault 2001 (no selectional restrictions; all pronouns): • 1298 out of 1500 pronouns from 195 NYT articles (76.8% correct) • 74.2% correct intra-sentential, 82% inter-sentential • Main limitations: • Reference to propositions excluded • Plurals • Reference to events
Salience-based algorithms • Common hypotheses: • Entities in discourse model are RANKED by salience • Salience gets continuously updated • Most highly ranked entities are preferred antecedents • Variants: • DISCRETE theories (Sidner, Brennan et al, Strube & Hahn): 1-2 entities singled out • CONTINUOUS theories (Alshawi, Lappin & Leass, Strube 1998, LRC): only ranking
Salience-based algorithms • Sidner 1979: • Most extensive theory of the influence of salience on several types of anaphors • Two FOCI: discourse focus, agent focus • never properly evaluated • Brennan et al 1987 (see Walker 1989) • Ranking based on grammatical function • One focus (CB) • Strube & Hahn 1999 • Ranking based on information status (NP type) • S-List (Strube 1998): drop CB • LRC (Tetreault): incremental
LRC • An update on Strube’s S-LIST algorithm (= Centering without centers) • Initial version augmented with various syntactic and discourse constraints
The LRC (Left-Right Centering) algorithm • Maintain a stack of entities ranked by grammatical function and sentence order (Subj > DO > IO) • Each sentence is represented by a Cf-list: a list of salient entities ordered by grammatical function • While processing the utterance’s entities (left to right): • Push each entity onto a temporary list (Cf-list-new); if it is a pronoun, attempt to resolve it first: • Search through Cf-list-new (left to right), taking the first candidate that meets gender, agreement and other constraints • If none is found, search the past utterances’ Cf-lists, starting from the previous utterance and going back to the beginning of the discourse
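A rough sketch of the resolution loop just described, with an invented entity representation and a trivial agreement test; the syntactic and discourse constraints of the full LRC algorithm are left out.

```python
# Sketch of LRC-style resolution; the data representation and agreement test
# are invented for illustration, not Tetreault's actual implementation.
GF_ORDER = {"subj": 0, "do": 1, "io": 2, "other": 3}   # Subj > DO > IO > other

def rank(entities):
    """Order a sentence's entities by grammatical function, then surface order."""
    return sorted(entities, key=lambda e: (GF_ORDER[e["gf"]], e["pos"]))

def lrc_resolve(pronoun, cf_new, cf_history, agrees):
    """cf_new: ranked entities of the current utterance seen so far (left of the pronoun);
    cf_history: Cf-lists of earlier utterances, most recent first."""
    for cand in cf_new:                          # 1. look left in the current utterance
        if agrees(pronoun, cand):
            return cand
    for cf in cf_history:                        # 2. then walk back through prior Cf-lists
        for cand in cf:
            if agrees(pronoun, cand):
                return cand
    return None

agrees = lambda p, c: p["num"] == c["num"]
u1 = rank([{"head": "the Union", "gf": "subj", "pos": 0, "num": "sing"}])
pron = {"head": "it", "gf": "subj", "pos": 0, "num": "sing"}
print(lrc_resolve(pron, [], [u1], agrees)["head"])   # 'the Union'
```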
Carter’s Algorithm (1985) • The most systematic attempt to produce a system using both salience (Sidner’s theory) and commonsense knowledge (Wilks’s preference semantics) • Small-scale evaluation (around 100 hand-constructed examples) • Many ideas found their way into SRI’s Core Language Engine (Alshawi, 1992)
Outline • A reminder: anaphora resolution, factors affecting the interpretation of anaphoric expressions • A brief history of anaphora resolution • First algorithms: Charniak, Winograd, Wilks • Pronouns: Hobbs • Salience: S-List, LRC • Modern work in anaphora resolution • Early ML work • The MUC initiative – also: coreference evaluation methods • Soon et al
MODERN WORK IN ANAPHORA RESOLUTION • The availability of the first anaphorically annotated corpora in the mid-1990s (MUC-6) made statistical methods possible
STATISTICAL APPROACHES TO ANAPHORA RESOLUTION • UNSUPERVISED approaches • E.g., Cardie & Wagstaff 1999; Ng 2008 • SUPERVISED approaches • Early work (specific to one NP type) • Soon et al: a general classifier + the modern architecture
ANAPHORA RESOLUTION AS A CLASSIFICATION PROBLEM • Classify a pair of mentions NP1 and NP2 as coreferential or not • Combine the pairwise decisions to build complete coreference chains
SOME KEY DECISIONS • ENCODING • I.e., what positive and negative instances to generate from the annotated corpus • E.g., treat all pairs of elements of the same coreference chain as positive instances, everything else as negative • DECODING • How to use the classifier to choose an antecedent • Some options: ‘sequential’ (stop at the first positive), ‘parallel’ (compare several options) – see the sketch below
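A small sketch of one possible encoding/decoding pair, using the options named above (all same-chain pairs as positives; 'sequential', closest-first decoding). The mention representation and classifier interface are placeholders, not a specific published system.

```python
# Sketch of mention-pair ENCODING and 'sequential' DECODING.
# Mentions are plain ids here; a real system would attach feature vectors.
from itertools import combinations

def encode(mentions, chains):
    """mentions: mention ids in document order.
    chains: list of sets of coreferent mention ids (from the annotation).
    Every same-chain pair is a positive instance, every other pair negative."""
    chain_of = {m: i for i, chain in enumerate(chains) for m in chain}
    instances = []
    for m1, m2 in combinations(mentions, 2):          # m1 precedes m2
        same = chain_of.get(m1, -1) == chain_of.get(m2, -2)
        instances.append((m1, m2, int(same)))
    return instances

def decode_sequential(anaphor, candidates, classifier):
    """Scan candidates from nearest to farthest; stop at the first
    one the classifier accepts ('sequential' decoding)."""
    for cand in candidates:
        if classifier(cand, anaphor):
            return cand
    return None

mentions = ["the Union", "the CEO", "they"]
print(encode(mentions, [{"the Union", "they"}]))
# [('the Union', 'the CEO', 0), ('the Union', 'they', 1), ('the CEO', 'they', 0)]
```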
Outline • A reminder: anaphora resolution, factors affecting the interpretation of anaphoric expressions • A brief history of anaphora resolution • First algorithms: Charniak, Winograd, Wilks • Pronouns: Hobbs • Salience: S-List, LRC • Modern work in anaphora resolution • Early ML work • The MUC initiative – also: coreference evaluation methods • Soon et al
Early machine-learning approaches • Main distinguishing feature: concentrate on a single NP type • Both hand-coded and ML: • Aone & Bennett (pronouns) • Vieira & Poesio (definite descriptions) • Ge and Charniak (pronouns)
Definite descriptions: Vieira & Poesio • A first attempt at going beyond pronouns while still doing a large-scale evaluation • Definite descriptions chosen because they require lexical and commonsense knowledge • Developed both a hand-coded decision tree and one learned via ML (as in Aone and Bennett) • Vieira & Poesio 1996, Vieira 1998, Vieira & Poesio 2000
Preliminary corpus study (Poesio and Vieira, 1998) Annotators asked to classify about 1,000 definite descriptions from the ACL/DCI corpus (Wall Street Journal texts) into three classes: • DIRECT ANAPHORA: a house … the house • DISCOURSE-NEW: the belief that ginseng tastes like spinach is more widespread than one would expect • BRIDGING DESCRIPTIONS: the flat … the living room; the car … the vehicle
Poesio and Vieira, 1998 • Results: • More than half of the definite descriptions are first-mention • Subjects didn’t always agree on the classification of an antecedent (bridging descriptions: ~8%)
The Vieira / Poesio system for robust definite description resolution • Follows a SHALLOW PROCESSING approach (Carter, 1987; Mitkov, 1998): it only uses • Structural information (extracted from Penn Treebank) • Existing lexical sources (WordNet) • (Very little) hand-coded information
Methods for resolving direct anaphors • DIRECT ANAPHORA: • the red car … the car / the blue car: same-head matching plus premodification heuristics (a clashing premodifier, as in the blue car, blocks the match) • segmentation: approximated with ‘loose’ windows
Methods for resolving discourse-new definite descriptions • DISCOURSE-NEW DEFINITES • the first man on the Moon, the fact that Ginseng tastes of spinach: a list of the most common functional predicates (fact, result, belief) and modifiers (first, last, only… ) • heuristics based on structural information (e.g., establishing relative clauses)
The (hand-coded) decision tree • Apply ‘safe’ discourse-new recognition heuristics • Attempt to resolve as same-head anaphora • Attempt to classify as discourse new • Attempt to resolve as bridging description. Search backward 1 sentence at a time and apply heuristics in the following order: • Named entity recognition heuristics – R=.66, P=.95 • Heuristics for identifying compound nouns acting as anchors – R=.36 • Access WordNet – R, P about .28
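An illustrative sketch of the ordered strategy above, with crude stand-in heuristics (the head is just the last token, and the discourse-new test only checks a few trigger words); it shows the control flow, not Vieira & Poesio's actual heuristics.

```python
# Crude stand-in heuristics for illustrating the ordered strategy;
# not Vieira & Poesio's actual implementation.
SPECIAL_PREDICATES = {"fact", "result", "belief"}     # functional head nouns
SPECIAL_PREMODS = {"first", "last", "only"}

def head(np):                       # naive head finder: last token
    return np.split()[-1].lower()

def looks_discourse_new(np):
    toks = set(np.lower().split())
    return head(np) in SPECIAL_PREDICATES or bool(toks & SPECIAL_PREMODS)

def same_head(np1, np2):
    return head(np1) == head(np2)

def classify_definite(dd, prior_mentions):
    """Apply the steps in order: discourse-new tests, then same-head
    anaphora, then (not shown) bridging via NE / compounds / WordNet."""
    if looks_discourse_new(dd):
        return ("discourse-new", None)
    for m in prior_mentions:                     # most recent first
        if same_head(dd, m):
            return ("direct-anaphora", m)
    return ("bridging-or-new", None)             # WordNet lookup would go here

print(classify_definite("the house", ["a house"]))          # direct anaphora
print(classify_definite("the first man on the Moon", []))   # discourse-new
```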
The decision tree obtained via ML • Same features as for the hand-coded decision tree • Using the ID3 classifier (a non-probabilistic decision-tree learner) • Training instances: • Positive: the closest annotated antecedent • Negative: all mentions in the previous four sentences • Decoding: consider all possible antecedents in the previous four sentences
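A short sketch of the training-instance encoding just described (positive = closest annotated antecedent, negatives = other mentions within a four-sentence window); the mention strings and sentence-index mapping are invented for the example.

```python
# Sketch of instance generation for the learned tree; representations invented.
def make_instances(anaphor, antecedent, prior_mentions, sentence_of, window=4):
    """prior_mentions: mentions preceding the anaphor.
    Positive: the closest annotated antecedent.
    Negatives: all other prior mentions in the previous `window` sentences."""
    instances = [(antecedent, anaphor, 1)]
    for m in prior_mentions:
        if m != antecedent and sentence_of[anaphor] - sentence_of[m] <= window:
            instances.append((m, anaphor, 0))
    return instances

sent = {"a house": 0, "the garden": 1, "the house": 2}
print(make_instances("the house", "a house", ["a house", "the garden"], sent))
# [('a house', 'the house', 1), ('the garden', 'the house', 0)]
```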
Overall Results • Evaluated on a test corpus of 464 definite descriptions • [Overall results table not reproduced in this text version]