
EMPIRICAL INVESTIGATIONS OF ANAPHORA AND SALIENCE

EMPIRICAL INVESTIGATIONS OF ANAPHORA AND SALIENCE. Massimo Poesio, Università di Trento and University of Essex. Vilem Mathesius Lectures, Praha, 2007. Plan of the series. Wednesday: Annotating context dependence, and particularly anaphora


Presentation Transcript


  1. EMPIRICAL INVESTIGATIONS OF ANAPHORA AND SALIENCE Massimo Poesio, Università di Trento and University of Essex. Vilem Mathesius Lectures, Praha, 2007

  2. Plan of the series • Wednesday: Annotating context dependence, and particularly anaphora • Yesterday: Using anaphorically annotated corpora to investigate local & global salience • Today: Using anaphorically annotated corpora to investigate anaphora resolution

  3. Today’s lecture • The Vieira / Poesio work on robust definite description resolution • Bridging references • Discourse-new • (If time allows): Task-oriented evaluation

  4. Preliminary corpus study (Poesio and Vieira, 1998) Annotators were asked to classify about 1,000 definite descriptions from the ACL/DCI corpus (Wall Street Journal texts) into three classes: • DIRECT ANAPHORA: a house … the house • DISCOURSE-NEW: the belief that ginseng tastes like spinach is more widespread than one would expect • BRIDGING DESCRIPTIONS: the flat … the living room; the car … the vehicle (Note from Massimo Poesio: add better examples, e.g., from The Book of Evidence.)

  5. Poesio and Vieira, 1998 • Results: • More than half of the definite descriptions are first-mention • Subjects didn’t always agree on the classification of an antecedent (bridging descriptions: ~8%)

  6. The Vieira / Poesio system for robust definite description resolution • Follows a SHALLOW PROCESSING approach (Carter, 1987; Mitkov, 1998): it only uses • Structural information (extracted from Penn Treebank) • Existing lexical sources (WordNet) • (Very little) hand-coded information (Vieira & Poesio, 1996 / Vieira, 1998 / Vieira & Poesio, 2001)

  7. Methods for resolving direct anaphors • DIRECT ANAPHORA: • the red car, the car, the blue car: premodification heuristics • segmentation: approximated with ‘loose’ windows

  8. Methods for resolving discourse-new definite descriptions • DISCOURSE-NEW DEFINITES • the first man on the Moon, the fact that Ginseng tastes of spinach: a list of the most common functional predicates (fact, result, belief) and modifiers (first, last, only… ) • heuristics based on structural information (e.g., establishing relative clauses)

  9. A `knowledge-based’ classification of bridging descriptions (Vieira, 1998) • Based on LEXICAL RELATIONS such as synonymy, hyponymy, and meronymy, available from a lexical resource such as WordNet: the flat … the living room • The antecedent is introduced by a PROPER NAME: Bach … the composer • The anchor is a NOMINAL MODIFIER introduced as part of the description of a discourse entity: selling discount packages … the discounts

  10. … continued (cases NOT attempted by our system) • The anchor is introduced by a VP: Kadane oil is currently drilling two oil wells. The activity… • The anchor is not explicitly mentioned in the text, but is a `discourse topic’: the industry (in a text about oil companies) • The resolution depends on more general commonsense knowledge: last week’s earthquake … the suffering people

  11. Distribution of bridging descriptions

  12. The (hand-coded) decision tree • Apply ‘safe’ discourse-new recognition heuristics • Attempt to resolve as same-head anaphora • Attempt to classify as discourse new • Attempt to resolve as bridging description. Search backward 1 sentence at a time and apply heuristics in the following order: • Named entity recognition heuristics – R=.66, P=.95 • Heuristics for identifying compound nouns acting as anchors – R=.36 • Access WordNet – R, P about .28
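
The ordering of the steps on this slide can be sketched in code. This is only an illustration of the control flow, not the Vieira / Poesio implementation: the word lists, the `PART_OF` table, the last-token head finder, and every function name are hypothetical stand-ins, and the slide does not say how the backward search and the heuristic order are nested (here: sentence by sentence, heuristics in order within each sentence).

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Toy stand-ins for the real heuristics; all names and word lists are hypothetical.
FUNCTIONAL_WORDS = {"fact", "result", "belief", "first", "last", "only"}
PART_OF = {"room": {"flat", "house"}}  # tiny stand-in for a lexical resource


def head(np: str) -> str:
    """Crude head finder: the last token of the noun phrase."""
    return np.lower().split()[-1]


def safe_discourse_new(dd: str) -> bool:
    """Step 1: 'safe' discourse-new test via functional predicates and modifiers."""
    return any(tok in FUNCTIONAL_WORDS for tok in dd.lower().split())


def same_head_antecedent(dd: str, previous: List[str]) -> Optional[str]:
    """Step 2: look backward for an earlier mention with the same head noun."""
    for sentence in reversed(previous):
        if head(dd) in sentence.lower().split():
            return head(dd)
    return None


def lexical_bridge(dd: str, sentence: str) -> Optional[str]:
    """A bridging heuristic: find a word in the sentence that is a known whole."""
    wholes = PART_OF.get(head(dd), set())
    for tok in sentence.lower().split():
        if tok in wholes:
            return tok
    return None


def classify(
    dd: str,
    previous: List[str],
    other_dn_detectors: Sequence[Callable[[str], bool]] = (),
    bridging_heuristics: Sequence[Callable[[str, str], Optional[str]]] = (lexical_bridge,),
) -> Tuple[str, Optional[str]]:
    """Apply the four steps in the order given on the slide."""
    if safe_discourse_new(dd):
        return ("discourse-new", None)
    antecedent = same_head_antecedent(dd, previous)
    if antecedent is not None:
        return ("direct anaphora", antecedent)
    if any(detector(dd) for detector in other_dn_detectors):
        return ("discourse-new", None)
    # Search backward one sentence at a time, heuristics in their given order.
    for sentence in reversed(previous):
        for heuristic in bridging_heuristics:
            anchor = heuristic(dd, sentence)
            if anchor is not None:
                return ("bridging", anchor)
    return ("unresolved", None)  # fallback label is an assumption


previous = ["John bought a flat", "It was cheap"]
print(classify("the living room", previous))
print(classify("the flat", previous))
```

In a fuller version the bridging heuristics would be passed in the slide's order (named entities, compound nouns, then WordNet), so that the higher-precision heuristics get the first chance to fire.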

  13. Overall Results • Evaluated on a ‘test corpus’ of 464 definite descriptions • Overall results:

  14. Overall Results • Results for each type of definite description:

  15. Questions raised by the Vieira / Poesio work • Do these results hold for larger datasets? • Do discourse-new detectors help? • Bridging: • How to define the phenomenon? • Where to get the information? • How to combine salience with lexical & commonsense knowledge? • Can such a system be helpful for applications?

  16. Mereological bridging references Cartonnier (Filing Cabinet) with Clock This piece of mid-eighteenth-century furniture was meant to be used like a modern filing cabinet; papers were placed in leather-fronted cardboard boxes (now missing) that were fitted into the open shelves. A large table decorated in the same manner would have been placed in front for working with those papers. Access to the cartonnier's lower half can only be gained by the doors at the sides, because the table would have blocked the front.

  17. PREVIOUS RESULTS • A series of experiments using the Poesio / Vieira dataset, containing 204 bridging references, including 39 `WordNet’ bridges • Resolving bridging references requires lexical knowledge (Vieira and Poesio, 2000, but also Carter 1985, a number of papers by Hobbs, etc.) • But: even large lexical resources such as WordNet are not enough, particularly for mereological references (Poesio et al, 1997; Vieira and Poesio, 2000; Poesio, 2003; Garcia-Almanza, 2003) • Partial solution: use lexical acquisition (HAL, Hearst-style construction method). Best results (for mereology): construction-style

  18. FINDING MERONYMICAL RELATIONS USING SYNTACTIC INFORMATION • Some syntactic constructions suggest semantic relations • (Cf. Hearst 1992, 1998 for hyponyms) • Ishikawa 1998, Poesio et al 2002: use syntactic constructions to extract mereological information from corpora • The WINDOW of the CAR • The CAR’s WINDOW • The CAR WINDOW • See also Berland & Charniak 1999, Girju et al 2002
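
As a rough illustration of construction-based extraction, the first two constructions on this slide can be approximated with regular expressions over raw text. This is a minimal sketch under strong assumptions: it matches single-word nouns only, uses plain ASCII apostrophes, and skips the noun-compound case ("the car window"), which needs a parser or chunker. The function and pattern names are hypothetical, not taken from the cited work.

```python
import re
from collections import Counter
from typing import Tuple

# "the PART of the WHOLE", e.g. "the window of the car"
OF_PATTERN = re.compile(r"\bthe (\w+) of the (\w+)\b", re.IGNORECASE)
# "the WHOLE's PART", e.g. "the car's window"
GENITIVE_PATTERN = re.compile(r"\bthe (\w+)'s (\w+)\b", re.IGNORECASE)


def extract_part_whole(text: str) -> "Counter[Tuple[str, str]]":
    """Count candidate (part, whole) pairs suggested by the two constructions."""
    pairs: "Counter[Tuple[str, str]]" = Counter()
    for part, whole in OF_PATTERN.findall(text):
        pairs[(part.lower(), whole.lower())] += 1
    for whole, part in GENITIVE_PATTERN.findall(text):
        pairs[(part.lower(), whole.lower())] += 1
    return pairs


sample = "The window of the car was open. The car's window was dirty."
print(extract_part_whole(sample))
```

Counts gathered this way over a large corpus are noisy (for instance, "the end of the day" is not a part-whole pair), so some filtering of candidate pairs would be needed in practice.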

  19. LEXICAL RESOURCES FOR BRIDGING: A SUMMARY (All using the Vieira / Poesio dataset.)

  20. FOCUSING AND MEREOLOGICAL BRIDGES Cartonnier (Filing Cabinet) with Clock This piece of mid-eighteenth-century furniture was meant to be used like a modern filing cabinet; papers were placed in leather-fronted cardboard boxes (now missing) that were fitted into the open shelves. A large table decorated in the same manner would have been placed in front for working with those papers. Access to the cartonnier's lower half can only be gained by the doors at the sides, because the table would have blocked the front. (See Sidner, 1979; Markert et al, 1995.)

  21. FOCUS (CB) TRACKING + GOOGLE SEARCH (POESIO, 2003) • Analyzed 169 associative BDs in GNOME corpus (58 mereology) • Correlation between distance and focusing (Poesio et al, 2004) and choice of anchor • 77.5% anchor same or previous sentence; 95.8% in last five sentences • CB(U-1) anchor for only 33.6% of BDs, • but 89% of anchors had been CB or CP • Using `Google distance’ to choose among salient anchor candidates

  22. FINDING MEREOLOGICAL RELATIONS USING GOOGLE • Lexical vicinity measure (for MERONYMS) between NBD and NPA • Search in Google for “the NBD of the NPA” (cf. Ishikawa, 1998; Poesio et al, 2002) • E.g., “the drawer of the cabinet” • Choose as anchor the PA whose NPA results in the greatest number of hits • Preliminary results for associative BDs: around 70% P/R (by hand) • See also: Markert et al, 2003, 2005; Modjeska et al, 2003
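
The anchor-selection rule on this slide amounts to an argmax over hit counts. The sketch below assumes a caller-supplied `hit_count` function that maps a quoted query string to a number of hits; no real search API is called, and the counts in the example are made up for illustration.

```python
from typing import Callable, Optional, Sequence


def choose_anchor(
    bd_head: str,
    candidate_heads: Sequence[str],
    hit_count: Callable[[str], int],
) -> Optional[str]:
    """Return the candidate whose "the X of the Y" query gets the most hits."""
    if not candidate_heads:
        return None
    # max() keeps the first candidate in case of a tie.
    return max(
        candidate_heads,
        key=lambda cand: hit_count(f'"the {bd_head} of the {cand}"'),
    )


# Made-up hit counts standing in for a search engine.
fake_hits = {
    '"the drawer of the cabinet"': 1200,
    '"the drawer of the table"': 300,
    '"the drawer of the paper"': 2,
}
anchor = choose_anchor(
    "drawer", ["paper", "table", "cabinet"], lambda q: fake_hits.get(q, 0)
)
print(anchor)
```

Restricting `candidate_heads` to salient entities, as the previous slide suggests, is what keeps the number of queries per bridging description small.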

  23. NEW EXPERIMENTS (Poesio et al, 2004) • Using the GNOME corpus • 58 mereological bridging refs realized by the-nps • 153 mereological bridging references in total • Reliably annotated • Completely automatic feature extraction • Google & WordNet for lexical distance • Using (an approximation of) salience • Using machine learning to combine the features

  24. More (and reliably annotated) data: the GNOME corpus • Texts from 3 genres (museum descriptions, pharmaceutical leaflets, tutorial dialogues) • Reliably annotated syntactic, semantic and discourse information • grammatical function, agreement features • anaphoric relations • uniqueness, ontological information, animacy, genericity, … • Reliable annotation of bridging references • http://cswww.essex.ac.uk/Research/NLE/corpora/GNOME

  25. METHODS • Salience features: • Utterance distance • First mention • ‘Global first mention’ (approximate CB) • Lexical distance: • WordNet (using a pure hypernym-based search strategy) • Google • Tried both separately and together • Statistical classifiers: MLP, Naïve Bayes • (MatLab / Weka ML Library)

  26. Lexical Distance 1 (WordNet) • Computing WordNet Distance: • Get the head noun of the anaphor and find all the (noun) senses for the head noun. • Get all the noun senses for the head noun of the potential antecedent under consideration. • Retrieve the hypernym trees from WordNet for each sense of the anaphor and the antecedent. • Traverse each unique path in these trees and find a common parent for the anaphor and the antecedent; count the number of nodes they are apart. • Select the least distance path across all combinations. • If no common parent is found, assign a hypothetical distance (30).
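
The procedure above can be sketched over a toy hypernym graph. The sense names and the graph in the example are invented for illustration and are not real WordNet data; the sketch also counts edges between the two senses via their nearest common parent, which is one possible reading of "the number of nodes they are apart".

```python
from collections import deque
from typing import Dict, List

NO_COMMON_PARENT_DISTANCE = 30  # the default distance given on the slide


def ancestor_depths(sense: str, hypernyms: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each ancestor of `sense` (and the sense itself) to its shortest distance."""
    depths = {sense: 0}
    queue = deque([sense])
    while queue:
        node = queue.popleft()
        for parent in hypernyms.get(node, []):
            if parent not in depths:
                depths[parent] = depths[node] + 1
                queue.append(parent)
    return depths


def wordnet_distance(
    anaphor_head: str,
    antecedent_head: str,
    senses: Dict[str, List[str]],
    hypernyms: Dict[str, List[str]],
) -> int:
    """Least path length through a common parent, over all sense combinations."""
    best = None
    for s1 in senses.get(anaphor_head, []):
        up1 = ancestor_depths(s1, hypernyms)
        for s2 in senses.get(antecedent_head, []):
            up2 = ancestor_depths(s2, hypernyms)
            for common in up1.keys() & up2.keys():
                dist = up1[common] + up2[common]
                if best is None or dist < best:
                    best = dist
    return NO_COMMON_PARENT_DISTANCE if best is None else best


# Toy data (hypothetical sense inventory, not real WordNet content).
toy_senses = {
    "drawer": ["drawer.1"],
    "cabinet": ["cabinet.1", "cabinet.2"],
    "idea": ["idea.1"],
}
toy_hypernyms = {
    "drawer.1": ["container.1"],
    "container.1": ["artifact.1"],
    "cabinet.1": ["furniture.1"],
    "furniture.1": ["artifact.1"],
    "cabinet.2": ["committee.1"],
}
print(wordnet_distance("drawer", "cabinet", toy_senses, toy_hypernyms))
print(wordnet_distance("drawer", "idea", toy_senses, toy_hypernyms))
```

With a real lexical resource, `senses` and `hypernyms` would be filled from the resource's noun synsets and hypernym links; the search itself stays the same.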

  27. Lexical Distance, 1: WordNet

  28. Lexical Distance 2 (Google) • As in (Poesio, 2003) • But use Google API to access the Google search engine • Computing Google hits: • Get the head noun for BR and potential candidate. • Check whether the potential candidate is a mass or count noun. • If count, build the query as “the body of the person” and search for the pattern. • Retrieve the number of Google hits
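
The query-building step can be sketched as below. The slide gives the pattern for count nouns only, so this sketch returns `None` for mass nouns rather than guessing a pattern. The function name and the `is_count_noun` flag are hypothetical, and no search API is called.

```python
from typing import Optional


def build_meronymy_query(
    br_head: str, candidate_head: str, is_count_noun: bool
) -> Optional[str]:
    """Build the quoted "the X of the Y" query for a count-noun candidate.

    Returns None for mass nouns: the pattern used in that case is not
    given on the slide, so it is left unspecified here.
    """
    if not is_count_noun:
        return None
    return f'"the {br_head} of the {candidate_head}"'


print(build_meronymy_query("body", "person", is_count_noun=True))
```

The returned string would then be sent to a search engine, and the number of reported hits used as the lexical-distance feature.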

  29. WN vs GOOGLE

  30. BASELINES

  31. RESULTS (58 THE-NPs, 50:50)

  32. MORE RESULTS 1:3 dataset: all 153 mereological BRs:

  33. MEREOLOGICAL BDS REALIZED WITH BARE-NPS The combination of rare and expensive materials used on this cabinet indicates that it was a particularly expensive commission. The four Japanese lacquer panels date from the mid- to late 1600s and were created with a technique known as kijimaki-e. For this type of lacquer, artisans sanded plain wood to heighten its strong grain and used it as the background of each panel. They then added the scenic elements of landscape, plants, and animals in raised lacquer. Although this technique was common in Japan, such large panels were rarely incorporated into French eighteenth-century furniture. Heavy Ionic pilasters, whose copper-filled flutes give an added rich color and contrast to the gilt-bronze mounts, flank the panels. Yellow jasper, a semiprecious stone, rather than the usual marble, forms the top.

  34. HARDER TEST Using classifiers trained on balanced / slightly unbalanced data (the-nps) on unbalanced ones (10-fold cross validation)

  35. DISCUSSION • Previous results: • Construction-based techniques provide adequate lexical resources, particularly when using Web as corpus • But need to combine lexical knowledge and salience modeling • This work: • Combining (simple) salience with lexical resources results in significant improvements • Future work: • Larger dataset • Better approximation of focusing

  36. Back to discourse-new detection • The GUITAR system • Recent results

  37. GUITAR (Kabadjov, to appear) • A robust, usable anaphora resolution system designed to work as part of an XML pipeline • Incorporates: • Pronouns: the Mitkov algorithm • Definite descriptions: the Vieira / Poesio algorithm • Proper nouns: the Bontcheva alg. • Several versions • Version 1: (Poesio & Kabadjov, 2004): direct anaphora • Version 2: DN detection • Version 3: proper name resolution • Freely available from http://privatewww.essex.ac.uk/~malexa/GuiTAR/

  38. DISCOURSE-NEW DEFINITE DESCRIPTIONS (1) Toni Johnson pulls a tape measure across the front of what was once a stately Victorian home. (2) The Federal Communications Commission allowed American Telephone & Telegraph Co. to continue offering discount phone services for large-business customers and said it would soon re-examine its regulation of the long-distance market. Poesio and Vieira (1998): about 66% of definite descriptions in their texts (WSJ) are discourse-new

  39. WOULD DNEW RECOGNITION HELP? First version of GUITAR without DN detection on subset of DDs in GNOME corpus: 574 DDs, of which 184 anaphoric (32%) and 390 discourse-new (67.9%)

  40. SPURIOUS MATCHES If your doctor has told you in detail HOW MUCH to use and HOW OFTEN then keep to this advice. ….. If you are not sure then follow the advice on the back of this leaflet.

  41. GOALS OF THE WORK • Vieira and Poesio’s (2000) system incorporated DISCOURSE-NEW DD DETECTORS (P=69, R=72, F=70.5) • Two subsequent strands of work: • Bean and Riloff (1999), Uryupina (2003) developed improved detectors (e.g., Uryupina: F=86.9) • Ng and Cardie (2002) questioned whether such detectors improve results • Our project: systematic investigation of whether DN detectors actually help • ACL 04 ref res: features, preliminary results • THIS WORK: results of further experiments
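
The F value quoted for the Vieira and Poesio detectors is consistent with the balanced F-measure (harmonic mean of precision and recall); that this is the measure used is an assumption, since the slide does not define F:

```latex
F = \frac{2PR}{P + R} = \frac{2 \times 69 \times 72}{69 + 72} = \frac{9936}{141} \approx 70.5
```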

  42. DN CLASSIFIER: THE UPPER BOUND • Current number of SMs: 52/198 (26.3%) • If SM = 0, P=R=F overall = 509/574 = 88.7 • (P=R=F on anaphora only: 119/146 = 81.5)
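
The percentages on this slide follow directly from the counts it gives. A quick check, using only those counts (the helper name is hypothetical):

```python
def percent(numerator: int, denominator: int) -> float:
    """Ratio expressed as a percentage, rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


spurious_rate = percent(52, 198)      # share of spurious matches
upper_bound_all = percent(509, 574)   # overall P=R=F if spurious matches were eliminated
upper_bound_ana = percent(119, 146)   # same, on anaphora only

print(spurious_rate, upper_bound_all, upper_bound_ana)  # 26.3 88.7 81.5
```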

  43. VIEIRA AND POESIO’S DN DETECTORS Recognize SEMANTICALLY FUNCTIONAL descriptions: SPECIAL PREDICATES / PREDICATE MODIFIERS (HAND-CODED): the front of what was once a stately Victorian home; the best chance of saving the youngest children. PROPER NAMES: the Federal Communications Commission … LARGER SITUATION descriptions (HAND-CODED): the City, the sun, ….

  44. VIEIRA AND POESIO’S DN DETECTORS, II PREDICATIVE descriptions: COPULAR CLAUSES: he is the hardworking son of a Church of Scotland minister …. APPOSITIONS: Peter Kenyon, the Chelsea chief executive … Descriptions ESTABLISHED by modification: The warlords and private militias who were once regarded as the West’s staunchest allies are now a greater threat to the country’s security than the Taliban …. (Guardian, July 13th 2004, p.10)

  45. VIEIRA AND POESIO’S DECISION TREES Tried both hand-coded and ML. Hand-coded decision tree: 1. Try the DN detectors with highest accuracy (attempt to classify as functional using special predicates, and as predicative by looking for apposition) 2. Attempt to resolve the DD as direct anaphora 3. Try other DN detectors in order: proper name, establishing clauses, proper name modification …. ML DT: swap 1. and 2.

  46. VIEIRA AND POESIO’S RESULTS

  47. BEAN AND RILOFF (1999) Developed a system for identifying DN definites. Adopted syntactic heuristics from Vieira and Poesio, and developed several new techniques: • SENTENCE-ONE (S1) EXTRACTION: identify as discourse-new every description found in first sentence of a text. • DEFINITE PROBABILITY: create a list of nominal groups encountered at least 5 times with definite article, but never with indefinite. • VACCINES: block heuristics when prob. too low.
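
The list-building rule described under DEFINITE PROBABILITY can be sketched as a simple counting pass. This is an illustration of the rule as worded on the slide, not Bean and Riloff's code; the input format (determiner, nominal) and the function name are assumptions.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def definite_only_nominals(
    observations: Iterable[Tuple[str, str]], min_count: int = 5
) -> Set[str]:
    """Nominals seen at least `min_count` times with 'the' and never with 'a'/'an'."""
    definite_counts: "Counter[str]" = Counter()
    seen_indefinite: Set[str] = set()
    for determiner, nominal in observations:
        det = determiner.lower()
        if det == "the":
            definite_counts[nominal] += 1
        elif det in {"a", "an"}:
            seen_indefinite.add(nominal)
    return {
        nominal
        for nominal, count in definite_counts.items()
        if count >= min_count and nominal not in seen_indefinite
    }


# Toy observations: (determiner, nominal group) pairs from a hypothetical corpus.
corpus = (
    [("the", "sun")] * 5
    + [("the", "car")] * 6
    + [("a", "car")]
    + [("the", "moon")] * 4
)
print(definite_only_nominals(corpus))
```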

  48. BEAN AND RILOFF’S ALGORITHM 1. If the head noun appeared earlier, classify as anaphoric 2. If DD occurs in S1 list, classify as DN unless vaccine 3. Classify DD as DN if one of the following applies: (a) high definite probability; (b) matches an EHP pattern; (c) matches one of the syntactic heuristics 4. Classify as anaphoric
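
The four steps can be written as a short cascade. The sketch treats each test as a caller-supplied predicate, since the slides do not define the EHP patterns or the vaccine thresholds; all parameter names are hypothetical. It also assumes that a vaccinated description falls through from step 2 to step 3, which the slide leaves open.

```python
from typing import Callable, Set


def classify_dd(
    dd: str,
    head_noun: str,
    earlier_heads: Set[str],
    s1_list: Set[str],
    is_vaccinated: Callable[[str], bool],
    has_high_definite_probability: Callable[[str], bool],
    matches_ehp_pattern: Callable[[str], bool],
    matches_syntactic_heuristic: Callable[[str], bool],
) -> str:
    """Return 'anaphoric' or 'discourse-new' following the slide's step order."""
    # Step 1: a repeated head noun signals anaphora.
    if head_noun in earlier_heads:
        return "anaphoric"
    # Step 2: sentence-one list, unless blocked by a vaccine.
    if dd in s1_list and not is_vaccinated(dd):
        return "discourse-new"
    # Step 3: any one of the three remaining discourse-new tests.
    if (
        has_high_definite_probability(dd)
        or matches_ehp_pattern(dd)
        or matches_syntactic_heuristic(dd)
    ):
        return "discourse-new"
    # Step 4: default.
    return "anaphoric"


never = lambda dd: False  # placeholder predicate that never fires
label = classify_dd(
    "the president", "president", {"car"}, {"the president"},
    never, never, never, never,
)
print(label)
```

Because step 4 defaults to "anaphoric", every description the detectors miss is handed to the resolver, so detector recall directly limits how many spurious matches can be avoided.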

  49. BEAN AND RILOFF’S RESULTS

  50. NG AND CARDIE (2002) • Directly investigate the question of whether discourse-new detectors improve the performance of an anaphora resolution system • Dealing with ALL types of anaphoric expressions
