This research explores the challenges of resolving bridging descriptions in natural language processing, focusing on the lack of lexical and commonsense knowledge. The study aims to improve the performance of the system by developing better annotation methods, unsupervised lexical acquisition, and syntactic constructions for meronymy.
Bridging Descriptions, Lexical Information, and Focusing Massimo Poesio University of Essex (Thanks to my many past and present collaborators: Abdulrahman Almuhareb, Chris Brew, Tomonori Ishikawa, Will Lowe, Axel Maroudas, Scott McDonald, Sabine Schulte im Walde, Renata Vieira.) COLI
Outline/Punchline • Starting point: the robust system for resolving definite descriptions in (Poesio and Vieira, 1998, 2000) • Using heuristics and accessing WordNet • Large-scale evaluation • Weaker aspect of the system: bridging descriptions • Difficult to annotate reliably • Main problem: lack of lexical and commonsense knowledge • Need to keep track of salient entities (‘focusing’?) COLI Saarbruecken
Subsequent work: • We have been working to improve the performance of the system by developing • Better annotation methods (MATE, GNOME) • Unsupervised methods for lexical acquisition • Basic vector-space methods for synonymy (Poesio, Schulte im Walde, and Brew, 1998) • Syntactic constructions for meronymy (Poesio et al, LREC 2002) • Focus-tracking methods (Poesio et al, 2000, to appear)
Preliminary corpus study (Poesio and Vieira, 1998) Annotators were asked to classify about 1,000 definite descriptions from the ACL/DCI corpus (Wall Street Journal texts) into three classes: • DIRECT ANAPHORA: a house … the house • DISCOURSE-NEW: the belief that ginseng tastes like spinach is more widespread than one would expect • BRIDGING DESCRIPTIONS: the flat … the living room; the car … the vehicle • Results: • About half of the definite descriptions are first-mention • Subjects didn’t always agree on the classification of an antecedent (bridging descriptions: ~8%)
Disagreements on bridging descriptions About 160 workers at a factory that made paper for the Kent filters were exposed to asbestos in the 1950s. Areas of the factory were particularly dusty where the crocidolite was used. Workers dumped large burlap sacks of the imported material into a huge bin, poured in cotton and acetate fibers and mechanically mixed the dry fibers in a process used to make filters. Workers described "clouds of blue dust" that hung over parts of the factory, even though exhaust fans ventilated the area.
A ‘knowledge-based’ classification of bridging descriptions (Vieira, 1998) • Based on LEXICAL RELATIONS such as synonymy, hyponymy, and meronymy, available from a lexical resource such as WordNet: the flat … the living room • The antecedent is introduced by a PROPER NAME: Bach … the composer • The anchor is a NOMINAL MODIFIER introduced as part of the description of a discourse entity: selling discount packages … the discounts • The anchor is introduced by a VP: Kadane oil is currently drilling two oil wells. The activity…
… continued • The anchor is not explicitly mentioned in the text, but is a ‘discourse topic’: the industry (in a text about oil companies) • The resolution depends on more general commonsense knowledge: last week’s earthquake … the suffering people
Distribution of bridging descriptions
Our first system for definite description resolution (Poesio and Vieira 1996, 2000a, 2000b) • Follows a SHALLOW PROCESSING approach (Carter, 1987; Mitkov, 1998): it only uses • Structural information (extracted from Penn Treebank) • Existing lexical sources (WordNet) • (Very little) hand-coded information
Overall Results (Vieira and Poesio, 2000) • Evaluated on a ‘test corpus’ of 464 definite descriptions • Overall results: • Results for each type of definite description:
Per-class results using WordNet
The problems with WordNet • Words not in WordNet: • Crocidolite, spinoff (spin-off) • Context-dependent senses: • slump, crash, bust all synonyms in the WSJ corpus • The structure of WordNet • Some information is encoded in complex ways (room, wall, floor) • Can increase recall by using a more complex search algorithm, but precision goes down very fast (Poesio, Vieira, and Teufel, 1997; Vieira, 1998) • (Similar problems reported by Gaizauskas et al)
The case of HOUSE [Figure: fragment of the WordNet hierarchy with the nodes ARTIFACT, HOUSING, BUILDING, HOUSE, HOME, ROOM, WALL, and FLOOR, connected by IS-A and PART-OF links]
Subsequent work: • We have been working to improve the performance of the system by developing • Better annotation methods (MATE, GNOME) • Unsupervised methods for lexical acquisition • Basic vector-space methods for synonymy • Syntactic constructions for meronymy • Focus-tracking methods (Poesio et al, 2000, to appear)
Acquiring Lexical Knowledge • The main problem with the system discussed above is the lack of the lexical knowledge, commonsense information, and commonsense inference needed to resolve bridging references • A possibility: extend WordNet either by hand (e.g., Harabagiu, 1997) or automatically (Hearst, 1992, 1998; Caraballo, 1999; various people at LREC 2002 and DAARC 2002) • Our approach: try to acquire this information completely automatically • Vector-based methods for synonymy (Poesio, Schulte im Walde, and Brew, 1998) • Why? Lund, Burgess, et al (1995, 1997): lexical associations learned this way correlate very well with priming experiments • Using syntactic collocations to acquire meronymy (Ishikawa, 1998)
Vector-based lexical semantics [Figure: example words CAT, DOG, WHALE]
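A minimal sketch of the vector-space idea, assuming each word is represented by counts of the words that occur within a small window around it, and that two words are compared by the cosine of their vectors. The toy corpus, the window size, and the function names are illustrative assumptions, not the setup used in the original experiments.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (hypothetical): 'cat' and 'dog' share more contexts than 'cat' and 'whale'.
toy = ("the cat chased the mouse the dog chased the cat "
       "the whale swam in the sea").split()
vecs = cooccurrence_vectors(toy)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_whale = cosine(vecs["cat"], vecs["whale"])
```

On this toy input the cat/dog similarity comes out higher than the cat/whale similarity; with real data the vectors would be built from a large corpus.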
Per-class results using vector semantics
WordNet vs vector overall
Results for bridging descriptions using vector-based semantics • Not very good overall (22.7%) or even just on the ‘WordNet bridges’ (22.2%) • BUT: for synonymy, results comparable to those obtained with WordNet (36%) • And every additional 50M of training data seems to result in about a 50% increase in accuracy • AND a by-cases analysis suggests that the problem is not only the lexical acquisition method:
Problems with vector semantics • The role of focus: • Investigation company … Pinkerton … the company • Commonsense inference: • Well, notes IRS private ruling 8934014, “a fundamental principle” is that income must be taxed to whoever earns it. The rule goes back at least as far as a 1930 Supreme Court decision, Robert Willens of Shearson Lehman Hutton says.
WordNet vs vector on ‘WordNet categories’
Third approach: Meronymy using Syntactic Information • Some constructions suggest relations – e.g., part-of • The WINDOW of the CAR • The CAR’s WINDOW • The CAR WINDOW • Idea: when computing co-occurrences for nouns A and B, only count those that occur in one such construction • Can identify more constructions automatically (cfr. Hearst, 1998) • Approach (Ishikawa, 1998): • Train over BNC again • Evaluate over same set of bridging descriptions as previously used • When trying to resolve an inferential description, choose the candidate anchor with the highest value of MUTUAL INFORMATION: • I(x;y) = log [ P(x,y) / (P(x) P(y)) ] (Cfr. Hearst 1998 for hyponyms; Xu et al, Alfonseca (LREC 2002))
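The counting-and-scoring step above can be sketched as follows. This is a rough illustration under stated assumptions: the two regular expressions are hypothetical stand-ins for the "X of the Y" and "Y's X" constructions (the noun-noun compound case would need a tagger and is left out), the example sentences are invented, and the probabilities are plain relative frequencies over the extracted pairs.

```python
import math
import re
from collections import Counter


def extract_pairs(text):
    """Return (part, whole) pairs found in two part-of constructions."""
    text = text.lower()
    pairs = []
    # "the WINDOW of the CAR" -> (window, car)
    for part, whole in re.findall(r"\bthe (\w+) of the (\w+)", text):
        pairs.append((part, whole))
    # "the CAR's WINDOW" -> (window, car)
    for whole, part in re.findall(r"\bthe (\w+)'s (\w+)", text):
        pairs.append((part, whole))
    return pairs


def pmi_table(pairs):
    """Pointwise mutual information log[P(x,y) / (P(x) P(y))] per observed pair."""
    joint = Counter(pairs)
    parts = Counter(part for part, _ in pairs)
    wholes = Counter(whole for _, whole in pairs)
    n = len(pairs)
    return {
        (part, whole): math.log((count / n) / ((parts[part] / n) * (wholes[whole] / n)))
        for (part, whole), count in joint.items()
    }


def best_anchor(description_head, candidates, pmi):
    """Pick the candidate anchor with the highest PMI, or None if none was observed."""
    scored = [(pmi[(description_head, c)], c)
              for c in candidates if (description_head, c) in pmi]
    return max(scored)[1] if scored else None


# Invented training text, for illustration only.
TEXT = ("The window of the car was open. The car's window was broken. "
        "The door of the house creaked. The window of the house was shut. "
        "The house's door was red.")
pmi = pmi_table(extract_pairs(TEXT))
anchor = best_anchor("window", ["house", "car"], pmi)
```

Here "window" is seen with "car" relatively more often than chance predicts, so "car" is preferred over "house" as its anchor.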
Per-class results using syntactic info
WordNet vs vector vs syntax
The multiple LKB hypothesis • The results with meronymy suggest that with the lexical databases acquired with these methods we can more than double the precision / recall achieved with WordNet • Similar techniques can be used to acquire hyponymy (see Hearst) • Hypothesis: we need to acquire 3 knowledge bases: • Vector-based lexical meanings for synonymy • Co-occurrence information for hyponymy and meronymy • But we also need to improve methods for searching for the anchor • Taking salience into account • While trying to classify the description
Subsequent work: • We have been working to improve the performance of the system by developing • Better annotation methods (MATE, GNOME) • Unsupervised methods for lexical acquisition • Basic vector-space methods for synonymy • Syntactic constructions for meronymy • Focus-tracking methods (Poesio et al, 2000, to appear)
Improving the evaluation method • Crucial prerequisite: better annotation techniques • Reliable way of marking up bridging references • Misclassification problems: allow for entities to be marked as BOTH anaphoric and bridging • Ways of marking up visual deixis • Also need reliable corpus to compare results with others • The MATE scheme (Poesio, Bruneseaux, and Romary, 1999) • Further refined in the GNOME scheme (Poesio, LREC 2000) • Complete manual with reliability results • In particular, reliable way of marking up bridging references • Simpler method for marking up visual deixis • Discourse deixis (Poesio and Modjeska, 2002) • Putting the scheme into practice: the GNOME corpus
Marking bridging references Each coffer also has a lid that opens in two sections. The upper lid reveals a shallow compartment while the main lid lifts to reveal the interior of the coffer. The 1689 inventory of the Grand Dauphin, the oldest son of Louis XIV, lists a jewel coffer of similar form and decoration; according to the inventory, André Charles Boulle made the coffer. The two stands are of the same date as the coffers, but were originally designed to hold rectangular cabinets.
The MATE scheme • XML-based • Text elements that introduce discourse entities and anaphoric expressions tagged as NE elements <NE ID="ne07">Scottish-born, Canadian based jeweller, Alison Bailey-Smith</NE> <NE ID="ne08"><NE ID="ne09">Her</NE> materials</NE> • Relations between elements specified by means of separate ANTE elements: <ANTE CURRENT="ne09" REL="ident"> <ANCHOR ANTECEDENT="ne07" /> </ANTE> • We can mark up • multiple anaphoric relations • visual deixis • ambiguity
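A short sketch of how markup of this kind could be read back, assuming the NE and ANTE elements sit under a single root element. The `TEXT` wrapper is a hypothetical addition so the fragment parses as one document, and `anaphoric_links` is an illustrative helper name; neither is part of the MATE scheme itself.

```python
import xml.etree.ElementTree as ET

# The slide's example, wrapped in a hypothetical <TEXT> root element.
SAMPLE = """<TEXT>
<NE ID="ne07">Scottish-born, Canadian based jeweller, Alison Bailey-Smith</NE>
<NE ID="ne08"><NE ID="ne09">Her</NE> materials</NE>
<ANTE CURRENT="ne09" REL="ident"><ANCHOR ANTECEDENT="ne07" /></ANTE>
</TEXT>"""


def anaphoric_links(xml_text):
    """Return (anaphor text, relation, antecedent text) for every ANTE/ANCHOR pair."""
    root = ET.fromstring(xml_text)
    # Collect the full text of each NE, including text inside nested NEs.
    mentions = {
        ne.get("ID"): " ".join("".join(ne.itertext()).split())
        for ne in root.iter("NE")
    }
    links = []
    for ante in root.iter("ANTE"):
        for anchor in ante.iter("ANCHOR"):
            links.append((
                mentions[ante.get("CURRENT")],
                ante.get("REL"),
                mentions[anchor.get("ANTECEDENT")],
            ))
    return links


links = anaphoric_links(SAMPLE)
```

Because an ANTE element may hold several ANCHOR children, the same loop would also return one link per alternative when an annotator marks an ambiguous antecedent.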
The GNOME scheme • Much more detailed annotation instructions • http://www.hcrc.ed.ac.uk/~poesio/GNOME/anno_manual_4.html • Reliability results • In particular, reliable way of marking up bridging references • Replace ‘shared knowledge’ approach to larger situation definites with ‘functionality’ (cfr. Loebner, 1987) • Additional elements: UNITs, layout elements, modifier info • Additional attributes for NEs (semantic and syntactic features) • VISUAL-DEIXIS as a NE feature • DISCOURSE-DEIXIS as a NE feature
Marking bridging references: achieving agreement (but not completeness) • RESTRICTING THE NUMBER OF RELATIONS • IDENT (covers direct anaphora, hyponymy, and hypernymy) • ELEMENT • SUBSET • Generalized POSSession • OTHER (when no other connection with previous unit) • Keeping the decision of what counts as a bridge out of the annotator’s hands – annotators only worry about identifying semantic relations • RESULTS (2 annotators, anaphoric relations for 200 NPs) • Only 4.8% disagreements • But 73.17% of relations marked by only one annotator • Result: only ask annotators to mark all identity and AT LEAST ONE bridging
The GNOME corpus • Genres (about 3,000 NPs in each genre) • Descriptions of museum pages (including the ILEX/SOLE corpus) • ICONOCLAST corpus (500 pharmaceutical leaflets) • Tutorial dialogues from the SHERLOCK corpus • Current annotation status: • Complete annotation for layout, discourse units, and NP markup (including a few grammatical features) • About 1,500 NPs from each genre annotated for anaphoric information • About 1,000 NPs in the museum and pharmacy corpus annotated according to latest scheme for NP form (in revision) • About 1,000 NPs in the museum domain annotated for modification • Sherlock corpus completely annotated for rhetorical relations, RDA-style
Anaphoric relations in the GNOME corpus:
Subsequent work: • We have been working to improve the performance of the system by developing • Better annotation methods (MATE, GNOME) • Unsupervised methods for lexical acquisition • Basic vector-space methods for synonymy • Syntactic constructions for meronymy • Exploiting salience-tracking methods (Poesio et al, 2000, to appear)
Local salience and bridging descriptions • Often argued that ‘salience’ and ‘focus’ play an important role in the interpretation of anaphoric expressions. Explicit algorithms using focal information to resolve bridging descriptions developed by Sidner (1979) • The Poesio / Vieira system did not include methods for focus tracking • Only useful for pronouns? • Cfr. Azzam et al’s 1998 negative results • As a result of the work on bridging references, started studying Centering Theory, and developed computational methods for automatically tracking the CB (Poesio et al 2000, submitted) • We are currently using these methods to tackle the problem ‘from the other direction’
A corpus-based investigation of Centering Theory (Poesio et al 2000, submitted) • Goal of the work: compare several proposals for setting the ‘parameters’ of the theory (UTTERANCE, RANKING, REALIZE), identify which one ‘works best’, and evaluate it • Results: • With ‘Vanilla’ version (utterances = finite clauses, ranking = grammatical function, only direct realization) less than half of utterances have a CB (‘Constraint 1’) • Two ways of guaranteeing that most utterances have a CB: • Identify utterance with sentences • Allow for indirect realization (i.e., a bridging reference counts as a realization of its anchor in the present utterance) • ‘Grammatical Function’ and ‘Strube & Hahn’ ranking equivalent for Constraint 1 and pronominalization; Strube and Hahn only version in which CONTINUE > RETAIN > SHIFT (but ZERO, NULL, and ESTABLISHMENT the most common transitions in all versions)
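The effect of the REALIZE parameter on Constraint 1 can be sketched as below, using the usual Centering definition of the CB as the highest-ranked forward-looking center of the previous utterance that is realized in the current one. The data layout (dictionaries with `cfs`, `direct`, and `bridging_anchors` keys) and the toy annotation of the coffer example are assumptions made for illustration, not the format of the GNOME corpus.

```python
def backward_center(prev_cfs, current_realized):
    """CB(U_n): highest-ranked CF of U_{n-1} that is realized in U_n, else None."""
    for entity in prev_cfs:  # prev_cfs is ordered by rank, highest first
        if entity in current_realized:
            return entity
    return None


def realized(utterance, allow_indirect):
    """Entities realized in an utterance, optionally including bridging anchors."""
    entities = set(utterance["direct"])
    if allow_indirect:
        # Indirect realization: a bridging reference counts as realizing its anchor.
        entities |= set(utterance.get("bridging_anchors", ()))
    return entities


def cb_coverage(utterances, allow_indirect=False):
    """Fraction of non-initial utterances that have a CB (Constraint 1)."""
    if len(utterances) < 2:
        return 0.0
    hits = 0
    for prev, cur in zip(utterances, utterances[1:]):
        if backward_center(prev["cfs"], realized(cur, allow_indirect)) is not None:
            hits += 1
    return hits / (len(utterances) - 1)


# Toy annotation (hypothetical): "Each coffer also has a lid ..." followed by
# "The upper lid reveals a shallow compartment", with "the upper lid" bridging to "lid".
utterances = [
    {"cfs": ["coffer", "lid", "sections"], "direct": ["coffer", "lid", "sections"]},
    {"cfs": ["upper_lid", "compartment"], "direct": ["upper_lid", "compartment"],
     "bridging_anchors": ["lid"]},
]
direct_only = cb_coverage(utterances)
with_indirect = cb_coverage(utterances, allow_indirect=True)
```

In this toy pair the second utterance has no CB under direct realization only, and gains one (the lid) once indirect realization is allowed.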
Distance statistics for bridging descriptions (utterances = sentences)
BDs and order of mention • Anchor position for the 72 BDs whose anchor is in the previous utterance:
BDs and CB/CP • Correlation with current CB: • But: 98/110 anchors, 89%, have been CBs or CPs
Examples Anchor = CB & CP (NOT first mention) The decoration on this monumental cabinet refers to the French king Louis XIV’s military victories. A panel of marquetry showing the cockerel of France standing triumphant over both the eagle of the Roman Empire and the lion of Spain decorates the central door while the main lid lifts to reveal the interior of the coffer. Anchor = CB but NOT CP The posthumous inventory of the French king Louis XIV’s possessions describes the table in considerable detail. Although the inventory gives neither the name of the maker nor its original location …
Combining salience and lexical knowledge: a possible strategy • A search algorithm inspired by Sidner (1979): • Search anchors in the previous sentence, starting with the first-mentioned entity, then considering the other CFs in rank order, but limiting search to past CBs or CPs • If no ‘lexically plausible’ anchor found, attempt CBs and CPs of previous sentences, beginning with the closest and moving further away. • Choice strategy: the CF with the stronger lexical relation to the bridging description • Perhaps only considering ‘plausible’ conceptual relations as suggested by Markert et al (1996)
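One possible reading of this search strategy, as a sketch only. The sentence records (`cfs_by_rank`, `first_mentioned`, `cb`, `cp`), the `relatedness` function, and the plausibility threshold are all hypothetical; in particular, "lexically plausible" is approximated here as "relatedness above a threshold", which is an assumption rather than the definition used in the talk.

```python
def candidate_order(sentence):
    """First-mentioned entity first, then the remaining CFs in rank order."""
    first = sentence["first_mentioned"]
    return [first] + [cf for cf in sentence["cfs_by_rank"] if cf != first]


def find_anchor(bd_head, history, relatedness, threshold=0.0):
    """Search the previous sentence, then earlier CBs/CPs, for a plausible anchor."""
    if not history:
        return None
    # Entities that have been a CB or CP at some point in the discourse so far.
    past_centers = {c for s in history for c in (s.get("cb"), s.get("cp")) if c}
    # Step 1: CFs of the previous sentence, restricted to past CBs or CPs.
    steps = [[cf for cf in candidate_order(history[-1]) if cf in past_centers]]
    # Step 2: CB and CP of earlier sentences, closest sentence first.
    for sentence in reversed(history[:-1]):
        steps.append([c for c in (sentence.get("cb"), sentence.get("cp")) if c])
    for candidates in steps:
        plausible = [c for c in candidates if relatedness(bd_head, c) > threshold]
        if plausible:
            # Choice strategy: the candidate with the strongest lexical relation.
            return max(plausible, key=lambda c: relatedness(bd_head, c))
    return None


# Toy relatedness scores and discourse history, invented for illustration.
SCORES = {("drawer", "table"): 0.8, ("drawer", "inventory"): 0.1, ("crown", "king"): 0.9}


def toy_relatedness(a, b):
    return SCORES.get((a, b), 0.0)


history = [
    {"cfs_by_rank": ["table", "king"], "first_mentioned": "table", "cb": None, "cp": "table"},
    {"cfs_by_rank": ["inventory", "table"], "first_mentioned": "inventory",
     "cb": "table", "cp": "inventory"},
]
anchor = find_anchor("drawer", history, toy_relatedness, threshold=0.5)
```

Note that "crown" finds no anchor in this toy history even though it is strongly related to "king", because "king" was never a CB or CP; that is the intended effect of restricting the search to salient entities.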
Problem • The 110 BDs in the corpus enter into 97 mereological relations (ELEMENT, SUBSET, or POSS) • As a simple experiment, we ran a script trying to find a direct link in WordNet (1.7) between one of the senses of the BD and one of the senses of any of the previous CFs (no segmentation) • Relations considered: PART_MERONYM, MEMBER_MERONYM, SUBSTANCE_MERONYM, PART_HOLONYM, MEMBER_HOLONYM, SUBSTANCE_HOLONYM • In only 6 cases is there a direct lexical relation between a BD and one of the CFs (and it’s never a plausible anchor).
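A sketch of what such a direct-link check might look like; it is not the script used in the experiment. The `ToySynset` class and the tiny lexicon are stand-ins so the example runs without WordNet installed, and the method names follow the meronym/holonym accessors of NLTK's WordNet interface on the assumption that a function such as NLTK's `wordnet.synsets` could be passed in as `synsets` instead.

```python
RELATIONS = (
    "part_meronyms", "member_meronyms", "substance_meronyms",
    "part_holonyms", "member_holonyms", "substance_holonyms",
)


class ToySynset:
    """Stand-in for a WordNet synset (hypothetical, for illustration only)."""

    def __init__(self, name, **relations):
        self.name = name
        self.relations = relations

    def __getattr__(self, attr):
        # Expose each relation as a zero-argument method returning a list.
        if attr in RELATIONS:
            return lambda: self.relations.get(attr, [])
        raise AttributeError(attr)


def direct_links(bd_head, cf_head, synsets):
    """Return (relation, bd_sense, cf_sense) for every direct link between senses."""
    cf_senses = set(synsets(cf_head))
    links = []
    for sense in synsets(bd_head):
        for relation in RELATIONS:
            for target in getattr(sense, relation)():
                if target in cf_senses:
                    links.append((relation, sense, target))
    return links


# Toy lexicon: 'leg' is linked to 'table', 'drawer' only to 'chest of drawers'.
table = ToySynset("table")
leg = ToySynset("leg", part_holonyms=[table])
chest = ToySynset("chest_of_drawers")
drawer = ToySynset("drawer", part_holonyms=[chest])
LEXICON = {"table": [table], "leg": [leg], "drawer": [drawer], "chest": [chest]}


def synsets(word):
    return LEXICON.get(word, [])


leg_links = direct_links("leg", "table", synsets)
drawer_links = direct_links("drawer", "table", synsets)
```

The toy lexicon is built so that "the table … the drawer" finds no direct link, mirroring the kind of gap the experiment reports.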
Improve the search strategy? • Already Poesio, Vieira & Teufel (1997) found that using more advanced search strategies inspired by ‘spreading activation’ did not greatly improve recall (and significantly degraded precision / efficiency) • Cases in which we would expect the necessary information to be in WordNet: • the table … the drawer • wn drawer -holon / -hholn: chest of drawers, buffet, chiffonier, desk • wn table -meron / -hmern: sense 1: row, column; sense 2: leg, tabletop • the house … the furniture • Cases in which it is unlikely that WordNet is ever going to contain what’s needed • this work … the bronze medallion above the central door
An alternative approach • These (admittedly, very preliminary) results confirm our previous impression that in the long run, the only way forward is to develop mechanisms for acquiring the necessary knowledge (lexical and otherwise) • The continuous trouble our annotators have in finding all BDs seems to suggest that perhaps not all are resolved all the time • It’s also not clear whether it is feasible to attempt identifying the exact relation between a BD and its anchor in each case • In some cases, the best we can hope for is to recognize that two objects are related in some sense • These observations all mean trouble for the Sidner-inspired strategy of always trying to find an anchor, and for the strategy of comparing lexical information to do so
Combining knowledge sources: some preliminary ideas • Choose the anchor that maximizes a combination of salience and strength of the relation: • E.g., using lexical proximity measures: • But only return a result if the value exceeds a certain threshold
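The formula from this slide did not survive in the transcript, so the following is only one way such a combination could look: a weighted sum of a salience score and a lexical proximity score, with no answer returned below a threshold. The linear form, the weight, the threshold, and the scores are all assumptions made for illustration.

```python
def choose_anchor(candidates, salience, lexical, weight=0.5, threshold=0.4):
    """Return the candidate with the best combined score, or None if it is too low.

    Combined score (assumed form): weight * salience + (1 - weight) * lexical.
    """
    best, best_score = None, float("-inf")
    for candidate in candidates:
        score = weight * salience[candidate] + (1 - weight) * lexical[candidate]
        if score > best_score:
            best, best_score = candidate, score
    # Refuse to answer when even the best candidate is weakly supported.
    return best if best_score >= threshold else None


# Invented scores in [0, 1] for two candidate anchors of "the drawer".
salience = {"table": 0.9, "inventory": 0.6}
lexical = {"table": 0.8, "inventory": 0.1}
picked = choose_anchor(["inventory", "table"], salience, lexical)
```

Returning None below the threshold reflects the point made earlier that the resolver should not be forced to find an anchor for every bridging description.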
Conclusions • Three problems for developing a computational theory of BD resolution: • Which BDs are in fact resolved, and how much agreement is there on their interpretation? • What are the relative roles of salience and lexical or more general information? • How can we find the info we need? • Results using WordNet or uniform (and very basic) lexical acquisition method not very good, but results may be improved by • Adopting distinct acquisition methods for different types of knowledge • Improving resolution method (e.g., by adopting focus-tracking mechanism) • Results using salience as the only source of information better (at least 68% for associative relations), but we still need adequate lexical resources • Great challenge: finding a way of combining the two types of information (and deciding when we need to do so!)
Future Work • Salience: • Adding local focus-tracking methods • Text-tiling • Integration with algorithms for resolving other types of anaphors • Acquisition of lexical and commonsense knowledge for bridges: • Improve results for synonymy (larger corpus) • Hyponymy (e.g., Caraballo’s work) • Combining multiple lexical sources • Acquisition of causal knowledge (e.g., Pazzani’s work) • Premodification: by vector composition (cfr. LSA) • Acquisition of knowledge for other types of descriptions: • Functionality (the deep hold of a great grey ship) • Idioms (What’s the point?) • Larger situation (the sun, the weather, the horizon, …) • Improve Evaluation: • On larger amounts of data • By integrating with system (e.g., question-answering, summarization)
The lexical and commonsense knowledge bottleneck in semantic interpretation • The main roadblock on the path toward robust, high-performance systems for semantic interpretation is the lack of adequate bases of lexical and commonsense information (and of the necessary inferential engines) • Our work on interpreting definite descriptions indicates that this is true for anaphora resolution as well, and that hand-coded resources like WordNet, while useful, are insufficient
Direct anaphors and discourse-new definite descriptions • DISCOURSE-NEW DEFINITES • the first man on the Moon, the fact that Ginseng tastes of spinach: a list of the most common functional predicates (fact, result, belief) and modifiers (first, last, only …) • heuristics based on structural information (e.g., establishing relative clauses) • DIRECT ANAPHORA: • the red car, the car, the blue car: premodification heuristics • segmentation: approximated with ‘loose’ windows