Classification of Discourse Coherence Relations: An Exploratory Study using Multiple Knowledge Sources

Ben Wellner†*, James Pustejovsky†, Catherine Havasi†, Anna Rumshisky† and Roser Saurí† † Brandeis University * The MITRE Corporation

Outline of Talk • Overview and Motivation for Modeling Discourse • Background • Objectives • The Discourse GraphBank • Overview • Coherence Relations • Issues with the GraphBank • Modeling Discourse • Machine learning approach • Knowledge Sources and Features • Experiments and Analysis • Conclusions and Future Work

Modeling Discourse: Motivation • Why model discourse? • Dialogue • General text understanding applications • Text summarization and generation • Information extraction • MUC Scenario Template Task • Discourse is vital for understanding how events are related • Modeling discourse generally may aid specific extraction tasks

Background • Different approaches to discourse • Semantics/formalisms: Hobbs [1985], Mann and Thompson [1987], Grosz and Sidner [1986], Asher [1993], others • Different objectives • Informational vs. intentional, dialog vs. general text • Different inventories of discourse relations • Coarse vs. fine-grained • Different representations • Tree representation vs. graph • Same steps involved: • 1. Identifying discourse segments • 2. Grouping discourse segments into sequences • 3. Identifying the presence of a relation • 4. Identifying the type of the relation

Discourse Steps #1* • Example: Mary is in a bad mood because Fred played tuba while she was taking a nap. • 1. Segment: A, B, C • 2. Group • 3. Connect segments: relations r1 and r2 over A, B and C • 4. Relation Type: r1 = cause-effect, r2 = elaboration • * Example from [Danlos 2004]

Discourse Steps #2* • Example: Fred played the tuba. Next he prepared a pizza to please Mary. • 1. Segment: A, B, C • 2. Group • 3. Connect segments: relations r1 and r2 over A, B and C • 4. Relation Type: r1 = temporal precedence, r2 = cause-effect • * Example from [Danlos 2004]

Objectives • Our Main Focus: Step 4 - classifying discourse relations • Important for all approaches to discourse • Can be approached independently of representation • But – relation types and structure are probably quite dependent • Task will vary with inventory of relation types • What types of knowledge/features are important for this task? • Can we apply the same approach to Step 3: identifying whether two segment groups are linked?

Discourse GraphBank: Overview • [Wolf and Gibson, 2005] • Graph-based representation of discourse • Tree-representation inadequate: multiple parents, crossing dependencies • Discourse composed of clausal segments • Segments can be grouped into sequences • Relations need not exist between segments within a group • Coherence relations between segment groups • Roughly those of Hobbs [1985] • Why GraphBank? • Similar inventory of relations as SDRT • Linked to lexical representations • Semantics well-developed • Includes non-local discourse links • Existing annotated corpus, unexplored outside of [Wolf and Gibson, 2005]
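The point about multiple parents can be made concrete with a small data structure. This is only an illustrative sketch, not the GraphBank file format: the class names `Relation` and `DiscourseGraph`, the use of segment-index tuples for groups, and the reading of "parent" as the source of an incoming relation are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    source: tuple  # segment indices forming the source group (hypothetical encoding)
    target: tuple  # segment indices forming the target group
    label: str     # coherence relation type, e.g. "elaboration"


@dataclass
class DiscourseGraph:
    segments: list
    relations: list = field(default_factory=list)

    def add_relation(self, source, target, label):
        self.relations.append(Relation(tuple(source), tuple(target), label))

    def parents(self, group):
        """Return the source groups of all relations pointing at `group`."""
        group = tuple(group)
        return [r.source for r in self.relations if r.target == group]


# A node with two incoming relations, which a tree could not represent.
graph = DiscourseGraph(segments=["s0", "s1", "s2"])
graph.add_relation([0], [2], "elaboration")
graph.add_relation([1], [2], "cause-effect")
two_parents = graph.parents([2])
```

Because relations are stored as a flat edge list, nothing prevents crossing dependencies or a group from taking part in several relations at once.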

Resemblance Relations • Similarity (parallel): The first flight to Frankfurt this morning was delayed. The second flight arrived late as well. • Contrast: The first flight to Frankfurt this morning was delayed. The second flight arrived on time. • Example: There have been many previous missions to Mars. A famous example is the Pathfinder mission. • Generalization: Two missions to Mars in 1999 failed. There are many missions to Mars that have failed. • Elaboration*: A probe to Mars was launched from the Ukraine this week. The European-built “Mars Express” is scheduled to reach Mars by Dec. • * The elaboration relation is given one or more sub-types: organization, person, location, time, number, detail

Causal, Temporal and Attribution Relations • Causal • Cause-effect: There was bad weather at the airport and so our flight got delayed. • Conditional: If the new software works, everyone should be happy. • Violated Expectation: The new software worked great, but nobody was happy. • Temporal • Temporal Precedence: First, John went grocery shopping. Then, he disappeared into a liquor store. • Attribution • Attribution: John said that the weather would be nice tomorrow. • Same: The economy, according to analysts, is expected to improve by early next year.

Some Issues with GraphBank • Coherence relations • Conflation of actual causation and intention/purpose • Granularity • Desirable for relations to hold between eventualities or entities, not necessarily entire clausal segments • Examples: • The university spent $30,000 to upgrade lab equipment in 1987. (cause ??) • John pushed the door to open it. (cause) • The new policy came about after President Reagan’s historic decision in mid-December to reverse the policy of refusing to deal with members of the organization, long shunned as a band of terrorists. Reagan said PLO chairman Yasser Arafat had met US demands. (elaboration)

A Classifier-based Approach • For each pair of discourse segments, classify the relation type between them • For segment pairs on which we know a relation exists • Advantages • Include arbitrary knowledge sources as features • Easier than implementing inference on top of semantic interpretations • Robust performance • Gain insight into how different knowledge sources contribute • Disadvantages • Difficult to determine why mistakes happen • Maximum Entropy • Commonly used discriminative classifier • Allows a large number of non-independent features

Knowledge Sources • Knowledge Sources: • Proximity • Cue Words • Lexical Similarity • Events • Modality and Subordinating Relations • Grammatical Relations • Temporal relations • Associate with each knowledge source • One or more Feature Classes

Example SEG2: The university spent $30000 SEG1: to upgrade lab equipment in 1987

Proximity • Motivation • Some relations tend to be local, i.e. their arguments appear nearby in the text • Attribution, cause-effect, temporal precedence, violated expectation • Other relations can span larger portions of text • Elaboration • Similar, contrast • Feature Class Proximity: - Whether segments are adjacent or not - Directionality (which argument appears earlier in the text) - Number of intervening segments
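The three proximity features can be sketched as a small function. It assumes, purely for illustration, that each argument is a single segment identified by its 0-based position in the text; the function name and feature keys are hypothetical, and segment groups would need span endpoints instead.

```python
def proximity_features(i, j):
    """Proximity features for argument segments at distinct positions i and j."""
    gap = abs(i - j) - 1  # number of segments strictly between the arguments
    return {
        "adjacent": gap == 0,   # whether the segments are adjacent
        "arg1_first": i < j,    # directionality: which argument comes first
        "intervening": gap,     # number of intervening segments
    }


local_pair = proximity_features(2, 3)
distant_pair = proximity_features(5, 1)
```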

Cue Words • Motivation: • Many coherence relations are frequently identified by a discourse cue word or phrase: “therefore”, “but”, “in contrast” • Cues are generally captured by the first word in a segment • Obviates enumerating all potential cue words • Non-traditional discourse markers (e.g. adverbials or even determiners) may indicate a preference for certain relation types Feature Class Cue Words: - First word in each segment
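A minimal sketch of the first-word feature, assuming whitespace tokenization and lowercasing (neither is specified on the slide); the function name and feature-string format are made up for the example. The segments are the ones from the Example slide.

```python
def cue_word_features(seg1, seg2):
    """Binary features naming the first word of each segment."""
    def first_word(text):
        tokens = text.lower().split()
        return tokens[0] if tokens else "<empty>"

    return {
        "seg1_first=" + first_word(seg1): 1,
        "seg2_first=" + first_word(seg2): 1,
    }


cue_feats = cue_word_features(
    "to upgrade lab equipment in 1987",  # SEG1 on the Example slide
    "The university spent $30000",       # SEG2 on the Example slide
)
```

Using the first word directly means no cue-word lexicon has to be maintained, at the cost of missing cues that occur later in a segment.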

Lexical Coherence • Motivation: • Identify lexical associations, lexical/semantic similarities • E.g. push/fall, crash/injure, lab/university • Brandeis Semantic Ontology (BSO) • Taxonomy of types (i.e. senses) • Includes qualia information for words • Telic (purpose), agentive (creation), constitutive (parts) • Word Sketch Engine (WSE) • Similarity of words as measured by their contexts in a corpus (BNC) Feature Class BSO: - Paths between words up to length 10 WSE: - Number of word pairs with similarity > 0.05, > 0.01 - Segment similarities (sum of word-pair similarities / # words)
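The WSE features can be sketched as follows. The similarity table here is a toy stand-in with invented scores, not Word Sketch Engine output, and the reading of "# words" as the total number of words in both segments is an assumption.

```python
from itertools import product


def similarity_features(words1, words2, sim):
    """Threshold counts and a normalized similarity over cross-segment word pairs.

    `sim` maps frozenset({w1, w2}) to a similarity score (hypothetical lookup).
    """
    scores = [sim.get(frozenset((a, b)), 0.0) for a, b in product(words1, words2)]
    n_words = len(words1) + len(words2)
    return {
        "pairs_gt_0.05": sum(s > 0.05 for s in scores),
        "pairs_gt_0.01": sum(s > 0.01 for s in scores),
        "segment_sim": sum(scores) / n_words if n_words else 0.0,
    }


# Invented scores for illustration only.
toy_sim = {
    frozenset(("lab", "university")): 0.08,
    frozenset(("equipment", "university")): 0.02,
}
sim_feats = similarity_features(
    ["university", "spent"], ["upgrade", "lab", "equipment"], toy_sim
)
```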

Events • Motivation: • Certain events and event-pairs are indicative of certain relation types (e.g. “push”-“fall”: cause) • Allow learner to associate events and event-pairs with particular relation types • Evita: EVents In Text Analyzer • Performs domain independent identification of events • Identifies all event-referring expressions (that can be temporally ordered) • Feature Class Events: - Event mentions in each segment - Event mention pairs drawn from both segments
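A sketch of the Events feature class, assuming event mentions have already been extracted (for example by Evita) and are passed in as plain lists of strings. The function name and feature-string format are hypothetical.

```python
from itertools import product


def event_features(seg1_events, seg2_events):
    """Event mentions per segment plus all cross-segment event pairs."""
    feats = {f"ev1={e}" for e in seg1_events}
    feats |= {f"ev2={e}" for e in seg2_events}
    feats |= {f"pair={a}|{b}" for a, b in product(seg1_events, seg2_events)}
    return feats


push_fall = event_features(["push"], ["fall"])
```

The pair feature is what lets a learner associate a combination such as push/fall with a causal relation.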

Modality and Subordinating Relations • Motivation: • Event modality and subordinating relations are indicative of certain relations • SlinkET [Saurí et al. 2006] • Identifies subordinating contexts and classifies them as: • Factive, counter-factive, evidential, negative evidential, or modal • E.g. evidential => attribution relation • Event class, polarity, tense, etc. • Feature Class SlinkET: - Event class, polarity, tense and modality of events in each segment - Subordinating relations between event pairs

Cue Words and Events • Motivation • Certain events (event types) are likely to appear in particular discourse contexts keyed by certain connectives. • Pairing connectives with events captures this more precisely than connectives or events on their own Feature Class CueWords + Events: - First word of SEG1 and each event mention in SEG2 - First word of SEG2 and each event mention in SEG1
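The conjoined feature class can be sketched as below, again assuming that first words and event mentions are supplied by earlier processing. Names and the feature-string format are illustrative only.

```python
def cue_event_features(seg1_first, seg1_events, seg2_first, seg2_events):
    """Pair each segment's first word with the event mentions of the other segment."""
    feats = set()
    for ev in seg2_events:
        feats.add(f"cue1={seg1_first}|ev2={ev}")  # first word of SEG1 + event in SEG2
    for ev in seg1_events:
        feats.add(f"cue2={seg2_first}|ev1={ev}")  # first word of SEG2 + event in SEG1
    return feats


conjoined = cue_event_features("to", ["upgrade"], "the", ["spent"])
```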

Grammatical Relations • Motivation: • Certain intra-sentential relations captured or ruled out by particular dependency relations between clausal headwords • Identification of headwords also important • Main events identified • RASP parser • Feature Class Syntax: - Grammatical relations between two segments - GR + SEG1 head word - GR + SEG2 head word - GR + Both head words

Temporal Relations • Motivation: • Temporal ordering between events constrains possible coherence relations • E.g. E1 BEFORE E2 => NOT(E2 CAUSE E1) • Temporal Relation Classifier • Trained on TimeBank 1.2 using MaxEnt • See [Mani et al. “Machine Learning of Temporal Relations” ACL 2006] Feature Class TLink: - Temporal Relations holding between segments
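The example constraint can be written out as a small function. Note that in the approach described here temporal relations enter the classifier as features rather than as hard filters, so this sketch only illustrates the logic of the constraint. The function name, the direction labels, and the symmetric AFTER case are assumptions.

```python
def allowed_cause_directions(temporal_relation):
    """Causal directions between events E1 and E2 compatible with their temporal order."""
    directions = {"E1->E2", "E2->E1"}
    if temporal_relation == "BEFORE":
        # E1 BEFORE E2 => NOT(E2 CAUSE E1)
        directions.discard("E2->E1")
    elif temporal_relation == "AFTER":
        # Mirror image of the slide's constraint (an assumption, not on the slide).
        directions.discard("E1->E2")
    return directions


before_case = allowed_cause_directions("BEFORE")
```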

Relation Classification • Identify • Specific coherence relation • Ignoring elaboration subtypes (too sparse) • Coarse-grained relation (resemblance, cause-effect, temporal, attributive) • Evaluation Methodology • Used Maximum Entropy classifier (Gaussian prior variance = 2.0) • 8-fold cross validation • Specific relation accuracy: 81.06% • Inter-annotator agreement: 94.6% • Majority Class Baseline: 45.7% • Classifying all relations as elaboration • Coarse-grained relation accuracy: 87.51%
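For readers unfamiliar with Maximum Entropy classification, here is a minimal pure-Python sketch of a multinomial MaxEnt model over binary features with a Gaussian prior. Only the variance of 2.0 comes from the slide. The batch gradient-ascent training loop, the learning rate, the epoch count and the toy data are assumptions made for illustration, and the talk does not say which toolkit was actually used.

```python
import math
from collections import defaultdict


def predict_proba(w, feats, labels):
    """Softmax over summed weights of the active (feature, label) pairs."""
    scores = {y: sum(w.get((f, y), 0.0) for f in feats) for y in labels}
    m = max(scores.values())
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train_maxent(data, labels, variance=2.0, epochs=200, lr=0.5):
    """Batch gradient ascent on log-likelihood minus the Gaussian prior penalty."""
    w = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in data:
            probs = predict_proba(w, feats, labels)
            for f in feats:
                for y in labels:
                    # observed count minus expected count
                    grad[(f, y)] += (1.0 if y == gold else 0.0) - probs[y]
        for key in set(grad) | set(w):
            # prior term: derivative of -w^2 / (2 * variance)
            w[key] += lr * (grad[key] - w[key] / variance) / len(data)
    return w


def classify(w, feats, labels):
    probs = predict_proba(w, feats, labels)
    return max(labels, key=lambda y: probs[y])


# Toy training instances (invented): feature sets paired with relation labels.
toy = [
    ({"seg2_first=because", "adjacent"}, "cause-effect"),
    ({"seg2_first=but", "adjacent"}, "violated-expectation"),
    ({"seg2_first=then", "adjacent"}, "temporal-precedence"),
    ({"seg2_first=because"}, "cause-effect"),
]
labels = sorted({y for _, y in toy})
weights = train_maxent(toy, labels)
prediction = classify(weights, {"seg2_first=but"}, labels)
```

A smaller variance pulls weights more strongly toward zero, which matters when many overlapping feature classes are combined.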

F-Measure Results

Results: Confusion Matrix (Hypothesis vs. Reference)

Feature Class Analysis • What is the utility of each feature class? • Features overlap significantly – highly correlated • How can we estimate utility? • Independently • Start with Proximity feature class (baseline) • Add each feature class separately • Determine improvement over baseline • In combination with other features • Start with all features • Remove each feature class individually • Determine reduction from removal of feature class
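The two estimation procedures can be sketched as one function. The `evaluate` callable is a hypothetical stand-in for training and cross-validating the classifier with a given set of feature classes and returning its accuracy. Whether Proximity itself was also removed in the second procedure is not stated, so including it there is an assumption.

```python
def feature_class_analysis(feature_classes, evaluate, baseline="Proximity"):
    """Estimate each feature class's utility in isolation and in conjunction.

    `evaluate(classes)` is assumed to return accuracy for a set of class names.
    """
    full = set(feature_classes)
    base_acc = evaluate({baseline})
    full_acc = evaluate(full)

    # Independently: add each class to the baseline and measure the improvement.
    in_isolation = {
        c: evaluate({baseline, c}) - base_acc
        for c in feature_classes
        if c != baseline
    }
    # In combination: remove each class from the full set and measure the drop.
    in_conjunction = {c: full_acc - evaluate(full - {c}) for c in feature_classes}
    return in_isolation, in_conjunction
```

When feature classes are highly correlated, a class can show a large gain in isolation and only a small drop on removal, which is why both views are reported.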

Feature Class Analysis Results • Feature Class Contributions in Isolation • Feature Class Contributions in Conjunction

Relation Identification • Given • Discourse segments (and segment sequences) • Identify • For each pair of segments, whether a relation (any relation) exists on those segments • Two issues: • Highly skewed classification • Many negatives, few positives • Many of the relations are transitive • These aren’t annotated and will be false negative instances
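The skew is easy to see by enumerating candidate pairs. This sketch simplifies to single segments rather than segment sequences, and the function names and toy annotation are invented for the example.

```python
from itertools import combinations


def identification_instances(num_segments, annotated_pairs):
    """Label every unordered segment pair: True if some relation is annotated."""
    gold = {frozenset(p) for p in annotated_pairs}
    return [
        ((i, j), frozenset((i, j)) in gold)
        for i, j in combinations(range(num_segments), 2)
    ]


def majority_baseline(instances):
    """Accuracy obtained by always predicting the more frequent label."""
    positives = sum(label for _, label in instances)
    return max(positives, len(instances) - positives) / len(instances)


# Toy document: 5 segments, 3 annotated relations.
instances = identification_instances(5, [(0, 1), (1, 2), (3, 4)])
baseline_acc = majority_baseline(instances)
```

The number of candidate pairs grows quadratically with document length while annotated relations grow far more slowly, so negatives dominate in longer documents.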

Relation Identification Results • For all pairs of segment sequences in a document • Used same features as for classification • Achieved accuracy only slightly above majority class baseline • For segment pairs in same sentence • Accuracy: 70.04% (baseline 58%) • Identification and classification in same sentence • Accuracy: 64.53% (baseline 58%)

Inter-relation Dependencies • Each relation shouldn’t be identified in isolation • When identifying a relation between si and sj, consider other relations involving si and sj • Include as features the other (gold standard true) relation types both segments are involved in • Adding this feature class improves performance to 82.3% • 6.3% error reduction • Indicates room for improvement with • Collective classification (where outputs influence each other) • Incorporating explicit modeling constraints • Tree-based parsing model • Constrained DAGs [Danlos 2004] • Including, deducing transitive links may help further
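The inter-relation feature class can be sketched as follows. The dictionary encoding of gold relations, the function name and the feature-string format are assumptions; as on the slide, the features come from gold-standard relations, so this is an oracle setting rather than a deployable one.

```python
def inter_relation_features(pair, gold_relations):
    """Other gold relation types that each segment of `pair` takes part in.

    `gold_relations` maps (i, j) segment pairs to relation labels.
    """
    si, sj = pair
    feats = set()
    for (a, b), label in gold_relations.items():
        if {a, b} == {si, sj}:
            continue  # skip the relation being classified to avoid leaking its label
        if si in (a, b):
            feats.add(f"si_other={label}")
        if sj in (a, b):
            feats.add(f"sj_other={label}")
    return feats


gold = {(0, 1): "elaboration", (1, 2): "cause-effect", (2, 3): "attribution"}
context_feats = inter_relation_features((1, 2), gold)
```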

Conclusions • Classification approach with many features achieves good performance at classifying coherence relation types • All feature classes helpful, but: • Discriminative power of most individual feature classes captured by union of remaining feature classes • Proximity + CueWords achieves 76.77% • Remaining features reduce error by 23.7% • Classification approach performs less well on task of identifying the presence of a relation • Using same features as for classifying coherence relation types • “Parsing” may prove better for local relationships

Future Work • Additional linguistic analysis • Co-reference – both entities and events • Word-sense • lexical similarity confounded with multiple types for a lexeme • Pipelined or ‘stacked’ architecture • Classify coarse-grained category first, then specific coherence relation • Justification: different categories require different types of knowledge • Relational classification • Model decisions collectively • Include constraints on structure • Investigate transitivity of resemblance relations • Consider other approaches for identification of relations

Questions?

Backup Slides

GraphBank Annotation Statistics • Corpus and Annotator Statistics • 135 doubly annotated newswire articles • Identifying discourse segments had high agreement (> 90% from pilot study of 10 documents) • Corpus segments ultimately annotated once (by both annotators together) • Segment grouping - Kappa 0.8424 • Relation identification and typing - Kappa 0.8355

Factors Involved in Identifying Coherence Relations • Proximity • E.g. Attribution local, elaboration non-local • Lexical and phrasal cues • Constrain possible relation types • But => ‘contrast’, ‘violated expectation’ • And => ‘elaboration’, ‘similar’, ‘contrast’ • Co-reference • Coherence established with references to mentioned entities/events • Argument structure • E.g. similar => similar/same event and/or participants • Lexical Knowledge • Type inclusion, word sense • Qualia (purpose of an object, resulting state of an action), event structure • Paraphrases: delay => arrive late • World Knowledge • E.g. Ukraine is part of Europe

Architecture • Pre-processing • Knowledge Source 1, Knowledge Source 2, ..., Knowledge Source n • Feature Constructor • Training => Model • Prediction => Classifications

Scenario Extraction: MUC • Pull together relevant facts related to a “complex event” • Management Succession • Mergers and Acquisitions • Natural Disasters • Satellite launches • Requires identifying relations between events: • Parallel, cause-effect, elaboration • Also: identity, part-of • Hypothesis: • Task independent identification of discourse relations will allow rapid development of Scenario Extraction systems

Information Extraction: Current • Pre-process • Fact Extraction • Scenario Extraction • Domain 1: Task 1.1 ... Task 1.N • Domain 2: Task 2.1 ... Task 2.N • ... • Domain N

Information Extraction: Future • Pre-process • Fact Extraction • Discourse
