Towards operationalizable models of discourse phenomena Christian Chiarcos chiarcos@uni-potsdam.de
Discourse Phenomena • What does it mean, and why should we* care? (* NLP folks) • Theory-based approaches • Segmented Discourse Representation Theory (SDRT) • Annotation-based approaches • Research at U Potsdam, Germany • A data-based approach • Towards the unsupervised acquisition of discourse relations
Discourse Phenomena • Discourse • a series of communicative acts exchanged between individuals, conducted with the goal of manipulating the interlocutors' state of mind • exchange information, establish social roles, etc. • Discourse phenomena (here) • pertaining to the relation between a text and its cognitive representation • ignoring socio-linguistic and literary aspects
Phenomena: Discourse Relations • Semantic, logical or other dependencies between utterances in a discourse. • Indicated by cue words, which may be optional. • Example: John fell. Max pushed him. / John fell, { … because … | … after … | … and, then, … } Max pushed him.
Phenomena: Discourse Structure (1) PeterP came home. (2) HeP had a long conversation with JohnJ. (3) To have someone to talk to helped himP/J a lot. (1) PeterP came home. (2) HeP had a long conversation with JohnJ. (2a) MaryM arrived later in the evening. (3) To have someone to talk to helped himP/??J a lot.
Phenomena: Discourse Structure
(1) PeterP came home. (2) HeP had a long conversation with JohnJ. (3) To have someone to talk to helped himP/J a lot.
[tree diagram over segments (1), (2), (3)]
(1) PeterP came home. (2) HeP had a long conversation with JohnJ. (2a) MaryM arrived later in the evening. (3) To have someone to talk to helped himP/??J a lot.
[tree diagram over segments (1), (2), (2a), (3)]
… why should we care? • Anaphora resolution and discourse relations • John pushed Max. He fell. / John pushed Max. So, he fell. [cause] • John pushed Max. He apologized. / John pushed Max. But he apologized. [contrast*] * contrast with the implicit assumption that John pushed Max intentionally.
… why should we care? • Text summarization and discourse structure • The more deeply embedded a discourse segment is, the less likely it is to be included in the summary. (Marcu 1997)
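To make this concrete, here is a minimal sketch of depth-based content selection over a discourse tree; the nested-list encoding is hypothetical, not Marcu's actual algorithm:

```python
# Sketch: depth-based content selection (hypothetical encoding, not Marcu's algorithm).
# A discourse tree is a nested list; leaves are segment strings.
def segments_by_depth(tree, depth=0):
    """Yield (depth, segment) pairs for all leaves of a nested-list tree."""
    if isinstance(tree, str):
        yield depth, tree
    else:
        for child in tree:
            yield from segments_by_depth(child, depth + 1)

tree = ["Peter came home.",
        ["He had a long conversation with John.",
         "Mary arrived later in the evening."]]

# Segments at a smaller depth are better summary candidates.
for depth, segment in sorted(segments_by_depth(tree)):
    print(depth, segment)
```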
Theory-based Approaches • Example: SDRT • Segmented Discourse Representation Theory (Asher 1993, Asher & Lascarides 2003) • dynamic semantics (Discourse Representation Theory, Kamp 1982) • extended with discourse relations (Hobbs 1978, Mann & Thompson 1987) • hierarchical discourse structure (Polanyi 1985, Webber 1988)
Discourse Analysis with SDRT
Max pushed John.
π1: [ x, y, e1, n | Max(x), John(y), e1: push(x,y), e1 < n ]
• parse the utterance and create a segment
• π1: discourse segment (utterance)
• x, y: variables (discourse referents) for Max and John
• e1: variable (event) described by the utterance
• n: reference time (present)
• Max(x), John(y): unary predicates that represent noun attributes
• push(x,y): binary predicate that reflects the semantics of the verb
• e1 < n: the event precedes the present time
Discourse Analysis with SDRT
Max pushed John.
π1: [ x, y, e1, n | Max(x), John(y), e1: push(x,y), e1 < n ]
• integrate the segment with the (previously empty) context
Discourse Analysis with SDRT
Max pushed John. He fell.
π1: [ x, y, e1, n | Max(x), John(y), e1: push(x,y), e1 < n ]
π2: [ z, e2, n | e2: fall(z), e2 < n ]
• process the next utterance and construct a new segment
Discourse Analysis with SDRT
Max pushed John. He fell.
π1: [ x, y, e1, n | Max(x), John(y), e1: push(x,y), e1 < n ]
π2: [ z, e2, n | e2: fall(z), e2 < n, z = y ]
Result(π1, π2), Narration(π1, π2)
• update the context with the new segment
• anaphor resolution: z = y
• inferred discourse relations: Result, Narration
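The boxes above can be mimicked with simple data structures. A minimal sketch with hypothetical names (not the actual SDRT formalism) of how the two segments, the anaphoric link, and the inferred relations might be represented:

```python
from dataclasses import dataclass, field

@dataclass
class DRS:
    """A DRS box: discourse referents plus conditions."""
    referents: set = field(default_factory=set)
    conditions: list = field(default_factory=list)

@dataclass
class SDRS:
    """A segmented DRS: labelled segments plus discourse relations."""
    segments: dict = field(default_factory=dict)   # label -> DRS
    relations: list = field(default_factory=list)  # (relation, label, label)

pi1 = DRS({"x", "y", "e1", "n"},
          ["Max(x)", "John(y)", "e1: push(x,y)", "e1 < n"])   # Max pushed John.
pi2 = DRS({"z", "e2", "n"}, ["e2: fall(z)", "e2 < n"])        # He fell.

ctx = SDRS(segments={"pi1": pi1, "pi2": pi2})
pi2.conditions.append("z = y")                    # anaphor resolution
ctx.relations.append(("Result", "pi1", "pi2"))    # inferred discourse relations
ctx.relations.append(("Narration", "pi1", "pi2"))
```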
Discourse Analysis with SDRT • SDRT accounts for • anaphoric reference • lexical disambiguation • bridging • presupposition • ellipsis • coherence • but only if discourse relations can be inferred
Inference of Discourse Relations
• SDRT: defeasible (nonmonotonic) inference ("Glue Logic")
• semantic constraints on the new segment ∧ structural constraints on potential attachment points ∧ semantic constraints on the potential attachment point > discourse relation to be applied
• >: defeasible inference; monotonic inference is also possible (e.g., if a discourse connector signals the relation unambiguously)
Inference of Discourse Relations
if segment β can be attached to segment α, and the event described in α is a pushing event with arguments x and y, and the event described in β involves a falling event of argument y, then, normally, the discourse relation between α and β is Result:
(⟨τ, α, β⟩ ∧ [Push(eα,x,y)]Kα ∧ [Fall(eβ,y)]Kβ) > Result(α, β)
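Read procedurally, such a rule amounts to a pattern over event descriptions. A toy sketch (hypothetical representation; the actual Glue Logic is a nonmonotonic logic, and its conclusions can be overridden, which this sketch does not model):

```python
# A defeasible rule as a pattern over event descriptions (toy representation).
# SDRT's ">" is defeasible: the conclusion holds "normally" and may be
# overridden by more specific rules.
def infer_relation(alpha, beta, attachable=True):
    """alpha, beta: dicts describing the main event of each segment."""
    if (attachable
            and alpha.get("event") == "push"
            and beta.get("event") == "fall"
            and alpha.get("patient") == beta.get("agent")):
        return "Result"   # normally; defeasible
    return None           # underspecified: no relation inferred

alpha = {"event": "push", "agent": "x", "patient": "y"}
beta = {"event": "fall", "agent": "y"}
print(infer_relation(alpha, beta))   # Result
```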
Inference of Discourse Relations
To operationalize SDRT as it is stated, we need an exhaustive formal model of shared knowledge and formally defined rules to infer every possible discourse relation.
• "Glue Logic"
• accesses
• structural and propositional contents of the context
• propositional contents of the new segment
• employs
• generic pragmatic principles (e.g., Gricean maxims)
• specific pragmatic principles (e.g., shared world knowledge)
• monotonic axioms (gather discourse clues from logical form)
• defeasible (non-monotonic) rules (infer discourse relations)
In this form, these resources are not available.
State of the art: underspecified discourse analysis (Bos 2008)
• discourse relations only for explicit cues
• shared knowledge approximated with lexical-semantic resources (FrameNet, etc.)
Annotation-based Approaches • The theory-based approach presupposes knowledge sources that we do not have • Alternative • creation of discourse-annotated corpora • machine learning techniques to predict discourse annotations from "lower" annotation layers • combine rule-based and ML components
English Corpora • discourse relations (Penn Discourse Treebank, RST Discourse Treebank) • discourse structure (RST Discourse Treebank, Groningen Meaning Bank) • temporal relations (TimeBank) • information structure (NXT Switchboard Corpus) • coreference (MUC, OntoNotes) • etc.
Discourse at Potsdam • University of Potsdam, Germany • Applied Computational Linguistics Lab (Manfred Stede) • discourse structure and its applications in NLP • Rhetorical Structure Theory (Mann & Thompson 1987) • Potsdam Commentary Corpus* • small corpus of German newspaper text (44,000 tokens) • coreference, discourse structure, information structure, connectives, illocutions, etc. • test bed for infrastructures for discourse annotations * http://www.ling.uni-potsdam.de/pcc/pcc.html
Discourse at Potsdam • Collaborative Research Center 632 "Information Structure" (2003-2015) • University of Potsdam, Humboldt-University Berlin, Free University Berlin, Germany • network of about 15 projects, currently 60 people • Vision • unified terminological framework for the interface between discourse and syntax • Selected activities • theoretical research, discourse annotation, infrastructure
Discourse-Annotated Corpora • Questionnaire for Information Structure data • 13 typologically diverse languages • information structure (Skopeteas et al. 2006) • African corpora (Chiarcos et al. 2011) • corpora of 20 sub-Saharan languages • Hausa Internet Corpus, Wolof Bible corpus • Corpora of non-standard varieties • Old High German (Petrova et al. 2009) • German sociolects (Wiese 2011) • German corpora • OVS corpus, Kiel Radio News corpus, HPSG parse bank
Annotation Challenges
• Cognitive (and financial) effort: discourse annotations usually require a deep understanding and comprehension of the text
• Agreement: annotators with different background knowledge are likely to disagree in their interpretations
• Ambiguity: many discourse categories are inherently imprecise
Technological Challenges • Discourse annotations are extremely heterogeneous • relational (coreference, discourse relations) • hierarchical (discourse structure) • different specialized annotation tools • General architecture for • corpus and NLP interoperability • multi-layer corpora (Chiarcos et al. 2008)
ANNIS Data Base Generic database and query interface for multi-layer corpora (Chiarcos et al. 2008, Zeldes et al. 2009) http://www.sfb632.uni-potsdam.de/d1/annis
Working with Discourse-Annotated Corpora • Corpus studies and machine-learning experiments @ Potsdam: • information structure vs. morphosyntax (Dipper et al. 2007, Ritz 2011) • coreference vs. discourse structure (Chiarcos & Krasavina 2008) • coreference vs. information structure vs. syntax (Chiarcos 2010, 2011)
Data Challenges
• Data sparsity: high annotation costs and limited reliability restrict the amount and coverage of existing annotations
• Limited agreement: if your classifier performs better than the annotators, agreement metrics are uninterpretable
• Limited data overlap: dependencies between discourse phenomena can only be studied if the same primary data is used
More and better data may be available if information could be preprocessed to a larger extent
A Data-based Approach
• Idea: employ corpora without discourse annotation
(a) to evaluate models and theories of discourse (e.g., Chiarcos 2011), or
(b) to extrapolate background information that may be applied in theory-based approaches or to support manual annotation (Chiarcos, accepted)
Inferring Discourse Relations in SDRT
if segment β can be attached to segment α, and the event described in α is a pushing event with arguments x and y, and the event described in β is a falling event of argument y, then, normally, the discourse relation between α and β is Result:
(⟨τ, α, β⟩ ∧ [Push(eα,x,y)]Kα ∧ [Fall(eβ,y)]Kβ) > Result(α, β)
Inferring Discourse Relations in SDRT • rules tailored towards specific event types • not provided by any lexical-semantic resource I am aware of • hard to construct manually • distributional hypothesis: cues for the "normal" discourse relation for a pair of events should occur more frequently than others • so, let's just count them …
Data Structures • event pair <event1, event2> • triple <event1, relationword, event2> • event1: event type of the external argument • event2: event type of the internal argument • relationword: 0 or a discourse marker • e.g., <push, fall>, <push, then, fall>
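In code, these data structures are straightforward; a minimal sketch (names hypothetical):

```python
from collections import Counter, namedtuple

# <event1, relationword, event2>; "0" marks the absence of a discourse marker.
Triple = namedtuple("Triple", ["event1", "relationword", "event2"])

counts = Counter()
counts[Triple("push", "then", "fall")] += 1   # "... pushed ... Then ... fell."
counts[Triple("push", "0", "fall")] += 1      # adjacent sentences, no marker
```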
Events • heuristic: event = lemma of the main verb • auxiliaries, modal verbs, etc. are stripped • it would be interesting to develop more fine-grained event representations • heuristic: event1 = event of the preceding sentence • the external argument is more likely to be the main event of the preceding utterance than anything else • more remote antecedent candidates are subject to structural constraints
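A sketch of the event heuristic over a dependency-parsed sentence; the token format and label names ("root", "aux", V* tags) are assumptions modeled on Stanford-style dependencies, not the talk's actual implementation:

```python
# Token format and label names ("root", "aux", V* tags) are assumptions.
AUX_AND_MODALS = {"be", "have", "do", "will", "would", "can", "could",
                  "may", "might", "shall", "should", "must"}

def main_event(tokens):
    """tokens: dicts with 'lemma', 'pos', 'deprel'.
    Return the lemma of the main verb, stripping auxiliaries and modals."""
    for tok in tokens:
        if (tok["deprel"] == "root" and tok["pos"].startswith("V")
                and tok["lemma"] not in AUX_AND_MODALS):
            return tok["lemma"]
    for tok in tokens:  # fallback: first non-auxiliary verb
        if tok["pos"].startswith("V") and tok["deprel"] != "aux":
            return tok["lemma"]
    return None

sentence = [{"lemma": "Max", "pos": "NP", "deprel": "nsubj"},
            {"lemma": "have", "pos": "VHD", "deprel": "aux"},
            {"lemma": "push", "pos": "VVN", "deprel": "root"},
            {"lemma": "John", "pos": "NP", "deprel": "dobj"}]
print(main_event(sentence))   # push
```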
Relation Words • adverbs, conjunctions, phrases, relative clauses, etc. • purely syntactic definition • to avoid a preemptive restriction to a limited set of relation words • the relation word is the string representation of a sentence-initial adverbial argument of the main event in the new segment, a sentence-initial conjunction, or (if neither is found) 0
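A corresponding sketch of the relation-word heuristic; field and label names are again assumptions, mirroring the sketch above:

```python
# Field and label names are assumptions, mirroring the event sketch above.
def relation_word(tokens, root_index):
    """Return a sentence-initial adverbial dependent of the main verb,
    a sentence-initial conjunction, or "0" if neither is found."""
    first = tokens[0]
    if first["deprel"] in {"advmod", "prep"} and first["head"] == root_index:
        return first["form"].lower()   # e.g. "then", "so", "however"
    if first["pos"].startswith(("CC", "IN")):
        return first["form"].lower()   # e.g. "but", "because"
    return "0"

sentence = [{"form": "Then", "pos": "RB", "deprel": "advmod", "head": 2},
            {"form": "he", "pos": "PP", "deprel": "nsubj", "head": 2},
            {"form": "fell", "pos": "VVD", "deprel": "root", "head": 0}]
print(relation_word(sentence, root_index=2))   # then
```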
Weighing the Evidence • Noisy data • external argument heuristically determined • Coarse-grained approximation of events • the relevant level of detail of the event description may not be covered => rigid, theoretically well-founded pruning • significance tests • χ² where applicable, t-test otherwise
Significance Tests • Given a relation word R and an event pair <x,y> • How probable is it that the relative frequency of R under the condition <x,y> deviates only by chance from the unconditioned relative frequency of R? • If this probability is above 5%, remove the triple. • Remaining triples are highly significant (p < .05).
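A sketch of this pruning step using a χ² contingency test (scipy); the counts are toy values, and only the χ² path is shown, with the t-test fallback for small expected counts left out:

```python
from scipy.stats import chi2_contingency

def keep_triple(n_r_pair, n_pair, n_r, n_total, alpha=0.05):
    """Keep <event1, R, event2> only if the frequency of relation word R
    under the event pair deviates significantly from its overall frequency.
    n_r_pair: R with the pair; n_pair: all instances of the pair;
    n_r: R overall; n_total: all instances."""
    table = [[n_r_pair, n_pair - n_r_pair],
             [n_r - n_r_pair, (n_total - n_pair) - (n_r - n_r_pair)]]
    chi2, p, dof, expected = chi2_contingency(table)
    return p < alpha

# Toy counts: "then" occurs with <push, fall> in 30 of 100 instances,
# but only in 1,000 of 100,000 instances overall.
print(keep_triple(n_r_pair=30, n_pair=100, n_r=1000, n_total=100_000))  # True
```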
Correlation • Given a relation word R and an event pair <x,y> • Assume that the distribution of R for <x,y> differs significantly from the distribution of R in general. • P(R|<x,y>) > P(R): positive correlation • P(R|<x,y>) < P(R): negative correlation
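The direction of the correlation falls out of the same counts; a minimal sketch, reusing the toy numbers from the significance example:

```python
def correlation_sign(n_r_pair, n_pair, n_r, n_total):
    """+1 if R is over-represented with <x,y>, -1 if under-represented."""
    return 1 if n_r_pair / n_pair > n_r / n_total else -1

print(correlation_sign(30, 100, 1000, 100_000))   # +1: positive correlation
```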
Data • Huge corpora needed • adjacent sentences only • with some 1,000 frequent verbs in a language, there are about 10^6 possible event pairs, so each pair has a probability of roughly 1:10^6 • relation words are optional and manifold; several instantiations are needed to establish significance => several million sentences needed • Syntax-defined relation words => syntax-annotated corpora
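A back-of-envelope check of this estimate (the number of instantiations per triple is an assumption standing in for "several"):

```python
# Back-of-envelope check of the data requirements stated above.
verbs = 1000
event_pairs = verbs ** 2              # ~10**6 possible <event1, event2> pairs
instances_per_triple = 5              # "several instantiations" (assumption)
print(f"{event_pairs * instances_per_triple:,} sentence pairs needed")  # 5,000,000
```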
Wacky corpora (Baroni et al. 2009) • PukWaC • 2-billion-token dump of the .uk domain • tagged and lemmatized with TreeTagger (Schmid 1996) • parsed with MaltParser (Nivre et al. 2006) • Stanford dependencies • Wackypedia • English Wikipedia (2009), 0.8 billion tokens • same annotations • Consider 80% of both corpora • PukWaC: 72.5M sentences • Wackypedia: 33.2M sentences
Evaluation • Goal • Test whether potentially usable results can be obtained with this methodology • despite the simplifications • Evaluation of the methodology, as preparation for subsequent experiments
Evaluation Criteria • Significance • Are there significant correlations between event pairs and relation words? • Reproducibility • Can these correlations be confirmed on independent data sets? • Interpretability • Can these correlations be interpreted in terms of theoretically motivated discourse relations?
Significance • Significance test incorporated in the pruning step of the algorithm
Reproducibility • consider PukWaC subcorpora of different sizes • identify common triples that are also found in Wackypedia • agreeing portion of common triples: triples with the same (positive or negative) correlation in both corpora
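A sketch of this comparison, given each corpus's pruned triples mapped to their correlation sign (toy data, invented for illustration):

```python
def agreement(signs_a, signs_b):
    """signs_*: dict mapping a triple to its correlation sign (+1 / -1).
    Return (number of common triples, agreeing portion)."""
    common = set(signs_a) & set(signs_b)
    if not common:
        return 0, 0.0
    agreeing = sum(1 for t in common if signs_a[t] == signs_b[t])
    return len(common), agreeing / len(common)

signs_pukwac = {("push", "so", "fall"): 1, ("push", "but", "fall"): -1}
signs_wackypedia = {("push", "so", "fall"): 1, ("fall", "then", "stand"): 1}
print(agreement(signs_pukwac, signs_wackypedia))   # (1, 1.0)
```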
Interpretability • Theory- and annotation-independent test • relation words with a similar function should be distributionally similar • unrelated relation words should be distributionally less similar • Expectation: but is more like however [contrastive], but very different from then [temporal/causal]
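One way to operationalize this test: represent each relation word by its correlation profile over event pairs and compare profiles with cosine similarity (toy values, invented for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors (dicts keyed by event pair)."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

# Correlation profiles over event pairs (toy values).
but     = {("push", "apologize"): 1, ("push", "fall"): -1}
however = {("push", "apologize"): 1, ("push", "fall"): -1}
then    = {("push", "fall"): 1, ("arrive", "leave"): 1}

print(cosine(but, however) > cosine(but, then))   # expected: True
```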