This workshop will cover various topics related to discourse processing, including coherence, discourse theory, RST, discourse annotation and corpus, and more. Participants will gain a deeper understanding of what makes something a discourse and how to structure text for coherence.
Workshop on NLP - IIITH July 2014
Discourse Processing
Ranjani Parthasarathi, Professor, Dept. of Info. Science & Technology, Anna University, CEG campus
Contents
• Discourse – An intro
• Coherence
• Discourse theory
• RST
• Discourse annotation & corpus
• Sangati
Discourse
• Consists of collocated, structured, coherent groups of sentences
• What makes something a discourse as opposed to a set of unrelated sentences?
• How can text be structured (related)?
• Coherence – the central theme – models the logical flow of the discourse
Intrinsic features of discourse
Position, order, adjacency and context
• Position: opening sentence, ending sentence.
• Order: different orders lead to different events or meanings
  • I said the magic words, and a genie appeared. vs.
  • A genie appeared, and I said the magic words.
• Adjacency: attributed material and contrasts are visible through nearby sentences
• Context: intended meaning can only be conveyed when understood in context.
Coherence
• Coherence as the main characteristic of discourse
• How do you recognize discourse?
  • It makes sense!
  • It is relevant!
  • It ‘hangs together’
  • It is coherent!
Coherence
• Coherence is a property humans use to evaluate text quality.
• A coherent discourse must have meaningful connections (i.e. coherence relations) between its utterances.
Example: Ram got caught in the rain. He fell ill. (Relation: Result)
Coherence
[Diagram: Discourse Coherence divides into Reference Relations and Discourse Relations; Discourse Relations divide into Informational and Intentional]
• The meaning and coherence of a discourse result partly from how its constituents relate to each other.
• Reference relations – coreference or anaphora resolution – determining which entity a referring expression refers to
• Discourse relations
Discourse relations
• Informational (or semantic) discourse relations convey relations that hold in the subject matter, between abstract entities of appropriate sorts (e.g., facts, beliefs, eventualities), e.g., CONTRAST, CAUSE, CONDITIONAL, TEMPORAL, etc.
• Intentional discourse relations specify how intended discourse effects relate to each other.
• Informational perspective – to find within an utterance why it would be true, or to understand the situation from the individual components of the utterance
• Intentional perspective – to find within an utterance why it was said.
• E.g. Where are the oranges with reduced price? How many kilos did you want?
[Moore & Pollack, 1992] argue that discourse analysis requires both types.
Where are Discourse Relations declared?
Two types of triggers for discourse relations considered by researchers:
Structure
• Discourse relations hold primarily between adjacent components with respect to some notion of structure.
Lexical Elements and Structure
• Lexical elements can relate the Abstract Object interpretations of non-adjacent as well as adjacent components.
• Discourse relations can be triggered by structure underlying adjacency, i.e., between adjacent components unrelated by lexical elements.
Triggering Discourse Relations
Lexical Elements
• Cohesion in Discourse (Halliday & Hasan)
Structure
• Rhetorical Structure Theory (Mann & Thompson)
• Linguistic Discourse Model (Polanyi and colleagues)
• Discourse GraphBank (Wolf & Gibson)
Lexical Elements and Structure
• Discourse Lexicalized TAG (Webber, Joshi, Stone, Knott)
Different triggers encourage different annotation schemes.
Discourse Theories
• Rhetorical Structure Theory (Mann and Thompson, 1988)
  • Finds coherence between clauses, sentences and paragraphs within a document.
• Discourse Representation Theory (Kamp 1981)
  • Focusses on reference resolution
• Segmented Discourse Representation Theory (Asher 1993)
  • RST + DRT
• Cross Document Structure Theory (Radev 2000)
  • Finds coherence relations between documents
Basic research questions
• What is the nature of discourse relations?
  • Conceptual relations between abstract objects (RST)
  • Lexically grounded relations? (PDTB)
• What is the inventory of discourse relations?
• What is the appropriate data structure for discourse relations?
  • Trees (RST)
  • Graphs
  • Dependencies (+ structure PDTB)
Rhetorical Structure Theory
• A theory of coherence relations, in which a coherence relation is referred to as a rhetorical relation.
• Originally proposed for the study of text generation (Mann and Thompson, 1988); a more recent overview is available in Taboada and Mann (2006)
Principles
• Coherent texts consist of minimal units, which are linked to each other, recursively, through rhetorical relations
• Coherent texts do not show gaps or non-sequiturs
• Therefore, there must be some relation holding among the different parts of the text
Components
• Units of discourse
  • Texts can be segmented into minimal units, or spans
• Nuclearity
  • Some spans are more central to the text’s purpose (nuclei), whereas others are secondary (satellites)
  • Based on hypotactic and paratactic relations in language
• Relations among spans
  • Spans are joined into discourse relations
• Hierarchy/recursion
  • Spans that are in a discourse relation may enter into new relations
Components ...
• Nucleus vs. satellite
  • Nucleus: more central to the text’s purpose (more salient to the discourse structure) and interpretable independently
  • Satellite: less central; represents supporting information and is therefore generally interpretable only with respect to the nucleus
An Example
• John won the contest but he does not sing well.
[Diagram: an Antithesis relation between the spans "John won the contest" and "he does not sing well", one marked Nucleus and the other Satellite]
Relations
• They hold between two non-overlapping text spans
• Most of the relations hold between a nucleus and a satellite, although there are also multi-nuclear relations
• A relation consists of:
  1. Constraints on the Nucleus
  2. Constraints on the Satellite
  3. Constraints on the combination of Nucleus and Satellite
  4. The Effect
Example: Evidence
• Constraints on the Nucleus
  • The reader may not believe N to a degree satisfactory to the writer
• Constraints on the Satellite
  • The reader believes S or will find it credible
• Constraints on the combination of N+S
  • The reader’s comprehending S increases their belief of N
• Effect (the intention of the writer)
  • The reader’s belief of N is increased
• Definitions of most common relations are available from the RST web site (www.sfu.ca/rst)
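The four fields of a relation definition map naturally onto a small record. The following is a minimal Python sketch, not part of the original slides: the class name `RelationDefinition`, its field names, and the helper `is_complete` are hypothetical and chosen only for illustration. The field values are the Evidence definition given above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RelationDefinition:
    """One RST relation definition with its four fields (hypothetical layout)."""
    name: str
    nucleus_constraint: str
    satellite_constraint: str
    combination_constraint: str
    effect: str

    def is_complete(self) -> bool:
        # A definition is usable only when every field is filled in.
        return all(getattr(self, f.name).strip() for f in fields(self))


# The Evidence relation, using the wording from the slide.
EVIDENCE = RelationDefinition(
    name="Evidence",
    nucleus_constraint="The reader may not believe N to a degree satisfactory to the writer",
    satellite_constraint="The reader believes S or will find it credible",
    combination_constraint="The reader's comprehending S increases their belief of N",
    effect="The reader's belief of N is increased",
)

print(EVIDENCE.name, "complete:", EVIDENCE.is_complete())
```

Keeping the Effect as an explicit field reflects the point that a relation is only marked when all four fields, and the Effect in particular, are satisfied.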
Paratactic (coordinate)
• At the sub-sentential level (traditional coordinated clauses)
  • Peel oranges, and slice crosswise.
• But also across sentences
  • 1. Peel oranges, 2. and slice crosswise. 3. Arrange in a bowl 4. and sprinkle with rum and coconut. 5. Chill until ready to serve.
Hypotactic (subordinate)
• Sub-sentential Concession relation
• Concession across sentences
  • Nucleus (spans 2-3) made up of two spans in an Antithesis relation
Relation types
• Relations are of different types
• Subject matter: they relate the content of the text spans and explain the parts of the subject or the core theme of the text.
  • Elaboration, Evaluation, Interpretation, Means, Cause, Result, Otherwise, Purpose, Solutionhood, Condition, Unconditional and Unless
• Presentational: more rhetorical in nature. They are meant to achieve some effect on the reader and facilitate the presentation aspects with which the author writes the text.
  • Antithesis, Background, Concession, Enablement, Evidence, Justify, Motivation, Preparation, Restatement and Summary
• Multi-nuclear: most relations hold between a nucleus and a satellite, but there are also multi-nuclear relations
  • Conjunction, Disjunction, Contrast, Joint, List, MultiNuclear Restatement and Sequence
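The three-way grouping above can be kept as a simple lookup table. This is a minimal Python sketch under the assumption that relation names are matched exactly as spelled on the slide; the names `RELATION_TYPES` and `relation_type` are hypothetical.

```python
from typing import Dict, List, Optional

# Relation names grouped by type, copied from the slide above.
RELATION_TYPES: Dict[str, List[str]] = {
    "subject matter": [
        "Elaboration", "Evaluation", "Interpretation", "Means", "Cause",
        "Result", "Otherwise", "Purpose", "Solutionhood", "Condition",
        "Unconditional", "Unless",
    ],
    "presentational": [
        "Antithesis", "Background", "Concession", "Enablement", "Evidence",
        "Justify", "Motivation", "Preparation", "Restatement", "Summary",
    ],
    "multi-nuclear": [
        "Conjunction", "Disjunction", "Contrast", "Joint", "List",
        "MultiNuclear Restatement", "Sequence",
    ],
}


def relation_type(name: str) -> Optional[str]:
    """Return the type of a relation, or None if the name is not listed."""
    for kind, names in RELATION_TYPES.items():
        if name in names:
            return kind
    return None


print(relation_type("Evidence"))
print(relation_type("Sequence"))
```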
Example for Subject Matter Relation
• If you come to my school tomorrow, you can meet my teacher.
Relation: Condition
Satellite: If you come to my school tomorrow
Nucleus: you can meet my teacher
Example for Presentational Relation
• Reading is always interesting but I don’t like reading very short stories.
[Diagram: an Antithesis relation between the spans "Reading is always interesting" and "but I don’t like reading very short stories", one marked Satellite and the other Nucleus]
Example for Multi Nuclear Relation
• John is a good athlete and a good human too.
Relation: Joint
Nucleus: John is a good athlete
Nucleus: a good human too
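The Condition and Joint examples can be written down as small trees of spans and relations. The sketch below is only an illustration in Python: the class names `Span` and `Relation` and the helper `nucleus_text` are hypothetical and do not come from the slides or from any RST toolkit.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Span:
    """A minimal unit of text."""
    text: str


@dataclass
class Relation:
    """A relation node; its children may be spans or further relations."""
    name: str
    nuclei: List[Union["Span", "Relation"]]
    satellites: List[Union["Span", "Relation"]] = field(default_factory=list)

    def is_multinuclear(self) -> bool:
        # Multi-nuclear relations have several nuclei and no satellite.
        return len(self.nuclei) > 1 and not self.satellites


def nucleus_text(node: Union[Span, Relation]) -> str:
    """Follow only the nuclei down the tree and join their text."""
    if isinstance(node, Span):
        return node.text
    return " ".join(nucleus_text(child) for child in node.nuclei)


condition = Relation(
    name="Condition",
    nuclei=[Span("you can meet my teacher")],
    satellites=[Span("If you come to my school tomorrow")],
)
joint = Relation(
    name="Joint",
    nuclei=[Span("John is a good athlete"), Span("and a good human too")],
)

print(nucleus_text(condition))
print(joint.is_multinuclear())
```

Because a `Relation` can hold other `Relation` nodes as children, the same structure covers the hierarchy/recursion component: a span that is already in a relation can enter into a new one.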
Relation names (in M&T 1988)
Other classifications are possible, and longer and shorter lists have been proposed
List of Rhetorical Relations (RST website)
• Circumstance • Condition • Elaboration • Evaluation • Interpretation • Means • Non-volitional / volitional Cause • Non-volitional / volitional Result • Otherwise • Purpose • Solutionhood • Unconditional • Unless
• Antithesis • Background • Concession • Enablement • Evidence • Justify • Motivation • Preparation • Restatement • Summary
• Conjunction • Disjunction • Contrast • Joint • List • Sequence • Restatement
Other possible classifications
• Relations that hold outside the text
  • Condition, Cause, Result
  vs. those that are only internal to the text
  • Summary, Elaboration
• Relations frequently marked by a discourse marker
  • Concession (although, however); Condition (if, in case)
  vs. relations that are rarely, or never, marked
  • Background, Restatement, Interpretation
• Preferred order of spans: nucleus before satellite
  • Elaboration – usually first the nucleus (material being elaborated on) and then the satellite (extra information)
  vs. satellite before nucleus
  • Concession – usually the satellite (the although-type clause or span) before the nucleus
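For the relations that are frequently marked, a discourse marker gives a first hint of which relation may hold. The Python sketch below uses only the markers listed above; the names `MARKERS` and `candidate_relations` are hypothetical, and a marker match is a candidate only, since markers can be ambiguous and many relations are never marked.

```python
import re
from typing import Dict, List

# Markers taken from the slide above; a real inventory would be much larger.
MARKERS: Dict[str, List[str]] = {
    "Concession": ["although", "however"],
    "Condition": ["if", "in case"],
}


def candidate_relations(text: str) -> List[str]:
    """Return the relations whose markers occur as whole words in the text."""
    lowered = text.lower()
    found = []
    for relation, markers in MARKERS.items():
        # \b keeps "if" from matching inside words such as "life".
        if any(re.search(r"\b" + re.escape(m) + r"\b", lowered) for m in markers):
            found.append(relation)
    return sorted(found)


print(candidate_relations("If you come to my school tomorrow, you can meet my teacher."))
print(candidate_relations("Ram got caught in the rain. He fell ill."))
```

The second call finds no marker even though a Result relation holds between the two sentences, which is the unmarked case the slide describes.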
Schemas
They specify how spans of text can co-occur, determining possible RST text structures
Graphical representation
• A horizontal line covers a span of text (possibly made up of further spans)
• A vertical line signals the nucleus or nuclei
• A curve represents a relation, and the direction of the arrow shows the direction from satellite towards nucleus
RST Example
(1) George Bush supports big business.
(2) He’s sure to veto House Bill 1711.
(3) Otherwise, big business won’t support him.
How to do an RST analysis
1. Divide the text into units (Segmentation)
   Unit size may vary, depending on the goals of the analysis
   Clauses/Sentences – Elementary Discourse Unit (EDU)
   Paragraph/document – Complex Discourse Unit (CDU)
2. Examine each unit, and its neighbours. Is there a clear relation holding between them?
3. If yes, then mark that relation (e.g., Condition)
4. If not, the unit might be at the boundary of a higher-level relation. Look at relations holding between larger units (spans)
How to do an RST analysis
5. Continue until all the units in the text are accounted for
6. Remember, marking a relation involves satisfying all 4 fields (especially the Effect).
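The analysis procedure can be outlined as a loop that joins neighbouring spans until nothing more can be joined. This is a rough Python sketch, not an RST parser. Splitting on sentence-final punctuation is a simplifying assumption, and `segment`, `analyse` and `toy_judge` are hypothetical names. The judge function stands in for the analyst, who must check all four fields of a relation before marking it. Nuclearity is not recorded here.

```python
import re
from typing import Callable, List, Optional

# A node is either a unit (str) or a tuple (relation_name, left_node, right_node).


def segment(text: str) -> List[str]:
    """Step 1: naive segmentation into sentence-sized units (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def analyse(units: List[str], judge: Callable[[object, object], Optional[str]]) -> list:
    """Steps 2-5: join the first pair of neighbours the judge can relate, then repeat."""
    nodes: list = list(units)
    while len(nodes) > 1:
        for i in range(len(nodes) - 1):
            name = judge(nodes[i], nodes[i + 1])
            if name is not None:
                # Replace the two neighbours by one larger span.
                nodes[i:i + 2] = [(name, nodes[i], nodes[i + 1])]
                break
        else:
            break  # no clear relation between any neighbours; left to the analyst
    return nodes


def toy_judge(left: object, right: object) -> Optional[str]:
    """A stand-in for human judgement: recognises only an explicit 'Otherwise'."""
    if isinstance(right, str) and right.startswith("Otherwise"):
        return "Otherwise"
    return None


text = ("Students should submit their assignments by tomorrow. "
        "Otherwise, their internal marks will be reduced.")
print(analyse(segment(text), toy_judge))
```

Joining spans into larger nodes and re-examining the neighbours mirrors step 4: when two units have no clear relation, the relation may hold between larger spans built from them.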
Examples
Subject Matter Relations
Circumstance
(In these definitions, S = satellite, N = nucleus, R = reader, W = writer.)
Constraints on either S or N individually: on S: S is not unrealized
Constraints on N + S: S sets a framework in the subject matter within which R is intended to interpret N
Intention of W: R recognizes that S provides the framework for interpreting N
Example: While I was walking on the road, I saw an accident.
Nucleus: I saw an accident.
Satellite: While I was walking on the road
Subject Matter Relations
Condition
Constraints on either S or N individually: on S: S presents a hypothetical, future, or otherwise unrealized situation (relative to the situational context of S)
Constraints on N + S: Realization of N depends on realization of S
Intention of W: R recognizes how the realization of N depends on the realization of S
Example: 1. Employees are urged to complete new beneficiary designation forms for retirement or life insurance benefits 2. whenever there is a change in marital or family status.
Nucleus: Employees are urged to complete new beneficiary designation forms for retirement or life insurance benefits
Satellite: whenever there is a change in marital or family status.
Subject Matter Relations
Elaboration
• Constraints on either S or N individually:
  • None
• Constraints on N + S:
  • S presents additional detail about the situation or some element of subject matter which is presented in N or inferentially accessible in N in one or more of the ways listed below. In the list, if N presents the first member of any pair, then S includes the second:
    • set :: member
    • abstraction :: instance
    • whole :: part
    • process :: step
    • object :: attribute
    • generalization :: specific
Intention of W: R recognizes S as providing additional detail for N. R identifies the element of subject matter for which detail is provided.
Example: 1. Fruits are good for health. 2. I like apples, oranges and bananas.
Nucleus: Fruits are good for health.
Satellite: I like apples, oranges and bananas.
Subject Matter Relations
Evaluation
Constraints on either S or N individually: none
Constraints on N + S: S relates N to degree of W's positive regard toward N.
Intention of W: R recognizes that S assesses N and recognizes the value it assigns
Example: 1. The apartment has many facilities like covered car parking, rain water harvesting, water recycling plant. 2. It aids in increasing the value of the property.
Nucleus: The apartment has many facilities like covered car parking, rain water harvesting, water recycling plant.
Satellite: It aids in increasing the value of the property.
Subject Matter Relations
Interpretation
Constraints on either S or N individually: none
Constraints on N + S: S relates N to a framework of ideas not involved in N itself and not concerned with W's positive regard.
Intention of W: R recognizes that S relates N to a framework of ideas not involved in the knowledge presented in N itself
Example: Heavy rain has ruined the paddy fields. Such heavy rain is unusual during the month of May.
Nucleus: Heavy rain has ruined the paddy fields.
Satellite: Such heavy rain is unusual during the month of May.
Subject Matter Relations
Non-Volitional Cause
Constraints on either S or N individually: on N: N is not a volitional action.
Constraints on N + S: S, by means other than motivating a volitional action, caused N; without the presentation of S, R might not know the particular cause of the situation; a presentation of N is more central than S to W's purposes in putting forth the N-S combination.
Intention of W: R recognizes S as a cause of N
Example: My uncle is a chain smoker. That’s why he got lung cancer.
Nucleus: He got lung cancer.
Satellite: My uncle is a chain smoker.
Subject Matter Relations
Non-Volitional Result
Constraints on either S or N individually: on S: S is not a volitional action
Constraints on N + S: N caused S; presentation of N is more central to W's purposes in putting forth the N-S combination than is the presentation of S.
Intention of W: R recognizes that N could have caused the situation in S
Example: 1. The blast, the worst industrial accident in Mexico's history, destroyed the plant and most of the surrounding suburbs. 2. Several thousand people were injured, 3. and about 300 are still in hospital.
Nucleus: The blast, the worst industrial accident in Mexico's history, destroyed the plant and most of the surrounding suburbs.
Satellite: Several thousand people were injured, and about 300 are still in hospital.
Subject Matter Relations
Otherwise
Constraints on either S or N individually: on N: N is an unrealized situation; on S: S is an unrealized situation
Constraints on N + S: realization of N prevents realization of S
Intention of W: R recognizes the dependency relation of prevention between the realization of N and the realization of S
Example: 1. Students should submit their assignments by tomorrow. 2. Otherwise, their internal marks will be reduced.
Nucleus: Students should submit their assignments by tomorrow.
Satellite: Otherwise, their internal marks will be reduced.
Subject Matter Relations
Purpose
Constraints on either S or N individually: on N: N is an activity; on S: S is a situation that is unrealized
Constraints on N + S: S is to be realized through the activity in N
Intention of W: R recognizes that the activity in N is initiated in order to realize S.
Example: 1. I came out of home 2. to see if it is raining.
Nucleus: I came out of home
Satellite: to see if it is raining
Subject Matter Relations
Solutionhood
Constraints on either S or N individually: on S: S presents a problem
Constraints on N + S: N is a solution to the problem presented in S
Intention of W: R recognizes N as a solution to the problem presented in S.
Example: 1. It is always difficult to catch an auto in this city. 2. I need to learn driving.
Satellite (the problem): It is always difficult to catch an auto in this city.
Nucleus (the solution): I need to learn driving.