
Discourse Structure in Generation

This presentation examines how discourse structure is identified and generated, with a focus on text-to-speech (TTS) systems. It covers the textual and spoken cues that signal discourse structure to Hearers and asks whether that structure can be identified automatically from speech. It also reviews Grosz and Sidner's 1986 theory of discourse structure and its three components (linguistic, intentional, and attentional structure), along with the theory's limitations.



  1. Discourse Structure in Generation Julia Hirschberg CS 4706

  2. Today • Models of Discourse Structure • Do we have them? • Grosz & Sidner ’86 • What identifies discourse structure to Hearers? • Textual cues • Spoken cues • How can we produce appropriate discourse structure in TTS systems? • Can we identify discourse structure automatically, from speech?

  3. Is there structure in this discourse? A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute. Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

  4. Is this a reasonable structure? A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute. Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

  5. This? A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute. Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

  6. This? A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute. Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

  7. What information do we use in segmenting a discourse? • ‘Topic’ coherence? • Repeated reference? • ‘Cue’ phrases? • ????

  8. Structures of Discourse Structure (Grosz & Sidner ’86) • A leading theory of discourse structure • Based upon Speaker intentions and Speaker and Hearer attentional state • Identifies a few, general relations that hold among Speaker intentions • Identifies a model of attentional state • Three components: • Linguistic structure • Intentional structure • Attentional structure

  9. Linguistic Structure • What is actually said or written • How is the linguistic structure represented? • Assume discourse is segmented into Discourse Segments (DS) • What is the basic unit of analysis? • Do we all segment alike? • Do we all use the same cues?

  10. Linguistic Structure of Discourse D S1: A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute. S2: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

  11. Intentional Structure • Discourse purpose (DP): basic purpose of the Speaker in producing the discourse • Discourse segment purposes (DSPs): the Speaker’s purpose in producing the segment • Segments are related to one another by their purposes: • Satisfaction-precedence: DSP1 must be satisfied before DSP2 • Dominance: DSP1 dominates DSP2 if fulfilling DSP2 constitutes part of fulfilling DSP1

  12. Linguistic Structure of Discourse D DSP1: Describe murder of dove by duck. S1: A beautiful mallard spotted the dove I was feeding. The duck dove supply is small this year. That dove was history in a minute. DSP2: Describe meeting of old friend. S2: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. To my surprise, I ran into a friend from back home. When I told her of my recent experience she questioned my sanity.

  13. DSP2: Describe recovery process. S2: • DSP3: Describe snack. S3: Well, to recover from this horrible scene, I went to the park snack bar for a cup of cocoa. • DSP4: Describe meeting old friend. S4: To my surprise, I ran into a friend from back home. • DSP5: Describe friend’s reaction. S5: When I told her of my recent experience she questioned my sanity.
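The segment hierarchy on slides 11 to 13 can be sketched as a small tree, offered here only as an illustration. The `Segment` class and both helper functions are hypothetical names, not part of Grosz & Sidner's formulation. The sketch assumes that dominance corresponds to nesting in the tree and that siblings are stored in the order their purposes must be satisfied, which the slides do not state. The root purpose is a placeholder because the slides give no overall discourse purpose.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: two segments are equal only if they are the same object
class Segment:
    """A discourse segment (DS) together with its purpose (DSP)."""
    name: str
    purpose: str
    children: list = field(default_factory=list)

    def add(self, child: "Segment") -> "Segment":
        """Nest a child segment under this one and return the child."""
        self.children.append(child)
        return child


def dominates(a: Segment, b: Segment) -> bool:
    """Assumption: a's DSP dominates b's if b is nested anywhere under a."""
    return any(c is b or dominates(c, b) for c in a.children)


def satisfaction_precedes(parent: Segment, a: Segment, b: Segment) -> bool:
    """Assumption: siblings are stored in the order their DSPs must be satisfied."""
    kids = parent.children
    return a in kids and b in kids and kids.index(a) < kids.index(b)


# The dove/duck discourse as segmented on slide 13.
d = Segment("D", "(overall discourse purpose, not given on the slides)")
s1 = d.add(Segment("S1", "Describe murder of dove by duck"))
s2 = d.add(Segment("S2", "Describe recovery process"))
s3 = s2.add(Segment("S3", "Describe snack"))
s4 = s2.add(Segment("S4", "Describe meeting old friend"))
s5 = s2.add(Segment("S5", "Describe friend's reaction"))

print(dominates(s2, s4))                   # is S4 nested under S2?
print(satisfaction_precedes(s2, s3, s5))   # does S3 come before S5 under S2?
```

Representing dominance as nesting keeps the check to a simple recursive walk. A fuller model would store the satisfaction-precedence relation explicitly instead of inferring it from sibling order.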

  14. Attentional State: The Focus Stack • Stack of focus spaces, each containing the objects, properties and relations salient during a DS, plus its DSP • State changes: transition rules control the addition and deletion of focus spaces • Information at lower levels may or may not be available at higher levels • Focus spaces are pushed onto the stack when: • A new DS is begun

  15. Attentional State: The Focus Stack (continued) • An embedded DS (e.g. a DS dominated by another DS) is begun • Focus spaces are popped when their DS is completed • The state of the focus stack models felicitous reference and coherence in discourse • Example focus spaces: • S2: DSP2, scene, Speaker, snack_bar, cocoa, friend, home, sanity • S1: DSP1, duck, dove, Speaker, duck_dove_supply
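The push and pop behaviour described on slides 14 and 15 can be sketched as a plain stack, using the entity lists from slide 15. `FocusStack` and its method names are hypothetical. The `resolve` method searches every space from the top down, which is a simplification: the slides note that information at lower levels may or may not be available at higher levels. Treating S1 as completed before S2 begins is also an assumption.

```python
class FocusStack:
    """A stack of focus spaces; each space holds the entities salient in one DS."""

    def __init__(self):
        self._spaces = []  # list of (segment_name, set_of_entities), top is last

    def push(self, segment, entities):
        """Called when a new or embedded discourse segment begins."""
        self._spaces.append((segment, set(entities)))

    def pop(self):
        """Called when the segment on top of the stack is completed."""
        return self._spaces.pop()

    def resolve(self, entity):
        """Return the topmost segment in which `entity` is salient, else None.

        Simplification: every lower space is treated as accessible.
        """
        for segment, entities in reversed(self._spaces):
            if entity in entities:
                return segment
        return None


stack = FocusStack()
stack.push("S1", {"duck", "dove", "Speaker", "duck_dove_supply"})
dove_during_s1 = stack.resolve("dove")

stack.pop()  # assumption: S1 is completed before S2 begins
stack.push("S2", {"scene", "Speaker", "snack_bar", "cocoa", "friend", "home", "sanity"})
dove_during_s2 = stack.resolve("dove")
friend_during_s2 = stack.resolve("friend")

print(dove_during_s1, dove_during_s2, friend_during_s2)
```

Once the S1 space is popped, "dove" no longer resolves to any focus space. This is how the stack is meant to model which references are felicitous at a given point in the discourse.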

  16. Limits of the Theory • Assumes discourses are task-oriented • Assumes a single, hierarchical structure shared by S and H • Questions: • Do people really build such structures when they converse? • Use them in interpreting what others say? • How could they do it?

  17. How might people recognize discourse structure? • Linguistic markers? • tense and aspect • cue phrases • Inference of Speaker intentions? • Inference from task structure? • Intonational Information?

  18. Acoustic and Prosodic Cues to Discourse Structure • Intuition: • Speakers vary acoustic and prosodic cues to convey variation in discourse structure • Systematic? In read or spontaneous speech? • Evidence: • Observations from recorded corpora • Laboratory experiments • Machine learning of discourse structure from acoustic/prosodic features

  19. Prosodic Correlates of Discourse/Topic Structure • Pitch range: Lehiste ’75, Brown et al ’83, Silverman ’86, Avesani & Vayra ’88, Ayers ’92, Swerts et al ’92, Grosz & Hirschberg ’92, Swerts & Ostendorf ’95, Hirschberg & Nakatani ’96 • Preceding pause: Lehiste ’79, Chafe ’80, Brown et al ’83, Silverman ’86, Woodbury ’87, Avesani & Vayra ’88, Grosz & Hirschberg ’92, Passonneau & Litman ’93, Hirschberg & Nakatani ’96

  20. Prosodic Correlates of Discourse/Topic Structure (continued) • Rate: Butterworth ’75, Lehiste ’80, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Amplitude: Brown et al ’83, Grosz & Hirschberg ’92, Hirschberg & Nakatani ’96 • Contour: Brown et al ’83, Woodbury ’87, Swerts et al ’92

  21. Issues • Do we find significant and reliable cues to discourse structure in prosodic variation: • When tested against an independent theory of discourse structure? • In spontaneous as well as read speech? • Are Hearers’ interpretations of discourse structure influenced by intonational variation?

  22. Grosz & Hirschberg ’92 • Small corpus of read AP newswire • Read by professional speaker • Labeled for discourse structure from text alone or from text and speech • Pre-ToBI labeled • Acoustic-prosodic features extracted for each intermediate (level 3) phrase: • Pitch range and change from prior phrase • Intensity (RMS) and change in dB from prior phrase • Preceding and subsequent pause • Speaking rate
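The per-phrase features listed on slide 22 can be sketched as follows. This is not the extraction procedure used in the study, and every definition here is an assumption: pitch range is taken as the peak F0 in the phrase, intensity as RMS in dB relative to a full-scale amplitude of 1.0, and rate as syllables per second. The input dictionary keys and the two toy phrases are hypothetical.

```python
import math


def rms_db(samples):
    """RMS intensity in dB relative to full scale (1.0); floor avoids log(0)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12))


def phrase_features(phrases):
    """Compute features for each phrase, including change from the prior phrase.

    Each phrase is a dict with hypothetical keys: 'f0' (Hz values),
    'samples' (amplitudes in [-1, 1]), 'start' and 'end' (seconds),
    and 'n_syllables'.
    """
    feats = []
    for i, p in enumerate(phrases):
        prev = phrases[i - 1] if i > 0 else None
        nxt = phrases[i + 1] if i + 1 < len(phrases) else None
        f = {
            "pitch_range": max(p["f0"]),  # assumption: peak F0 stands in for range
            "intensity_db": rms_db(p["samples"]),
            "preceding_pause": p["start"] - prev["end"] if prev is not None else None,
            "subsequent_pause": nxt["start"] - p["end"] if nxt is not None else None,
            "rate": p["n_syllables"] / (p["end"] - p["start"]),
        }
        # Changes are measured against the immediately preceding phrase.
        last = feats[-1] if feats else None
        f["range_change"] = f["pitch_range"] - last["pitch_range"] if last else None
        f["db_change"] = f["intensity_db"] - last["intensity_db"] if last else None
        feats.append(f)
    return feats


# Two made-up phrases, for illustration only.
toy_phrases = [
    {"f0": [180.0, 220.0, 200.0], "samples": [0.5, -0.5, 0.5, -0.5],
     "start": 0.0, "end": 1.0, "n_syllables": 4},
    {"f0": [150.0, 170.0, 160.0], "samples": [0.25, -0.25, 0.25, -0.25],
     "start": 1.5, "end": 2.5, "n_syllables": 5},
]
features = phrase_features(toy_phrases)
for row in features:
    print(row)
```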

  23. Analysis of phrases in different segment positions: SBEG (segment-beginning), SF (segment-final), parentheticals, quoted speech • ANOVAs and t-tests on means • Results: • Direct quotes: larger pitch range • Parentheticals: smaller range, negative change from prior phrase, negative change in dB, faster rate • SBEG: larger range, louder, greater preceding pause, less subsequent pause • SF: greater subsequent pause

  24. Machine learning experiments identified: • SBEG with 91.5% estimated accuracy (cross-validation) • SF, 92.5% • Attributive tags, 96.9% • Direct quotations, 86.4% • Indirect quotations, 88.5% • Parentheticals, 89.2% • Conclusion: Acoustic/prosodic information is available to permit Hearers to identify discourse structure…
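The idea of estimating accuracy by cross-validation, as reported on slide 24, can be sketched in a few lines. The slides do not say which learner was used, so a one-feature threshold rule stands in for it here. The function names are hypothetical, and the pause durations and labels are made-up toy values, not data from the study.

```python
def fit_stump(xs, ys):
    """Pick the threshold t maximizing training accuracy of the rule `x >= t`."""
    best_t, best_correct = None, -1
    for t in sorted(set(xs)):
        correct = sum((x >= t) == y for x, y in zip(xs, ys))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def cross_val_accuracy(xs, ys, k=5):
    """Estimate accuracy with k-fold cross-validation.

    Each item is held out exactly once; the rule is fit on the other folds.
    """
    n = len(xs)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th item goes into this fold
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        t = fit_stump(train_x, train_y)
        correct += sum((xs[i] >= t) == ys[i] for i in test_idx)
    return correct / n


# Toy data: preceding pause in seconds and whether the phrase is labeled SBEG.
pauses = [0.9, 0.1, 0.8, 0.2, 1.1, 0.15, 0.7, 0.05, 1.0, 0.3]
is_sbeg = [True, False, True, False, True, False, True, False, True, False]

accuracy = cross_val_accuracy(pauses, is_sbeg, k=5)
print(f"estimated accuracy: {accuracy:.2f}")
```

Because the threshold is always fit without the held-out items, the estimate reflects how the rule behaves on phrases it has not seen. Accuracy measured on the training data itself would not show that.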

  25. Next • The midterm • Closed book, no notes or electronic devices • Will include material through today
