Lecture 22 Intonation and Discourse CS 4705
What does prosody convey? • In general, information about: • What the speaker is trying to convey • Is this a statement or a question? • The speaker state • Is the speaker getting angry, frustrated? • In dialogue, information about: • The structure of the dialogue • Is the user or the system trying to start a new topic? • Is the speaker talking about given or new information? • The state of the interaction: • Is the user having trouble being understood? • Is the user having trouble understanding the system?
Current Trends • New description schemes (e.g. ToBI) • Corpus-based research and machine learning • Emphasis on evaluation of algorithms and systems (NLE ‘00 special issue) • Investigation of spontaneous speech phenomena and variation in speaking style • Applications to concept-to-speech (CTS), automatic speech recognition (ASR) and spoken dialogue systems (SDS)
Corpora • Public and semi-public databases • ATIS, SwitchBoard, Call Home, Meetings (NIST/DARPA/LDC) • TRAINS/TRIPS (U. Rochester), FM Radio (BU), BDC (Harvard, AT&T) • Private collections • Acquired for speech or dialogue research (August, KTH; Voicemail, AT&T, IBM) • Meetings, call centers, operator services, focus group collections • The Web • Newscasts, radio
To(nes and)B(reak)I(ndices) • Developed by prosody researchers in four meetings over 1991-94 • Goals: • devise common labeling scheme for Standard American English that is robust and reliable • promote collection of large, prosodically labeled, shareable corpora • ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, ...
Minimal ToBI transcription: • recording of speech • f0 contour • ToBI tiers: • orthographic tier: words • break-index tier: degrees of junction (Price et al ‘89) • tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ‘80) • miscellaneous tier: disfluencies, non-speech sounds, etc.
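The tiers listed above can be pictured as a small data structure. The following is only a sketch under simplifying assumptions: real ToBI tiers are time-aligned to the recording, whereas here every label is attached to a word. The class names, the file name, and the tone labels chosen for the example utterance are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class WordLabel:
    word: str                                        # orthographic tier
    break_index: int                                 # break-index tier: juncture after this word
    tones: List[str] = field(default_factory=list)   # tonal tier labels attached to this word
    misc: Optional[str] = None                       # miscellaneous tier (disfluency, noise, ...)

    def __post_init__(self) -> None:
        # Break indices run from 0 to 4 in the ToBI scheme.
        if not 0 <= self.break_index <= 4:
            raise ValueError(f"break index out of range: {self.break_index}")


@dataclass
class ToBITranscription:
    audio_path: str        # recording of speech
    f0_hz: List[float]     # f0 contour samples (left empty in this sketch)
    words: List[WordLabel]

    def orthographic_tier(self) -> List[str]:
        return [w.word for w in self.words]

    def break_index_tier(self) -> List[int]:
        return [w.break_index for w in self.words]

    def tonal_tier(self) -> List[Tuple[str, str]]:
        # One (word, tone) pair per tonal label, in utterance order.
        return [(w.word, t) for w in self.words for t in w.tones]


# Illustrative labeling only; the tones are an assumed analysis of "Take the bus."
example = ToBITranscription(
    audio_path="take_the_bus.wav",  # hypothetical file name
    f0_hz=[],
    words=[
        WordLabel("Take", 1),
        WordLabel("the", 1),
        WordLabel("bus", 4, tones=["H*", "L-", "L%"]),
    ],
)
```

Keeping the four tiers on one record per word makes it easy to read any single tier back out, at the cost of losing the exact time alignment a real labeling tool would store.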
Online training material, available at: • http://www.ling.ohio-state.edu/phonetics/ToBI/ • Evaluation • Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. ‘92, Pitrelli et al. ‘94)
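One plausible way to arrive at agreement figures of this kind is pairwise agreement: for every labeled item, compare each pair of labelers and count the pairs that agree. The sketch below assumes that setup; the slide does not spell out the exact procedure used in the cited studies, and the labels in the example are made up for illustration.

```python
from itertools import combinations


def pairwise_agreement(labels_per_item, agree=lambda a, b: a == b):
    """Fraction of labeler pairs that agree, pooled over all items.

    labels_per_item: one list per item, holding one label per labeler.
    agree: predicate deciding whether two labels count as agreeing.
    """
    agreeing = total = 0
    for labels in labels_per_item:
        for a, b in combinations(labels, 2):
            total += 1
            agreeing += bool(agree(a, b))
    return agreeing / total if total else 0.0


# Made-up labels from three labelers; None means "no accent marked".
accents = [["H*", "H*", "H*"], ["H*", "L+H*", "H*"], [None, None, "H*"]]
breaks = [[4, 4, 3], [1, 1, 1], [3, 1, 2]]

# Presence/absence of a tonal category.
presence = pairwise_agreement(accents, lambda a, b: (a is None) == (b is None))
# Exact category label.
category = pairwise_agreement(accents)
# Break indices to within 1 level.
within_one = pairwise_agreement(breaks, lambda a, b: abs(a - b) <= 1)
```

Passing the comparison as a predicate lets the same function cover all three measures reported on the slide.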
Pitch Accent/Prominence in ToBI • Which items are made intonationally prominent and how? • Accent type: • H* simple high (declarative) • L* simple low (yes-no question) • L*+H scooped, late rise (uncertainty/incredulity) • L+H* early rise to stress (contrastive focus) • H+!H* fall onto stress (implied familiarity)
Downstepped accents: • !H*, L+!H*, L*+!H • Degree of prominence: • within a phrase: HiF0 • across phrases
Functions of Pitch Accent • Given/new information • S: Do you need a return ticket? • U: No, thanks, I don’t need a return. • Contrast (narrow focus) • U: No, thanks, I don’t need a RETURN…. (I need a time schedule, receipt,…) • Disambiguation of discourse markers • S: Now let me get you the train information. • U: Okay (thanks) vs. Okay….(but I really want…)
Predicting Accent: Is it accented or not? • Applications: TTS and CTS • Corpora: read and spontaneous speech • Features: part-of-speech (POS) window of 3, sentence position, position within NP, number of syllables, position in complex nominal, inferred given/new status, inferred focus, mutual information • Results: 75-85% correct, depending on genre
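A few of the features listed above can be sketched as a per-word feature extractor. This is an illustration, not the lecture's actual system: the function name, the padding symbols, the Penn-style tags, and the interpretation of "window of 3" as previous, current, and next tag are all assumptions. Syllable counts are taken as given input rather than computed.

```python
def accent_features(tokens, i):
    """Features for deciding whether word i is accented.

    tokens: list of (word, pos_tag, n_syllables) tuples for one sentence.
    """
    n = len(tokens)
    _, pos, n_syllables = tokens[i]
    # POS window of 3: previous, current, and next tag, padded at the edges.
    prev_pos = tokens[i - 1][1] if i > 0 else "<s>"
    next_pos = tokens[i + 1][1] if i < n - 1 else "</s>"
    return {
        "pos_prev": prev_pos,
        "pos": pos,
        "pos_next": next_pos,
        # Relative position in the sentence: 0.0 for the first word, 1.0 for the last.
        "sent_position": i / (n - 1) if n > 1 else 0.0,
        "n_syllables": n_syllables,
    }


# "I need a return" with assumed tags and syllable counts.
sentence = [("I", "PRP", 1), ("need", "VBP", 1), ("a", "DT", 1), ("return", "NN", 2)]
features = accent_features(sentence, 3)
```

A dictionary of this shape could be handed to any standard classifier; features such as given/new status or focus would need discourse-level analysis that this sketch leaves out.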
Prosodic Phrasing in ToBI • ‘Levels’ of phrasing: • intermediate phrase: one or more pitch accents plus a phrase accent (H- or L- ) • intonational phrase: 1 or more intermediate phrases + boundary tone (H% or L% ) • ToBI break-index tier • 0 no word boundary • 1 word boundary • 2 strong juncture with no tonal markings • 3 intermediate phrase boundary • 4 intonational phrase boundary
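The two levels of phrasing can be recovered from the break-index tier alone: a break of 3 or 4 closes an intermediate phrase, and a break of 4 also closes the intonational phrase. A minimal sketch, with a hypothetical function name and an assumed labeling of one of the lecture's example sentences:

```python
def split_phrases(words, breaks):
    """Group words into intonational phrases made of intermediate phrases.

    words:  list of words.
    breaks: break index after each word (0-4), same length as words.
    Returns a list of intonational phrases; each is a list of intermediate
    phrases; each of those is a list of words.
    """
    if len(words) != len(breaks):
        raise ValueError("words and breaks must have the same length")
    intonational, intermediates, current = [], [], []
    for word, brk in zip(words, breaks):
        current.append(word)
        if brk >= 3:            # 3 or 4: intermediate phrase ends here
            intermediates.append(current)
            current = []
        if brk == 4:            # 4: intonational phrase ends here too
            intonational.append(intermediates)
            intermediates = []
    # Keep any material left over if the labeling does not end in a 4.
    if current:
        intermediates.append(current)
    if intermediates:
        intonational.append(intermediates)
    return intonational


words = "You should buy the ticket with the discount coupon".split()
breaks = [1, 1, 1, 1, 3, 1, 1, 1, 4]   # assumed labeling: phrase break after "ticket"
phrases = split_phrases(words, breaks)
```

With this labeling the sentence forms one intonational phrase containing two intermediate phrases, "You should buy the ticket" and "with the discount coupon".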
Functions of Phrasing • Disambiguates syntactic constructions, e.g. PP attachment, restrictive/non-restrictive relative clause: • S: You should buy the ticket with the discount coupon. • S: The itinerary which I faxed includes deluxe accommodations. • Disambiguates scope ambiguities, e.g. negation: • S: You aren’t booked through Rome because of the fare. • Or modifier scope: • S: This fare is restricted to retired politicians and civil servants.
Predicting Phrase Boundaries • Applications: TTS, CTS, ASR • Corpora: AP news, Penn Treebank, ATIS • Features: sentence position, sentence length, part-of-speech (POS) window of 4, location of previous predicted boundary, mutual information, constituent information, dependency structure • Results: 96% correct
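Because one feature is the location of the previous predicted boundary, prediction has to run left to right, feeding each decision into the next. The sketch below shows that loop. The function names are hypothetical, reading "window of 4" as two tags on each side of the juncture is an assumption, and `toy_rule` is a made-up stand-in for a trained classifier, not the model behind the reported results.

```python
def boundary_features(pos_tags, i, last_boundary):
    """Features for the juncture after word i.

    last_boundary: index of the word after which the previous boundary was
    predicted, or -1 if none has been predicted yet.
    """
    n = len(pos_tags)

    def tag(j):
        return pos_tags[j] if 0 <= j < n else "<pad>"

    return {
        # Assumed reading of "window of 4": two tags before and two after the juncture.
        "pos_window": (tag(i - 1), tag(i), tag(i + 1), tag(i + 2)),
        "sent_position": (i + 1) / n,
        "sent_length": n,
        "words_since_boundary": i - last_boundary,
    }


def predict_boundaries(pos_tags, classify):
    """Scan junctures left to right; classify(features) returns True for a boundary."""
    boundaries, last = [], -1
    for i in range(len(pos_tags) - 1):   # the sentence-final juncture is skipped
        if classify(boundary_features(pos_tags, i, last)):
            boundaries.append(i)
            last = i
    return boundaries


def toy_rule(features):
    # Hypothetical rule: break before a preposition if the current phrase has 3+ words.
    return features["pos_window"][2] == "IN" and features["words_since_boundary"] >= 3


# "You should buy the ticket with the discount coupon", assumed Penn-style tags.
tags = ["PRP", "MD", "VB", "DT", "NN", "IN", "DT", "NN", "NN"]
predicted = predict_boundaries(tags, toy_rule)
```

Here the toy rule places a single boundary after "ticket" (index 4); swapping in a learned classifier would not change the surrounding loop.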
Contours: Accent + Phrasing • What do intonational contours ‘mean’ (Ladd ‘80, Bolinger ‘89)? • Speech acts (statements, questions, requests) • S: That’ll be credit card? (L* H- H%) • Propositional attitude (uncertainty, incredulity) • S: You’d like an evening flight. (L*+H L- H%) • Speaker affect (anger, happiness, love) • U: I said four SEVEN one! (L+H* L- L%) • “Personality” • S: Welcome to the Sunshine Travel System.
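A contour string such as those in parentheses above combines pitch accents (marked with `*`), a phrase accent (marked with `-`), and a boundary tone (marked with `%`). The following sketch splits a string into those three parts with a regular expression. The function name is hypothetical and the pattern covers only the label shapes that appear in these slides; anything it does not recognize is silently skipped.

```python
import re

# Pitch accent: optional leading tone, optional downstep "!", starred tone, optional trailing tone.
# Phrase accent: H- or L- (optionally downstepped). Boundary tone: H% or L%.
TONE = re.compile(
    r"(?P<accent>(?:[HL]\+)?!?[HL]\*(?:\+!?[HL])?)"
    r"|(?P<phrase>!?[HL]-)"
    r"|(?P<boundary>[HL]%)"
)


def parse_contour(contour):
    """Split a ToBI contour string into its three kinds of tonal events."""
    parts = {"pitch_accents": [], "phrase_accents": [], "boundary_tones": []}
    keys = {
        "accent": "pitch_accents",
        "phrase": "phrase_accents",
        "boundary": "boundary_tones",
    }
    for match in TONE.finditer(contour):
        # lastgroup names whichever alternative matched this token.
        parts[keys[match.lastgroup]].append(match.group())
    return parts


question = parse_contour("L* H- H%")
uncertain = parse_contour("L*+H L- H%")
emphatic = parse_contour("L+H* L- L%")
```

Listing the pitch-accent alternative first matters: it keeps `L*+H` and `L+H*` whole instead of letting a shorter pattern claim part of them.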
Pitch Range and Timing • Level of speaker engagement • S: Welcome to InfoTravel. How may I help you? • Contour interpretation • S: You can take the L*+H bus from Malpensa to Rome L-H%. • U: Take the bus. vs. Take the bus! • Discourse/topic structure • Topic beginnings have a higher pitch range, are faster, and are preceded by longer pauses • Topic endings show the opposite pattern
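Pitch range can be measured from an f0 track in several ways; one simple option is the distance in semitones between the highest and lowest voiced f0 values. The sketch below assumes that definition and assumes unvoiced frames are coded as 0 Hz. The function name and the f0 values are invented for illustration only.

```python
import math


def pitch_range_semitones(f0_hz):
    """Span between the highest and lowest voiced f0 value, in semitones.

    Frames with f0 <= 0 are treated as unvoiced and ignored (an assumption
    about how the f0 track is encoded).
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        return 0.0
    # An interval between two frequencies spans 12 * log2(high / low) semitones.
    return 12 * math.log2(max(voiced) / min(voiced))


# Invented f0 values (Hz) for a topic-initial and a topic-final utterance.
topic_initial = [0, 180, 240, 200]
topic_final = [120, 0, 110, 130]

initial_range = pitch_range_semitones(topic_initial)
final_range = pitch_range_semitones(topic_final)
```

Working in semitones rather than Hz makes ranges comparable across speakers with different baseline pitch; speaking rate and pause length would need timing information that an f0 list alone does not carry.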
Prosody and Speaker Emotion • What makes an utterance sound angry? Sad? • How much comes from the lexical information? • How much from the acoustic/prosodic? • Does all anger, e.g., sound the same? • Cahn ‘88 (examples)
Applications • Text-to-Speech and Concept-to-Speech generation: improve naturalness • Speech Recognition: identify suprasegmental meaning • Spoken Dialogue Systems: understand when people are confused, angry • Audio Browsing: format corpora for browsing and search
Challenges • We don’t really know what most contours ‘mean’ • Our accent prediction needs to draw on better models of given/new status, focus, and grammatical function • Our phrasing prediction needs better information about e.g. attachment • We don’t know much about emotional speech or ‘personality’, both of which are critical to applications