Intonation and Its Meanings Julia Hirschberg CS 6998
What do speech researchers do? • Study human production and perception • Try to embody it in machines • Production: TTS, CTS • Perception: ASR, ASRU, speaker ID, language ID
Pitch Accent/Prominence in ToBI • Which items are made intonationally prominent and how? • Accent type: • H* simple high (declarative) • L* simple low (yes-no question) • L*+H scooped, late rise (uncertainty/incredulity) • L+H* early rise to stress (contrastive focus) • H+!H* fall onto stress (implied familiarity)
Downstepped accents: • !H* • L+!H* • L*+!H • Degree of prominence: • within a phrase: HiF0 • across phrases
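The accent labels above follow a regular pattern: one or two tones (H or L) joined by "+", exactly one tone starred, and an optional "!" marking downstep on an H tone. A minimal Python sketch of a label parser under that reading follows. The `PitchAccent` class and `parse_pitch_accent` function are hypothetical names introduced here for illustration; they are not part of any ToBI tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitchAccent:
    label: str         # original label, e.g. "L+!H*"
    tones: tuple       # tone letters in order, diacritics removed
    starred: str       # tone aligned with the stressed syllable
    downstepped: bool  # True if an H tone carries the "!" diacritic


def parse_pitch_accent(label: str) -> PitchAccent:
    """Split a ToBI pitch accent label into its tones and diacritics."""
    label = label.strip()
    parts = label.split("+")
    if len(parts) > 2:
        raise ValueError(f"too many tones in {label!r}")
    tones, starred, downstepped = [], None, False
    for part in parts:
        has_downstep = part.startswith("!")
        has_star = part.endswith("*")
        core = part[1:] if has_downstep else part
        core = core[:-1] if has_star else core
        if core not in ("H", "L"):
            raise ValueError(f"unknown tone {part!r} in {label!r}")
        if has_downstep and core != "H":
            raise ValueError(f"only H tones can be downstepped: {label!r}")
        if has_star:
            if starred is not None:
                raise ValueError(f"more than one starred tone in {label!r}")
            starred = core
        downstepped = downstepped or has_downstep
        tones.append(core)
    if starred is None:
        raise ValueError(f"no starred tone in {label!r}")
    return PitchAccent(label, tuple(tones), starred, downstepped)


if __name__ == "__main__":
    for text in ["H*", "L*", "L*+H", "L+H*", "H+!H*", "!H*", "L+!H*", "L*+!H"]:
        print(parse_pitch_accent(text))
```

The parser only checks the shape of a label. It says nothing about how the accent is realized in F0 or what it means in context.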
Functions of Pitch Accent • Given/new information • S: Do you need a return ticket? • U: No, thanks, I don’t need a return. • Contrast (narrow focus) • U: No, thanks, I don’t need a RETURN…. (I need a time schedule, receipt,…) • Disambiguation of discourse markers • S: Now let me get you the train information. • U: Okay (thanks) vs. Okay….(but I really want…)
Prosodic Phrasing in ToBI • ‘Levels’ of phrasing: • intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-) • intonational phrase: one or more intermediate phrases + boundary tone (H% or L%) • ToBI break-index tier • 0 no word boundary • 1 word boundary • 2 strong juncture with no tonal markings • 3 intermediate phrase boundary • 4 intonational phrase boundary
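The break-index tier maps directly onto the two levels of phrasing: a break of 3 or 4 closes an intermediate phrase, and a break of 4 also closes the intonational phrase. A minimal Python sketch of that grouping follows. The function name `group_phrases` and the break indices in the example are made up for illustration; they are not taken from a labeled corpus.

```python
def group_phrases(words, breaks):
    """Group words into intonational phrases made of intermediate phrases.

    words  -- list of word strings
    breaks -- list of ToBI break indices (0-4), one after each word
    Returns a list of intonational phrases; each is a list of
    intermediate phrases; each of those is a list of words.
    """
    if len(words) != len(breaks):
        raise ValueError("need exactly one break index per word")
    intonational, intermediate, current = [], [], []
    for word, index in zip(words, breaks):
        if index not in (0, 1, 2, 3, 4):
            raise ValueError(f"invalid break index {index!r}")
        current.append(word)
        if index >= 3:          # 3 or 4 closes an intermediate phrase
            intermediate.append(current)
            current = []
        if index == 4:          # 4 also closes the intonational phrase
            intonational.append(intermediate)
            intermediate = []
    # Keep trailing words even if the final break was left unlabeled.
    if current:
        intermediate.append(current)
    if intermediate:
        intonational.append(intermediate)
    return intonational


if __name__ == "__main__":
    words = "You should buy the ticket with the discount coupon".split()
    # Hypothetical labeling: a phrase break after "ticket".
    breaks = [1, 1, 1, 1, 3, 1, 1, 1, 4]
    for phrase in group_phrases(words, breaks):
        print(phrase)
```

Moving or removing the break of 3 in the example changes the grouping, which is the kind of phrasing difference that can separate two readings of a PP attachment.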
Functions of Phrasing • Disambiguates syntactic constructions, e.g. PP attachment: • S: You should buy the ticket with the discount coupon. • Disambiguates scope ambiguities, e.g. Negation: • S: You aren’t booked through Rome because of the fare. • Or modifier scope: • S: This fare is restricted to retired politicians and civil servants.
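[Figure: pitch contours for the accents H*, L*, L*+H with each phrase accent/boundary tone pair L-L%, L-H%, H-L%, H-H%]
[Figure: pitch contours for the accents L+H*, H+!H*, H*, !H* with each phrase accent/boundary tone pair L-L%, L-H%, H-L%, H-H%]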
Contour Examples • http://www.cs.columbia.edu/~julia/cs6998/cards/examples.html
Contours: Accent + Phrasing • What do intonational contours ‘mean’ (Ladd ’80, Bolinger ’89)? • Speech acts (statements, questions, requests) S: That’ll be credit card? (L* H- H%) • Propositional attitude (uncertainty, incredulity) S: You’d like an evening flight. (L*+H L- H%) • Speaker affect (anger, happiness, love) U: I said four SEVEN one! (L+H* L- L%) • “Personality” S: Welcome to the Sunshine Travel System.
Propositional attitude (uncertainty) Did you feed the animals? I fed the L*+H goldfish L-H% • Distinguish direct/indirect speech acts • Can you open the door?
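Contour labels such as L*+H L- H% combine one or more pitch accents, a phrase accent, and a boundary tone. A minimal Python sketch that splits such a string into its three parts follows. The regular expression and the name `parse_contour` are assumptions made for this example, and the pattern covers only the accent types listed in these slides.

```python
import re

# One pitch accent: bitonal with the star on the second tone (L+H*, H+!H*),
# bitonal with the star on the first tone (L*+H, L*+!H), or a single tone
# (H*, L*, !H*).
ACCENT = r"(?:[HL]\+!?[HL]\*|[HL]\*\+!?[HL]|!?[HL]\*)"

# One or more accents, then a phrase accent, then an optional boundary tone.
CONTOUR = re.compile(rf"\s*((?:{ACCENT}\s*)+)([HL]-)\s*([HL]%)?\s*")


def parse_contour(text):
    """Split a contour string into accents, phrase accent, boundary tone."""
    match = CONTOUR.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognized contour: {text!r}")
    return {
        "accents": re.findall(ACCENT, match.group(1)),
        "phrase_accent": match.group(2),
        "boundary_tone": match.group(3),  # None for an intermediate phrase
    }


if __name__ == "__main__":
    for contour in ["L* H- H%", "L*+H L- H%", "L+H* L- L%", "H* H-H%"]:
        print(contour, "->", parse_contour(contour))
```

The boundary tone is optional in the pattern because an intermediate phrase ends in a phrase accent alone. Both spellings used in the slides, "L- H%" and "L-H%", are accepted.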
And Other Things Contribute: Pitch Range and Timing (Rate, Pause) • Level of speaker engagement Hello vs. HELLO • Contour interpretation Rise/fall/rise (L*+H L-H%): Elephantiasis isn’t incurable • Discourse/topic structure: paratones
Prosodic Generation for TTS • Corpus-based approaches • Train prosodic variation on large labeled corpora using machine learning techniques • Accent and phrasing decisions • Associate prosodic labels with simple features of transcripts • To do: • Contour variation
Timing and backchanneling • Disfluencies? • Emotion and ‘personality’ • Personalized voices
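The corpus-based approach to accent assignment, associating prosodic labels with simple features of transcripts, can be sketched in Python with a count-based lookup. Everything in the sketch is an assumption made for illustration: the two features (function vs. content word, phrase-final position), the tiny hand-written training set, and the majority-vote predictor. Real systems train richer learners on large labeled corpora, and this is not the method used in the work cited in these slides.

```python
from collections import Counter, defaultdict

# Hypothetical function-word list; a real system would use a POS tagger.
FUNCTION_WORDS = {"i", "a", "the", "is", "to", "you"}


def featurize(words, i):
    """Two simple transcript features for the word at position i."""
    word_class = "function" if words[i].lower() in FUNCTION_WORDS else "content"
    return (word_class, i == len(words) - 1)


def train(corpus):
    """Count accent labels for each feature combination."""
    counts = defaultdict(Counter)
    overall = Counter()
    for sentence in corpus:
        words = [word for word, _ in sentence]
        for i, (_, label) in enumerate(sentence):
            counts[featurize(words, i)][label] += 1
            overall[label] += 1
    return counts, overall


def predict(model, words):
    """Predict the most frequent label seen with each word's features."""
    counts, overall = model
    default = overall.most_common(1)[0][0]  # back-off for unseen features
    labels = []
    for i in range(len(words)):
        seen = counts.get(featurize(words, i))
        labels.append(seen.most_common(1)[0][0] if seen else default)
    return labels


# Made-up training data: "H*" marks an accented word, "-" an unaccented one.
TOY_CORPUS = [
    [("I", "-"), ("need", "H*"), ("a", "-"), ("ticket", "H*")],
    [("the", "-"), ("train", "H*"), ("is", "-"), ("late", "H*")],
    [("buy", "H*"), ("the", "-"), ("ticket", "H*")],
]

if __name__ == "__main__":
    model = train(TOY_CORPUS)
    print(predict(model, ["you", "need", "the", "schedule"]))
```

With these features the model can only learn the pattern "accent content words, skip function words". Distinctions such as given/new or contrastive focus would need features that a plain transcript does not supply.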
Concept to Speech • Decisions in TTS depend on text analysis • Concept-to-Speech (CTS) systems should be able to do better • System knows what it wants to say and can specify how • But…. • Still need labeled corpora to train on • CTS features may be hard to label (focus, given/new,…) • How to decide how to realize these?
Prosody in ASRU • Little success in improving ASR transcription • More promise in other areas: • Improving rejection • Shrinking search space • Automatic topic segmentation for browsing/retrieval • Identifying ‘salient’ words in turns • Disambiguating speech/dialogue acts: okay
Recognizing communicative ‘problems’ • ASR errors • User corrections • ‘Aware’ turns • ‘Problematic’ dialogues • Disfluencies and self-repairs • Recognizing speaker emotion
Some Research Topics • Meaning of intonational contours: • Rise/fall/rise (L*+H L-H%) A: Did you take out the garbage? B: Sort of. A: Sort of! • High rise questions (H* H-H%) This is the chicken Chermula? I’m from Skokie?
Compositional theory of intonational meaning (w/Pierrehumbert) • Intonational disambiguation across languages: Spanish, Italian and English (w/Avesani & Prieto) William isn’t drinking because he’s unhappy • Disfluencies: self-repairs (w/Nakatani) I want to go to Ba- Baltimore. • Cue phrases (w/Litman) • Now let’s go to work.
Accent and strict/sloppy interpretations of ellipsis (w/Ward) People who live in Los Angeles adore its beaches and so do people who live in New York
Accent and given/new (w/Terken) • The ball touches the circle. • The ball touches the triangle. • The ball touches the cone. • The square touches the ball. • Intonation and discourse structure (w/Grosz & Nakatani) • Boston Directions Corpus • Automatic assignment of accent and phrasing for TTS (w/Wang, Sproat, Koehn, Abney, Collins, Rambow)
ToBI prosodic labeling conventions (w/many) • Prosody in dialogue systems (w/Litman & Swerts): generation and understanding (TOOT) • Audio browsing and retrieval: SCAN and SCANMail (w/many)
Potential Projects • Build a TTS system in a limited domain • Build a speech recognizer • Study a speech phenomenon (disfluencies, accenting, contours, pitch range variation) • Do some experiments (production, perception). • Examples: • Speech summarization, eye tracking and emotion, deceptive speech, given/new and contour,….