
Intonation and Its Meanings

Julia Hirschberg, CS 6998.


Presentation Transcript


  1. Intonation and Its Meanings
     Julia Hirschberg, CS 6998

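  2. What do speech researchers do?
     • Study human production and perception
     • Try to embody it in machines
     • Production: TTS (text-to-speech), CTS (concept-to-speech)
     • Perception: ASR (automatic speech recognition), ASRU (automatic speech recognition and understanding), speaker ID, language ID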

  3. Pitch Accent/Prominence in ToBI
     • Which items are made intonationally prominent and how?
     • Accent type:
       • H* simple high (declarative)
       • L* simple low (yes-no question)
       • L*+H scooped, late rise (uncertainty/incredulity)
       • L+H* early rise to stress (contrastive focus)
       • H+!H* fall onto stress (implied familiarity)

  4. Downstepped accents:
       • !H*
       • L+!H*
       • L*+!H
     • Degree of prominence:
       • within a phrase: HiF0
       • across phrases
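The accent inventory above can be kept as a small lookup table. The following is a minimal sketch, not part of any ToBI tool: the names `ACCENTS`, `DOWNSTEPPED`, `has_downstep`, and `describe` are hypothetical, and the glosses are copied from the slides.

```python
# Pitch accent labels and glosses as listed on the slides.
ACCENTS = {
    "H*": "simple high (declarative)",
    "L*": "simple low (yes-no question)",
    "L*+H": "scooped, late rise (uncertainty/incredulity)",
    "L+H*": "early rise to stress (contrastive focus)",
    "H+!H*": "fall onto stress (implied familiarity)",
}

# Downstepped variants listed on the slides; '!' marks a lowered H tone.
DOWNSTEPPED = ("!H*", "L+!H*", "L*+!H")


def has_downstep(label: str) -> bool:
    """True if the label contains the downstep diacritic '!'."""
    return "!" in label


def describe(label: str) -> str:
    """Return the slide gloss for a label (hypothetical helper)."""
    if label in ACCENTS:
        return ACCENTS[label]
    if label in DOWNSTEPPED:
        # Assumption: a downstepped accent keeps the gloss of its base accent.
        return "downstepped " + ACCENTS[label.replace("!", "")]
    raise ValueError(f"unknown pitch accent label: {label!r}")
```

Treating a downstepped accent as its base accent plus a modifier is a simplification made for this sketch; the slides do not say how downstep affects meaning.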

  5. Functions of Pitch Accent
     • Given/new information
       S: Do you need a return ticket?
       U: No, thanks, I don’t need a return.
     • Contrast (narrow focus)
       U: No, thanks, I don’t need a RETURN… (I need a time schedule, receipt, …)
     • Disambiguation of discourse markers
       S: Now let me get you the train information.
       U: Okay (thanks) vs. Okay… (but I really want…)

  6. Prosodic Phrasing in ToBI
     • ‘Levels’ of phrasing:
       • intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
       • intonational phrase: one or more intermediate phrases plus a boundary tone (H% or L%)
     • ToBI break-index tier
       • 0 no word boundary
       • 1 word boundary
       • 2 strong juncture with no tonal markings
       • 3 intermediate phrase boundary
       • 4 intonational phrase boundary
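The two levels of phrasing can be recovered from the break-index tier alone. The sketch below assumes each word is paired with the break index of the juncture that follows it; the function name `group_phrases` and the example indices are hypothetical, not taken from a labeled corpus.

```python
from typing import List, Tuple


def group_phrases(words: List[Tuple[str, int]]) -> List[List[List[str]]]:
    """Group (word, break_index) pairs into intonational phrases.

    Each intonational phrase is a list of intermediate phrases, and each
    intermediate phrase is a list of words. The break index is assumed to
    describe the juncture AFTER the word.
    """
    utterance: List[List[List[str]]] = []
    intonational: List[List[str]] = []
    intermediate: List[str] = []
    for word, brk in words:
        if not 0 <= brk <= 4:
            raise ValueError(f"break index out of range: {brk}")
        intermediate.append(word)
        if brk >= 3:  # 3 and 4 both close an intermediate phrase
            intonational.append(intermediate)
            intermediate = []
        if brk == 4:  # 4 also closes the intonational phrase
            utterance.append(intonational)
            intonational = []
    # Flush material left over when the input does not end in a 4.
    if intermediate:
        intonational.append(intermediate)
    if intonational:
        utterance.append(intonational)
    return utterance


# Illustrative break indices only.
example = [("You", 1), ("should", 1), ("buy", 1), ("the", 1), ("ticket", 3),
           ("with", 1), ("the", 1), ("discount", 1), ("coupon", 4)]
phrases = group_phrases(example)
```

With these indices the result is one intonational phrase containing two intermediate phrases, split after "ticket".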

  7. Functions of Phrasing
     • Disambiguates syntactic constructions, e.g. PP attachment:
       S: You should buy the ticket with the discount coupon.
     • Disambiguates scope ambiguities, e.g. negation:
       S: You aren’t booked through Rome because of the fare.
     • Or modifier scope:
       S: This fare is restricted to retired politicians and civil servants.

  8. [Figure: contours combining the pitch accents H*, L*, and L*+H with the phrase-final tone pairs L-L%, L-H%, H-L%, and H-H%]

  9. [Figure: contours combining the pitch accents L+H*, H+!H*, and H* !H* with the phrase-final tone pairs L-L%, L-H%, H-L%, and H-H%]

  10. Contour Examples • http://www.cs.columbia.edu/~julia/cs6998/cards/examples.html

  11. Contours: Accent + Phrasing
      • What do intonational contours ‘mean’ (Ladd ‘80, Bolinger ‘89)?
      • Speech acts (statements, questions, requests)
        S: That’ll be credit card? (L* H- H%)
      • Propositional attitude (uncertainty, incredulity)
        S: You’d like an evening flight. (L*+H L- H%)
      • Speaker affect (anger, happiness, love)
        U: I said four SEVEN one! (L+H* L- L%)
      • “Personality”
        S: Welcome to the Sunshine Travel System.
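Contour labels such as "L* H- H%" follow a fixed order: one or more pitch accents, then a phrase accent, then a boundary tone. The parser below is a rough sketch under that assumption. `Contour` and `parse_contour` are hypothetical names, and the accepted accent set is limited to the labels that appear on these slides.

```python
import re
from typing import NamedTuple, Tuple


class Contour(NamedTuple):
    pitch_accents: Tuple[str, ...]
    phrase_accent: str
    boundary_tone: str


# Pitch accents from the slides, with optional downstep ('!') where listed.
# Longer alternatives come first so "L*+H" is not read as a bare "L*".
_ACCENT = r"(?:L\+!?H\*|L\*\+!?H|H\+!H\*|!?H\*|L\*)"
_CONTOUR = re.compile(
    rf"^\s*((?:{_ACCENT}\s+)*{_ACCENT})\s+([HL]-)\s*([HL]%)\s*$"
)


def parse_contour(text: str) -> Contour:
    """Split a contour string into pitch accents, phrase accent, boundary tone."""
    match = _CONTOUR.match(text)
    if match is None:
        raise ValueError(f"not a recognised contour: {text!r}")
    return Contour(tuple(match.group(1).split()), match.group(2), match.group(3))


question = parse_contour("L* H- H%")      # "That'll be credit card?"
uncertain = parse_contour("L*+H L-H%")    # spacing before the boundary tone is optional
```

The pattern accepts both "L- H%" and "L-H%" because the slides write the phrase accent and boundary tone both ways.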

  12. Propositional attitude (uncertainty)
        Did you feed the animals?
        I fed the L*+H goldfish L-H%
      • Distinguish direct/indirect speech acts
        Can you open the door?

  13. And Other Things Contribute: Pitch Range and Timing (Rate, Pause)
      • Level of speaker engagement
        Hello vs. HELLO
      • Contour interpretation
        Rise/fall/rise (L*+H L-H%): Elephantiasis isn’t incurable
      • Discourse/topic structure: paratones

  14. Prosodic Generation for TTS
      • Corpus-based approaches
        • Train prosodic variation on large labeled corpora using machine learning techniques
      • Accent and phrasing decisions
        • Associate prosodic labels with simple features of transcripts
      • To do:
        • Contour variation

  15. Timing and backchanneling
      • Disfluencies?
      • Emotion and ‘personality’
      • Personalized voices
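To make "associate prosodic labels with simple features of transcripts" concrete, the sketch below predicts accent placement from two hand-picked features: whether a word is a function word and whether it is given (already mentioned in the preceding context). This is a rule-based stand-in for a trained classifier, not the method used in the corpus-based work the slides refer to. `FUNCTION_WORDS`, `tokenize`, and `predict_accents` are hypothetical names, and the word list is deliberately tiny.

```python
import re
from typing import List, Tuple

# Assumption: a very small, illustrative function-word list.
FUNCTION_WORDS = {
    "a", "an", "the", "i", "you", "he", "she", "it", "we", "they",
    "do", "don't", "is", "isn't", "to", "of", "and", "or", "that",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens (straight apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def predict_accents(utterance: str, context: str = "") -> List[Tuple[str, bool]]:
    """Mark a word as accented if it is neither a function word nor given.

    'Given' is approximated as 'appears in the preceding context'.
    """
    given = set(tokenize(context))
    return [
        (word, word not in FUNCTION_WORDS and word not in given)
        for word in tokenize(utterance)
    ]


prediction = predict_accents(
    "No, thanks, I don't need a return.",
    context="Do you need a return ticket?",
)
accented = [word for word, is_accented in prediction if is_accented]
```

On the given/new example from the earlier slide, this leaves "need" and "return" unaccented because both occur in the system's question. A corpus-based system would learn such decisions from labeled data instead of hard-coding them.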

  16. Concept to Speech
      • Decisions in TTS depend on text analysis
      • Concept-to-Speech (CTS) systems should be able to do better
        • System knows what it wants to say and can specify how
      • But…
        • Still need labeled corpora to train on
        • CTS features may be hard to label (focus, given/new, …)
        • How to decide how to realize these?

  17. Prosody in ASRU
      • Little success in improving ASR transcription
      • More promise in other areas:
        • Improving rejection
        • Shrinking search space
        • Automatic topic segmentation for browsing/retrieval
        • Identifying ‘salient’ words in turns
        • Disambiguating speech/dialogue acts: okay

  18. Recognizing communicative ‘problems’
      • ASR errors
      • User corrections
      • ‘Aware’ turns
      • ‘Problematic’ dialogues
      • Disfluencies and self-repairs
      • Recognizing speaker emotion

  19. Some Research Topics
      • Meaning of intonational contours:
        • Rise/fall/rise (L*+H L-H%)
          A: Did you take out the garbage?
          B: Sort of.
          A: Sort of!
        • High rise questions (H* H-H%)
          This is the chicken Chermula?
          I’m from Skokie?

  20. Compositional theory of intonational meaning (w/Pierrehumbert)
      • Intonational disambiguation across languages: Spanish, Italian and English (w/Avesani & Prieto)
        William isn’t drinking because he’s unhappy
      • Disfluencies: self-repairs (w/Nakatani)
        I want to go to Ba- Baltimore.
      • Cue phrases (w/Litman)
        Now let’s go to work.

  21. Accent and strict/sloppy interpretations of ellipsis (w/Ward)
        People who live in Los Angeles adore its beaches and so do people who live in New York

  22. Accent and given/new (w/Terken)
        The ball touches the circle.
        The ball touches the triangle.
        The ball touches the cone.
        The square touches the ball.
      • Intonation and discourse structure (w/Grosz & Nakatani)
        • Boston Directions Corpus
      • Automatic assignment of accent and phrasing for TTS (w/Wang, Sproat, Koehn, Abney, Collins, Rambow)

  23. ToBI prosodic labeling conventions (w/many)
      • Prosody in dialogue systems (w/Litman & Swerts): generation and understanding (TOOT)
      • Audio browsing and retrieval: SCAN and SCANMail (w/many)

  24. Potential Projects
      • Build a TTS system in a limited domain
      • Build a speech recognizer
      • Study a speech phenomenon (disfluencies, accenting, contours, pitch range variation)
      • Do some experiments (production, perception)
      • Examples:
        • Speech summarization, eye tracking and emotion, deceptive speech, given/new and contour, …
