
Meaningful Intonational Variation

Explore the assignment of meaningful intonational variation in TTS and CTS systems, covering contours, accent, phrasing, pitch range, and more. Discover the challenges and advancements in producing natural-sounding speech.




Presentation Transcript


  1. Meaningful Intonational Variation

  2. Today • Assigning variation for TTS, CTS • Contours • Accent • Phrasing • Pitch Range • Amplitude and timing

  3. TTS Production Pipeline • Orthographic input: Dr. Smith lives on Elm Dr. • Text normalization: abbreviation expansion… • Pronunciation modeling: POS id, WS disambiguation • Intonation assignment: parsing, POS id, robust semantics… • Phonetic/phonological realization: phonological parsing, phonetic analysis • Unit selection: acoustic analysis
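
The abbreviation-expansion step can be illustrated with the slide's own input. This is a minimal sketch, not the normalizer of any particular TTS system: `expand_dr` is a hypothetical helper, and its single rule (read "Dr." as a title when a capitalized word follows, otherwise as a street suffix) is an assumption made for illustration.

```python
import re

def expand_dr(text: str) -> str:
    """Expand 'Dr.' as 'Doctor' or 'Drive' using one toy context rule."""
    def repl(match: re.Match) -> str:
        following = text[match.end():].lstrip()
        # Assumed rule: a capitalized word right after "Dr." signals a title.
        if following[:1].isupper():
            return "Doctor"
        # At the very end of the input the period also closes the sentence,
        # so it is restored after the expansion.
        return "Drive." if not following else "Drive"
    return re.sub(r"\bDr\.", repl, text)

print(expand_dr("Dr. Smith lives on Elm Dr."))
# prints: Doctor Smith lives on Elm Drive.
```

The rule fails as soon as a new sentence follows the street name ("... Elm Dr. Then he left"), which is one reason this stage is usually paired with sentence-boundary detection.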

  4. Intonation Assignment: Phrasing • Traditional: hand-built rules • Punctuation 234-5682 • Context/function word: no breaks after function word He went to dinner • Parse? She favors the nuts and bolts approach • Current: statistical analysis of large labeled corpus • Punctuation, pos window, utt length,…
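
The traditional hand-built rules can be sketched roughly as follows. The punctuation rule and the "no break after a function word" rule come from the slide; the length-based trigger (`max_len`), the word list, and the function name `predict_breaks` are assumptions added so that the function-word rule has something to veto.

```python
FUNCTION_WORDS = {"a", "an", "the", "to", "of", "in", "on", "and", "but",
                  "he", "she", "it"}          # illustrative list, not exhaustive
BREAK_PUNCT = ",;:.?!"

def predict_breaks(words, max_len=4):
    """Return one boolean per word: True if a phrase break follows it."""
    breaks = []
    run = 0                                   # words since the last break
    for i, word in enumerate(words):
        run += 1
        is_last = i == len(words) - 1
        bare = word.rstrip(BREAK_PUNCT).lower()
        if word[-1] in BREAK_PUNCT or is_last:
            # Rule 1: punctuation (or end of utterance) forces a break.
            breaks.append(True)
            run = 0
        elif run >= max_len and bare not in FUNCTION_WORDS:
            # Assumed rule: long runs are split, but never after a function word.
            breaks.append(True)
            run = 0
        else:
            breaks.append(False)
    return breaks

print(predict_breaks("He went to dinner".split(), max_len=3))
# prints: [False, False, False, True]
```

With `max_len=3` the run limit is reached at "to", but the function-word rule blocks the break, so the utterance stays one phrase. A statistical phraser replaces these fixed rules with a classifier over similar cues.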

  5. Functions of Phrasing • Disambiguates syntactic constructions, e.g. PP attachment: • S: You should buy the ticket with the discount coupon. • Disambiguates scope ambiguities, e.g. Negation: • S: You aren’t booked through Rome because of the fare. • Or modifier scope: • S: This fare is restricted to retired politicians and civil servants.

  6. Intonation Assignment: Accent • Hand-built rules • Function/content distinction He went out the back door/He threw out the trash • Complex nominals: • Main Street/Park Avenue • city hall parking lot • Statistical procedures trained on large corpora • Contrastive stress, given/new distinction?
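
A rule-based accent assigner along these lines might look like the sketch below. It combines the function/content distinction with a crude given/new check (a content word is deaccented once it has appeared in an earlier utterance). The word list, the exact-string notion of "given", and the name `assign_accents` are all assumptions for illustration; the sketch does not handle particles such as "out" in "threw out the trash", nor complex nominals.

```python
FUNCTION_WORDS = {"a", "an", "the", "do", "you", "i", "to", "of",
                  "he", "she", "it", "don't"}   # illustrative list only

def normalize(word):
    """Strip edge punctuation and lowercase, for lookup purposes."""
    return word.strip(".,?!").lower()

def assign_accents(utterances):
    """Return, per utterance, a list of (word, accented) pairs."""
    seen = set()                  # words mentioned in earlier utterances
    labeled_utterances = []
    for utterance in utterances:
        words = utterance.split()
        labeled = []
        for word in words:
            key = normalize(word)
            # Accent content words that are new to the discourse.
            accented = key not in FUNCTION_WORDS and key not in seen
            labeled.append((word, accented))
        # Words become "given" only after the whole utterance is processed.
        seen.update(normalize(w) for w in words)
        labeled_utterances.append(labeled)
    return labeled_utterances

dialogue = ["Do you need a return ticket?",
            "No, thanks, I don't need a return."]
for labeled in assign_accents(dialogue):
    print(" ".join(w.upper() if acc else w.lower() for w, acc in labeled))
```

In the second utterance "need" and "return" come out deaccented because they are given, which matches the given/new reading; a contrastive reading would need information this rule does not have.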

  7. Functions of Pitch Accent • Given/new information • S: Do you need a return ticket? • U: No, thanks, I don’t need a return. • Contrast (narrow focus) • U: No, thanks, I don’t need a RETURN…. (I need a time schedule, receipt,…) • Disambiguation of discourse markers • S: Now let me get you the train information. • U: Okay (thanks) vs. Okay….(but I really want…)

  8. Intonation Assignment: Contours • Simple rules • ‘.’ = declarative contour • ‘?’ = yes-no-question contour unless wh-word present at/near front of sentence • Well, how did he do it? And what do you know? • What else might we do?
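
The simple contour rules can be written down directly. In this sketch the `window` parameter (how many leading words count as "at/near the front"), the wh-word list, and the returned label strings are assumptions; the slide specifies only the punctuation rule and the wh-word exception.

```python
WH_WORDS = {"who", "whom", "whose", "what", "when", "where", "why",
            "which", "how"}

def choose_contour(sentence: str, window: int = 2) -> str:
    """Pick a contour label from final punctuation and leading wh-words."""
    s = sentence.strip()
    if s.endswith("?"):
        # Look only at the first `window` words for a wh-word.
        leading = [w.strip(",.?!").lower() for w in s.split()[:window]]
        if any(w in WH_WORDS for w in leading):
            return "wh-question"
        return "yes-no-question"
    return "declarative"

for s in ["Well, how did he do it?",
          "What else might we do?",
          "Do you want to fly first class?",
          "Dr. Smith lives on Elm Dr."]:
    print(s, "->", choose_contour(s))
```

The rule uses nothing but surface text, so it cannot tell when a question mark ends a request or a rhetorical question rather than a true yes-no question.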

  9. Contours: Accent + Phrasing • What do intonational contours ‘mean’ (Ladd ‘80, Bolinger ‘89)? • Speech acts (statements, questions, requests) S: That’ll be credit card? (L* H- H%) • Propositional attitude (uncertainty, incredulity) S: You’d like an evening flight.(L*+H L- H%) • Speaker affect (anger, happiness, love) U: I said four SEVEN one! (L+H* L- L%) • “Personality” S: Welcome to the Sunshine Travel System.

  10. Propositional attitude (uncertainty) Did you feed the animals? I fed the goldfish (L*+H on “goldfish”, L-H% at the phrase end) • Distinguish direct/indirect speech acts • Can you open the door?

  11. The TTS Front End Today • Corpus-based statistical methods instead of hand-built rule-sets • Dictionaries instead of rules (but fall-back to rules) • Modest attempts to infer contrast, given/new • Text analysis tools: pos tagger, morphological analyzer, little parsing

  12. TTS: Where are we now? • Natural sounding speech for some utterances • Where good match between input and database • Still…hard to vary prosodic features and retain naturalness • Yes-no questions: Do you want to fly first class? • Context-dependent variation still hard to infer from text and hard to realize naturally:

  13. Appropriate contours from text • Emphasis, de-emphasis to convey focus, given/new distinction: I own a cat. Or, rather, my cat owns me. • Variation in pitch range, rate, pausal duration to convey topic structure • Characteristics of ‘emotional speech’ little understood, so hard to convey: …a voice that sounds friendly, sympathetic, authoritative…. • How to mimic real voices?

  14. TTS vs. CTS • Decisions in Text-to-Speech (TTS) depend on syntax, information status, topic structure,… information explicitly available to NLG • Concept-to-Speech (CTS) systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how • But….generating prosody for CTS isn’t so easy

  15. To(nes and)B(reak)I(ndices) • Developed by prosody researchers in four meetings over 1991-94 • Goals: • devise common labeling scheme for Standard American English that is robust and reliable • promote collection of large, prosodically labeled, shareable corpora • ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English,....

  16. Minimal ToBI transcription: • recording of speech • f0 contour • ToBI tiers: • orthographic tier: words • break-index tier: degrees of junction (Price et al ‘89) • tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert ‘80) • miscellaneous tier: disfluencies, non-speech sounds, etc.

  17. Sample ToBI Labeling

  18. Online training material, available at: • http://www.ling.ohio-state.edu/phonetics/ToBI/ • Evaluation • Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. ‘92, Pitrelli et al. ‘94)
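
Agreement figures of this kind are typically computed pairwise over labelers and words. The sketch below shows one such computation under stated assumptions: the function name `pairwise_agreement`, the toy break-index labelings, and the exact pairing scheme are illustrative and are not taken from the cited evaluations.

```python
from itertools import combinations

def pairwise_agreement(labelings, agree=lambda a, b: a == b):
    """Fraction of (labeler pair, word) comparisons on which labels agree.

    `labelings` holds one label sequence per labeler, all of equal length.
    `agree` decides whether two labels count as matching.
    """
    matches = 0
    total = 0
    for first, second in combinations(labelings, 2):
        for a, b in zip(first, second):
            total += 1
            matches += bool(agree(a, b))
    return matches / total

# Toy data: three labelers' break indices for the same three words.
breaks = [[1, 1, 4],
          [1, 2, 4],
          [1, 3, 4]]

exact = pairwise_agreement(breaks)
within_one = pairwise_agreement(breaks, agree=lambda a, b: abs(a - b) <= 1)
print(f"exact: {exact:.2f}, within one level: {within_one:.2f}")
```

Passing a tolerance function as `agree` is what "to within 1 level" amounts to for break indices; exact match is the natural choice for tonal category labels.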

  19. Pitch Accent/Prominence in ToBI • Which items are made intonationally prominent and how? • Accent type: • H* simple high (declarative) • L* simple low (ynq) • L*+H scooped, late rise (uncertainty/ incredulity) • L+H* early rise to stress (contrastive focus) • H+!H* fall onto stress (implied familiarity)

  20. Downstepped accents: • !H*, • L+!H*, • L*+!H • Degree of prominence: • within a phrase: HiF0 • across phrases

  21. Prosodic Phrasing in ToBI • ‘Levels’ of phrasing: • intermediate phrase: one or more pitch accents plus a phrase accent (H- or L- ) • intonational phrase: 1 or more intermediate phrases + boundary tone (H% or L% ) • ToBI break-index tier • 0 no word boundary • 1 word boundary • 2 strong juncture with no tonal markings • 3 intermediate phrase boundary • 4 intonational phrase boundary
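
The two levels of phrasing can be recovered mechanically from the break-index tier. A minimal sketch, assuming a hypothetical `Word` record that carries the break index of the juncture after each word: indices 3 and 4 both close an intermediate phrase, and index 4 also closes the intonational phrase.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    break_index: int   # 0-4, strength of the juncture after this word

def group_phrases(words):
    """Return intonational phrases, each a list of intermediate phrases."""
    intonational = []      # finished intonational phrases
    current_ip = []        # intermediate phrases of the open intonational phrase
    current_inter = []     # words of the open intermediate phrase
    for word in words:
        current_inter.append(word.text)
        if word.break_index >= 3:          # 3 or 4 ends an intermediate phrase
            current_ip.append(current_inter)
            current_inter = []
        if word.break_index == 4:          # 4 also ends the intonational phrase
            intonational.append(current_ip)
            current_ip = []
    # Flush anything left open if the labeling did not end on a 4.
    if current_inter:
        current_ip.append(current_inter)
    if current_ip:
        intonational.append(current_ip)
    return intonational

labeled = [Word("I", 1), Word("don't", 1), Word("want", 1), Word("cereal", 3),
           Word("I", 1), Word("want", 1), Word("toast", 4)]
print(group_phrases(labeled))
# prints: [[['I', "don't", 'want', 'cereal'], ['I', 'want', 'toast']]]
```

Labeling "cereal" with 4 instead of 3 would yield two intonational phrases of one intermediate phrase each. The break indices in the example are invented for illustration.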

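  22. [Figure: contours for pitch accents H*, L*, L*+H with phrase endings L-L%, L-H%, H-L%, H-H%]

  23. [Figure: contours for pitch accents L+H*, H+!H*, H*, !H* with phrase endings L-L%, L-H%, H-L%, H-H%]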

  24. Contour Examples • http://www.cs.columbia.edu/~julia/cs6998/cards/examples.html

  25. And Other Things Contribute: Pitch Range and Timing (Rate, Pause) • Level of speaker engagement Hello vs. HELLO • Contour interpretation Rise/fall/rise (L*+H L-H%): Elephantiasis isn’t incurable • Discourse/topic structure: paratones

  26. Corpus-Based Research • Predicting accent, phrasing, contours from large ToBI-labeled corpora • Features: • Word position, p.o.s. window, word co-occurrence, punctuation, capitalization, sentence length, paragraph position, … • Results: • ~80-85% correct accent prediction • ~92-96% correct phrase boundary prediction • Contours???? • Reality…
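
Several of the listed features can be extracted per word from POS-tagged text before being handed to a classifier. The sketch below covers word position, a part-of-speech window, punctuation, capitalization, and sentence length; the feature names, the `PAD` symbol, and the Penn-style tags in the example are assumptions, and no particular learner is implied.

```python
def boundary_features(tagged, i, window=1):
    """Features for the juncture after word i in a list of (word, pos) pairs."""
    word, _ = tagged[i]
    features = {
        "position": i,
        "relative_position": i / len(tagged),
        "sentence_length": len(tagged),
        "capitalized": word[:1].isupper(),
        "punct_after": word[-1] in ",;:.?!",
    }
    # Part-of-speech window centered on word i, padded at sentence edges.
    for offset in range(-window, window + 1):
        j = i + offset
        inside = 0 <= j < len(tagged)
        features[f"pos[{offset:+d}]"] = tagged[j][1] if inside else "PAD"
    return features

tagged = [("He", "PRP"), ("went", "VBD"), ("to", "TO"), ("dinner.", "NN")]
for i in range(len(tagged)):
    print(tagged[i][0], boundary_features(tagged, i))
```

Each dictionary, paired with the break index a labeler assigned at that position, forms one training example for a boundary predictor.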

  27. This is my version of a rather long sentence which ideally should be broken into several phrases automatically by a smart system but we don't know if this will actually happen do we? • Is a yes-no question uttered with falling intonation? Does that sound delightful? Mellifluous? • I don’t want cereal I want toast. • ….

  28. Next: • Story analysis and generation (readings will be available later this week – we’ll send mail)
