
PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS
Jan P.H. van Santen and Xiaochuan Niu
Center for Spoken Language Understanding, OGI School of Science & Technology at OHSU


Presentation Transcript


  1. PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS
     Jan P.H. van Santen and Xiaochuan Niu
     Center for Spoken Language Understanding, OGI School of Science & Technology at OHSU

  2. OVERVIEW
     • IMPORTANCE OF SPECTRAL BALANCE
     • MEASUREMENT OF SPECTRAL BALANCE
     • ANALYSIS METHODS
     • RESULTS
     • SYNTHESIS
     • CONCLUSIONS

  3. 1. IMPORTANCE OF SPECTRAL BALANCE
     • Linguistic Control Factors
       • Stress-like factors
       • Positional factors
       • Phonemic factors
     • Acoustic Correlates
       • Traditionally TTS-controlled:
         • Pitch, timing, amplitude
       • Demonstrated in natural speech, but usually not TTS-controlled:
         • Spectral tilt, balance
         • Formant dynamics
         • …

  4. 2. MEASUREMENT OF SPECTRAL BALANCE
     • Data:
       • 472 greedily selected sentences
         • Genre: newspaper
         • Greedy features: linguistic control factors
       • One female speaker
       • Manual segmentation
       • Accent: independent rating by 3 judges (0-3 score)

  5. 2. MEASUREMENT OF SPECTRAL BALANCE
     • Energy in 5 formant-range frequency bands:
       • B0: 100-300 Hz [~F0]
       • B1: 300-800 Hz [~F1]
       • B2: 800-2500 Hz [~F2]
       • B3: 2500-3500 Hz [~F3]
       • B4: 3500 Hz-max [~fricative noise]
     • In other words, a multidimensional measure
     • Filter bank → square → average [1 ms rect.] → 20 log10(Bi)
     • Subtract estimated per-utterance means
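
As a concrete illustration of the measurement pipeline on this slide, here is a minimal sketch. The band edges are taken from the slide; the 4th-order Butterworth filters, the scipy/numpy interface, and the function name are assumptions for illustration only.

```python
# Sketch of the 5-band spectral-balance measure: band-pass filter, square,
# average over a 1 ms rectangular window, convert to dB, subtract the
# per-utterance mean in each band.
import numpy as np
from scipy.signal import butter, sosfiltfilt

BANDS_HZ = [(100, 300), (300, 800), (800, 2500), (2500, 3500), (3500, None)]

def band_energies_db(x, fs, win_ms=1.0, order=4):
    """Return an (n_frames, 5) array of mean-subtracted band energies in dB."""
    win = max(1, int(round(fs * win_ms / 1000.0)))
    n_frames = len(x) // win
    out = np.empty((n_frames, len(BANDS_HZ)))
    for i, (lo, hi) in enumerate(BANDS_HZ):
        if hi is None:                      # B4: 3500 Hz up to Nyquist
            sos = butter(order, lo, btype="highpass", fs=fs, output="sos")
        else:
            sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        y = sosfiltfilt(sos, x) ** 2        # square
        frames = y[: n_frames * win].reshape(n_frames, win).mean(axis=1)
        # 10*log10 of mean-square energy, i.e. 20*log10 of the RMS amplitude
        out[:, i] = 10.0 * np.log10(frames + 1e-12)
    return out - out.mean(axis=0, keepdims=True)      # per-utterance means removed

# Example: energies = band_energies_db(waveform, 16000)
```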

  6. 2. MEASUREMENT OF SPECTRAL BALANCE
     • Details:
       • Confounding with F0
         • Measure pitch-corrected and raw
         • For certain wave shapes, pitch is directly related to fixed-frame energy
         • Why do both: wave shapes may change in unknown ways
       • F0 not confined to B0 [female speech]
       • Vowel formants not quite confined to bands [e.g., F1 for /EE/ and F3 for /ER/]

  7. 2. MEASUREMENT OF SPECTRAL BALANCE
     • Why not more or different bands?
       • Multiple interacting Linguistic Control Factors
       • Need measurements that minimize interactions
       • 5 bands → different vowels "behave similarly"
         • Can model vowels as a class
     • Why not simply spectral tilt?
       • 5 bands carry more information than a single measure
       • Supply more information for synthesis

  8. 3. ANALYSIS METHODS
     • Measures likely to behave like segmental duration:
       • Multiple interacting, confounded factors:
         • Interaction: the magnitude of one factor's effect may depend on the other factors
         • Confounding: unequal frequencies of control-factor combinations
       • "Directional Invariance"
         • The direction of one factor's effect is independent of the other factors

  9. 3. ANALYSIS METHODS
     • Need a method that
       • can handle multiple interacting, confounded factors and
       • takes advantage of Directional Invariance
     • Used: Sums-of-Products Model:
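
The model formula on this slide was an image and did not survive the transcript. The LaTeX below is a hedged reconstruction of the general sums-of-products form, written so that the special cases on the next slide fall out of it; B, the index sets K and I_k, and the scale terms S_{k,i} follow van Santen's published formulation and are not quoted from the slide itself.

```latex
% Hedged reconstruction of the Sums-of-Products model.
% B = predicted band energy, f_0,...,f_n = linguistic control factors,
% K = set of product terms, I_k = factors entering term k,
% S_{k,i}(f_i) = scale for the level of factor i in term k.
B(f_0,\dots,f_n) \;=\; \sum_{k \in K} \; \prod_{i \in I_k} S_{k,i}(f_i)
```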

  10. 3. ANALYSIS METHODS
     • Special cases:
       • Multiplicative model: K = {1}, I_1 = {0, …, n}
       • Additive model: K = {0, …, n}, I_i = {i}
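
Written out under the reconstruction above (again hedged, since the slide's own formulas were images), the two special cases reduce to a pure product and a pure sum over the factors:

```latex
% Multiplicative special case: K = {1}, I_1 = {0,...,n}
B(f_0,\dots,f_n) \;=\; \prod_{i=0}^{n} S_{1,i}(f_i)
% Additive special case: K = {0,...,n}, I_k = {k}, one single-factor term per factor
B(f_0,\dots,f_n) \;=\; \sum_{k=0}^{n} S_{k,k}(f_k)
```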

  11. 3. ANALYSIS METHODS
     • Used the additive model
     • Note: parameter estimates are:
       • estimates of marginal means …
       • … in a balanced design:
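
The supporting formula on this slide was also lost. A hedged reconstruction of the point being made: under the additive model, in a balanced design the marginal mean at any level of a factor equals that factor's parameter plus a constant that does not depend on the level, so the fitted parameters are, up to re-centering, the marginal means. Here S_i abbreviates the additive model's single-factor term for factor i.

```latex
% Marginal mean over a balanced design with factor i held at level a.
% |F_j| = number of levels of factor j; \bar{S}_j = mean of S_j over its levels.
\bar{B}_{f_i = a}
  \;=\; \frac{1}{\prod_{j \ne i} |F_j|}
        \sum_{\substack{f_j \in F_j \\ j \ne i}} B(f_0,\dots,f_n)\Big|_{f_i = a}
  \;=\; S_i(a) \;+\; \sum_{j \ne i} \bar{S}_j
```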

  12. 3. ANALYSIS METHODS
     • Pitch correction:
       • Confounding with F0: show both <B0, B1, B2, B3, B4> and <B0 + B1, B2, B3, B4>

  13. 4. RESULTS: (A) POSITIONAL EFFECTS
      [Figure: 5 bands, not pitch-corrected. Solid: right position, dashed: left position. Y-axis: corrected mean]

  14. 4. RESULTS: (A) POSITIONAL EFFECTS
      [Figure: 5 bands, pitch-corrected]

  15. 4. RESULTS: (A) POSITIONAL EFFECTS
      [Figure: 4 bands, not pitch-corrected]

  16. 4. RESULTS: (A) POSITIONAL EFFECTS
      [Figure: 4 bands, pitch-corrected]

  17. 4. RESULTS: (B) STRESS/ACCENT EFFECTS
      [Figure: 5 bands, not pitch-corrected. Solid: stressed syllable, dashed: unstressed. Y-axis: corrected mean]

  18. 4. RESULTS: (B) STRESS/ACCENT EFFECTS
      [Figure: 5 bands, pitch-corrected]

  19. 4. RESULTS: (B) STRESS/ACCENT EFFECTS
      [Figure: 4 bands, not pitch-corrected]

  20. 4. RESULTS: (B) STRESS/ACCENT EFFECTS
      [Figure: 4 bands, pitch-corrected]

  21. 4. RESULTS: (C) TILT EFFECTS
      [Figure]

  22. 5. SYNTHESIS
     • Use the ABS/OLA sinusoidal model:
       • s[n] = sum of overlapped short-time signal frames s_k[n]
       • s_k[n] = sum of quasi-harmonic sinusoidal components: s_k[n] = Σ_l A_{k,l} cos(ω_{k,l} n + φ_{k,l})
     • Each frame of a unit is represented by a set of quasi-harmonic sinusoidal parameters;
     • Given the desired F0 contour, a pitch shift is applied to the unit's sinusoidal parameters to obtain the target parameters A_{k,l};
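
To make the two sums on this slide concrete, here is a minimal sketch of quasi-harmonic frame synthesis plus overlap-add. The Hann window, frame length, and hop size are illustrative assumptions, not part of the ABS/OLA model as quoted.

```python
# Sketch: s_k[n] as a sum of quasi-harmonic sinusoids, and s[n] as the
# overlap-add of windowed short-time frames.
import numpy as np

def synth_frame(amps, freqs_rad, phases, frame_len):
    """s_k[n] = sum_l A_{k,l} * cos(w_{k,l} * n + phi_{k,l})."""
    n = np.arange(frame_len)
    return sum(a * np.cos(w * n + p) for a, w, p in zip(amps, freqs_rad, phases))

def overlap_add(frames, hop):
    """s[n]: overlap-add of Hann-windowed short-time frames of equal length."""
    frame_len = len(frames[0])
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    win = np.hanning(frame_len)
    for k, frame in enumerate(frames):
        out[k * hop : k * hop + frame_len] += win * frame
    return out
```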

  23. 5. SYNTHESIS
     • Considering the differences in prosodic factors between the original and the target unit, compute the band differences;
     • Transform the band differences into weights applied to the sinusoidal parameters: the weight for band i is applied when the j'th harmonic is located in the i'th band;
     • Spectral smoothing across unit boundaries.
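
The weighting formula on this slide was an image and is not in the transcript. The sketch below assumes the natural reading: each per-band dB difference becomes a linear amplitude gain applied to every harmonic whose frequency falls in that band. Band edges are taken from slide 5; the function name and interface are hypothetical.

```python
# Sketch: map 5 per-band dB differences to per-harmonic amplitude weights.
import numpy as np

BAND_EDGES_HZ = [100, 300, 800, 2500, 3500, np.inf]   # B0..B4, as on slide 5

def harmonic_weights(harmonic_freqs_hz, band_diff_db):
    """Weight for harmonic j = gain of the band containing its frequency."""
    weights = np.ones(len(harmonic_freqs_hz))
    for j, f in enumerate(harmonic_freqs_hz):
        for i in range(5):
            if BAND_EDGES_HZ[i] <= f < BAND_EDGES_HZ[i + 1]:
                weights[j] = 10.0 ** (band_diff_db[i] / 20.0)   # dB -> amplitude gain
                break
    return weights

# Example: new_amps = harmonic_weights(freqs_hz, delta_db) * old_amps
```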

  24. 5. SYNTHESIS
      [Figure: 5-band modification example, [i:]]

  25. CONCLUSIONS
     • Described simple methods for predicting and synthesizing spectral balance
     • But: spectral balance is only one "non-standard acoustic correlate"
     • Others that remain to be addressed:
       • Spectral dynamics
       • Phase
