
Speech Synthesis Markup Language: Aim at Extension

Explore the evolution of SSML through STML, JSML, SABLE, and W3C SSML, focusing on elements, extensions, prosody, and expressive synthesis.

tthayer

Presentation Transcript


  1. Speech Synthesis Markup Language: Aim at Extension
     Dr. Jianhua Tao
     National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences

  2. Brief Introduction to the Evolution of SSML
     • The original SSML (not W3C SSML)
     • STML
     • JSML
     • SABLE
     • W3C SSML
     • …

  3. The original SSML
     • Mark phrase boundaries
     • Emphasize words
     • Specify pronunciations
     • Include other sound files

  4. STML
     • Developed by Edinburgh and Bell Labs
     • Based on the original SSML
     • Aimed at giving the same basic impressions to listeners, not sounding identical on different systems

  5. JSML
     • Developed by Sun
     • XML based
     • Includes:
       • Elements to mark paragraphs and sentences
       • Elements to control pronunciation
       • Elements to represent markers

  6. SABLE
     • Developed by Edinburgh and Bell Labs
     • Based on STML and JSML
     • The stated aims:
       • Synthesizer control
       • Text structure
       • Speech pronunciation
       • Multilinguality
       • Ease of use
       • Portability
       • Extensibility

  7. W3C SSML
     • Key design criteria:
       • Consistency
       • Interoperability
       • Generality
       • Internationalization
       • Generation and Readability
       • Implementable

  8. What we want from a markup language
     • Controlling
     • Sharing
     • Extension to multimedia

  9. Which level should we focus on?
     • Text analysis module
     • Prosody module
     • Acoustic module

  10. Sharing
      [Figure: two systems, Sys1 (Data Structure 1) and Sys2 (Data Structure 2), each with text-analysis, prosody-analysis, and acoustic stages, sharing data through SSML.]

  11. Text level for Mandarin
      • Word boundary
      • Pronunciation with tone
      • POS
      • Dialect?

  12. Prosody level for Mandarin
      • Tone sandhi
      • Rhythm?
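As an illustration of what a prosody-level rule such as tone sandhi involves, here is a minimal Python sketch of the basic third-tone rule (a third tone directly before another third tone is realized as a second tone). The function name and the numbered-pinyin input format are assumptions made for this example; they do not come from the slides. Runs of three or more third tones depend on prosodic phrasing in real speech, which this sketch ignores.

```python
def apply_third_tone_sandhi(syllables):
    """Apply the basic third-tone sandhi rule to numbered-pinyin syllables.

    Each syllable ends in its tone digit, e.g. "ni3". A third-tone syllable
    directly followed by a syllable whose citation tone is also third is
    rewritten with tone 2. Prosodic phrasing is deliberately ignored.
    """
    tones = [s[-1] for s in syllables]  # trailing digit is the citation tone
    result = []
    for i, syllable in enumerate(syllables):
        next_is_third = i + 1 < len(syllables) and tones[i + 1] == "3"
        if tones[i] == "3" and next_is_third:
            result.append(syllable[:-1] + "2")
        else:
            result.append(syllable)
    return result


print(apply_third_tone_sandhi(["ni3", "hao3"]))
```

A markup extension for Mandarin would need to decide whether such rules are applied by the synthesizer or encoded explicitly in the document.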

  13. Extensions to expressive synthesis
      • Emotion and Style
      • Others

  14. Current elements related to prosody and style in SSML
      • 3.2.1 "voice" Element
      • 3.2.2 "emphasis" Element
      • 3.2.3 "break" Element
      • 3.2.4 "prosody" Element
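To make the four elements concrete, the following Python sketch assembles a small SSML document that uses each of them once. The helper name `build_demo_ssml`, the sentence text, and the chosen attribute values are illustrative assumptions, not taken from the slides; only the element names and the attributes `gender`, `level`, `time`, `rate`, and `pitch` come from W3C SSML.

```python
import xml.etree.ElementTree as ET

# Clark notation for the predefined xml: namespace, serialized as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_demo_ssml():
    """Return an SSML string using voice, emphasis, break, and prosody."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        XML_LANG: "en-US",
    })
    # <voice>: request a speaker by characteristics.
    voice = ET.SubElement(speak, "voice", {"gender": "female"})
    voice.text = "This is "
    # <emphasis>: stress a word or phrase.
    emphasis = ET.SubElement(voice, "emphasis", {"level": "strong"})
    emphasis.text = "important"
    emphasis.tail = "."
    # <break>: insert a pause of a given duration.
    ET.SubElement(voice, "break", {"time": "500ms"})
    # <prosody>: adjust rate and pitch for the enclosed text.
    prosody = ET.SubElement(voice, "prosody", {"rate": "slow", "pitch": "high"})
    prosody.text = "Please listen carefully."
    return ET.tostring(speak, encoding="unicode")


print(build_demo_ssml())
```

None of these elements names an emotion or a speaking style directly, which is the gap the following slides address.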

  15. Emotion and Style
      • Emotion
        • Anger, happiness, surprise, sadness, fear, …
        • Depends on the speaker's psychological and physical states
        • Local effects on prosody
      • Style
        • News, comments, …
        • Depends on the semantics of sentences
        • Global effects on prosody

  16. Personalized Voice
      • Element: voice
        • "gender":
        • "age":
        • "name":
        • "variant":
      • Sample:
        • 他说:<voice gender="male">“什么意思?”</voice> (He said: "What do you mean?")
        • 她回答:<voice gender="female">“没什么意思。”</voice> (She answered: "Nothing much.")
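The sample above can be read back programmatically. The Python sketch below parses it and lists which voice gender is requested for each quoted utterance. The enclosing `<speak>` root and the function name `voice_segments` are assumptions added so the two sample lines form one well-formed document; the slide shows only the fragments.

```python
import xml.etree.ElementTree as ET

# The two sample lines from the slide, wrapped in an assumed <speak> root.
SAMPLE = (
    '<speak>'
    '他说:<voice gender="male">“什么意思?”</voice>'
    '她回答:<voice gender="female">“没什么意思。”</voice>'
    '</speak>'
)


def voice_segments(ssml):
    """Return (gender, text) pairs for every <voice> element, in order."""
    root = ET.fromstring(ssml)
    return [(voice.get("gender"), voice.text) for voice in root.iter("voice")]


for gender, text in voice_segments(SAMPLE):
    print(gender, text)
```

The narration outside the `<voice>` elements carries no voice request, so a synthesizer would render it with its default voice.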

  17. Extension?
      • To make it more expressive
      • Background music
      • VTTS
      • Combined with talking head and some other media information
      • …
      • We can only see the element "mark"

  18. Thanks!

  19. Element: <Structure>
      • Level: 0-..; paragraph, phrase,
      • POS:
      • <Structure:level=paragraph>
      • <Structure:level=sentence>
      • <Structure:level=phrase>
      • <Structure:level=word>
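The `<Structure:level=...>` notation on this slide is not well-formed XML as written. One possible encoding, shown here purely as an assumption, is a nested `<structure level="...">` element. The Python sketch below wraps nested lists in such elements, one level per nesting depth. The lowercase element name, the `level` attribute syntax, and the helper `wrap` are hypothetical; the POS attribute is omitted because the slide gives no values for it.

```python
import xml.etree.ElementTree as ET

# The four levels listed on the slide, outermost first.
LEVELS = ["paragraph", "sentence", "phrase", "word"]


def wrap(content, depth=0):
    """Wrap nested lists in hypothetical <structure level="..."> elements.

    A list becomes an element whose children are one level deeper;
    a string becomes the text of the element at the current depth.
    """
    element = ET.Element("structure", {"level": LEVELS[depth]})
    if isinstance(content, str):
        element.text = content
    else:
        for child in content:
            element.append(wrap(child, depth + 1))
    return element


# One paragraph containing one sentence with two phrases.
paragraph = [[["hello", "world"], ["again"]]]
print(ET.tostring(wrap(paragraph), encoding="unicode"))
```

Nesting makes the hierarchy explicit, at the cost of more verbose documents than flat boundary markers would give.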
