Explore the evolution of SSML through STML, JSML, SABLE, and W3C SSML, focusing on elements, extensions, prosody, and expressive synthesis.
Speech Synthesis Markup Language: Aim at Extension
Dr. Jianhua Tao
National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences
Brief Introduction to the Evolution of SSML
• The original SSML (not W3C SSML)
• STML
• JSML
• SABLE
• W3C SSML
• …
The original SSML
• Mark phrase boundaries
• Emphasize words
• Specify pronunciations
• Include other sound files
STML
• Developed by Edinburgh and Bell Labs
• Based on the original SSML
• Aimed to give listeners the same basic impression on different systems, rather than to sound identical on each
JSML
• Developed by Sun
• XML based
• Includes
  • Elements to mark paragraphs and sentences
  • Elements to control pronunciation
  • Elements to represent markers
SABLE
• Developed by Edinburgh and Bell Labs
• Based on STML and JSML
• The stated aims
  • Synthesizer control
  • Text structure
  • Speech pronunciation
  • Multilinguality
  • Ease of use
  • Portability
  • Extensibility
W3C SSML
• Key design criteria
  • Consistency
  • Interoperability
  • Generality
  • Internationalization
  • Generation and Readability
  • Implementable
What we want from a markup language
• Controlling
• Sharing
• Extension to multimedia
Which level should we focus on?
• Text analysis module
• Prosody module
• Acoustic module
[Diagram: two systems, Sys1 and Sys2, each with text-analysis, prosody-analysis, and acoustic modules; the labels "Data Structure1", "Data Structure2", "Sharing", and "SSML" sit between them.]
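The sharing idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `sys1_export` and `sys2_import`, the per-word break flags, and the bare `<speak>`/`<break/>` subset are all hypothetical and not taken from the slides or from any particular synthesizer.

```python
import xml.etree.ElementTree as ET
from typing import List


def sys1_export(words: List[str], break_after: List[bool]) -> str:
    """Serialize Sys1's (hypothetical) prosody result: one break flag per word."""
    speak = ET.Element("speak")
    speak.text = ""
    last_break = None  # most recent <break/>; following text goes into its tail
    for word, brk in zip(words, break_after):
        chunk = word + " "
        if last_break is None:
            speak.text += chunk
        else:
            last_break.tail = (last_break.tail or "") + chunk
        if brk:
            last_break = ET.SubElement(speak, "break")
    return ET.tostring(speak, encoding="unicode")


def sys2_import(ssml: str) -> List[str]:
    """Recover phrase chunks on the Sys2 side by splitting at <break/>."""
    root = ET.fromstring(ssml)
    chunks = [root.text or ""] + [el.tail or "" for el in root]
    return [c.strip() for c in chunks if c.strip()]


markup = sys1_export(["good", "morning", "everyone"], [False, True, False])
phrases = sys2_import(markup)
```

The point of the sketch is that Sys2 never sees Sys1's internal data structure; it only reads the markup, which is what makes the intermediate result shareable.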
Text level for Mandarin
• Word boundary
• Pronunciation with tone
• POS
• Dialect?
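As a rough illustration of what text-level markup for Mandarin might carry, the sketch below attaches a word boundary, a pinyin pronunciation with tone digits, and a POS tag to each word. The element name `w`, the attribute names `pron` and `pos`, and the tag set are assumptions made for this example, not part of any proposal in the slides.

```python
import xml.etree.ElementTree as ET

# Hand-written annotations for illustration: (word, pinyin with tone digits, POS).
TOKENS = [
    ("我", "wo3", "pronoun"),
    ("学习", "xue2 xi2", "verb"),
    ("中文", "zhong1 wen2", "noun"),
]


def annotate(tokens):
    """Wrap each word in a hypothetical <w> element inside a sentence <s>."""
    sentence = ET.Element("s")
    for text, pinyin, pos in tokens:
        word = ET.SubElement(sentence, "w", {"pron": pinyin, "pos": pos})
        word.text = text
    return sentence


sentence = annotate(TOKENS)
print(ET.tostring(sentence, encoding="unicode"))
```

Each `<w>` element makes the word boundary explicit, which matters for Mandarin because the written text has no spaces between words.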
Prosody level for Mandarin
• Tone sandhi
• Rhythm?
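The best-known case of Mandarin tone sandhi is the third-tone rule: a third tone directly followed by another third tone is spoken as a second tone. The function below is a deliberately naive sketch of that one rule, applied pairwise to the underlying tones; the name `third_tone_sandhi` is hypothetical, and real systems also take word and phrase grouping into account for longer runs of third tones.

```python
from typing import List


def third_tone_sandhi(tones: List[int]) -> List[int]:
    """Return surface tones: a 3 directly before an underlying 3 becomes 2."""
    surface = list(tones)
    for i in range(len(tones) - 1):
        # Compare underlying tones so earlier changes do not affect later pairs.
        if tones[i] == 3 and tones[i + 1] == 3:
            surface[i] = 2
    return surface


ni_hao = third_tone_sandhi([3, 3])  # underlying tones of "ni3 hao3"
```

Whether the markup should carry the underlying tones, the surface tones, or both is exactly the kind of question a prosody-level extension would need to settle.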
Extensions to expressive synthesis
• Emotion and Style
• Others
Current elements related to prosody and style in SSML
• 3.2.1 "voice" Element
• 3.2.2 "emphasis" Element
• 3.2.3 "break" Element
• 3.2.4 "prosody" Element
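For orientation, here is a small W3C SSML 1.0 fragment that uses all four elements, together with a Python check that it is well-formed and a listing of the elements it contains. The sentence content and the helper name `list_elements` are made up for the example; the element and attribute names are those of SSML 1.0.

```python
import xml.etree.ElementTree as ET

SSML_NS = "{http://www.w3.org/2001/10/synthesis}"

SSML = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    This is <emphasis level="strong">important</emphasis>.
    <break time="500ms"/>
    <prosody rate="slow" pitch="high">Please listen carefully.</prosody>
  </voice>
</speak>"""


def list_elements(ssml: str):
    """Parse the document and return element names in document order."""
    root = ET.fromstring(ssml)  # raises ParseError if not well-formed
    return [el.tag.replace(SSML_NS, "") for el in root.iter()]


elements = list_elements(SSML)
```

Note that all four elements control how something is spoken; none of them names an emotion or a speaking style directly.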
Emotion and Style
• Emotion
  • Anger, happiness, surprise, sadness, fear, …
  • Depends on the speaker's psychological and physical state
  • Local effects on prosody
• Style
  • News, comments, …
  • Depends on the semantics of the sentences
  • Global effects on prosody
Personalized Voice
• Element: voice
  • "gender":
  • "age":
  • "name":
  • "variant":
• Sample:
  • 他说:<voice gender="male">“什么意思?”</voice> (He said: "What do you mean?")
  • 她回答:<voice gender="female">“没什么意思。”</voice> (She answered: "Nothing in particular.")
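A consumer of this markup has to separate the narration from the quoted speech and pick a voice for each part. The sketch below does that for the sample, wrapped in a `<speak>` root so it parses as one document. The function name `voice_segments` and the use of `None` for narration without a `voice` element are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    "<speak>"
    '他说:<voice gender="male">“什么意思?”</voice>'
    '她回答:<voice gender="female">“没什么意思。”</voice>'
    "</speak>"
)


def voice_segments(ssml: str):
    """Return (gender, text) pairs; gender is None outside any <voice>."""
    root = ET.fromstring(ssml)
    segments = []
    if root.text and root.text.strip():
        segments.append((None, root.text.strip()))
    for el in root:
        if el.tag == "voice":
            segments.append((el.get("gender"), (el.text or "").strip()))
        if el.tail and el.tail.strip():
            segments.append((None, el.tail.strip()))
    return segments


segments = voice_segments(SAMPLE)
```

The attribute values must use straight quotation marks for the markup to parse; typographic quotes are fine inside the spoken text itself.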
Extension?
• To make it more expressive
• Background music
• VTTS
• Combined with a talking head and other media information
• …
• At present, the only relevant element we can see is "mark"
Element: <Structure>
• Level: 0-..; paragraph, phrase,
• POS:
• <Structure:level=paragraph>
• <Structure:level=sentence>
• <Structure:level=phrase>
• <Structure:level=word>
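The notation on this slide is shorthand and is not well-formed XML as written. One possible well-formed rendering, purely as an assumption for illustration, is a nested `structure` element with a `level` attribute. The sketch below builds such a tree from nested Python lists; the lowercase element name, the attribute syntax, and the function name `build` are not from the slides.

```python
import xml.etree.ElementTree as ET


def build(paragraph):
    """paragraph: list of sentences; sentence: list of phrases; phrase: list of words."""
    p = ET.Element("structure", level="paragraph")
    for sentence in paragraph:
        s = ET.SubElement(p, "structure", level="sentence")
        for phrase in sentence:
            ph = ET.SubElement(s, "structure", level="phrase")
            for word in phrase:
                w = ET.SubElement(ph, "structure", level="word")
                w.text = word
    return p


tree = build([[["the", "cat"], ["sat"]]])
levels = [el.get("level") for el in tree.iter("structure")]
print(ET.tostring(tree, encoding="unicode"))
```

A single element with a level attribute keeps the vocabulary small, at the cost of deeper nesting than separate paragraph, sentence, and phrase elements would need.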