Speech Synthesis Markup Language: Aim at Extension
Dr. Jianhua Tao
National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences

Brief Introduction to the Evolution of SSML
• The original SSML (not W3C SSML)
• STML
• JSML
• SABLE
• W3C SSML
• …

The original SSML
• Mark phrase boundaries
• Emphasize words
• Specify pronunciations
• Include other sound files

STML
• Developed by Edinburgh and Bell Labs
• Based on the original SSML
• Aimed to give listeners the same basic impression on different systems, not to sound identical on each

JSML
• Developed by Sun
• XML based
• Includes:
    • Elements to mark paragraphs and sentences
    • Elements to control pronunciation
    • Elements to represent markers

SABLE
• Developed by Edinburgh and Bell Labs
• Based on STML and JSML
• The stated aims:
    • Synthesizer control
    • Text structure
    • Speech pronunciation
    • Multilinguality
    • Ease of use
    • Portability
    • Extensibility

W3C SSML
• Key design criteria:
    • Consistency
    • Interoperability
    • Generality
    • Internationalization
    • Generation and Readability
    • Implementable

What we want from a markup language
• Controlling
• Sharing
• Extension to multimedia

Which level should we focus on?
• Text analysis module
• Prosody module
• Acoustic module

[Figure: Sharing. Sys1 (Data Structure 1) and Sys2 (Data Structure 2) each consist of text-analysis, prosody-analysis, and acoustic modules; the two systems are linked through SSML.]

Text level for Mandarin
• Word boundary
• Pronunciation with tone
• POS
• Dialect?

Prosody level for Mandarin
• Tone sandhi
• Rhythm?

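As a concrete instance of the tone sandhi item above: in Mandarin, a third tone directly followed by another third tone is usually realized as a second tone. The sketch below applies only that pairwise rule to syllables written in numbered pinyin. The function name and the input format are assumptions made for illustration; they do not come from the slides, and a real system would also consult word and phrase structure when handling longer runs of third tones.

```python
def third_tone_sandhi(syllables):
    """Apply the pairwise third-tone rule to numbered-pinyin syllables.

    Each syllable is assumed to end in its tone digit, e.g. "ni3".
    A tone-3 syllable becomes tone 2 when the next syllable is,
    in its original (citation) form, also tone 3.
    """
    tones = [int(s[-1]) for s in syllables]
    result = []
    for i, syllable in enumerate(syllables):
        next_is_third = i + 1 < len(syllables) and tones[i + 1] == 3
        if tones[i] == 3 and next_is_third:
            result.append(syllable[:-1] + "2")
        else:
            result.append(syllable)
    return result


print(third_tone_sandhi(["ni3", "hao3"]))   # third tone before third tone
print(third_tone_sandhi(["wo3", "shi4"]))   # no change expected
```

A markup-level annotation for tone sandhi would let one system pass the already-adjusted tones to another instead of each system re-deriving them.
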
Extensions to expressive synthesis
• Emotion and Style
• Others

Current elements related to prosody and style in SSML
• 3.2.1 "voice" Element
• 3.2.2 "emphasis" Element
• 3.2.3 "break" Element
• 3.2.4 "prosody" Element

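A minimal sketch of how the four elements listed above might appear together in one SSML document, built here with Python's standard `xml.etree.ElementTree`. The sentence text and the attribute values (`gender="female"`, `level="strong"`, `time="500ms"`, `rate="slow"`, `pitch="high"`) are illustrative choices, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML 1.0 recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_demo_ssml() -> str:
    """Return a small SSML document using voice, emphasis, break and prosody."""
    speak = ET.Element(
        "speak",
        {"version": "1.0", "xmlns": SSML_NS, "xml:lang": "en-US"},
    )
    # voice: selects the speaking voice for the enclosed text.
    voice = ET.SubElement(speak, "voice", {"gender": "female"})
    voice.text = "This is "
    # emphasis: marks the enclosed word as stressed.
    emphasis = ET.SubElement(voice, "emphasis", {"level": "strong"})
    emphasis.text = "very"
    emphasis.tail = " important."
    # break: inserts a pause of the given duration.
    ET.SubElement(voice, "break", {"time": "500ms"})
    # prosody: adjusts rate and pitch for the enclosed text.
    prosody = ET.SubElement(voice, "prosody", {"rate": "slow", "pitch": "high"})
    prosody.text = "Please listen carefully."
    return ET.tostring(speak, encoding="unicode")


ssml = build_demo_ssml()
print(ssml)
```

All four elements control how text is spoken; none of them names an emotion or a speaking style directly, which is the gap the following slides discuss.
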
Emotion and Style
• Emotion
    • Anger, happiness, surprise, sadness, fear, …
    • Depends on the speaker’s psychological and physical states
    • Local effects on prosody
• Style
    • News, comments, …
    • Depends on the semantics of sentences
    • Global effects on prosody

Personalized Voice
• Element: voice
    • "gender":
    • "age":
    • "name":
    • "variant":
• Sample:
    • 他说:<voice gender="male">“什么意思?”</voice> (He said: "What do you mean?")
    • 她回答:<voice gender="female">“没什么意思。”</voice> (She answered: "Nothing in particular.")

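To illustrate how a synthesizer front end might consume the sample above, the following sketch splits the text into (gender, text) segments. It assumes the two sample lines are wrapped in a single `<speak>` root so that they parse as XML; the function name `voice_segments` and the `"neutral"` default for text outside any `voice` element are hypothetical choices for this example.

```python
import xml.etree.ElementTree as ET

# The slide's sample, wrapped in a <speak> root (an assumption) so it parses.
SAMPLE = (
    "<speak>"
    '他说:<voice gender="male">“什么意思?”</voice>'
    '她回答:<voice gender="female">“没什么意思。”</voice>'
    "</speak>"
)


def voice_segments(ssml: str, default_gender: str = "neutral"):
    """Split top-level SSML content into (gender, text) pairs in reading order."""
    root = ET.fromstring(ssml)
    segments = []
    # Text before the first child element belongs to the default voice.
    if root.text and root.text.strip():
        segments.append((default_gender, root.text.strip()))
    for child in root:
        if child.tag == "voice":
            gender = child.get("gender", default_gender)
        else:
            gender = default_gender
        text = "".join(child.itertext()).strip()
        if text:
            segments.append((gender, text))
        # Text following the element (its tail) reverts to the default voice.
        if child.tail and child.tail.strip():
            segments.append((default_gender, child.tail.strip()))
    return segments


for gender, text in voice_segments(SAMPLE):
    print(gender, text)
```

The narration ("他说:", "她回答:") falls to the default voice, while each quoted utterance is routed to the voice requested by its `gender` attribute.
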
Extension?
• To make it more expressive
    • Background music
    • VTTS
    • Combined with a talking head and other media information
    • …
• For this, we can currently see only the "mark" element

Element: <Structure>
• Level: 0-..; paragraph, phrase, …
• POS:
• <Structure:level=paragraph>
• <Structure:level=sentence>
• <Structure:level=phrase>
• <Structure:level=word>
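The `<Structure:level=...>` notation above is shorthand rather than well-formed XML. As one possible reading, the sketch below renders the proposal with ordinary attributes (`level` and `pos`) and checks that levels nest from paragraph down to word. The attribute spelling, the nesting rule, the helper names, and the POS labels in the example are all assumptions for illustration; the slides do not define them.

```python
import xml.etree.ElementTree as ET

# Levels from the slide, ordered from outermost to innermost (assumed order).
LEVELS = ["paragraph", "sentence", "phrase", "word"]


def structure(level, *children, text="", pos=""):
    """Create a hypothetical <Structure level="..." pos="..."> element."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    attrs = {"level": level}
    if pos:
        attrs["pos"] = pos
    element = ET.Element("Structure", attrs)
    element.text = text or None
    element.extend(children)
    return element


def check_nesting(element, parent_rank=-1):
    """Raise ValueError unless every child sits at a deeper level than its parent."""
    rank = LEVELS.index(element.get("level"))
    if rank <= parent_rank:
        raise ValueError(f"level '{element.get('level')}' is nested too deep")
    for child in element:
        check_nesting(child, rank)


# "今天天气很好" (The weather is nice today), segmented into two phrases.
tree = structure(
    "paragraph",
    structure(
        "sentence",
        structure(
            "phrase",
            structure("word", text="今天", pos="noun"),
            structure("word", text="天气", pos="noun"),
        ),
        structure(
            "phrase",
            structure("word", text="很", pos="adverb"),
            structure("word", text="好", pos="adjective"),
        ),
    ),
)
check_nesting(tree)
print(ET.tostring(tree, encoding="unicode"))
```

Encoding word boundaries and POS this way would let a second system reuse the first system's text analysis, which is the sharing goal stated earlier in the talk.
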