Speech Synthesis Markup Language: Aiming at Extension

Dr. Jianhua Tao
National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences

A Brief Introduction to the Evolution of SSML
• The original SSML (not W3C SSML)
• STML
• JSML
• SABLE
• W3C SSML
• …

The original SSML
• Marks phrase boundaries
• Emphasizes words
• Specifies pronunciations
• Includes other sound files

STML
• Developed by Edinburgh and Bell Labs
• Based on the original SSML
• Aimed at giving listeners the same basic impression on different systems, rather than identical-sounding output

JSML
• Developed by Sun
• XML-based
• Includes:
  • Elements to mark paragraphs and sentences
  • Elements to control pronunciation
  • Elements to represent markers

SABLE
• Developed by Edinburgh and Bell Labs
• Based on STML and JSML
• Stated aims:
  • Synthesizer control
  • Text structure
  • Speech pronunciation
  • Multilinguality
  • Ease of use
  • Portability
  • Extensibility

W3C SSML
• Key design criteria:
  • Consistency
  • Interoperability
  • Generality
  • Internationalization
  • Generation and readability
  • Implementable

What we want from a markup language
• Control
• Sharing
• Extension to multimedia

Which level should we focus on?
• Text-analysis module
• Prosody module
• Acoustic module

Figure: Two synthesis systems, Sys1 and Sys2, each with text-analysis, prosody-analysis, and acoustic modules over its own internal data structure (Data Structure1, Data Structure2); the two systems share data with each other through SSML.
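As a minimal sketch of the sharing idea, assuming Python's standard xml.etree.ElementTree and the SSML 1.1 <w> (word token) element: Sys1 writes its word-segmentation result as SSML, and Sys2 reads it back into a different internal structure. The function names sys1_export and sys2_import are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def sys1_export(words):
    """Sys1 (hypothetical): write its word-segmentation result as SSML 1.1 <w> tokens."""
    speak = ET.Element("speak", {
        "version": "1.1",
        "xmlns": SSML_NS,
        "xml:lang": "zh-CN",
    })
    for word in words:
        ET.SubElement(speak, "w").text = word
    return ET.tostring(speak, encoding="unicode")


def sys2_import(ssml):
    """Sys2 (hypothetical): rebuild its own structure, here a dict of position -> word."""
    root = ET.fromstring(ssml)
    # Parsed tags carry the SSML namespace, e.g. "{http://...}w".
    tokens = root.findall(f"{{{SSML_NS}}}w")
    return {i: w.text for i, w in enumerate(tokens)}


shared = sys1_export(["我们", "喜欢", "音乐"])
print(sys2_import(shared))  # {0: '我们', 1: '喜欢', 2: '音乐'}
```

Only the SSML text crosses the boundary, so neither system needs to know the other's internal data structure.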

Text level for Mandarin
• Word boundary
• Pronunciation with tone
• POS (part of speech)
• Dialect?
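The three text-level items could be carried in standard markup roughly as follows. This is a sketch under assumptions: <w> marks the word boundary, tone-numbered pinyin goes in a <phoneme> element with a hypothetical vendor alphabet "x-pinyin", and the POS tag goes into the SSML 1.1 role attribute as a QName in a hypothetical example.org namespace. Whether a given engine accepts <phoneme> nested inside <w> is also an assumption, and the tags r and v are only examples.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
# Hypothetical namespace for POS tags; SSML itself defines no tag set.
POS_NS = "http://example.org/mandarin-pos"


def annotate(tokens):
    """tokens: list of (word, tone-numbered pinyin, POS tag)."""
    speak = ET.Element("speak", {
        "version": "1.1",
        "xmlns": SSML_NS,
        "xmlns:pos": POS_NS,
        "xml:lang": "zh-CN",
    })
    for word, pinyin, pos in tokens:
        # <w> fixes the word boundary; role carries the POS as a QName.
        w = ET.SubElement(speak, "w", {"role": f"pos:{pos}"})
        # "x-pinyin" is a hypothetical vendor-defined alphabet name.
        ph = ET.SubElement(w, "phoneme", {"alphabet": "x-pinyin", "ph": pinyin})
        ph.text = word
    return ET.tostring(speak, encoding="unicode")


print(annotate([("我们", "wo3 men5", "r"), ("喜欢", "xi3 huan1", "v")]))
```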

Prosody level for Mandarin
• Tone sandhi
• Rhythm?
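Tone sandhi is one place where prosody-level markup has to agree with the text level. As an illustration, here is the best-known Mandarin rule, third-tone sandhi (a third tone before another third tone is realized as a second tone, as in ni3 hao3 → ni2 hao3), applied naively to tone-numbered pinyin. The function names are hypothetical, tone 5 is assumed to mean the neutral tone, and the rule ignores the prosodic grouping that decides real sandhi in longer runs of third tones.

```python
import re

# One tone-numbered pinyin syllable, e.g. "hao3"; tone 5 = neutral tone.
SYLLABLE = re.compile(r"([a-zü]+)([1-5])")


def parse(pinyin):
    """Split 'ni3 hao3' into [('ni', 3), ('hao', 3)]."""
    out = []
    for syl in pinyin.split():
        m = SYLLABLE.fullmatch(syl)
        if m is None:
            raise ValueError(f"not tone-numbered pinyin: {syl!r}")
        out.append((m.group(1), int(m.group(2))))
    return out


def third_tone_sandhi(pinyin):
    """Naive rule: a 3rd tone followed by an underlying 3rd tone becomes a 2nd tone.

    Each syllable is checked against the original tones, so 3-3-3 -> 2-2-3.
    """
    syls = parse(pinyin)
    tones = [t for _, t in syls]
    surface = [
        2 if t == 3 and i + 1 < len(tones) and tones[i + 1] == 3 else t
        for i, t in enumerate(tones)
    ]
    return " ".join(f"{s}{t}" for (s, _), t in zip(syls, surface))


print(third_tone_sandhi("ni3 hao3"))  # ni2 hao3
```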

Extensions for expressive synthesis
• Emotion and style
• Others

Current elements related to prosody and style in SSML
• 3.2.1 "voice" element
• 3.2.2 "emphasis" element
• 3.2.3 "break" element
• 3.2.4 "prosody" element
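For reference, a small document combining the four elements listed above, built with Python's xml.etree.ElementTree. The element and attribute names (gender, level, time, rate, pitch) follow SSML 1.0; the text and the particular values are arbitrary examples.

```python
import xml.etree.ElementTree as ET


def build_ssml():
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": "en-US",
    })
    # voice: select a voice by attributes such as gender.
    voice = ET.SubElement(speak, "voice", {"gender": "female"})
    voice.text = "This is "
    # emphasis: stress one word.
    emph = ET.SubElement(voice, "emphasis", {"level": "strong"})
    emph.text = "important"
    emph.tail = "."
    # break: insert an explicit pause.
    ET.SubElement(voice, "break", {"time": "500ms"})
    # prosody: change rate and pitch for a span of text.
    pros = ET.SubElement(voice, "prosody", {"rate": "slow", "pitch": "high"})
    pros.text = "Please listen carefully."
    return ET.tostring(speak, encoding="unicode")


print(build_ssml())
```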

Emotion and style
• Emotion
  • Anger, happiness, surprise, sadness, fear, …
  • Depends on the speaker's psychological and physical state
  • Local effects on prosody
• Style
  • News, commentary, …
  • Depends on the semantics of sentences
  • Global effects on prosody
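One way to express the local/global distinction with existing SSML is to nest prosody elements: the style wraps the whole utterance, and each emotion wraps only the sentence it affects. This is a sketch, not a proposal from the slides; the style and emotion names map to placeholder prosody values that are not tuned or measured, and render is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Placeholder values for illustration only; they are not tuned or measured.
STYLE_PROSODY = {
    "news": {"rate": "medium"},
    "commentary": {"rate": "slow"},
}
EMOTION_PROSODY = {
    "anger": {"pitch": "high", "volume": "loud"},
    "sadness": {"pitch": "low", "rate": "slow"},
}


def render(style, sentences):
    """sentences: list of (text, emotion or None).

    The style becomes one outer <prosody> over the whole utterance (global);
    an emotion becomes an inner <prosody> around a single sentence (local).
    """
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": "en-US",
    })
    outer = ET.SubElement(speak, "prosody", STYLE_PROSODY[style])
    for text, emotion in sentences:
        s = ET.SubElement(outer, "s")
        if emotion is None:
            s.text = text
        else:
            ET.SubElement(s, "prosody", EMOTION_PROSODY[emotion]).text = text
    return ET.tostring(speak, encoding="unicode")


print(render("news", [("The match ended.", None), ("Our team lost.", "sadness")]))
```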

Personalized voice
• Element: voice
  • "gender"
  • "age"
  • "name"
  • "variant"
• Example:
  • 他说：<voice gender="male">“什么意思？”</voice> (He said: "What does that mean?")
  • 她回答：<voice gender="female">“没什么意思。”</voice> (She answered: "It doesn't mean anything.")
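Attribute values in such markup must use straight ASCII quotes to be well-formed XML. A sketch that generates the same dialogue, assuming the narration stays in the default voice and each quotation gets its own voice element inside an s element; the dialogue function is hypothetical.

```python
import xml.etree.ElementTree as ET


def dialogue(turns):
    """turns: list of (narration, gender, quoted speech)."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": "zh-CN",
    })
    for narration, gender, quote in turns:
        s = ET.SubElement(speak, "s")
        s.text = narration  # read in the narrator's default voice
        ET.SubElement(s, "voice", {"gender": gender}).text = quote
    return ET.tostring(speak, encoding="unicode")


print(dialogue([
    ("他说：", "male", "“什么意思？”"),
    ("她回答：", "female", "“没什么意思。”"),
]))
```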

Extension?
• To make synthesis more expressive:
  • Background music
  • VTTS
  • Combination with a talking head and other media information
  • …
• So far, the only relevant element we can find is "mark"
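The mark element (<mark name="..."/>) is a point in the text that a synthesis engine can report when it reaches it, which is the natural hook for cueing other media. The sketch below only computes the character offset of each mark from the markup; the mark names ("smile", "end") are hypothetical cues for a talking head or background-music track, and a real system would take timing from the engine's mark events rather than from text offsets.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def mark_offsets(ssml):
    """Return [(mark_name, char_offset)], counting the text that precedes each mark."""
    root = ET.fromstring(ssml)
    marks, pos = [], 0

    def walk(elem):
        nonlocal pos
        if elem.tag == f"{{{SSML_NS}}}mark":
            marks.append((elem.get("name"), pos))
        if elem.text:
            pos += len(elem.text)
        for child in elem:
            walk(child)
            if child.tail:  # text after a child belongs to the parent
                pos += len(child.tail)

    walk(root)
    return marks


doc = ('<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
       'xml:lang="en-US">Hello <mark name="smile"/>world<mark name="end"/></speak>')
print(mark_offsets(doc))  # [('smile', 6), ('end', 11)]
```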

Thanks!

Element: <Structure>
• Level: 0-..; paragraph, phrase, …
• POS:
• <Structure level="paragraph">
• <Structure level="sentence">
• <Structure level="phrase">
• <Structure level="word">
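A minimal sketch of how the proposed <Structure> element might be generated from a nested text analysis. The element is the author's proposal, not part of W3C SSML; the sketch assumes the level is an ordinary XML attribute, uses a POS attribute on word-level nodes, and takes the four level names from the slide's examples. The to_structure function and the POS tags r, v, n are hypothetical.

```python
import xml.etree.ElementTree as ET

# Level names taken from the slide's examples (an assumption, not a spec).
LEVELS = ["paragraph", "sentence", "phrase", "word"]


def to_structure(node, depth=0):
    """node: a (word, POS) tuple at word level, otherwise a list of child nodes."""
    level = LEVELS[depth]
    is_word = isinstance(node, tuple)
    if is_word != (level == "word"):
        raise ValueError(f"nesting does not match level {level!r}")
    elem = ET.Element("Structure", {"level": level})
    if is_word:
        text, pos = node
        elem.text = text
        elem.set("POS", pos)
    else:
        for child in node:
            elem.append(to_structure(child, depth + 1))
    return elem


# paragraph -> one sentence -> two phrases -> words
paragraph = [[[("我们", "r"), ("喜欢", "v")], [("音乐", "n")]]]
print(ET.tostring(to_structure(paragraph), encoding="unicode"))
```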
