An Introduction to S3ML


Presentation Transcript

  1. An Introduction to S3ML Beijing InfoQuick SinoVoice Speech Technology Corp. CHEN Ming, LV Shinan, LI Xiulin

  2. Outline • Background • PinYin Support • <say-as> Definition • Domain Support • Conclusion

  3. Background • SSML • Speech Synthesis Markup Language • http://www.w3.org/TR/speech-synthesis/ • Now a W3C Recommendation • SinoVoice • Well-known speech technology and service provider • Leading Chinese TTS technology and products • More than 1,000 real systems deployed

  4. Background • S3ML (SinoVoice SSML) • Introduced with the launch of jTTS 4.0 in March 2004 • Based on the SSML specification • Defines some extensions aimed at Chinese TTS • Specifies the details of some elements that SSML does not define precisely • Provides maximum compatibility with the latest SSML version

  5. PinYin Support • PinYin • Phoneme annotation method for Chinese characters • <phoneme> in SSML • The phoneme element provides a phonemic/phonetic pronunciation for the contained text. • Two attributes: alphabet and ph

  6. PinYin Support • alphabet • The alphabet attribute is an optional attribute that specifies the phonemic/phonetic alphabet. • Use ‘py’ as the value of ‘alphabet’ to specify that PinYin is used • ph • The ph attribute is a required attribute that specifies the phoneme/phone string. • Use a PinYin string as the value of ‘ph’

  7. PinYin Support • Example • <phoneme alphabet="py" ph="zha1">查</phoneme>良镛 • <phoneme alphabet="py" ph="zha1 liang2yong1">查良镛</phoneme>先生 • More about the PinYin string • Conforms to the “Chinese Mandarin PinYin Specification” • A series of PinYin syllables for several characters • Tone information • 1-4: high level, rising, dipping, and falling tones • 0, 5: neutral (light) tone
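The examples above suggest that each syllable in a ‘ph’ value is a run of letters followed by a tone digit from 0 to 5, with spaces between syllables optional ("liang2yong1" is two syllables). A minimal Python sketch of validating such a string before emitting the markup, under that assumption; the function names are hypothetical and are not part of jTTS:

```python
import re
from xml.etree import ElementTree as ET

# One syllable: letters (ü may also be written "v" or "u:") + tone digit 0-5.
SYLLABLE = re.compile(r"([a-z:ü]+)([0-5])")


def parse_pinyin(ph):
    """Split a tone-numbered PinYin string into (syllable, tone) pairs.

    Spaces between syllables are optional, so "liang2yong1" yields two
    syllables. Raises ValueError if any part is not letters + tone digit.
    """
    compact = "".join(ph.split()).lower()
    pairs = []
    pos = 0
    while pos < len(compact):
        m = SYLLABLE.match(compact, pos)
        if m is None:
            raise ValueError(f"malformed PinYin near {compact[pos:]!r}")
        pairs.append((m.group(1), m.group(2)))
        pos = m.end()
    return pairs


def py_phoneme(text, ph):
    """Build an S3ML <phoneme alphabet="py"> element after validating ph."""
    parse_pinyin(ph)  # raises on malformed input
    el = ET.Element("phoneme", alphabet="py", ph=ph)
    el.text = text
    return ET.tostring(el, encoding="unicode")


print(parse_pinyin("zha1 liang2yong1"))
# [('zha', '1'), ('liang', '2'), ('yong', '1')]
print(py_phoneme("查良镛", "zha1 liang2yong1"))
# <phoneme alphabet="py" ph="zha1 liang2yong1">查良镛</phoneme>
```

Validating first means a missing tone digit (for example "zha") is reported instead of silently producing markup the engine may misread.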

  8. PinYin Support • What if a PinYin string appears in normal text? • Comparison with CSSML • We think <phoneme> is not meant for this purpose; <say-as> is more suitable • We think the <phoneme> extension in S3ML is more compatible with SSML • Next station is <say-as interpret-as="phoneme" format="py">di4 tan2</say-as> • CSSML style: <phoneme lang="zh-cn">zha1</phoneme>良镛 • 他姓<phoneme py="zha1">查</phoneme>
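Because the CSSML form with a ‘py’ attribute keeps the annotated character as content, it maps mechanically onto the S3ML form. A hypothetical Python converter sketch (the function name is illustrative; neither jTTS nor CSSML defines it):

```python
from xml.etree import ElementTree as ET


def cssml_to_s3ml(fragment):
    """Rewrite <phoneme py="..."> into <phoneme alphabet="py" ph="...">.

    Elements that carry PinYin as their content (e.g. <phoneme lang="zh-cn">)
    are left unchanged, because the Chinese character they annotate is not
    present in the markup.
    """
    # Wrap in a temporary root so text mixed with elements parses as XML.
    root = ET.fromstring(f"<wrap>{fragment}</wrap>")
    for el in root.iter("phoneme"):
        if "py" in el.attrib:
            ph = el.attrib.pop("py")
            el.set("alphabet", "py")
            el.set("ph", ph)
    # tostring() on a child also emits its tail text, so joining the
    # children after root.text reproduces the fragment without the wrapper.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )


print(cssml_to_s3ml('他姓<phoneme py="zha1">查</phoneme>'))
# 他姓<phoneme alphabet="py" ph="zha1">查</phoneme>
```

The ‘lang’ form cannot be converted this way; text like zha1 inside running prose is the case the slide assigns to <say-as interpret-as="phoneme" format="py"> instead.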

  9. <say-as> Definition • The details of the <say-as> element • When SinoVoice defined S3ML, SSML did not define the values of this element's attributes in detail. • “SSML 1.0 say-as attribute values” has now been proposed, but it is still in progress • http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/ • SinoVoice will support this proposal, so I will only talk about some additional values

  10. <say-as> Definition • Names and addresses, especially person names, because many Chinese characters are polyphonic • Math: some mathematical expressions can be confused with other information • <say-as interpret-as="name" format="person">张朝阳</say-as> • <say-as interpret-as="address">朝阳区</say-as> • <say-as interpret-as="math">2005-12-13</say-as> • <say-as interpret-as="math">+8610-62972997</say-as>
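To make the ambiguity concrete: the same string "2005-12-13" reads as a subtraction under interpret-as="math" but as a date otherwise. A toy Python normalizer sketch; "date" here is taken from the W3C say-as Note rather than from S3ML's additions, and the function name and readings are illustrative assumptions, not SinoVoice's front end:

```python
import re


def read_dash_triple(text, interpret_as):
    """Toy normalizer: read an 'A-B-C' digit string differently per interpret-as.

    Only the two readings discussed here are sketched; a real TTS front end
    would also expand the digits into words.
    """
    m = re.fullmatch(r"(\d+)-(\d+)-(\d+)", text)
    if m is None:
        return text
    a, b, c = m.groups()
    if interpret_as == "math":
        return f"{a}减{b}减{c}"  # "2005 minus 12 minus 13"
    if interpret_as == "date":
        return f"{a}年{int(b)}月{int(c)}日"  # year-month-day reading
    return text


print(read_dash_triple("2005-12-13", "math"))  # 2005减12减13
print(read_dash_triple("2005-12-13", "date"))  # 2005年12月13日
```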

  11. <say-as> Definition • Net address • Phoneme, useful for character/phoneme mixed text • <say-as interpret-as="net" format="email">abc@xyz.com</say-as> • <say-as interpret-as="net" format="url">http://www.sinovoice.com.cn</say-as> • The pronunciation of ‘tomato’ is <say-as interpret-as="phoneme" format="ipa">t&#x252;m&#x251;to&#x28A;</say-as> • Next station is <say-as interpret-as="phoneme" format="py">di4 tan2</say-as>
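The IPA example writes its symbols as numeric character references such as &#x252;. A short Python sketch showing that a standard XML parser resolves them, so a processor sees the actual IPA characters; the helper name is hypothetical:

```python
from xml.etree import ElementTree as ET


def say_as_parts(fragment):
    """Return (interpret-as, format, text) for a single <say-as> element.

    The XML parser resolves numeric character references such as &#x252;,
    so the returned text holds the actual IPA symbols.
    """
    el = ET.fromstring(fragment)
    return el.get("interpret-as"), el.get("format"), (el.text or "").strip()


ipa = ('<say-as interpret-as="phoneme" format="ipa">'
       't&#x252;m&#x251;to&#x28A;</say-as>')
print(say_as_parts(ipa))  # ('phoneme', 'ipa', 'tɒmɑtoʊ')
print(say_as_parts('<say-as interpret-as="net" format="email">abc@xyz.com</say-as>'))
# ('net', 'email', 'abc@xyz.com')
```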

  12. Domain Support • Important for real systems • Customized TTS is becoming more and more popular • Better voice quality than the general-purpose version • One possibility in SSML • Use the <voice> element and define special values for the ‘name’ attribute • But this is unnatural, because a single voice library (one name) normally supports several different domains

  13. Domain Support • <domain> element • The ‘name’ attribute is required and specifies the customized TTS package to use • The value of the ‘name’ attribute is a vendor-specific name • <domain> does not change the voice • If a voice library does not support the domain, the element is simply ignored
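The two rules above (the voice never changes, and an unsupported domain is ignored) can be sketched as a walk over the document that tags each text run with the domain package in effect. This is an illustrative Python sketch, not jTTS's implementation; the supported-domain set, the "sports" domain, and the function name are hypothetical:

```python
from xml.etree import ElementTree as ET

# Hypothetical voice library: the domains its customized packages cover.
SUPPORTED_DOMAINS = frozenset({"weather"})


def domain_segments(ssml, supported=SUPPORTED_DOMAINS):
    """Split a <speak> document into (domain, text) pairs.

    Inside a <domain> whose name the voice library supports, text is tagged
    with that domain; an unsupported <domain> is ignored and its text keeps
    the enclosing domain (None = general-purpose model). The voice is never
    changed here.
    """
    segments = []

    def add(domain, text):
        if text and text.strip():
            segments.append((domain, text.strip()))

    def walk(el, domain):
        if el.tag == "domain" and el.get("name") in supported:
            domain = el.get("name")
        add(domain, el.text)
        for child in el:
            walk(child, domain)
            add(domain, child.tail)  # text after a child belongs to el

    walk(ET.fromstring(ssml), None)
    return segments


doc = ('<speak>早上好。<domain name="weather">今天白天，晴转多云</domain>'
       '<domain name="sports">比赛结束</domain></speak>')
print(domain_segments(doc))
# [(None, '早上好。'), ('weather', '今天白天，晴转多云'), (None, '比赛结束')]
```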

  14. Domain Support • To have the TTS system automatically select the best voice for a domain • Extended ‘domain’ attribute of <voice> • ‘domain’ still has the lowest priority • <domain name="weather">今天白天，晴转多云，最高温度26度</domain> • <voice domain="weather">今天白天，晴转多云，最高温度26度</voice> • (Both read: “Today during the day: sunny turning cloudy, high of 26 degrees.”)
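One way to realize "domain has the lowest priority" is to compare voices on a tuple of match results, with domain in the last position so it only breaks ties. A Python sketch under stated assumptions: the slide fixes only domain's place at the end; the other criteria and their order (language, then gender) and the voice names are hypothetical, and SSML leaves the details of voice selection to the processor:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str
    lang: str
    gender: str
    domains: frozenset = frozenset()


def select_voice(voices, lang=None, gender=None, domain=None):
    """Return the voice that best matches the requested attributes.

    Match results are compared as a tuple in priority order, so language
    outranks gender and domain is consulted last: a domain match can only
    break a tie between voices that match equally on everything else.
    Unspecified attributes count as matched. Ties go to the earliest voice.
    """
    def score(v):
        return (
            lang is None or v.lang == lang,
            gender is None or v.gender == gender,
            domain is None or domain in v.domains,
        )

    return max(voices, key=score)


voices = [
    Voice("general-m", "zh-cn", "male"),
    Voice("weather-f", "zh-cn", "female", frozenset({"weather"})),
]
print(select_voice(voices, domain="weather").name)                 # weather-f
print(select_voice(voices, gender="male", domain="weather").name)  # general-m
```

The second call shows the priority rule: the requested gender wins over the domain match. Unlike <domain>, <voice domain="..."> is allowed to switch voices.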

  15. Conclusion • Summary of S3ML extensions • <phoneme alphabet="py" ph="…"> • <say-as interpret-as="..."> • name / address / math / phoneme / net • <domain name="…"> • <voice domain="…"> • We hope these extensions will help in defining a standard for internationalizing SSML

  16. Thank You!

