An Introduction to S3ML • Beijing InfoQuick SinoVoice Speech Technology Corp. • CHEN Ming, LV Shinan, LI Xiulin
Outline • Background • PinYin Support • <say-as> Definition • Domain Support • Conclusion
Background • SSML • Speech Synthesis Markup Language • http://www.w3.org/TR/speech-synthesis/ • Now a W3C Recommendation • SinoVoice • Well-Known Speech Technology and Service Provider • Leading Chinese TTS Technology and Products • 1000+ Real Systems Deployed
Background • S3ML (SinoVoice SSML) • Since the launch of jTTS 4.0, March 2004 • Based on the SSML specification • Defines some extensions aimed at Chinese TTS • Defines details of some elements that SSML does not define precisely • Provides maximum compatibility with the newest SSML version
PinYin Support • PinYin • Phoneme annotation method for Chinese characters • <phoneme> in SSML • The phoneme element provides a phonemic/phonetic pronunciation for the contained text. • Two attributes: alphabet and ph
PinYin Support • alphabet • The alphabet attribute is an optional attribute that specifies the phonemic/phonetic alphabet. • Use ‘py’ as the value of ‘alphabet’ to specify that PinYin is used • ph • The ph attribute is a required attribute that specifies the phoneme/phone string. • Use a PinYin string as the value of ‘ph’
PinYin Support • Example • <phoneme alphabet="py" ph="zha1">查</phoneme>良镛 • <phoneme alphabet="py" ph="zha1 liang2 yong1">查良镛</phoneme>先生 • More about PinYin strings • Conforms to the “Chinese Mandarin PinYin Specification” • A series of PinYin syllables for several characters • Tone information • 1~4: high flat, rising, dipping and falling tones • 0, 5: light (neutral) tone
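To make the PinYin string rules above concrete, here is a minimal Python sketch that splits a ph value into syllables and tone digits and checks that there is one syllable per annotated character. It assumes space-separated, tone-numbered syllables; the helper names `parse_pinyin` and `check_phoneme` are hypothetical and not part of S3ML or jTTS.

```python
import re

# Hypothetical validator, not part of S3ML/jTTS. Assumes each syllable is
# lowercase Latin letters ("ü" or "v" for ü) followed by one tone digit.
SYLLABLE = re.compile(r"([a-zü]+)([0-5])")

# Tone digits as described on the slide: 1-4 are the four tones,
# 0 and 5 both mark the light (neutral) tone.
TONE_NAMES = {1: "high flat", 2: "rising", 3: "dipping", 4: "falling",
              0: "light", 5: "light"}


def parse_pinyin(ph: str) -> list[tuple[str, int]]:
    """Split a space-separated ph value into (syllable, tone) pairs."""
    pairs = []
    for token in ph.split():
        m = SYLLABLE.fullmatch(token)
        if m is None:
            raise ValueError(f"not a tone-numbered PinYin syllable: {token!r}")
        pairs.append((m.group(1), int(m.group(2))))
    return pairs


def check_phoneme(text: str, ph: str) -> bool:
    """True if ph has exactly one syllable per character of the annotated text."""
    return len(parse_pinyin(ph)) == len(text)


print(parse_pinyin("zha1 liang2 yong1"))   # [('zha', 1), ('liang', 2), ('yong', 1)]
print(check_phoneme("查良镛", "zha1 liang2 yong1"))  # True
```

This only checks the shape of each syllable; a real engine would also check syllables against the PinYin inventory.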
PinYin Support • What if a PinYin string is included in normal text? • Next station is <say-as interpret-as="phoneme" format="py">di4 tan2</say-as> • Comparison with CSSML • <phoneme lang="zh-cn">zha1</phoneme>良镛 • 他姓<phoneme py="zha1">查</phoneme> • We think <phoneme> is not meant for this purpose; <say-as> is more suitable • We think the <phoneme> extension in S3ML is more compatible with SSML
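The compatibility argument can be illustrated with a small rewrite: a hypothetical Python helper that turns the two CSSML-style forms shown above into their S3ML equivalents, i.e. the `py` attribute into the standard `alphabet`/`ph` pair, and PinYin written as `<phoneme lang="zh-cn">` content into `<say-as interpret-as="phoneme" format="py">`. This is a sketch for illustration only; it assumes well-formed XML input and omits the `version` and `xmlns` attributes a real SSML `<speak>` root needs.

```python
import xml.etree.ElementTree as ET


def cssml_to_s3ml(markup: str) -> str:
    """Rewrite the two CSSML-style <phoneme> forms into S3ML markup.

    Hypothetical helper for illustration; other elements are left unchanged.
    """
    root = ET.fromstring(markup)
    # Collect first, because some elements get renamed inside the loop.
    for el in list(root.iter("phoneme")):
        py = el.attrib.pop("py", None)
        if py is not None:
            # CSSML-style attribute -> standard SSML alphabet/ph pair.
            el.set("alphabet", "py")
            el.set("ph", py)
        elif el.get("lang") == "zh-cn":
            # PinYin written as element content -> S3ML <say-as> phoneme form.
            el.tag = "say-as"
            el.attrib.clear()
            el.set("interpret-as", "phoneme")
            el.set("format", "py")
    return ET.tostring(root, encoding="unicode")


print(cssml_to_s3ml('<speak>他姓<phoneme py="zha1">查</phoneme></speak>'))
# <speak>他姓<phoneme alphabet="py" ph="zha1">查</phoneme></speak>
print(cssml_to_s3ml('<speak><phoneme lang="zh-cn">zha1</phoneme>良镛</speak>'))
# <speak><say-as interpret-as="phoneme" format="py">zha1</say-as>良镛</speak>
```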
<say-as> Definition • The details of the <say-as> element • When SinoVoice defined S3ML, SSML did not define the detailed values of this element's attributes. • “SSML 1.0 say-as attribute values” has now been proposed, but it is still in progress • http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526/ • SinoVoice will support this proposal, so I will only talk about some additional values
<say-as> Definition • Name and address: especially person names, because many Chinese characters are polyphonic • Math: some mathematical expressions can be confused with other information • <say-as interpret-as="name" format="person">张朝阳</say-as> • <say-as interpret-as="address">朝阳区</say-as> • <say-as interpret-as="math">2005-12-13</say-as> • <say-as interpret-as="math">+8610-62972997</say-as>
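Why the math hint matters can be shown with a toy normalizer. This hypothetical Python function (not SinoVoice's text normalizer) guesses a date for a bare digits-hyphen-digits-hyphen-digits string and reads it as subtraction only when `interpret-as="math"` is given; the English wording of the output is purely illustrative.

```python
import re
from typing import Optional

# Three digit groups joined by hyphens, e.g. "2005-12-13".
DASHED = re.compile(r"(\d+)-(\d+)-(\d+)")


def verbalize(text: str, interpret_as: Optional[str] = None) -> str:
    """Toy reading of a dashed number string (hypothetical, for illustration)."""
    m = DASHED.fullmatch(text)
    if m is None:
        return text
    a, b, c = m.groups()
    if interpret_as == "math":
        # Explicit hint: '-' is the minus operator.
        return f"{a} minus {b} minus {c}"
    # No hint: this toy falls back to the more common reading, a date.
    return f"year {a}, month {b}, day {c}"


print(verbalize("2005-12-13"))          # year 2005, month 12, day 13
print(verbalize("2005-12-13", "math"))  # 2005 minus 12 minus 13
```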
<say-as> Definition • Net address • Phoneme: useful for text that mixes characters and phonemes • <say-as interpret-as="net" format="email">abc@xyz.com</say-as> • <say-as interpret-as="net" format="url">http://www.sinovoice.com.cn</say-as> • The pronunciation of ‘tomato’ is <say-as interpret-as="phoneme" format="ipa">tɒmɑtoʊ</say-as> • Next station is <say-as interpret-as="phoneme" format="py">di4 tan2</say-as>
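A markup generator can keep these extended values in one place. The Python sketch below builds `<say-as>` elements with `xml.etree.ElementTree` and rejects `interpret-as`/`format` combinations not shown on these slides; the table `S3ML_SAY_AS` and the function `say_as` are hypothetical, and the real S3ML value list may be longer.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Only the combinations shown on these slides (None = no format attribute).
# Hypothetical table; the full S3ML value list is not given here.
S3ML_SAY_AS = {
    "name": {"person"},
    "address": {None},
    "math": {None},
    "net": {"email", "url"},
    "phoneme": {"ipa", "py"},
}


def say_as(text: str, interpret_as: str, fmt: Optional[str] = None) -> ET.Element:
    """Build a <say-as> element after checking it against the table above."""
    if fmt not in S3ML_SAY_AS.get(interpret_as, set()):
        raise ValueError(f"unsupported say-as: {interpret_as!r}/{fmt!r}")
    el = ET.Element("say-as", {"interpret-as": interpret_as})
    if fmt is not None:
        el.set("format", fmt)
    el.text = text
    return el


print(ET.tostring(say_as("abc@xyz.com", "net", "email"), encoding="unicode"))
# <say-as interpret-as="net" format="email">abc@xyz.com</say-as>
```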
Domain Support • Important for real systems • Customized TTS is becoming more and more popular • Better voice quality than the general-purpose version • One possibility in SSML • Use the <voice> element and define special values of the ‘name’ attribute • But this is unnatural, because one voice (voice library) commonly supports several different domains under the same name
Domain Support • <domain> element • The required ‘name’ attribute specifies the customized TTS package to use • The value of the ‘name’ attribute is vendor-specific • <domain> does not change the voice • If a voice library does not support the domain, the element is simply ignored
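The fallback rule above (an unsupported <domain> is ignored, but its text is still spoken) can be sketched as a tree walk. This Python illustration rests on assumptions rather than jTTS behavior: the function name `segments`, the `supported` set, and representing "ignored" as keeping the enclosing domain (None meaning the general model) are all hypothetical.

```python
import xml.etree.ElementTree as ET


def segments(elem, supported, domain=None):
    """Yield (text, domain) pairs in document order.

    Hypothetical behavior: a <domain> whose name the voice library supports
    switches the active domain for its content; an unsupported one is
    ignored, so its text keeps the enclosing domain. The voice never changes.
    """
    if elem.tag == "domain" and elem.get("name") in supported:
        domain = elem.get("name")
    if elem.text:
        yield (elem.text, domain)
    for child in elem:
        yield from segments(child, supported, domain)
        # A child's tail text lies inside this element, so it uses this domain.
        if child.tail:
            yield (child.tail, domain)


doc = ET.fromstring('<speak>A<domain name="weather">B</domain>C</speak>')
print(list(segments(doc, supported={"weather"})))  # [('A', None), ('B', 'weather'), ('C', None)]
print(list(segments(doc, supported=set())))        # [('A', None), ('B', None), ('C', None)]
```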
Domain Support • If we want the TTS system to select the best voice for this domain automatically • Use the extended ‘domain’ attribute of <voice> • ‘domain’ still has the lowest priority in voice selection • <domain name="weather">今天白天，晴转多云，最高温度26度</domain> • <voice domain="weather">今天白天，晴转多云，最高温度26度</voice> • (Example text: “Today, daytime: sunny turning cloudy, high of 26 degrees.”)
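The "lowest priority" rule can be read as a tie-breaker. The Python sketch below assumes, purely for illustration, that a voice is scored first by how many ordinary requested attributes it matches and only then by whether it supports the requested domain; the `Voice` record, `select_voice`, and the voice names are hypothetical, and SSML's own voice-selection rules are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Voice:
    # Hypothetical voice-library record; real engines expose different metadata.
    name: str
    gender: str
    domains: set = field(default_factory=set)


def select_voice(voices, domain: Optional[str] = None, **wanted) -> Voice:
    """Choose the best-matching voice, with 'domain' as the lowest-priority criterion.

    Assumed scoring: first the number of matched ordinary attributes
    (e.g. gender="female"), then, only to break ties, whether the voice
    supports the requested domain. max() keeps the first of equal scores.
    """
    def score(v: Voice):
        matched = sum(getattr(v, k, None) == val for k, val in wanted.items())
        return (matched, domain in v.domains)
    return max(voices, key=score)


voices = [Voice("voice_a", "female"),
          Voice("voice_b", "female", {"weather"}),
          Voice("voice_c", "male", {"weather", "stock"})]
print(select_voice(voices, domain="weather").name)                   # voice_b
print(select_voice(voices, domain="stock", gender="female").name)    # voice_a
```

In the second call the gender match outweighs the domain match, which is what "lowest priority" means under this assumed scoring.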
Conclusion • Summary of S3ML extensions • <phoneme alphabet="py" ph="…"> • <say-as interpret-as="..."> • name / address / math / phoneme / net • <domain name="…"> • <voice domain="…"> • We hope this work will help in defining standards for internationalizing SSML