SSML 1.1 - The Internationalization of the W3C Speech Synthesis Markup Language SpeechTek 2007 – C102 – Daniel C. Burnett
Overview • SSML 1.0 • Why SSML 1.1? • SSML 1.1 scope • Selected features • Examples • voice/xml:lang • pronunciation alphabets • <w> element • For more info . . .
SSML 1.0 • W3C Recommendation in 2004 • Widely implemented – the primary authoring format for TTS engines • Many extensions
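For context, a minimal SSML 1.0 document looks like this (an illustrative snippet, not taken from the talk):

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  Your order will ship <break time="300ms"/>
  <emphasis>today</emphasis>.
</speak>
```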
Why SSML 1.1? • 1.0 extensions are primarily to address language-related phenomena • Workshops in China, Greece, and India to understand motivations for these extensions • How to correct tones for East Asian languages? • How to handle transliteration for Indian languages? • How to indicate word boundaries for written languages that do not display them? • How to precisely control voice and language changes?
SSML 1.1 scope • Provide broadened language support • For Mandarin, Cantonese, Hindi*, Arabic*, Russian*, Korean*, and Japanese, we will identify and address language phenomena that must be addressed to enable support for the language. Where possible we will address these phenomena in a way that is most broadly useful across many languages. We have chosen these languages because of their economic impact and expected group expertise and contribution. • We will also consider phenomena of other languages for which there is both sufficient economic impact and group expertise and contribution. • Fix incompatibilities with other Voice Browser Working Group languages, including PLS, SRGS, and VoiceXML 2.0/2.1. • Out of scope: • VCR-like controls: fast-forward, rewind, pause, resume • New <say-as> values. Collecting requirements for future <say-as> work is okay * provided there is sufficient group expertise and contribution for these languages
SSML 1.1 scope – some workshop topics
In scope: • Token/word boundaries • Phonetic alphabets • Tones • Part-of-speech support • Text with multiple languages (separate control of xml:lang and voice) • Subword annotation (partial) • Syllable-level markup (partial)
Out of scope: • Providing number, case, gender info • Simplified/alternate/SMS text • Transliteration • Expressive (emotion) elements • Enhanced prosody rate control
Selected new features • SSML 1.1 is a Working Draft – everything from this point on is subject to change • Improved lexicon activation control • Better linkage with PLS lexicons • Clearer separation between xml:lang (document text content) and voice selection • Improved author control of behavior upon xml:lang/voice selection mismatch • Introduction of a Pronunciation Alphabet Registry to allow use of standardized pinyin, jyutping, and other language-specific pronunciation alphabets in addition to the IPA default • New <w> element for marking word boundaries
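As a sketch of the improved lexicon activation control, the working draft's <lexicon>/<lookup> mechanism might be used as below. Element and attribute names follow the draft and are subject to change, and the lexicon URI is a hypothetical placeholder:

```xml
<speak version="1.1" ...>
  <lexicon xml:id="names" uri="http://example.com/names.pls"/>
  <lookup ref="names">
    Text here is pronounced using the referenced PLS lexicon.
  </lookup>
  Text here uses only the processor's built-in lexicons.
</speak>
```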
Examples – voice/xml:lang • Next few examples demonstrate some of the new SSML 1.1 features that provide • Clearer separation between xml:lang (document text content) and voice selection • Improved author control of behavior upon xml:lang/voice selection mismatch
Simple example

<speak … xml:lang="en-US">
  <voice languages="en-US">
    I want
    <voice name="George">a big</voice>
    <voice gender="female">pepperoni</voice>
    pizza.
  </voice>
</speak>

• Will find voices that can read US English, each time.
• Voice changes are scoped, so the same voice is used for "I want" and "pizza."
• The "name" and "gender" values are requests only, and not required in order for voice selection to be successful.
"required" attribute

<speak … xml:lang="en-US">
  <voice languages="en-US">
    I want
    <voice name="George" required="name">a big</voice>
    <voice gender="female" required="gender">pepperoni</voice>
    pizza.
  </voice>
</speak>

• Now the name and gender attributes, respectively, are required rather than merely requested.
• The "required" attribute lists *all* required voice selection features, so the two inner voices might not be able to speak English.
• If one of the inner voices cannot read/speak English, the processor can decide what to do (skip the text, try to read it anyway, or change voice).
"onlangfailure" attribute

<speak … xml:lang="en-US" onlangfailure="ignoretext">
  <voice languages="en-US">
    I want
    <voice name="George" required="name">a big</voice>
    <voice gender="female" required="gender">pepperoni</voice>
    pizza.
  </voice>
</speak>

• Now, when any text is encountered that cannot be spoken by the currently selected voice, it will be skipped by the processor. The voice *will not* change.
• Other options are "processorchoice", "ignorelang", and "changevoice".
"onvoicefailure" attribute

<speak … xml:lang="en-US" onlangfailure="ignoretext">
  <voice languages="en-US" onvoicefailure="keepexisting">
    I want
    <voice name="George" required="name">a big</voice>
    <voice gender="female" required="gender">pepperoni</voice>
    pizza.
  </voice>
</speak>

• What if the processor can't find a voice that meets the required criteria? In the above example, the processor will keep the voice it had. This attribute is scoped as well.
• Other options are "priorityselect" and "processorchoice".
Language and accent

<speak … xml:lang="en-US" onlangfailure="ignoretext">
  <voice languages="zh-cmn:en-US en:en-US" onvoicefailure="keepexisting">
    <lang xml:lang="zh-cmn">我想要</lang> <!-- "I want" -->
    <voice name="George" required="name">a big</voice>
    <voice gender="female" required="gender">pepperoni</voice>
    pizza.
  </voice>
</speak>

• First request is for a voice that can speak both English and Mandarin Chinese with a US-English accent.
• If voice selection is successful, the voice will be able to speak both the Chinese text and the final "pizza."
• Note that the female voice need not speak either language (as written).
Examples – pronunciation alphabets

<speak version="1.1" ...>
  此<phoneme alphabet="pinyin" ph="chu4">处</phoneme>不准照相。
  <!-- "No photography allowed here."; the pinyin string is "chù" -->
</speak>

• Developing a new Pronunciation Alphabet Registry
• Experts can register pronunciation alphabets for their languages
• Can also register historically used alphabets such as ARPAbet and Worldbet
• First entries will likely be pinyin and jyutping
Examples – <w> element

<speak version="1.1" ...>
  <!-- The ambiguous sentence is 南京市长江大桥 -->
  <!-- Reading 1: the Nanjing Changjiang River Bridge -->
  <w>南京市</w><w>长江大桥</w>
  <!-- Reading 2: the mayor of Nanjing city, Jiang Daqiao -->
  南京市长<w>江大桥</w>
</speak>

• The <w> element helps resolve ambiguities for languages that may not visually separate words.
• Markup is allowed within <w> but does not cause word separation (unlike in the rest of SSML) => allows for sub-word <mark>, <prosody>, etc.
For more info . . . • Information about the Voice Browser Working Group can be found at http://www.w3.org/Voice/ • Current SSML drafts: • http://www.w3.org/TR/ssml11reqs/ • http://www.w3.org/TR/speech-synthesis11/