
SSML 1.1 - The Internationalization of the W3C Speech Synthesis Markup Language


Presentation Transcript


  1. SSML 1.1 - The Internationalization of the W3C Speech Synthesis Markup Language SpeechTek 2007 – C102 – Daniel C. Burnett

  2. Overview • SSML 1.0 • Why SSML 1.1? • SSML 1.1 scope • Selected features • Examples • voice/xml:lang • pronunciation alphabets • <w> element • For more info . . .

  3. SSML 1.0 • W3C Recommendation in 2004 • Widely implemented – the primary authoring API for TTS engines • Many extensions
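
  For orientation, a minimal SSML 1.0 document looks like this (a sketch, not taken from the talk; the version and namespace values are those defined in the 1.0 Recommendation):

  <?xml version="1.0" encoding="UTF-8"?>
  <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
         xml:lang="en-US">
    Hello <break time="300ms"/> world.
    <!-- break and prosody are core SSML 1.0 elements -->
    <prosody rate="slow">This sentence is read slowly.</prosody>
  </speak>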

  4. Why SSML 1.1? • 1.0 extensions primarily address language-related phenomena • Workshops were held in China, Greece, and India to understand the motivations for these extensions • How to specify correct tones for East Asian languages? • How to handle transliteration for Indian languages? • How to indicate word boundaries in written languages that do not display them? • How to precisely control voice and language changes?

  5. SSML 1.1 scope
  • Provide broadened language support
  • For Mandarin, Cantonese, Hindi*, Arabic*, Russian*, Korean*, and Japanese, we will identify and address language phenomena that must be addressed to enable support for the language. Where possible we will address these phenomena in a way that is most broadly useful across many languages. We have chosen these languages because of their economic impact and expected group expertise and contribution.
  • We will also consider phenomena of other languages for which there is both sufficient economic impact and group expertise and contribution.
  • Fix incompatibilities with other Voice Browser Working Group languages, including PLS, SRGS, and VoiceXML 2.0/2.1.
  • Out of scope:
    • VCR-like controls: fast-forward, rewind, pause, resume
    • New <say-as> values. Collecting requirements for future <say-as> work is okay.
  * provided there is sufficient group expertise and contribution for these languages

  6. SSML 1.1 scope – some workshop topics
  In scope:
  • Token/word boundaries
  • Phonetic alphabets
  • Tones
  • Part-of-speech support
  • Text w/multiple languages (separate control of xml:lang and voice)
  • Subword annotation (partial)
  • Syllable-level markup (partial)
  Out of scope:
  • Providing number, case, gender info
  • Simplified/alternate/SMS text
  • Transliteration
  • Expressive (emotion) elements
  • Enhanced prosody rate control

  7. Selected new features • SSML 1.1 is a Working Draft – everything from this point on is subject to change • Improved lexicon activation control • Better linkage with PLS lexicons (see the sketch below) • Clearer separation between xml:lang (document text content) and voice selection • Improved author control of behavior upon xml:lang/voice selection mismatch • Introduction of a Pronunciation Alphabet Registry to allow use of standardized pinyin, jyutping, and other language-specific pronunciation alphabets in addition to the IPA default • New <w> element for marking word boundaries
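
  A sketch of the lexicon-activation idea (the <lexicon xml:id> / <lookup ref> pairing shown here, and the example.com URI, are illustrative assumptions and, like everything in the Working Draft, subject to change): a PLS document is loaded once, then activated only for a scoped span of text.

  <speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <!-- Load a PLS pronunciation lexicon and name it via xml:id -->
    <lexicon xml:id="acronyms" uri="http://www.example.com/acronyms.pls"/>
    <!-- Activate that lexicon only within this scope -->
    <lookup ref="acronyms">
      W3C and SSML are pronounced as the lexicon specifies.
    </lookup>
    Outside the lookup scope, the lexicon no longer applies.
  </speak>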

  8. Examples – voice/xml:lang • Next few examples demonstrate some of the new SSML 1.1 features that provide • Clearer separation between xml:lang (document text content) and voice selection • Improved author control of behavior upon xml:lang/voice selection mismatch

  9. Simple example

  <speak … xml:lang="en-US">
    <voice languages="en-US">
      I want
      <voice name="George">a big</voice>
      <voice gender="female">pepperoni</voice>
      pizza.
    </voice>
  </speak>

  • Will find voices that can read US English, each time. • Voice changes are scoped, so the same voice is used for “I want” and “pizza.” • The “name” and “gender” values are requests only, and not required in order for voice selection to be successful.

  10. “required” attribute

  <speak … xml:lang="en-US">
    <voice languages="en-US">
      I want
      <voice name="George" required="name">a big</voice>
      <voice gender="female" required="gender">pepperoni</voice>
      pizza.
    </voice>
  </speak>

  • Now the name and gender attributes, respectively, are required rather than merely requested. • The “required” attribute lists *all* required voice selection features, so the two inner voices might not be able to speak English (see the sketch below for requiring several features at once). • If one of the inner voices cannot read/speak English, the processor can decide what to do (skip the text, try to read it anyway, or change voice).
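
  For example, to make several features hard requirements on the same voice at once (a sketch; “required” takes a space-separated list of voice-selection feature names):

  <!-- Both the name and the language must match, or voice selection fails -->
  <voice name="George" languages="en-US" required="name languages">
    a big
  </voice>

  Written this way, a selected George is guaranteed to speak US English, at the cost of a possible voice-selection failure.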

  11. “onlangfailure” attribute

  <speak … xml:lang="en-US" onlangfailure="ignoretext">
    <voice languages="en-US">
      I want
      <voice name="George" required="name">a big</voice>
      <voice gender="female" required="gender">pepperoni</voice>
      pizza.
    </voice>
  </speak>

  • Now, when any text is encountered that cannot be spoken by the currently selected voice, it will be skipped by the processor. The voice *will not* change. • Other options are “processorchoice”, “ignorelang”, and “changevoice”.

  12. “onvoicefailure” attribute

  <speak … xml:lang="en-US" onlangfailure="ignoretext">
    <voice languages="en-US" onvoicefailure="keepexisting">
      I want
      <voice name="George" required="name">a big</voice>
      <voice gender="female" required="gender">pepperoni</voice>
      pizza.
    </voice>
  </speak>

  • What if the processor can’t find a voice that meets the required criteria? In the example above, the processor keeps the voice it already had. This attribute is scoped as well. • Other options are “priorityselect” and “processorchoice”.

  13. Language and accent

  <speak … xml:lang="en-US" onlangfailure="ignoretext">
    <voice languages="zh-cmn:en-US en:en-US" onvoicefailure="keepexisting">
      <lang xml:lang="zh-cmn">我想要</lang>
      <voice name="George" required="name">a big</voice>
      <voice gender="female" required="gender">pepperoni</voice>
      pizza.
    </voice>
  </speak>

  • First request is for a voice that can speak both English and Mandarin Chinese with a US-English accent • If voice selection is successful, the voice will be able to speak both the Chinese text (我想要, “I want”) and the final “pizza.” • Note that the female voice need not speak either language (as written).

  14. Examples – pronunciation alphabets

  <speak version="1.1" ...>
    <!-- "No photography allowed here." -->
    此<phoneme alphabet="pinyin" ph="chu4">处</phoneme>不准照相。
    <!-- pinyin string is: "chù" -->
  </speak>

  • Developing a new Pronunciation Alphabet Registry • Experts can register pronunciation alphabets for their languages • Can also register historically used alphabets such as ARPAbet and Worldbet • First entries will likely be pinyin, jyutping (for contrast, an IPA version is sketched below)
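
  For contrast, here is the same element using the default IPA alphabet (a sketch; the ph string is an assumed broad American-English transcription):

  <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>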

  15. Examples – <w> element

  <speak version="1.1" ...>
    <!-- Ambiguous sentence is 南京市长江大桥 -->
    <!-- Reading 1: the Nanjing Changjiang (Yangtze) River Bridge -->
    <w>南京市</w><w>长江大桥</w>
    <!-- Reading 2: the mayor of Nanjing city, Jiang Daqiao -->
    南京市长<w>江大桥</w>
  </speak>

  • <w> element helps resolve ambiguities for languages that may not visually separate words. • Markup is allowed within <w> but does not cause word separation (unlike in the rest of SSML) => allows for sub-word <mark>, <prosody>, etc. (see the sketch below).
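
  A sketch of that sub-word markup point (assumed content, not from the talk): both words below remain single <w> tokens even though markup appears inside them.

  <speak version="1.1" ...>
    <!-- A <mark> inside <w> does not split the word -->
    <w>南京<mark name="inside-word"/>市</w>
    <!-- Nor does a sub-word prosody change -->
    <w><prosody pitch="+10%">长江</prosody>大桥</w>
  </speak>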

  16. For more info . . . • Information about the Voice Browser Working Group can be found at http://www.w3.org/Voice/ • Current SSML drafts: • http://www.w3.org/TR/ssml11reqs/ • http://www.w3.org/TR/speech-synthesis11/
