This presentation proposes two SSML extensions for text-to-speech in multi-language contexts: new values for the "interpret-as" attribute of the <say-as> element, and a new <token> element that specifies language information for text units smaller than a sentence.
SSML extensions for multi-language usage
Davide Bonardo
W3C Workshop on Internationalizing SSML, Crete, 30-31 May 2006
About Loquendo
• R&D in speech technology
• Over 30 years of experience (from CSELT laboratories)
• Technologies:
  • TTS (text to speech)
  • ASR (automatic speech recognition) and SV (speaker verification)
• Solutions:
  • Easy integration of speech technologies
  • Speech servers (MRCPv1 and v2 protocols)
  • Speech platforms (VoiceXML and CCXML interpreters)
  • Embedded solutions (for many operating systems and devices)
Ideas for SSML extensions
• <say-as> element
  • Extension of the values for the “interpret-as” attribute
• New element
  • <token>
Proposal 1: <say-as> extension (1/3)
• Problem:
  • How to interpret a part of an input text
  • Different dialog contexts require different interpretations
  • The interpretation could be language dependent
  • Many contexts could be defined: SMS, e-mail, news, applications for rescue operations, …
• TTS engines may use context information to activate the best configuration for:
  • reading acronyms
  • expanding abbreviations
  • using customized prosodic phrasing
  • activating a special reading style
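The idea that an engine selects a configuration from the dialog context can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CONTEXT_PROFILES` table, its entries, and the `expand` function are hypothetical and do not describe any real TTS engine.

```python
import re

# Hypothetical per-context abbreviation tables. The entries are
# illustrative assumptions, not data taken from a real TTS engine.
CONTEXT_PROFILES = {
    "sms": {"asap": "as soon as possible", "u": "you"},
    "navigation": {"st": "street"},
    "news": {"st": "saint"},
}


def expand(text, context):
    """Expand abbreviations using the table registered for `context`.

    Unknown contexts fall back to an empty table, so the text is
    returned unchanged.
    """
    table = CONTEXT_PROFILES.get(context, {})

    def replace(match):
        word = match.group(0)
        return table.get(word.lower(), word)

    # Only alphabetic runs are looked up; punctuation is left untouched.
    return re.sub(r"[A-Za-z]+", replace, text)


print(expand("I call you asap", "sms"))
print(expand("Turn left on Main st", "navigation"))
```

The same abbreviation ("st") maps to different expansions in the hypothetical "navigation" and "news" tables, which is the kind of context dependence the slide describes.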
Proposal 1: <say-as> extension (2/3)
Proposal:
• Extend the “interpret-as” attribute with new values, for instance:
  • sms
  • e-mail
  • news
  • banking
  • navigation
  • …
Proposal 1: <say-as> extension (3/3)
Examples

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <speak version="1.0" xml:lang="en-US">
      I call you asap.
      <say-as interpret-as="sms"> I call you asap </say-as>
    </speak>

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <speak version="1.0" xml:lang="en-GB">
      <say-as interpret-as="sms"> Mtfbwu </say-as>
    </speak>
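As a rough sketch of how a processor might treat the first example, the Python below parses the document and applies an SMS expansion only to text inside `<say-as interpret-as="sms">`. The `SMS_TABLE` dictionary and the `render` function are assumptions made for illustration, and the XML declaration is omitted so that the document can be parsed from a plain string.

```python
import xml.etree.ElementTree as ET

SSML = """<speak version="1.0" xml:lang="en-US">
I call you asap.
<say-as interpret-as="sms"> I call you asap </say-as>
</speak>"""

# Hypothetical SMS abbreviation table (an assumption for illustration).
SMS_TABLE = {"asap": "as soon as possible"}


def render(ssml):
    """Return the text chunks a processor might hand to synthesis.

    Handles only direct children of <speak>; nested markup is out of
    scope for this sketch.
    """
    root = ET.fromstring(ssml)
    chunks = []
    if root.text and root.text.strip():
        chunks.append(root.text.strip())
    for child in root:
        text = (child.text or "").strip()
        if child.tag == "say-as" and child.get("interpret-as") == "sms":
            # Expand word by word using the SMS table.
            text = " ".join(SMS_TABLE.get(w.lower(), w) for w in text.split())
        if text:
            chunks.append(text)
        if child.tail and child.tail.strip():
            chunks.append(child.tail.strip())
    return chunks


for chunk in render(SSML):
    print(chunk)
```

The text outside `<say-as>` is passed through unchanged, while the marked-up copy is expanded, which mirrors the contrast the first example is drawing.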
Proposal 2: New element <token> (1/3)
• Problem 1: activating the correct language knowledge at a specific point in the text
  • The “xml:lang” attribute is currently available in the <speak>, <voice>, <p> and <s> elements
  • The engine's behavior can differ by element:
    • In the root <speak> element, “xml:lang” defines the language of the whole document, but for the engine it involves the selection of a voice
    • In the <voice> element, it is an important recommendation for loading the correct voice
    • In the <p> and <s> elements, it mainly conveys language information, and the engine, if it is able to, can keep the same voice but use different language knowledge (e.g. phonetic mapping)
• Problem 2: it may be necessary to specify a language change for a text unit smaller than a sentence.
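The way “xml:lang” applies to nested elements can be illustrated with a short Python sketch that computes the language in effect for each element, inheriting from the nearest ancestor that sets it. The sample document and the `effective_languages` function are assumptions for illustration; the sketch only resolves the language and does not model voice selection.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative document (an assumption, not taken from the slides).
SSML = """<speak version="1.0" xml:lang="en-US">
<p>
<s>Good morning.</s>
<s xml:lang="it-IT">Buongiorno.</s>
</p>
</speak>"""


def effective_languages(element, inherited=None):
    """Yield (tag, language) pairs in document order.

    An element without its own xml:lang inherits the value of its
    nearest ancestor that has one.
    """
    lang = element.get(XML_LANG, inherited)
    yield element.tag, lang
    for child in element:
        yield from effective_languages(child, lang)


pairs = list(effective_languages(ET.fromstring(SSML)))
for tag, lang in pairs:
    print(tag, lang)
```

The smallest unit that can carry its own language here is the sentence `<s>`, which is the limitation Problem 2 points at.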
Proposal 2: New element <token> (2/3)
Proposal:
• Introduce a new element, <token>
• Extend the use of the “xml:lang” attribute to the <token> element
Advantages:
• It is a generic element
• It is extensible
• Without attributes, it could be used to give information on segmentation, where needed
• With other attributes, it could specify new information for the token (e.g. part of speech)
Proposal 2: New element <token> (3/3)
Examples

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <speak version="1.0" xml:lang="en-US">
      The movie is the product of Italian comic sensation Roberto Benigni,
      who wore three hats for "La vita è bella": director, co-writer, and star.
    </speak>

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <speak version="1.0" xml:lang="en-US">
      The movie is the product of Italian comic sensation
      <token xml:lang="it-IT">Roberto Benigni</token>, who wore three hats for
      <token xml:lang="it-IT">"La vita è bella"</token>: director, co-writer, and star.
    </speak>
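One way to picture what the proposed <token> markup gives an engine is to split the second example into runs of text, each tagged with the language in effect. The Python below is a minimal sketch under stated assumptions: <token> is the proposed element, not part of SSML 1.0, the `language_runs` function is hypothetical, and only tokens that are direct children of <speak> are handled. The XML declaration is omitted so the document can be parsed from a plain string.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SSML = (
    '<speak version="1.0" xml:lang="en-US">'
    "The movie is the product of Italian comic sensation "
    '<token xml:lang="it-IT">Roberto Benigni</token>'
    ", who wore three hats for "
    '<token xml:lang="it-IT">"La vita è bella"</token>'
    ": director, co-writer, and star."
    "</speak>"
)


def language_runs(ssml):
    """Return a list of (language, text) runs in reading order."""
    root = ET.fromstring(ssml)
    default = root.get(XML_LANG)
    runs = []
    if root.text:
        runs.append((default, root.text))
    for child in root:
        if child.text:
            # A token without xml:lang falls back to the document language.
            runs.append((child.get(XML_LANG, default), child.text))
        if child.tail:
            runs.append((default, child.tail))
    return runs


runs = language_runs(SSML)
for lang, text in runs:
    print(lang, repr(text))
```

Each run could then be sent to the language knowledge it needs (for example Italian phonetic mapping for the name and the film title) while the surrounding sentence stays in the document language.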
Conclusions
• Proposal 1:
  • Increase the number of “interpret-as” values by identifying new speech contexts
• Proposal 2:
  • Introduce a new element that defines specific information (e.g. the language) for a single word, a phrase, or another text unit smaller than a sentence