
SSML Extensions for Contextual Interpretation in Multi-Language Usage

This presentation proposes two SSML extensions for text-to-speech in multi-language contexts: new values for the "interpret-as" attribute of <say-as>, and a new <token> element that carries language information for text units smaller than a sentence.



Presentation Transcript


  1. Davide Bonardo
     SSML extensions for multi-language usage
     W3C Workshop on Internationalizing SSML, Crete, 30-31 May 2006

  2. About Loquendo
     • R&D of speech technology
     • Over 30 years of experience (from CSELT laboratories)
     • Technologies:
        • TTS (text to speech)
        • ASR (automatic speech recognition) & SV (speaker verification)
     • Solutions:
        • Easy integration of speech technologies
        • Speech servers (MRCPv1 & v2 protocols)
        • Speech platforms (VoiceXML & CCXML interpreters)
        • Embedded solutions (for many OS and devices)

  3. Ideas for SSML extensions
     • <say-as> element
        • Extension of the values for the "interpret-as" attribute
     • New element
        • <token>

  4. Proposal 1: <say-as> extension (1/3)
     • Problem:
        • How to interpret a part of an input text
        • Different dialog contexts require different interpretations
        • The interpretation could be language dependent
        • Many contexts could be defined: SMS, e-mails, news, applications for rescue operations, …
        • TTS engines may use context information to activate the best configuration for:
           • reading acronyms
           • expanding abbreviations
           • using customized prosodic phrasing
           • activating a special reading style

  5. Proposal 1: <say-as> extension (2/3)
     Proposal:
     • To extend the "interpret-as" attribute with new values, for instance:
        • sms
        • e-mail
        • news
        • banking
        • navigation
        • …

  6. Proposal 1: <say-as> extension (3/3)
     Examples

     <?xml version="1.0" encoding="ISO-8859-1"?>
     <speak version="1.0" xml:lang="en-US">
       I call you asap.
       <say-as interpret-as="sms"> I call you asap </say-as>
     </speak>

     <?xml version="1.0" encoding="ISO-8859-1"?>
     <speak version="1.0" xml:lang="en-GB">
       <say-as interpret-as="sms"> Mtfbwu </say-as>
     </speak>
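A minimal Python sketch of how an engine might act on the proposed "sms" value: the context selects an abbreviation table, and text outside that context is left untouched. The `EXPANSIONS` table, the `expand` function, and the expansion assumed for "Mtfbwu" are illustrative assumptions, not part of the proposal or of any engine's API.

```python
import re

# Hypothetical per-context abbreviation tables (assumption for illustration);
# a real engine would use much larger, language-specific lexicons.
EXPANSIONS = {
    "sms": {
        "asap": "as soon as possible",
        "mtfbwu": "may the force be with you",  # assumed reading of the slide's example
    },
}


def expand(text, interpret_as=None):
    """Expand abbreviations known for the given context; otherwise return text unchanged."""
    table = EXPANSIONS.get(interpret_as, {})

    def replace(match):
        word = match.group(0)
        # Look up case-insensitively; keep the original word if it is not in the table.
        return table.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, text)


print(expand("I call you asap"))
print(expand("I call you asap", "sms"))
```

Keeping one table per context mirrors the slide's point that the same string ("asap") may need different handling depending on where it appears.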

  7. Proposal 2: New element <token> (1/3)
     • Problem 1: activating the correct language knowledge at a specific point in the text
        • The "xml:lang" attribute is currently available in the <speak>, <voice>, <p> and <s> elements
        • The engine's behavior could differ for each:
           • In the root <speak> element, "xml:lang" defines the language of the whole document, but for the engine it involves the selection of a voice
           • In the <voice> element, it is an important recommendation for loading the correct voice
           • In the <p> and <s> elements, it is mainly language information: the engine, if able, can keep the same voice but apply different language knowledge (e.g. phonetic mapping)
     • Problem 2: it could be necessary to specify a language change for a text unit smaller than a sentence.

  8. Proposal 2: New element <token> (2/3)
     Proposal:
     • To introduce a new element <token>
     • To extend the use of the "xml:lang" attribute to the <token> element
     Advantages:
     • It is a generic element
     • It is extensible
        • Without attributes, it could be used to give information on the segmentation, where needed.
        • With other attributes, it could specify new information for the token (e.g. part of speech)
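As a rough illustration of the proposal, the Python sketch below wraps selected phrases of a plain-text sentence in the proposed <token> element with an "xml:lang" attribute. The `mark_tokens` helper and its phrase-to-language mapping are hypothetical; it uses naive string replacement and is not a full SSML generator.

```python
from xml.sax.saxutils import escape, quoteattr


def mark_tokens(text, phrases):
    """Wrap each listed phrase in <token xml:lang="...">.

    `phrases` maps a phrase to its language tag (hypothetical input format).
    """
    out = escape(text)  # escape &, < and > so the result stays well-formed
    for phrase, lang in phrases.items():
        escaped = escape(phrase)
        out = out.replace(
            escaped, f"<token xml:lang={quoteattr(lang)}>{escaped}</token>"
        )
    return out


print(mark_tokens("comic sensation Roberto Benigni, who", {"Roberto Benigni": "it-IT"}))
```

A production tool would need to guard against phrases that overlap each other or that match the inserted markup; the sketch ignores those cases.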

  9. Proposal 2: New element <token> (3/3)
     Examples

     <?xml version="1.0" encoding="ISO-8859-1"?>
     <speak version="1.0" xml:lang="en-US">
       The movie is the product of Italian comic sensation Roberto Benigni,
       who wore three hats for "La vita è bella": director, co-writer, and star.
     </speak>

     <?xml version="1.0" encoding="ISO-8859-1"?>
     <speak version="1.0" xml:lang="en-US">
       The movie is the product of Italian comic sensation
       <token xml:lang="it-IT">Roberto Benigni</token>, who wore three hats for
       <token xml:lang="it-IT"> "La vita è bella"</token>: director, co-writer, and star.
     </speak>
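The following Python sketch suggests how a consumer of such markup might split a document into language-tagged segments, with each <token> overriding the "xml:lang" inherited from <speak>. It uses the standard library's `xml.etree.ElementTree`; the `language_segments` function and the shortened sample sentence are assumptions made for illustration, and <token> is treated like any other element.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_segments(element, inherited=None):
    """Yield (language, text) pairs in document order, inheriting xml:lang from ancestors."""
    lang = element.get(XML_LANG, inherited)
    if element.text and element.text.strip():
        yield lang, element.text.strip()
    for child in element:
        yield from language_segments(child, lang)
        # Text following a child belongs to the parent, so it keeps the parent's language.
        if child.tail and child.tail.strip():
            yield lang, child.tail.strip()


# Shortened variant of the slide's example (assumption for brevity).
ssml = (
    '<speak version="1.0" xml:lang="en-US">The movie stars '
    '<token xml:lang="it-IT">Roberto Benigni</token>, director of '
    '<token xml:lang="it-IT">"La vita è bella"</token>.</speak>'
)

for lang, text in language_segments(ET.fromstring(ssml)):
    print(lang, text)
```

Each segment could then be routed to the matching language knowledge (for example, a phonetic mapping) while the voice chosen at the <speak> level stays the same.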

  10. Conclusions
     • Proposal 1:
        • To increase the number of "interpret-as" values by identifying new speech contexts
     • Proposal 2:
        • To introduce a new element that defines specific information (e.g. the language) for a single word, a phrase, and so on.
