SSML Extensions for Contextual Interpretation in Multi-Language Usage

Davide Bonardo
SSML extensions for multi-language usage
W3C Workshop on Internationalizing SSML
Crete, 30-31 May 2006

About Loquendo
• R&D of speech technology
• Over 30 years of experience (from CSELT laboratories)
• Technologies:
  • TTS (text to speech)
  • ASR (automatic speech recognition) & SV (speaker verification)
• Solutions:
  • Easy integration of speech technologies
  • Speech servers (MRCPv1 & v2 protocols)
  • Speech platforms (VoiceXML & CCXML interpreters)
  • Embedded solutions (for many OS and devices)

Ideas for SSML extensions
• <say-as> element
  • Extension of the values for the “interpret-as” attribute
• New element
  • <token>

Proposal 1: <say-as> extension (1/3)
• Problem:
  • How to interpret a part of an input text
  • Different dialog contexts require different interpretations
  • The interpretation could be language dependent
  • Many contexts could be defined: sms, e-mails, news, applications for rescue operations, …
• TTS engines may use context information to activate the best configuration for:
  • reading acronyms
  • expanding abbreviations
  • using customized prosodic phrasing
  • activating a special reading style

Proposal 1: <say-as> extension (2/3)
Proposal:
• To extend the “interpret-as” attribute with new values, for instance:
  • sms
  • e-mail
  • news
  • banking
  • navigation
  • …

Proposal 1: <say-as> extension (3/3)
Examples

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xml:lang="en-US">
  I call you asap.
  <say-as interpret-as="sms"> I call you asap </say-as>
</speak>

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xml:lang="en-GB">
  <say-as interpret-as="sms"> Mtfbwu </say-as>
</speak>
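How an engine front-end might act on the proposed "sms" value can be sketched as follows. This is a minimal illustration in Python, not Loquendo's implementation: the names CONTEXT_LEXICONS, expand and render are hypothetical, the single lexicon entry for "asap" is an assumed expansion, and only direct children of <speak> are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-context lexicons; the entry below is an illustrative assumption.
CONTEXT_LEXICONS = {
    "sms": {"asap": "as soon as possible"},
}

def expand(text, context):
    """Expand abbreviations known for the given context; leave other words alone."""
    lexicon = CONTEXT_LEXICONS.get(context, {})
    words = []
    for word in text.split():
        core = word.rstrip(".,;:!?")   # keep trailing punctuation out of the lookup
        tail = word[len(core):]
        words.append(lexicon.get(core.lower(), core) + tail)
    return " ".join(words)

def render(ssml):
    """Return the text to be spoken, applying a context only inside <say-as>."""
    root = ET.fromstring(ssml)
    parts = []
    if root.text and root.text.strip():
        parts.append(expand(root.text.strip(), None))
    for child in root:
        # Only <say-as> carries a context; other children are read without one.
        context = child.get("interpret-as") if child.tag == "say-as" else None
        if child.text and child.text.strip():
            parts.append(expand(child.text.strip(), context))
        if child.tail and child.tail.strip():
            parts.append(expand(child.tail.strip(), None))
    return " ".join(parts)

SSML = (
    '<speak version="1.0" xml:lang="en-US">I call you asap. '
    '<say-as interpret-as="sms"> I call you asap </say-as></speak>'
)

print(render(SSML))
# prints: I call you asap. I call you as soon as possible
```

The same word is left untouched outside <say-as> and expanded inside it, which is the behavior the first example is meant to contrast.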

Proposal 2: New element <token> (1/3)
• Problem 1: activating the correct language knowledge at a specific point of the text
  • The “xml:lang” attribute is currently available in the <speak>, <voice>, <p> and <s> elements
  • The engine's behavior could differ depending on the element:
    • In the root <speak> element, “xml:lang” defines the language of the whole document, but for the engine it involves the selection of a voice
    • In the <voice> element, it is an important recommendation for loading the correct voice
    • In the <p> and <s> elements, it mainly conveys language information, and the engine, if able to do so, can use the same voice with different language knowledge (e.g. phonetic mapping)
• Problem 2: it could be necessary to specify a language change for a text unit smaller than a sentence.
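The element-dependent behavior described above can be written down as a small lookup. The sketch below is only a paraphrase of the slide in Python: EFFECT_BY_ELEMENT and lang_effects are hypothetical names, the effect descriptions are assumptions taken from the bullets, and real engines may behave differently.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed mapping, paraphrasing the slide; not a statement about any real engine.
EFFECT_BY_ELEMENT = {
    "speak": "document language; the engine selects a voice",
    "voice": "strong recommendation to load a matching voice",
    "p": "language knowledge only; the voice may stay the same",
    "s": "language knowledge only; the voice may stay the same",
}

def lang_effects(ssml):
    """List (element, language, assumed effect) for every xml:lang in the document."""
    root = ET.fromstring(ssml)
    effects = []
    for elem in root.iter():  # document order, root included
        lang = elem.get(XML_LANG)
        if lang is not None:
            effect = EFFECT_BY_ELEMENT.get(elem.tag, "not defined for this element")
            effects.append((elem.tag, lang, effect))
    return effects

DOC = (
    '<speak version="1.0" xml:lang="en-US">'
    '<s xml:lang="it-IT">La vita è bella.</s>'
    '</speak>'
)

for tag, lang, effect in lang_effects(DOC):
    print(tag, lang, effect)
```

With only these four elements available, the smallest unit that can carry a language change is the sentence, which is the gap Problem 2 points at.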

Proposal 2: New element <token> (2/3)
Proposal:
• To introduce a new element <token>
• To extend the use of the “xml:lang” attribute to the <token> element
Advantages:
• It is a generic element
• It is extensible
• Without attributes, it could be used to give information on segmentation, where needed
• With other attributes, it could specify new information for the token (e.g. part of speech)

Proposal 2: New element <token> (3/3)
Examples

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xml:lang="en-US">
  The movie is the product of Italian comic sensation Roberto Benigni,
  who wore three hats for "La vita è bella": director, co-writer, and star.
</speak>

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak version="1.0" xml:lang="en-US">
  The movie is the product of Italian comic sensation
  <token xml:lang="it-IT">Roberto Benigni</token>, who wore three hats for
  <token xml:lang="it-IT"> "La vita è bella"</token>: director, co-writer, and star.
</speak>
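To make the effect of the second example concrete, the following Python sketch splits a document into runs of text tagged with their effective language, letting each element inherit “xml:lang” from its nearest ancestor. The function name language_runs is hypothetical, and treating <token> like any other element that may carry “xml:lang” is an assumption about how the proposed element would be processed.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def language_runs(elem, inherited=None):
    """Return (language, text) runs in document order, inheriting xml:lang."""
    lang = elem.get(XML_LANG, inherited)
    runs = []
    if elem.text and elem.text.strip():
        runs.append((lang, " ".join(elem.text.split())))
    for child in elem:
        runs.extend(language_runs(child, lang))
        # Text after a child's closing tag belongs to the parent's language.
        if child.tail and child.tail.strip():
            runs.append((lang, " ".join(child.tail.split())))
    return runs

SSML = (
    '<speak version="1.0" xml:lang="en-US">'
    'The movie is the product of Italian comic sensation '
    '<token xml:lang="it-IT">Roberto Benigni</token>, who wore three hats for '
    '<token xml:lang="it-IT"> "La vita è bella"</token>: director, co-writer, and star.'
    '</speak>'
)

for lang, text in language_runs(ET.fromstring(SSML)):
    print(lang, "|", text)
```

The two Italian runs are shorter than a sentence, so an engine could switch language knowledge for just those words and return to English for the rest.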

Conclusions
• Proposal 1:
  • To increase the number of “interpret-as” values by identifying new speech contexts
• Proposal 2:
  • To introduce a new element that carries specific information (e.g. the language) for a single word, a phrase, and so on.
