A Standard for Developing Multimodal Applications
James A. Larson
Larson Technical Services
jim@larson-tech.com
SpeechTEK West
February 23, 2007
Status of W3C Multimodal Interface Languages
• Recommendation: VoiceXML 2.0, Speech Recognition Grammar Specification (SRGS) 1.0, Speech Synthesis Markup Language (SSML) 1.0, Semantic Interpretation for Speech Recognition (SISR) 1.0
• Proposed Recommendation: VoiceXML 2.1
• Candidate Recommendation
• Last Call Working Draft: Extensible MultiModal Annotation (EMMA) 1.0
• Working Draft: State Chart XML (SCXML) 1.0, InkML 1.0
• Requirements
Developing & Delivering Multimodal Applications
Interaction Manager Approaches
• SALT: Interaction Manager (XHTML); SALT
• Object-oriented: Interaction Manager (C#); SAPI 5.3
• X+V: Interaction Manager (XHTML); VoiceXML 2.0 Modules
• W3C: Interaction Manager (SCXML); Data Model; XHTML, VoiceXML 3.0, InkML
Approaches compared: SALT, Object-oriented, X+V, W3C
• Standard languages used across the approaches: XHTML, SRGS, SSML, SISR, VoiceXML, SCXML, EMMA, CCXML
• Interaction Manager: XHTML (SALT), C# (Object-oriented), XHTML (X+V), SCXML (W3C)
• Modes: GUI and Speech for all four approaches; Ink … for W3C
MMI Architecture: Basic Components
• Interaction Manager: coordinates modality components and provides application flow
• Modality Components: provide modality capabilities such as speech, pen, keyboard, mouse
• Data Model: handles shared data
[Diagram: Interaction Manager (SCXML) with its Data Model, connected to the XHTML, VoiceXML 3.0, and InkML modality components]
Multimodal Architecture and Interfaces
• A loosely coupled, event-based architecture for integrating multiple modalities into applications
• All communication is event-based
• Based on a set of standard life-cycle events
• Components can also expose other events as required
• Encapsulation protects component data
• Encapsulation enhances extensibility to new modalities
• Can be used outside a Web environment
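The loosely coupled, event-based style described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the queue, and the "success" payload are hypothetical and are not taken from the MMI Architecture draft. Components keep their data private and are reached only through events routed by the Interaction Manager.

```python
from collections import deque


class ModalityComponent:
    """Hypothetical component: its data is private and reachable only via events."""

    def __init__(self, name):
        self.name = name
        self._received = []  # encapsulated state, never touched from outside

    def handle(self, event_name, payload):
        # Answer each request with a matching "...Response" event.
        self._received.append(event_name)
        return event_name + "Response", {"source": self.name, "status": "success"}


class InteractionManager:
    """Routes queued events to components; components never call each other."""

    def __init__(self):
        self._components = {}
        self._queue = deque()
        self.log = []

    def register(self, component):
        self._components[component.name] = component

    def post(self, target, event_name, payload=None):
        self._queue.append((target, event_name, payload or {}))

    def run(self):
        while self._queue:
            target, event_name, payload = self._queue.popleft()
            name, response = self._components[target].handle(event_name, payload)
            self.log.append((response["source"], name))


im = InteractionManager()
for modality in ("xhtml", "voicexml", "inkml"):
    im.register(ModalityComponent(modality))
    im.post(modality, "prepare")
im.run()
print(im.log)
```

Adding a new modality here means registering one more component; nothing else in the sketch changes, which is the extensibility point the slide makes.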
Specify Interaction Manager Using Harel State Charts
• Extension of state transition systems
• States
• Transitions
• Nested state-transition systems
• Parallel state-transition systems
• History
[Diagram: state chart with Prepare State, Start State, WaitState, EndState, and FailState; transitions labeled Prepare Response (success), Prepare Response (fail), Start Response, StartFail, Done Success, and DoneFail]
State Chart XML (SCXML)
…
<state id="PrepareState">
  <onentry>
    <send event="prepare" contentURL="hello.vxml"/>
  </onentry>
  <transition event="prepareResponse" cond="status=='success'" target="StartState"/>
  <transition event="prepareResponse" cond="status=='failure'" target="FailState"/>
</state>
…
Example State Transition System
[Diagram: state chart with Prepare State, Start State, WaitState, EndState, and FailState; transitions labeled Prepare Response (success), Prepare Response (fail), Start Response, StartFail, Done Success, and DoneFail]
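The state transition system on this slide can be written as a lookup table. The sketch below is a plain-Python illustration, not SCXML and not an SCXML engine; the edges out of StartState and WaitState are an assumption read off the flattened diagram labels, and the event name "done" is borrowed from the life-cycle events.

```python
# Transition table: (current state, event, status) -> next state.
# Only the two PrepareState edges appear in the SCXML fragment; the rest are assumed.
TRANSITIONS = {
    ("PrepareState", "prepareResponse", "success"): "StartState",
    ("PrepareState", "prepareResponse", "failure"): "FailState",
    ("StartState", "startResponse", "success"): "WaitState",
    ("StartState", "startResponse", "failure"): "FailState",
    ("WaitState", "done", "success"): "EndState",
    ("WaitState", "done", "failure"): "FailState",
}


def step(state, event, status):
    """Return the next state, or stay in the current state if nothing matches."""
    return TRANSITIONS.get((state, event, status), state)


def run(events, state="PrepareState"):
    """Feed a sequence of (event, status) pairs through the chart."""
    for event, status in events:
        state = step(state, event, status)
    return state


happy_path = [
    ("prepareResponse", "success"),
    ("startResponse", "success"),
    ("done", "success"),
]
print(run(happy_path))                         # EndState
print(run([("prepareResponse", "failure")]))   # FailState
```

Events that match no transition leave the state unchanged, which is one simple way to treat unexpected input in a sketch like this.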
Example State Chart with Parallel States
[Diagram: two parallel regions, Voice and GUI. Each region has Prepare, Start, Wait, End, and Fail states, with transitions labeled Prepare Response Success, Prepare Response Fail, Start Response, Start Fail, Done Success, and Done Fail]
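Parallel states can be pictured as independent regions that each track their own current state. The following is a rough Python sketch under assumptions: the `ParallelChart` class and the region names are hypothetical, the transition table is inferred from the diagram labels, and real SCXML `<parallel>` semantics are richer than this.

```python
# Assumed per-region transitions, inferred from the diagram labels.
TRANSITIONS = {
    ("PrepareState", "prepareResponse", "success"): "StartState",
    ("PrepareState", "prepareResponse", "failure"): "FailState",
    ("StartState", "startResponse", "success"): "WaitState",
    ("StartState", "startResponse", "failure"): "FailState",
    ("WaitState", "done", "success"): "EndState",
    ("WaitState", "done", "failure"): "FailState",
}
FINAL_STATES = {"EndState", "FailState"}


class ParallelChart:
    """Hypothetical helper: one independent current state per region."""

    def __init__(self, regions):
        self.states = {region: "PrepareState" for region in regions}

    def dispatch(self, region, event, status):
        # An event for one region never changes the state of another region.
        key = (self.states[region], event, status)
        self.states[region] = TRANSITIONS.get(key, self.states[region])

    def finished(self):
        # The parallel state is complete once every region is in a final state.
        return all(state in FINAL_STATES for state in self.states.values())


chart = ParallelChart(["voice", "gui"])
chart.dispatch("voice", "prepareResponse", "success")
chart.dispatch("gui", "prepareResponse", "success")
chart.dispatch("voice", "startResponse", "success")
chart.dispatch("gui", "startResponse", "failure")
print(chart.states)      # {'voice': 'WaitState', 'gui': 'FailState'}
print(chart.finished())  # False
chart.dispatch("voice", "done", "success")
print(chart.finished())  # True
```

In this run the GUI region fails while the Voice region carries on to completion, showing that the two regions progress independently.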
The Life Cycle Events
Each request and its response pass between the Interaction Manager (SCXML) and the modality components (XHTML, VoiceXML):
• prepare / prepareResponse
• start / startResponse
• cancel / cancelResponse
• pause / pauseResponse
• resume / resumeResponse
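One way to picture the request/response pairing is a modality component that answers every life-cycle request with the matching response event. The sketch below is hypothetical: the internal states ("idle", "prepared", "running", "paused") and the rules for which request is legal in which state are assumptions made for illustration and are not defined on the slides.

```python
class Modality:
    """Hypothetical modality component that answers life-cycle requests."""

    # Assumed: states in which each request is accepted.
    ALLOWED = {
        "prepare": {"idle"},
        "start": {"idle", "prepared"},
        "pause": {"running"},
        "resume": {"paused"},
        "cancel": {"prepared", "running", "paused"},
    }
    # Assumed: state reached when a request succeeds.
    RESULT = {
        "prepare": "prepared",
        "start": "running",
        "pause": "paused",
        "resume": "running",
        "cancel": "idle",
    }

    def __init__(self):
        self.state = "idle"

    def handle(self, request):
        ok = self.state in self.ALLOWED[request]
        if ok:
            self.state = self.RESULT[request]
        status = "success" if ok else "failure"
        return {"event": request + "Response", "status": status}


modality = Modality()
requests = ("prepare", "start", "pause", "resume", "resume", "cancel")
replies = [modality.handle(request) for request in requests]
for reply in replies:
    print(reply["event"], reply["status"])
```

The second resume arrives while the component is already running, so it is answered with resumeResponse and a failure status; every other request in this run succeeds.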
More Life Cycle Events
Exchanged between the Interaction Manager (SCXML) and the modality components (XHTML, VoiceXML):
• newContextRequest / newContextResponse
• data
• done
• clearContext
Synchronization Using the Lifecycle Data Event
• Intent-based events
  • Capture the underlying intent rather than the physical manifestation of user events
  • Independent of the physical characteristics of particular devices
• Data/reset: reset one or more field values to null
• Data/focus: focus on another field
• Data/change: field value has changed
[Diagram: data events exchanged between SCXML and the XHTML and VoiceXML components]
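The idea of intent-based events can be sketched as a translation step from device-specific input to the three data events named on this slide. Everything below is illustrative: the raw event kinds ("keyboard_input", "speech_result", and so on), the dictionary layout, and the function names are assumptions, not part of any W3C specification.

```python
def to_intent(raw):
    """Translate a device-specific event into an intent-based data event."""
    kind = raw["kind"]
    if kind in ("keyboard_input", "speech_result", "ink_recognized"):
        return {"event": "data/change", "field": raw["field"], "value": raw["value"]}
    if kind in ("mouse_click", "voice_goto"):
        return {"event": "data/focus", "field": raw["field"]}
    if kind in ("clear_button", "voice_start_over"):
        return {"event": "data/reset", "fields": raw["fields"]}
    raise ValueError(f"unknown raw event: {kind}")


def apply_intent(intent, form):
    """Apply an intent to a shared data model (a plain dict in this sketch)."""
    if intent["event"] == "data/change":
        form[intent["field"]] = intent["value"]
    elif intent["event"] == "data/reset":
        for field in intent["fields"]:
            form[field] = None  # reset means "set the field value to null"
    return form


form = {"city": None, "state": None}
typed = to_intent({"kind": "keyboard_input", "field": "city", "value": "Portland"})
spoken = to_intent({"kind": "speech_result", "field": "city", "value": "Portland"})
print(typed == spoken)  # True: the same intent, whichever device produced it
apply_intent(spoken, form)
print(form["city"])     # Portland
apply_intent(to_intent({"kind": "clear_button", "fields": ["city"]}), form)
print(form["city"])     # None
```

Because typed and spoken input collapse into the same data/change event, the code that synchronizes the other modalities does not need to know which device the user used.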
Modality Lifecycle Events between Interaction Manager and Modality
[Diagram: the Interaction Manager state chart shown alongside the events it exchanges with the modality]
• Prepare State: sends prepare; prepare response (success) leads to Start State, prepare response (failure) leads to FailState
• Start State: sends start; start response (success) leads to WaitState, start response (failure) leads to FailState
• WaitState: exchanges data; done (Done Success) leads to EndState, DoneFail leads to FailState
MMI Architecture Principles
• The Interaction Manager communicates with Modality Components through asynchronous events
• Modality Components don’t communicate directly with each other, but indirectly through the Interaction Manager
• Components must implement the basic life cycle events and may expose other events
• Modality Components can be nested (e.g. a Voice Dialog component like a VoiceXML <form>)
• Components need not be markup-based
• EMMA communicates users’ inputs to the Interaction Manager
Modalities
• GUI Modality (XHTML)
  • Adapter converts lifecycle events to XHTML events
  • XHTML events converted to lifecycle events
• Voice Modality (VoiceXML 3.0)
  • Lifecycle events are embedded into VoiceXML 3.0
[Diagram: Interaction Manager (SCXML) with its Data Model, connected to the XHTML and VoiceXML 3.0 modality components]
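The adapter role for the GUI modality can be sketched as an object that translates in both directions. This is a hypothetical Python stand-in, assuming a fake page object instead of a real XHTML DOM; the method names and event dictionaries are invented for illustration.

```python
class FakePage:
    """Stand-in for an XHTML page; a real adapter would drive a browser DOM."""

    def __init__(self):
        self.fields = {"city": ""}
        self.visible = False


class XhtmlAdapter:
    """Hypothetical adapter between lifecycle events and GUI events."""

    def __init__(self, page, send_to_im):
        self.page = page
        self.send_to_im = send_to_im  # callback that delivers events to the Interaction Manager

    def on_lifecycle(self, event, data=None):
        # Lifecycle event from the Interaction Manager -> change on the page.
        if event == "start":
            self.page.visible = True
            self.send_to_im({"event": "startResponse", "status": "success"})
        elif event == "data":
            self.page.fields[data["field"]] = data["value"]

    def on_gui_change(self, field, value):
        # GUI event from the page -> lifecycle data event for the Interaction Manager.
        self.page.fields[field] = value
        self.send_to_im({"event": "data", "type": "change", "field": field, "value": value})


outbox = []
adapter = XhtmlAdapter(FakePage(), outbox.append)
adapter.on_lifecycle("start")
adapter.on_lifecycle("data", {"field": "city", "value": "Boston"})
print(adapter.page.fields)  # {'city': 'Boston'}
adapter.on_gui_change("city", "Austin")
print(outbox[-1])
```

The page never talks to the Interaction Manager directly; every exchange passes through the adapter, which is the piece the slide says has to be written for XHTML.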
Modalities
VoiceXML supports
• Events sent from the Interaction Manager
• Sending events to the Interaction Manager
<form>
  <catch name="change">
    <assign name="city" value="data"/>
  </catch>
  …
  <field name="city">
    <prompt> Blah </prompt>
    <grammar src="city.grxml"/>
    <filled>
      <send event="data.change" data="city"/>
    </filled>
  </field>
</form>
Modalities
XHTML is extended to send events to the Interaction Manager.
<head>
  …
  <ev:listener ev:event="onChange" ev:observer="app1" ev:handler="onChangeHandler()"/>
  …
  <script>
    function onChangeHandler() { post("data", data="city") }
  </script>
</head>
…
<body id="app1">
  <input type="text" id="city" value=""/>
</body>
…
Modalities
XHTML is extended to support events received from the Interaction Manager.
<head>
  …
  <handler type="text/javascript" ev:event="data">
    if (event == "change") { document.app1.city.value = data.city }
  </handler>
  …
</head>
…
<body id="app1">
  <input type="text" id="city" value=""/>
</body>
…
References
• SCXML
  • Second working draft available at http://www.w3.org/TR/2006/WD-scxml-20060124/
  • Open source available from http://jakarta.apache.org/commons/sandbox/scxml/
• Multimodal Architecture and Interfaces
  • Working draft available at http://www.w3.org/TR/2006/WD-mmi-arch-20060414/
• Voice Modality
  • First working draft of VoiceXML 3.0 scheduled for November 2007
• XHTML
  • Full recommendation
  • Adapters must be hand-coded
• Other modalities
  • TBD
Availability
• SAPI 5.3
  • Microsoft Windows Vista®
• X+V
  • ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003 http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
  • Opera Software Multimodal Browser for Sharp Zaurus http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
  • Opera 9 for Windows http://www.opera.com/
• W3C
  • First working draft of VoiceXML 3.0 not yet available
  • Working drafts of SCXML are available; some open-source implementations are available
• Proprietary APIs
  • Available from vendor
Final Advice
• The W3C is defining a rich collection of languages for authoring multimodal applications
  • SCXML can be used as an Interaction Manager
  • Many languages for modalities: VoiceXML, XHTML, …
  • EMMA may be used to describe data transmitted among modules
• The W3C languages will be available on multiple platforms
• Avoid getting locked into proprietary languages available only on a single platform
Web Resources
• http://www.w3.org/voice
  • Specification of grammar, semantic interpretation, and speech synthesis languages
• http://www.w3.org/2002/mmi
  • Specification of EMMA and InkML languages
• http://www.microsoft.com (and query SALT)
  • SALT specification and download instructions for adding SALT to Internet Explorer
• http://www-306.ibm.com/software/pervasive/multimodal/
  • X+V specification; download Opera and ACCESS browsers
• http://www.larson-tech.com/SALT/ReadMeFirst.html
  • Student projects using SALT to develop multimodal applications
• http://www.larson-tech.com/MMGuide.html or http://www.w3.org/2002/mmi/Group/2006/Guidelines/
  • User interface guidelines for multimodal applications