OMA-REQ-2002-0039
“Multimodal and Multi-device Services”
OMA Requirements Group
Contacts:
• Stéphane H. Maes, IBM, smaes@us.ibm.com
• Jerome Vogedes, Motorola, Jerome.Vogedes@motorola.com
• Michele Zarri, T-Mobile, Michele.Zarri@t-mobile.co.uk
Additional Supporters for Work Item: Alcatel, Openwave, Oracle, …
PRESENTATION SYNOPSIS
• An overview of multimodal and multi-device value proposition, execution model, architecture and configurations as consistently identified by WAP activities (WAP-286) and 3GPP SES (TR 22.977).
• Overview of related standard activities
• OMA Activities
  • Mapping on OMA architecture framework
  • Analysis of WAE approach
• Proposal for next steps:
  • Use cases and requirement specifications to support multimodal and multi-device services within OMA.
  • Dissemination of the requirements to existing or new working groups as needed (e.g. architecture, WAG …).
• Proposal for next steps and answer to 3GPP SA-1 liaison:
  • OMA will address the service aspects of multimodal and multi-device interactions and rely on 3GPP/2 for network implications
Definitions
• Channel: a particular user agent, device, or modality, characterized by a delivery context
• Mono-channel applications: applications designed for access through a single channel
• Multi-channel applications: applications designed for access through different channels
• Multimodal or multi-device applications: several modalities or devices are available and synchronized sequentially or simultaneously
  • Different granularities of synchronization
• There are numerous commonalities between interactions using several devices (multi-device interaction) and several modalities (multimodal interaction)
Multi-channel / Mono-Channel Usage Scenarios
• Multiple access devices
• One interaction mode per device
Voice
• Access from any telephone
• Output is inherently sequential and may make mistakes
PC
• Standardized rich visual interface
• Not suitable for mobile use
Wireless (WAP, …)
• Mobile and becoming ubiquitous
• Hard to enter data
[Figure: one sample interaction per channel.
Voice dialog: User: "I need a direct flight from New York to San Francisco after 7:30pm today". System: "There are five direct flights from New York's LaGuardia airport to San Francisco after 7:30pm today: Delta flight 542...". User: "Book me on the United flight".
PC: a travel web site with an express search form (Departing from, Going to, When are you leaving?, When are you returning?).
WAP: a form filled in step by step (From: LGA, To: SFO, Date: 00/12/11), followed by a list of non-stop flights from LGA to SFO:
FLIGHT  DEP    ARR
DLnnn   7:30p  10:55p
AAnnn   7:55p  10:55p
UAnnn   8:30p  11:55p
TOnnn   8:45p  0:10p
MWnnn   9:15p  0:45p]
Multimodal / Multi-device Interactions
• User can select at any time the preferred modality of interaction or device
• Key user interface for mobile deployments
• User is not tied to a particular channel's presentation flow
• Interaction becomes a personal and optimized experience based on interaction and situation
Example: mobile GUI + voice
[Figure: a voice dialog shown alongside alternative mobile screens (a list of non-stop flights from LGA to SFO, or a seat map).
System prompts: "Express reservations. From which airport do you wish to depart?", "Would you like a window or an aisle seat?", "If you know your desired itinerary you can use a shortcut...", "We have many more flight, hotel, and car options."
User inputs: "I need a direct flight from New York to San Francisco after 7:30pm today", "Book me on the United flight", "Select seat 3A".]
Multimodal Computing Value Proposition
• Easily enter and access data using small devices
  • by combining multiple input & output modes
• Choose the interaction mode that suits the task and situation
  • input: key, touch, stylus, voice...
  • output: display, tactile, audio...
  • alone or in public; driving, sitting, walking, …
• Reliable access:
  • Users are no longer blocked by the limitations or mistakes of a given interaction mode at a given moment
• Use several devices in combination
  • by exploiting the resources of multiple devices
  • Dynamic, server-based or spontaneous PAN
• In the future: conversational extension
  • Interacts with and across applications as in an everyday dialog: free form, mixed initiative and context management
• Multimodal interaction is a key element of the user interface of the mobile space
• Multimodal and multi-device services may significantly increase adoption of 3GPP services and mobile internet
HIERARCHY OF NEEDS
• Multimodal interaction and its underlying technology address the hierarchy of needs for mobile users.
MULTIMODAL AND MULTI-DEVICE EXECUTION MODEL
[Figure: numbered flow (steps 1 to 6, with two sub-steps marked "a") between Modality or Device A (User Agent), Modality or Device B (User Agent), and further user agents. Step labels: User interaction; Optional immediate update; Exchange of interaction; Application of synchronization rules; Rule updates; Update of the different modalities or devices.]
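The execution model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `mirror` rule and the "slot" vocabulary are hypothetical and do not come from WAP-286 or TR 22.977. The sketch follows the labels of the figure: a user interaction in one user agent, an optional immediate local update, the exchange of the interaction with a synchronization manager, the application of synchronization rules, and the update of every registered modality or device.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class InteractionEvent:
    source: str  # name of the user agent where the user acted
    slot: str    # piece of shared application state that was touched
    value: str


# A synchronization rule turns one event into (slot, value) updates.
SyncRule = Callable[[InteractionEvent], List[Tuple[str, str]]]


class UserAgent:
    """Hypothetical modality or device (voice browser, GUI browser, ...)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.view: Dict[str, str] = {}

    def interact(self, manager: "SynchronizationManager", slot: str, value: str) -> None:
        self.view[slot] = value  # optional immediate update of the local view
        manager.handle(InteractionEvent(self.name, slot, value))  # exchange of interaction

    def update(self, slot: str, value: str) -> None:
        self.view[slot] = value


class SynchronizationManager:
    def __init__(self, rules: List[SyncRule]) -> None:
        self.rules = list(rules)
        self.agents: List[UserAgent] = []

    def register(self, agent: UserAgent) -> None:
        self.agents.append(agent)

    def add_rule(self, rule: SyncRule) -> None:
        self.rules.append(rule)  # rule update

    def handle(self, event: InteractionEvent) -> None:
        # Apply every synchronization rule, then update all modalities or devices.
        updates = [u for rule in self.rules for u in rule(event)]
        for agent in self.agents:
            for slot, value in updates:
                agent.update(slot, value)


def mirror(event: InteractionEvent) -> List[Tuple[str, str]]:
    """Simplest possible rule: every view shows the same value for the slot."""
    return [(event.slot, event.value)]


manager = SynchronizationManager([mirror])
voice, gui = UserAgent("voice"), UserAgent("gui")
manager.register(voice)
manager.register(gui)

voice.interact(manager, "to", "SFO")  # spoken input
gui.interact(manager, "from", "LGA")  # typed input
print(gui.view)
print(voice.view)
```

After the two interactions both views hold the same two slots, whichever modality the user chose for each input. Coarser or finer granularities of synchronization would correspond to different rules, not to a different flow.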
UNDERLYING MVC ARCHITECTURE
An extensible and Interoperable framework: MM Synchronization
[Figure labels: Voice User Agent, GUI User Agent, Synchronization Manager]
• Requirement:
  • MUST be Extensible and interoperable across programming model, capabilities (terminal, network, application) and configurations:
    • MUST support the relevant programming models (see standard activities)
    • MUST support the multimodal and multi-device execution model
    • MUST support all target configurations (see after)
Interoperable Deployment Configurations
Sequential (no voice and data support)
[Figure labels: Client, Network, Server; GUI User Agent, XHTML, Voice, Audio Sub-system, Speech Browser, VoiceXML, Synchronization Manager, Presentation, Web Server, http]
Interoperable Deployment Configurations
Thin Client Configuration (voice and data support): server-side speech engines local to speech browser
[Figure labels: Client, Network, Server; GUI User Agent, XHTML, Codec, Speech Recognition Framework, Speech Browser, VoiceXML, Speech Engines, Synchronization Manager, MM Synchronization, Possible Event coordination, Presentation, Web Server]
Interoperable Deployment Configurations
Thin Client Configuration (voice and data support): server-side speech engines remote from speech browser
[Figure labels: Client, Network, Server; GUI User Agent, XHTML, Codec, Speech Recognition Framework, Speech Browser, VoiceXML, SRCP, Speech Engines, Synchronization Manager, MM Synchronization, Possible Event coordination, Presentation, Web Server]
Interoperable Deployment Configurations
Fat client configuration with local speech engines
[Figure labels: Client, Network; GUI User Agent, XHTML, Codec, Speech Engines, Speech Browser, VoiceXML, Synchronization Manager, MM Synchronization, Presentation, Web Server]
• This can be the internal architecture of a browser implementation
Interoperable Deployment Configurations
Fat client configuration with remote speech engines
[Figure labels: Client, Network; GUI User Agent, XHTML, Codec, Speech Recognition Framework, SRCP, Speech Engines, Speech Browser, VoiceXML, Synchronization Manager, MM Synchronization, Presentation, Web Server]
Interoperable Deployment Configurations
Multi-device
[Figure labels: WAP Phone, Wireless PDA, Kiosk; WAP Browser, XHTML Browser (twice); Synchronization Manager]
• The synchronization manager can be on a server or on one of the devices
• Same execution model and flow
• The latter case would preferably encompass replication of the state of the synchronization manager.
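The single-user-agent configurations listed on the preceding slides can be summarized as data. The sketch below is a reading of the slide titles, not a normative table: the type and field names are hypothetical, the engine placement for the sequential configuration is an assumption (the slide does not state it), and the two helper functions encode the scope slide's statements that SRF applies to configurations with server-side speech recognition and SRCP to configurations with remote speech engines. The multi-device configuration is left out because it varies the number of user agents, not the speech components.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Side(Enum):
    CLIENT = "client"
    SERVER = "server"


@dataclass(frozen=True)
class DeploymentConfig:
    name: str
    voice_and_data: bool               # simultaneous voice and data support
    speech_browser: Side               # where the speech browser runs
    speech_engines: Side               # where the speech engines run
    engines_remote_from_browser: bool  # engines on a different node than the browser


CONFIGS: List[DeploymentConfig] = [
    # Engine placement for "sequential" is assumed, not stated on the slide.
    DeploymentConfig("sequential", False, Side.SERVER, Side.SERVER, False),
    DeploymentConfig("thin client, local engines", True, Side.SERVER, Side.SERVER, False),
    DeploymentConfig("thin client, remote engines", True, Side.SERVER, Side.SERVER, True),
    DeploymentConfig("fat client, local engines", True, Side.CLIENT, Side.CLIENT, False),
    DeploymentConfig("fat client, remote engines", True, Side.CLIENT, Side.SERVER, True),
]


def needs_srcp(cfg: DeploymentConfig) -> bool:
    """Speech engine remote control is only needed when engines are remote."""
    return cfg.engines_remote_from_browser


def needs_srf(cfg: DeploymentConfig) -> bool:
    """The speech recognition framework carries audio to server-side engines."""
    return cfg.voice_and_data and cfg.speech_engines is Side.SERVER


srcp_configs = [c.name for c in CONFIGS if needs_srcp(c)]
srf_configs = [c.name for c in CONFIGS if needs_srf(c)]
print(srcp_configs)
print(srf_configs)
```

Writing the configurations down this way makes the interoperability requirement concrete: the synchronization interface stays the same in every row, and only the speech-related protocols come and go.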
Multimodal Interaction Standard Activities
• W3C - MMI WG:
  • Status:
    • MMI requirements (including use cases and framework)
  • Other activities:
    • Multimodal synchronization (Distributed XML events)
    • Ink / Handwriting
    • Update of NLSML Working Draft 1.0 into EMMA
  • Input:
    • X+V: W3C Note (http://www.w3.org/TR/xhtml+voice/)
    • Positions presented at the W3C Face-To-Face meeting: MVC; XForms / DI authoring for multimodal (http://www.w3c.org)
    • SALT (submitted to Voice Activity and MMI WG)
  • Direction:
    • Re-use of existing W3C specifications
      • XHTML, SMIL, XForms, XML events, VoiceXML
  • Related working specifications:
    • VoiceXML 2.0 has been in last call since April 24, 2002
    • XML events has been in last call since November 2001
    • XForms 1.0 has been in last call since January 2002
    • DI WG addresses delivery context and DI authoring
Multimodal Interaction Standard Activities (more)
• 3GPP:
  • SES: Stage-1 TS 22.243 (SRF) and TR 22.977 (SES)
  • Stage-2 activities proposed to be started at SA-2 and SA-4
  • SA-3 and T-2 activities may be proposed soon
• OMA/WAP Forum:
  • MM synchronization interfaces to WAP client (user agent - DOM-MP)
• ETSI STQ (Aurora):
  • DSR Codecs
  • DSR Protocol framework
  • Multi-modal architecture (MVC) and DSR
  • Speech Engine Remote Control
• IETF:
  • DSR
  • SPEECHSC (Speech Engine Remote Control)
• ITU:
  • DSR / DSV
WAE Work on Multimodal
• OMA-WAP-286xxx: work in progress
• Defined an ECMAScript-MP library on the user agent that:
  • Sends interaction events to a target handler (i.e. multimodal synchronization manager)
    • Transported via HTTP/WAP when distributed (later, could be on SOAP)
  • Provides an interface to update the presentation via presentation page / snippet upload and update
    • Transported via WAP push when distributed
• In the future, can be transported on SOAP and treated as a web service (to and from the client).
[Figure labels: WAP User Agent with ECMAScript-MP MM Lib; Interaction event over HTTP / WAP to MM Synchronization; Presentation Update (Page / Snippet / DOM mutation) over WAP Push (later on SOAP)]
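The two interfaces of the user-agent library can be illustrated with a small in-process simulation. Everything here is hypothetical: the class names, the JSON message shapes and the `attach` / `receive` methods are invented for the example, and plain function calls stand in for the HTTP/WAP and WAP Push transports. The WAE work itself targets ECMAScript-MP; Python is used here only to keep the sketch short and runnable.

```python
import json
from typing import Callable, Dict


class MultimodalLib:
    """Hypothetical stand-in for the multimodal library inside a user agent."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send               # stands in for HTTP/WAP to the target handler
        self.dom: Dict[str, str] = {}   # element id -> displayed text

    def on_interaction(self, element_id: str, value: str) -> None:
        """Interface 1: report an interaction event to the target handler."""
        self.dom[element_id] = value
        self._send(json.dumps({"type": "interaction", "target": element_id, "value": value}))

    def on_push(self, message: str) -> None:
        """Interface 2: apply a pushed presentation update (page or snippet)."""
        update = json.loads(message)
        if update["type"] == "snippet":
            self.dom[update["target"]] = update["content"]
        elif update["type"] == "page":
            self.dom = dict(update["content"])


class SyncHandler:
    """Hypothetical target handler (multimodal synchronization manager)."""

    def __init__(self) -> None:
        self.push_channels: Dict[str, Callable[[str], None]] = {}

    def attach(self, name: str, push: Callable[[str], None]) -> None:
        self.push_channels[name] = push  # stands in for a WAP Push channel

    def receive(self, sender: str, message: str) -> None:
        event = json.loads(message)
        snippet = json.dumps(
            {"type": "snippet", "target": event["target"], "content": event["value"]}
        )
        for name, push in self.push_channels.items():
            if name != sender:  # the sender already shows the new value
                push(snippet)


handler = SyncHandler()
gui = MultimodalLib(lambda msg: handler.receive("gui", msg))
voice = MultimodalLib(lambda msg: handler.receive("voice", msg))
handler.attach("gui", gui.on_push)
handler.attach("voice", voice.on_push)

gui.on_interaction("to", "SFO")
print(voice.dom)
```

Because the library only knows a send function and a push entry point, the same code works whether the handler is remote (distributed case) or sits in the same process, which is the property the slide relies on when it mentions a later move to SOAP.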
Thin Client Configuration Mapped on OMA framework
(Speech engines can be local or remote from speech browser)
[Figure labels:
Layers: Terminal Application Platform(s), Server Application Platform(s), Network Services, Network Infrastructure.
Components: Applications & Content (terminal and server), GUI User Agent, Web Server, Speech Browser, Synchronization Manager, Speech Engines, Codec and audio sub-systems.
Links: MM Synchronization; Possible Event coordination; Speech Engine Interface or remote control; Speech Recognition Framework (streamed audio, codec negotiation, signaling, meta-information).
Annotation: Implements the multimodal / multi-device execution model.]
Accumulated Functions & OMA Framework
(Speech engines can be local or remote from speech browser)
• Features may be client-based, service based or both based on configuration to support
[Figure labels:
Layers: Terminal Application Platform(s), Server Application Platform(s), Network Services, Network Infrastructure.
Components: Applications & Content (terminal and server), GUI User Agent, Web Server, Speech Browser (terminal and server), Synchronization Manager (terminal and server), Speech Engines, Codec and audio sub-systems.
Functions: SRCP, SRF, MM Sync (interface and exchanges), Session management, Session state replication, Delivery context management and adaptation, Identity / delivery context, Security, Privacy, Push, HTTP, Base OMA features.]
Examples of Main Architecture Requirements
• Should be interoperable and extensible:
  • Across runtime configurations
  • Across content
  • Across different runtime / infrastructure capabilities (client, server, infrastructure)
• Should support:
  • Thin and fat configurations with local or remote speech engines
  • Suspend and resume
  • Dynamic multi-device or multimodal sessions:
    • more than 2 views
    • changing views and configurations:
      • e.g. hybrid thin/fat client configuration.
• Should rely on:
  • Existing standard interfaces and protocol practices (to be profiled for some channels)
  • Evolution of web (web services, intermediaries, etc.)
  • Fit evolution of wireless infrastructure
• Should be directly compatible with the evolution of W3C authoring specifications and reuse existing authoring standards
  • XHTML + Voice, SALT, XForms
SERVICE AND ARCHITECTURE SCOPE AND REQUIREMENTS
• Integration in OMA architecture framework
• Support of Multimodal Synchronization interface and protocols:
  • QoS,
  • security,
  • privacy,
  • administration,
  • terminal / user agent interfaces (WAE)
  • Discovery, registration, negotiation, disconnect mechanisms
• Support of Speech Recognition Framework (SRF): protocols and codecs (for configuration with server-side speech recognition) (e.g. 3GPP SRF)
• Support of Speech Engine Remote Control (SRCP) (e.g. IETF SPEECHSC) (for configurations with remote speech engines)
• Synchronization State replication (for dynamic multi-device configurations without server and connect/reconnect).
• Relative time synchronization (TBD)
• Possible support of CS for voice channel
• Authoring (mobile profile)
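Synchronization state replication can be pictured as snapshotting the synchronization manager's state on one device and restoring it on another, so that a session survives a disconnect without a server. The sketch below is an assumption-laden illustration: the `SyncState` fields, the JSON snapshot format and the version counter are all hypothetical, and no replication protocol is specified by the slides.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SyncState:
    """Hypothetical state held by a synchronization manager."""

    version: int = 0
    slots: Dict[str, str] = field(default_factory=dict)  # shared application state
    views: List[str] = field(default_factory=list)       # registered devices or modalities

    def apply(self, slot: str, value: str) -> None:
        self.slots[slot] = value
        self.version += 1

    def snapshot(self) -> str:
        """Serialize the state so it can be shipped to another device."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def restore(cls, payload: str) -> "SyncState":
        """Rebuild an independent copy of the state from a snapshot."""
        return cls(**json.loads(payload))


# The phone currently hosts the synchronization manager.
phone_state = SyncState(views=["phone", "pda"])
phone_state.apply("from", "LGA")
phone_state.apply("to", "SFO")

# Replicate to the PDA so it can take over if the phone disconnects.
pda_state = SyncState.restore(phone_state.snapshot())
print(pda_state == phone_state)

# The replica is independent: later changes on the PDA do not touch the phone.
pda_state.apply("date", "00/12/11")
print(pda_state.version, phone_state.version)
```

A version counter like the one assumed here is one simple way to decide which copy is newer on reconnect; the slides leave that mechanism, along with relative time synchronization, open.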
Evolution of Multimodal Deployments
[Figure: evolution chart with a "Today" marker. Labels, grouped by apparent row:
Markup Language: HTML 4; XHTML Basic, ...; XHTML; XFORMS; VoiceXML; X+V; DI Authoring; OMA Mobile Profile?
Networks: PSTN (Data); PSTN & 2G (Voice); 2G Wireless; GPRS (2.5G); 802.11; 3G (Voice + Data)
Client: Thin Data; Thin Mobile; Fat Mobile; Smart Mobile; Fat Speech Capable; Smart
Multimodal Systems: Sequential; Fat Config; Thin / Fat / Hybrid / Multi-device Config; Other Modalities; Conversational Multimodal; Everything]
PROPOSED NEXT STEPS AND ANSWER TO 3GPP SA-1 LIAISON
• Proposal for next steps:
  • Use cases and requirement specifications to support multimodal and multi-device services within OMA.
  • Dissemination of the requirements to existing or new working groups as needed (e.g. architecture, WAG …).
  • Review and extension of WAE work by OMA architecture & Wireless Web services
  • A multimodal mobile profile for authoring?
  • An activity devoted to multimodal and multi-device services?
• Proposal for next steps and answer to 3GPP SA-1 liaison:
  • OMA will address the service aspects of multimodal and multi-device interactions…
  • OMA will liaise with 3GPP/2 and rely on them for the network aspects
OVERVIEW
• Current OMA activities:
  • Several input papers have been presented to WAP Forum in the past (IBM, W3C/OpenWave, …)
  • WAP WAE has started to investigate multimodal support and requests the OMA TP's opinion on the technical direction taken and on the broader architecture
  • The OMA Architecture Framework includes a multimodal / multi-device use case as part of its definition.
  • 3GPP SA-1 has developed a feasibility study on Speech Enabled Services (SES). It identified the need for a technical specification to support multimodal and multi-device services that has strong service/application aspects.
• Recommendation: Work Item Proposal
  • We would like to encourage OMA to initiate an activity to address the service aspects of multimodal and multi-device services.
  • Requirements & specifications on Architecture and OMA enablement + possibly a mobile profile (authoring)
• Relationship with other standard activities:
  • 3GPP should focus on network aspects
  • W3C is focussed on the multimodal programming model in general and enabling technologies (XML events,…)
SES at 3GPP
[Figure: classification of speech enabled services. Labels: TR 22.977 Speech Enabled Services; Multimodal Services; Speech-only Services; Multi-device Services; Client-side only Speech engines; Server-side Speech engines; TS 22.243 Speech Recognition Framework (Packet Switched); Circuit Switched Speech Recognition; DSR optimized Codec; Conventional Codec; Other]
Multimodal and multi-device services within the 3GPP System
• Based on TR 22.977
• Work in progress!
• Other options may exist!
• 3GPP direction may ultimately differ!
• Additional protocols
  • Discovery, registration, negotiation, disconnect
  • synchronization state replication
  • relative clock synchronization
  • Encryption / security
[Figure: Client and Server, each with a 3G SRF Protocol Stack, connected over the 3G Network. Labels: Application; MM Synchronization; MMSP; Speech Engine Remote Control; SRCP; Speech Meta-Information; SRF Session Control; Codec Management; MIT; SDP; SIP; RTCP; RTP; RT-DSR Payload; UDP / TCP; IP; 3GPP L2; Speech Input; Uplink encoder; Uplink Decoder; Speech Engines; Downlink encoder; Downlink Decoder; Speech Output]