
Designing Robust Multimodal Systems for Diverse Users and Mobile Environments


Presentation Transcript


  1. Designing Robust Multimodal Systems for Diverse Users and Mobile Environments
  Sharon Oviatt, oviatt@cse.ogi.edu; http://www.cse.ogi.edu/CHCC/

  2. Introduction to Perceptive Multimodal Interfaces
  • Multimodal interfaces recognize combined natural human input modes (speech & pen, speech & lip movements)
  • Radical departure from GUIs in basic features, interface design & architectural underpinnings
  • Rapid development in 1990s of bimodal systems
  • New fusion & language processing techniques
  • Diversification of mode combinations & applications
  • More general & robust hybrid architectures

  3. Advantages of Multimodal Interfaces
  • Flexibility & expressive power
  • Support for users’ preferred interaction style
  • Accommodate more users, tasks, environments
  • Improved error handling & robustness
  • Support for new forms of computing, including mobile & pervasive interfaces
  • Permit multifunctional & tailored mobile interfaces, adapted to user, task & environment

  4. The Challenge of Robustness: Unimodal Speech Technology’s Achilles’ Heel
  • Recognition errors currently limit commercialization of speech technology, especially for:
    • Spontaneous interactive speech
    • Diverse speakers & speaking styles (e.g., accented)
    • Speech in natural field environments (e.g., mobile)
  • 20-50% drop in accuracy typical for real-world usage conditions

  5. Improved Error Handling in Flexible Multimodal Interfaces
  • Users can avoid errors through mode selection
  • Users’ multimodal language is simplified, which reduces complexity of NLP & avoids errors
  • Users mode switch after system errors, which undercuts error spirals & facilitates recovery
  • Multimodal architectures potentially can support “mutual disambiguation” of input signals

  6. Example of Mutual Disambiguation: QuickSet Interface during Multimodal “PAN” Command

  7. Processing & Architecture
  • Speech & gestures processed in parallel
  • Statistically ranked unification of semantic interpretations
  • Multi-agent architecture coordinates signal recognition, language processing, & multimodal integration
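
As a rough illustration of how statistically ranked unification of parallel n-best lists can yield mutual disambiguation, the sketch below fuses toy speech and gesture hypotheses with assumed mode weights and an assumed table of legal commands; none of these values or names reflect QuickSet's actual recognizers, grammar, or parameters.

```python
# Sketch of statistically ranked unification of parallel n-best lists.
# Hypotheses, scores, the legal-command table, and the mode weights are
# invented for illustration; they are not QuickSet's actual components.

from itertools import product

# n-best output of a speech recognizer: (interpretation, probability)
speech_nbest = [("zoom", 0.42), ("pan", 0.38), ("prune", 0.20)]

# n-best output of a gesture recognizer: (interpretation, probability)
gesture_nbest = [("arrow", 0.55), ("line", 0.45)]

# Which (speech, gesture) pairs unify into a legal multimodal command;
# in a real system this constraint comes from the semantic unification step.
LEGAL_COMMANDS = {("pan", "arrow"), ("zoom", "line")}

W_SPEECH, W_GESTURE = 0.6, 0.4  # assumed mode weights

def rank_joint_interpretations(speech, gesture):
    """Score every unifiable (speech, gesture) pair and rank best-first."""
    joint = []
    for (s, p_s), (g, p_g) in product(speech, gesture):
        if (s, g) in LEGAL_COMMANDS:
            joint.append(((s, g), W_SPEECH * p_s + W_GESTURE * p_g))
    return sorted(joint, key=lambda item: item[1], reverse=True)

print(rank_joint_interpretations(speech_nbest, gesture_nbest))
# The top joint interpretation is ("pan", "arrow") even though "pan" was
# only the speech recognizer's second choice: the gesture evidence pulls
# up the correct speech hypothesis, i.e., mutual disambiguation.
```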

  8. General Research Questions
  • To what extent can a multimodal system support mutual disambiguation of input signals?
  • How much is robustness improved in a multimodal system, compared with a unimodal one?
  • In what usage contexts and for what user groups is robustness most enhanced by a multimodal system?
  • What are the asymmetries between modes in disambiguation likelihoods?

  9. Study 1- Research Method
  • QuickSet testing with map-based tasks (community fire & flood management)
  • 16 users— 8 native speakers & 8 accented (varied Asian, European & African accents)
  • Research design— completely-crossed factorial with between-subjects factors: (1) Speaker status (accented, native); (2) Gender
  • Corpus of 2,000 multimodal commands processed by QuickSet

  10. Videotape: Multimodal system processing for accented and mobile users

  11. Study 1- Results
  • 1 in 8 multimodal commands succeeded due to mutual disambiguation (MD) of input signals
  • MD levels significantly higher for accented speakers than native ones— 15% vs 8.5% of utterances
  • Ratio of speech to total signal pull-ups differed for users— .65 accented vs .35 native
  • Results replicated across signal & parse-level MD
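
Operationally, an MD rate of this kind counts commands whose fused interpretation was correct even though at least one recognizer did not rank the correct hypothesis first, and the pull-up ratio splits those cases by which mode was rescued. The sketch below makes that computation concrete on a toy log; the field names and data are illustrative assumptions, not the study's actual scoring procedure.

```python
# Rough sketch of computing a mutual-disambiguation (MD) rate and a speech
# pull-up ratio from logged commands. Field names and the toy log are
# illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class LoggedCommand:
    speech_rank_of_correct: int    # rank of the correct speech hypothesis (1 = top)
    gesture_rank_of_correct: int   # rank of the correct gesture hypothesis (1 = top)
    multimodal_correct: bool       # did the fused interpretation come out right?

def md_stats(log):
    """An MD case: the fused command is correct although at least one mode
    did not rank the correct hypothesis first (it was 'pulled up')."""
    md_cases = [c for c in log
                if c.multimodal_correct
                and (c.speech_rank_of_correct > 1 or c.gesture_rank_of_correct > 1)]
    md_rate = len(md_cases) / len(log)
    speech_pullups = sum(c.speech_rank_of_correct > 1 for c in md_cases)
    gesture_pullups = sum(c.gesture_rank_of_correct > 1 for c in md_cases)
    total_pullups = speech_pullups + gesture_pullups
    speech_pullup_ratio = speech_pullups / total_pullups if total_pullups else 0.0
    return md_rate, speech_pullup_ratio

toy_log = [
    LoggedCommand(2, 1, True),   # speech hypothesis pulled up by gesture
    LoggedCommand(1, 1, True),
    LoggedCommand(1, 3, True),   # gesture hypothesis pulled up by speech
    LoggedCommand(1, 1, True),
    LoggedCommand(3, 1, False),  # fusion failed; not an MD case
]
print(md_stats(toy_log))  # -> (0.4, 0.5) on this toy log
```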

  12. Table 1—Mutual Disambiguation Rates for Native versus Accented Speakers

  13. Table 2- Recognition Rate Differentials between Native and Accented Speakers for Speech, Gesture and Multimodal Commands

  14. Study 1- Results (cont.)
  Compared to traditional speech processing, spoken language processed within a multimodal architecture yielded:
  • 41.3% reduction in total speech error rate
  • No gender or practice effects found in MD rates
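
The reported reduction is a relative one: the drop in speech error rate within the multimodal architecture, expressed as a fraction of the speech-only error rate. The example rates below are invented to show the arithmetic; only the formula corresponds to the comparison being reported.

```python
# Relative error-rate reduction, as used for the "% reduction in total
# speech error rate" figures. The example rates are made up; only the
# formula reflects the comparison being reported.

def relative_error_reduction(unimodal_error: float, multimodal_error: float) -> float:
    """Fractional reduction of error rate relative to speech-only processing."""
    return (unimodal_error - multimodal_error) / unimodal_error

# e.g. hypothetical rates: 25% speech-only errors vs 14.7% within fusion
print(f"{relative_error_reduction(0.25, 0.147):.1%}")  # -> 41.2%
```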

  15. Study 2- Research Method
  • QuickSet testing with same 100 map-based tasks
  • Main study:
    • 16 users with high-end mic (close-talking, noise-canceling)
    • Research design— completely-crossed factorial: (1) Usage Context- Stationary vs Mobile (within subjects); (2) Gender
  • Replication:
    • 6 users with low-end mic (built-in, no noise cancellation)
    • Compared stationary vs mobile

  16. Study 2- Research Analyses
  • Corpus of 2,600 multimodal commands
  • Signal amplitude, background noise & SNR estimated for each command
  • Mutual disambiguation & multimodal system recognition rates analyzed in relation to dynamic signal data
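
A per-command SNR estimate of the kind described here can be sketched as follows; the synthetic audio and the assumption that a noise-only segment is available just before each utterance are illustrative, not the study's actual measurement procedure.

```python
# Sketch of estimating signal amplitude, background noise, and SNR (dB)
# for one spoken command. Assumes a noise-only segment just before the
# utterance is available for comparison (an illustrative assumption).

import numpy as np

def rms(samples: np.ndarray) -> float:
    """Root-mean-square amplitude of an audio segment."""
    x = samples.astype(np.float64)
    return float(np.sqrt(np.mean(x ** 2)))

def snr_db(speech_segment: np.ndarray, noise_segment: np.ndarray) -> float:
    """Signal-to-noise ratio in decibels: 20 * log10(rms_speech / rms_noise)."""
    return 20.0 * float(np.log10(rms(speech_segment) / rms(noise_segment)))

# Toy example: a synthetic "utterance" (tone plus noise) vs noise alone.
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(16000)                  # ~1 s of background noise
tone = 0.1 * np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
speech = tone + 0.01 * rng.standard_normal(16000)          # "utterance" recorded over noise

print(f"Estimated SNR: {snr_db(speech, noise):.1f} dB")    # roughly 17 dB for this example
```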

  17. Mobile user with hand-held system & close-talking headset in moderately noisy environment (40-60 dB noise)

  18. Mobile research infrastructure, with user instrumentation and researcher field station

  19. Study 2- Results
  • 1 in 7 multimodal commands succeeded due to mutual disambiguation of input signals
  • MD levels significantly higher during mobile than stationary system use— 16% vs 9.5% of utterances
  • Results replicated across signal and parse-level MD

  20. Table 3- Mutual Disambiguation Rates during Stationary and Mobile System Use

  21. Table 4- Recognition Rate Differentials during Stationary and Mobile System Use for Speech, Gesture and Multimodal Commands

  22. Study 2- Results (cont.)
  Compared to traditional speech processing, spoken language processed within a multimodal architecture yielded:
  • 19-35% reduction in total speech error rate (for noise-canceling & built-in mics, respectively)
  • No gender effects found in MD

  23. Conclusions
  • Multimodal architectures can support mutual disambiguation & improved robustness over unimodal processing
  • Error rate reduction can be substantial— 20-40%
  • Multimodal systems can reduce or close the recognition rate gap for challenging users (accented speakers) & usage contexts (mobile)
  • Error-prone recognition technologies can be stabilized within a multimodal architecture, so that they function more reliably in real-world contexts

  24. Future Directions & Challenges
  • Intelligently adaptive processing, tailored for mobile usage patterns & diverse users
  • Improved language & dialogue processing techniques, and hybrid multimodal architectures
  • Novel mobile & pervasive multimodal concepts
  • Break the robustness barrier— reduce error rate
  (For more information— http://www.cse.ogi.edu/CHCC/)
