Designing Robust Multimodal Systems for Diverse Users and Mobile Environments
Sharon Oviatt
oviatt@cse.ogi.edu; http://www.cse.ogi.edu/CHCC/
Introduction to Perceptive Multimodal Interfaces
• Multimodal interfaces recognize combined natural human input modes (speech & pen, speech & lip movements)
• Radical departure from GUIs in basic features, interface design & architectural underpinnings
• Rapid development in the 1990s of bimodal systems
  • New fusion & language processing techniques
  • Diversification of mode combinations & applications
  • More general & robust hybrid architectures
Advantages of Multimodal Interfaces
• Flexibility & expressive power
• Support for users’ preferred interaction style
• Accommodate more users, tasks & environments
• Improved error handling & robustness
• Support for new forms of computing, including mobile & pervasive interfaces
• Permit multifunctional & tailored mobile interfaces, adapted to user, task & environment
The Challenge of Robustness: Unimodal Speech Technology’s Achilles’ Heel
• Recognition errors currently limit commercialization of speech technology, especially for:
  • Spontaneous interactive speech
  • Diverse speakers & speaking styles (e.g., accented)
  • Speech in natural field environments (e.g., mobile)
• 20-50% drop in accuracy typical under real-world usage conditions
Improved Error Handling in Flexible Multimodal Interfaces
• Users can avoid errors through mode selection
• Users’ multimodal language is simplified, which reduces the complexity of NLP & avoids errors
• Users switch modes after system errors, which undercuts error spirals & facilitates recovery
• Multimodal architectures potentially can support “mutual disambiguation” of input signals
Example of Mutual Disambiguation: QuickSet Interface during Multimodal “PAN” Command
Processing & Architecture • Speech & gestures processed in parallel • Statistically ranked unification of semantic interpretations • Multi-agent architecture coordinates signal recognition, language processing, & multimodal integration
General Research Questions
• To what extent can a multimodal system support mutual disambiguation of input signals?
• How much is robustness improved in a multimodal system, compared with a unimodal one?
• In what usage contexts and for what user groups is robustness most enhanced by a multimodal system?
• What are the asymmetries between modes in disambiguation likelihoods?
Study 1: Research Method
• QuickSet testing with map-based tasks (community fire & flood management)
• 16 users: 8 native speakers & 8 accented (varied Asian, European & African accents)
• Research design: completely-crossed factorial with between-subjects factors:
  (1) Speaker status (accented, native)
  (2) Gender
• Corpus of 2,000 multimodal commands processed by QuickSet
Videotape: Multimodal system processing for accented and mobile users
Study 1: Results
• 1 in 8 multimodal commands succeeded due to mutual disambiguation (MD) of input signals (scoring sketched below)
• MD levels significantly higher for accented speakers than native ones: 15% vs 8.5% of utterances
• Ratio of speech to total signal pull-ups differed between groups: .65 accented vs .35 native
• Results replicated across signal- & parse-level MD
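As an aside, an MD rate like "1 in 8" can be scored directly from logged commands. The sketch below assumes each log entry records whether the final interpretation was correct and the n-best rank of the speech and gesture hypotheses that fusion actually chose; these field names are hypothetical, not QuickSet's actual log format.

```python
# Hedged sketch of scoring mutual disambiguation (MD) from a corpus.
# A command exhibits MD when it was recognized correctly even though
# fusion had to "pull up" a non-top-ranked signal hypothesis.

def md_rate(commands):
    """Fraction of commands recognized correctly via at least one pull-up."""
    pulled_up = [
        c for c in commands
        if c["correct"] and (c["speech_rank"] > 1 or c["gesture_rank"] > 1)
    ]
    return len(pulled_up) / len(commands)

def speech_pullup_share(commands):
    """Speech pull-ups as a share of all signal pull-ups (cf. .65 vs .35)."""
    speech = sum(1 for c in commands if c["correct"] and c["speech_rank"] > 1)
    gesture = sum(1 for c in commands if c["correct"] and c["gesture_rank"] > 1)
    return speech / (speech + gesture)

corpus = [  # toy data, not the study corpus
    {"correct": True,  "speech_rank": 2, "gesture_rank": 1},  # speech pulled up
    {"correct": True,  "speech_rank": 1, "gesture_rank": 2},  # gesture pulled up
    {"correct": True,  "speech_rank": 1, "gesture_rank": 1},  # no MD needed
    {"correct": False, "speech_rank": 1, "gesture_rank": 1},  # recognition failure
]
print(f"MD rate: {md_rate(corpus):.1%}")                   # -> 50.0% on toy data
print(f"Speech share: {speech_pullup_share(corpus):.2f}")  # -> 0.50
```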
Table 1: Mutual Disambiguation Rates for Native versus Accented Speakers
Table 2: Recognition Rate Differentials between Native and Accented Speakers for Speech, Gesture and Multimodal Commands
Study 1: Results (cont.)
Compared to traditional speech processing, spoken language processed within a multimodal architecture yielded:
• 41.3% reduction in total speech error rate (arithmetic illustrated below)
• No gender or practice effects found in MD rates
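For readers unfamiliar with the measure, this is a relative error-rate reduction. A one-line illustration with hypothetical rates chosen only to reproduce the arithmetic (the study's actual per-condition rates appear in the tables above):

```python
# Relative error-rate reduction; both rates below are hypothetical
# numbers chosen only to illustrate how a figure like 41.3% arises.
speech_only_error = 0.200      # error rate of speech processed alone (hypothetical)
speech_in_multimodal = 0.1174  # speech error rate within multimodal fusion (hypothetical)
reduction = (speech_only_error - speech_in_multimodal) / speech_only_error
print(f"{reduction:.1%} relative reduction")  # -> 41.3% relative reduction
```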
Study 2: Research Method
• QuickSet testing with the same 100 map-based tasks
• Main study: 16 users with high-end mic (close-talking, noise-canceling)
  • Research design: completely-crossed factorial:
    (1) Usage context: stationary vs mobile (within subjects)
    (2) Gender
• Replication: 6 users with low-end mic (built-in, no noise cancellation)
  • Compared stationary vs mobile
Study 2: Research Analyses
• Corpus of 2,600 multimodal commands
• Signal amplitude, background noise & SNR estimated for each command (see the sketch below)
• Mutual disambiguation & multimodal system recognition rates analyzed in relation to dynamic signal data
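A minimal sketch of how per-command SNR might be estimated, assuming each command's speech segment and a nearby background-noise window are available as sample arrays; the study's actual estimation procedure is not specified here, so treat this as illustrative only.

```python
# Illustrative per-command SNR estimate from raw audio samples.
import numpy as np

def rms(samples):
    """Root-mean-square amplitude of a sample window."""
    return np.sqrt(np.mean(np.square(samples, dtype=np.float64)))

def snr_db(speech_samples, noise_samples):
    """Signal-to-noise ratio in decibels: 20*log10(signal RMS / noise RMS)."""
    return 20.0 * np.log10(rms(speech_samples) / rms(noise_samples))

# Synthetic example: a louder speech segment over background noise.
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(16000)          # 1 s of background noise
speech = 0.1 * rng.standard_normal(16000) + noise  # louder signal segment
print(f"Estimated SNR: {snr_db(speech, noise):.1f} dB")
```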
Mobile user with hand-held system & close-talking headset in moderately noisy environment (40-60 dB noise)
Mobile research infrastructure, with user instrumentation and researcher field station
Study 2: Results
• 1 in 7 multimodal commands succeeded due to mutual disambiguation of input signals
• MD levels significantly higher during mobile than stationary system use: 16% vs 9.5% of utterances
• Results replicated across signal- and parse-level MD
Table 3: Mutual Disambiguation Rates during Stationary and Mobile System Use
Table 4: Recognition Rate Differentials during Stationary and Mobile System Use for Speech, Gesture and Multimodal Commands
Study 2: Results (cont.)
Compared to traditional speech processing, spoken language processed within a multimodal architecture yielded:
• 19-35% reduction in total speech error rate (for noise-canceling & built-in mics, respectively)
• No gender effects found in MD
Conclusions
• Multimodal architectures can support mutual disambiguation & improved robustness over unimodal processing
• Error rate reduction can be substantial: 20-40%
• Multimodal systems can reduce or close the recognition rate gap for challenging users (accented speakers) & usage contexts (mobile)
• Error-prone recognition technologies can be stabilized within a multimodal architecture, so they function more reliably in real-world contexts
Future Directions & Challenges • Intelligently adaptive processing, tailored for mobile usage patterns & diverse users • Improved language & dialogue processing techniques, and hybrid multimodal architectures • Novel mobile & pervasive multimodal concepts • Break the robustness barrier— reduce error rate (For more information— http://www.cse.ogi.edu/CHCC/)