
Designing Robust Multimodal Systems for Diverse Users and Mobile Environments


Presentation Transcript


  1. Designing Robust Multimodal Systems for Diverse Users and Mobile Environments
  Sharon Oviatt, oviatt@cse.ogi.edu; http://www.cse.ogi.edu/CHCC/

  2. Introduction to Perceptive Multimodal Interfaces
  • Multimodal interfaces recognize combined natural human input modes (speech & pen, speech & lip movements)
  • Radical departure from GUIs in basic features, interface design & architectural underpinnings
  • Rapid development in 1990s of bimodal systems
  • New fusion & language processing techniques
  • Diversification of mode combinations & applications
  • More general & robust hybrid architectures

  3. Advantages of Multimodal Interfaces
  • Flexibility & expressive power
  • Support for users’ preferred interaction style
  • Accommodate more users, tasks, environments
  • Improved error handling & robustness
  • Support for new forms of computing, including mobile & pervasive interfaces
  • Permit multifunctional & tailored mobile interfaces, adapted to user, task & environment

  4. The Challenge of Robustness: Unimodal Speech Technology’s Achilles’ Heel
  • Recognition errors currently limit commercialization of speech technology, especially for:
    • Spontaneous interactive speech
    • Diverse speakers & speaking styles (e.g., accented)
    • Speech in natural field environments (e.g., mobile)
  • 20-50% drop in accuracy typical for real-world usage conditions

  5. Improved Error Handling in Flexible Multimodal Interfaces
  • Users can avoid errors through mode selection
  • Users’ multimodal language is simplified, which reduces complexity of NLP & avoids errors
  • Users mode switch after system errors, which undercuts error spirals & facilitates recovery
  • Multimodal architectures potentially can support “mutual disambiguation” of input signals

  6. Example of Mutual Disambiguation: QuickSet Interface during Multimodal “PAN” Command

  7. Processing & Architecture
  • Speech & gestures processed in parallel
  • Statistically ranked unification of semantic interpretations
  • Multi-agent architecture coordinates signal recognition, language processing, & multimodal integration
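
As a rough illustration of how statistically ranked unification of parallel n-best lists can yield mutual disambiguation, the sketch below fuses toy speech and gesture hypotheses with assumed mode weights and an assumed table of legal commands; none of these values or names reflect QuickSet's actual recognizers, grammar, or parameters.

```python
# Sketch of statistically ranked unification of parallel n-best lists.
# Hypotheses, scores, the legal-command table, and the mode weights are
# invented for illustration; they are not QuickSet's actual components.

from itertools import product

# n-best output of a speech recognizer: (interpretation, probability)
speech_nbest = [("zoom", 0.42), ("pan", 0.38), ("prune", 0.20)]

# n-best output of a gesture recognizer: (interpretation, probability)
gesture_nbest = [("arrow", 0.55), ("line", 0.45)]

# Which (speech, gesture) pairs unify into a legal multimodal command;
# in a real system this constraint comes from the semantic unification step.
LEGAL_COMMANDS = {("pan", "arrow"), ("zoom", "line")}

W_SPEECH, W_GESTURE = 0.6, 0.4  # assumed mode weights

def rank_joint_interpretations(speech, gesture):
    """Score every unifiable (speech, gesture) pair and rank best-first."""
    joint = []
    for (s, p_s), (g, p_g) in product(speech, gesture):
        if (s, g) in LEGAL_COMMANDS:
            joint.append(((s, g), W_SPEECH * p_s + W_GESTURE * p_g))
    return sorted(joint, key=lambda item: item[1], reverse=True)

print(rank_joint_interpretations(speech_nbest, gesture_nbest))
# The top joint interpretation is ("pan", "arrow") even though "pan" was
# only the speech recognizer's second choice: the gesture evidence pulls
# up the correct speech hypothesis, i.e., mutual disambiguation.
```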

  8. General Research Questions
  • To what extent can a multimodal system support mutual disambiguation of input signals?
  • How much is robustness improved in a multimodal system, compared with a unimodal one?
  • In what usage contexts and for what user groups is robustness most enhanced by a multimodal system?
  • What are the asymmetries between modes in disambiguation likelihoods?

  9. Study 1- Research Method
  • QuickSet testing with map-based tasks (community fire & flood management)
  • 16 users— 8 native speakers & 8 accented (varied Asian, European & African accents)
  • Research design— completely-crossed factorial with between-subjects factors: (1) Speaker status (accented, native); (2) Gender
  • Corpus of 2,000 multimodal commands processed by QuickSet

  10. Videotape: Multimodal system processing for accented and mobile users

  11. Study 1- Results
  • 1 in 8 multimodal commands succeeded due to mutual disambiguation (MD) of input signals
  • MD levels significantly higher for accented speakers than native ones— 15% vs 8.5% of utterances
  • Ratio of speech to total signal pull-ups differed for users— .65 accented vs .35 native
  • Results replicated across signal & parse-level MD
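
Operationally, an MD rate of this kind counts commands whose fused interpretation was correct even though at least one recognizer did not rank the correct hypothesis first, and the pull-up ratio splits those cases by which mode was rescued. The sketch below makes that computation concrete on a toy log; the field names and data are illustrative assumptions, not the study's actual scoring procedure.

```python
# Rough sketch of computing a mutual-disambiguation (MD) rate and a speech
# pull-up ratio from logged commands. Field names and the toy log are
# illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class LoggedCommand:
    speech_rank_of_correct: int    # rank of the correct speech hypothesis (1 = top)
    gesture_rank_of_correct: int   # rank of the correct gesture hypothesis (1 = top)
    multimodal_correct: bool       # did the fused interpretation come out right?

def md_stats(log):
    """An MD case: the fused command is correct although at least one mode
    did not rank the correct hypothesis first (it was 'pulled up')."""
    md_cases = [c for c in log
                if c.multimodal_correct
                and (c.speech_rank_of_correct > 1 or c.gesture_rank_of_correct > 1)]
    md_rate = len(md_cases) / len(log)
    speech_pullups = sum(c.speech_rank_of_correct > 1 for c in md_cases)
    gesture_pullups = sum(c.gesture_rank_of_correct > 1 for c in md_cases)
    total_pullups = speech_pullups + gesture_pullups
    speech_pullup_ratio = speech_pullups / total_pullups if total_pullups else 0.0
    return md_rate, speech_pullup_ratio

toy_log = [
    LoggedCommand(2, 1, True),   # speech hypothesis pulled up by gesture
    LoggedCommand(1, 1, True),
    LoggedCommand(1, 3, True),   # gesture hypothesis pulled up by speech
    LoggedCommand(1, 1, True),
    LoggedCommand(3, 1, False),  # fusion failed; not an MD case
]
print(md_stats(toy_log))  # -> (0.4, 0.5) on this toy log
```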

  12. Table 1—Mutual Disambiguation Rates for Native versus Accented Speakers

  13. Table 2- Recognition Rate Differentials between Native and Accented Speakers for Speech, Gesture and Multimodal Commands

  14. Study 1- Results (cont.)
  Compared to traditional speech processing, spoken language processed within a multimodal architecture yielded:
  • 41.3% reduction in total speech error rate
  • No gender or practice effects found in MD rates
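
The reported reduction is a relative one: the drop in speech error rate within the multimodal architecture, expressed as a fraction of the speech-only error rate. The example rates below are invented to show the arithmetic; only the formula corresponds to the comparison being reported.

```python
# Relative error-rate reduction, as used for the "% reduction in total
# speech error rate" figures. The example rates are made up; only the
# formula reflects the comparison being reported.

def relative_error_reduction(unimodal_error: float, multimodal_error: float) -> float:
    """Fractional reduction of error rate relative to speech-only processing."""
    return (unimodal_error - multimodal_error) / unimodal_error

# e.g. hypothetical rates: 25% speech-only errors vs 14.7% within fusion
print(f"{relative_error_reduction(0.25, 0.147):.1%}")  # -> 41.2%
```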

  15. Study 2- Research Method
  • QuickSet testing with same 100 map-based tasks
  • Main study:
    • 16 users with high-end mic (close-talking, noise-canceling)
    • Research design— completely-crossed factorial: (1) Usage Context- Stationary vs Mobile (within subjects); (2) Gender
  • Replication:
    • 6 users with low-end mic (built-in, no noise cancellation)
    • Compared stationary vs mobile

  16. Study 2- Research Analyses
  • Corpus of 2,600 multimodal commands
  • Signal amplitude, background noise & SNR estimated for each command
  • Mutual disambiguation & multimodal system recognition rates analyzed in relation to dynamic signal data
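
A per-command SNR estimate of the kind described here can be sketched as follows; the synthetic audio and the assumption that a noise-only segment is available just before each utterance are illustrative, not the study's actual measurement procedure.

```python
# Sketch of estimating signal amplitude, background noise, and SNR (dB)
# for one spoken command. Assumes a noise-only segment just before the
# utterance is available for comparison (an illustrative assumption).

import numpy as np

def rms(samples: np.ndarray) -> float:
    """Root-mean-square amplitude of an audio segment."""
    x = samples.astype(np.float64)
    return float(np.sqrt(np.mean(x ** 2)))

def snr_db(speech_segment: np.ndarray, noise_segment: np.ndarray) -> float:
    """Signal-to-noise ratio in decibels: 20 * log10(rms_speech / rms_noise)."""
    return 20.0 * float(np.log10(rms(speech_segment) / rms(noise_segment)))

# Toy example: a synthetic "utterance" (tone plus noise) vs noise alone.
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(16000)                  # ~1 s of background noise
tone = 0.1 * np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
speech = tone + 0.01 * rng.standard_normal(16000)          # "utterance" recorded over noise

print(f"Estimated SNR: {snr_db(speech, noise):.1f} dB")    # roughly 17 dB for this example
```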

  17. Mobile user with hand-held system & close-talking headset in moderately noisy environment (40-60 dB noise)

  18. Mobile research infrastructure, with user instrumentation and researcher field station

  19. Study 2- Results
  • 1 in 7 multimodal commands succeeded due to mutual disambiguation of input signals
  • MD levels significantly higher during mobile than stationary system use— 16% vs 9.5% of utterances
  • Results replicated across signal and parse-level MD

  20. Table 3- Mutual Disambiguation Rates during Stationary and Mobile System Use

  21. Table 4- Recognition Rate Differentials during Stationary and Mobile System Use for Speech, Gesture and Multimodal Commands

  22. Study 2- Results (cont.)
  Compared to traditional speech processing, spoken language processed within a multimodal architecture yielded:
  • 19-35% reduction in total speech error rate (for noise-canceling & built-in mics, respectively)
  • No gender effects found in MD

  23. Conclusions
  • Multimodal architectures can support mutual disambiguation & improved robustness over unimodal processing
  • Error rate reduction can be substantial— 20-40%
  • Multimodal systems can reduce or close the recognition rate gap for challenging users (accented speakers) & usage contexts (mobile)
  • Error-prone recognition technologies can be stabilized within a multimodal architecture, so that they function more reliably in real-world contexts

  24. Future Directions & Challenges
  • Intelligently adaptive processing, tailored for mobile usage patterns & diverse users
  • Improved language & dialogue processing techniques, and hybrid multimodal architectures
  • Novel mobile & pervasive multimodal concepts
  • Break the robustness barrier— reduce error rate
  (For more information— http://www.cse.ogi.edu/CHCC/)
