Explore the use of natural communication modalities in interface design, providing advantages over GUI and unimodal systems for easier, faster, and more efficient interaction. Discover potential application areas and challenges for designing multimodal interfaces.
Multimodal Interfaces: Robust interaction where graphical user interfaces fear to tread • Philip R. Cohen, Professor and Co-Director, Center for Human-Computer Communication, Oregon Health and Science Univ. (http://www.cse.ogi.edu/CHCC) and Natural Interaction Systems, LLC
Team Effort • Co-PI: Sharon Oviatt • Xiao Huang, Ed Kaiser, Sanjeev Kumar, Rebecca Lunsford, Richard Wesson, Rajah Annamalai, Alex Arthur, Paulo Barthelmess, Rachel Coulston, Marisa Flecha-Garcia • Multidisciplinary research
Multimodal Interaction • Use of one or more natural communication modalities, e.g., speech, gesture, sketch … • Advantages over GUI and unimodal systems • Easier to use; less training • Robust, flexible • Preferred by users • Faster, more efficient • Supports new functionality • Applies to many different environments and form factors that challenge GUI, especially mobile ones
Potential Application Areas • Architecture and Design • Geographical Information Systems • Emergency Operations • Field-based Operations • Mobile Computing and Telecommunications • Virtual/Augmented Reality • Pervasive/Ubiquitous Computing • Computer-Supported Collaborative Work • Education • Entertainment
Challenges for multimodal interface design • More than 2 modes, e.g. spoken, gestural, facial expression, gaze; various sensors • Inputs are uncertain (vs. keyboard/mouse) • Corrupted by noise • Multiple people • Recognition is probabilistic • Meaning is ambiguous • Design for uncertainty
Approach: Gain robustness via • Fusion of inputs from multiple modalities • Using strengths of one mode to compensate for weaknesses of others (at design time and run time) • Avoiding/correcting errors • Statistical architecture • Confirmation • Dialogue context • Simplification of language in a multimodal context • Output affecting/channeling input
Demo • Started with 50 & 100 MHz 486
Late MM Integration • Parallel recognizers and “understanders” • Time-stamped meaning fragments for each stream • Common framework for meaning representation: typed feature structures • Meaning fusion operation: unification • Process for determining a joint interpretation (subject to semantic and spatiotemporal constraints) • Statistical ranking • Flexible asynchronous architecture • Must handle unimodal and multimodal input
Feature-structure example (figure)
From speech (one of many hyp’s): “Evacuation route” → Create_line [Object: Line_obj [Color: green, Label: Evacuation route], Location: Line [ ]]
From sketch (alternative hyp’s): command [Location: Line [Coordlist: [(95302,94360), (95305,94365), …]]]; command [Location: point [Xcoord: 95305, Ycoord: 94365]]
Unified: Create_line [Object: Line_obj [Color: green, Label: Evacuation route], Location: Line [Coordlist: [(95302,94360), (95305,94365), …]]]
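The fusion of the speech and sketch fragments on this slide can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: feature structures are modelled as nested Python dicts, `None` or a missing key stands for an unfilled slot, and the `ISA` table, the `more_specific`, `unify`, and `try_unify` helpers, and the lowercase feature names are all assumptions made for the example. Only the two coordinates visible on the slide are used.

```python
from typing import Any, Optional

# Hypothetical type hierarchy (child -> parent), assumed for illustration only.
ISA = {"line": "location", "point": "location", "create_line": "command"}


class UnificationFailure(Exception):
    """Raised when two feature structures carry conflicting information."""


def more_specific(a: str, b: str) -> Optional[str]:
    """Return the more specific of two types, or None if they are incompatible."""
    if a == b:
        return a
    t = a
    while t in ISA:          # is b an ancestor of a?
        t = ISA[t]
        if t == b:
            return a
    t = b
    while t in ISA:          # is a an ancestor of b?
        t = ISA[t]
        if t == a:
            return b
    return None


def unify(x: Any, y: Any) -> Any:
    """Recursively merge two feature structures; raise on any clash."""
    if x is None:            # unfilled slot: take the other side
        return y
    if y is None:
        return x
    if isinstance(x, dict) and isinstance(y, dict):
        result = {}
        for key in sorted(x.keys() | y.keys()):
            if key == "type" and key in x and key in y:
                t = more_specific(x[key], y[key])
                if t is None:
                    raise UnificationFailure(f"type clash: {x[key]} vs {y[key]}")
                result[key] = t
            else:
                result[key] = unify(x.get(key), y.get(key))
        return result
    if x == y:               # atomic values must match exactly
        return x
    raise UnificationFailure(f"value clash: {x!r} vs {y!r}")


def try_unify(x: Any, y: Any) -> Optional[Any]:
    """Return the unified structure, or None when unification fails."""
    try:
        return unify(x, y)
    except UnificationFailure:
        return None


# Speech hypothesis: a create_line command whose line location is still empty.
speech = {
    "type": "create_line",
    "object": {"type": "line_obj", "color": "green", "label": "Evacuation route"},
    "location": {"type": "line"},
}
# Two alternative sketch hypotheses (remaining coordinates are elided on the slide).
sketch_line = {
    "type": "command",
    "location": {"type": "line", "coordlist": [(95302, 94360), (95305, 94365)]},
}
sketch_point = {
    "type": "command",
    "location": {"type": "point", "xcoord": 95305, "ycoord": 94365},
}

joint = try_unify(speech, sketch_line)      # succeeds: line fills the empty slot
rejected = try_unify(speech, sketch_point)  # fails: point cannot unify with line
```

In this sketch the point interpretation of the ink is discarded because its location type clashes with the line that the spoken command requires, which is how unification prunes incompatible hypotheses before any ranking takes place.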
Mutual Disambiguation (MD) • Each input mode provides a set of scored recognition hypotheses (figure: gesture g1–g4, object o1–o3, speech s1–s3, multimodal mm1–mm4) • MD derives the best joint interpretation by unification of meaning representation fragments • P_MM = αP_S + βP_G + C; learn α, β, and C over a multimodal corpus • MD stabilizes system performance in challenging environments
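The linear combination on this slide can be illustrated with a small ranking sketch. It assumes two n-best lists, one from speech and one from gesture, and keeps only pairs that are semantically compatible before scoring them with αP_S + βP_G + C. The weights below are made-up placeholders (the slide says they are learned over a multimodal corpus), the compatibility check is reduced to comparing a hypothetical `kind` field instead of full unification, and all labels and scores are invented for the example.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class Hypothesis:
    label: str     # recognizer output, e.g. a phrase or a gesture class
    kind: str      # kind of location it requires or provides (assumed field)
    score: float   # recognizer score for this hypothesis


# Placeholder weights; in the approach described they would be learned from data.
ALPHA, BETA, C = 0.6, 0.4, 0.0


def joint_score(p_s: float, p_g: float,
                alpha: float = ALPHA, beta: float = BETA, c: float = C) -> float:
    """Combine speech and gesture scores: P_MM = alpha * P_S + beta * P_G + C."""
    return alpha * p_s + beta * p_g + c


def rank_joint(speech: List[Hypothesis],
               gesture: List[Hypothesis]) -> List[Tuple[float, Hypothesis, Hypothesis]]:
    """Score every compatible (speech, gesture) pair and sort best first."""
    joint = [
        (joint_score(s.score, g.score), s, g)
        for s, g in product(speech, gesture)
        if s.kind == g.kind          # stand-in for successful unification
    ]
    return sorted(joint, key=lambda item: item[0], reverse=True)


# Invented n-best lists: the speech recognizer slightly prefers the wrong phrase.
speech_nbest = [
    Hypothesis("evacuation zone", "area", 0.50),
    Hypothesis("evacuation route", "line", 0.45),
]
gesture_nbest = [
    Hypothesis("line", "line", 0.80),
    Hypothesis("point", "point", 0.15),
    Hypothesis("area", "area", 0.05),
]

ranked = rank_joint(speech_nbest, gesture_nbest)
best_score, best_speech, best_gesture = ranked[0]
```

With these invented numbers the second-ranked speech hypothesis, "evacuation route", ends up in the top joint interpretation because the strong line gesture supports it, which is the pull-up effect that mutual disambiguation refers to.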
Benefits of mutual disambiguation [Table: Application | RER | Reference]
Efficiency Benefits CPOF MM 16x faster (NIS) Lines & Areas
Demonstration • CMU: speech • MIT: body tracking • OHSU: multimodal fusion (speech + writing/sketch, 3D gesture) • Stanford: NLP, dialogue
Tangible Multimodal Systems for Safety-Critical Applications • What's Missing? • A Division Command Post during an exercise (McGee et al., CHI '02; Cohen & McGee, CACM '04)
Many work practices rely on paper • ATC: Mackay '98 • ICU: Gorman et al., 2000
Why do they use paper? • Already know the interface • Poor computer interfaces • Fail-safe; robust to power outages • High resolution • Large/small scale • Cheap • Lightweight • Portable • Collaboration
Clinical Data Entry “Perhaps the single greatest challenge that has consistently confronted every clinical system developer is to engage clinicians in direct data entry” (IOM, 1997, p. 125) “To make it simple for the practitioner to interact with the record, data entry must be almost as easy as writing.” (IOM, 1997, p. 88)
Multimodal Interaction with Paper (NIS) • Based on Anoto technology
Benefits • Most people (incl. kids, seniors) know how to use the pen • Portability (works over cell phone) • Ubiquity – paper is everywhere • Collaborative – multiple simult. pens • Next – use for note-taking, alone or in meetings; fuse with ongoing speech • Many new applications – e.g., architecture, engineering, education, field data capture
Elementary Science Education Sharon Oviatt
Quiet Interfaces that Help People Think Sharon Oviatt oviatt@cse.ogi.edu http://www.cse.ogi.edu/CHCC/