This lecture deck covers definitions, examples, and challenges of multimodal interfaces in human-computer interaction (HCI). It explores the concept of multimodality, its application in different interfaces, and the benefits it offers. It also asks what constitutes a modality, surveys the input and output modalities used in HCI, and closes with the role of multimodal interfaces in enhancing user experience and enabling more natural interaction with computers.
Speech & Multimodal
Scott Klemmer · 16 November 2006
Some HCI definitions
• Multimodal generally refers to an interface that can accept input from two or more combined modes
• Multimedia generally refers to an interface that produces output in two or more modes
• The vast majority of multimodal systems have used speech + pointing (pen or mouse) input, with graphical (and sometimes voice) output
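Speech + pointing is the classic "put that there" combination: a spoken command whose deictic words ("that", "here") are resolved against pointing events. Below is a minimal, hypothetical sketch of that fusion step; the event types, the one-second time window, and the binding heuristic are illustrative assumptions, not anything prescribed in the lecture.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeechEvent:
    """A recognized utterance, e.g. 'move that there'."""
    text: str
    timestamp: float

@dataclass
class PointEvent:
    """A pen or mouse click at screen coordinates."""
    x: float
    y: float
    timestamp: float

def fuse(speech: SpeechEvent, points: list[PointEvent],
         window: float = 1.0) -> Optional[dict]:
    """Bind deictic words ('that', 'here', ...) to pointing events that
    occurred within `window` seconds of the utterance."""
    deictics = [w for w in speech.text.lower().split()
                if w in ("this", "that", "here", "there")]
    nearby = [p for p in points if abs(p.timestamp - speech.timestamp) <= window]
    if len(nearby) < len(deictics):
        return None  # not enough pointing events to ground every deictic word
    # Bind deictic words to pointing events in temporal order
    ordered = sorted(nearby, key=lambda p: p.timestamp)
    bindings = {w: (p.x, p.y) for w, p in zip(deictics, ordered)}
    return {"command": speech.text, "referents": bindings}

# Example: "move that there" plus two clicks yields a fully grounded command
print(fuse(SpeechEvent("move that there", 10.2),
           [PointEvent(120, 80, 10.0), PointEvent(400, 300, 10.9)]))
```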
Canonical App: Maps
• Why are maps so well-suited?
• A visual artifact for computation (Hutchins)
What is an interface?
• Is it an interface if there’s no method for a user to tell if they’ve done something?
• What might an example be?
• Is it an interface if there’s no method for explicit user input?
• Example: health monitoring apps
Sensor Fusion
• Multimodal = multiple human channels
• Sensor fusion = multiple sensor channels
• Example app: tracking people (one human channel)
• Might use: RFID + vision + keyboard activity + …
• I disagree with the Oviatt paper
• Speech + lips is sensor fusion, not multimodality
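As a rough illustration of the distinction, a person-tracking app might fuse several sensor channels into a single location estimate. The sketch below is a toy example of my own (the sensor names, rooms, and confidence weighting are invented), not a system from the paper or the lecture.

```python
# Toy sensor fusion for person tracking: several sensor channels each vote on
# which room a person is in; the fused estimate is the confidence-weighted sum.

def fuse_location(readings):
    """readings: list of (sensor_name, room, confidence) tuples."""
    scores = {}
    for sensor, room, confidence in readings:
        scores[room] = scores.get(room, 0.0) + confidence
    # Normalize so the result reads as a rough probability distribution
    total = sum(scores.values()) or 1.0
    return {room: score / total for room, score in scores.items()}

readings = [
    ("rfid_badge", "office_210", 0.6),  # badge last seen at the office reader
    ("camera",     "office_210", 0.8),  # vision system detects a person there
    ("keyboard",   "office_210", 0.9),  # keystroke activity on their desktop
    ("rfid_badge", "hallway",    0.2),  # stale read at a hallway antenna
]
print(fuse_location(readings))  # office_210 dominates
```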
What constitutes a modality?
• To some extent, it’s a matter of semantics
• Is a pen a different modality than a mouse?
• Are two mice different modalities if one controls a GUI and the other controls a tablet-like UI?
• Is a captured modality the same as an input modality?
• How does the Audio Notebook fit into this?
Input modalities
• Mouse
• Pen: recognized or unrecognized
• Speech
• Non-speech audio
• Tangible object manipulation
• Gaze, posture, body-tracking
• Each of these can be implemented with different technologies
• e.g., gaze tracking could be laser-based or vision-based
Output modalities
• Visual displays: raster graphics, oscilloscope, paper printer, …
• Haptics: force feedback
• Audio
• Smell
• Taste
Why multimodal?
• Hands busy / eyes busy
• Mutual disambiguation
• Faster input
• “More natural”
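Mutual disambiguation means the channels can correct each other's recognition errors: a hypothesis that is weak in one channel can still win if the other channel supports it. A small hypothetical sketch, with invented n-best lists and compatibility rules:

```python
# Hypothetical n-best lists from a speech recognizer and a gesture recognizer.
speech_nbest = [("delete", 0.55), ("repeat", 0.45)]      # acoustically confusable
gesture_nbest = [("cross_out", 0.70), ("circle", 0.30)]

# Which spoken commands make sense with which gestures (domain knowledge).
compatible = {("delete", "cross_out"), ("repeat", "circle")}

def disambiguate(speech, gestures):
    """Pick the highest-scoring *jointly consistent* interpretation.
    Even if one channel's top hypothesis is wrong on its own, the
    combination can recover the intended command."""
    candidates = [
        (s_score * g_score, s_word, g_shape)
        for s_word, s_score in speech
        for g_shape, g_score in gestures
        if (s_word, g_shape) in compatible
    ]
    return max(candidates) if candidates else None

print(disambiguate(speech_nbest, gesture_nbest))
# -> (0.385, 'delete', 'cross_out'): the gesture reinforces 'delete' over 'repeat'
```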
On Anthropomorphism
• The multimodal community grew out of the AI and speech communities
• Should human communication with computers be as similar as possible to human-human communication?
Multimodal Software Architectures
• OAA, AAA, OOPS
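Architectures such as the Open Agent Architecture (OAA) coordinate modality-specific components through a central facilitator that routes requests to whichever agent can serve them. The sketch below is a toy facilitator of my own to illustrate that coordination pattern; it is not OAA's actual API.

```python
# Toy facilitator in the spirit of OAA-style architectures: modality agents
# register the services they provide, and the facilitator routes requests.
# This is an illustrative sketch, not the real OAA interface.

class Facilitator:
    def __init__(self):
        self.providers = {}  # service name -> handler function

    def register(self, service, handler):
        self.providers[service] = handler

    def request(self, service, **args):
        if service not in self.providers:
            raise LookupError(f"no agent provides '{service}'")
        return self.providers[service](**args)

facilitator = Facilitator()

# A speech agent and a gesture agent each register their capabilities.
facilitator.register("recognize_speech", lambda audio: "move that there")
facilitator.register("resolve_pointing", lambda: [(120, 80), (400, 300)])

# An integration component composes them without knowing either agent directly.
utterance = facilitator.request("recognize_speech", audio=b"...")
targets = facilitator.request("resolve_pointing")
print(utterance, targets)
```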
Next Time… Vision-Based Interaction
• Computer Vision for Interactive Computer Graphics, by William T. Freeman, Yasunari Miyake, Ken-ichi Tanaka, David B. Anderson, Paul A. Beardsley, Chris N. Dodge, Michal Roth, Craig D. Weissman, William S. Yerazunis, Hiroshi Kage, Kazuo Kyuma
• A Design Tool for Camera-based Interaction, by Jerry Alan Fails and Dan R. Olsen
CS547 Tomorrow
• Ben Shneiderman, University of Maryland – Science 2.0: The Design Science of Collaboration