
USI module U1-5 Multimodal interaction

Presentation Transcript


1. USI module U1-5: Multimodal interaction
Jacques Terken
USI module U1, lecture 5

2. Contents
• Demos and video clips
• Multimodal behaviour
• Multimodal interaction, architecture and multimodal fusion
• Design heuristics, guidelines and tools

3. http://www.nuance.com/xmode/demo/#
• http://www.csee.ogi.edu/CHCC/ (video: QuickSet)
• RASA (combination of tangible and multimodal interaction)
• Also of possible interest:
  • http://www.gvu.gatech.edu/gvu/events/demo-days/2001/demos010930.html
  • http://ligwww.epfl.ch/~thalmann/research.html

4. QuickSet on iPAQ (OGI – CHCC)

5. Multimodal behaviour
• The development of multimodal systems depends on knowledge about the natural integration patterns that characterize the combined use of different modalities
• Dealing with myths about multimodal interaction:
• Oviatt, S.L., "Ten myths of multimodal interaction", Communications of the ACM 42(11), 1999, pp. 74-81

6. Myth 1: If you build a multimodal system, users will interact multimodally.
Dependent on domain:
• Spatial domain: 95-100% of users prefer multimodal interaction
• Other domains: 20% of commands are multimodal
Dependent on type of action:
• High multimodality: adding, moving, modifying objects, calculating the distance between objects
• Low multimodality: printing, scrolling, etc.


8. Distinction between general, selective and spatial actions
• General: non-object-directed actions (printing, etc.)
• Selective: choosing objects
• Spatial: manipulation of objects (adding, etc.)


10. Myth 2: Speech and pointing is the dominant multimodal integration pattern.
• Central in Bolt's speak-and-point interface ("Put That There")
• Speak-and-point accounts for only 14% of spontaneous multimodal actions
• In human communication, pointing accounts for approximately 20% of all gestures
• Other actions: handwriting, hand gestures, facial expressions ("rich" interaction)

11. Myth 3: Multimodal input involves simultaneous signals.
• Information from different modalities often arrives sequentially
• Gestures often precede speech

12. Myth 4: Speech is the primary input mode in any multimodal system that includes it; gestures, head and body movement, gaze direction and other input are secondary.
• Speech often cannot carry all of the information (cf. the combination of pen + speech)
• Gestures are better suited to some kinds of information
• Gestures often establish the context for speech

13. Myth 5: Multimodal language does not differ linguistically from unimodal language.
• Users often avoid complicated commands in multimodal interaction
• Multimodal language is typically shorter, syntactically simpler, and more fluent
  • Unimodal: "place a boat dock on the east, no, west end of Reward Lake"
  • Multimodal: [draws rectangle] "add rectangle"
• Multimodal language is easier to process
• Less anaphora and indirectness

14. Myth 6: Multimodal integration involves redundancy of content between modes.
• Different modalities contribute complementary information:
  • Speech: subject, object, verb (objects, actions/operations)
  • Gesture: location (spatial information)
• Even during corrections, only about 1% of content is redundant

15. Myth 7: Individual error-prone recognition technologies combine multimodally to produce even greater unreliability.
• Combining inputs enables mutual disambiguation
• Users choose the least error-prone modality ("leveraging from users' natural intelligence about when and how to deploy input modes effectively")
• Combining error-prone modalities in fact yields a more stable system

16. Myth 8: All users' multimodal commands are integrated in a uniform way.
• There are differences between people
• Use is consistent within a person
• Detecting the integration pattern in advance can improve recognition


18. Myth 9: Different input modes are capable of transmitting comparable content (the "alt-mode" hypothesis).
Modalities differ in:
• The type of information they convey
• Their functionality during communication
• Their accuracy of expression
• The manner in which they integrate with other modalities

19. Myth 10: Enhanced speed and efficiency are the main advantages of multimodal systems.
Holds only to a limited extent, for the spatial domain:
• In multimodal pen/speech interaction, speed increases by approximately 10%
More important advantages lie in other areas:
• Errors and disfluent speech decrease by 35-50%
• Users can choose their input modality:
  • Less fatigue per modality
  • Better opportunities for repair
  • A larger range of users can be served

20. Advantages: Robustness
• Individual signal-processing technologies are error-prone
• Integrating complementary modalities yields synergy, capitalizing on the strengths of each modality and overcoming weaknesses in the other
• Users select the input mode they consider less error-prone for particular lexical content
• Users' language is simplified when they interact multimodally
• Users tend to switch modes after system errors, facilitating error recovery
• Users report less frustration when interacting multimodally (greater sense of control)
• Mutual compensation/disambiguation

21. Technologies: Types of multimodality
W3C (see http://www.w3.org/TR/mmi-reqs/), seen from the perspective of the system (how input is handled):
• Sequential multimodal input: modality A for action a, then modality B for action b; each event is handled as a separate event
• Simultaneous (uncoordinated) multimodal input: each event is handled as a separate event, with a choice between modalities at each moment in time
• Composite (coordinated simultaneous) multimodal input: events are integrated into a single event before interpretation ("true" multimodality)
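A minimal sketch of the three W3C input-handling styles above (not W3C reference code); the InputEvent record and the 2-second grouping window are assumptions made for illustration. Sequential and simultaneous input treat every event separately, while composite input merges temporally close events into one unit before interpretation.

```python
# Illustrative sketch of sequential, simultaneous and composite input handling.
from dataclasses import dataclass
from typing import List

@dataclass
class InputEvent:
    modality: str    # e.g. "speech", "pen"
    content: str     # recognized content
    t_start: float   # timestamps in seconds
    t_end: float

def interpret(group: List[InputEvent]) -> str:
    # Placeholder interpreter: just concatenates the recognized content.
    return " + ".join(e.content for e in group)

def handle_sequential(events: List[InputEvent]) -> List[str]:
    # One modality at a time; every event is interpreted on its own.
    return [interpret([e]) for e in events]

def handle_simultaneous(events: List[InputEvent]) -> List[str]:
    # Modalities may be active at the same time, but events are still
    # interpreted separately (no cross-modal integration).
    return [interpret([e]) for e in sorted(events, key=lambda e: e.t_start)]

def handle_composite(events: List[InputEvent], window: float = 2.0) -> List[str]:
    # "True" multimodality: events that overlap or lie within a short temporal
    # window are merged into one composite event before interpretation.
    if not events:
        return []
    events = sorted(events, key=lambda e: e.t_start)
    groups, current = [], [events[0]]
    for e in events[1:]:
        if e.t_start - current[-1].t_end <= window:
            current.append(e)
        else:
            groups.append(current)
            current = [e]
    groups.append(current)
    return [interpret(g) for g in groups]

print(handle_composite([InputEvent("speech", "move that there", 0.0, 1.5),
                        InputEvent("pen", "<point: unit 3>", 0.4, 0.7)]))
# -> ['move that there + <point: unit 3>']
```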

22. Coutaz & Nigay [figure]


24. Mutual disambiguation (MD)
• Speech input yields an n-best list:
  • "ditch"
  • "ditches"
• Gestural input
• Joint interpretation:
  • "ditches"
• The benefit may depend on the situation (e.g. it is larger for non-native speakers)
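To make the ditch/ditches example concrete, here is a toy sketch of mutual disambiguation; the posteriors, gesture labels and the plural/singular compatibility rule are illustrative assumptions, not values from an actual system. It shows how a gesture hypothesis can promote a lower-ranked speech hypothesis in the joint interpretation.

```python
# Toy mutual disambiguation between a speech n-best list and a gesture hypothesis.
speech_nbest = [("ditch", 0.55), ("ditches", 0.45)]      # ASR n-best list
gesture_nbest = [("two_marks", 0.9), ("one_mark", 0.1)]  # gesture recognizer

def compatible(word: str, gesture: str) -> bool:
    # Toy semantic constraint: a plural noun needs a multi-mark gesture.
    plural = word.endswith("s")
    return plural == (gesture == "two_marks")

def joint_interpretation(speech, gesture):
    best, best_score = None, 0.0
    for word, p_w in speech:
        for g, p_g in gesture:
            if compatible(word, g):
                score = p_w * p_g
                if score > best_score:
                    best, best_score = (word, g), score
    return best, best_score

print(joint_interpretation(speech_nbest, gesture_nbest))
# -> (('ditches', 'two_marks'), 0.405): the acoustically dispreferred speech
#    hypothesis wins because only it is consistent with the gesture.
```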

25. Early fusion
• For closely coupled and synchronized modalities, such as speech and lip movements
• "Feature-level" fusion
• Based on multiple hidden Markov models or temporal neural networks; the correlation structure between modes can be captured automatically through learning
• Problems: modelling complexity, computational intensity, training difficulty
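The following sketch illustrates what feature-level fusion amounts to in the simplest case, assuming frame-synchronous audio and lip-feature streams; the dimensions and the random data are placeholders. The streams are concatenated per frame so that a single joint model can learn cross-modal correlations, which is also where the modelling complexity and training-data demands come from.

```python
# Minimal sketch of feature-level ("early") fusion of two frame-aligned streams.
import numpy as np

def early_fusion(audio_feats: np.ndarray, lip_feats: np.ndarray) -> np.ndarray:
    # audio_feats: (n_frames, d_audio), lip_feats: (n_frames, d_lip),
    # already aligned to a common frame rate.
    assert audio_feats.shape[0] == lip_feats.shape[0]
    return np.concatenate([audio_feats, lip_feats], axis=1)

# A single joint model (e.g. an HMM or a temporal neural network) is then
# trained on the fused (d_audio + d_lip)-dimensional vectors.
fused = early_fusion(np.random.randn(100, 39), np.random.randn(100, 20))
print(fused.shape)  # (100, 59)
```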

26. Late fusion
• "Semantic-level" fusion
• Individual recognizers
• Sequential integration
• Advantage: scalable – individual recognizers do not need to be retrained
• Early approaches: a multimodal command's posterior probability is the cross-product of the posterior probabilities of its constituents → no advantage is taken of the mutual-compensation phenomenon
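A sketch of the early "cross-product" style of late fusion mentioned in the last bullet, with made-up n-best lists and posteriors: each multimodal candidate is scored simply as the product of its constituents' posteriors, with no attempt at mutual compensation between the recognizers.

```python
# Cross-product late (semantic-level) fusion over two independent n-best lists.
from itertools import product

speech_nbest = [("create ditch", 0.5), ("create ditches", 0.3), ("delete ditch", 0.2)]
gesture_nbest = [("area: lake east", 0.7), ("area: lake west", 0.3)]

def late_fusion(speech, gesture):
    candidates = [((cmd, loc), p_s * p_g)
                  for (cmd, p_s), (loc, p_g) in product(speech, gesture)]
    return sorted(candidates, key=lambda c: c[1], reverse=True)

for (cmd, loc), p in late_fusion(speech_nbest, gesture_nbest):
    print(f"{p:.2f}  {cmd} @ {loc}")
```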

27. Architectural requirements for late semantic fusion
• Fine-grained timestamping
• Inputs may be sequentially integrated or simultaneously delivered
• A common representational format for the different modalities
• Frame-based (multimodal fusion through unification of feature structures) → mutual disambiguation
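A small sketch of the timestamping requirement, assuming each recognizer delivers a timestamped partial frame; the TimedInput record and the 4-second lag threshold are illustrative assumptions. The integrator uses the timestamps to decide whether two inputs belong to the same multimodal construction, whether they overlap or arrive sequentially (e.g. a gesture that precedes the speech).

```python
# Deciding from fine-grained timestamps whether two inputs should be fused.
from dataclasses import dataclass

@dataclass
class TimedInput:
    modality: str
    frame: dict      # partial semantic frame from this recognizer
    t_start: float
    t_end: float

def should_integrate(a: TimedInput, b: TimedInput, max_lag: float = 4.0) -> bool:
    # Integrate if the two signals overlap in time or the gap between them is
    # smaller than max_lag (covers both simultaneous and sequential delivery).
    gap = max(a.t_start, b.t_start) - min(a.t_end, b.t_end)
    return gap <= max_lag

gesture = TimedInput("pen", {"location": (132, 58)}, t_start=0.2, t_end=0.6)
speech = TimedInput("speech", {"command": "create", "object": "ditch"},
                    t_start=1.4, t_end=2.3)
print(should_integrate(gesture, speech))  # True: gesture precedes speech by < 4 s
```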

28. Unification of utterance and gesture [figure]
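Since the original figure is not reproduced here, the following toy sketch shows the idea it illustrated: partial feature structures from the speech and gesture interpreters (represented as nested dicts, an assumption of this sketch) are unified into a single frame, with conflicting values causing the unification to fail and missing values being filled in from the other modality.

```python
# Toy unification-based fusion of partial feature structures.
FAIL = object()  # sentinel for a failed unification

def unify(a, b):
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            result = unify(merged.get(key), value)
            if result is FAIL:
                return FAIL
            merged[key] = result
        return merged
    return a if a == b else FAIL

# Speech "add a ditch": command and object type, but no location.
speech_frame = {"command": "create", "object": {"type": "ditch"}, "location": None}
# Pen gesture: location only.
gesture_frame = {"location": {"x": 132, "y": 58}}

print(unify(speech_frame, gesture_frame))
# {'command': 'create', 'object': {'type': 'ditch'}, 'location': {'x': 132, 'y': 58}}
```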

29. Design of multimodal interfaces
1. Task analysis: what are the actions that need to be performed?
2. Task allocation: which party is the most suitable candidate for performing particular actions?
3. Modality allocation: which modality or combination of modalities is best suited to particular actions?
The current presentation focuses on (3).

30. Definition of 'modality'
• Modality as sensory channel; however, stating that particular numeric information should be presented in the visual modality offers little practical guidance
• Hence the notion of 'representational modality' has been proposed (Bernsen), which distinguishes, e.g., a table and a graph as two different modalities
• For the time being, we use 'modality' in the more restricted sense of sensory channel, and look for mappings between actions and modalities

31. Relevant dimensions
• Nature of the information
• Interaction paradigm
• Physical and dialogue context
• Platform
• Accessibility
• Multitasking

32. Rules of thumb, heuristics
• Michaelis and Wiggins (1982)
• Cohen and Oviatt (1994)
• Suhm (2000)
• Larsson (2003)
• Reeves, Lai et al. (2004)
For references see Terken, J., "Guidelines and Tools for the Design of Multimodal Interfaces", Workshop ASIDE 2005, Aalborg (DK)

33. Michaelis and Wiggins (1982)
Speech generation is preferable when the:
• message is short
• message will not be referred to later
• message deals with events in time
• message requires an immediate response
• visual channels of communication are overloaded
• environment is too brightly lit, too poorly lit, subject to severe vibration, or otherwise unsuitable for transmission of visual information
• user must be free to move around
• user is subjected to high G forces or anoxia
Tentative guidelines for when NOT to use speech may be derived from these suggestions through negation.

34. Cohen and Oviatt (1994)
Spoken communication with machines (both input and output) may be advantageous:
• when the user's hands or eyes are busy
• when only a limited keyboard and/or screen is available
• when the user is disabled
• when pronunciation is the subject matter of computer use
• when natural language interaction is preferred

35. Suhm (2000)
Principles for choosing the set of modalities:
2. Consider speech input for entry of textual data, dialogue-oriented tasks, and command and control. Speech input is generally less efficient for navigation, manipulation of image data, and resolution of object references.
3. Consider written input for corrections, entry of digits, and entry of graphical data (formulas, sketches, etc.).
4. Consider gesture input for indicating the scope or type of commands and for resolving deictic object references.
5. Consider the traditional modalities (keyboard and mouse input) as alternatives, unless the superiority of novel modalities (speech, pen input) has been demonstrated.
Other principle groups:
• Principles to circumvent limitations of recognition technology
• Principles for the implementation of pen-speech interfaces

36. Larsson (2003)
• Satisfy Real-world Constraints
  • Task-oriented Guidelines
  • Physical Guidelines
  • Environmental Guidelines
• Communicate Clearly, Concisely, and Consistently with Users
  • Consistency Guidelines
  • Organizational Guidelines
• Help Users Recover Quickly and Efficiently from Errors
  • Conversational Guidelines
  • Reliability Guidelines
• Make Users Comfortable
  • System Status
  • Human-memory Constraints
  • Social Guidelines
• …

37. Reeves, Lai et al. (2004)
Propose a set of multimodal design principles founded in perception and cognition science (though the motivation remains implicit).
General areas:
• Designing multimodal input and output
• Adaptivity
• Consistency
• Feedback
• Error prevention/handling

38. Designing Multimodal Input and Output
• Maximize human cognitive and physical abilities. Designers need to determine how to support intuitive, streamlined interactions based on users' human information-processing abilities (including attention, working memory, and decision making). For example:
  • Avoid unnecessarily presenting information in two different modalities when the user must attend to both sources simultaneously to comprehend the material; such redundancy can increase cognitive load at the cost of learning the material.
  • Maximize the advantages of each modality to reduce users' memory load in certain tasks and situations:
    • System visual presentation coupled with user manual input for spatial information and parallel processing;
    • System auditory presentation coupled with user speech input for state information, serial processing, attention alerting, or issuing commands.

39. Integrate modalities in a manner compatible with user preferences, context, and system functionality.
Additional modalities should be added to the system only if they improve satisfaction, efficiency, or other aspects of performance for a given user and context. When using multiple modalities:
• Match output to an acceptable user input style (for example, if the user is constrained by a set grammar, do not design a virtual agent that uses unconstrained natural language);
• Use multimodal cues to improve collaborative speech (for example, a virtual agent's gaze direction or gesture can guide user turn-taking);
• Ensure that system output modalities are well synchronized temporally (for example, a map-based display and spoken directions, or a virtual display and non-speech audio);
• Ensure that the current system interaction state is shared across modalities and that appropriate information is displayed in order to support:
  • users in choosing alternative interaction modalities;
  • multidevice and distributed interaction.

40. 3. Theoretical approaches
• Modality theory (Bernsen and colleagues): 'modality' defined as 'representational modality'

41. Modality theory (Bernsen)
Aim:
• Given any particular class of task-domain information which needs to be exchanged between user and system during task performance, identify the set of input/output modalities which constitute an optimal solution to the representation and exchange of that information (Bernsen, 2001).
Taxonomic analyses:
• (Representational) input and output modalities are characterized in terms of a limited number of basic features, such as:
  • linguistic/non-linguistic
  • analogue/non-analogue
  • arbitrary/non-arbitrary
  • static/dynamic

42. Modality properties can then be applied according to the following procedure:
1. Requirements specification →
2. Modality properties + natural intelligence →
3. Advice/insight with respect to modality choice

43. [MP1] Linguistic input/output modalities have interpretational scope, which makes them eminently suited for conveying abstract information. They are therefore unsuited for conveying high-specificity information, including detailed information on spatial manipulation and location.
• [MP2] Linguistic input/output modalities, being unsuited for specifying detailed information on spatial manipulation, lack an adequate vocabulary for describing the manipulations.
• [MP3] Arbitrary input/output modalities impose a learning overhead which increases with the number of arbitrary items to be learned.
• [MP4] Acoustic input/output modalities are omnidirectional.
• [MP5] Acoustic input/output modalities do not require limb (including haptic) or visual activity.

44. 4. Tools
• SMALTO (Bernsen)
• Multimodal property flowchart (Williams et al., 2002)

45. SMALTO
• Addresses the "speech functionality problem"
• SMALTO was created by taking a large number of claims or findings from the literature on designing speech or speech-centric interfaces and casting these claims into a structured representation expressing the speech functionality problem

46. [Combined speech input/output, speech output, or speech input modalities M1, M2 and/or M3 etc.] or [speech modality M1, M2 and/or M3 etc. in combination with non-speech modalities NSM1, NSM2 and/or NSM3 etc.]
• are [useful or not useful]
• for [generic task: GT]
• and/or [speech act type: SA]
• and/or [user group: UG]
• and/or [interaction mode: IM]
• and/or [work environment: WE]
• and/or [generic system: GS]
• and/or [performance parameter: PP]
• and/or [learning parameter: LP]
• and/or [cognitive property: CP]
• and/or [preferable or non-preferable] to [alternative modalities AM1, AM2 and/or AM3 etc.]
• and/or [useful on conditions] C1, C2 and/or C3 etc.
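As a hypothetical illustration only (this is not SMALTO's actual implementation), the structured claim format above could be stored as a small record type and queried by designers; the field names mirror the template's slot labels, and the sample claim paraphrases one of the Michaelis and Wiggins suggestions via negation.

```python
# Hypothetical record type for SMALTO-style modality claims.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ModalityClaim:
    modalities: List[str]                       # speech modalities M1, M2, ...
    non_speech_modalities: List[str] = field(default_factory=list)
    useful: bool = True
    generic_task: Optional[str] = None          # GT
    speech_act_type: Optional[str] = None       # SA
    user_group: Optional[str] = None            # UG
    work_environment: Optional[str] = None      # WE
    conditions: List[str] = field(default_factory=list)
    source: Optional[str] = None                # literature reference

claims = [
    ModalityClaim(
        modalities=["speech output"],
        useful=False,
        generic_task="presenting information the user must refer to later",
        source="Michaelis and Wiggins (1982), by negation",
    ),
]

# A designer could then filter the claim base, e.g. list all negative claims:
print([c.generic_task for c in claims if not c.useful])
```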

47. SMALTO has been evaluated within the framework of projects involving its creators and in the DISC project
• Informal evidence indicates that it is difficult for "linguistically naïve" designers to apply, because of the way the modality properties are formulated
• This was also the motivation for the Modality Property Flowchart (Williams et al., 2002)

48. Multimodal property flowchart (PDF)

49. Multimodal interfaces are a particular type of interface → the multimodal property flowchart needs to be combined with general usability heuristics for interface design (e.g. Nielsen)

50. Main points
• Multimodal interfaces match the natural expressivity of human beings
• Taxonomy of multimodal interaction
• Limitations of signal processing in one modality can be overcome by taking into account input from another modality (multimodal disambiguation)
• The mapping of functionalities onto modalities is not always straightforward → support from guidelines and tools
