1 / 32

Multimodal Human-Computer Interaction

Multimodal Human-Computer Interaction. The design and implementation of a prototype. by Maurits André. Overview. Introduction Problem statement Technologies used Speech Hand gesture input Gazetracking Design of the system Multimodal issues. Overview. Testing program tests

Download Presentation

Multimodal Human-Computer Interaction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multimodal Human-Computer Interaction The design and implementation of a prototype by Maurits André

  2. Overview • Introduction • Problem statement • Technologies used • Speech • Hand gesture input • Gazetracking • Design of the system • Multimodal issues

  3. Overview • Testing • program tests • usability tests • human factors studies • Conclusions and recommendations • Future work • Video

  4. CAIP Center for Computer Aids in Industrial Productivity Prof. James L. Flanagan Robust Image Understanding Visiometrics and modeling VSLI design Machine Vision Multimedia Information Systems Virtual Reality Groupware Networking Microphone arrays Speech / Speaker Recognition Speech Generation Adaptive voice mimic Image & video compression

  5. Multimodal HCI • Currently: mouse, keyboard input • More natural communication technologies available: • sight • sound • touch • Robust and intelligent combination of these technologies

  6. Aim

  7. Problem statement • Study three technologies: • Speech recognition and synthesis • Hand gesture input • Gazetracking • Design prototype (appropriate application) • Implement prototype • Test and debug • Human performance studies

  8. SR and TTS • Microsoft Whisper system (C++) • Speaker independent • Continuous speech • Restricted task-specific vocabulary (150) • Finite state grammar • Sound capture: microphone array

  9. Hand gesture input • Advantages: • Natural • Powerful • Direct • Disadvantages: • Fatigue (RSI) • Learning • Non-intentional gestures • Lack of comfort

  10. Force feedback tactile glove • Polhemus tracker for wrist position/orientation • 5 gestures are recognized

  11. Implemented gestures • Grab “Move this” • Open hand “Put down” • Point at an object “Select” “Identify” • Thumb up “Resize this”

  12. Eyes output • Direction of gaze • Blinks • Closed eyes • Part of emotion

  13. Gazetracker • ISCAN RK-726 gazetracker • 60 Hz. • Calibration

  14. Application • Requirements: • Multi-user, collaborative • Written in Java • Simple • Choice: • Drawing program • Military mission planning system

  15. Drawing program

  16. Military mission planning

  17. Create Figure Type circle/rect/line Color [white]/red/grn/... Height 0..max_y Width 0..max_x Location (x,y) Frames • Slots • Inheritance • Generic properties • Default values

  18. Command Frame Cmd. with object EXIT Object ID 0...max_int YES Confirm Create Figure Confirm NO Type circle/rect/line Color [white]/red/grn/... Height 0..max_y Width 0..max_x Location (x,y) Move Destination (x,y)

  19. Collaborative Workspace Gazetracking Speech Recognition Gesture Recognition Design Fusion Agent Speech Synthesis

  20. Frame: MOVE object Tank 7 where (x4,y4) Fusion Agent Example:“Move tank seven here.” (x1,y1) (x2,y2) (x3,y3) (x4,y4) Slot Buffer Control Parser Rule Based Feedback

  21. Classification of feedback • Confirmation • Exit the system. Are you sure? • Information retrieval • What is this? This is tank seven. • Where is tank nine. Visual feedback. • Missing data • Create tank. Need to specify an ID for a tank. • Semantic error • Create tank seven. Tank seven already exists. • Resize tank nine. Cannot resize a tank.

  22. Multimodal issues • Referring to objects • describing in speech: Move the big red circle • using anaphora: Move it • by glove • gaze + pronoun: Delete this • glove + pronoun: Delete this • Timestamps Create a red rectangle from here to here T1 T2 T3 T4 T5 T6 T7 T8 xy1 xy2 xy3 xy4 xy5 xy6 xy7 xy8

  23. Multimodal issues • Ambiguity • saying x, looking at y »» x • saying x, pointing at y »» x • looking at x, pointing at y »» x • saying x, gesturing y »» xy or yx • Redundancy • saying x, looking at x »» x • etc.

  24. Program testing • Implementation in Java • Program testing and debugging • Module testing • Integration testing • Configuration testing • Time testing • Recovery testing

  25. Testing • Usability tests • Demonstration with military personnel • Human factors study: • Script for user • Questionnaire for user • Tables for observer • Log-file for observer

  26. Lab

  27. Conclusions: Selecting Modality Accuracy Speed Learning Speech ++ ++ – Gaze + ++ ++ Glove + + + Mouse ++ – +

  28. Conclusions / recommendations • Speech • real-time • low error rate • timestamps • misunderstanding • grammar in help file • Glove • real-time • fatigue • low precision • non-intentional gesture • 2D »» 3D • limited number of gestures • Gaze • real-time • head movements • self-calibration • jumpiness of eye movements • face tracker • object of interest

  29. General remarks • Response time within 1 sec. • Instruction, help files • Application effective but limited

  30. Future work • Human performance studies • Conversational interaction • Context-based reasoning and information retrieval • New design:

  31. Controller User model Gazetracker Facetracker Graphical fb. Speech synth. Speech recog. Discourse mng. Haptic feedback Disambiguator Speaker ident. Control Gesture recog. History Status Domain knowledge Application User Agents Blackboard

  32. Problem statement • Study three technologies: • Speech recognition and synthesis • Hand gesture input • Gazetracking • Design prototype (appropriate application) • Implement prototype • Test and debug • Human performance studies

More Related