eNTERFACE 08 Project #1 "MultiParty Communication with a Tour Guide ECA"
Final presentation, August 29th, 2008
Outline
• Project Overview
• Objectives, Issues & Work Done
• System Overview
• Configuration and Design
• Conclusion
Project Objectives
• Main objective: develop an ECA Tour Guide system that can interact with one or two users
• Research features:
  • a multiparty dialogue model and scenario between two humans and an ECA
  • handling and combining input data: users' presence and behaviors (speech, tracking)
  • gaze behavior control and a nonverbal model for the ECA
Work Done: Component Functionality Overview
• We implemented components that support a scenario based on narration and interruptions
• The ECA is the narrator; users can ask context-related questions ("where", "how", "when")
• Speaker, addressee and listener identification; ECA gaze model
• The ECA can ask users simple "yes/no" questions to keep their attention
• The system can detect users' appearance and dynamically start/end a session
• The system can detect and handle situations in which users are paying less attention
• The system can recover from failures (e.g. when SR does not recognize a user's speech)
Work done... about to be done...
• Components are implemented
• The system is being integrated
  • debugging and full testing are still needed
• Not yet supported:
  • detecting when users start a conversation between themselves
  • detecting speech collisions between users
  • smart scheduling and control of the ECA's behaviors
Speech Recognition
• Functionality:
  • detects users' requests ("where", "how", "when", "who")
  • detects users' willingness to leave the system
  • detects answers to simple "yes/no" questionnaires
  • detects unknown words
• Implementation:
  • keyword detection with a confidence score and speech duration, implemented using the Loquendo API
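A minimal sketch of how keyword-spotting results (keyword, confidence score, speech duration) could be turned into events for the decision making component. The project uses the Loquendo ASR API, which is not shown here; the RecognitionResult record, the event names and the confidence threshold below are illustrative assumptions, not the project's actual interfaces.

```python
# Hypothetical mapping from keyword-spotting results to dialogue events.
# The Loquendo API itself is not shown; everything here is an assumption.
from dataclasses import dataclass

@dataclass
class RecognitionResult:
    keyword: str        # best-matching keyword, e.g. "where", "how", "when", "who"
    confidence: float   # recognizer confidence score in [0.0, 1.0]
    duration_ms: int    # speech duration, useful for filtering spurious detections

CONFIDENCE_THRESHOLD = 0.5   # assumed value; would be tuned per deployment

def to_dialogue_event(result: RecognitionResult) -> str:
    """Translate a keyword-spotting result into an event for the decision maker."""
    if result.confidence < CONFIDENCE_THRESHOLD:
        return "SR_FAILURE"                       # decision maker triggers a recovery prompt
    if result.keyword in ("where", "how", "when", "who"):
        return f"USER_QUESTION_{result.keyword.upper()}"
    if result.keyword in ("yes", "no"):
        return f"USER_ANSWER_{result.keyword.upper()}"
    if result.keyword in ("bye", "goodbye", "leave"):
        return "USER_LEAVING"
    return "UNKNOWN_WORD"
```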
Nonverbal Inputs: Users' Appearance and Face Orientation
• Functionality of the components:
  • detect motion and users' appearance/disappearance
  • detect the number of users present
  • detect users' face orientation and increased/decreased attention (left or right user)
• Implementation:
  • OpenCV (motion) & Okao Vision (face orientation, gaze)
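The slide above mentions OpenCV for motion detection. Below is a minimal frame-differencing sketch of that idea, written in Python for readability (the original components were not necessarily implemented in Python); the camera index, thresholds and the emitted "USER_PRESENT" event name are assumptions. The Okao Vision face-orientation part is proprietary and is not shown.

```python
# Minimal frame-differencing motion detection sketch using OpenCV.
# Thresholds and the camera index are illustrative assumptions.
import cv2

MOTION_AREA_THRESHOLD = 5000   # assumed minimum contour area to count as user motion

def detect_motion(prev_gray, gray):
    """Return True if frame differencing finds a sufficiently large moving region."""
    diff = cv2.absdiff(prev_gray, gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return any(cv2.contourArea(c) > MOTION_AREA_THRESHOLD for c in contours)

cap = cv2.VideoCapture(0)                 # assumed camera index
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if detect_motion(prev_gray, gray):
        print("USER_PRESENT")             # would be forwarded to the decision making component
    prev_gray = gray
```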
Decision Making Component: Functionalities
• Makes decisions about "when and what to do to whom":
  • handles multimodal input events (number of users, attention, speech channels)
  • handles user interruptions while the ECA is speaking
  • handles failures reported by the SR component
  • generates multimodal output and controls the ECA's gaze
• Simple rule: "first one will be served"
  • the "yes/no" questionnaire is the exception
• No domain knowledge or behavior scheduling yet
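A hedged sketch of the "first one will be served" rule and its yes/no-questionnaire exception described above. The class, method and return values are assumptions made for illustration only, not the project's actual interfaces.

```python
# Illustrative sketch of the turn-handling rule; all names are assumptions.
class DecisionMaker:
    def __init__(self):
        self.current_addressee = None   # user whose request is currently being served
        self.awaiting_yes_no = False    # exception: answers are collected from both users

    def on_user_question(self, user_id, question_type):
        """Apply the 'first one will be served' rule to an incoming question event."""
        if self.awaiting_yes_no:
            return None                            # questions are ignored during a questionnaire
        if self.current_addressee is None:
            self.current_addressee = user_id       # the first speaker is served first
            return ("ANSWER", user_id, question_type)
        return ("ASK_TO_WAIT", user_id, None)      # a second speaker is asked to wait
```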
Decision Making Component: Implementation
• The component uses ideas from information state theory [Larsson'00] and AIML:
  • the progress of the dialogue is represented by a set of variables
  • the most appropriate plans are selected and scheduled by simple inference
  • timing control is used to obtain messages from both speech channels in the case of "yes/no" questions
• The component is being developed using the MIDIKI toolkit as a reference
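To make the information-state idea concrete, here is a small sketch in which the dialogue state is a set of variables and a plan is selected by checking preconditions in order, roughly in the spirit of [Larsson'00]. All variable and plan names are assumptions; the actual component, built with the MIDIKI toolkit as a reference, is more elaborate.

```python
# Illustrative information-state sketch; variable and plan names are assumptions.
information_state = {
    "users_present": 0,
    "narration_point": 0,        # position in the tour narration
    "pending_question": None,    # e.g. ("left_user", "WHERE")
    "awaiting_answers": set(),   # users still expected to answer a yes/no question
}

def select_plan(state):
    """Pick the most appropriate plan by checking rule preconditions in order."""
    if state["users_present"] == 0:
        return "END_SESSION"
    if state["pending_question"] is not None:
        return "ANSWER_QUESTION"
    if state["awaiting_answers"]:
        return "WAIT_FOR_ANSWERS"    # time-limited wait for both speech channels
    return "CONTINUE_NARRATION"
```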
Animation Player
• Functionality:
  • the animation player uses scripted behaviors (GSML language) to generate speech and animation
  • a gaze model for multiparty communication is supported:
    • gaze is controlled at the utterance level
    • the gaze pattern follows conversational rules (who is the addressee, who is the listener)
• Implementation:
  • Visage SDK (based on the MPEG-4 standard)
  • 3ds Max
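Below is a sketch of what utterance-level gaze control driven by addressee/listener roles could look like. The function name and the gaze-time proportions are assumptions chosen purely for illustration; GSML scripts and Visage SDK calls are not shown.

```python
# Sketch of an utterance-level gaze pattern; the 70/20/10 split is an assumption.
def gaze_pattern(addressee, listener, utterance_duration_s):
    """Return (target, duration) gaze segments for one utterance."""
    segments = [(addressee, 0.7 * utterance_duration_s)]          # hold gaze on the addressee
    if listener is not None:
        segments.append((listener, 0.2 * utterance_duration_s))   # glance at the side listener
        segments.append((addressee, 0.1 * utterance_duration_s))  # return to the addressee
    else:
        segments.append((addressee, 0.3 * utterance_duration_s))  # single-user case
    return segments

# Example: the ECA answers the left user while the right user listens, 4 s utterance.
print(gaze_pattern("left_user", "right_user", 4.0))
```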
Conclusion
• Components supporting context-based two-party human-ECA communication are implemented
• The system is being integrated, but is not yet fully tested
• Component issues:
  • missing face tracking and domain knowledge about users' behaviors
  • simple dialogue management and control (no smart scheduling or smart gaze control)
• Future directions: system debugging and testing, implementing tracking, improving gaze control, a study of users' behaviors and gaze, system evaluation