eNTERFACE’08 Multimodal Communication with Robots and Virtual Agents
Mid-term presentation
Overview
Context:
• Exploitation of multi-modal signals for the development of an active robot/agent listener
• Storytelling experience: speakers told the story of an animated cartoon they had just seen
  1- See the cartoon
  2- Tell the story to a robot or an agent
Overview
Active listening:
• During natural interaction, speakers check whether their statements have been correctly understood (or at least heard).
• Robots/agents should also have active listening skills.
Characterization of multi-modal signals as inputs of the feedback model:
• Speech analysis: prosody, keyword recognition, pauses
• Partner analysis: face tracking, smile detection
Robot/agent feedback (outputs):
• Lexical and non-verbal behaviours
Feedback model:
• Exploitation of both input and output signals
Evaluation:
• Storytelling experiences are usually evaluated by annotation
STEAD corpus:
• Audio-visual recordings of a storytelling interaction between a speaker and a listener.
• 22 storytelling sessions telling the “Tweety and Sylvester - Canary row” cartoon story.
• Several speaker/listener conditions: same language or different languages.
• Languages: Arabic, French, Turkish and Slovak.
• Annotation oriented to interaction analysis: smile, head nod/shake, eyebrow movement, acoustic prominence.
Architecture of an interaction feedback model:
Multi-modal feature extraction → Feedback strategy → Multi-modal feedback
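As a rough illustration of how these three blocks chain together, here is a minimal sketch (Python, with hypothetical module and function names; the stubs correspond to the components detailed on the following slides):

```python
# Hypothetical skeleton of the interaction feedback pipeline:
# multi-modal features feed a strategy module that decides which
# feedback behaviour the agent/robot should display.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Features:
    prominent: bool        # acoustic prominence detected in the last window
    keyword_state: str     # agent state suggested by keyword spotting
    speaker_smiles: bool   # smile detected by face processing

def extract_features(audio_frame, video_frame) -> Features:
    ...  # prosody analysis, keyword spotting, face tracking

def feedback_strategy(features: Features) -> Optional[str]:
    ...  # rule-based decision: which backchannel, if any, to trigger

def display_feedback(behaviour: str, target: str) -> None:
    ...  # send BML to GRETA or a motion command to AIBO

def interaction_loop(stream, target: str = "greta") -> None:
    for audio_frame, video_frame in stream:
        behaviour = feedback_strategy(extract_features(audio_frame, video_frame))
        if behaviour is not None:
            display_feedback(behaviour, target)
```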
Multi-modal feature extraction
Key idea: extraction of the features annotated in the STEAD corpus:
• Face processing: head nod, shake, smile, activity.
• Keyword spotting: keywords have been defined in order to switch the agent’s state.
• Speech processing: acoustic prominence detection.
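A possible way to implement the keyword-driven state switch (the keyword lists and state names below are purely illustrative, not the ones defined in the project):

```python
# Hypothetical keyword-to-state mapping; the real keywords were chosen
# for the "Canary row" storytelling task.
STATE_KEYWORDS = {
    "story_start": {"once", "one day"},
    "climax":      {"suddenly", "caught"},
    "story_end":   {"finally", "the end"},
}

def update_state(current_state: str, recognized_text: str) -> str:
    """Return the new agent state if a trigger keyword was spotted,
    otherwise keep the current state."""
    text = recognized_text.lower()
    for state, keywords in STATE_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return state
    return current_state

# Example: output of the keyword spotter for the last utterance.
print(update_state("story_start", "and suddenly Sylvester caught Tweety"))  # -> "climax"
```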
Multi-modal feature extraction
Acoustic prominence detection:
• Prosody analysis in real time using Pure Data
• Development of different Pure Data objects (written in C):
  - Voice activity detection
  - Pitch and energy extraction
• Detection:
  - Statistical model (Gaussian assumption)
  - Kullback-Leibler similarity
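As a plain-Python illustration of the detection step (the actual implementation runs as a Pure Data object written in C; the threshold and statistics below are made up), prominence can be flagged when the short-window pitch/energy statistics diverge from the speaker’s long-term baseline under the Gaussian assumption:

```python
import math

def kl_gaussian(mu1: float, var1: float, mu2: float, var2: float) -> float:
    """Kullback-Leibler divergence between two 1-D Gaussians,
    KL(N(mu1, var1) || N(mu2, var2))."""
    return 0.5 * math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / (2.0 * var2) - 0.5

def is_prominent(window, background, threshold: float = 2.0) -> bool:
    """`window` and `background` are lists of (mean, variance) pairs,
    one pair per prosodic feature (e.g. pitch, energy)."""
    divergence = sum(kl_gaussian(mw, vw, mb, vb)
                     for (mw, vw), (mb, vb) in zip(window, background))
    return divergence > threshold

# Made-up pitch (Hz) and energy (dB) statistics.
window_stats     = [(220.0, 400.0), (-18.0, 4.0)]   # current short window
background_stats = [(180.0, 900.0), (-25.0, 9.0)]   # long-term speaker baseline
print(is_prominent(window_stats, background_stats))  # -> True
```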
Feedback model
• Extraction of rules from the annotations (STEAD corpus):
  - Rules are defined in the literature
  - Application to our specific task
  - When should a feedback be triggered?
• Feedback behaviours:
  - ECA: several behaviours are already defined for GRETA (head movements, facial expressions) using BML (Behaviour Markup Language).
  - Robot: we defined several basic behaviours for our AIBO robot (inspired by dog reactions); mapping from BML to robot movements.
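A minimal sketch of how such rules could trigger a behaviour and be rendered for either platform (the thresholds, the BML snippets and the AIBO command names are illustrative assumptions, not the project’s actual rules or commands):

```python
from typing import Optional

def feedback_needed(prominent: bool, pause_ms: int, mutual_gaze: bool) -> Optional[str]:
    """Illustrative rules: return an abstract backchannel signal or None."""
    if prominent and pause_ms > 300:
        return "head_nod"   # acknowledge a stressed, completed phrase
    if mutual_gaze and pause_ms > 600:
        return "smile"      # a long pause with mutual gaze invites a visual backchannel
    return None

def render(signal: str, target: str) -> str:
    """Map the abstract signal to a BML snippet (GRETA) or an AIBO movement label."""
    if target == "greta":
        bml = {"head_nod": '<head id="h1" lexeme="NOD"/>',
               "smile":    '<face id="f1" lexeme="SMILE"/>'}
        return "<bml>" + bml[signal] + "</bml>"
    aibo = {"head_nod": "AIBO_HEAD_DOWN_UP", "smile": "AIBO_TAIL_WAG"}
    return aibo[signal]

signal = feedback_needed(prominent=True, pause_ms=450, mutual_gaze=False)
if signal:
    print(render(signal, "greta"))  # <bml><head id="h1" lexeme="NOD"/></bml>
    print(render(signal, "aibo"))   # AIBO_HEAD_DOWN_UP
```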
Future work
• Integration:
  - Real-time multi-modal feature extraction: prominence detection object (Pure Data)
  - Communication between the modules through PsyClone (already done for video processing)
• Tests of the feedback behaviours for AIBO
• Agent’s state modifications
• Recordings and annotations of storytelling experiences with both GRETA and AIBO.