Collection and Analysis of Multimodal Interaction in Direction Giving Dialogues: Towards an Automatic Gesture Selection Mechanism for Metaverse Avatars Takeo Tsukamoto, Yumi Muroya, Yukiko Nakano, Masashi Okamoto Seikei University, Japan
Overview • Introduction • Research Goal and Questions • Approach • Data Collection Experiment • Analysis • Conclusion and Future Work
Introduction • Online 3D virtual worlds based on Metaverse applications are growing steadily in popularity • e.g., Second Life (SL) ⇒ The communication method is limited to: • Online chat with speech balloons • Manual gesture generation
Introduction (Cont.) • Human face-to-face communication depends heavily on non-verbal behaviors • e.g., direction-giving dialogues • Many spatial gestures are used to illustrate directions and the physical relationships of buildings and landmarks How can we implement natural non-verbal behaviors in a Metaverse application?
Research Goal and Questions • Goal • Establish natural communication between avatars in the Metaverse based on human face-to-face communication • Research Questions • Automation: gesture selection • How can proper gestures be generated automatically? • Comprehensibility: gesture display • How can gestures be displayed intelligibly to the interlocutor?
Previous work • An automatic gesture selection mechanism for Japanese chat texts in Second Life [Tsukamoto, 2010] / You keep going straight along this road, then you will be able to find a house with a round window on your left. /
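A minimal sketch of what such text-driven gesture selection can look like. The keyword lexicon and gesture names below are illustrative assumptions, not the actual rules of [Tsukamoto, 2010].

# Hypothetical sketch: selecting gestures from chat text by keyword matching.
# The lexicon entries and gesture names are illustrative assumptions only.
GESTURE_LEXICON = {
    "straight": "point_forward",
    "left": "point_left",
    "right": "point_right",
    "round": "draw_circle",
}

def select_gestures(chat_text: str) -> list[str]:
    """Return gesture names triggered by keywords in the chat text."""
    words = [w.strip(",.") for w in chat_text.lower().split()]
    return [GESTURE_LEXICON[w] for w in words if w in GESTURE_LEXICON]

print(select_gestures("You keep going straight, then turn left"))
# -> ['point_forward', 'point_left']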
Proxemics • The previous work does not consider proxemics ⇒ In some cases an avatar's gesture becomes unintelligible to the other participants Proxemics is important for implementing comprehensible gestures in the Metaverse
Approach • Conduct an experiment to collect human gestures in direction-giving dialogues • Collect participants' verbal and non-verbal data • Analyze the relationship between gestures and proxemics
Data Collection Experiment Experimental Procedure • Direction Giver (DG) • Knows the way to any place on the Seikei Univ. campus • Direction Receiver (DR) • Knows nothing about the Seikei Univ. campus The DR asks the way to a specific building, and the DG explains how to get to that building
Experimental Instructions Direction Receiver • Instructed to fully understand the way to the goal through a conversation with the DG Direction Giver • Instructed to confirm, after finishing the explanation, that the DR understood the directions correctly
Experimental Materials Each pair recorded one conversation for each goal place
Experimental Equipment • Headset microphone • Motion capture sensors on the head, shoulder, right arm, and abdomen • Video recording equipment
Collected Data • Motion capture data • Video data • Transcriptions of utterances
Analysis • Investigated the DG's gesture distribution with respect to proxemics • Analyzed 30 dialogues collected from 10 pairs The analysis focused on the movements of the DG's right arm during gesturing
Automatic Gesture Annotation • Manually annotating nonverbal behaviors is very time consuming • Gesture occurrence was therefore annotated automatically • More than 77% of the gestures are right-arm gestures • Built a decision tree that identifies right-arm gestures • Weka J48 was used for decision tree learning Extracted features • Movement of position (x, y, z) • Rotation (x, y, z) • Relative position of the right arm to the shoulder (x, y, z) • Distance between the right arm and the shoulder Binary judgment: Gesturing / Not gesturing
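A minimal sketch of this annotation step, assuming the per-frame features are already assembled into an array. scikit-learn's DecisionTreeClassifier stands in for Weka J48 here (an assumption; the study used Weka, and J48's C4.5 splitting details differ).

# Sketch: learning a binary gesturing / not-gesturing classifier from
# motion-capture features. scikit-learn stands in for Weka J48 (assumption).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Each row: [pos_x, pos_y, pos_z, rot_x, rot_y, rot_z,
#            rel_x, rel_y, rel_z, arm_shoulder_distance]
# (the features listed on the slide; values below are placeholders)
X = np.random.rand(1000, 10)        # placeholder feature matrix
y = np.random.randint(0, 2, 1000)   # 1 = gesturing, 0 = not gesturing

clf = DecisionTreeClassifier()
clf.fit(X, y)

# Label a new frame of motion-capture data
frame = np.random.rand(1, 10)
print("gesturing" if clf.predict(frame)[0] == 1 else "not gesturing")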
Automatic Gesture Annotation (Cont.) • As the result of 10-fold cross-validation, the accuracy is 97.5% • Accurate enough for automatic annotation Example of automatic annotation
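A sketch of the 10-fold cross-validation used to evaluate the annotator, again with scikit-learn standing in for Weka (an assumption about the toolkit, not the original code); the placeholder data below will not reproduce the reported figure.

# Sketch: 10-fold cross-validation of the gesture / no-gesture decision tree.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(1000, 10)        # placeholder motion-capture features
y = np.random.randint(0, 2, 1000)   # placeholder gesture labels

scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
print(f"mean accuracy over 10 folds: {scores.mean():.3f}")
# The study reports 97.5% accuracy on the real data.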
Gesture Display Space • Defined as the overlap among the DG's front area, the DR's front area, and the DR's front field of vision [Figure: the DG's and DR's body direction vectors, the DR's front field of vision, the gesture display space center, and each participant's distance from the center]
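The slide defines the space geometrically via the figure. The sketch below only illustrates one plausible way to obtain a center point and each participant's distance from it, assuming the center lies where the DG's and DR's forward direction rays meet on the floor plane; this simplification is an assumption, not the paper's exact construction.

# Sketch: approximate the gesture display space center as the intersection of
# the DG's and DR's forward direction rays in 2D (a simplifying assumption).
import numpy as np

def ray_intersection(p1, d1, p2, d2):
    """Intersect rays p1 + t*d1 and p2 + s*d2 in 2D (None if parallel)."""
    A = np.array([[d1[0], -d2[0]], [d1[1], -d2[1]]])
    b = np.array([p2[0] - p1[0], p2[1] - p1[1]])
    if abs(np.linalg.det(A)) < 1e-9:
        return None
    t, s = np.linalg.solve(A, b)
    return np.array(p1) + t * np.array(d1)

dg_pos, dg_dir = np.array([0.0, 0.0]), np.array([1.0, 0.0])    # DG faces +x
dr_pos, dr_dir = np.array([1.2, -0.8]), np.array([0.0, 1.0])   # DR faces +y

center = ray_intersection(dg_pos, dg_dir, dr_pos, dr_dir)
dist_dg = np.linalg.norm(center - dg_pos)   # DG's distance from the center
dist_dr = np.linalg.norm(center - dr_pos)   # DR's distance from the center
print(center, dist_dg, dist_dr)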
Categories of Proxemics • 450 mm to 950 mm is defined as the standard distance from the center of the gesture display space • Based on a human arm length of 60 cm to 80 cm, with a 15 cm margin added on each side
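A sketch of how the distance band could be turned into proxemics labels for the four categories named on the next slide (the conclusion mentions five types, so at least one category is not covered here); the rule for combining the DG's and DR's distances is an assumption.

# Sketch: labelling proxemics from each participant's distance (in mm) to the
# center of the gesture display space. The 450-950 mm band is the slide's
# standard distance; the combination rule below is an assumption.
STANDARD_MIN, STANDARD_MAX = 450, 950  # mm

def is_close(distance_mm: float) -> bool:
    return distance_mm < STANDARD_MIN

def proxemics_category(dist_dg_mm: float, dist_dr_mm: float) -> str:
    if is_close(dist_dg_mm) and is_close(dist_dr_mm):
        return "Close_to_Both"
    if is_close(dist_dg_mm):
        return "Close_to_DG"
    if is_close(dist_dr_mm):
        return "Close_to_DR"
    return "Normal"

print(proxemics_category(400, 800))   # -> Close_to_DG
print(proxemics_category(700, 800))   # -> Normal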
Analysis: Relationship between Proxemics and Gesture Distribution • Analyze the distribution of gestures by plotting the DG's right-arm positions [Figure: gesture distribution plots for the Close_to_DG, Close_to_DR, Close_to_Both, and Normal proxemics, annotated as similar, smaller, or wider relative to Normal]
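A sketch of this plotting step, assuming the per-frame right-arm positions have already been extracted and labelled with a proxemics category; the column names and data layout are placeholders, not the study's actual files.

# Sketch: plot the DG's right-arm positions, one panel per proxemics category.
# Column names and the DataFrame layout are assumptions for illustration.
import matplotlib.pyplot as plt
import pandas as pd

frames = pd.DataFrame({
    "arm_x": [0.10, 0.35, 0.55, 0.20],   # placeholder positions (m)
    "arm_y": [0.90, 1.10, 1.05, 0.95],
    "proxemics": ["Normal", "Close_to_DG", "Close_to_Both", "Close_to_DR"],
})

categories = ["Close_to_DG", "Close_to_DR", "Close_to_Both", "Normal"]
fig, axes = plt.subplots(1, len(categories), sharex=True, sharey=True)
for ax, cat in zip(axes, categories):
    subset = frames[frames["proxemics"] == cat]
    ax.scatter(subset["arm_x"], subset["arm_y"], s=5)
    ax.set_title(cat)
plt.show()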
Analysis: Relationship between Proxemics and Gesture Distribution (Cont.) Gesture distribution range: Close_to_DR < Normal = Close_to_DG < Close_to_Both
Applying the Proxemics Model • Create avatar gestures based on our proxemics model • To test whether the findings are applicable [Figure: example avatar gestures in the Close_to_DR and Close_to_DG conditions]
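A hypothetical sketch of how the proxemics model could drive gesture placement for an avatar. The scale factors are illustrative assumptions consistent with the observed ordering (smaller range when close to the DR, wider when close to both), not values from the study.

# Hypothetical sketch: adapt the horizontal extent of a pointing gesture to
# the proxemics category. The scale factors are illustrative assumptions.
GESTURE_RANGE_SCALE = {
    "Close_to_DR": 0.7,     # smaller range than Normal
    "Normal": 1.0,
    "Close_to_DG": 1.0,     # similar to Normal
    "Close_to_Both": 1.3,   # wider range than Normal
}

def gesture_target(base_offset_x: float, category: str) -> float:
    """Scale the stroke's horizontal offset from the shoulder by category."""
    return base_offset_x * GESTURE_RANGE_SCALE.get(category, 1.0)

print(gesture_target(0.4, "Close_to_Both"))  # -> 0.52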
Conclusion • Conducted an experiment to collect human gestures in direction-giving dialogues • Investigated the relationship between proxemics and the gesture distribution • Proposed five types of proxemics characterized by the distance from the center of the gesture display space • Found that the gesture distribution range differed depending on the proxemics of the participants
Future Work • Establish a computational model for determining gesture direction • Examine the effectiveness of the model • Whether users perceive the avatar's gestures as appropriate and informative
Related work • [Breitfuss, 2008] Built a system that automatically adds gestural behavior and eye gaze • Based on linguistic and contextual information of the input text • [Tepper, 2004] Proposed a method for generating novel iconic gestures • Used spatial information about the locations and shapes of landmarks to represent word concepts • From a set of parameters, iconic gestures are generated without relying on a lexicon of gesture shapes • [Bergmann, 2009] Represented individual variation of gesture shapes using a Bayesian network • Built an extensive corpus of multimodal behaviors in direction-giving and landmark description tasks