Combined Gesture-Speech Analysis and Synthesis M. Emre Sargın, Engin Erzin, Yücel Yemez, A. Murat Tekalp {msargin,eerzin,yyemez,mtekalp}@ku.edu.tr Multimedia Vision and Graphics Laboratory, Koc University The SIMILAR NoE Summer Workshop 2005
Outline • Project Objective • Technical Description • Preparation of Gesture-Speech Database • Detection of Gesture Elements • Gesture-Speech Correlation Analysis • Synthesis of Gestures Accompanying Speech • Resources • Work Plan • Team Members
Project Objective • The production of speech and gesture is interactive throughout the entire communication process. • Human-Computer Interaction systems should mirror this behavior: in an edutainment application, for example, an animated character's speech should be aided and complemented by its gestures. • The two main goals of this project: • Analysis and modeling of the correlation between speech and gestures. • Synthesis of correlated, natural gestures accompanying speech.
Technical Description • Preparation of Gesture-Speech Database • Detection of Gesture Elements • Gesture-Speech Correlation Analysis • Synthesis of Gestures Accompanying Speech
Preparation of Database • The gestures of a specific person will be investigated. • The video database recorded for that person should include the gestures that he/she frequently uses. • The locations of the head, arms, elbows, etc. should be easy to detect and track.
Detection of Gesture Elements • In this project, we consider arm and head gestures. • The main tasks in the detection of gesture elements are: • Tracking of the head region. • Tracking of the hand, and possibly the shoulder and elbow. • Extraction of gesture features. • Recognition and labeling of gestures.
Head Region Tracking • To extract the motion information coming from the head, one should first locate the head region. • An exhaustive search for the head in every frame is a possible solution; however, it is computationally inefficient. • Tracking is efficient in terms of computational complexity. • The motion information computed during tracking will also be used as the head gesture features.
Tracking Methodology • Exhaustive search for the head region in the initial frame • Haar-based face detection • Skin color information • Extraction of motion information from the head region • Optical flow vectors • Fitting global motion parameters to the optical flow vectors • Warp the search window according to the motion information. • Search for the head region within the search window.
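The loop below is a minimal sketch of this detect-then-track scheme, written with OpenCV in Python (the project lists Matlab and OpenCV as candidate tools). The video file name, the search-window margin, and the use of a purely translational median-flow motion estimate in place of a full global motion model are all illustrative assumptions.

```python
import cv2
import numpy as np

# Sketch: detect the head once with a Haar cascade, then follow it with
# sparse Lucas-Kanade optical flow and shift the search window accordingly.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("speaker.avi")                 # hypothetical input clip
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Exhaustive (frame-wide) search only in the initial frame (assumes one face).
x, y, w, h = cascade.detectMultiScale(prev_gray, 1.2, 5)[0]

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Optical flow of feature points inside the current head window.
    pts = cv2.goodFeaturesToTrack(prev_gray[y:y+h, x:x+w], 50, 0.01, 5)
    pts = pts.reshape(-1, 2) + np.float32([x, y])
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, pts.reshape(-1, 1, 2), None)

    # Crude "global motion" estimate: the median flow vector (translation only).
    flow = (new_pts.reshape(-1, 2) - pts)[status.ravel() == 1]
    dx, dy = np.median(flow, axis=0)

    # Warp (shift) the search window and re-detect the head inside it.
    x, y = int(x + dx), int(y + dy)
    x0, y0 = max(x - 20, 0), max(y - 20, 0)
    hits = cascade.detectMultiScale(gray[y0:y0 + h + 40, x0:x0 + w + 40], 1.1, 3)
    if len(hits):
        rx, ry, w, h = hits[0]
        x, y = x0 + rx, y0 + ry

    prev_gray = gray
```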
Head Tracking Results
Hand Tracking Methodology • The hand region will be extracted using skin color information. • Robust state-space tracking will be applied. • The observations are the hand position. • The states are the hand position, velocity, and acceleration. • Kalman filtering removes unwanted noise from the features. • In a regular Kalman filter, the parameters are fixed. • In the robust Kalman filter, the parameters are re-adjusted at each iteration to minimize the MSE and to overcome the effects of abrupt changes in hand motion.
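Below is a minimal sketch of such a state-space tracker: a constant-acceleration Kalman filter whose state holds the hand's position, velocity, and acceleration, and whose observation is the detected hand position. The frame rate and the noise covariances are placeholder assumptions; the robust variant described above would re-estimate these parameters at each iteration rather than keep them fixed.

```python
import numpy as np

dt = 1.0 / 25.0                           # assumed frame rate of 25 fps

# State: [x, y, vx, vy, ax, ay]; observation: [x, y] (detected hand position).
F = np.eye(6)
F[0, 2] = F[1, 3] = dt
F[2, 4] = F[3, 5] = dt
F[0, 4] = F[1, 5] = 0.5 * dt ** 2
H = np.zeros((2, 6))
H[0, 0] = H[1, 1] = 1.0

Q = 1e-2 * np.eye(6)                      # process noise (placeholder value)
R = 4.0 * np.eye(2)                       # measurement noise (placeholder value)

def kalman_step(x, P, z):
    """One predict/update cycle; z is the observed hand position (pixels)."""
    x = F @ x                              # predict state
    P = F @ P @ F.T + Q                    # predict covariance
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (z - H @ x)                # update with the new observation
    P = (np.eye(6) - K @ H) @ P
    return x, P

# Usage: feed the per-frame skin-colour detections through the filter.
x, P = np.zeros(6), np.eye(6)
for z in [(120.0, 340.0), (123.5, 338.0)]: # hypothetical detected hand positions
    x, P = kalman_step(x, P, np.asarray(z))
```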
Extraction of Gesture Features • Head gesture features: the global motion parameters computed within the head region will be used. • Hand gesture features: the hand's center-of-mass position and its velocity will form the hand gesture features.
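As a small illustration, assuming the hand has already been segmented into a binary skin mask, the hand features above could be computed from image moments; the frame rate and the fallback behaviour for an empty mask are assumptions.

```python
import cv2
import numpy as np

def hand_features(mask, prev_com, dt=1.0 / 25.0):
    """Centre of mass of a binary skin mask and its velocity between frames."""
    m = cv2.moments(mask, binaryImage=True)
    if m["m00"] < 1e-3:                    # no skin pixels: keep previous position
        return prev_com, np.zeros(2)
    com = np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])
    vel = (com - prev_com) / dt            # finite-difference velocity estimate
    return com, vel
```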
Gesture-Speech Correlation Analysis • Recognized gestures are labeled with respect to time. • Head gestures: Down, Up, Left, Right, Left-Right, … • Arm gestures: Abduction, Adduction, Extension, … • Recognized speech patterns are labeled with respect to time. • Semantic info: approval and refusal phrases, etc. • Prosodic info: intonational phrases, ToBI transcriptions, etc. • Correlation analysis is performed by examining: • Co-occurrence matrix • Input/Output Hidden Markov Models (IOHMMs)
Co-occurrence Matrix • Estimation of the joint probability distribution function f(g, s). • For each time sample, a vote is cast for the corresponding gesture-speech label pair. • For a specific speech element s_i, the most correlated gesture will be: • g_i = argmax_x f(g_x, s_i) • Relatively easy to compute. • Gives an intuition about what we are examining.
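A minimal sketch of this estimation and lookup is given below; the gesture and speech label inventories are purely illustrative, not the project's actual label sets.

```python
import numpy as np

# Illustrative label sets (placeholders, not the final inventory).
GESTURES = ["head_up", "head_down", "head_left_right", "arm_abduction"]
SPEECH   = ["approval", "refusal", "intonational_phrase"]

def cooccurrence(gesture_seq, speech_seq):
    """Estimate f(g, s): one vote per time sample, then normalise to sum to 1."""
    C = np.zeros((len(GESTURES), len(SPEECH)))
    for g, s in zip(gesture_seq, speech_seq):
        C[GESTURES.index(g), SPEECH.index(s)] += 1
    return C / C.sum()

def most_correlated_gesture(C, s_i):
    """g_i = argmax_x f(g_x, s_i) for a given speech label s_i."""
    return GESTURES[int(np.argmax(C[:, SPEECH.index(s_i)]))]
```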
Input/Output Hidden Markov Models • An IOHMM is a graphical model that maps input sequences to output sequences. • It is used in three sequence-processing tasks: • Prediction • Regression • Classification • The model is trained to maximize the conditional distribution of an output sequence {y_1, …, y_T} given an input sequence {x_1, …, x_T}. • In our project: • The input sequence will be the speech labels. • The output sequence will be the gesture labels.
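The conditional probability above can be evaluated with a forward recursion in which both the transition and emission distributions are selected by the current input symbol. The sketch below uses a toy, fully tabular parameterisation (dictionaries of matrices keyed by the input label); actual IOHMMs parameterise these distributions with trainable sub-networks and fit them with EM or gradient methods.

```python
import numpy as np

def iohmm_forward(x_seq, y_seq, A, B, pi):
    """Forward recursion for P(y_1..T | x_1..T) in a discrete toy IOHMM.

    A[x]: (N, N) state-transition matrix used when the input symbol is x
    B[x]: (N, M) output-emission matrix used when the input symbol is x
    pi:   (N,)  initial state distribution
    """
    alpha = pi * B[x_seq[0]][:, y_seq[0]]
    for x, y in zip(x_seq[1:], y_seq[1:]):
        alpha = (alpha @ A[x]) * B[x][:, y]  # input-conditioned transition + emission
    return alpha.sum()
```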
Synthesis of Gestures Accompanying Speech • Based on the methodology used in the correlation analysis, given a speech signal: • Features will be extracted. • The most probable speech label will be assigned to each speech pattern. • The gesture pattern most correlated with the speech pattern will be used to animate a stick model of a person.
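A minimal sketch of the lookup step, assuming the correlation analysis has produced a speech-to-gesture mapping (for instance from the co-occurrence matrix); the label names and the (start, end, label) interface are hypothetical.

```python
def gesture_schedule(speech_labels, speech_to_gesture):
    """Map time-stamped speech labels to the gesture labels that will drive
    the stick-model animation.

    speech_labels:     list of (t_start, t_end, speech_label) tuples
    speech_to_gesture: dict built from the correlation analysis, e.g.
                       {"approval": "head_down", "refusal": "head_left_right"}
    """
    return [(t0, t1, speech_to_gesture[s]) for (t0, t1, s) in speech_labels]
```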
Resources • Database Preparation and Labeling: • VirtualDub • Anvil • Praat • Image Processing and Feature Extraction: • Matlab Image Processing Toolbox • OpenCV Computer Vision Library • Gesture-Speech Correlation Analysis: • HTK Hidden Markov Model Toolkit • Torch Machine Learning Library
Work Plan • Timeline of the project • Schedule of the lectures
Team Members • Ferda Ofli • Koc University • Image, Video Processing and Feature Extraction • Yelena Yasinnik • Massachusetts Institute of Technology • Audio-Visual Correlation Analysis • Oya Aran • Bogazici University • Gesture Based Human-Computer Interaction Systems
Team Members • Alexey Anatolievich Karpov • Saint-Petersburg Institute for Informatics and Automation • Speech Based Human-Computer Interaction Systems • Stephen Wilson • University College Dublin • Audio-Visual Gesture Annotation • Alexander Refsum Jensenius • Department of Music, University of Oslo • Gesture Analysis