Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective
Xiaohan Ma, Binh H. Le, and Zhigang Deng
Department of Computer Science, University of Houston
Motivation
• Avatars have been increasingly used in Human-Computer Interfaces
  • Teleconferencing, computer-mediated communication, distance education, online virtual worlds, etc.
• Human-like avatar gestures influence human perception significantly
  • Facial expressions
  • Hand gestures
  • Lip movements
  • Head movements
    • One of the crucial visual cues to facilitate engaging social interaction and communication
Our Quantitative Perspective
• Uncover how talking avatar head movements affect human perception
  • User-rated naturalness of head animations
  • Joint features extracted from head animations (with audio)
    • Acoustic speech features
    • Head motion patterns
• Quantitatively analyze the association between extracted joint features and user ratings
[Diagram labels: Talking avatar head animations; User evaluation; Feature extraction; Joint features; Perception (rating); Analysis of the association]
Data Acquisition and Processing
• Acquisition of the audio-head motion dataset
  • Head motion and speech were recorded simultaneously
  • Head motion: optical motion capture system (120 Hz)
  • Speech: microphone (48 kHz)
• Processing of the captured audio-head motion dataset
  • Head motion: 3 Euler rotation angles per frame
  • Speech: pitch and RMS energy
  • Head motion and speech data aligned to the same frame rate (24 FPS)
[Figure labels: X-axis rotation; Y-axis rotation; Z-axis rotation]
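
The alignment step can be illustrated with a minimal sketch. It assumes block averaging for the head motion (120 / 24 = 5 mocap frames per output frame) and non-overlapping 2000-sample windows for RMS energy (48000 / 24); the slides do not say how the authors resampled, and pitch extraction is omitted. The function names are hypothetical.

```python
import numpy as np

MOCAP_HZ = 120     # head motion capture rate stated on the slide
AUDIO_HZ = 48_000  # microphone sampling rate stated on the slide
TARGET_FPS = 24    # common frame rate stated on the slide


def downsample_head_motion(angles: np.ndarray) -> np.ndarray:
    """Average each block of 5 mocap frames into one 24 FPS frame.

    angles has shape (n_frames, 3): one row of Euler angles per mocap frame.
    Block averaging is an assumption, and it ignores angle wraparound.
    """
    step = MOCAP_HZ // TARGET_FPS
    n_out = angles.shape[0] // step
    # Drop any trailing partial block, then average within each block.
    return angles[: n_out * step].reshape(n_out, step, 3).mean(axis=1)


def rms_energy_per_frame(audio: np.ndarray) -> np.ndarray:
    """RMS energy over non-overlapping windows, one value per 24 FPS frame."""
    win = AUDIO_HZ // TARGET_FPS
    n_out = audio.shape[0] // win
    frames = audio[: n_out * win].reshape(n_out, win)
    return np.sqrt(np.mean(frames ** 2, axis=1))
```

With these two helpers, one second of recording yields 24 rows of Euler angles and 24 energy values, so the two streams can be compared frame by frame.
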
Subjective Evaluation
• Using the captured dataset, we generated 60 head animation clips (15 × 4)
  • Based on 15 recorded speech clips
  • 4 different audio-head motion generation techniques
  • Mosaic applied to the mouth region
• User study
  • 18 participants
  • Ages: 23 to 28
  • Gender: female (16.67%), male (83.33%)
  • Language: fluent English speakers
  • User rating: 1 to 5
Speech-Head Motion Features and Perception
• Measure the correlation between head motion and speech features
  • Canonical Correlation Analysis (CCA)
• Pitch-head motion and human perception
  • Computed Pearson coefficient between CCA and user ratings: 0.731
• Energy-head motion and human perception
  • Appears random; clearly not linear
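
A minimal sketch of the two measures named above, under stated assumptions: the largest canonical correlation is computed with a QR-based formulation (the slides do not say which CCA implementation was used), and the Pearson coefficient uses `np.corrcoef`. The function names and the data layout (one row per frame) are hypothetical.

```python
import numpy as np


def first_canonical_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Largest canonical correlation between x (n, p) and y (n, q).

    After centering, the singular values of Qx^T Qy are the canonical
    correlations, where Qx and Qy are orthonormal bases from a QR
    decomposition. Assumes full column rank and more rows than columns.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    singular_values = np.linalg.svd(qx.T @ qy, compute_uv=False)
    # Clip to guard against tiny floating-point overshoot above 1.
    return float(np.clip(singular_values[0], 0.0, 1.0))


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])
```

One plausible use: compute `first_canonical_correlation(pitch, head_angles)` for each clip, with `pitch` of shape (n, 1) and `head_angles` of shape (n, 3), then pass the per-clip values and the per-clip average ratings to `pearson`.
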
Speech-Head Motion Features and Perception
• Implications for CHI
  • Validates the tight coordination between speech and head motion: precise timing in generation is required
    • Delayed head movement generation may significantly degrade human perception
  • An approximately linear correlation between user ratings and CCA for pitch-head motion
    • Prosody-driven head motion synthesis could be fundamentally sound
  • No simple linear correlation between user ratings and CCA for RMS energy-head motion
    • RMS energy may vary among sentences
Frequency-Domain Analysis of Head Motion
• Frequency-domain analysis of head motion
  • Head motion: rotation angles
  • Frequency spectrum: FFT applied to the head rotation angle vector
• Association between head motion spectrum and human perception
  • With squared magnitude less than 5 degrees
[Figure axes: X-axis: average user rating (2.1 to 4.2); Y-axis: squared magnitude of the three Euler angles in the head rotation (0 to 5 degrees); Z-axis: frequency spectrum (0 to 19 Hz)]
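
The spectrum computation can be sketched as follows for a single rotation angle track. The mean removal, the normalization by track length, and the function name are assumptions made for illustration; the slides only state that an FFT was applied to the head rotation angle vector.

```python
import numpy as np


def rotation_spectrum(angle: np.ndarray, sample_rate_hz: float):
    """Return (frequencies in Hz, squared FFT magnitude) for one angle track."""
    centered = angle - angle.mean()  # remove the constant head pose offset
    coeffs = np.fft.rfft(centered) / len(centered)  # assumed normalization
    freqs = np.fft.rfftfreq(len(centered), d=1.0 / sample_rate_hz)
    return freqs, np.abs(coeffs) ** 2
```

For example, a two-second track sampled at 120 Hz that nods sinusoidally at 2 Hz produces a spectrum whose largest bin sits at 2 Hz.
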
Frequency-Domain Analysis of Head Motion
• Key observations
  • Highly rated: low-frequency
    • Natural head motion: less than 10 Hz
  • Lowly rated: high-frequency
    • Typically higher than 12 Hz
    • With a small range of head movements
• Implications for HCI
  • The comfortable head motion frequency zone: 0 to 12 Hz
  • Smoothing as post-processing for talking avatar head motion generation
    • Smooth: post-process the synthesized head motions
    • Simply crop the high-frequency part from the synthesized head motions
[Figure labels: Low-frequency patterns; High-frequency patterns]
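
Cropping the high-frequency part could look like the sketch below: a hard cutoff in the frequency domain at 12 Hz. The function name and the brick-wall design are assumptions; the slides do not specify the filter. The input must be sampled well above 24 Hz for a 12 Hz cutoff to remove anything, so the sketch is meant for tracks at a higher rate such as the 120 Hz capture rate.

```python
import numpy as np


def crop_high_frequencies(angle: np.ndarray,
                          sample_rate_hz: float,
                          cutoff_hz: float = 12.0) -> np.ndarray:
    """Zero every FFT bin above cutoff_hz and transform back to the time domain."""
    spectrum = np.fft.rfft(angle)
    freqs = np.fft.rfftfreq(len(angle), d=1.0 / sample_rate_hz)
    spectrum[freqs > cutoff_hz] = 0.0  # discard the high-frequency components
    return np.fft.irfft(spectrum, n=len(angle))
```

A hard cutoff is the simplest reading of "crop"; it can introduce ringing near abrupt motions, so a filter with a gentler roll-off may be preferable in practice.
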
Conclusion and Future Work
• Summary of our findings
  • The coupling between pitch and head motion has a strong linear correlation with human perception
  • Head motions perceived as natural consist mainly of low-frequency components; high-frequency components (>12 Hz) significantly degrade human perception
• Future work
  • Multi-party conversation scenarios
  • Analysis of other fundamental speech features: pauses, repetitions, etc.
Acknowledgments: This work is supported in part by NSF IIS-0914965, Texas Norman Hackerman Advanced Research 003652-0058-2007, and research gifts from Google and Nokia.