
Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective

Xiaohan Ma, Binh H. Le, and Zhigang Deng. Department of Computer Science, University of Houston.


Presentation Transcript


  1. Perceptual Analysis of Talking Avatar Head Movements: A Quantitative Perspective
     Xiaohan Ma, Binh H. Le, and Zhigang Deng
     Department of Computer Science, University of Houston

  2. Motivation
     • Avatars have been increasingly used in Human-Computer Interfaces
       • Teleconferencing, computer-mediated communication, distance education, online virtual worlds, etc.
     • Human-like avatar gestures influence human perception significantly
       • Facial expressions
       • Hand gestures
       • Lip movements
       • Head movements
         • One of the crucial visual cues to facilitate engaging social interaction and communication

  3. How do talking head movements affect perception?

  4. Our Quantitative Perspective
     • Uncover how talking avatar head movements affect human perception
       • User-rated naturalness of the head animations
       • Joint features extracted from head animations (with audio)
         • Acoustic speech features
         • Head motion patterns
       • Quantitatively analyze the association between extracted joint features and user ratings
     [Diagram labels: Talking Avatar Head Animations; User evaluation; Feature extraction; Joint Features; Perception (rating); Analysis of the association]

  5. Data Acquisition and Processing
     • Acquisition of the audio-head motion dataset
       • Head motion and speech were recorded simultaneously
       • Head motion: optical motion capture system (120 Hz)
       • Speech: microphone (48 kHz)
     • Processing of the captured audio-head motion dataset
       • Head motion: 3 Euler rotation angles per frame
       • Speech: pitch and RMS energy
       • Head motion and speech data aligned to the same frame rate (24 FPS)
     [Figure labels: X-axis rotation, Y-axis rotation, Z-axis rotation]
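The alignment step on this slide can be illustrated with a minimal sketch. The slides give only the rates (120 Hz, 48 kHz, 24 FPS), not the resampling method, so the block averaging and the non-overlapping RMS windows below are assumptions, and the function names are hypothetical. Averaging Euler angles is only reasonable for small rotations that stay away from angle wraparound.

```python
import numpy as np

MOCAP_HZ = 120      # optical motion capture rate (from the slide)
AUDIO_HZ = 48_000   # microphone sampling rate (from the slide)
TARGET_FPS = 24     # common frame rate (from the slide)


def downsample_head_motion(euler_120hz: np.ndarray) -> np.ndarray:
    """Average each block of 5 mocap frames into one 24 FPS frame.

    euler_120hz has shape (n_frames, 3): X, Y, Z Euler angles in degrees.
    Block averaging is an assumption; the slides do not state the method.
    """
    step = MOCAP_HZ // TARGET_FPS              # 5 mocap frames per output frame
    n_out = euler_120hz.shape[0] // step       # drop any incomplete trailing block
    trimmed = euler_120hz[: n_out * step]
    return trimmed.reshape(n_out, step, 3).mean(axis=1)


def rms_energy_per_frame(audio: np.ndarray) -> np.ndarray:
    """RMS energy of each non-overlapping 1/24 s window of audio samples."""
    window = AUDIO_HZ // TARGET_FPS            # 2000 audio samples per frame
    n_out = audio.shape[0] // window           # drop any incomplete trailing window
    frames = audio[: n_out * window].reshape(n_out, window)
    return np.sqrt(np.mean(frames ** 2, axis=1))
```

With both streams reduced to one row per 1/24 s, frame i of the head motion and frame i of the energy track cover the same time span, provided the two recordings start at the same instant.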

  6. Subjective Evaluation
     • Using the captured dataset, we generated 60 head animation clips
       • Based on 15 recorded speech clips
       • 4 different audio-head motion generation techniques
       • Mosaic applied to the mouth region
     • User study
       • 18 participants
       • Ages: 23~28
       • Gender: female (16.67%), male (83.33%)
       • Language: fluent English speakers
       • User rating: 1~5

  7. Speech-Head Motion Features and Perception
     • Measure the correlation between head motion and speech features
       • Canonical Correlation Analysis (CCA)
     • Pitch-head motion and human perception
       • Computed Pearson coefficient: 0.731
     • Energy-head motion and human perception
       • The relationship appears random and is clearly not linear
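A minimal sketch of the two measures named on this slide follows. It assumes each clip is summarized by its first (largest) canonical correlation between a speech feature track and the three Euler angles, and that these per-clip scores are then compared with mean user ratings via the Pearson coefficient. The slides do not spell out this pipeline, so treat it as one plausible reading; the function names are hypothetical.

```python
import numpy as np


def first_canonical_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Largest canonical correlation between x (n, p) and y (n, q).

    Uses the QR-plus-SVD formulation: the canonical correlations are the
    singular values of Qx^T Qy, where Qx and Qy are orthonormal bases of
    the centered data. Assumes both centered matrices have full column rank.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(xc)                   # orthonormal basis for span(xc)
    qy, _ = np.linalg.qr(yc)                   # orthonormal basis for span(yc)
    singular_values = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(singular_values[0], 0.0, 1.0))


def pearson(a, b) -> float:
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(a, b)[0, 1])
```

Under these assumptions, `first_canonical_correlation(pitch[:, None], euler_angles)` would give one score per clip, and `pearson(cca_scores, mean_ratings)` would relate those scores to the ratings across clips; `pitch`, `euler_angles`, `cca_scores`, and `mean_ratings` are placeholder names, not data from the study.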

  8. Speech-Head Motion Features and Perception
     • Implications for CHI
       • Validates the tight coordination between speech and head motion: precise timing in generation is required
         • Delayed head movement generation may significantly degrade human perception
       • An approximately linear correlation between user ratings and CCA for pitch-head motion
         • Prosody-driven head motion synthesis could be fundamentally sound
       • No simple linear correlation between user ratings and CCA for RMS energy-head motion
         • RMS energy may vary among sentences

  9. Frequency-Domain Analysis of Head Motion
     • Frequency-domain analysis of head motion
       • Head motion: rotation angles
       • Frequency spectrum: FFT applied to the head rotation angle vector
     • Association between head motion spectrum and human perception
       • With squared magnitude less than 5 degrees
     [Figure axes]
       - X-axis: average user rating (2.1 ~ 4.2)
       - Y-axis: squared magnitude of the three Euler angles in the head rotation (0 ~ 5 degrees)
       - Z-axis: frequency spectrum (0 ~ 19 Hz)
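The spectrum computation on this slide can be sketched as follows. The slides do not say which sampling rate the FFT was applied to or how the magnitude was normalized, so the rate is left as a parameter and the division by the number of samples is an assumption; the function name is hypothetical. Note that a signal sampled at f Hz can only represent frequencies up to f/2, so a 24 FPS track tops out at 12 Hz.

```python
import numpy as np


def rotation_power_spectrum(angles_deg: np.ndarray, sample_rate_hz: float):
    """Return (frequencies in Hz, squared FFT magnitude) for one rotation-angle track.

    angles_deg is a 1-D array holding one Euler angle per frame.
    """
    n = angles_deg.shape[0]
    centered = angles_deg - angles_deg.mean()          # remove the constant head pose (DC term)
    spectrum = np.fft.rfft(centered)                   # one-sided FFT of a real signal
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate_hz)
    power = np.abs(spectrum) ** 2 / n                  # normalization is an assumption
    return freqs, power
```

Running this once per Euler angle and summing the three `power` arrays would give a single spectrum per clip, which is one way to read "squared magnitude of the three Euler angles".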

  10. Frequency-Domain Analysis of Head Motion
     • Key observations
       • Highly rated: low-frequency
         • Natural head motion: less than 10 Hz
       • Poorly rated: high-frequency
         • Typically larger than 12 Hz
         • With a small range of head movements
     • Implications for HCI
       • The comfortable head motion frequency zone: 0~12 Hz
       • Smoothing as post-processing for talking avatar head motion generation
         • Smooth: post-process the synthesized head motions
         • Simply crop the high-frequency part from the synthesized head motions
     [Figure labels: Low-frequency patterns, High-frequency patterns]
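The "crop the high-frequency part" suggestion could be realized in several ways; the sketch below uses a hard frequency-domain cutoff, which is an assumption rather than the authors' stated method. The 12 Hz default comes from the comfort zone on this slide, and the function name is hypothetical. The input must be sampled faster than 24 Hz for any content above 12 Hz to exist.

```python
import numpy as np


def crop_high_frequencies(angles_deg: np.ndarray,
                          sample_rate_hz: float,
                          cutoff_hz: float = 12.0) -> np.ndarray:
    """Zero every FFT bin above cutoff_hz and transform back to the time domain.

    angles_deg is a 1-D array holding one Euler angle per frame.
    """
    n = angles_deg.shape[0]
    spectrum = np.fft.rfft(angles_deg)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate_hz)
    spectrum[freqs > cutoff_hz] = 0.0                  # discard components above the cutoff
    return np.fft.irfft(spectrum, n=n)                 # n keeps the original length
```

A hard cutoff like this can introduce ringing near abrupt motion changes; a filter with a gradual roll-off is a common alternative when that matters.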

  11. Conclusion and Future Work
     • Summary of our findings
       • The coupling between pitch and head motion has a strong linear correlation with human perception
       • Head motions perceived as natural consist mainly of low-frequency components; high-frequency components (>12 Hz) significantly degrade human perception
     • Future work
       • Multi-party conversation scenarios
       • Analysis of other fundamental speech features: pauses, repetitions, etc.
     Acknowledgments: This work is in part supported by NSF IIS-0914965, Texas Norman Hackerman Advanced Research 003652-0058-2007, and research gifts from Google and Nokia.
