
Towards Perceptually Realistic Talking Heads: Models, Methods and McGurk David Marshall, Darren Cosker and Paul Rosin



Presentation Transcript


  1. Towards Perceptually Realistic Talking Heads: Models, Methods and McGurk David Marshall, Darren Cosker and Paul Rosin Cardiff School of Computer Science Susan Paddock and Simon Rushton Cardiff School of Psychology Cardiff University

  2. Context: A Talking Head • Development of a Video-Realistic Talking Head • Animation from Continuous Speech • Perceptual Analysis -> Realism

  3. Contribution of this Paper: Perceptual Realism Test • Perceptual Analysis via McGurk Test • Perceptual Test with no prior bias • Used to improve talking head synthesis

  4. Outline of Talk • Video-Realistic Talking Head (Overview) • Perceptual Analysis and Testing • The McGurk Effect + McGurk Test • Results: Implications of McGurk • Conclusions + Future Work

  5. Our Talking Head • Image-based synthesis • Continuous Speech • Flexible framework – emotion, behaviour BASIC IDEA: • Train on input video and audio • Extract only low-level image and audio features • No phonetic labelling • Synthesise new video using only input audio • Unseen utterances • Speaker Independent

  6. Hierarchical Facial Model • Active Appearance Models – Control of shape and texture using a single ‘appearance parameter’ • Based on Principal Component Analysis (PCA) • Non-linear Hierarchical PCA (developed at Cardiff) • Greater Separation of Variation • High Degree of Control – Sub-Facial variation not orthogonal in standard PCA model • Coupling of Speech Model (Cardiff Idea)
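Slide 6 names AAMs and PCA but gives no implementation detail. The sketch below shows the standard single-level, linear combined appearance model that the hierarchical Cardiff variant builds on, assuming shapes are flattened landmark coordinates and textures are shape-normalised pixel vectors: PCA on shapes, PCA on textures, then a third PCA over the weighted, concatenated parameters gives one appearance vector per frame. The function names, the 98% variance threshold and the variance-ratio weighting are illustrative assumptions, not the authors' code, and the non-linear hierarchical decomposition is not reproduced.

```python
import numpy as np

def pca(data, var_kept=0.98):
    """PCA on the rows of `data`, keeping enough modes to explain `var_kept` of the variance."""
    mean = data.mean(axis=0)
    _, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    var = s ** 2 / max(len(data) - 1, 1)          # eigenvalues of the covariance
    ratio = np.cumsum(var) / var.sum()
    k = min(int(np.searchsorted(ratio, var_kept)) + 1, len(var))
    return mean, vt[:k].T, var[:k]                # mean, basis (d x k), eigenvalues

def build_appearance_model(shapes, textures, var_kept=0.98):
    """Shape PCA + texture PCA, then a third PCA over the weighted, concatenated parameters."""
    s_mean, Ps, s_var = pca(shapes, var_kept)
    t_mean, Pt, t_var = pca(textures, var_kept)
    bs = (shapes - s_mean) @ Ps                   # per-frame shape parameters
    bt = (textures - t_mean) @ Pt                 # per-frame texture parameters
    # Scale shape parameters so coordinates and intensities carry comparable total variance.
    w = np.sqrt(t_var.sum() / s_var.sum())
    joint = np.hstack([w * bs, bt])
    j_mean, Pc, _ = pca(joint, var_kept)
    c = (joint - j_mean) @ Pc                     # one appearance vector per frame
    model = dict(s_mean=s_mean, Ps=Ps, t_mean=t_mean, Pt=Pt, w=w, j_mean=j_mean, Pc=Pc)
    return model, c

def reconstruct(model, c):
    """Map an appearance vector c back to a (shape, texture) pair."""
    joint = model["j_mean"] + model["Pc"] @ c
    n_s = model["Ps"].shape[1]
    shape = model["s_mean"] + model["Ps"] @ (joint[:n_s] / model["w"])
    texture = model["t_mean"] + model["Pt"] @ joint[n_s:]
    return shape, texture

# Toy demo on random stand-in data (20 frames, 10 shape numbers, 30 texture numbers).
rng = np.random.default_rng(0)
shapes, textures = rng.normal(size=(20, 10)), rng.normal(size=(20, 30))
model, c = build_appearance_model(shapes, textures, var_kept=1.0)
shape0, texture0 = reconstruct(model, c[0])
```

Keeping every mode (`var_kept=1.0`) reconstructs the training frames exactly, which makes the toy demo a sanity check; with real data a threshold below 1 trades fidelity for a more compact appearance vector.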

  7. Building A Talking Head - Initialisation • For Each Video Frame Extract: • Shape – Key Landmark Points (Tracker Helps) • Textures – Colour Pixel Values Normalised to Shape • Speech Features – Mel-Cepstral, Linear Predictive Coding (LPC)
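Slide 7 lists LPC among the per-frame speech features. One common way to compute LPC coefficients, shown here as a sketch rather than the authors' front end, is the autocorrelation method solved with the Levinson-Durbin recursion. The Hamming window, the model order and the function name `lpc` are assumptions; the Mel-cepstral features are not sketched.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a[1..order] by the autocorrelation method + Levinson-Durbin recursion.
    Convention: x[n] is predicted as -sum_{k=1..order} a[k] * x[n-k]."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))   # window choice is an assumption
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                                  # prediction error power at order 0
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                          # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]     # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err

# Toy check on random data: the recursion should match a direct solve of the normal equations.
rng = np.random.default_rng(1)
frame = rng.normal(size=400)
coeffs, residual = lpc(frame, order=6)
w = frame * np.hamming(len(frame))
r = np.array([np.dot(w[: len(w) - k], w[k:]) for k in range(7)])
R = np.array([[r[abs(i - j)] for j in range(6)] for i in range(6)])
direct = np.linalg.solve(R, -r[1:])
```

In a full front end the audio would be cut into short overlapping frames and `lpc` applied to each; the recursion solves the same Toeplitz normal equations as the direct solve, in O(p²) rather than O(p³) operations.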

  8. Building A Talking Head - Tracking • Semi-Automated • Hand-Place Landmarks on a Few Frames • Build Interim Shape Model • Track Other Frames • Build Final Shape Model
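The slide does not say how the interim shape model guides tracking. A common choice in AAM/ASM work, offered here only as an assumed illustration, is to project each tracker estimate into the PCA shape model built from the hand-placed frames and clamp every mode to ±3 standard deviations, so implausible shapes are pulled back towards the training set. `fit_shape_model`, `constrain` and the limit of 3 are hypothetical.

```python
import numpy as np

def fit_shape_model(hand_placed, n_modes):
    """Interim PCA shape model; each row is one frame's landmarks flattened to (x1, y1, x2, y2, ...)."""
    mean = hand_placed.mean(axis=0)
    _, s, vt = np.linalg.svd(hand_placed - mean, full_matrices=False)
    eigvals = s[:n_modes] ** 2 / (len(hand_placed) - 1)   # variance captured by each mode
    return mean, vt[:n_modes].T, eigvals

def constrain(shape, mean, P, eigvals, limit=3.0):
    """Project a tracked shape onto the model and clamp each mode to +/- limit standard deviations."""
    b = P.T @ (shape - mean)
    bound = limit * np.sqrt(eigvals)
    return mean + P @ np.clip(b, -bound, bound)

# Toy demo: 5 hand-placed frames of 4 landmarks (8 numbers each), random stand-in data.
rng = np.random.default_rng(2)
hand_placed = rng.normal(size=(5, 8))
mean, P, eigvals = fit_shape_model(hand_placed, n_modes=3)
implausible = mean + 100.0 * P[:, 0]        # far outside the hand-placed variation
plausible = constrain(implausible, mean, P, eigvals)
```

Once all frames are tracked this way, the final shape model would be rebuilt from the full set rather than the few hand-placed frames.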

  9. Building A Talking Head - Learning/Model Building • Active Appearance Model (AAM) -> Shape (PCA) and Texture (PCA) • Speech/Appearance Model (SAAM, NEW) -> Speech (PCA) and AAM • Nonlinear PCA: • Gaussian Mixture Model (GMM) • Model of Dynamics: • Hidden Markov Model (HMM)
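Slide 9 pairs a GMM with an HMM for the dynamics but does not give the decoding step. A minimal sketch, assuming each GMM cluster acts as one HMM state, transition probabilities are counted from the training cluster sequence, and each frame's emission score comes from the speech part of each cluster: Viterbi decoding then picks the most likely cluster sequence instead of the best cluster frame by frame. All names and the add-one smoothing are assumptions.

```python
import numpy as np

def estimate_transitions(labels, n_states, smoothing=1.0):
    """Log transition matrix from a training sequence of cluster labels (add-`smoothing` counts)."""
    counts = np.full((n_states, n_states), smoothing)
    for prev, nxt in zip(labels[:-1], labels[1:]):
        counts[prev, nxt] += 1
    return np.log(counts / counts.sum(axis=1, keepdims=True))

def viterbi(log_init, log_trans, log_emit):
    """Most likely state path. log_init: (S,), log_trans: (S, S) with [i, j] = log P(j | i),
    log_emit: (T, S) per-frame log-likelihood of the speech features under each state."""
    T, S = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans          # cand[i, j]: best path in state i, then i -> j
        back[t] = np.argmax(cand, axis=0)          # best predecessor for each state j
        score = cand[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):                  # follow back-pointers from the last frame
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy demo: two clusters with "sticky" dynamics and one weakly ambiguous frame.
log_init = np.log([0.5, 0.5])
log_trans = np.log([[0.9, 0.1], [0.1, 0.9]])
log_emit = np.log([[0.6, 0.4], [0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.6, 0.4]])
frame_by_frame = [int(i) for i in np.argmax(log_emit, axis=1)]   # [0, 0, 1, 0, 0]
smoothed = viterbi(log_init, log_trans, log_emit)                # [0, 0, 0, 0, 0]
```

The dynamics model suppresses the one-frame jump to cluster 1, which in a synthesised video would show up as a mouth-shape flicker.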

  10. Building A Talking Head - Synthesis + Reconstruction • Input Speech -> Extract Speech Features + Find Best Clusters • Bottom-up Reconstruction: Mouth Driven
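How appearance is produced from the chosen cluster is not spelled out on the slide. One standard option, swapped in here as an assumption, is to model each cluster as a joint Gaussian over [speech | appearance] parameters, pick the cluster whose speech marginal best explains the input frame, and output the conditional mean of the appearance part given the speech. The resulting vector would then be decoded into shape and texture by the appearance model. The names and the per-frame (rather than HMM-smoothed) cluster choice are illustrative.

```python
import numpy as np

def gaussian_loglik(x, mean, cov):
    """Log-density of a multivariate Gaussian at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(x) * np.log(2.0 * np.pi))

def speech_to_appearance(s, clusters, n_speech):
    """clusters: list of (mean, cov) over joint vectors [speech params | appearance params]."""
    # 1. Pick the cluster whose speech marginal best explains this frame's speech features.
    mean, cov = max(
        clusters,
        key=lambda mc: gaussian_loglik(s, mc[0][:n_speech], mc[1][:n_speech, :n_speech]),
    )
    # 2. Conditional mean of appearance given speech: mu_a + S_as S_ss^-1 (s - mu_s).
    mu_s, mu_a = mean[:n_speech], mean[n_speech:]
    S_ss = cov[:n_speech, :n_speech]
    S_as = cov[n_speech:, :n_speech]
    return mu_a + S_as @ np.linalg.solve(S_ss, s - mu_s)

# Toy demo: 1-D speech, 1-D appearance, two clusters.
clusters = [
    (np.array([0.0, 10.0]), np.array([[1.0, 0.5], [0.5, 1.0]])),
    (np.array([5.0, -10.0]), np.eye(2)),
]
a_near_first = speech_to_appearance(np.array([1.0]), clusters, n_speech=1)    # [10.5]
a_near_second = speech_to_appearance(np.array([4.5]), clusters, n_speech=1)   # [-10.0]
```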

  11. Talking Head Examples

  12. Talking Head Example: Independent Speaker

  13. Perceptual Analysis of Talking Heads Current Talking Head Analysis Methods • Subjective Evaluation • Analyse and Compare Trajectories • Improved Perception in Noisy environments • Forced Choice Testing

  14. Perceptual Analysis of Talking Heads: Subjective and Trajectory Evaluation • Subjective Evaluation • Does it “look good”? • No formative comparison • No feedback to improve model • Analyse and Compare Trajectories • Ground truth quantitative assessment • Comparison to “seen” data • No perceptual quality measurement

  15. Perceptual Analysis of Talking Heads: Noisy Environment Evaluation • Noisy Environment Evaluation • Perceptual Evaluation • Compare Performance of Synthetic vs. Real Talking Head in realistic situations • Good overall test of talking head • Lip-syncing, realism • No Quantitative Measure of Performance

  16. Perceptual Analysis of Talking Heads: Forced Choice Testing • Forced Choice Testing: • Users Asked if Video is Real or Synthetic • Only says if it looks realistic + lip sync is good • Big Prior Introduced • Users look for artefacts • Randomness Bias in User selection • Bored/Uninterested User • No Quantitative Feedback for Model Improvement • What makes it real/synthetic?

  17. Perceptual Analysis of Talking Heads: A New McGurk Test • McGurk Test for Perceptual Analysis • Subject doesn’t develop a prior • Helps address strengths and weaknesses • Suggests improvements based on these • Complements other tests

  18. Perceptual Analysis of Talking Heads: The McGurk Effect • McGurk and MacDonald (1976): • Auditory Syllable Dubbed onto Videotape of a Different Syllable Gives Perception of an Entirely Different Syllable, e.g.: • Audio ‘Ba’ • Visual ‘Ga’ • Perception ‘Da’ • “Close Eyes – Illusion Vanishes” • Raises Psychological Audio-Visual questions: • How are auditory and visual stimuli combined? • Why combine when audio is enough?

  19. Perceptual Analysis of Talking Heads: Some More McGurk Effect Examples

  20. Perceptual Analysis of Talking Heads: McGurk Effect Examples (REAL)

  21. Perceptual Analysis of Talking Heads: McGurk Effect Examples (ANSWERS) • Tuple: Mat/Dead/Gnat • Tuple: Bent/Vest/Vent

  22. Perceptual Analysis of Talking Heads: McGurk Effect Examples (Synthetic)

  23. Perceptual Analysis of Talking Heads: McGurk Effect Examples (ANSWERS) • Synthetic Examples • Tuple: Fame/Face/Feign • Tuple: Mat/Dead/Gnat

  24. Perceptual Analysis of Talking Heads: Our McGurk Test • McGurk Perceptual Evaluation Test: • Mix Real and Synthetic tuples • What word do you perceive? • Users asked to note any differences • NO PRIORS as to real/synthetic forced choice • Users only asked about what they hear/perceive • Best Viewing resolution • Tested different resolutions (72x57, 360x288, 720x576 pixels)

  25. Perceptual Analysis of Talking Heads: Our McGurk Experimental Procedure • Mix of Real and Synthetic McGurk Examples • Real examples are a control • Users presented with a series of 60 (30 real, 30 synthetic) random examples • Users asked only to focus on the mouth area • Two initial example “training” sequences (not in trial) • Soundproofed booths with adjustable volume and artificial lighting • Replay option for all examples • Users simply record the word they perceive • Users asked three questions after viewing all clips • “Did you notice anything about the videos that you can comment on?” • “Could you tell that some of the videos were computer generated?” • “Did you use the replay button at all?” • 20 psychology undergraduate test subjects (4 male/16 female) with normal hearing/vision
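The presentation order can be sketched as follows, assuming each participant gets an independent shuffle (the slide says only "random examples"), that the two training clips always come first and are excluded from scoring, and that clips are identified by simple strings. `build_schedule` and the clip names are hypothetical.

```python
import random

def build_schedule(real_clips, synthetic_clips, training_clips, seed=None):
    """Unscored training clips first, then every real and synthetic clip in a random order."""
    rng = random.Random(seed)                  # per-participant seed (assumption)
    trials = [(clip, "real") for clip in real_clips]
    trials += [(clip, "synthetic") for clip in synthetic_clips]
    rng.shuffle(trials)                        # real clips act as controls, interleaved at random
    practice = [(clip, "training") for clip in training_clips]
    return practice + trials

# Hypothetical clip identifiers: 30 real, 30 synthetic, 2 training sequences.
real = [f"real_{i:02d}" for i in range(30)]
synthetic = [f"synth_{i:02d}" for i in range(30)]
schedule = build_schedule(real, synthetic, ["train_1", "train_2"], seed=7)
```

Interleaving real and synthetic clips in one shuffled list, rather than presenting them in blocks, avoids giving participants a cue that two kinds of video exist.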

  26. Perceptual Analysis of Talking Heads: How is Our McGurk Test a Test? • How is this a test? • Correct Lip Synch = McGurk Effect • Incorrect Lip Synch = Audio/Other • Audio should then be dominant • Questions Assess Behaviour/Output • After the test procedure, participants were asked whether they noticed anything unnatural

  27. Perceptual Analysis of Talking Heads: Results • Four Types of Analysis of Results: • Standard McGurk Response • Based on each tuple’s accepted audio response and accepted McGurk response • Original McGurk observation • Enhanced McGurk Response • Assemble a list of all participants’ McGurk responses • Allows for greater variability in accents/articulation • Allows for greater analysis and improvement of Head Models • Effects of Resolution on McGurk Effect • End of Test Questions Analysis • General overall response, qualitative analysis
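The difference between the standard and enhanced scoring can be sketched as below. Assumptions: each tuple is read in the order audio/visual/McGurk, as in the Ba/Ga/Da example; responses are matched case-insensitively after trimming; and the demo responses are invented for illustration, not data from the study. `classify` and `tally` are hypothetical names.

```python
from collections import Counter

def classify(response, audio_word, mcgurk_words):
    """Label one written response as 'audio', 'mcgurk' or 'other'."""
    r = response.strip().lower()
    if r == audio_word.lower():
        return "audio"                     # audio dominated: no McGurk effect
    if r in {w.lower() for w in mcgurk_words}:
        return "mcgurk"                    # fused percept: McGurk effect
    return "other"

def tally(responses, audio_word, mcgurk_words):
    """Count audio / McGurk / other responses for one tuple."""
    return Counter(classify(r, audio_word, mcgurk_words) for r in responses)

# Tuple Mat/Dead/Gnat read as audio/visual/McGurk; responses invented for illustration.
responses = ["gnat", "mat", "nat", "Gnat ", "bat"]
standard = tally(responses, "mat", ["gnat"])            # only the accepted McGurk word
enhanced = tally(responses, "mat", ["gnat", "nat"])     # pooled list of McGurk variants
```

Under the enhanced scoring, a spelling such as "nat" for the perceived "gnat" counts as a McGurk response instead of "other", which is how the pooled list absorbs variability in accents and articulation.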

  28. Perceptual Analysis of Talking Heads: Standard McGurk Response

  29. Perceptual Analysis of Talking Heads: Enhanced McGurk Response

  30. Perceptual Analysis of Talking Heads: Image Resolution

  31. Perceptual Analysis of Talking Heads: End of Test Questions Results • “Notice anything to comment on?” Some audio didn’t match the video • “Could you tell some were synthetic?” No; 1 participant suggested some seemed unnatural • “Did you use replay?” A few used it once; one used it twice

  32. Perceptual Analysis of Talking Heads: Overall Results Analysis • Realistic behaviour • Most users were unaware of synthetic output • More McGurk effects in real output • Points to some weaknesses in the model • Good Synthesis of /F/, /D/, /S/, /A/ and /E/ • Poor Synthesis of /V/ • Some weak real and synthetic McGurk responses • Beige-Gaze-Deige -> twice as many audio responses as McGurk responses • Mock-Dock-Knock -> 50:50 Audio:McGurk • Resolution has an effect on real output only • Due to overall lower synthetic McGurk response

  33. Conclusions • Suggested a perceptual approach to analysis and development of a Talking Head • Unbiased by prior forced choice making • Insight into performance of algorithms • Complements other tests

  34. Perceptual Analysis of Talking Heads: Future Work • Talking Head • Full Emotion • Performance Driven Animation • 3D Modelling • Full 3D appearance modelling • Other perceptual tests • Longer videos – McGurk sentences • Real/Synthesised correct lip synch: McGurk = bad synch? • Emotion – A McGurk emotion test?

  35. Web Links • Paper Downloads www.cs.cf.ac.uk/user/D.P.Cosker/publications.html www.cs.cf.ac.uk/Dave/Publications.html • McGurk Video Clips and McGurk Test Software (Macromedia Director) www.cs.cf.ac.uk/user/D.P.Cosker/McGurk/
