1. Face Animation Overview with Shameless Bias Toward MPEG-4 Face Animation Tools
2. Computer-generated Face Animation Methods Morph targets/key frames (traditional)
Speech articulation model (TTS)
Facial Action Coding System (FACS)
Physics-based (skin and muscle models)
Marker-based (dots glued to face)
Video-based (surface features)
3. Morph targets/key frames Advantages
Complete manual control of each frame
Good for exaggerated expressions
Disadvantages
Hard to achieve good lipsync without manual tweaking
Morph targets must be downloaded to terminal for streaming animation (delay)
4. Speech articulation model Advantages
High level control of face
Enables TTS
Disadvantages
Robotic character
Hard to sync with real voice
5. Facial Action Coding System Advantages
Very high level control of face
Maps to morph targets
Explicit specification of emotional states
Disadvantages
Not good for speech
Not quantified
6. Physics-based Advantages
Good for realistic skin, muscle and fat
Collision detection
Disadvantages
High complexity
Must be driven by high level articulation parameters (TTS)
Hard to drive with motion capture data
7. Marker-based Advantages
Can provide accurate motion data from most of the face
Face models can be animated directly from surface feature point motion
Disadvantages
Dots glued to face
Dots must be manually registered
Not good for accurate inner lip contour or eyelid tracking
8. Video-based Advantages
Simple to capture video of face
Face models can be animated directly from surface feature motion
Disadvantages
Must have good view of face
9. What is MPEG-4 Multimedia? Natural audio and video objects
2D and 3D graphics (based on VRML)
Animation (virtual humans)
Synthetic speech and audio
10. Samples versus Objects Traditional video coding is sample based (blocks of pixels are compressed)
MPEG-4 provides visual object representation for better compression and new functionalities
Objects are rendered in the terminal after decoding object descriptors
11. Object-based Functionalities User can choose display of content layers
Individual objects (text, models) can be searched or stored for later use
Content is independent of display resolution
Content can be easily repurposed by provider for different networks and users
12. MPEG-4 Object Composition Objects are organized in a scene graph
Scene graphs are specified using a binary format called BIFS (based on VRML)
Both 2D and 3D objects, properties and transforms are specified in BIFS
BIFS allows objects to be transmitted once and instanced repeatedly in the scene after transformations
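The "transmit once, instance repeatedly" idea can be sketched as below. This is not BIFS syntax or any MPEG-4 API: the class names Shape and Transform are hypothetical, and only a 2D translation is modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Shape:
    """Geometry sent once (hypothetical stand-in for a decoded object)."""
    name: str
    points: List[Tuple[float, float]]


@dataclass
class Transform:
    """Scene graph node that places a shared Shape with a 2D translation."""
    shape: Shape
    offset: Tuple[float, float]

    def world_points(self) -> List[Tuple[float, float]]:
        dx, dy = self.offset
        return [(x + dx, y + dy) for x, y in self.shape.points]


# One shape definition, referenced by three nodes in the scene.
eye = Shape("eye", [(0.0, 0.0), (1.0, 0.0)])
scene = [
    Transform(eye, (0.0, 0.0)),
    Transform(eye, (3.0, 0.0)),
    Transform(eye, (0.0, 2.0)),
]

for node in scene:
    print(node.shape.name, node.world_points())
```

Each node stores only a reference and a transform, so the geometry itself is held a single time no matter how often it appears.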
13. MPEG-4 Operation Sequence
15. Faces are Special Humans are hard-wired to respond to faces
The face is the primary communication interface
Human faces can be automatically analyzed and parameterized for a wide variety of applications
16. MPEG-4 Face and Body Animation Coding Face animation is in MPEG-4 version 1
Body animation is in MPEG-4 version 2
Face animation parameters displace feature points from neutral position
Body animation parameters are joint angles
Face and body animation parameter sequences are compressed to low bitrates
17. Neutral Face Definition
Head axes parallel to the world axes
Gaze is in direction of Z axis
Eyelids tangent to the iris
Pupil diameter is one third of iris diameter
Mouth is closed and the upper and lower teeth are touching
Tongue is flat, horizontal with the tip of tongue touching the boundary between upper and lower teeth
18. Face Feature Points
19. Face Animation Parameter Normalization Face Animation Parameters (FAPs) are normalized to facial dimensions
Each FAP is measured as a fraction of neutral face mouth width, mouth-nose distance, eye separation, or iris diameter
3 head and 2 eyeball rotation FAPs are Euler angles
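A minimal sketch of how normalization makes one FAP value portable across face models. It assumes the commonly described convention that a FAP unit is a neutral-face distance divided by 1024; that divisor, the NeutralFace class and the function name are assumptions here and should be checked against the standard.

```python
from dataclasses import dataclass

# Assumed convention: one FAP unit = neutral-face distance / 1024.
FAPU_DIVISOR = 1024.0


@dataclass
class NeutralFace:
    """Distances measured on the neutral face, in the model's own units."""
    mouth_width: float
    mouth_nose: float
    eye_separation: float
    iris_diameter: float


def fap_to_displacement(fap_value: int, neutral_distance: float) -> float:
    """Convert a normalized FAP value to a displacement in model units."""
    return fap_value * neutral_distance / FAPU_DIVISOR


small = NeutralFace(mouth_width=50.0, mouth_nose=20.0,
                    eye_separation=60.0, iris_diameter=10.0)
large = NeutralFace(mouth_width=100.0, mouth_nose=40.0,
                    eye_separation=120.0, iris_diameter=20.0)

fap = 256  # a quarter of the mouth width under the assumed convention
print(fap_to_displacement(fap, small.mouth_width))  # 12.5
print(fap_to_displacement(fap, large.mouth_width))  # 25.0
```

The same transmitted value produces a proportionally equal motion on both faces, which is the point of normalizing to facial dimensions.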
20. Neutral Face Dimensions for FAP Normalization
21. FAP Groups
22. Lip FAPs Mouth closed if sum of upper and lower lip FAPs = 0
23. Face Model Independence FAPs are always normalized for model independence
FAPs (and BAPs) can be used without MPEG-4 systems/BIFS
Private face models can be accurately animated with FAPs
Face models can be simple or complex depending on terminal resources
24. MPEG-4 BIFS Face Node Face node contains FAP node, Face scene graph, Face Definition Parameters (FDP), FIT, and FAT
FIT (Face Interpolation Table) specifies interpolation of FAPs in terminal
FAT (Face Animation Table) maps FAPs to Face model deformation
FDP information includes face feature point positions and texture map
25. Face Model Download 3D graphical models (e.g. faces) can be downloaded to the terminal with MPEG-4
3D model specification is based on VRML
Face Animation Table (FAT) maps FAPs to face model vertex displacements
Appearance and animation of downloaded face models are exactly predictable
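The FAP-to-vertex mapping can be illustrated with a simplified table. This sketch assumes a purely linear mapping (one displacement vector per vertex per unit of FAP); the table layout, the function apply_fat and the FAP id used are hypothetical and do not reproduce the FAT format in the standard.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical table layout: FAP id -> list of (vertex index, displacement per unit FAP).
FatTable = Dict[int, List[Tuple[int, Vec3]]]


def apply_fat(vertices: List[Vec3], fat: FatTable, faps: Dict[int, float]) -> List[Vec3]:
    """Displace neutral-face vertices according to the active FAP values."""
    out = [list(v) for v in vertices]
    for fap_id, value in faps.items():
        for vertex_index, direction in fat.get(fap_id, []):
            for axis in range(3):
                out[vertex_index][axis] += value * direction[axis]
    return [(v[0], v[1], v[2]) for v in out]


neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
fat = {3: [(0, (0.0, -0.01, 0.0)), (1, (0.0, -0.005, 0.0))]}  # made-up entry

deformed = apply_fat(neutral, fat, {3: 100.0})
print(deformed)
```

Because the table travels with the model, any terminal applying the same FAP stream produces the same deformation.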
26. FAP Compression FAPs are adaptively quantized to desired quality level
Quantized FAPs are differentially coded
Adaptive arithmetic coding further reduces bitrate
Typical compressed FAP bitrate is less than 2 kilobits/second
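The first two stages above can be sketched as follows. This is an illustration only: the step size stands in for the quality level, the sample values are made up, and the arithmetic coding stage is omitted. The function names are not from any MPEG-4 library.

```python
from typing import List


def quantize(values: List[float], step: float) -> List[int]:
    """Map each FAP value to the nearest multiple of the step size."""
    return [round(v / step) for v in values]


def differential_encode(symbols: List[int]) -> List[int]:
    """Replace each symbol with its difference from the previous one."""
    residuals, previous = [], 0
    for s in symbols:
        residuals.append(s - previous)
        previous = s
    return residuals


def differential_decode(residuals: List[int]) -> List[int]:
    """Rebuild the symbols by accumulating the differences."""
    symbols, previous = [], 0
    for r in residuals:
        previous += r
        symbols.append(previous)
    return symbols


def dequantize(symbols: List[int], step: float) -> List[float]:
    return [s * step for s in symbols]


track = [0.0, 4.2, 8.9, 14.8, 18.1, 19.3]  # one FAP over six frames (made-up)
step = 2.0  # a larger step means lower quality and fewer bits

symbols = quantize(track, step)
residuals = differential_encode(symbols)
decoded = dequantize(differential_decode(residuals), step)

print(symbols)    # [0, 2, 4, 7, 9, 10]
print(residuals)  # [0, 2, 2, 3, 2, 1]
print(decoded)    # [0.0, 4.0, 8.0, 14.0, 18.0, 20.0]
```

The residuals cluster around small values because facial motion changes slowly from frame to frame, and that skew is what an entropy coder exploits.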
27. FAP Predictive Coding
28. Face Analysis System MPEG-4 does not specify analysis systems
face2face face analysis system tracks nostrils for robust operation
Inner lip contour estimated using adaptive color thresholding and lip modeling
Eyelids, eyebrows and gaze direction
29. Nostril Tracking
30. Inner Lip Contour Estimation
31. FAP Estimation Algorithm Head scale is normalized based on neutral mouth (closed mouth) width
Head pitch is approximated based on vertical nostril deviation from neutral head position
Head roll is computed from smoothed eye or nostril orientation depending on availability
Inner lip FAPs are measured directly from the inner lip contour as deviations from the neutral lip position (closed mouth)
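A rough sketch of the scale and roll steps, assuming 2D feature points in image pixels. The function names, the sign convention (image y grows downward) and the pitch proxy are assumptions. The slides do not say how the vertical nostril deviation is converted to a pitch angle, so only the normalized deviation is computed, and the smoothing step is left out.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def head_scale(neutral_mouth_width_px: float) -> float:
    """Factor that converts pixel distances to neutral-mouth-width units."""
    return 1.0 / neutral_mouth_width_px


def head_roll(left: Point, right: Point) -> float:
    """Roll in radians from the line joining two eye or nostril points."""
    return math.atan2(right[1] - left[1], right[0] - left[0])


def pitch_proxy(nostril_y: float, neutral_nostril_y: float, scale: float) -> float:
    """Vertical nostril deviation from neutral, in mouth-width units."""
    return (nostril_y - neutral_nostril_y) * scale


scale = head_scale(40.0)  # neutral (closed) mouth is 40 pixels wide
roll = head_roll((100.0, 200.0), (140.0, 240.0))
print(math.degrees(roll))
print(pitch_proxy(212.0, 208.0, scale))
```

Normalizing by mouth width first keeps the later measurements independent of how close the subject sits to the camera.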
32. FAP Sequence Smoothing
33. MPEG-4 Visemes and Expressions A weighted combination of 2 visemes and 2 facial expressions for each frame
Decoder is free to interpret effect of visemes and expressions after FAPs are applied
Definitions of visemes and expressions using FAPs can also be downloaded
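One possible interpretation of the weighted combination, shown for a single pair of targets. Each viseme or expression is assumed to be defined as a set of FAP values, as in the downloadable definitions mentioned above. The blend function, the weight range of 0 to 1 and the FAP ids are hypothetical; a decoder is free to do something else.

```python
from typing import Dict

FapTarget = Dict[int, float]  # FAP id -> value for one viseme or expression


def blend(target_a: FapTarget, target_b: FapTarget, weight_a: float) -> FapTarget:
    """Mix two targets: weight_a goes to target_a, the remainder to target_b."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    fap_ids = sorted(set(target_a) | set(target_b))
    return {
        i: weight_a * target_a.get(i, 0.0) + (1.0 - weight_a) * target_b.get(i, 0.0)
        for i in fap_ids
    }


viseme_closed = {1: 0.0, 2: 100.0}  # made-up target values
viseme_open = {1: 200.0, 2: 0.0}

frame = blend(viseme_closed, viseme_open, 0.25)
print(frame)  # {1: 150.0, 2: 25.0}
```

The same function can mix the two facial expressions of a frame, after which the results are combined with the low-level FAPs.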
34. Visemes
35. Facial Expressions
36. Free Face Model Software Wireface is an OpenGL-based, MPEG-4 compliant face model
Good starting point for building high quality face models for web applications
Reads FAP file and raw audio file
Renders face and audio in real time
Wireface source is freely available
37. Body Animation Harmonized with VRML H-Anim spec
Body Animation Parameters (BAPs) are humanoid skeleton joint Euler angles
Body Animation Table (BAT) can be downloaded to map BAPs to skin deformation
BAPs can be highly compressed for streaming
38. Body Animation Parameters (BAPs) 186 humanoid skeleton Euler angles
110 free parameters for use with downloaded body surface mesh
Coded using same codecs as FAPs
Typical bitrate for coded BAPs is 5-10 kbps
39. Body Definition Parameters (BDPs) Humanoid joint center positions
Names and hierarchy harmonized with VRML/Web3D H-Anim working group
Default positions in standard for broadcast applications
Download just BDPs to accurately animate unknown body model
40. Faces Enhance the User Experience Virtual call center agents
News readers (e.g. Ananova)
Story tellers for the child in all of us
eLearning
Program guide
Multilingual (same face different voice)
Entertainment animation
Multiplayer games
41. Visual Content for the Practical Internet Broadband deployment is happening slowly
DSL availability is limited and cable is shared
Talking heads need a high frame rate
Consumer graphics hardware is cheap and powerful
MPEG-4 SNHC/FBA tools are matched to available bandwidth and terminals
42. Visual Speech Processing FAPs can be used to improve speech recognition accuracy
Text-to-speech systems can use FAPs to animate face models
FAPs can be used in computer-human dialogue systems to communicate emotions, intentions and speech especially in noisy environments
43. Video-driven Face Animation Facial expressions, lip movements and head motion transferred to face model
FAPs extracted from talking head video with special computer vision system
No face markers or lipstick are required
Normal lighting is used
Communicates lip movements and facial expressions with visual anonymity
44. Automatic Face Animation Demonstration FAPs extracted from camcorder video
FAPs compressed to less than 2 kbits/sec
30 frames/sec animation generated automatically
Face models animated with bones rig or fixed deformable mesh (real-time)
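A quick back-of-the-envelope check of what these figures imply per frame. It takes 2 kbits/sec as the upper bound and assumes that all 68 face parameters mentioned elsewhere in this deck are active, which in practice they often are not.

```python
bitrate_bps = 2000  # upper bound quoted for compressed FAPs
frame_rate = 30     # frames per second
num_faps = 68       # face parameter count quoted elsewhere in this deck

bits_per_frame = bitrate_bps / frame_rate
bits_per_fap = bits_per_frame / num_faps

print(round(bits_per_frame, 1))  # 66.7
print(round(bits_per_fap, 2))    # 0.98
```

Under these assumptions the budget is under one bit per parameter per frame.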
46. What is easy, solved, or almost solved
Can we do photorealistic non-animated face models? YES
Can we do near-real-time lip sync'ing that is indistinguishable from a human? NO
47. What is really hard Synthesizing human speech and facial expressions
Hair
48. What we have assumed someone else is solving Graphics acceleration
Video camera cost and resolution
Multimedia communication infrastructure
49. Where we need help We have a face with 68 parameters but we need the psychologists to tell us how to drive it autonomously
We need to embody our agents into graphical models that have a couple of thousand parameters to control gaze, gesture, body language, and do collision detection -> NEED MORE SPEED
50. Core functionality of the face Speech
Lips, teeth, tongue
Emotional expressions
Gaze, eyebrow, eyelids, head pose
Non-verbal communication
Sensory responsivity
Technical requirements
Framerate
Synchronization
Latency
Bitrate
Spatial resolution
Complexity
Common framework with body
Interaction
Different faces should respond similarly to common commands
Accessible to everyone
51. Interaction with other components Language and discourse
Phoneme to viseme mapping
Given/new
Action in the environment
Global information
Emotional state
Personality
Culture
World knowledge
Central time-base and timestamps
52. Open questions Central vs peripheral functionality
Degree of interface commonality
Degree of agent autonomy
What should the virtual human (VH) be capable of?