Create Photo-Realistic Talking Face Changbo Hu 2001.11.26 *This work was done while visiting Microsoft Research China, with Baining Guo and Bo Zhang
Outline • Introduction of talking face • Motivations • System overview • Techniques • Conclusions
Introduction • What is a talking face? • Face (lip) animation driven by voice • Applications • The process of building a talking face • Face model • Motion capture • Mapping between audio and video • Rendering: photo-realistic?
Literature • Waters, 93, DECface, 2D wireframe model • Terzopoulos, 95, skin and muscle model • Bregler, 97, Video Rewrite, sample-image based • T. S. Huang, 98, mesh model from range data • Poggio, 98, MikeTalk, viseme morphing • Guenter, 99, Making Faces, 3D from multiple cameras • Zhengyou Zhang, 00, 3D face modeling from video through epipolar constraints • Cosatto, 00, planar-quads model
Motivations • Aim: a graphical interface for a conversational agent • Photo-realistic • Driven by Chinese speech • Smooth connection between sentences • Extended from “Video Rewrite”
System overview: pipeline of the system (2)
New text → TTS system → wav sound → segmentation → triphone sequence → match against the training database → synthesized triphone sequence → lip motion sequence → rewritten onto the background sequence → output faces
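A minimal sketch of how these stages could be chained together; every function named below (tts, segment, to_triphones, select_lip_videos, pick_background, rewrite_to_faces) is a hypothetical placeholder for a pipeline stage, not the actual system's API.

```python
# Hypothetical pipeline skeleton; each call stands in for one stage of the
# diagram above and is not part of the real system's code.
def synthesize(new_text, train_db):
    wav = tts(new_text)                               # TTS system: text -> wav sound
    phonemes = segment(wav)                           # HMM segmentation -> phoneme sequence
    triphones = to_triphones(phonemes)                # group into triphones
    lip_seq = select_lip_videos(triphones, train_db)  # matching triphone videos from the database
    background = pick_background(train_db)            # background face sequence
    return rewrite_to_faces(lip_seq, background)      # paste the lips onto the background faces
```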
Techniques • Analysis: • Audio process • Image process • Synthesis • Lip image • Background image • Stitch together
Audio part: Sound Segmentation • Given the wav file and its script • An HMM is trained for the segmentation system • Segment the wav file into a phoneme sequence • Example segmentation result (phoneme, start frame, end frame): SILOPEN 0 23, SILOPEN 24 42, s 43 61, if4 62 74, j 75 80, ia1 81 97, sh 98 109, ang1 110 121, y 122 130, e4 131 133, y 134 145, in2 146 154, h 155 164, ang2 165 194
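A minimal sketch of reading the aligner output shown above, assuming it is a flat list of (phoneme, start frame, end frame) triples; the exact file format used by the real system is not specified on the slide.

```python
# Parse whitespace-separated (phoneme, start, end) triples into tuples.
def parse_segmentation(text):
    tokens = text.split()
    assert len(tokens) % 3 == 0, "expected (phoneme, start, end) triples"
    segments = []
    for i in range(0, len(tokens), 3):
        phoneme, start, end = tokens[i], int(tokens[i + 1]), int(tokens[i + 2])
        segments.append((phoneme, start, end))
    return segments

print(parse_segmentation("SILOPEN 0 23 SILOPEN 24 42 s 43 61 if4 62 74"))
# [('SILOPEN', 0, 23), ('SILOPEN', 24, 42), ('s', 43, 61), ('if4', 62, 74)]
```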
Annotation with Phonemes • Phonemes are used to annotate the video frames • Each phoneme in a sentence corresponds to a short segment of the video sequence
Phoneme Distance Analysis • Phoneme & triphone basics • Chinese phonemes vs. English phonemes • Distance metric definition • Results
Phoneme Basics • Phonemes are the basic elements of speech; any utterance can be represented as a combination of phonemes. CH, JH, S, EH, EY, OY, AE, SIL… • A triphone is a sequence of three consecutive phonemes; it captures not only pronunciation characteristics but also context information. T-IY-P, IY-P-AA, P-AA-T…
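A small sketch of how triphones can be obtained from a phoneme sequence with a sliding window of three consecutive phonemes.

```python
# Slide a window of length 3 over the phoneme sequence.
def to_triphones(phonemes):
    return [tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]

print(to_triphones(["T", "IY", "P", "AA", "T"]))
# [('T', 'IY', 'P'), ('IY', 'P', 'AA'), ('P', 'AA', 'T')]
```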
Chinese Phonemes vs. English • Chinese phonemes fall into two basic groups, initials and finals. Initials: B, P, M, F, … Finals: a3, o1, e2, eng3, iang4, ue5, … • Each Chinese final carries one of five tones (1–5): a1, a2, a3, a4, a5. • Chinese finals are not truly atomic elements of speech, e.g., iang1, iao1, uang1, iong1… • The Chinese phoneme set is therefore much larger than the English one.
Phoneme Distance Analysis • Define the distance between any two phonemes. • Since we synthesize only video, not sound, tone is ignored. • Lip-shape motion is the core element of the distance metric.
Phoneme Distance Analysis (diagram): for each phoneme, its example videos (e.g., Video 1–4 for phoneme 1, Video 1–2 for phoneme 2) are time-aligned to a uniform length and averaged into a single average video. By comparing the aligned average videos of every pair of phonemes, we generate the distance matrix for the whole phoneme set.
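A minimal sketch of this averaging-and-comparison step, assuming each example video has already been reduced to a per-frame lip-shape feature vector; linear resampling stands in for the time alignment and a Euclidean norm for the comparison, neither of which is specified on the slide.

```python
import numpy as np

def resample(video, length):
    # Linearly resample a (frames, features) lip-shape sequence to a uniform length.
    idx = np.linspace(0, len(video) - 1, length)
    lo, hi = np.floor(idx).astype(int), np.ceil(idx).astype(int)
    w = idx - lo
    return (1 - w)[:, None] * video[lo] + w[:, None] * video[hi]

def average_video(videos, length=20):
    # Align every example to the same length, then average frame by frame.
    return np.mean([resample(v, length) for v in videos], axis=0)

def distance_matrix(examples, length=20):
    # examples: dict mapping phoneme -> list of (frames, features) arrays.
    phonemes = sorted(examples)
    avgs = {p: average_video(examples[p], length) for p in phonemes}
    D = np.zeros((len(phonemes), len(phonemes)))
    for i, p in enumerate(phonemes):
        for j, q in enumerate(phonemes):
            D[i, j] = np.linalg.norm(avgs[p] - avgs[q])
    return phonemes, D
```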
Image part: Pose Tracking • Assume a planar model for the face • A standard minimization method finds the transform matrix (affine transform) [Black, 95] • A mask constrains estimation to the region of interest on the face. Figures: template picture, mask image
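Black's method is a robust gradient-based estimator; as a stand-in, here is a minimal sketch that fits the six affine parameters by directly minimizing the masked SSD between the warped frame and the template with a generic optimizer. In practice a coarse-to-fine pyramid and a robust error norm would be needed.

```python
import numpy as np
from scipy.ndimage import affine_transform
from scipy.optimize import minimize

def masked_ssd(params, template, image, mask):
    # params = [a11, a12, a21, a22, tx, ty]: warp the current frame toward the
    # template and measure the squared difference inside the face mask.
    A = np.array([[params[0], params[1]], [params[2], params[3]]])
    offset = np.array([params[4], params[5]])
    warped = affine_transform(image, A, offset=offset, order=1)
    diff = (warped - template) * mask
    return np.sum(diff ** 2)

def track_pose(template, image, mask):
    x0 = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])  # start from the identity transform
    res = minimize(masked_ssd, x0, args=(template, image, mask), method="Powell")
    return res.x
```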
Pose tracking • Motion prediction using parameters with physical meaning
Pose Tracking Some tracking results:
Lip Motion Tracking • Using eigen points (Covell, 91) • Feature points include the jaw, lips, and teeth • The training database is labeled manually • Automatic tracking through all pose-tracked images
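A minimal sketch of the eigen-points idea (not Covell's exact formulation): learn a joint linear subspace over concatenated lip-image pixels and landmark coordinates from the hand-labeled frames, then recover the landmarks of a new frame from its pixels alone by least squares.

```python
import numpy as np

class EigenPoints:
    def __init__(self, n_components=10):
        self.k = n_components

    def fit(self, patches, landmarks):
        # patches: (N, P) flattened lip images; landmarks: (N, L) flattened (x, y) points.
        X = np.hstack([patches, landmarks])
        self.mean = X.mean(axis=0)
        self.P = patches.shape[1]
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.basis = Vt[: self.k]                    # (k, P + L) joint components
        return self

    def predict(self, patch):
        img_basis = self.basis[:, : self.P]          # image part of each component
        pt_basis = self.basis[:, self.P:]            # landmark part of each component
        # Fit subspace coefficients from the pixels, then read off the landmarks.
        coeff, *_ = np.linalg.lstsq(img_basis.T, patch - self.mean[: self.P], rcond=None)
        return self.mean[self.P:] + coeff @ pt_basis
```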
Lip Motion Tracking Figures: training database (hand-labeled), automatic tracking results
Synthesizing new sentences • New text is converted by the TTS system to a wav file • The wav file is segmented into a phoneme sequence • Dynamic programming finds an optimal video sequence from the training database (see the sketch after the edge-cost slide) • Time-align the triphone videos and stitch them together • Transform the lip sequence and paste it onto the background faces
Lip sequence synthesis (diagram): for each position in the new phoneme sequence, several candidate triphone videos from the database (Triphone 1–9, A–C) are considered, and an optimal triphone sequence is selected.
Dynamic Programming (diagram): a graph from Begin to End whose nodes are candidate triphones (Triphone 1–5); the cheapest path gives the selected sequence.
Edge Cost Definition • Two parts: • Phoneme distance: the three phonemes’ distances added together • Lip-shape distance over the overlapping portion of the triphone videos • The two parts are combined as a weighted sum
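A minimal sketch of the dynamic-programming selection using this cost; node_cost and edge_cost are hypothetical callbacks standing in for the summed phoneme distances and the lip-shape distance over the overlap, respectively.

```python
import numpy as np

def select_triphones(candidates, node_cost, edge_cost):
    # candidates[t]: list of database triphone videos that could play position t.
    # Standard Viterbi-style dynamic programming over candidate indices.
    T = len(candidates)
    best = [np.array([node_cost(0, c) for c in candidates[0]])]
    back = []
    for t in range(1, T):
        prev = best[-1]
        cur, ptr = [], []
        for c in candidates[t]:
            total = prev + np.array([edge_cost(p, c) for p in candidates[t - 1]])
            j = int(np.argmin(total))            # cheapest predecessor for this candidate
            cur.append(total[j] + node_cost(t, c))
            ptr.append(j)
        best.append(np.array(cur))
        back.append(ptr)
    # Trace back the cheapest path from the last position.
    path = [int(np.argmin(best[-1]))]
    for t in range(T - 2, -1, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))                  # one candidate index per position
```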
Background video generation • The background is a video sequence in which the virtual character speaks something else • Similarity measurement over the background frames • Select a “standard frame”: the frame similar to the largest number of other frames • Filter out frames that introduce jerkiness
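A minimal sketch of the “standard frame” selection, assuming a plain pixel distance as the similarity measure (the slide does not say which measure was used).

```python
import numpy as np

def pick_standard_frame(frames, threshold):
    # frames: (N, H, W) grayscale background frames.  For every frame, count how
    # many other frames lie within `threshold`, and return the most popular one.
    N = len(frames)
    flat = frames.reshape(N, -1).astype(np.float64)
    counts = np.zeros(N, dtype=int)
    for i in range(N):
        d = np.linalg.norm(flat - flat[i], axis=1)
        counts[i] = int(np.sum(d < threshold)) - 1   # exclude the frame itself
    return int(np.argmax(counts))
```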
Stitch the time-aligned result onto background faces • Transform the synthesized lips into the pose of the background face • Write them back with a mask
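A minimal sketch of the masked write-back, assuming the synthesized lip image has already been transformed into the pose of the background frame.

```python
import numpy as np

def write_back(background, lip, mask):
    # background, lip: (H, W, 3) images; mask: (H, W) values in [0, 1] covering
    # the mouth region.  The synthesized lip replaces the background only where
    # the mask is on; elsewhere the original background frame shows through.
    m = mask[..., None]
    return (m * lip + (1.0 - m) * background).astype(background.dtype)
```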
Figures: mask image for the write-back operation, original background frame, write-back result of the same frame
Conclusion and Future Work • Pose tracking and lip motion tracking • Size of the training database • Talking faces with expression • Real-time generation? • Fast modeling of different persons