Coding of Videophone Sequences using an Anatomical Model of a Human Person

Markus Kampmann

Overview • Introduction • Adaptation of a 3D face model • Generation of a 3D wireframe of head and shoulders • Estimation of 3D motion of head and shoulders • Analysis and synthesis of facial expressions • Parameter coding • Experimental results • Summary

Introduction • Videophone sequence: 12 Mbit/s (CIF, 10 Hz frame rate, PCM) • Transmission channel: 8 to 128 kbit/s (ISDN, mobile) => Video coding necessary
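The 12 Mbit/s figure can be checked with a short calculation. This is a sketch that assumes CIF means 352 x 288 luminance samples with 4:2:0 chrominance subsampling and 8 bits per sample; the slides do not state the sampling format, so those values are assumptions.

```python
# Assumptions (not stated on the slide): 352x288 luminance, 4:2:0 chroma, 8 bit/sample.
CIF_WIDTH, CIF_HEIGHT = 352, 288
FRAME_RATE_HZ = 10


def raw_bitrate(width, height, frame_rate_hz, bits_per_sample=8):
    """Uncompressed (PCM) bit rate in bit/s for a 4:2:0 sequence."""
    luma_samples = width * height
    # Two chroma planes, each subsampled by 2 horizontally and vertically.
    chroma_samples = 2 * (width // 2) * (height // 2)
    return (luma_samples + chroma_samples) * bits_per_sample * frame_rate_hz


rate = raw_bitrate(CIF_WIDTH, CIF_HEIGHT, FRAME_RATE_HZ)
print(f"{rate / 1e6:.2f} Mbit/s")  # prints "12.17 Mbit/s"
```

Under these assumptions the raw rate is roughly 100 to 1500 times the capacity of an 8 to 128 kbit/s channel.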

Introduction: Model-based coding

Introduction Problems: • Adaptation of a 3D face model • Generation of a 3D wireframe of head and shoulders • Estimation of 3D motion of head and shoulders • Analysis and synthesis of facial expressions • Parameter coding

Adaptation of a 3D face model Two steps: 1. Estimation of 2D facial features in the image plane 2. Adaptation of the 3D face model using the estimated facial features

Adaptation of a 3D face model

Adaptation of a 3D face model: Chin/cheek contour • Parametric model of contours: 8 unknown parameters • MAP estimator combining (a) the a priori probability of the occurrence of contours at a certain position and (b) the conditional probability between contour position and image gradient
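The MAP idea can be illustrated with a toy example. This is a sketch with a single unknown (the row of the contour in one image column) instead of the 8 contour parameters; the Gaussian prior, the gradient-based likelihood, and all numbers are hypothetical and not taken from the talk.

```python
import math


def map_estimate(candidates, log_prior, log_likelihood):
    """Return the candidate that maximizes log prior + log likelihood."""
    return max(candidates, key=lambda c: log_prior(c) + log_likelihood(c))


# Made-up gradient magnitudes along one image column (index = row).
gradient = [0.1, 0.2, 0.9, 0.3, 0.8, 0.1]
EXPECTED_ROW, SIGMA = 4, 1.0  # hypothetical prior: contour expected near row 4


def log_prior(row):
    # Gaussian prior on the contour position (assumption).
    return -0.5 * ((row - EXPECTED_ROW) / SIGMA) ** 2


def log_likelihood(row):
    # Assumption: a stronger gradient makes a contour at this row more likely.
    return math.log(gradient[row])


best_row = map_estimate(range(len(gradient)), log_prior, log_likelihood)
print(best_row)  # prints 4
```

Row 2 has the strongest gradient, but the prior shifts the estimate to row 4, which is the behaviour a MAP estimator adds over a pure gradient search.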

Adaptation of a 3D face model: Eyebrows • Original image • Segmentation: eyebrows darker than surrounding skin • Separation between hair and eyebrows
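The "darker than surrounding skin" cue can be sketched as a simple luminance threshold. The function name, the margin, and the sample patch are hypothetical; the actual segmentation used in the talk, including the separation of hair and eyebrows, is not described on the slide.

```python
def darker_than_skin(image, skin_level, margin=30):
    """Return a boolean mask marking pixels clearly darker than the skin level."""
    threshold = skin_level - margin
    return [[pixel < threshold for pixel in row] for row in image]


# Made-up 3x5 luminance patch: a dark eyebrow stripe on brighter skin.
patch = [
    [180, 175, 182, 178, 181],
    [176,  90,  85,  95, 179],
    [183, 180, 177, 182, 180],
]
mask = darker_than_skin(patch, skin_level=180)
print(mask[1])  # prints [False, True, True, True, False]
```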

Adaptation of a 3D face model: Nose features • Nostrils: darker than surrounding skin, typical shape • Sides of the nose: typical shape, image gradient

Adaptation of a 3D face model • Adaptation of size, position, shape and initial mimic

Generation of a 3D wireframe of head/shoulders

Estimation of 3D motion of head/shoulders

Estimation of 3D motion of head/shoulders Three steps: 1. Estimation of rotation and translation parameters of the shoulders (6 parameters) 2. Compensation of shoulders and head motion using the estimated motion parameters of the shoulders 3. Estimation of head rotation around the neck joint (3 parameters)
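Step 3 describes head motion as a rotation around the neck joint with 3 parameters. A minimal sketch of applying such a rotation to wireframe vertices follows; the Euler-angle convention (R = Rz Ry Rx), the function names, and the coordinates are assumptions, since the slides do not specify the parameterization.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Rotation R = Rz @ Ry @ Rx for angles in radians (convention assumed)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def rotate_about(point, pivot, rot):
    """Rotate a 3D point around a pivot (here: the neck joint)."""
    d = [p - q for p, q in zip(point, pivot)]
    return tuple(
        sum(rot[i][j] * d[j] for j in range(3)) + pivot[i] for i in range(3)
    )


neck_joint = (0.0, -1.0, 0.0)   # hypothetical joint position
head_vertex = (0.0, 1.0, 0.0)   # hypothetical head vertex
rot = rotation_matrix(0.0, 0.0, math.pi / 2)  # 90 degrees about the z axis
moved = rotate_about(head_vertex, neck_joint, rot)
```

With these values the vertex moves to approximately (-2, -1, 0), while the neck joint itself stays fixed.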

Synthesis of facial expressions • Each muscle: 1 parameter describing contraction
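One way to picture a single contraction parameter per muscle is a linear pull of nearby vertices toward the muscle's attachment point. This is only an illustrative sketch: the linear model, the per-vertex weights, and the function name are hypothetical and are not the muscle model from the talk.

```python
def contract_muscle(vertices, weights, attachment, contraction):
    """Pull each vertex toward the attachment point.

    contraction: the single muscle parameter, 0.0 (relaxed) to 1.0 (assumed range).
    weights: per-vertex influence of this muscle, 0.0 to 1.0 (assumption).
    """
    moved = []
    for vertex, weight in zip(vertices, weights):
        moved.append(tuple(
            c + contraction * weight * (a - c)
            for c, a in zip(vertex, attachment)
        ))
    return moved


skin = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]   # made-up skin vertices
influence = [1.0, 0.5]                       # made-up muscle influence
result = contract_muscle(skin, influence, (0.0, 0.0, 0.0), contraction=0.5)
print(result)  # prints [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]
```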

Synthesis of facial expressions • Additional mimic parameters: • jaw rotation • rotation of eyelids • translation of iris

Analysis of facial expressions • 27 mimic parameters • Maximum likelihood estimator • Measured value: temporal luminance difference at observation points • Conditional probability between measured value and the mimic parameters (motion parameters) • Multistage estimation

Parameter coding • Motion, mimic: PCM, 200 bit/frame • 2D person silhouette: polygon/spline approximation, 200 bit/frame • Uncovered background: DCT, 200 bit/frame => 10 Hz frame rate: 6 kbit/s
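The 6 kbit/s total follows directly from the three per-frame budgets. A short check, using only the numbers on the slide (the dictionary keys are descriptive labels chosen here):

```python
# Per-frame bit budgets as given on the slide.
BITS_PER_FRAME = {
    "motion and mimic parameters (PCM)": 200,
    "2D person silhouette (polygon/spline approximation)": 200,
    "uncovered background (DCT)": 200,
}
FRAME_RATE_HZ = 10

total_bits_per_frame = sum(BITS_PER_FRAME.values())
bitrate = total_bits_per_frame * FRAME_RATE_HZ
print(f"{total_bits_per_frame} bit/frame, {bitrate / 1000:g} kbit/s")
# prints "600 bit/frame, 6 kbit/s"
```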

Experimental results

Experimental results 3D wireframe over original sequence

Experimental results 3D face model over original sequence

Experimental results 3D wireframe in side-view

Experimental results • 3D wireframe over original sequence • 3D face model over original sequence

Experimental results • block-based, 22 kbit/s • model-based, 6 kbit/s

Experimental results • original, 12 Mbit/s • model-based, 6 kbit/s • Compression ratio: 2000 : 1

Summary • Video coder based on an anatomical model of a human person • Coding of videophone sequences (CIF, 10 Hz) at 6 kbit/s • Algorithms not restricted to video coding
