Creating a Speech Enabled Avatar from a Single Photograph

Dmitri Bitouk, Shree K. Nayar
Columbia University

Speech Enabled Avatar
Figure labels: Input photograph, Avatar
• Applications:
  • mobile messaging and video conferencing
  • news reporting and information kiosks
  • novel user interfaces

Facial Motion Synthesis Challenges
• Mapping phonemes to static mouth shapes produces unrealistic, jerky animations
• Co-articulation: facial articulations can be dominated by the preceding as well as the upcoming phonemes
• Asynchrony: facial motion may precede the corresponding sound

Related Work
• Avatars from video sequences: Bregler et al. 1997, Ezzat et al. 2002, etc.
• 2D Avatars from photographs: Blanz et al. 2003, CrazyTalk™, MotionPortrait™

Generic Facial Motion Model
Figure labels: Prototype Surface, Deformed Surface, facial motion parameters (Bitouk 2006)

Generic Facial Motion Model

Facial Motion Transfer
Figure labels: Prototype Face, Novel Faces (Bitouk 2006)

Hidden Markov Models
• Phonemes: /B/, /K/, /AA/, /IY/, etc.
• With lexical stress: /B/, /K/, /AA0/, /AA1/, /IY0/, /IY1/, etc.
• Triphones:
Figure labels: states s1, s2; facial motion parameters
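The move from stress-marked phonemes to triphones can be sketched as below. This is an illustration only: the `left-center+right` label format is borrowed from common speech-recognition toolkits and is an assumption, as is the hypothetical helper name `to_triphones`. The slides do not show the notation the authors used.

```python
def to_triphones(phonemes):
    """Expand a phoneme sequence into context-dependent labels.

    Assumed label format (not from the slides): "left-center+right".
    Phonemes at the ends of the sequence keep only the context they have.
    """
    labels = []
    for i, center in enumerate(phonemes):
        label = center
        if i > 0:
            # Prepend the preceding phoneme as left context.
            label = f"{phonemes[i - 1]}-{label}"
        if i + 1 < len(phonemes):
            # Append the upcoming phoneme as right context.
            label = f"{label}+{phonemes[i + 1]}"
        labels.append(label)
    return labels


# Stress-marked phonemes in the style shown on the slide.
print(to_triphones(["K", "AA1", "R"]))
# prints ['K+AA1', 'K-AA1+R', 'AA1-R']
```

Because each label carries both neighbours, a model trained per label can capture the co-articulation effect of the preceding and upcoming phonemes.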

Training Hidden Markov Models
• Training set consists of motion capture data
• Baum-Welch embedded re-estimation
• Cluster triphone states to predict triphones not seen in the training set
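For readers unfamiliar with Baum-Welch, one re-estimation step can be sketched on a toy model. This is not the authors' training setup: it assumes a small discrete-observation HMM and a single sequence, whereas the slides describe embedded re-estimation over motion capture data, which is continuous. All function names and the toy parameters are hypothetical.

```python
def forward(pi, A, B, obs):
    """alpha[t][i] = P(o_0..o_t, state_t = i)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(o_{t+1}..o_{T-1} | state_t = i)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def likelihood(pi, A, B, obs):
    """P(obs | model), summed over the final forward variables."""
    return sum(forward(pi, A, B, obs)[-1])


def baum_welch_step(pi, A, B, obs):
    """One EM update of (pi, A, B) from a single observation sequence."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    p = sum(alpha[-1])
    # gamma[t][i]: posterior probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / p for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: posterior probability of the transition i -> j at time t.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B


# Toy two-state model with two observation symbols (made-up numbers).
pi0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.5, 0.5], [0.1, 0.9]]
obs0 = [0, 1, 1, 0, 1, 1, 1, 0]
pi1, A1, B1 = baum_welch_step(pi0, A0, B0, obs0)
print(likelihood(pi0, A0, B0, obs0), likelihood(pi1, A1, B1, obs0))
```

Each step cannot lower the likelihood of the training sequence, which is why the update is iterated until the likelihood stops improving.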

Facial Motion Synthesis from Text
Figure (pipeline): Text → Text-to-Speech Engine → Speech and time-labeled phonemes → Hidden Markov Models → Facial Motion Parameters
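One plausible way to connect time-labeled phonemes to an animation is to sample them at the video frame rate, so that each frame knows which phoneme is active. The sketch below shows only that sampling step. The `(phoneme, start, end)` tuple format, the frame rate, and the helper name `frame_labels` are assumptions, and the slides do not say how the authors' HMMs consume the labels.

```python
def frame_labels(segments, fps=30.0):
    """Assign one phoneme label to each video frame.

    segments: list of (phoneme, start_sec, end_sec), sorted in time and
    contiguous (hypothetical format). A frame takes the label of the
    segment that contains the frame's centre time.
    """
    if not segments:
        return []
    n_frames = round(segments[-1][2] * fps)
    labels = []
    idx = 0
    for f in range(n_frames):
        t = (f + 0.5) / fps  # centre of frame f, in seconds
        # Advance to the segment whose end time lies beyond t.
        while idx + 1 < len(segments) and t >= segments[idx][2]:
            idx += 1
        labels.append(segments[idx][0])
    return labels


# Made-up timings for illustration; 10 fps keeps the output short.
print(frame_labels([("B", 0.0, 0.1), ("IY1", 0.1, 0.3)], fps=10.0))
# prints ['B', 'IY1', 'IY1']
```

Holding a fixed mouth shape per frame label would reproduce the jerky animation the slides warn about, which is the motivation for generating smooth parameter trajectories from the HMMs instead.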

Fitting the Prototype Model to an Image
Figure labels: 2D Prototype Face, Photograph

Facial Motion Synthesis

Eye Motion Synthesis

Eyeball Texture Synthesis
Figure labels: Eye Image, Synthesized Eyeball Texture

Eye Motion Synthesis
Figure: Eye Motion Geometry

Eye Motion and Blinking

Visual Text-to-Speech Synthesis

Facial Motion Synthesis from Speech
Figure (pipeline): Speech → Speech Recognition → Time-labeled phonemes → Hidden Markov Models → Facial Motion Parameters

Facial Motion Synthesis from Speech

3D Avatars
Figure labels: Captured Stereo Image with Mirror View and Direct View (Gluckman & Nayar, 2001)

3D Avatars
Figure labels: Rectified Images (Mirror View, Direct View), 3D Model
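The step from rectified images to a 3D model rests on triangulation. For an idealized rectified pair, depth follows from disparity as Z = f · B / d. The sketch below applies that textbook relation. The parameter names and numbers are hypothetical, and the mirror-based capture on the slides may need a different calibration than this two-camera idealization.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen in a rectified stereo pair.

    focal_px:     focal length in pixels (assumed known from calibration)
    baseline_m:   distance between the two viewpoints, in metres
    disparity_px: horizontal shift of the point between the two views
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Made-up values for illustration only.
print(depth_from_disparity(focal_px=800.0, baseline_m=0.5, disparity_px=20.0))
# prints 20.0
```

Applying the function to every matched pixel pair gives a depth map, from which a point cloud or surface model of the face can be built.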

3D Avatars
Figure labels: Point cloud engraved inside a glass cube, Digital projector (Nayar & Anand, 2007)

3D Avatars

Limitations and Future Work
• Automatic facial feature detection
• Synthesis of rigid head motion
• Expressive speech
• Web demo of our system will be available in early April: www.cs.columbia.edu/CAVE/

The End
