Creating a Speech Enabled Avatar from a Single Photograph
Dmitri Bitouk, Shree K. Nayar
Columbia University
Speech Enabled Avatar
[Figure: input photograph and the resulting avatar]
Applications:
• mobile messaging and video conferencing
• news reporting and information kiosks
• novel user interfaces
Facial Motion Synthesis Challenges
• Mapping phonemes to static mouth shapes produces unrealistic, jerky animations
• Co-articulation: facial articulations can be dominated by the preceding as well as the upcoming phonemes
• Asynchrony: facial motion may precede the corresponding sound
Related Work
• Avatars from video sequences: Bregler et al. 1997, Ezzat et al. 2002, etc.
• 2D avatars from photographs: Blanz et al. 2003, CrazyTalk™, MotionPortrait™
Generic Facial Motion Model
[Figure: prototype surface and deformed surface, with the facial motion parameters] Bitouk 2006
Facial Motion Transfer
[Figure: prototype face and novel faces] Bitouk 2006
Hidden Markov Models
• Phonemes: /B/, /K/, /AA/, /IY/, etc.
• With lexical stress: /B/, /K/, /AA0/, /AA1/, /IY0/, /IY1/, etc.
• Triphones
[Figure: HMM states s1, s2; facial motion parameters]
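The triphone units listed above can be illustrated with a minimal sketch that expands a stress-marked phoneme sequence into context-dependent labels. The `left-center+right` label format and the `sil` boundary symbol are assumptions borrowed from common speech-toolkit conventions; the slides do not say which notation the authors used, and `to_triphones` is a hypothetical helper name.

```python
from typing import List


def to_triphones(phonemes: List[str], boundary: str = "sil") -> List[str]:
    """Expand a phoneme sequence into left-center+right triphone labels.

    The label format and the boundary symbol are illustrative assumptions,
    not taken from the slides.
    """
    # Pad both ends so the first and last phonemes also get a context.
    padded = [boundary] + list(phonemes) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


if __name__ == "__main__":
    # A two-phoneme sequence with lexical stress marked on the vowel.
    print(to_triphones(["K", "IY1"]))
```

Each phoneme yields exactly one triphone, so the number of distinct units grows quickly with the phoneme inventory, which is why unseen triphones have to be handled during training.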
Training Hidden Markov Models
• Training set consists of motion capture data
• Baum-Welch embedded re-estimation
• Cluster triphone states to predict triphones not seen in the training set
Facial Motion Synthesis from Text
[Diagram: Text → Text-to-Speech Engine → speech and time-labeled phonemes; time-labeled phonemes → Hidden Markov Models → facial motion parameters]
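To make "time-labeled phonemes" concrete, here is a minimal sketch of one way such labels could be resampled onto animation frames before driving a motion model. The `(phoneme, start, end)` tuple layout, the frame rate, and the `frame_labels` helper are all assumptions for illustration; the slides do not describe the actual data format or how the HMMs consume it.

```python
from typing import List, Tuple

# Hypothetical layout for one time-labeled phoneme: (label, start_sec, end_sec).
Segment = Tuple[str, float, float]


def frame_labels(segments: List[Segment], fps: float = 30.0) -> List[str]:
    """Assign one phoneme label to each animation frame.

    Assumes segments are sorted, contiguous, and start at time 0.
    Each frame takes the label of the segment containing its center time.
    """
    if not segments:
        return []
    n_frames = int(round(segments[-1][2] * fps))
    labels: List[str] = []
    idx = 0
    for f in range(n_frames):
        t = (f + 0.5) / fps  # center time of frame f
        # Advance to the segment whose end time lies beyond t.
        while idx < len(segments) - 1 and t >= segments[idx][2]:
            idx += 1
        labels.append(segments[idx][0])
    return labels


if __name__ == "__main__":
    timed = [("B", 0.0, 0.1), ("IY1", 0.1, 0.3)]
    print(frame_labels(timed, fps=10.0))
```

A per-frame label track like this is only the input side; holding one static pose per label would reproduce the jerky animation the slides warn about, which is the motivation for generating the parameters from HMMs instead.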
Fitting the Prototype Model to an Image
[Figure: 2D prototype face and photograph]
Eyeball Texture Synthesis
[Figure: eye image and synthesized eyeball texture]
Eye Motion Synthesis
[Figure: eye motion geometry]
Facial Motion Synthesis from Speech
[Diagram: Speech → Speech Recognition → time-labeled phonemes → Hidden Markov Models → facial motion parameters]
3D Avatars
[Figure: captured stereo image showing a mirror view and a direct view] Gluckman & Nayar, 2001
3D Avatars
[Figure: rectified mirror-view and direct-view images, and the resulting 3D model]
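As a rough illustration of how a rectified image pair can yield 3D structure, the sketch below applies the textbook disparity-to-depth relation Z = f·B/d. It assumes the rectified mirror and direct views behave like an ordinary horizontally rectified stereo pair; the slides do not state how the 3D model was actually computed, and the focal length, baseline, and function name here are hypothetical.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth of a point from its disparity in a rectified stereo pair.

    Uses Z = f * B / d, with f in pixels, B in meters, and d in pixels.
    All parameter values are illustrative assumptions.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical numbers: 800 px focal length, 10 cm effective baseline.
    print(depth_from_disparity(disparity_px=20.0, focal_px=800.0, baseline_m=0.1))
```

Depth is inversely proportional to disparity, so small matching errors matter most for distant points.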
3D Avatars
[Figure: point cloud engraved inside a glass cube; digital projector] Nayar & Anand, 2007
Limitations and Future Work
• Automatic facial feature detection
• Synthesis of rigid head motion
• Expressive speech
• A web demo of our system will be available in early April: www.cs.columbia.edu/CAVE/