Visible Speech Synthesis • Value of Talking Heads • Enhance Intelligibility • Enhance Realism and Naturalness • Convey Paralanguage and Emotion • State of the Art • Issues • Needs
Types of Synthesis • Physically Based Synthesis • Terminal Analog Synthesis
Types of Synthesis • Physically Based Synthesis • Articulatory Speech Synthesis • Muscle-Based Animation
Physically Based Synthesis • Articulatory Models of Speech Synthesis • Auditory Speech is Goal • Eventual Payoff for Animation • Muscle-Based Simulation • Computationally Intensive • Hard to Link EMG Measures to Synthesis
Physically Based Synthesis • Not Ready for Prime (Real) Time
Types of Synthesis • Terminal Analog Synthesis • Control Wire-Frame Model • Concatenate Stored Images
Terminal Analog Synthesis • Parameter-Based Synthesis (Wire Frame) • Parke; Pearce et al. • Cohen & Massaro • Le Goff & Benoit • Beskow
PSLTalk (Baldi) • Real Time on SGI & PC Platforms • Extension of Parke’s Wireframe Model • Has Tongue • Evolving Hard Palate and 3D Teeth • Many Control Parameters • Rotation, Translation, Interpolation • Phoneme Synthesis • Mechanism for Coarticulation
Control Parameters • Rotation of Points • Movement around an axis, e.g., jaw rotation • Translation • Movement of points, e.g., raise upper lip • Interpolation • Between two different subsections of wireframes, e.g., smile • Scaling • Constant multiplier
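The four control-parameter operations can be sketched on a list of 3D vertices. This is a minimal illustration, not PSLTalk's code: the function names, the rotation axis (parallel to x through a hinge point, as for the jaw), and treating scaling as multiplying each vertex's offset from a reference point by a constant are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rotate_about_x(points: Sequence[Point], pivot: Point, angle: float) -> List[Point]:
    """Rotate points about an axis parallel to x through `pivot` (e.g. a jaw hinge)."""
    c, s = math.cos(angle), math.sin(angle)
    _, py, pz = pivot
    out = []
    for x, y, z in points:
        dy, dz = y - py, z - pz
        out.append((x, py + c * dy - s * dz, pz + s * dy + c * dz))
    return out


def translate(points: Sequence[Point], offset: Point) -> List[Point]:
    """Move every point by the same offset (e.g. raising the upper lip)."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in points]


def interpolate(a: Sequence[Point], b: Sequence[Point], t: float) -> List[Point]:
    """Blend two versions of the same subsection (e.g. neutral and smiling), 0 <= t <= 1."""
    if len(a) != len(b):
        raise ValueError("both shapes must have the same vertices")
    return [tuple(pa + t * (pb - pa) for pa, pb in zip(p, q)) for p, q in zip(a, b)]


def scale(points: Sequence[Point], reference: Point, factor: float) -> List[Point]:
    """Multiply each point's offset from `reference` by a constant factor."""
    return [tuple(r + factor * (c - r) for c, r in zip(p, reference)) for p in points]
```

In a parameter-driven model, each named parameter (jaw rotation, lip raise, smile) would set the `angle`, `offset`, `t`, or `factor` for its own subset of vertices.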
Terminal Analog Synthesis • Concatenative Synthesis (Image)
Image Concatenation • Video Rewrite • Bregler, Covell, & Slaney • MikeTalk: MIT Optical Flow • Ezzat & Poggio • Triphone HMM Synthesis • Brooke & Scott
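Joining stored images requires intermediate frames at each junction. The sketch below uses a plain linear cross-dissolve between grayscale keyframes, a simpler technique swapped in for illustration; optical-flow systems such as MikeTalk instead morph pixels along estimated motion. Representing images as nested lists and the function names are assumptions.

```python
from typing import List

Image = List[List[float]]  # grayscale pixels, rows of floats


def cross_dissolve(a: Image, b: Image, t: float) -> Image:
    """Pixel-wise blend (1 - t) * a + t * b of two equally sized images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same size")
    return [[(1 - t) * pa + t * pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def intermediate_frames(a: Image, b: Image, n: int) -> List[Image]:
    """n evenly spaced in-between frames, excluding the two keyframes themselves."""
    return [cross_dissolve(a, b, k / (n + 1)) for k in range(1, n + 1)]


def concatenate(keyframes: List[Image], n_between: int) -> List[Image]:
    """Join stored keyframes, inserting n_between dissolved frames at each junction."""
    if not keyframes:
        return []
    sequence: List[Image] = []
    for a, b in zip(keyframes, keyframes[1:]):
        sequence.append(a)
        sequence.extend(intermediate_frames(a, b, n_between))
    sequence.append(keyframes[-1])
    return sequence
```

A dissolve ghosts moving features (two half-visible lip contours) when mouth shapes differ a lot, which is one reason flow-based morphing is preferred for speech.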
Coarticulation • Parameter Synthesis • Dominance Function • Other Techniques? • Image Synthesis • Segment Size • Intermediate Frames
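The dominance-function approach to coarticulation (Cohen & Massaro) can be sketched as follows. Each segment has a target value for a control parameter and a dominance that falls off with time from the segment's center, D(τ) = α·exp(-θ·|τ|^c), with a separate θ before the center (anticipatory) and after it (carryover). The parameter value at time t is the dominance-weighted average of all segment targets. The field names and the numbers in the test are illustrative assumptions, not published parameter values.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Segment:
    center: float      # time of the segment's center, in seconds
    target: float      # target value of one control parameter (e.g. lip rounding)
    alpha: float       # peak dominance of the segment
    theta_ant: float   # falloff rate before the center (anticipatory coarticulation)
    theta_car: float   # falloff rate after the center (carryover coarticulation)
    c: float = 1.0     # exponent shaping the falloff


def dominance(seg: Segment, t: float) -> float:
    """D(t) = alpha * exp(-theta * |t - center| ** c), theta chosen by side of the center."""
    tau = t - seg.center
    theta = seg.theta_ant if tau < 0 else seg.theta_car
    return seg.alpha * math.exp(-theta * abs(tau) ** seg.c)


def parameter_value(segments: Sequence[Segment], t: float) -> float:
    """Dominance-weighted average of all segment targets at time t."""
    weights = [dominance(s, t) for s in segments]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no segment has any dominance at this time")
    return sum(w * s.target for w, s in zip(weights, segments)) / total
```

A segment with low dominance (small α or large θ) lets its neighbors' targets shape the parameter, which is how the model lets, for example, a rounded vowel pull lip rounding into a preceding consonant.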
Paralinguistic (Language-Related) Synthesis • Segmental • Suprasegmental
Paralinguistic Synthesis (Segmental) • Nonspeech Segments • 18 Segments in Worldbet and OGIbet • Breath Noise, Cough, Clear Throat, Laugh, Lip Smack, Sneeze, Tongue Click, Burp, Sniff, Squeak/Voice Crack, and Sigh
Paralinguistic Synthesis (Suprasegmental) • Head Movements (Referential) • Eye Movements • Eye Blinks • Eyebrow Raising with F0 • Eye Widening • Squinting
Synthesis of Emotion • Voice is Informative • Face is More Critical • Basic Universal Emotions • Happiness, Anger, Surprise, Fear, Disgust, and Sadness
Emotion Synthesis Issues • Control Parameters • Polygon Resolution • Interaction with Speech Parameters
Text to Speech/Animation (TtSA) • Output of Text Translation • Representation of Text
Requirements for Output of Text Translation • Phonemes, duration, onset, offset • Stress • Provide Complete Sentence Transcription before Auditory Synthesis Begins
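The information the text-translation stage must hand to the animation can be sketched as a small record type. The field names, the numeric stress coding, and the consistency checks are hypothetical; the point is that the whole sentence is checked, and its end time known, before auditory synthesis begins.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PhonemeEvent:
    phoneme: str
    onset: float      # seconds from the start of the sentence
    offset: float     # seconds from the start of the sentence
    stress: int = 0   # hypothetical coding: 0 = unstressed, 1 = primary, 2 = secondary

    @property
    def duration(self) -> float:
        return self.offset - self.onset


def validate_transcription(events: Sequence[PhonemeEvent]) -> float:
    """Check a complete sentence transcription and return the end time of its last segment."""
    if not events:
        raise ValueError("empty transcription")
    for e in events:
        if e.offset <= e.onset:
            raise ValueError(f"non-positive duration for {e.phoneme!r}")
    for prev, cur in zip(events, events[1:]):
        if cur.onset < prev.offset:
            raise ValueError(f"{cur.phoneme!r} starts before {prev.phoneme!r} ends")
    return events[-1].offset
```

Having the full list up front lets the face begin anticipatory movements before the corresponding sound is produced.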
Alignment of Auditory and Visible Speech • Issues? • Perception of Asynchrony and Integration • Empirical Results • Theoretical Description • Auditory Phoneme vs. Visual Phoneme • Articulatory Synthesis
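One way to make the alignment question concrete is to compare, segment by segment, when each phoneme starts in the auditory track and when its articulation starts on the face, and flag pairs outside a tolerance window. The sketch assumes a one-to-one pairing of auditory and visual segments, which sidesteps the auditory-vs-visual-phoneme issue, and the window limits are hypothetical parameters to be set from empirical results, not values taken from them.

```python
from typing import List, Sequence, Tuple

Event = Tuple[str, float]  # (segment label, onset time in seconds)


def onset_asynchronies(auditory: Sequence[Event], visual: Sequence[Event]) -> List[Tuple[str, float]]:
    """Visual minus auditory onset per segment; positive means the face lags the voice."""
    if len(auditory) != len(visual):
        raise ValueError("tracks must have the same number of segments")
    result = []
    for (a_label, a_onset), (v_label, v_onset) in zip(auditory, visual):
        if a_label != v_label:
            raise ValueError(f"segment mismatch: {a_label!r} vs {v_label!r}")
        result.append((a_label, v_onset - a_onset))
    return result


def flag_asynchrony(asynchronies: Sequence[Tuple[str, float]],
                    max_audio_lead: float, max_visual_lead: float) -> List[str]:
    """Return labels whose asynchrony falls outside the (hypothetical) tolerance window."""
    return [label for label, d in asynchronies
            if d > max_audio_lead or -d > max_visual_lead]
```

Keeping separate limits for audio lead and visual lead allows the window to be asymmetric, should the perceptual data call for it.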
Representation of Paralinguistic and Emotion Information in Text • Embedded Text-Markup • SABLE: Starting Time, Intensity, and Dynamics of Emotions
I was <EMO SAD="0.8">sad</EMO>, but <EMO HAP="0.9" SAD="0.1">now I'm happy again</EMO>.
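Embedded emotion markup of this kind can be turned into per-span emotion weights with a standard XML parser. The sketch follows the tag and attribute names used in the example (`EMO`, `SAD`, `HAP`); wrapping the text in a `ROOT` element and letting inner tags override outer attribute values are assumptions for illustration, not part of any SABLE specification.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Span = Tuple[str, Dict[str, float]]


def emotion_spans(markup: str) -> List[Span]:
    """Split marked-up text into (text, emotion weights); inner EMO attributes override outer ones."""
    root = ET.fromstring(f"<ROOT>{markup}</ROOT>")  # wrap so mixed text parses as one element
    spans: List[Span] = []

    def walk(elem: ET.Element, inherited: Dict[str, float]) -> None:
        weights = dict(inherited)
        if elem.tag == "EMO":
            weights.update({name: float(value) for name, value in elem.attrib.items()})
        if elem.text and elem.text.strip():
            spans.append((elem.text.strip(), weights))
        for child in elem:
            walk(child, weights)
            # Text after a child's closing tag belongs to the enclosing element.
            if child.tail and child.tail.strip():
                spans.append((child.tail.strip(), weights))

    walk(root, {})
    return spans


MARKUP = """I was <EMO SAD="0.8">sad</EMO>, but <EMO HAP="0.9" SAD="0.1">now I'm happy again</EMO>."""
```

If the tags are instead read as nested, the same walker applies: the happy clause inherits the outer tag and its `SAD="0.1"` replaces the outer 0.8.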
Analysis of Visible Speech • Marked Skin Surfaces • Photogrammetric Measurement • Optotrak System • Unmarked Skin Surfaces • 3D Laser Scans of Static Poses
Analysis of Internal Structures of Visible Speech • Ultrasound • X-Ray Micro-Beam • Magnetic Resonance Imaging (MRI) • Cineradiography • Electropalatography (EPG)
Data Bases for Training • Model After Auditory Data Bases • Syllables, Words, Sentences • Bimodal Recording For Alignment • Bimodal TIMIT?
Multiple Facial Structures • General Control Parameters • Specifying Local Control • Gradient of Movement • Simplify and/or Modify Polygon Structure
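A gradient of movement for local control can be sketched by weighting a displacement with a smooth falloff around a control point, so nearby vertices move fully and distant ones not at all. The raised-cosine falloff, the radius parameter, and the function names are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def falloff(distance: float, radius: float) -> float:
    """Raised-cosine weight: 1 at the control point, falling smoothly to 0 at `radius`."""
    if distance >= radius:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * distance / radius))


def local_displacement(points: Sequence[Point], center: Point,
                       displacement: Point, radius: float) -> List[Point]:
    """Move each point by `displacement` scaled by its falloff weight from `center`."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    moved = []
    for p in points:
        w = falloff(math.dist(p, center), radius)
        moved.append(tuple(pc + w * dc for pc, dc in zip(p, displacement)))
    return moved
```

Because the weight depends only on distance, the same general control can be reused on facial structures with different polygon layouts.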
Texture Mapping • Maps 2D Surface onto Wireframe • Multiple 2D Surfaces • 3D Cyberware Scan
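Texture mapping assigns each wireframe vertex a 2D coordinate in the texture image. Because a Cyberware-style head scan is typically captured by rotating around the head, a cylindrical projection is a natural way to sketch this. The choice of y as the vertical axis, the assumption that the face points along +z, and the function names are all hypothetical.

```python
import math
from typing import Tuple


def cylindrical_uv(vertex: Tuple[float, float, float],
                   y_min: float, y_max: float) -> Tuple[float, float]:
    """Map a vertex to (u, v) in [0, 1] by its angle around, and height along, the y axis."""
    if y_max <= y_min:
        raise ValueError("y_max must exceed y_min")
    x, y, z = vertex
    theta = math.atan2(x, z)               # angle around the vertical axis, in [-pi, pi]
    u = (theta + math.pi) / (2 * math.pi)  # a vertex on the +z axis (assumed face front) maps to u = 0.5
    v = (y - y_min) / (y_max - y_min)      # 0 at the bottom of the scan, 1 at the top
    return u, v


def uv_to_pixel(u: float, v: float, width: int, height: int) -> Tuple[int, int]:
    """Convert (u, v) to a (column, row) index, with row 0 at the top of the image."""
    col = min(int(u * width), width - 1)
    row = min(int((1.0 - v) * height), height - 1)
    return col, row
```

Placing the seam (u = 0 and u = 1) at the back of the head keeps the discontinuity away from the visible face.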
Assessment of Quality • Intelligibility • Visible Speech • Syllables, Words, Sentences • Confusion Matrices (Viseme Structure) • Combine with Bimodal Speech • Attention, Memory and Robustness • Realism
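Scoring with confusion matrices can be sketched from stimulus/response pairs. The confusion matrix counts each pair, and percent correct can be computed either for exact phonemes or at the viseme level, where a response counts as correct if it falls in the same visually confusable class as the stimulus. The syllable labels and viseme classes below are hypothetical placeholders, not a published viseme set.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Trial = Tuple[str, str]  # (stimulus, response)


def confusion_matrix(trials: Iterable[Trial]) -> Counter:
    """Count how often each stimulus received each response."""
    return Counter(trials)


def proportion_correct(trials: Iterable[Trial],
                       visemes: Optional[Dict[str, str]] = None) -> float:
    """Exact-match score, or viseme-level score if a phoneme-to-viseme map is given."""
    trials = list(trials)
    if not trials:
        raise ValueError("no trials")
    if visemes is None:
        hits = sum(s == r for s, r in trials)
    else:
        hits = sum(visemes[s] == visemes[r] for s, r in trials)
    return hits / len(trials)


# Hypothetical responses and viseme classes, for illustration only.
TRIALS = [("ba", "ba"), ("ba", "pa"), ("da", "da"), ("da", "ga")]
VISEMES = {"ba": "B", "pa": "B", "ma": "B", "da": "D", "ga": "D"}
```

A large gap between the two scores indicates that most errors fall within viseme classes, which is the pattern expected for a face that is accurate but inherently ambiguous.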
Evaluating Speech Synthesis • Speechreading Syllables • Compare to Natural Speech
Evaluating Speech Synthesis • Word Recognition • Speech Reading • Confusion Matrices • Compare to Natural Speech
Evaluating Speech Synthesis • Sentence Processing • Auditory Alone in Noise • Bimodal • Look at Performance Gain
Evaluating Speech Synthesis • Natural Auditory Speech in Noise • Bimodal with Natural Face • versus • Synthetic Auditory Speech in Noise • Bimodal with Synthetic Face
Evaluating Speech Synthesis • Always Natural Auditory Speech in Noise • Bimodal with Natural Face • versus • Bimodal with Synthetic Face
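The performance-gain comparisons can be sketched numerically. Besides the raw difference between bimodal and auditory-alone scores, one commonly used normalization divides that difference by the room left for improvement, (B - A) / (1 - A). Comparing this relative gain for the synthetic face against the natural face, with the same auditory speech, gives one way to express how close the synthetic face comes. The function names are hypothetical, and this is not necessarily the measure used in the original evaluations.

```python
def absolute_gain(auditory_alone: float, bimodal: float) -> float:
    """Raw improvement in proportion correct from adding the face."""
    return bimodal - auditory_alone


def relative_gain(auditory_alone: float, bimodal: float) -> float:
    """(B - A) / (1 - A): the fraction of the possible improvement actually achieved."""
    for p in (auditory_alone, bimodal):
        if not 0.0 <= p <= 1.0:
            raise ValueError("scores must be proportions in [0, 1]")
    if auditory_alone == 1.0:
        raise ValueError("auditory-alone score leaves no room for improvement")
    return (bimodal - auditory_alone) / (1.0 - auditory_alone)


def synthetic_vs_natural(auditory_alone: float, bimodal_natural: float,
                         bimodal_synthetic: float) -> float:
    """Ratio of the synthetic face's relative gain to the natural face's (1.0 = equal benefit)."""
    natural = relative_gain(auditory_alone, bimodal_natural)
    if natural == 0.0:
        raise ValueError("natural face produced no gain")
    return relative_gain(auditory_alone, bimodal_synthetic) / natural
```

Holding the auditory speech constant (always natural, as in the last design) means any difference in the ratio can be attributed to the face rather than to the synthetic voice.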