Converting sound to images

How lip movements can be adjusted to match speech
M.Sc. PhD student Tue Lehn-Schiøler www.imm.dtu.dk/~tls

Overview
• Introduction to the problem
• Feature extraction
• Modeling
• Results

Motivation
• Helping hearing-impaired people use telephones.
• Automatic synchronization.
• Low-bandwidth transmission.
• …

Data
"She had your…" (from the VidTIMIT database, Conrad Sanderson)

Inherent difficulties
McGurk effect (Arnt Maasø): Ba + Ga = Da. When the sound "ba" is dubbed onto lip movements for "ga", listeners typically perceive "da".

Feature extraction (sound)
Mel-frequency cepstral coefficients (MFCCs), features commonly used in speech recognition, are extracted from the sound. I have used 13 coefficients. The figure shows the evolution of the first 4.
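A minimal sketch of the usual MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT), written in plain numpy for illustration. The frame length (25 ms), frame step (10 ms), FFT size (512), number of mel filters (26), the 16 kHz sample rate in the usage example, and the unnormalized DCT are all assumptions chosen as common defaults; they are not taken from the talk, which only states that 13 coefficients were used.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (assumed design)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out terms."""
    N = x.shape[-1]
    k = np.arange(N)
    n = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * n * (k + 0.5) / N)  # shape (n_out, N)
    return x @ basis.T

def mfcc(signal, sample_rate, n_mfcc=13, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26):
    """Return an array of shape (n_frames, n_mfcc). Assumes len(signal) >= one frame."""
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    n_frames = 1 + max(0, (len(signal) - flen) // fstep)
    # Index matrix: row t holds the sample indices of frame t
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))  # avoid log(0)
    return dct_ii(log_energies, n_mfcc)
```

For example, one second of a 440 Hz tone sampled at 16 kHz gives 98 frames of 13 coefficients each; plotting the first four columns over time corresponds to the kind of figure described on the slide.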

Feature extraction (images)
An Active Appearance Model (AAM) is used to model the face, which is then described by 14 parameters. The movies show the effect of varying four of the parameters individually.
www.imm.dtu.dk/~aam
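The parameters of an AAM come from principal component analysis: a face is written as a mean plus a weighted sum of modes, and "varying one parameter" means moving along one mode. The sketch below shows only that linear PCA parameterization in numpy. A full AAM additionally aligns the shapes, warps the texture to a reference shape, and combines separate shape and texture models; none of that is shown, and the function names, the random training data, and the ±2 standard deviation sweep are hypothetical choices for illustration.

```python
import numpy as np

def build_linear_model(samples, n_params=14):
    """PCA on training vectors (one row per example).

    Returns the mean vector, a (dim, n_params) matrix of orthonormal modes,
    and the variance captured by each mode.
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Rows of vt are the principal directions, sorted by decreasing singular value
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / (len(samples) - 1)
    return mean, vt[:n_params].T, variances[:n_params]

def synthesize(mean, modes, params):
    """Instance of the model for a given parameter vector."""
    return mean + modes @ params

def vary_one(mean, modes, variances, j, n_std=2.0, steps=5):
    """Sweep parameter j from -n_std to +n_std standard deviations, others fixed at 0."""
    out = []
    for k in np.linspace(-n_std, n_std, steps):
        b = np.zeros(modes.shape[1])
        b[j] = k * np.sqrt(variances[j])
        out.append(synthesize(mean, modes, b))
    return np.array(out)
```

Rendering each row returned by `vary_one` as a face would give one of the movies mentioned on the slide; the middle frame of the sweep is always the mean face.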

Feature extraction (images 2)
Once the model is built, the facial features can be tracked through the video, and the 14 parameters can be extracted from the tracking result.
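If the tracking result for each frame is available as a vector in the model's space, and the model modes are orthonormal (as with PCA), the least-squares parameters for that frame are simply the projection of the vector minus the mean onto the modes. This is a hedged sketch under that assumption; in an actual AAM the parameters are typically produced directly by the iterative model-fitting search, and the helper name `extract_params` is hypothetical.

```python
import numpy as np

def extract_params(tracked, mean, modes):
    """Project tracked vectors (one row per video frame) onto the model modes.

    Assumes `modes` has orthonormal columns, so the least-squares
    parameters for a frame x are modes^T (x - mean).
    Returns an array of shape (n_frames, n_params).
    """
    return (tracked - mean) @ modes
```

Applied frame by frame, this turns the tracked video into a sequence of 14-dimensional parameter vectors, which is the image feature sequence the later slides work with.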

Linear state space model (Kalman filter)
[Figure: hidden states x_1 … x_4 linked in a chain by A; each x_t maps to sound features s_t through B and to image features i_t through C.]
X: hidden space, S: sound, I: image
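Read as a linear-Gaussian state space model, the diagram corresponds to the equations below. The noise terms w_t, v_t, u_t and their covariances Q, R_s, R_i are not named on the slide; they are assumed here in standard Kalman filter notation.

```latex
\begin{aligned}
x_{t+1} &= A\,x_t + w_t, & w_t &\sim \mathcal{N}(0, Q),\\
s_t &= B\,x_t + v_t, & v_t &\sim \mathcal{N}(0, R_s),\\
i_t &= C\,x_t + u_t, & u_t &\sim \mathcal{N}(0, R_i).
\end{aligned}
```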

Finding the hidden state
[Figure: hidden chain x_1 … x_4 linked by A, observed only through the sound features s_1 … s_4 via B.]
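Estimating the hidden state from sound alone amounts to running a Kalman filter on the part of the model that links x_t to s_t. The sketch below is a standard Kalman filter in numpy, assuming x_{t+1} = A x_t + w_t with w_t ~ N(0, Q) and s_t = B x_t + v_t with v_t ~ N(0, R); the noise covariances Q, R and the initial mean and covariance mu0, P0 are assumed inputs, not values from the talk. For offline use, a smoother could be run on top of these filtered estimates.

```python
import numpy as np

def kalman_filter(s, A, B, Q, R, mu0, P0):
    """Filtered estimates of x_t given s_1..s_t.

    s: (T, d_s) sound features. Returns means (T, n) and covariances (T, n, n).
    """
    T = s.shape[0]
    n = A.shape[0]
    means = np.zeros((T, n))
    covs = np.zeros((T, n, n))
    mu_pred, P_pred = mu0, P0
    for t in range(T):
        # Update: correct the prediction with the sound observation s_t
        S = B @ P_pred @ B.T + R
        # Kalman gain K = P_pred B^T S^{-1}, computed with solve (S and P_pred are symmetric)
        K = np.linalg.solve(S, B @ P_pred).T
        mu = mu_pred + K @ (s[t] - B @ mu_pred)
        P = (np.eye(n) - K @ B) @ P_pred
        means[t], covs[t] = mu, P
        # Predict: propagate the estimate to t+1 through the transition matrix A
        mu_pred = A @ mu
        P_pred = A @ P @ A.T + Q
    return means, covs
```

As a sanity check, with a static one-dimensional state (A = 1, Q = 0), prior N(0, 1), observation noise variance 1 and two observations equal to 1, the filter returns posterior means 0.5 and 2/3, matching the closed-form Bayesian update.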

Predicting the image
[Figure: hidden chain x_1 … x_4 linked by A, mapped to image features i_1 … i_4 via C.]
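Once the hidden state has been estimated from sound, the image features follow from the linear map C. Writing \hat{x}_t and P_t for the mean and covariance of the state estimate given the sound s_{1:t}, and R_i for the image noise covariance (notation assumed here, not from the slides), the linear-Gaussian model gives:

```latex
\hat{i}_t = C\,\hat{x}_t,
\qquad
\operatorname{Cov}\!\left(i_t \mid s_{1:t}\right) = C\,P_t\,C^{\top} + R_i .
```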

Training the model (EM)
[Figure: hidden state x linked to sound s through B and to image i through C.]
• Initialize A, B, C and the covariances.
1) Use the Kalman smoother to find the optimal hidden state sequence x_1 … x_T.
2) Find A, B, C and the covariances given the state sequence.
Steps 1) and 2) are repeated until convergence.
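Step 2) can be sketched as three least-squares regressions on a fixed state sequence: x_{t+1} on x_t for A, s_t on x_t for B, and i_t on x_t for C, with the residual covariances as noise estimates. This is a simplification: the exact EM M-step uses the smoothed expectations E[x_t x_t^T] and E[x_{t+1} x_t^T], which include the smoother's posterior covariances, whereas this sketch plugs in a single state sequence. The function name and the returned covariance names Q, R_s, R_i are hypothetical.

```python
import numpy as np

def fit_given_states(X, S, I):
    """Least-squares A, B, C and residual covariances from a fixed state sequence.

    X: (T, n) hidden states, S: (T, d_s) sound features, I: (T, d_i) image features.
    Rows are time steps, so a model y_t = M x_t becomes Y = X M^T.
    """
    def regress(inputs, targets):
        M_T, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
        resid = targets - inputs @ M_T
        cov = resid.T @ resid / len(targets)
        return M_T.T, cov

    A, Q = regress(X[:-1], X[1:])   # transition: x_{t+1} ~ A x_t
    B, R_s = regress(X, S)          # sound:      s_t ~ B x_t
    C, R_i = regress(X, I)          # image:      i_t ~ C x_t
    return A, B, C, Q, R_s, R_i
```

In a training loop, this would alternate with a Kalman smoother that re-estimates X from the sound and image features using the current A, B, C and covariances.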

Test results

Choosing the hidden space
[Figure: model with hidden state x, sound s (via B) and image i.]

Overfitting (training sequence)

Outlook
• Nonlinear state space model.
• Compare with a hidden Markov model approach.
• Larger training set.
• Add emotions.
