Converting sound to images
How lip movements can be adjusted to match speech
M.Sc. PhD student Tue Lehn-Schiøler, www.imm.dtu.dk/~tls
Overview
• Introduction to the problem
• Feature extraction
• Modeling
• Results
Motivation
• Helping hearing-impaired people use telephones.
• Automatic synchronization.
• Low-bandwidth transmission.
• …
Data
"She had your…"
From the VidTIMIT database (Conrad Sanderson)
Inherent difficulties
McGurk effect (Arnt Maasø): Ba + Ga = Da
Feature extraction (sound)
Mel Frequency Cepstral Coefficients (MFCCs) can be extracted from the sound; these features are often used for speech recognition. I have used 13 parameters. The evolution of the first 4 is shown in the figure.
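The MFCC computation can be sketched as follows. This is a minimal NumPy illustration, not the code used for the talk: only the 13 coefficients come from the slide, while the 26 mel filters, 25 ms frames, 10 ms step, 512-point FFT and all function names are assumptions chosen because they are common defaults.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate, n_coeffs=13, n_filters=26,
         frame_len=0.025, frame_step=0.010, n_fft=512):
    """Return an (n_frames, n_coeffs) array; the signal must span at least one frame."""
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    n_frames = 1 + (len(signal) - flen) // fstep
    window = np.hamming(flen)
    frames = np.stack([signal[t * fstep:t * fstep + flen] * window
                       for t in range(n_frames)])
    # Power spectrum of each windowed frame (zero-padded to n_fft samples).
    power = np.abs(np.fft.rfft(frames, n_fft, axis=1)) ** 2 / n_fft
    # Log energy in each mel band; the small constant avoids log(0).
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # Unnormalised DCT-II over the mel bands, keeping the first n_coeffs terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return log_energy @ basis.T
```

Each row of the result is one sound feature vector s_t; plotting the first four columns over time gives a picture of the kind described on the slide.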
Feature extraction (images)
An Active Appearance Model is used to model the face. The face is described by 14 parameters. The movies show the effect of varying four of the parameters individually.
www.imm.dtu.dk/~aam
Feature extraction (images 2)
Once the model has been built, the image features can be tracked. The 14 parameters can then be extracted from the tracking result.
Linear state space model (Kalman filter)
[Diagram: the hidden states x_1 to x_4 form a chain linked by A; each hidden state x_t emits a sound s_t through B and an image i_t through C.]
X: Hidden space, S: Sound, I: Image
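Written out as equations, the diagram corresponds to the standard linear Gaussian state space model. The noise terms w_t, v_t, u_t and their covariances Q, R_s, R_i are notation assumed here; the slides name only A, B, C and "covariance".

```latex
\begin{aligned}
x_{t+1} &= A\,x_t + w_t, & w_t &\sim \mathcal{N}(0, Q) \\
s_t     &= B\,x_t + v_t, & v_t &\sim \mathcal{N}(0, R_s) \\
i_t     &= C\,x_t + u_t, & u_t &\sim \mathcal{N}(0, R_i)
\end{aligned}
```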
Finding the hidden state
[Diagram: the hidden states x_1 to x_4 are inferred from the sounds s_1 to s_4 using A and B.]
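A minimal sketch of this step is the standard Kalman filter recursion, which estimates each hidden state from the sound observed so far. The function name, the noise covariances Q and R, and the prior x0, P0 are assumptions for illustration; the slides do not give an implementation.

```python
import numpy as np

def kalman_filter(sounds, A, B, Q, R, x0, P0):
    """Filtered hidden-state means, one row per sound frame.

    sounds: (T, n_s) sound features; A: (d, d); B: (n_s, d);
    Q, R: process and observation noise covariances (assumed known);
    x0, P0: prior mean and covariance of the state before the first frame.
    """
    x, P = x0, P0
    means = []
    for s in sounds:
        # Predict: push the previous estimate through the dynamics.
        x = A @ x
        P = A @ P @ A.T + Q
        # Update: correct the prediction with the observed sound frame.
        S = B @ P @ B.T + R                 # innovation covariance
        K = P @ B.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ (s - B @ x)
        P = (np.eye(len(x)) - K @ B) @ P
        means.append(x)
    return np.array(means)
```

The filter uses only past and present sound frames, so it could run online; a smoother, which also looks at future frames, is what the training slide refers to.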
Predicting the image
[Diagram: the images i_1 to i_4 are predicted from the hidden states x_1 to x_4 through C.]
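Given an estimate of the hidden state, the image prediction is a single linear map. The sketch below assumes a Gaussian state estimate with mean x_mean and covariance x_cov and an image noise covariance R_i; these names are illustrative and do not come from the slides.

```python
import numpy as np

def predict_image(x_mean, x_cov, C, R_i):
    """Predicted image-parameter mean and covariance for one frame."""
    mean = C @ x_mean                 # expected image parameters
    cov = C @ x_cov @ C.T + R_i       # uncertainty carried over from the state
    return mean, cov
```

In the setting of the talk, mean would hold the 14 Active Appearance Model parameters from which a face image can be rendered.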
Training the model (EM)
[Diagram: the hidden state x emits the sound s through B and the image i through C.]
• Initialize A, B, C and covariance
1) Use Kalman smoother to find optimal hidden state sequence (x_1 to x_T)
2) Find A, B, C and covariance given the state sequence
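Step 2 can be illustrated with ordinary least squares. This is a simplified sketch: it treats the smoothed state sequence as if it were known exactly, whereas a full EM M-step would also use the state covariances returned by the smoother. All function and variable names are assumptions.

```python
import numpy as np

def fit_linear(inputs, targets):
    """Least-squares W with targets ≈ inputs @ W.T, plus the residual covariance."""
    W = np.linalg.lstsq(inputs, targets, rcond=None)[0].T
    resid = targets - inputs @ W.T
    return W, resid.T @ resid / len(inputs)

def m_step(X, S, I):
    """Re-estimate the model from states X (T, d), sound S (T, n_s), images I (T, n_i)."""
    A, Q = fit_linear(X[:-1], X[1:])   # dynamics: x_{t+1} ≈ A x_t
    B, R_s = fit_linear(X, S)          # sound:    s_t ≈ B x_t
    C, R_i = fit_linear(X, I)          # image:    i_t ≈ C x_t
    return A, B, C, Q, R_s, R_i
```

Alternating a smoothing pass with this update until the parameters stop changing gives the overall shape of the procedure listed above.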
Test results
Choosing the hidden space
[Diagram: hidden state x, sound s and image i.]
Overfitting (training sequence)
Outlook
• Nonlinear state space model.
• Compare with hidden Markov model approach.
• Larger training set.
• Add emotions.