
Converting sound to images



Presentation Transcript


  1. Converting sound to images: How lip movements can be adjusted to match speech. M.Sc. PhD student Tue Lehn-Schiøler, www.imm.dtu.dk/~tls

  2. Overview • Introduction to the problem • Feature extraction • Modeling • Results

  3. Motivation • Helping hearing impaired people to use telephones. • Automatic synchronization. • Low bandwidth transmission. • …

  4. Data: "She had your…" From the VidTIMIT database (Conrad Sanderson).

  5. Inherent difficulties: the McGurk effect (Arnt Maasø). Ba + Ga = Da: an audio "Ba" combined with a visual "Ga" is perceived as "Da".

  6. Feature extraction (sound): Mel Frequency Cepstral Coefficients (MFCCs) can be extracted from the sound; these features are often used for speech recognition. I have used 13 parameters. The evolution of the first 4 is shown in the figure.
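A minimal NumPy sketch of how 13 MFCCs per frame could be computed. The slides do not say which implementation or settings were used, so the frame length (25 ms), frame step (10 ms), FFT size (512), number of mel filters (26) and all function names here are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (assumed layout)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate, n_coeffs=13, n_filters=26, n_fft=512,
         frame_len=0.025, frame_step=0.010):
    """Return an (n_frames, n_coeffs) array of MFCCs.

    Assumes the signal is at least one frame long and that one frame
    fits inside n_fft samples.
    """
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    n_frames = 1 + (len(signal) - flen) // fstep
    # Slice the signal into overlapping, Hamming-windowed frames.
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Power spectrum, mel filterbank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    log_energies = np.log(power @ fbank.T + 1e-10)
    # DCT-II of the log energies; keep the first n_coeffs coefficients.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs),
                                    np.arange(n_filters) + 0.5) / n_filters)
    return log_energies @ basis.T
```

Calling `mfcc(signal, 16000)` on a mono signal sampled at 16 kHz gives one row of 13 coefficients per 10 ms step, which is the kind of sound feature sequence the later slides call s_1, s_2, ...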

  7. Feature extraction (images): An Active Appearance Model is used to model the face. The face is described by 14 parameters. The movies show the effect of varying four of the parameters individually. www.imm.dtu.dk/~aam

  8. Feature extraction (images 2): After building the model, the image features can be tracked. The 14 parameters can be extracted from the tracking result.

  9. Linear state space model (Kalman filter): The hidden states x_1 → x_2 → x_3 → x_4 are linked by the transition matrix A. Each hidden state x_t generates the sound features s_t through B and the image features i_t through C. X: hidden space, S: sound, I: image.
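The diagram on this slide corresponds to the usual linear Gaussian state space equations. The noise terms and their covariance symbols (Q, R_s, R_i) are not named on the slides; they are the standard Kalman filter assumptions and are written here only for illustration.

```latex
\begin{align}
x_{t+1} &= A\,x_t + w_t, & w_t &\sim \mathcal{N}(0, Q) \\
s_t     &= B\,x_t + v_t, & v_t &\sim \mathcal{N}(0, R_s) \\
i_t     &= C\,x_t + u_t, & u_t &\sim \mathcal{N}(0, R_i)
\end{align}
```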

  10. Finding the hidden state: The hidden states x_1 ... x_4 (linked by A) are estimated from the sound features s_1 ... s_4 through B.

  11. Predicting the image: The image features i_1 ... i_4 are predicted from the hidden states x_1 ... x_4 (linked by A) through C.
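The two steps above (estimate the hidden state from sound, then map it to image features) can be sketched with a standard Kalman filter. This is an illustration under assumptions rather than the author's code: the noise covariances Q and R, the initial state x0 and P0, and the function names are all hypothetical, and only the forward (filtering) pass is shown.

```python
import numpy as np

def kalman_filter(S, A, B, Q, R, x0, P0):
    """Filtered hidden state means for sound features S of shape (T, ds).

    Assumed model: x_{t+1} = A x_t + w_t,  s_t = B x_t + v_t,
    with w_t ~ N(0, Q) and v_t ~ N(0, R).
    """
    T = S.shape[0]
    dx = A.shape[0]
    X = np.zeros((T, dx))
    x, P = x0, P0
    for t in range(T):
        if t > 0:
            # Time update: propagate the state estimate through A.
            x = A @ x
            P = A @ P @ A.T + Q
        # Measurement update with the sound features at time t.
        innovation = S[t] - B @ x
        innovation_cov = B @ P @ B.T + R
        K = P @ B.T @ np.linalg.inv(innovation_cov)
        x = x + K @ innovation
        P = (np.eye(dx) - K @ B) @ P
        X[t] = x
    return X

def predict_images(S, A, B, C, Q, R, x0, P0):
    """Predict image features of shape (T, di) from sound alone: i_t ≈ C x_t."""
    X = kalman_filter(S, A, B, Q, R, x0, P0)
    return X @ C.T
```

With 13 sound features and 14 image parameters per frame, `B` would have 13 rows and `C` 14 rows; the number of columns is the dimension of the hidden space, which is a free choice.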

  12. Training the model (EM) • Initialize A, B, C and the covariances. 1) Use the Kalman smoother to find the optimal hidden state sequence (x_1 ... x_T). 2) Find A, B, C and the covariances given the state sequence. Repeat 1) and 2) until convergence.
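Step 2 can be sketched as a set of least-squares fits. This is a simplification and an assumption on my part: it treats the smoothed state means as if they were the true states and ignores the state covariance terms that a full EM M-step would include. The function name `m_step` and the returned noise covariance names are hypothetical.

```python
import numpy as np

def m_step(X, S, I):
    """Re-estimate model parameters from a state sequence.

    X: (T, dx) hidden state means, S: (T, ds) sound features,
    I: (T, di) image features. Simplified: state uncertainty is ignored.
    """
    # Each lstsq call solves  inputs @ W = targets  for W; the model
    # matrix is W transposed because the model is written as y = M x.
    A = np.linalg.lstsq(X[:-1], X[1:], rcond=None)[0].T
    B = np.linalg.lstsq(X, S, rcond=None)[0].T
    C = np.linalg.lstsq(X, I, rcond=None)[0].T

    def residual_cov(res):
        # Covariance of the residuals around zero.
        return res.T @ res / len(res)

    Q = residual_cov(X[1:] - X[:-1] @ A.T)    # process noise
    R_s = residual_cov(S - X @ B.T)           # sound observation noise
    R_i = residual_cov(I - X @ C.T)           # image observation noise
    return A, B, C, Q, R_s, R_i
```

In a full EM loop this update would alternate with a Kalman smoother pass that uses both the sound and the image features of the training sequence.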

  13. Test results

  14. Choosing the hidden space

  15. Overfitting (training sequence)

  16. Outlook • Nonlinear state space model. • Compare with hidden Markov model approach. • Larger training set. • Add emotions.
