
Application: Surveillance in trains

Explore the challenges and advancements in visual speech recognition using lip-reading and facial expression analysis. Learn about the benefits of combined audio and visual signals for enhanced speech recognition accuracy.


Presentation Transcript


  1. Application: Surveillance in trains • Video and audio processing • Sound localization, pattern recognition

  2. Model-based approach: Automatic recognition of facial expressions and lip-reading using vector flow • Lip-reading • Facial expression recognition

  3. What makes visual speech recognition so hard? • Visemes → smaller word separability (several phonemes share one visible lip shape) • Speech info in audio > speech info in video
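The first point on slide 3 can be illustrated with a minimal sketch: when several phonemes map to one viseme, distinct words collapse to the same visual sequence. The phoneme-to-viseme grouping and the word list below are simplified assumptions for illustration, not the mapping used in the presentation.

```python
# Hypothetical, simplified phoneme-to-viseme grouping (assumed for illustration only).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
    "a": "open_vowel",
}


def to_visemes(phonemes):
    """Map a phoneme sequence to the viseme sequence a lip-reader would see."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def group_by_visemes(words):
    """Group words whose viseme sequences are identical."""
    groups = {}
    for word, phonemes in words.items():
        groups.setdefault(to_visemes(phonemes), []).append(word)
    return groups


# Toy lexicon: four acoustically distinct words.
WORDS = {
    "pat": ["p", "a", "t"],
    "bad": ["b", "a", "d"],
    "man": ["m", "a", "n"],
    "fat": ["f", "a", "t"],
}

if __name__ == "__main__":
    for visemes, members in group_by_visemes(WORDS).items():
        print(visemes, members)
```

Under this assumed grouping, "pat", "bad" and "man" are indistinguishable on the lips, while "fat" remains separable.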

  4. Lip-reading by Humans • People recognize speech better when the signal is both auditory and visual • The difference in recognition rates grows with the level of noise in the environment

  5. Inspiration • In Stanley Kubrick's 1968 film 2001: A Space Odyssey, the computer reads the conversation of two astronauts from their lip movements. • Thirty years later, automated lip-reading became a significant part of research in speech recognition systems.

  6. AV speech corpus New speech corpus

  7. Databases of different quality and resolution

  8. AV speech corpus Recording a new speech corpus Applications|Problem|ASR|VSR|Training|Analysis|Conclusion|Recommendations Visemes|Corpus|Tracking|Features

  9. AV speech corpus Recording a new speech corpus

  10. AV speech corpus New speech corpus • Dutch • Recorded at high speed: 100 fps • Front and profile views included • 70 people: 49 male, 21 female • Students, professors, secretaries, friends • Utterances: sentences, digits, spelling, conversation starters/endings, open questions • Speaking styles: normal, fast, whispering

  11. AV speech corpus New speech corpus

  12. Lip-reading by Humans • People recognize speech better when the signal is both auditory and visual • The difference in recognition rates grows with the level of noise in the environment

  13. ISFER Workbench Examples (continued)

  14. Active Contours • Internal and external energies • Internal energy forces the contour to shrink • Locally defined external energy forces the contour to stop at the edge of the mouth • Computationally cheap • Sensitive to the initial setting of the contour
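The interplay of the two energies on slide 14 can be sketched with a greedy snake update. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the function name `greedy_snake_step`, the weights `alpha` and `beta`, and the 3x3 search window are all hypothetical choices. The internal energy is the squared distance to both neighbours (which makes the contour shrink), and the external energy is the negative edge strength at the candidate pixel (which holds the contour at an edge).

```python
import numpy as np


def greedy_snake_step(points, edge_map, alpha=1.0, beta=1.0):
    """One greedy pass over a closed contour.

    points   : (N, 2) integer array of (row, col) contour points
    edge_map : 2D array, large where the image has strong edges
    alpha    : weight of the internal (shrinking) energy  (assumed parameter)
    beta     : weight of the external (edge) energy       (assumed parameter)
    """
    pts = points.copy()
    n = len(pts)
    h, w = edge_map.shape
    for i in range(n):
        prev_pt, next_pt = pts[i - 1], pts[(i + 1) % n]
        best, best_energy = pts[i].copy(), np.inf
        # Try the current position and its 8 neighbours.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                cand = pts[i] + np.array([dr, dc])
                if not (0 <= cand[0] < h and 0 <= cand[1] < w):
                    continue
                # Internal energy: short links to both neighbours are preferred.
                internal = np.sum((cand - prev_pt) ** 2) + np.sum((cand - next_pt) ** 2)
                # External energy: strong edges lower the energy locally.
                external = -edge_map[cand[0], cand[1]]
                energy = alpha * internal + beta * external
                if energy < best_energy:
                    best, best_energy = cand, energy
        pts[i] = best
    return pts
```

Each point only inspects its immediate neighbourhood, which is why the method is cheap and also why a poorly placed initial contour can settle on the wrong edge.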

  15. Template Matching • Internal and external energies • Internal energy forces template to maintain geometry • Globally defined external energy forces appropriate placement on the picture • Better results than with snakes • Integration of energy functions at each step can be very time consuming

  16. Model • Goal: lip-reading • Needed: an accurate description of the visible parts of the articulatory system • Accurate description of the shape of the mouth: measurements of the distance from the lips to the center of the mouth, and measurements of the thickness of the visible part of the lips

  17. Data processing (continued) • Filtered image • intensity distribution • center of mouth • Image in polar coordinates • Conditional distribution • Mean and variance functions
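One plausible reading of the steps on slide 17 is sketched below: the filtered image is treated as an intensity distribution, its centroid is taken as the center of the mouth, and for each direction the intensity-weighted mean and variance of the radial distance are computed. The function name `polar_mean_variance`, the angular binning, and the use of the centroid as center are assumptions made for this sketch; the slides do not specify these details.

```python
import numpy as np


def polar_mean_variance(filtered, n_bins=8):
    """Mean and variance of radial distance per angular bin.

    filtered : 2D non-negative array (e.g. output of a lip color filter)
    n_bins   : number of angular sectors (assumed parameter)
    Returns (center, mean, variance); empty sectors yield NaN.
    """
    filtered = np.asarray(filtered, dtype=float)
    rows, cols = np.indices(filtered.shape)

    # Center of the mouth: intensity-weighted centroid of the filtered image.
    total = filtered.sum()
    center_r = (rows * filtered).sum() / total
    center_c = (cols * filtered).sum() / total

    # Polar coordinates of every pixel relative to that center.
    radius = np.hypot(rows - center_r, cols - center_c)
    angle = np.arctan2(rows - center_r, cols - center_c)  # in [-pi, pi]
    bins = np.floor((angle + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins

    w, b, r = filtered.ravel(), bins.ravel(), radius.ravel()
    weight = np.bincount(b, weights=w, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        # Conditional distribution of radius given the angle, summarized
        # by its first two moments.
        mean = np.bincount(b, weights=w * r, minlength=n_bins) / weight
        var = np.bincount(b, weights=w * r ** 2, minlength=n_bins) / weight - mean ** 2
    return (center_r, center_c), mean, var
```

In this reading, the mean function tracks the distance of the lip from the mouth center in each direction, and the variance function relates to the visible thickness of the lips, matching the two measurements listed on slide 16.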

  18. Data visualization • Single frame data vector:

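  19. Results of Experiments • Feed-forward BP (backpropagation) network • Utterance: "Vanmiddag komt de pianostemmer langs om mijn vleugel te stemmen" ("This afternoon the piano tuner is coming by to tune my grand piano")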

  20. Face tracking Tracking the face – Optical flow • Capturing the apparent motion between subsequent images in a grid of motion vectors • Advantages • No lip model required • Good at capturing motion • Disadvantage • Slow
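The idea of a grid of motion vectors can be sketched with exhaustive block matching, used here as a simple stand-in for an optical flow method; the slides do not say which algorithm was used. The function name `block_motion_vectors` and the parameters `block` and `search` are hypothetical.

```python
import numpy as np


def block_motion_vectors(prev, curr, block=8, search=3):
    """Estimate one (d_row, d_col) motion vector per block of `prev`.

    prev, curr : 2D grayscale frames of equal shape
    block      : block size in pixels (assumed parameter)
    search     : maximum displacement tested in each direction (assumed parameter)
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    h, w = prev.shape
    grid = np.zeros((h // block, w // block, 2), dtype=int)
    for bi in range(h // block):
        for bj in range(w // block):
            r0, c0 = bi * block, bj * block
            ref = prev[r0:r0 + block, c0:c0 + block]
            best_err = np.inf
            # Exhaustively test every displacement inside the search window.
            for dr in range(-search, search + 1):
                for dc in range(-search, search + 1):
                    r, c = r0 + dr, c0 + dc
                    if r < 0 or c < 0 or r + block > h or c + block > w:
                        continue
                    err = np.sum((curr[r:r + block, c:c + block] - ref) ** 2)
                    if err < best_err:
                        best_err = err
                        grid[bi, bj] = (dr, dc)
    return grid
```

No lip model appears anywhere in this sketch, and the four nested loops hint at why the approach is listed as slow.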

  21. Face tracking Tracking the face – Lip Geometry Estimation • Applying some color filters and capturing the lip contours in polar coordinates • Advantages • No lip model required • More or less person-independent • Disadvantage • Not robust to external factors

  22. Face tracking Tracking the face – Active Appearance Models • Point tracking according to a statistical lip model • Disadvantage • Requires annotated training images • Advantages • Robust against external factors • Fast!
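The phrase "statistical lip model" can be made concrete with a sketch of the shape part of such a model: a mean shape plus a few principal modes of variation learned from annotated landmark sets. This covers only a point distribution model, not the appearance model or the fitting procedure of a full Active Appearance Model, and the function names `fit_shape_model` and `project_shape` are hypothetical.

```python
import numpy as np


def fit_shape_model(shapes, n_modes=2):
    """Learn a linear shape model from annotated training shapes.

    shapes  : (n_samples, n_points, 2) array of landmark coordinates,
              assumed to be already aligned
    n_modes : number of principal modes to keep (assumed parameter)
    Returns (mean_vector, modes) with modes of shape (n_modes, n_points * 2).
    """
    X = shapes.reshape(len(shapes), -1)
    mean = X.mean(axis=0)
    # Principal modes of variation via SVD of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_modes]


def project_shape(shape, mean, modes):
    """Express a shape by its mode coefficients and reconstruct it."""
    b = modes @ (shape.ravel() - mean)
    reconstruction = (mean + modes.T @ b).reshape(shape.shape)
    return b, reconstruction
```

The need for `shapes` as input is exactly the disadvantage named on the slide: every training image must be annotated with landmark points by hand.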

  23. Face tracking Active Appearance Models – Design of the lip model

  24. Face tracking AAM model point coordinates

  25. Feature extraction Features plotted for “F” over time (frames)

  26. 5-state HMM
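A 5-state HMM for a speech unit is commonly given a left-to-right topology and scored with the forward algorithm. The sketch below assumes that topology, discrete observation symbols, and a single self-loop probability `stay`; none of these details are stated on the slide, and the function names are hypothetical.

```python
import numpy as np


def left_to_right_transitions(n_states=5, stay=0.6):
    """Transition matrix where each state either stays or moves one step right."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = stay
        A[i, i + 1] = 1.0 - stay
    A[-1, -1] = 1.0  # final state is absorbing
    return A


def forward_log_likelihood(A, B, pi, obs):
    """Log-likelihood of a discrete observation sequence (scaled forward algorithm).

    A   : (S, S) transition matrix
    B   : (S, K) emission matrix, B[s, k] = P(symbol k | state s)
    pi  : (S,) initial state distribution
    obs : sequence of symbol indices
    """
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for o in obs[1:]:
        # Rescale at every step to avoid numerical underflow on long sequences.
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = ((alpha / scale) @ A) * B[:, o]
    return log_lik + np.log(alpha.sum())
```

In a recognizer, one such model would typically be trained per unit (for example per viseme or word), and the unit whose model gives the highest log-likelihood for the observed feature sequence would be chosen.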

  27. Model-based approach: Automatic recognition of facial expressions using an Active Appearance Model • Automatic bi-modal human emotion recognition

  28. Face localization

  29. M.A.E.L.I.A. Our digital cat H.C.I. Group

  30. H.C.I. Group

  31. H.C.I. Group

  32. Requirements in other words… Get a life! I am still sleeping! AIBO! Let’s play!!! Follow me Are you out of your mind? I am sleeping!!! AIBO! Bring me my newspaper!!! 8:00 AM 7:00 AM I am so bored! I wish I had a companion! I feel so lonely!!! I am very sad and depressed. AIBO! Let’s play!!! Follow me Finally I have a friend! I am so happy and I even managed to pick up the bone! Wow!!! 11:00 AM 14:00 AM 16:00 AM

  33. Multimodal Communication Uh, what a nerd I want a date She looks nice Uh, …. I have no time to do anything with you Hello, do you like to chat with me?

  34. Multi-modal interaction

  35. Would you like to join me for dinner?

  36. Chat-session • A cup of tea? • Mmh, njeh, I don’t like tea. • What’s wrong with tea? • Tea makes me sick. • That’s nonsense!! • And my sister doesn’t like you too! • She is very disappointed!! • Hihi, I was joking!!! • Oh, that’s funny!!!

  37. Chat-session • (f) A cup of tea? : - ) • (m) Mmh, njeh, I don’t like tea. (: - ( • (f) What’s wrong with tea? : - o • (m) Tea makes me sick. % - \ • (f) That’s nonsense!! : - l l • (f) My sister doesn’t like you too! : - l l • (f) She is very disappointed!! : - ( • (m) Hihi, I was joking!!! ; - ) • (f) Oh, that’s funny!!! : - ]

  38. A cup of tea? : - )

  39. Mmh, njeh, I don’t like tea. (: - (
