
Elmar Nöth, Andreas Maier, Michael Stürmer, Maria Schuster: Towards Multimodal Evaluation of Speech Pathologies


Presentation Transcript


  1. Elmar Nöth, Andreas Maier, Michael Stürmer, Maria Schuster: Towards Multimodal Evaluation of Speech Pathologies. Friday, 06 February 2009

  2. Outline
     • Peaks – A system for the evaluation of pathologic speech
     • Examples where multimodality is important
       • Emotional disorders → eye tracking & bio signals
       • Facial paralysis → facial expression
     • 3D information using Time-of-Flight camera
     • Real-time transmission of multimodal data
     • Outlook & summary

  3. Cleft Lip and Palate
     • Structural malformations of
       • the nose
       • the throat
       • the mouth
       • the jaw
     • Negative effects on
       • respiration
       • nutrition
       • hearing
       • speaking
       • psychosocial competence
     • Prevalence: 1 : 500-700

  4. Laryngectomees
     • Removal of the larynx due to cancer
     • Breathing is redirected through the tracheostoma

  5. Laryngectomees
     • Removal of the larynx due to cancer
     • Breathing is redirected through the tracheostoma
     • Speaking is enabled by a substitute voice

  6. Motivation
     • Problem:
       • There is no objective, validated method to measure intelligibility reliably
       • In clinical practice: subjective evaluation only
     • Solution:
       • Application of an automatic speech recognition (ASR) system to assess intelligibility

  7. Approach
     • Recording of the speech data:
       • Client PC with unknown operating system
       • Different tests for different patients
     • Automatic analysis of the speech data on a server system
     • A few minutes after the recording: an automatically generated report is available

  8. Architecture
     [Diagram: client and server linked by secure transmission. Labeled elements: audio data recording; feature extraction (MFCC); audio data; speech features; speech analysis; speech recognition; recognized word chain; scoring; report]

  9. Subjective Evaluation
     • Evaluation of the audio data by speech experts
     • On a scale from 1 to 5
     • For each turn
     • Averaging for each speaker leads to a continuous scale from 1 to 5
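The per-speaker averaging described on this slide can be sketched in a few lines of Python. This is only an illustration, not the Peaks implementation: the function name `speaker_scores`, the `(speaker, score)` input layout, and the choice to pool all turn-level expert ratings of a speaker into a single mean are assumptions.

```python
from collections import defaultdict
from statistics import mean


def speaker_scores(ratings):
    """Average turn-level ratings (integers 1..5) into one score per speaker.

    `ratings` is an iterable of (speaker_id, score) pairs, one pair per
    rated turn (hypothetical layout, chosen for this sketch).
    """
    per_speaker = defaultdict(list)
    for speaker, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score {score!r} is outside the 1..5 scale")
        per_speaker[speaker].append(score)
    # The mean of discrete 1..5 ratings is a continuous value in [1, 5].
    return {speaker: mean(scores) for speaker, scores in per_speaker.items()}


if __name__ == "__main__":
    example = [("spk_a", 1), ("spk_a", 2), ("spk_b", 5)]
    print(speaker_scores(example))
```

Averaging discrete turn ratings is what turns the five-point scale into a continuous per-speaker value that can be compared with an automatic measure.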

  10. Speech Intelligibility (children)

  11. Speech Intelligibility (adults)

  12. Speech Intelligibility (general)

  13. Outline
      • Peaks – A system for the evaluation of pathologic speech
      • Examples where multimodality is important
        • Emotional disorders → eye tracking & bio signals
        • Facial paralysis → facial expression
      • 3D information using Time-of-Flight camera
      • Real-time transmission of multimodal data
      • Outlook & summary

  14. Need for Multimodality: Emotional disorders

  15. Need for Multimodality: Emotional disorders

  16. Need for Multimodality: Emotional disorders

  17. Need for Multimodality: Emotional disorders

  18. Need for Multimodality: Emotional disorders

  19. Need for Multimodality: Emotional disorders

  20. Need for Multimodality: Emotional disorders

  21. Outline
      • Peaks – A system for the evaluation of pathologic speech
      • Examples where multimodality is important
        • Emotional disorders → eye tracking & bio signals
        • Facial paralysis → facial expression
      • 3D information using Time-of-Flight camera
      • Real-time transmission of multimodal data
      • Outlook & summary

  22. 3D Camera: Principle

  23. Time-of-Flight (ToF) 3D Camera
      • up to 50 Hz
      • more than 25k 3D points (176 × 144 pixels)
      • eye-safe infrared light / no exposure

  24. Need for Multimodality: Facial Paresis

  25. Need for Multimodality: Facial Paresis

  26. Need for Multimodality: Facial Paresis

  27. Outline
      • Peaks – A system for the evaluation of pathologic speech
      • Examples where multimodality is important
        • Emotional disorders → eye tracking & bio signals
        • Facial paralysis → facial expression
      • 3D information using Time-of-Flight camera
      • Real-time transmission of multimodal data
      • Outlook & summary

  28. Real-Time Transmission for Telemedicine
      • Disease patterns are complex in many cases
      • Need for specially trained therapists
      • Reduced mobility of the patient
      → Telemedical treatment
      → Real-time transmission of multimodal data

  29. Telemedicine
      • Secure transmission
      • Sufficient bandwidth
      • Video streaming with the open-source software FFmpeg (http://ffmpeg.org)

  30. MPEG: YUV Coding
      [Image panels: Y, U, V]

  31. MPEG: YUV Coding
      [Image panels: Y, U, V]
      • Y: 8 bit / pixel
      • U: 8 bit / 4 pixels
      • V: 8 bit / 4 pixels
      • YUV: 12 bit / pixel
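The 12 bit/pixel total follows from sharing each U and V sample among four pixels. A small sketch of that arithmetic is given here; the function name `yuv_bits_per_pixel` and its parameters are illustrative, not part of FFmpeg or any MPEG library.

```python
from fractions import Fraction


def yuv_bits_per_pixel(y_bits=8, u_bits=8, v_bits=8, pixels_per_chroma_sample=4):
    """Average bits per pixel when each U and V sample covers several pixels."""
    luma = Fraction(y_bits)
    chroma = Fraction(u_bits + v_bits, pixels_per_chroma_sample)
    return luma + chroma


if __name__ == "__main__":
    # 8 bit luma + (8 + 8) / 4 bit chroma per pixel
    print(yuv_bits_per_pixel())  # 12
```

With the slide's values, luma contributes 8 bit and the two chroma channels together contribute 4 bit per pixel.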

  32. Video Information to be Transmitted
      • ≈ 15 frames/second
      • currently ≈ 25,000 pixels/frame (176 × 144); next version: ≈ 40,000 pixels/frame (204 × 204)
      • Per pixel:
        • amplitude currently ignored
        • depth encoded with 8 bit and transmitted in the Y channel of the YUV coding
        • XYZ coordinates ignored; they can be recovered from depth & camera parameters
        • U & V channels (4 bit/pixel) transmitted but currently ignored (can be used to transmit amplitude or to improve depth resolution)
      • 15 × 176 × 144 × 12 bit/second + audio ≈ 0.66 MByte/second
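The uncompressed data rate on this slide can be reproduced with a short calculation. The sketch below assumes the audio is 44.1 kHz mono PCM at 16 bit per sample; the sample rate and channel count appear on the results slide, but the 16-bit sample width is an assumption chosen because it reproduces the stated total. The function name `raw_stream_rate` is hypothetical.

```python
def raw_stream_rate(fps=15, width=176, height=144, bits_per_pixel=12,
                    audio_hz=44_100, audio_bits=16, audio_channels=1):
    """Return (video_bytes_per_s, audio_bytes_per_s) for the uncompressed stream.

    audio_bits=16 is an assumption; the slides only give 44.1 kHz mono.
    """
    video_bytes = fps * width * height * bits_per_pixel // 8
    audio_bytes = audio_hz * audio_bits * audio_channels // 8
    return video_bytes, audio_bytes


if __name__ == "__main__":
    video, audio = raw_stream_rate()
    total = video + audio
    print(video, audio, total)      # 570240 88200 658440
    print(f"{total / 1e6:.2f} MByte/second")  # 0.66 MByte/second
```

Calling `raw_stream_rate(width=204, height=204)` gives the corresponding raw rate for the announced 204 × 204 sensor.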

  33. Experimental Results
      • Speed: FFmpeg transmission of 3D video + audio (MP3) in real time (15 frames/second, depth only, 44.1 kHz mono) at < 50 kByte/second → can be done via standard DSL
      • Accuracy:
        • depends on range; here: minimum distance = 50 cm → range = maximum distance − 50 cm → range quantized with 256 steps (limited by the 8-bit Y channel)
        • MPEG compression adds further error → error measured after MPEG encoding/decoding
        • software-based averaging over 5 frames
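The 8-bit depth quantization and the 5-frame averaging can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the linear mapping of the range onto codes 0..255, the clamping of out-of-range depths, and the function names are all hypothetical, and the MPEG compression error mentioned above is not modelled.

```python
def quantize_depth(depth_cm, min_cm=50.0, range_cm=50.0, levels=256):
    """Map a depth in [min_cm, min_cm + range_cm] to an integer code 0..levels-1."""
    rel = (depth_cm - min_cm) / range_cm
    rel = min(max(rel, 0.0), 1.0)  # clamp depths outside the range (assumption)
    return round(rel * (levels - 1))


def dequantize_depth(code, min_cm=50.0, range_cm=50.0, levels=256):
    """Invert quantize_depth up to the quantization error."""
    return min_cm + code / (levels - 1) * range_cm


def temporal_average(frames):
    """Pixel-wise mean over a list of equally sized frames (flat lists of depths)."""
    n = len(frames)
    return [sum(pixel_values) / n for pixel_values in zip(*frames)]


if __name__ == "__main__":
    code = quantize_depth(73.3)
    restored = dequantize_depth(code)
    print(code, restored, abs(restored - 73.3))

    # Averaging 5 consecutive (hypothetical) frames of two pixels each
    last_five = [[70.0, 80.0], [70.2, 80.2], [69.8, 79.8], [70.1, 80.1], [69.9, 79.9]]
    print(temporal_average(last_five))
```

In this mapping the worst-case quantization error is half a step, so a larger range directly coarsens the depth resolution, which is consistent with accuracy depending on range.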

  34. Experimental Results

  35. Experimental Results
      [Image panels:]
      • Original
      • Range: 35 cm, error: 1.6 mm
      • Range: 50 cm, error: 2.2 mm
      • Range: 90 cm, error: 3.7 mm

  36. Outlook
      • 3D image has low resolution but a high-resolution depth map
      • Registration of low-resolution 3D with high-resolution 2D → high-quality videos for real-time telemedical therapy
      • Localization of eyes, mouth, etc. in 3D images is fast and less error-prone than in 2D images → improved symmetry features for therapy diagnosis
      • Implementation of real-time prototypes for
        • Audio + 3D-TOF + 2D webcam
        • Audio + eye tracking + biosignals for telemedical and biofeedback therapy

  37. Summary
      • Peaks: A system for the evaluation of pathologic speech
        • Offline, audio only, tested on different pathologies
      • Examples where multimodality is important
        • Emotional disorders → eye tracking & bio signals
        • Facial paralysis → facial expression in 3D
      • 3D information using Time-of-Flight camera
      • Real-time transmission of multimodal data
        • Standard video streaming and information reduction → acceptable-quality 3D images with standard DSL
      • Outlook: Registration of 3D with 2D image → high-quality visualization, error-free symmetry features

  38. Thank you for your kind attention
      Supported by Deutsche Forschungsgemeinschaft (DFG) and Deutsche Krebshilfe (German Cancer Aid)
      noeth@informatik.uni-erlangen.de
