Elmar Nöth, Andreas Maier, Michael Stürmer, Maria Schuster
Towards Multimodal Evaluation of Speech Pathologies
Friday, 06 February 2009
Outline
• Peaks: a system for the evaluation of pathologic speech
• Examples where multimodality is important
  • Emotional disorders → eye tracking & biosignals
  • Facial paralysis → facial expression
• 3D information using a Time-of-Flight camera
• Real-time transmission of multimodal data
• Outlook & summary
Cleft Lip and Palate
• Structural malformations of the nose, the throat, the mouth, and the jaw
• Negative effects on respiration, nutrition, hearing, speaking, and psychosocial competence
• Prevalence: 1 in 500-700
Laryngectomees
• Removal of the larynx due to cancer
• Breathing is rerouted through the tracheostoma
• Speech is enabled by a substitute voice
Motivation
• Problem:
  • There is no objective, validated method for measuring intelligibility reliably
  • In clinical practice, intelligibility is only evaluated subjectively
• Solution:
  • Use an automatic speech recognition (ASR) system to assess intelligibility
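The slides do not specify which recognizer output serves as the intelligibility measure. A common choice is word accuracy, WA = 100 · (N - S - D - I) / N. Here N is the number of words in the reference text, and S, D, and I are the substitutions, deletions, and insertions in a minimum-edit-distance alignment. The Python sketch below assumes that measure. The function names are hypothetical and are not taken from Peaks.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (cost 0 if the words match)
            )
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy in percent: 100 * (1 - (S + D + I) / N); may be negative."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * (1.0 - edit_distance(ref, hyp) / len(ref))


# One deleted word out of six reference words.
print(word_accuracy("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, WA can become negative. The word recognition rate, WR = 100 · (N - S - D) / N, ignores insertions and is often reported alongside it.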
Approach
• Recording of the speech data:
  • on a client PC with an arbitrary operating system
  • with different tests for different patients
• Automatic analysis of the speech data on a server system
• A few minutes after the recording, an automatically generated report is available
Architecture (diagram: client and server)
• Client: recording of the audio data
• Secure transmission of the audio data to the server
• Server: feature extraction (MFCC), speech recognition (→ recognized word chain), speech analysis (→ speech features), scoring, report
• Secure transmission of the report back to the client
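The architecture lists MFCC feature extraction but gives no parameters. The sketch below therefore uses typical textbook values, which are assumptions and not the Peaks configuration:
• 16 kHz audio, 25 ms frames with a 10 ms shift
• pre-emphasis 0.97 and a Hamming window
• 26 mel filters and 13 coefficients

It uses a naive DFT so that it stays dependency-free. A real system would use an FFT.

```python
import cmath
import math


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> list[list[float]]:
    """Triangular filters spaced evenly on the mel scale, defined on DFT bins 0..n_fft//2."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    mel_points = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [min(int((n_fft + 1) * mel_to_hz(m) / sample_rate), n_bins - 1) for m in mel_points]
    bank = []
    for k in range(1, n_filters + 1):
        left, center, right = bins[k - 1], bins[k], bins[k + 1]
        filt = [0.0] * n_bins
        for b in range(left, center):
            filt[b] = (b - left) / (center - left)    # rising slope
        for b in range(center, right):
            filt[b] = (right - b) / (right - center)  # falling slope
        bank.append(filt)
    return bank


def mfcc(signal: list[float], sample_rate: int = 16000, frame_len: int = 400, hop: int = 160,
         n_fft: int = 512, n_filters: int = 26, n_ceps: int = 13) -> list[list[float]]:
    """Return n_ceps cepstral coefficients for every complete frame of the signal."""
    if not signal:
        return []
    # Pre-emphasis y[n] = x[n] - 0.97 * x[n-1] boosts high frequencies.
    emph = [signal[0]] + [signal[n] - 0.97 * signal[n - 1] for n in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [emph[start + n] * window[n] for n in range(frame_len)]
        # Power spectrum of the zero-padded frame via a naive DFT.
        power = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                         for n in range(frame_len))) ** 2 / n_fft
                 for k in range(n_fft // 2 + 1)]
        # Log mel filterbank energies, floored to avoid log(0).
        log_mel = [math.log(max(sum(w * p for w, p in zip(filt, power)), 1e-10)) for filt in bank]
        # DCT-II of the log energies yields the cepstral coefficients.
        features.append([sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                             for m, e in enumerate(log_mel))
                         for c in range(n_ceps)])
    return features
```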
Subjective Evaluation
• Speech experts rate the audio data of each turn on a scale from 1 to 5
• Averaging the ratings for each speaker yields a continuous score between 1 and 5
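Here is a minimal sketch of the per-speaker averaging. It assumes the ratings arrive as hypothetical (speaker ID, rating) pairs, one per rated turn. Ratings from several experts could be pooled in the same list.

```python
from collections import defaultdict
from statistics import fmean


def speaker_scores(ratings: list[tuple[str, int]]) -> dict[str, float]:
    """Average per-turn ratings (1-5) into one continuous score per speaker."""
    per_speaker: dict[str, list[int]] = defaultdict(list)
    for speaker, rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1-5 scale")
        per_speaker[speaker].append(rating)
    return {speaker: fmean(values) for speaker, values in per_speaker.items()}


scores = speaker_scores([("s1", 2), ("s1", 3), ("s1", 3), ("s2", 5), ("s2", 4)])
print(scores)
```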
Speech Intelligibility (children)
Speech Intelligibility (adults)
Speech Intelligibility (general)
Need for Multimodality: Emotional disorders
Time-of-Flight (ToF) 3D Camera
• Frame rate up to 50 Hz
• More than 25k 3D points per frame (176 × 144 = 25,344 pixels)
• Eye-safe infrared light / no exposure
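Each of the 176 × 144 pixels yields one 3D point. Assume an ideal pinhole camera whose depth value is the Z coordinate along the optical axis. A pixel (u, v) with depth Z then maps to X = (u - cx) · Z / fx and Y = (v - cy) · Z / fy. The intrinsics fx, fy, cx, cy and all names below are hypothetical. If the camera reports the distance along each viewing ray instead, that distance would have to be converted to Z first. Lens distortion is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinholeIntrinsics:
    """Hypothetical camera parameters: focal lengths and principal point, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u: int, v: int, depth: float, cam: PinholeIntrinsics) -> tuple[float, float, float]:
    """Map pixel (u, v) with depth Z along the optical axis to camera coordinates (X, Y, Z)."""
    x = (u - cam.cx) * depth / cam.fx
    y = (v - cam.cy) * depth / cam.fy
    return (x, y, depth)


def depth_image_to_points(depth: list[list[float]],
                          cam: PinholeIntrinsics) -> list[tuple[float, float, float]]:
    """One 3D point per pixel; depth is indexed as depth[row v][column u]."""
    return [backproject(u, v, z, cam) for v, row in enumerate(depth) for u, z in enumerate(row)]


cam = PinholeIntrinsics(fx=200.0, fy=200.0, cx=88.0, cy=72.0)
points = depth_image_to_points([[1.0] * 176 for _ in range(144)], cam)
print(len(points))
```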
Need for Multimodality: Facial Paresis
Real-Time Transmission for Telemedicine
• Disease patterns are often complex
• Specially trained therapists are needed
• Patients have reduced mobility
→ Telemedical treatment → real-time transmission of multimodal data
Telemedicine
• Secure transmission
• Sufficient bandwidth
• Video streaming with the open-source software FFmpeg (http://ffmpeg.org)
MPEG: YUV Coding (figure: Y, U, and V planes)
• Y: 8 bit / pixel
• U: 8 bit / 4 pixels
• V: 8 bit / 4 pixels
• YUV: 12 bit / pixel
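The 12 bit/pixel figure is the average for 4:2:0 chroma subsampling. Every pixel has its own 8-bit Y sample, while one U sample and one V sample are shared by each 2 × 2 block, so 8 + 8/4 + 8/4 = 12. The helper names below are illustrative.

```python
def yuv420_bits_per_pixel(luma_bits: int = 8, chroma_bits: int = 8,
                          pixels_per_chroma_sample: int = 4) -> float:
    """Average bits per pixel: full-resolution Y plus one U and one V sample per 2x2 block."""
    return luma_bits + 2 * chroma_bits / pixels_per_chroma_sample


def yuv420_frame_bytes(width: int, height: int) -> int:
    """Size of one planar 8-bit 4:2:0 frame (even width and height assumed)."""
    y_plane = width * height
    chroma_plane = (width // 2) * (height // 2)  # size of the U plane and of the V plane
    return y_plane + 2 * chroma_plane


print(yuv420_bits_per_pixel(), yuv420_frame_bytes(176, 144))
```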
Video Information to be Transmitted
• ≈ 15 frames/second
• Currently about 25,000 pixels/frame (176 × 144); next version: about 40,000 pixels/frame (204 × 204)
• Per pixel:
  • amplitude: currently ignored
  • depth: encoded with 8 bit and transmitted in the Y channel of the YUV coding
  • XYZ coordinates: ignored, because they can be recovered from the depth and the camera parameters
  • U & V channels (4 bit/pixel): transmitted but currently ignored (they could carry the amplitude or improve the depth resolution)
• 15 × 176 × 144 × 12 bit/second + audio ≈ 0.66 MByte/second
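The bandwidth estimate can be reproduced as follows. The slide does not state the audio format. The sketch assumes uncompressed 16-bit mono PCM at 44.1 kHz, the sampling rate quoted in the experimental results. With that assumption, the total matches the quoted 0.66 MByte/second, taking 1 MByte = 10^6 bytes.

```python
def raw_video_bytes_per_second(width: int, height: int, fps: int, bits_per_pixel: int = 12) -> float:
    """Uncompressed YUV 4:2:0 video rate in bytes per second."""
    return width * height * bits_per_pixel * fps / 8


def raw_audio_bytes_per_second(sample_rate: int = 44_100, bits_per_sample: int = 16,
                               channels: int = 1) -> float:
    """Uncompressed PCM audio rate; 16-bit mono is an assumption, not from the slides."""
    return sample_rate * bits_per_sample * channels / 8


video = raw_video_bytes_per_second(176, 144, 15)       # current camera resolution
next_video = raw_video_bytes_per_second(204, 204, 15)  # announced next version
audio = raw_audio_bytes_per_second()
total_mb = (video + audio) / 1e6
print(f"video {video:.0f} B/s, audio {audio:.0f} B/s, total {total_mb:.2f} MB/s")
# prints "video 570240 B/s, audio 88200 B/s, total 0.66 MB/s"
```

The experimental results report < 50 kByte/second for the FFmpeg stream. That is more than ten times smaller than this raw rate.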
Experimental Results
• Speed: with FFmpeg, 3D video + audio (MP3) can be transmitted in real time (15 frames/second, depth only, 44.1 kHz mono) at < 50 kByte/second, i.e. over standard DSL
• Accuracy:
  • depends on the range; here: minimum distance = 50 cm, range = maximum distance - 50 cm, range quantized with 256 steps (limited by the 8-bit Y channel)
  • MPEG compression adds further error; the error is measured after MPEG encoding/decoding
  • software-based averaging over 5 frames
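Here is a sketch of the depth coding described above. It makes the following assumptions, none of which the slides confirm:
• [50 cm, 50 cm + range] is mapped linearly onto the 256 Y levels
• the 5-frame averaging is a per-pixel moving average over the last 5 decoded frames (the slides do not say where in the pipeline the averaging happens)
• all names are hypothetical

With this mapping, the quantization step is range/255. For ranges of 35, 50, and 90 cm, that is roughly 1.4, 2.0, and 3.5 mm, before any MPEG coding error.

```python
from collections import deque
from statistics import fmean

MIN_DISTANCE_CM = 50.0  # minimum distance stated on the slide


def quantize_depth(d_cm: float, range_cm: float, min_cm: float = MIN_DISTANCE_CM) -> int:
    """Linearly map a distance in [min_cm, min_cm + range_cm] to an 8-bit Y value (0..255)."""
    clipped = min(max(d_cm, min_cm), min_cm + range_cm)
    return round((clipped - min_cm) / range_cm * 255)


def dequantize_depth(y: int, range_cm: float, min_cm: float = MIN_DISTANCE_CM) -> float:
    """Inverse mapping from a Y value back to a distance in cm."""
    return min_cm + y / 255 * range_cm


class FrameAverager:
    """Per-pixel moving average over the last n depth frames."""

    def __init__(self, n: int = 5) -> None:
        self.frames: deque = deque(maxlen=n)  # oldest frame drops out automatically

    def push(self, frame: list[float]) -> list[float]:
        self.frames.append(frame)
        return [fmean(values) for values in zip(*self.frames)]
```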
Experimental Results (figure panels: original and three ranges)
• Range: 35 cm, error: 1.6 mm
• Range: 50 cm, error: 2.2 mm
• Range: 90 cm, error: 3.7 mm
Outlook
• The 3D image has a low spatial resolution but a high-resolution depth map
• Registration of low-resolution 3D images with high-resolution 2D images → high-quality videos for real-time telemedical therapy
• Localizing the eyes, mouth, etc. is faster and less error-prone in 3D images than in 2D images → improved symmetry features for therapy and diagnosis
• Implementation of real-time prototypes for telemedical and biofeedback therapy:
  • audio + 3D ToF + 2D webcam
  • audio + eye tracking + biosignals
Summary
• Peaks: a system for the evaluation of pathologic speech
  • offline, audio only, tested on different pathologies
• Examples where multimodality is important
  • Emotional disorders → eye tracking & biosignals
  • Facial paralysis → facial expression in 3D
• 3D information using a Time-of-Flight camera
• Real-time transmission of multimodal data
  • Standard video streaming and information reduction → 3D images of acceptable quality over standard DSL
• Outlook: registration of 3D with 2D images → high-quality visualization, error-free symmetry features
Thank you for your kind attention
Supported by
• Deutsche Forschungsgemeinschaft (DFG)
• Deutsche Krebshilfe (German Cancer Aid)
noeth@informatik.uni-erlangen.de