Audiovisual Speech Analysis


Presentation Transcript


  1. Audiovisual Speech Analysis Ouisper Project - Silent Speech Interface

  2. Ouisper1 - Silent Speech Interface • Sensor-based system allowing speech communication via the standard articulators, but without glottal activity • Two distinct types of application • an alternative to tracheo-oesophageal speech (TES) for persons having undergone a tracheotomy • a "silent telephone" for use in situations where quiet must be maintained, or for communication in very noisy environments • Speech synthesis from ultrasound and optical imagery of the tongue and lips 1) Oral Ultrasound synthetIc SPEech souRce

  3. Ouisper - System Overview [block diagram] • Inputs: ultrasound video of the vocal tract, optical video of the speaker's lips, recorded audio, and text • TRAINING: visual feature extraction and speech alignment build an audio-visual speech corpus • TEST: visual data → visual speech recognizer → N-best phonetic or ALISP targets → visual unit selection → audio unit concatenation

  4. Ouisper - Training Data

  5. Ouisper - Video Stream Coding • Build a subset of typical frames • Perform PCA to obtain the eigenvectors ("EigenTongues") • Code new frames with their projections onto the set of eigenvectors T. Hueber, G. Aversano, G. Chollet, B. Denby, G. Dreyfus, Y. Oussar, P. Roussel, M. Stone, "EigenTongue Feature Extraction for an Ultrasound-Based Silent Speech Interface," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, Hawaii, USA, 2007.
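The EigenTongue coding steps above can be sketched with plain PCA via SVD. This is an illustrative sketch, not the project's code: the frame size, number of components, and random stand-in data are all assumptions.

```python
import numpy as np

def build_eigentongues(frames, n_components):
    """Compute 'EigenTongue' basis vectors by PCA on a subset of
    typical ultrasound frames (each frame flattened to a vector)."""
    X = frames.reshape(len(frames), -1).astype(float)
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode_frame(frame, mean, eigentongues):
    """Code a new frame by its projections onto the EigenTongue basis."""
    return eigentongues @ (frame.ravel().astype(float) - mean)

# Usage sketch on random stand-in data (50 frames of 64x64 pixels)
rng = np.random.default_rng(0)
frames = rng.random((50, 64, 64))
mean, basis = build_eigentongues(frames, n_components=10)
features = encode_frame(frames[0], mean, basis)
print(features.shape)  # (10,)
```

Each frame is thus reduced from thousands of pixels to a short feature vector, which is what makes HMM training on the visual stream tractable.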

  6. Ouisper - Audio Stream Coding • Corpus-based synthesis requires a preliminary segmental description of the signal • ALISP segmentation • detection of quasi-stationary parts in the parametric representation of speech • assignment of segments to classes using unsupervised classification techniques • Phonetic segmentation • forced alignment of speech with the text • requires a relevant and correct phonetic transcription of the uttered signal
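The detection of quasi-stationary parts can be illustrated with a toy boundary detector: it is not ALISP's actual temporal decomposition, just a crude stand-in that places a boundary wherever the parametric representation changes abruptly. The parameter track and threshold are invented for illustration.

```python
import numpy as np

def detect_boundaries(params, threshold):
    """Mark a segment boundary wherever the frame-to-frame change in
    the parameter vectors exceeds a threshold -- a crude stand-in for
    the quasi-stationary segmentation used in ALISP."""
    deltas = np.linalg.norm(np.diff(params, axis=0), axis=1)
    return [i + 1 for i, d in enumerate(deltas) if d > threshold]

# Two quasi-stationary regions in a toy 2-D parameter track
params = np.vstack([np.tile([0.0, 0.0], (5, 1)),
                    np.tile([1.0, 1.0], (5, 1))])
print(detect_boundaries(params, threshold=0.5))  # [5]
```

The resulting segments would then be clustered into classes without supervision, giving the segmental units the synthesizer concatenates.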

  7. Audiovisual Dictionary Building • Visual and acoustic data are synchronously recorded • The audio segmentation is used to bootstrap the visual speech recognizer, yielding the audiovisual dictionary

  8. Visuo-Acoustic Decoding • Visual speech recognition • train an HMM for each visual class • use multistream-based learning techniques • perform a "visuo-phonetic" decoding step • Use the N-best list • introduce linguistic constraints: language model, dictionary, multigrams • Corpus-based speech synthesis • combine probabilistic and data-driven approaches in the audiovisual unit selection step
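The role of the N-best list and the language-model constraint can be shown with a toy rescoring function. Everything here is illustrative: the hypotheses, recognizer scores, and bigram probabilities are invented, and a real system would use a full decoder rather than post-hoc rescoring.

```python
import math

def rescore_nbest(nbest, bigram_logprob, lm_weight=1.0):
    """Re-rank N-best hypotheses by combining the recognizer's score
    with a bigram language-model score (higher = better)."""
    def lm_score(tokens):
        # Unseen bigrams get a small floor probability
        return sum(bigram_logprob.get((a, b), math.log(1e-6))
                   for a, b in zip(tokens, tokens[1:]))
    return sorted(nbest,
                  key=lambda h: h[1] + lm_weight * lm_score(h[0]),
                  reverse=True)

# Hypothetical phone-sequence hypotheses with recognizer log-scores
nbest = [(["ax", "w", "ih"], -10.0),   # slightly better visual score
         (["ow", "p", "ax"], -11.0)]
lm = {("ow", "p"): math.log(0.4), ("p", "ax"): math.log(0.3)}
best = rescore_nbest(nbest, lm)[0][0]
print(best)  # ['ow', 'p', 'ax'] -- the LM overturns the visual ranking
```

This is the point of the linguistic constraints: a hypothesis that scores slightly worse on the visual stream can still win if it is far more plausible under the language model.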

  9. Speech Recognition from Video-Only Data
  Ref: Open your book to the first page
       ow p ax n y uh r b uh k t uw dh ax f er s t p ey jh
  Rec: ax w ih y uh r b uh k sh uw dh ax v er s p ey jh
       A wear your book shoe the verse page
  Corpus-based synthesis driven by the predicted phonetic lattice is currently under study
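The mismatch between the Ref and Rec phone strings above can be quantified with a standard Levenshtein edit distance over phone sequences. This only scores the example from the slide; it is not part of the recognizer itself.

```python
def edit_distance(ref, rec):
    """Levenshtein distance between two token sequences
    (substitutions, insertions, and deletions all cost 1)."""
    prev = list(range(len(rec) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(rec, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution/match
        prev = cur
    return prev[-1]

ref = "ow p ax n y uh r b uh k t uw dh ax f er s t p ey jh".split()
rec = "ax w ih y uh r b uh k sh uw dh ax v er s p ey jh".split()
errors = edit_distance(ref, rec)
print(f"{errors} edits, phone error rate {errors / len(ref):.2f}")
```

Dividing the edit count by the number of reference phones gives the phone error rate, the usual figure of merit for this kind of recognition experiment.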

  10. Ouisper - Conclusion • More information: http://www.neurones.espci.fr/ouisper/ • Contacts: gerard.chollet@enst.fr, denby@ieee.org, hueber@ieee.org
