Topics

Presentation Transcript


  1. Topics
     • Recognition results on Aurora noisy speech database
     • Proposal of robust formant estimation from MFCCs
     • Availability of real in-car speech databases
     • Contact from Pi Research
     b.milner@uea.ac.uk

  2. b.milner@uea.ac.uk

  3. Robust Formant Prediction from MFCCs
     • One of the aims of this integrated project is to use the speech recogniser to provide clean speech information for the speech enhancement component
     • The proposal is to use the speech recogniser to provide robust formant information from noisy speech
     • Review previous work on predicting pitch from MFCC vectors
     • Extend this work to the proposed prediction of formants

  4. Pitch Prediction from MFCCs
     • In speech recognition the most commonly extracted feature is the mel-frequency cepstral coefficient (MFCC)
     • It is designed for class discrimination and contains spectral envelope information
     • Excitation information (pitch) is lost through the smoothing processes
     • A project at UEA aimed at reconstructing speech from MFCC vectors - it therefore needed either an additional pitch estimate or a prediction of pitch

  5. MFCC Extraction
     • Mel-frequency cepstral coefficients (MFCCs)
     • designed for speech recognisers
     • simulate human perceptual ability
     • currently give the best recognition performance
     • extract vocal tract information
     • ignore most speaker information, such as pitch
     Block diagram: speech -> framing, pre-emphasis and windowing -> FFT and magnitude spectrum -> mel filterbank -> log( ) -> DCT -> truncation -> 13-D MFCCs
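
The extraction chain on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the front-end used in the project: the sample rate, frame length, frame shift, FFT size, number of filters and pre-emphasis coefficient are all assumed values, and pre-emphasis is applied to the whole signal before framing for simplicity.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - centre)
    return fbank

def mfcc(signal, sample_rate=8000, frame_len=200, frame_shift=80,
         n_fft=256, n_filters=23, n_ceps=13, preemph=0.97):
    """Return an (n_frames, n_ceps) array of MFCCs. All defaults are assumptions."""
    # Pre-emphasis (first-order high-pass filter)
    emphasised = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # Framing and Hamming windowing
    n_frames = 1 + (len(emphasised) - frame_len) // frame_shift
    idx = (np.arange(frame_len)[None, :]
           + frame_shift * np.arange(n_frames)[:, None])
    frames = emphasised[idx] * np.hamming(frame_len)
    # FFT and magnitude spectrum
    mag = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # Mel filterbank followed by log (floored to avoid log(0))
    fb_energy = mag @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energy = np.log(np.maximum(fb_energy, 1e-10))
    # DCT-II, truncated to the first n_ceps coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (n + 0.5) / n_filters)
    return log_energy @ basis.T
```

The truncation to 13 coefficients is what smooths the spectrum: the fine harmonic structure that carries pitch lives in the higher-order cepstral coefficients, which are discarded.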

  6. Pitch Prediction from MFCC vectors
     • There is clearly no global correlation between pitch frequency and spectral envelope (or MFCC vector)
     • There does exist a class-dependent correlation - the classes being different speech sounds
     • If this class-based correlation can be modelled then prediction of pitch from spectral envelope, or MFCC, should be possible
     • Investigate two methods for modelling this correlation
       • GMM
       • HMM

  7. Class-based GMM Pitch Prediction
     Training phase
     • Introduce an augmented feature vector y = [x, f] (MFCC vector x, pitch f)
     • Model the joint distribution by clustering to form a GMM - tested with 64 to 128 clusters
     Pitch prediction
     • During the prediction stage only the MFCC component x is available
     • Pitch is predicted using the MAP algorithm from the means and covariances of the clusters
     • Does not fully exploit the class-based correlation between the MFCC vector and pitch
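
One plausible reading of the prediction step is sketched below. The slide does not give the formula, so this is an assumption: the cluster with the highest posterior probability given x is selected, and pitch is taken as the conditional mean of f given x under that cluster's joint Gaussian. The function name and array layout are hypothetical, and the GMM parameters are assumed to have been trained already on the augmented vectors y = [x, f].

```python
import numpy as np

def predict_from_joint_gmm(x, weights, means, covs):
    """Predict pitch f from an MFCC vector x using a joint GMM over y = [x, f].

    weights: (K,)            cluster priors
    means:   (K, D + 1)      cluster means; the last element is the pitch mean
    covs:    (K, D + 1, D + 1) full covariances of the augmented vector
    Returns (predicted pitch, index of the selected cluster).
    """
    d = x.shape[0]
    n_clusters = len(weights)
    log_post = np.empty(n_clusters)
    cond_means = np.empty(n_clusters)
    for k in range(n_clusters):
        mu_x, mu_f = means[k, :d], means[k, d]
        s_xx = covs[k, :d, :d]     # MFCC-MFCC covariance block
        s_fx = covs[k, d, :d]      # pitch-MFCC cross-covariance
        diff = x - mu_x
        solved = np.linalg.solve(s_xx, diff)
        _, logdet = np.linalg.slogdet(s_xx)
        # Log posterior of cluster k given x, up to a constant shared by all k
        log_post[k] = np.log(weights[k]) - 0.5 * (logdet + diff @ solved)
        # Conditional mean of f given x for a joint Gaussian
        cond_means[k] = mu_f + s_fx @ solved
    k_best = int(np.argmax(log_post))
    return cond_means[k_best], k_best
```

A softer alternative would weight every cluster's conditional mean by its posterior instead of picking a single cluster; which variant the original work used is not stated on the slide.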

  8. HMM Pitch Prediction
     • GMM does not model the temporal correlation of pitch
     • GMM clusters are trained unsupervised - it may be better to use supervised training
     Training phase
     • Model the joint distribution of pitch and MFCC using a series of HMMs
     Pitch prediction
     • Perform standard Viterbi decoding of the MFCC stream using the HMMs
     • Use the model and state sequence information to locate the mapping for each MFCC vector, then use MAP to predict pitch
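
The two prediction steps can be sketched as follows. This is a simplified illustration under stated assumptions: a single HMM stands in for the series of models, each state holds one joint Gaussian over [x, f], and the per-frame emission log-likelihoods are supplied by the caller. The function names and array layouts are hypothetical.

```python
import numpy as np

def viterbi(log_emis, log_trans, log_init):
    """Most likely state sequence.

    log_emis:  (T, S) log-likelihood of each MFCC frame under each state
    log_trans: (S, S) log transition probabilities, row = from, column = to
    log_init:  (S,)   log initial state probabilities
    """
    n_frames, n_states = log_emis.shape
    delta = log_init + log_emis[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + log_trans        # scores[i, j]: state i -> j
        back[t] = np.argmax(scores, axis=0)        # best predecessor of each j
        delta = scores[back[t], np.arange(n_states)] + log_emis[t]
    path = np.empty(n_frames, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(n_frames - 1, 0, -1):           # trace back
        path[t - 1] = back[t, path[t]]
    return path

def predict_pitch(mfccs, path, means, covs):
    """Per-frame pitch from the joint Gaussian of the decoded state.

    means: (S, D + 1), covs: (S, D + 1, D + 1); the last dimension is pitch.
    """
    d = mfccs.shape[1]
    pitch = np.empty(len(path))
    for t, s in enumerate(path):
        diff = mfccs[t] - means[s, :d]
        solved = np.linalg.solve(covs[s, :d, :d], diff)
        pitch[t] = means[s, d] + covs[s, d, :d] @ solved
    return pitch
```

Compared with the GMM sketch, the cluster choice here comes from the decoded state sequence rather than from each frame in isolation, which is how the temporal structure enters the prediction.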

  9. Pitch Prediction Results
     • Aurora database - 200 utterances for training (50 speakers), 90 utterances for testing (23 speakers)
     • 42,902 frames in total

  10. Reconstructed Speech
      • original
      • MFCC + reference pitch
      • MFCC + HMM-based pitch

  11. Extension to Formant Prediction
      • Prediction of formants may also be possible from MFCC vectors using a similar strategy of modelling the joint distribution y = [x, f1, f2, f3, f4, …]
      • Potentially stronger correlation between formants and MFCCs than between pitch and MFCCs
      • Use the Brunel formant estimator to provide the frequency, bandwidth and amplitude of the formants

  12. Why Predict Formants?
      • Formant estimation from noisy speech is a difficult task and prone to errors
      • Predicting them from MFCCs may be more robust
      • Noise compensation methods (spectral subtraction/Wiener) can be applied to the MFCCs before prediction
      • Alternatively, model the joint distribution of noisy MFCCs and formants
      • In effect, utilise the correlation information available inside the speech models themselves
      • Formant predictions provide the clean speech information needed by the speech enhancement component of the project
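
As an illustration of the first compensation option, a basic magnitude spectral subtraction is sketched below. It is a generic textbook form, not the method used in the project: it would be applied to the magnitude spectra before the mel filterbank, and the over-subtraction factor `alpha`, the spectral floor `floor` and the function name are all assumptions. The noise estimate is assumed to come from frames known to contain no speech.

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, alpha=1.0, floor=0.01):
    """Subtract a noise magnitude estimate from each frame.

    noisy_mag: (T, F) magnitude spectra of the noisy speech
    noise_mag: (F,)   average noise magnitude spectrum
    Values that would fall below floor * noisy_mag are clamped to that floor,
    which avoids negative magnitudes.
    """
    cleaned = noisy_mag - alpha * noise_mag
    return np.maximum(cleaned, floor * noisy_mag)
```

The compensated spectra would then pass through the usual filterbank, log and DCT stages, so the predictor sees MFCCs closer to those of clean speech.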

  13. Noisy Speech Databases
      • Two more noisy speech databases are available
        • SpeechDat-Car - Danish
        • SpeechDat-Car - Spanish
      • Connected digit strings recorded in a moving car under different driving conditions
      • Both hands-free and close-talking microphones
      • Available through the SIG in COST278 - will request availability to other partners

  14. Pi Research
      • Pi Research in Cambridge specialise in data communication in Formula 1 racing
      • They made an approach regarding the possibility of reducing noise on driver-to-pit crew communication
      • Example - down to SNRs of -30 dB

  15. Pi Research

  16. End
