
Multiband With Contaminated Training Data Results on AURORA 2

TCTS, Faculté Polytechnique de Mons, Belgium. Introduction: contaminating a speech corpus with noise leads to quasi-optimal performance when the test noise conditions match the training noise conditions.


Presentation Transcript


  1. Multiband With Contaminated Training Data: Results on AURORA 2. TCTS, Faculté Polytechnique de Mons, Belgium. RESPITE workshop - Martigny

  2. INTRODUCTION
     • Contaminating the speech corpus with noise leads to quasi-optimal performance when the test noise conditions match the training noise conditions.
     • We observe that, in narrow frequency bands, noise characteristics basically differ only in their level.
     • Combining the multiband approach with training data contamination can therefore lead to models that are robust to any kind of noise.
     • We train models in each subband on data corrupted by white noise at different SNRs. The subbands are then recombined using an MLP.

  3. CONTAMINATED TRAINING CORPUS (diagram) Sampled speech corpus, with white noise added at SNR = 0 dB, 5 dB, 10 dB, 15 dB and 20 dB, giving the noisy speech corpus.
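The contamination step on slide 3 can be sketched as follows. This is a minimal illustration, not the presenters' tooling: the function names, the Gaussian noise source, and the toy 440 Hz "utterance" are all assumptions. Only the five SNR values come from the slide.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise added at `snr_db`."""
    if rng is None:
        rng = random.Random(0)
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise), solved here for P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR actually obtained, given the clean reference."""
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)


# Hypothetical stand-in for one utterance: a 1 s, 440 Hz tone at 8 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]

# One contaminated copy per SNR listed on the slide.
contaminated = {
    snr: add_white_noise(clean, snr, random.Random(snr))
    for snr in (0, 5, 10, 15, 20)
}
```

The noise level is derived from the power of each signal, so the same call applies to utterances of any loudness.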

  4. MULTIBAND ANALYSIS (diagram)
     • Diagram blocks: windowing, filter bank analysis, bandpass analysis, grouping and normalization, ANN.
     • Subbands: 0-376 Hz, 307-638 Hz, 553-971 Hz, 861-1413 Hz, 1266-2013 Hz, 2213-2839 Hz, 2562-4000 Hz.
     • Other labels on the slide: noise suppression methods, compensation methods, microphone arrays, noise robust acoustic features.
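To make the subband split on slide 4 concrete, the sketch below sums the power spectrum of one frame inside each of the seven band ranges as printed on the slide. It is an assumption-laden simplification: a naive DFT with a rectangular bin mask stands in for the filter bank, and the 8 kHz sampling rate is inferred from the 4000 Hz upper edge. The function names and the 256-sample test frame are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz; assumed, consistent with the 4000 Hz upper band edge

# Band edges in Hz, copied as printed on the slide.
BANDS_HZ = [
    (0, 376), (307, 638), (553, 971), (861, 1413),
    (1266, 2013), (2213, 2839), (2562, 4000),
]


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_energies(frame):
    """Sum spectral power inside each (possibly overlapping) band."""
    spectrum = power_spectrum(frame)
    hz_per_bin = SAMPLE_RATE / len(frame)
    return [
        sum(p for k, p in enumerate(spectrum) if low <= k * hz_per_bin <= high)
        for low, high in BANDS_HZ
    ]


# Hypothetical test frame: a 1000 Hz tone, which lies only in 861-1413 Hz.
frame = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(256)]
energies = band_energies(frame)
```

Because adjacent bands overlap, a component near a band edge contributes to two subbands at once.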

  5. NONLINEAR DISCRIMINANT ANALYSIS (diagram) Labels: acoustic features, NLDA parameters, state posterior probabilities.

  6. ROBUST ASR (diagram) Labels: automatic speech recognition system, robust parameters, concatenation, training on contaminated data, model adaptation.

  7. AURORA 2
     • Clean training set: 8440 utterances.
     • Multi-condition training set: 8440 utterances.
     • Contaminated training set: 8440 utterances corrupted by white noise + 4220 clean utterances.
     • Test set 'a': 4 different kinds of noise matching the multi-condition training set, covering SNRs from clean speech down to -5 dB.
     • Acoustic models: hybrid HMM/MLP trained on Daimler-Chrysler word models (127 HMM states).
     • Recognition: STRUT Viterbi decoder, no syntax.

  8. TEST CONDITIONS
     • Clean training set / J-RASTA. MLP: (15*13) x 1000 x 127 = 323,195 parameters.
     • Multi-condition training set / J-RASTA. MLP: (15*13) x 1000 x 127 = 323,195 parameters.
     • Contaminated training set / multiband:
       • 7 subbands, each (15*4) x 1000 x 30 x 127. Recombination MLP: (3*210) x 1000 x 127. Total: 1,531,185 parameters.
       • 7 subbands, each (15*4) x 150 x 30 x 127. Recombination MLP: 210 x 500 x 127. Total: 285,565 parameters.
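The layer sizes on slide 8 suggest how the 210-dimensional input of the recombination MLP arises: 7 subbands times a 30-unit layer. The sketch below traces those dimensions for the "light" configuration. Several points are assumptions rather than facts from the slides: that the NLDA features are the activations of the 30-unit layer, that (15*4) means 15 context frames of 4 features, the sigmoid nonlinearity, the absence of bias terms, and the random untrained weights. All function names are hypothetical.

```python
import math
import random

N_SUBBANDS = 7
CONTEXT_FRAMES, FEATS_PER_FRAME = 15, 4  # assumed reading of "(15*4)"
HIDDEN, BOTTLENECK = 150, 30             # sizes of the "light" configuration


def random_layer(n_in, n_out, rng):
    """Untrained weight matrix (n_out rows of n_in weights), no biases."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def sigmoid_layer(weights, inputs):
    """Apply one fully connected layer followed by a sigmoid (assumed)."""
    return [
        1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
        for row in weights
    ]


def nlda_features(subband_input, layers):
    """Return the 30-unit layer activations, assumed to be the NLDA features."""
    hidden = sigmoid_layer(layers[0], subband_input)
    return sigmoid_layer(layers[1], hidden)


rng = random.Random(0)
n_in = CONTEXT_FRAMES * FEATS_PER_FRAME  # 60 inputs per subband MLP

# One (input->hidden, hidden->bottleneck) pair per subband.
nets = [
    (random_layer(n_in, HIDDEN, rng), random_layer(HIDDEN, BOTTLENECK, rng))
    for _ in range(N_SUBBANDS)
]
inputs = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(N_SUBBANDS)]

# Concatenate the per-subband features into one recombination input vector.
combined = [v for net, x in zip(nets, inputs) for v in nlda_features(x, net)]
```

Under the same reading, the (3*210) input of the larger recombination MLP would correspond to three consecutive frames of this 210-dimensional vector.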

  9. RESULTS (charts not preserved in this transcript) Number of parameters of the compared systems, matching the totals on slide 8: 323,195; 323,195; 1,531,185; 285,565.

  10. CONCLUSIONS
     • The combination of the multiband paradigm and training data contamination has been tested on a reference task: AURORA 2.
     • We obtained up to 57% relative improvement compared with robust features such as J-RASTA PLP.
     • Compared with training on matching noise conditions, the WER is only 10% (relative) higher.
     • Tests with a very "light" system led to a small degradation in recognition performance.
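The relative figures quoted in the conclusions follow the usual definition of a relative change in word error rate (WER). The sketch below states that definition in code. The function names are hypothetical, and the WER values are made-up numbers chosen only to exercise the formulas, not results from the presentation.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


def relative_wer_increase(reference_wer, new_wer):
    """Fraction by which the new WER exceeds the reference WER."""
    return (new_wer - reference_wer) / reference_wer


# Made-up values: a 20.0% baseline WER reduced to 8.6% WER.
improvement = relative_wer_reduction(20.0, 8.6)  # about 0.57

# Made-up values: 8.8% WER against an 8.0% matched-condition reference.
degradation = relative_wer_increase(8.0, 8.8)    # about 0.10
```

A relative change is always expressed against the system being compared to, so the two quoted percentages use different denominators.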
