Chania Meeting – May 2007
Advances in WP1
www.loquendo.com

Summary
• Test on Hiwire DB with denoising methods developed in the project:
  • Wiener SNR-dependent Spectral Subtraction
  • Ephraim-Malah SNR-dependent Spectral Attenuation
• Loquendo FE – UGR PEQ Integration
  • Details
  • Results on Hiwire DB
HIWIRE DB Test
Test Conditions
• Test on the last 50 utterances of each speaker (50-99)
• The first 50 utterances of each speaker (0-49) are left for development or adaptation
• Four noise conditions:
  • Clean
  • Low Noise (LN, SNR = 10 dB)
  • Medium Noise (MN, SNR = 5 dB)
  • High Noise (HN, SNR = -5 dB)
• 4049 utterances for each condition, from 81 speakers of 4 nationalities
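The SNR figures above can be made concrete with a small sketch. It scales a noise signal so that the speech-to-noise power ratio hits a target value in dB, then checks the result. This is only an illustration under stated assumptions: the sine wave standing in for speech, the Gaussian noise, and the helper names (`power`, `mix_at_snr`, `measured_snr_db`) are all hypothetical, and the slides do not describe how the Hiwire DB noise was actually mixed.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db`, then add it."""
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB recovered from the clean signal and the noisy mixture."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


# Toy signals (assumptions): one second of a 440 Hz tone at 16 kHz, plus white noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# The three noisy conditions listed on the slide.
for label, snr_db in [("LN", 10), ("MN", 5), ("HN", -5)]:
    noisy = mix_at_snr(speech, noise, snr_db)
    print(label, round(measured_snr_db(speech, noisy), 2))
```

Note that at -5 dB the noise carries roughly three times the power of the speech, which is why the High Noise condition is so much harder than the other two.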
HMM-ANN Models
Two HMM-ANN models have been trained:
• Telephone 8 kHz: trained on a large telephone corpus (LDC Macrophone + SpeechDat Mobile)
• Microphone 16 kHz: trained on a collection of microphone corpora (TIMIT, WSJ0-1, vehic1us-ch0)
Comments on Results
• The 16 kHz models are more accurate than the 8 kHz models on clean speech (90.5% vs. 88.4%)
• Ephraim-Malah noise reduction always outperforms Wiener spectral subtraction (32.8% vs. 25.7% and 25.7% vs. 21.8% error reduction, E.R.)
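For readers unfamiliar with the E.R. figure, the sketch below shows one common way to compute a relative error reduction from two accuracy values. It assumes that the percentages on the slide are recognition accuracies and that E.R. is the relative drop in error rate; neither is stated explicitly in the slides, and the function name `error_reduction` is hypothetical.

```python
def error_reduction(baseline_acc, improved_acc):
    """Relative error reduction (in %) when accuracy moves from baseline to improved.

    Both arguments are accuracies in percent; the error rate is taken as 100 - accuracy.
    """
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    if baseline_err <= 0:
        raise ValueError("baseline accuracy must be below 100%")
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Example with the clean-speech figures quoted above (8 kHz vs. 16 kHz models).
print(f"{error_reduction(88.4, 90.5):.1f}%")
```

Under these assumptions, the gap between the 8 kHz and 16 kHz models on clean speech corresponds to removing roughly 18% of the errors, which puts it on the same scale as the denoising E.R. values.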
Loquendo FE – UGR PEQ Integration
PEQ Integration (Loquendo & UGR)
• Loquendo FE: Denoise (power spectrum level)
• UGR PEQ: Feature Normalization (frame level, 13 coefficients)
• Loquendo ASR: Phoneme-based Models
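The two processing levels named above can be sketched as follows. This is a loose illustration, not the Loquendo or UGR implementation: a floored Wiener-style gain stands in for the denoising stage, and per-coefficient mean and variance normalization stands in for PEQ, whose actual equalization rule is not described in these slides. All function names and the `floor` parameter are hypothetical, and the feature extraction step between the two stages is omitted.

```python
import math


def wiener_gain(noisy_power, noise_power, floor=0.1):
    """Wiener-style amplitude gain for one frequency bin, with a lower floor.

    The SNR is estimated as noisy/noise - 1, clipped at zero.
    """
    snr = max(noisy_power / noise_power - 1.0, 0.0)
    return max(snr / (1.0 + snr), floor)


def denoise_frame(power_spectrum, noise_estimate):
    """Stage 1 (power spectrum level): attenuate each bin of one frame.

    The gain acts on amplitude, so the power is scaled by the gain squared.
    """
    return [p * wiener_gain(p, n) ** 2 for p, n in zip(power_spectrum, noise_estimate)]


def normalize_features(frames):
    """Stage 2 (frame level): mean/variance normalization per coefficient.

    Stand-in for PEQ; `frames` is a list of feature vectors (e.g. 13 coefficients each).
    """
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) or 1.0
        for d in range(dim)
    ]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


# Toy usage: one two-bin spectrum frame, then two two-coefficient feature frames.
print(denoise_frame([2.0, 1.0], [1.0, 2.0]))
print(normalize_features([[1.0, 2.0], [3.0, 6.0]]))
```

The point of the split is that denoising works on each spectral frame before features are computed, whereas normalization works on the feature vectors that are fed to the phoneme-based models.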
PEQ Results
The HMM-ANN models employed are:
• WSJ0 models
• WSJ0 models + Ephraim-Malah (E.M.) denoising
• WSJ0 models + E.M. denoising + PEQ
Comments on EM denoising - PEQ
• On noisy speech (LN, MN, HN):
  • both EM denoising and PEQ yield a good improvement
  • the best results are obtained by combining EM denoising and PEQ normalization
• On clean speech:
  • EM denoising does not degrade performance
  • PEQ normalization slightly degrades performance
• PEQ is very useful in mismatched conditions, but can (slightly) degrade performance in matched conditions (e.g. clean speech)
Test on TTS American Voice (Dave)
• We used the American voice Dave of Loquendo TTS to read the 4049 sentences of the Hiwire DB
• The large difference in results is due to non-native pronunciation
• E.g. “Range Forty” pronounced:
  • by Dave
  • by a French speaker
  • by a Greek speaker
WP1: Workplan
• Selection of suitable benchmark databases (m6)
• Completion of LASR baseline experimentation of Spectral Subtraction (Wiener SNR-dependent) (m12)
• Discriminative VAD (training + AURORA3 testing) (m16)
• Experimentation of Spectral Attenuation rule (Ephraim-Malah SNR-dependent) (m21)
• Preliminary results on spectral subtraction and HEQ techniques (m24)
• Integration of denoising and normalization techniques (PEQ) (m33)