Advances in WP1

Chania Meeting – May 2007 Advances in WP1 www.loquendo.com

Summary
• Test on Hiwire DB with denoising methods developed in the project:
  • Wiener SNR dep. Spectral Subtraction
  • Ephraim-Malah SNR dep. Spectral Attenuation
• Loquendo FE – UGR PEQ Integration
  • Details
  • Results on Hiwire DB
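As a rough illustration of what a spectral-gain denoiser does at the power-spectrum level, the sketch below applies a textbook Wiener gain to one frame. It is not the SNR-dependent rule developed in the project, whose details are not given in these slides. The function names, the power-subtraction SNR estimate and the gain floor of 0.1 are all assumptions made for the example.

```python
def wiener_gain(noisy_power, noise_power, floor=0.1):
    """Textbook Wiener gain for one frequency bin (illustrative only).

    Assumes noise_power > 0; the floor value is a hypothetical choice.
    """
    # Estimate the SNR by power subtraction, clipped at zero.
    snr = max(noisy_power - noise_power, 0.0) / noise_power
    # Wiener rule: G = SNR / (1 + SNR); the floor limits over-attenuation.
    return max(snr / (1.0 + snr), floor)


def denoise_frame(noisy_powers, noise_powers, floor=0.1):
    """Apply the gain bin by bin to a power spectrum."""
    # The gain acts on amplitude, so the power is scaled by G squared.
    return [p * wiener_gain(p, n, floor) ** 2
            for p, n in zip(noisy_powers, noise_powers)]
```

A bin whose power equals the noise estimate is attenuated down to the floor, while a bin well above the noise estimate passes almost unchanged.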

HIWIRE DB Test

Test Conditions
• Test on the last 50 utterances of each speaker (50-99)
• The first 50 utterances of each speaker (0-49) are left for development or adaptation
• Four noise conditions:
  • Clean
  • Low Noise (SNR = 10 dB)
  • Medium Noise (SNR = 5 dB)
  • High Noise (SNR = -5 dB)
• 4049 utterances for each condition, from 81 speakers of 4 nationalities
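The per-speaker split can be sketched as follows, assuming each speaker's utterances are indexed 0 to 99, so that the development half is 0-49 and the test half is 50-99. The function name is hypothetical and nothing here reflects the actual Hiwire file layout.

```python
def split_utterances(utterance_ids):
    """Split one speaker's utterance indices into (development, test)."""
    dev = [u for u in utterance_ids if 0 <= u <= 49]    # development / adaptation
    test = [u for u in utterance_ids if 50 <= u <= 99]  # evaluation
    return dev, test


dev_ids, test_ids = split_utterances(range(100))
```

Note that 81 speakers with 50 test utterances each would give 4050, one more than the 4049 reported; the slides do not say which utterance is absent.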

HMM-ANN Models
Two HMM-ANN models have been trained:
• Telephone 8 kHz: trained on a large telephone corpus (LDC Macrophone + SpeechDat Mobile)
• Microphone 16 kHz: trained on a collection of microphone corpora (TIMIT, WSJ0-1, vehic1us-ch0)

Test Results

Comments on Results
• The 16 kHz models are more accurate than the 8 kHz models on clean speech (90.5% vs. 88.4%)
• Ephraim-Malah noise reduction always outperforms Wiener spectral subtraction (32.8% vs. 25.7% and 25.7% vs. 21.8% error reduction, E.R.)
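For readers unfamiliar with error-reduction figures, the sketch below shows one common way to compute a relative error reduction from two accuracy percentages. It assumes that the quoted percentages are recognition accuracies and that E.R. is measured relative to the baseline error rate; the slides do not state the exact definition used, and the function name is hypothetical.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Relative error reduction (in %) between two accuracy percentages."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Clean-speech figures from the slide: 8 kHz (88.4%) vs. 16 kHz (90.5%).
print(round(relative_error_reduction(88.4, 90.5), 1))  # prints 18.1
```

Under these assumptions, moving from 88.4% to 90.5% accuracy removes about 18% of the errors.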

Loquendo FE – UGR PEQ Integration

PEQ Integration (Loquendo & UGR)
[Block diagram. Blocks: Loquendo FE, UGR PEQ, Loquendo ASR. Labels: Denoise (power-spectrum level), Feature Normalization (frame level, 13 coefficients), Phoneme-based Models.]

PEQ effects

PEQ Results
• The HMM-ANN models employed are:
  • WSJ0 models
  • WSJ0 models + E.M. denoising
  • WSJ0 models + E.M. denoising + PEQ

EM Denoise and PEQ

Comments on EM denoising - PEQ
• On noisy speech (LN, MN, HN):
  • both EM denoising and PEQ yield a good improvement
  • the best results are obtained by combining EM denoising and PEQ normalization
• On clean speech:
  • EM denoising does not decrease performance
  • PEQ normalization slightly decreases performance
• PEQ is very useful in mismatched conditions
  • it can (slightly) decrease performance in matched conditions (e.g. clean speech)
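To make the idea of frame-level feature normalization concrete, here is a minimal sketch that normalizes each coefficient to zero mean and unit variance over an utterance. This is plain mean and variance normalization, swapped in as a simpler stand-in; it is not the PEQ algorithm, which these slides do not describe. The function name and the data layout (a list of frames, each a list of coefficients, e.g. 13 per frame) are assumptions.

```python
from statistics import fmean, pstdev


def normalize_features(frames):
    """Per-coefficient mean/variance normalization (stand-in, not PEQ).

    frames: list of equal-length coefficient vectors, one per frame.
    """
    columns = list(zip(*frames))  # one tuple per coefficient
    means = [fmean(c) for c in columns]
    # Guard against constant coefficients (zero standard deviation).
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)]
            for frame in frames]
```

Normalization of this kind pulls noisy test features toward the statistics seen in training, which helps in mismatched conditions; in matched conditions it can discard some useful information, consistent with the small loss on clean speech noted above.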

Test on TTS American Voice (Dave)
• We used the American voice Dave of Loquendo TTS to read the 4049 sentences of the Hiwire DB
• The large difference in results is due to non-native pronunciation
• E.g. “Range Forty” pronounced:
  • by Dave
  • by a French speaker
  • by a Greek speaker

WP1: Workplan
• Selection of suitable benchmark databases (m6)
• Completion of LASR baseline experimentation of Spectral Subtraction (Wiener SNR dependent) (m12)
• Discriminative VAD (training + AURORA3 testing) (m16)
• Experimentation of Spectral Attenuation rule (Ephraim-Malah SNR dependent) (m21)
• Preliminary results on spectral subtraction and HEQ techniques (m24)
• Integration of denoising and normalization techniques (PEQ) (m33)
