Introduction • The PDF of cepstral coefficients of speech signals • Is usually regarded as a quasi-Gaussian distribution • Under this assumption, the purpose of moment normalization of order N is then to have • For odd-order moments: the Nth-order moment equal to zero • For even-order moments: the Nth-order moment equal to a chosen constant • CMS is to normalize the first moment • CN is to normalize the second moment
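The two baseline normalizations mentioned here (CMS and CN) can be sketched as follows; a minimal numpy illustration, assuming `cep` holds the cepstral coefficients of one utterance as a (frames × dimensions) matrix — the function names are ours, not the paper's:

```python
import numpy as np

def cms(cep):
    """Cepstral Mean Subtraction: zero out the first-order moment
    of each cepstral dimension over the utterance."""
    return cep - cep.mean(axis=0)

def cn(cep):
    """Cepstral Normalization (CMS plus variance normalization):
    zero mean and unit second-order moment per dimension."""
    centered = cep - cep.mean(axis=0)
    return centered / centered.std(axis=0)
```

After `cn`, each dimension has zero mean and unit variance regardless of the channel or noise level, which is what brings the clean and noisy PDFs closer together.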
[Figure: distributions of cepstral coefficient values (probability, ×10⁻³ scale) for silence, 132097 frames, DIM=0 — with CN vs. without CN]
[Figure: distributions of cepstral coefficient values for all frames, total 1401927 frames, DIM=11 — with CN vs. without CN]
Higher Order Cepstral Moment Normalization For Robust Speech Recognition Chang-Wen Hsu and Lin-Shan Lee Graduate Institute of Communication Engineering National Taiwan University ICASSP 2004 2004/1/7 Presented by Chen-Wei Liu
Introduction • In real-world speech recognition applications • Robust features are highly desired for better recognition performance under various noisy conditions • MFCCs have been widely accepted as a good choice • Many advanced techniques have been developed based on them • CMS and CN have been two commonly used methods
Introduction • CMS could • Reduce the effects of channel distortion • Prevent low-frequency noise from being further amplified • CN could • Reduce the differences in PDF between the clean and noisy signals • It is also proposed that • The normalization of the third-order cepstral moment may achieve better performance than CMS and CN
HOCMN • What is the so-called Nth-order moment? • It is the average of the Nth power of the cepstral coefficients over the utterance • The purpose of moment normalization of order N is then to have this moment reach its target value
HOCMN • For example, normalizing the first-order moment is CMS, and normalizing the second-order moment is CN • With the above, we could extend the moment normalization to higher orders
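The Nth-order moment referred to above is simply the sample average of the Nth power of each cepstral coefficient over the utterance; a small sketch (the `moment` helper name is ours):

```python
import numpy as np

def moment(cep, order):
    """Sample Nth-order moment of each cepstral dimension:
    m_N = (1/T) * sum_t c_t^N over the T frames of the utterance."""
    return np.mean(cep ** order, axis=0)
```

With CMS applied, the first-order moment of the output is ≈ 0; with CN applied, the second-order moment is ≈ 1, which is what the first- and second-order special cases above amount to.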
HOCMN for an even integer • When performing HOCMN with an even integer N • Simply scale the first-order moment normalized coefficients by a constant • Such normalization usually co-exists with the first-order normalization, or CMS
HOCMN for an even integer • We could obtain b by requiring the Nth-order moment of the scaled coefficients to equal its target value • Different N gives different values of b
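One concrete reading of the even-order case: after CMS, choose the scale b per dimension so that the Nth-order moment of the output hits a target value. A minimal sketch, assuming a unit target (the target value and function name are our assumptions, not taken from the paper):

```python
import numpy as np

def hocmn_even(cep, order, target=1.0):
    """Even-order HOCMN sketch: mean-subtract (CMS), then divide each
    dimension by b = (m_N / target)^(1/N), so the Nth-order moment of
    the output equals `target`."""
    assert order % 2 == 0
    centered = cep - cep.mean(axis=0)          # first-order normalization
    m_n = np.mean(centered ** order, axis=0)   # sample Nth-order moment
    b = (m_n / target) ** (1.0 / order)
    return centered / b
```

Since m_N of `centered / b` is m_N / b^N, the chosen b makes it exactly `target`; different N yields a different b, as the slide notes.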
HOCMN for an odd integer • It usually also co-exists with the first-order normalization • The required offset could be expressed with the first as well as the (N-1)th-order moments
HOCMN for an odd integer • Expanding the Nth-order moment of the shifted coefficients gives a polynomial in the offset a • As a is small, we can delete the higher-order terms in a
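One consistent reading of this derivation: require the Nth-order moment (odd N) of the shifted, mean-subtracted coefficients to vanish, expand binomially, and keep only the term linear in the small offset a, giving a ≈ m_N / (N · m_{N-1}). A sketch under that assumption (the function name is ours):

```python
import numpy as np

def hocmn_odd(cep, order):
    """Odd-order HOCMN sketch: after CMS, shift each dimension by
    a ~= m_N / (N * m_{N-1}), the first-order solution of
    E[(c - a)^N] = 0 once higher-order terms in a are dropped."""
    assert order % 2 == 1
    centered = cep - cep.mean(axis=0)              # first-order normalization
    m_n = np.mean(centered ** order, axis=0)       # Nth-order moment
    m_n1 = np.mean(centered ** (order - 1), axis=0)  # (N-1)th-order moment
    a = m_n / (order * m_n1)
    return centered - a
```

The shift does not zero the odd moment exactly (the dropped terms remain), but for small a it reduces its magnitude substantially.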
HOCMN for both odd & even • Figure 1
Experimental Setup • Aurora 2.0 • Training set • Clean condition / Multi condition (8 kinds of noise) • Testing set • A - 4 kinds of noise • B - another 4 kinds of noise • C - 8 kinds of noise • HOCMN approaches • Full utterances • Segments
Experimental Results • Baseline Experiments • Clean-condition training for all 3 testing sets A, B, C • Word accuracy was averaged over the different noise types and different SNRs (0 dB ~ 20 dB)
Experimental Results • Curve (a) • Full utterance with CN
Experimental Results • Baseline for CN
Experimental Results • Averaged for all SNR values
Experimental Results • Averaged for all noise types
Weighting Observation Vectors for Robust Speech Recognition in Noisy Environments Zhenyu Xiong, Thomas Fang Zheng, and Wenhu Wu Tsinghua University, Beijing, 100084, China ICSLP 2004 2004/1/7 Presented by Chen-Wei Liu
Introduction • The key issue in practical speech recognition • To improve the robustness against the mismatch between the training and testing environments • Such as background noise, channel distortion, acoustic echo, etc. • In most recognition systems • The probability of generating a sequence of observation vectors for some models is calculated as the product of the probabilities of generating each observation • Each observation vector is treated with an equal weight
Introduction • In noisy environments, clean speech and background noise are both time-varying • Speech is corrupted slightly at some times, and corrupted severely at others • Hence • Observation vectors extracted from the slightly-corrupted speech should be more reliable
Front-end Module • [Figure: block diagram of the front-end module]
Noise Estimation and Spectral Subtraction • Noise estimation is based on the result of speech/non-speech detection • Spectral subtraction
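A generic magnitude-domain spectral-subtraction step of the kind described — noise estimated from detected non-speech frames, then subtracted per frame with a floor to avoid negative magnitudes. The oversubtraction factor `alpha` and floor value are assumed parameters for illustration, not values from the paper:

```python
import numpy as np

def spectral_subtraction(mag, noise_mag, alpha=2.0, floor=0.02):
    """Subtract an estimated noise magnitude spectrum from each frame's
    magnitude spectrum, flooring the result at a small fraction of the
    noisy magnitude.
    mag:       (frames, bins) noisy magnitude spectra
    noise_mag: (bins,) noise estimate averaged over non-speech frames"""
    cleaned = mag - alpha * noise_mag
    return np.maximum(cleaned, floor * mag)
```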
Frame SNR Estimation • This indicates the degree to which the current speech frame is uncorrupted by noise
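One common way to realize such a per-frame SNR estimate — an assumed formulation for illustration, not necessarily the paper's exact one — compares each frame's energy, after removing the estimated noise energy, against the noise energy:

```python
import numpy as np

def frame_snr_db(mag, noise_mag, eps=1e-10):
    """Per-frame SNR estimate in dB: estimated signal energy (noisy frame
    energy minus estimated noise energy, floored at eps) over the
    estimated noise energy.
    mag:       (frames, bins) noisy magnitude spectra
    noise_mag: (bins,) estimated noise magnitude spectrum"""
    noise_energy = np.sum(noise_mag ** 2)
    frame_energy = np.sum(mag ** 2, axis=1)
    signal_energy = np.maximum(frame_energy - noise_energy, eps)
    return 10.0 * np.log10(signal_energy / noise_energy)
```

A high value marks a slightly-corrupted frame; a low value marks a heavily-corrupted one.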
Weighting Algorithm • Conventional: the utterance score is the sum of the per-frame log-probabilities, with every frame weighted equally • Weighted: each frame's log-probability is multiplied by its weighting factor before summing
Weighting Factor • The weighting factor should be an indicator of • The degree to which the corresponding speech frame is uncorrupted by the noise
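The weighted scoring described above can be sketched as follows. The linear SNR-to-weight mapping and its clipping range are our assumptions for illustration; the paper's actual weighting function may differ:

```python
import numpy as np

def snr_weights(snr_db, lo=-5.0, hi=20.0):
    """Assumed weighting function: map frame SNR (dB) linearly onto
    [0, 1], clipping below `lo` and above `hi`."""
    return np.clip((snr_db - lo) / (hi - lo), 0.0, 1.0)

def weighted_log_likelihood(frame_loglikes, weights):
    """Conventional score is sum_t log p(o_t); the weighted version
    scales each frame's log-probability by w_t, so slightly-corrupted
    frames contribute more than heavily-corrupted ones."""
    return np.sum(weights * frame_loglikes)
```

With all weights equal to 1 this reduces to the conventional equal-weight scoring.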
Experiment Setup • Database • Clean isolated-word speech from 10 male and female speakers • 7893 word utterances in total • Almost every speaker speaks 100 Chinese names 4 times • Noise types • Factory noise, pink noise, white noise, babble noise • SNR levels • -5, 0, 5, 10, 15, 20 dB
Experiment Results • [figure 1: recognition results]
Experiment Results • [figure 2: recognition results]