Introduction • The PDF of cepstral coefficients of speech signals • Is usually regarded as a quasi-Gaussian distribution • Under this assumption, the purpose of moment normalization of order N is then to have • For odd-order moments: the Nth-order moment equal to zero • For even-order moments: the Nth-order moment equal to a chosen constant • CMS is to normalize the first moment • CN is to normalize the second moment
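The two baseline normalizations mentioned here (CMS and CN) can be sketched as follows; a minimal numpy illustration, assuming `cep` holds the cepstral coefficients of one utterance as a (frames × dimensions) matrix — the function names are ours, not the paper's:

```python
import numpy as np

def cms(cep):
    """Cepstral Mean Subtraction: zero out the first-order moment
    of each cepstral dimension over the utterance."""
    return cep - cep.mean(axis=0)

def cn(cep):
    """Cepstral Normalization (CMS plus variance normalization):
    zero mean and unit second-order moment per dimension."""
    centered = cep - cep.mean(axis=0)
    return centered / centered.std(axis=0)
```

After `cn`, each dimension has zero mean and unit variance regardless of the channel or noise level, which is what brings the clean and noisy PDFs closer together.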
[Figure: distributions of cepstral coefficient values (probability, ×10⁻³ scale) for silence, 132097 frames, DIM=0 — with CN vs. without CN]
[Figure: distributions of cepstral coefficient values for all frames, total 1401927 frames, DIM=11 — with CN vs. without CN]
Higher Order Cepstral Moment Normalization For Robust Speech Recognition Chang-Wen Hsu and Lin-Shan Lee Graduate Institute of Communication Engineering National Taiwan University ICASSP 2004 2004/1/7 Presented by Chen-Wei Liu
Introduction • In real-world speech recognition applications • Robust features are highly desired for better recognition performance under various noisy conditions • MFCCs have been widely accepted as a good choice • Many advanced techniques have been developed based on them • CMS and CN have been two commonly used methods
Introduction • CMS could • Reduce the effects of channel distortion • Prevent low-frequency noise from being further amplified • CN could • Reduce the differences in PDF between the clean and noisy signals • It is also proposed that • The normalization of the third-order cepstral moment may achieve better performance than CMS and CN
HOCMN • What is the so-called Nth-order moment? • It is the average of the Nth power of the cepstral coefficients over the utterance • The purpose of moment normalization of order N is then to have this moment reach its target value
HOCMN • For example, normalizing the first-order moment is CMS, and normalizing the second-order moment is CN • With the above, we could extend the moment normalization to higher orders
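The Nth-order moment referred to above is simply the sample average of the Nth power of each cepstral coefficient over the utterance; a small sketch (the `moment` helper name is ours):

```python
import numpy as np

def moment(cep, order):
    """Sample Nth-order moment of each cepstral dimension:
    m_N = (1/T) * sum_t c_t^N over the T frames of the utterance."""
    return np.mean(cep ** order, axis=0)
```

With CMS applied, the first-order moment of the output is ≈ 0; with CN applied, the second-order moment is ≈ 1, which is what the first- and second-order special cases above amount to.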
HOCMN for an even integer • When performing HOCMN with an even integer N • Simply scale the first-order moment normalized coefficients by a constant • Such normalization usually co-exists with the first-order normalization, or CMS
HOCMN for an even integer • We could obtain b by requiring the Nth-order moment of the scaled coefficients to equal its target value • Different N gives different values of b
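One concrete reading of the even-order case: after CMS, choose the scale b per dimension so that the Nth-order moment of the output hits a target value. A minimal sketch, assuming a unit target (the target value and function name are our assumptions, not taken from the paper):

```python
import numpy as np

def hocmn_even(cep, order, target=1.0):
    """Even-order HOCMN sketch: mean-subtract (CMS), then divide each
    dimension by b = (m_N / target)^(1/N), so the Nth-order moment of
    the output equals `target`."""
    assert order % 2 == 0
    centered = cep - cep.mean(axis=0)          # first-order normalization
    m_n = np.mean(centered ** order, axis=0)   # sample Nth-order moment
    b = (m_n / target) ** (1.0 / order)
    return centered / b
```

Since m_N of `centered / b` is m_N / b^N, the chosen b makes it exactly `target`; different N yields a different b, as the slide notes.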
HOCMN for an odd integer • It usually also co-exists with the first-order normalization • The required offset could be expressed with the first as well as the (N-1)th-order moments
HOCMN for an odd integer • Expanding the Nth-order moment of the shifted coefficients gives a polynomial in the offset a • As a is small, we can delete the higher-order terms in a
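One consistent reading of this derivation: require the Nth-order moment (odd N) of the shifted, mean-subtracted coefficients to vanish, expand binomially, and keep only the term linear in the small offset a, giving a ≈ m_N / (N · m_{N-1}). A sketch under that assumption (the function name is ours):

```python
import numpy as np

def hocmn_odd(cep, order):
    """Odd-order HOCMN sketch: after CMS, shift each dimension by
    a ~= m_N / (N * m_{N-1}), the first-order solution of
    E[(c - a)^N] = 0 once higher-order terms in a are dropped."""
    assert order % 2 == 1
    centered = cep - cep.mean(axis=0)              # first-order normalization
    m_n = np.mean(centered ** order, axis=0)       # Nth-order moment
    m_n1 = np.mean(centered ** (order - 1), axis=0)  # (N-1)th-order moment
    a = m_n / (order * m_n1)
    return centered - a
```

The shift does not zero the odd moment exactly (the dropped terms remain), but for small a it reduces its magnitude substantially.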
HOCMN for both odd & even • Figure 1
Experimental Setup • Aurora 2.0 • Training set • Clean condition / Multi condition (8 kinds of noise) • Testing set • A - 4 kinds of noise • B - another 4 kinds of noise • C - 8 kinds of noise • HOCMN approaches • Full utterances • Segments
Experimental Results • Baseline Experiments • Clean-condition training for all 3 testing sets A, B, C • Word accuracy was averaged over the different noise types and different SNRs (0 dB ~ 20 dB)
Experimental Results • Curve (a) • Full utterance with CN
Experimental Results • Baseline for CN
Experimental Results • Averaged for all SNR values
Experimental Results • Averaged for all noise types
Weighting Observation Vectors for Robust Speech Recognition in Noisy Environments Zhenyu Xiong, Thomas Fang Zheng, and Wenhu Wu Tsinghua University, Beijing, 100084, China ICSLP 2004 2004/1/7 Presented by Chen-Wei Liu
Introduction • The key issue in practical speech recognition • To improve the robustness against the mismatch between the training and testing environments • Such as background noise, channel distortion, acoustic echo, etc. • In most recognition systems • The probability of generating a sequence of observation vectors for some models is calculated as the product of the probabilities of generating each observation • Each observation vector is treated with an equal weight
Introduction • In noisy environments, clean speech and background noise are both time-varying • Speech is corrupted slightly at some times, and corrupted severely at others • Hence • Observation vectors extracted from the slightly-corrupted speech should be more reliable
Front-end Module • [Figure: block diagram of the front-end module]
Noise Estimation and Spectral Subtraction • Noise estimation is based on the result of speech/non-speech detection • Spectral subtraction
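A generic magnitude-domain spectral-subtraction step of the kind described — noise estimated from detected non-speech frames, then subtracted per frame with a floor to avoid negative magnitudes. The oversubtraction factor `alpha` and floor value are assumed parameters for illustration, not values from the paper:

```python
import numpy as np

def spectral_subtraction(mag, noise_mag, alpha=2.0, floor=0.02):
    """Subtract an estimated noise magnitude spectrum from each frame's
    magnitude spectrum, flooring the result at a small fraction of the
    noisy magnitude.
    mag:       (frames, bins) noisy magnitude spectra
    noise_mag: (bins,) noise estimate averaged over non-speech frames"""
    cleaned = mag - alpha * noise_mag
    return np.maximum(cleaned, floor * mag)
```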
Frame SNR Estimation • This indicates the degree to which the current speech frame is uncorrupted by noise
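One common way to realize such a per-frame SNR estimate — an assumed formulation for illustration, not necessarily the paper's exact one — compares each frame's energy, after removing the estimated noise energy, against the noise energy:

```python
import numpy as np

def frame_snr_db(mag, noise_mag, eps=1e-10):
    """Per-frame SNR estimate in dB: estimated signal energy (noisy frame
    energy minus estimated noise energy, floored at eps) over the
    estimated noise energy.
    mag:       (frames, bins) noisy magnitude spectra
    noise_mag: (bins,) estimated noise magnitude spectrum"""
    noise_energy = np.sum(noise_mag ** 2)
    frame_energy = np.sum(mag ** 2, axis=1)
    signal_energy = np.maximum(frame_energy - noise_energy, eps)
    return 10.0 * np.log10(signal_energy / noise_energy)
```

A high value marks a slightly-corrupted frame; a low value marks a heavily-corrupted one.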
Weighting Algorithm • Conventional: the utterance score is the sum of the per-frame log-probabilities, with every frame weighted equally • Weighted: each frame's log-probability is multiplied by its weighting factor before summing
Weighting Factor • The weighting factor should be an indicator of • The degree to which the corresponding speech frame is uncorrupted by the noise
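The weighted scoring described above can be sketched as follows. The linear SNR-to-weight mapping and its clipping range are our assumptions for illustration; the paper's actual weighting function may differ:

```python
import numpy as np

def snr_weights(snr_db, lo=-5.0, hi=20.0):
    """Assumed weighting function: map frame SNR (dB) linearly onto
    [0, 1], clipping below `lo` and above `hi`."""
    return np.clip((snr_db - lo) / (hi - lo), 0.0, 1.0)

def weighted_log_likelihood(frame_loglikes, weights):
    """Conventional score is sum_t log p(o_t); the weighted version
    scales each frame's log-probability by w_t, so slightly-corrupted
    frames contribute more than heavily-corrupted ones."""
    return np.sum(weights * frame_loglikes)
```

With all weights equal to 1 this reduces to the conventional equal-weight scoring.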
Experiment Setup • Database • Clean isolated-word speech from 10 male and female speakers • 7893 word utterances in total • Almost every speaker speaks 100 Chinese names 4 times • Noise types • Factory noise, pink noise, white noise, babble noise • SNR levels • -5, 0, 5, 10, 15, 20 dB
Experiment Results • [figure 1: recognition results]
Experiment Results • [figure 2: recognition results]