Higher Order Cepstral Moment Normalization (HOCMN) for Robust Speech Recognition. Speaker: Chang-wen Hsu Advisor: Lin-shan Lee 2007/02/08. Outline. Introduction CMS/CMVN/HEQ Higher Order Cepstral Moment Normalization (HOCMN) Even order HOCMN Odd order HOCMN Cascade system
Higher Order Cepstral Moment Normalization (HOCMN) for Robust Speech Recognition Speaker: Chang-wen Hsu Advisor: Lin-shan Lee 2007/02/08
Outline • Introduction • CMS/CMVN/HEQ • Higher Order Cepstral Moment Normalization (HOCMN) • Even order HOCMN • Odd order HOCMN • Cascade system • Fundamental principles • Experimental Results • Conclusions
Introduction • Feature normalization in the cepstral domain is widely used in robust speech recognition: • CMS: normalizing the first moment • CMVN: normalizing the first and second moments • Cepstrum Third-order Normalization (CTN): normalizing the first three moments (Electronics Letters, 1999) • HEQ: normalizing the full distribution (moments of all orders) • How about normalizing only a few higher order moments? • Higher order moments are dominated more strongly by samples with larger values • Normalizing only a few higher order moments may be good enough, while avoiding over-normalization
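Introduction • Cepstral Normalization • CMS: subtract the time-averaged mean of each cepstral coefficient, X_CMS(n) = X(n) - mean(X) • CMVN: subtract the mean and divide by the standard deviation, X_CMVN(n) = (X(n) - mean(X)) / std(X)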
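The two normalizations named on this slide can be sketched as follows. This is a minimal illustration, assuming per-utterance statistics and a feature matrix of shape (frames, coefficients); the function names and the random test data are hypothetical and not taken from the slides.

```python
import numpy as np

def cms(x):
    """Cepstral mean subtraction: remove the per-coefficient mean over time."""
    return x - x.mean(axis=0)

def cmvn(x, eps=1e-12):
    """Cepstral mean and variance normalization: zero mean, unit variance."""
    centered = x - x.mean(axis=0)
    return centered / (centered.std(axis=0) + eps)  # eps guards constant tracks

# Hypothetical utterance: 200 frames of 13 cepstral coefficients (C0~C12).
rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=2.0, size=(200, 13))
normalized = cmvn(features)
```

Both functions work column by column, so each cepstral coefficient is normalized independently of the others.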
Introduction • Histogram Equalization
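As a rough illustration of histogram equalization, the sketch below maps one coefficient track onto a standard Gaussian through its empirical CDF. The rank-based CDF estimate, the Gaussian reference distribution and the name `heq` are assumptions made for this example; the slides do not state which variant of HEQ was used.

```python
import numpy as np
from statistics import NormalDist

def heq(x):
    """Map one coefficient track onto N(0,1) via its empirical CDF."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x))   # 0 .. T-1, smallest value gets rank 0
    cdf = (ranks + 0.5) / x.size        # keeps every value strictly inside (0, 1)
    inv = NormalDist().inv_cdf          # inverse CDF of the standard Gaussian
    return np.array([inv(p) for p in cdf])

# Hypothetical skewed track of 500 frames.
rng = np.random.default_rng(1)
track = rng.gamma(shape=2.0, size=500)
equalized = heq(track)
```

Because the mapping depends only on ranks, every moment of the output is fixed by the reference distribution, which is what "normalizing the full distribution" amounts to.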
Higher Order Cepstral Moment Normalization • If the distribution of the cepstral coefficients can be assumed to be quasi-Gaussian: • Odd order moments can be normalized to zero • Even order moments can be normalized to some specific values • Define notation: • X(n): a certain cepstral coefficient of the n-th frame • X[k](n): with the k-th moment normalized • X[k,l](n): with both the k-th and l-th moments normalized • X[k,l,m](n): with the k-th, l-th and m-th moments normalized • HOCMN[k,l,m]: an operator normalizing the k-th, l-th and m-th moments • For example: CMVN = HOCMN[1,2] and CTN = HOCMN[1,2,3]
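Cepstral Moment Normalization • Moment estimation: • Time average of MFCC parameters, E[X^k] ≈ (1/T) Σ X(n)^k over the T frames • Purpose: • For odd order L: normalize the L-th moment to zero • For even order N: normalize the N-th moment to a specific value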
Even order HOCMN • Only the moment of a single even order N can be normalized, and CMS can always be performed in advance • Therefore, the new feature coefficients can be expressed in terms of the CMS output X[1](n) • Let the N-th moment of the new feature coefficients be set to a desired target value
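One way to picture the even-order step is as a single scale factor applied after CMS. The sketch below rests on two assumptions that the slide text does not confirm: that the new coefficients are a scaled copy of the CMS output, and that the target value is the N-th moment of a standard Gaussian, (N-1)!!. The function names are hypothetical.

```python
import numpy as np

def gaussian_even_moment(order):
    """N-th moment of N(0,1) for even N: (N-1)!! = 1*3*5*...*(N-1)."""
    return float(np.prod(np.arange(1, order, 2)))

def hocmn_even(x, order):
    """Assumed form of HOCMN[1,N]: CMS followed by one scale factor."""
    x = np.asarray(x, dtype=float)
    x1 = x - x.mean()                           # CMS performed in advance
    current = np.mean(x1 ** order)              # time-averaged N-th moment
    a = (gaussian_even_moment(order) / current) ** (1.0 / order)
    return a * x1                               # N-th moment now hits the target

# Hypothetical coefficient track.
rng = np.random.default_rng(2)
track = rng.normal(loc=1.0, scale=4.0, size=1000)
out4 = hocmn_even(track, 4)
```

With `order=2` the target is 1, so under these assumptions the operation reduces to CMVN, which is consistent with the identity CMVN = HOCMN[1,2] used elsewhere in the slides.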
Even order HOCMN • Aurora 2, clean condition training, word accuracy averaged over 0~20dB and all types of noise (sets A,B,C) • Note: CMVN = HOCMN[1,2]
Even order HOCMN • Evaluation of the expectation value for the moments • Sample average over a reference interval • Full utterance • Moving window of l frames around the frame X(n) to be normalized (…, X(n-3), X(n-2), X(n-1), X(n), X(n+1), X(n+2), X(n+3), …) • [Figure: word accuracy (Acc., axis from 80.40 to 82.40) versus window length l (60 to 120), curve labelled [1,100]; l=86 is best]
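The moving-window estimate can be sketched for the simplest case, CMVN, as below. The window placement (roughly centered on the current frame) and the truncation at utterance boundaries are assumptions for illustration; only the default length of 86 frames comes from the slide.

```python
import numpy as np

def windowed_cmvn(x, window=86):
    """CMVN with statistics taken from a moving window of `window` frames."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for n in range(x.size):
        start = n - window // 2
        lo = max(0, start)                   # truncate at the utterance start
        hi = min(x.size, start + window)     # truncate at the utterance end
        seg = x[lo:hi]
        out[n] = (x[n] - seg.mean()) / (seg.std() + 1e-12)
    return out

# Hypothetical track with a slow drift that a full-utterance mean would miss.
rng = np.random.default_rng(3)
track = np.linspace(0.0, 5.0, 400) + rng.normal(size=400)
local = windowed_cmvn(track, window=86)
```

When the window is at least twice as long as the utterance, every frame sees the whole utterance and the result coincides with full-utterance CMVN.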
Experimental results • Aurora 2, clean condition training, word accuracy averaged over 0~20dB and all types of noise (sets A,B,C) • Compared: CMVN (l=86) and CMVN (full-utterance)
Odd order HOCMN (1/3) • Besides the first moment (CMS), only one additional moment of odd order L can be normalized • The L-th HOCMN can be obtained from the (L-1)-th HOCMN (L-1 is an even number, as discussed previously) • Then, the new feature coefficients can be expressed with two parameters, “a” and “c”, which are to be solved
Odd order HOCMN (2/3) • To solve “a” and “c” • The first moment is set to zero • The L-th moment is set to zero • After some mathematics and approximation, a closed-form estimate of “a” is obtained
Odd order HOCMN (3/3) • Because the formula for “a” above is only an approximation, a recursive solution can be obtained in about two iterations
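The slide text does not preserve the expression for the new coefficients, so the sketch below assumes a quadratic mapping y = x1 + a*x1^2 + c purely for illustration, and it uses Newton's method in place of the authors' recursion. Setting c = -a*E[x1^2] keeps the first moment at zero, and the first Newton step from a = 0 plays the role of the approximate closed-form value of "a". All names are hypothetical.

```python
import numpy as np

def hocmn_odd(x, order, iterations=10):
    """Zero the mean and the odd `order`-th moment (assumed quadratic mapping)."""
    x = np.asarray(x, dtype=float)
    x1 = x - x.mean()                        # CMS
    u = x1 ** 2 - np.mean(x1 ** 2)           # zero-mean quadratic term, so c = -a*E[x1^2]
    a = 0.0
    for _ in range(iterations):
        y = x1 + a * u
        f = np.mean(y ** order)                      # moment to drive to zero
        df = order * np.mean(y ** (order - 1) * u)   # derivative of f with respect to a
        a -= f / df                                  # Newton update
    return x1 + a * u, a

# Hypothetical right-skewed track, standardized to unit variance.
rng = np.random.default_rng(4)
raw = rng.gamma(shape=8.0, size=5000)
track = (raw - raw.mean()) / raw.std()
symmetric, a = hocmn_odd(track, 3)
```

On this mildly skewed example the update settles within a few iterations; ten are used here only to leave a margin.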
Cascade system • Cascading an odd order operator HOCMN[1,L] (L is an odd number) and an even order operator HOCMN[1,N] (N is an even number) yields an operator HOCMN[1,L,N]
Experimental results • Aurora 2, clean condition training, word accuracy averaged over 0~20dB and all types of noise (sets A,B,C) • [Chart labels: CTN = HOCMN[1,2,3]; CN (l=86); CMVN (l=86); CN; CMVN]
Skewness and Kurtosis • Skewness • Third moment about the mean, normalized by the cube of the standard deviation • Measures the departure of the pdf from symmetry • Positive/negative indicates skew to the right/left • Zero indicates symmetry • Kurtosis • Fourth moment about the mean, normalized by the fourth power of the standard deviation • Peaked or “flat with tails of large size” as compared to the standard Gaussian • “3” is the fourth moment of N(0,1) • Kurtosis above/below 3 indicates heavier/lighter tails than the Gaussian
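The two statistics defined on this slide can be computed from time averages as sketched below. The function names and the sample data are illustrative only.

```python
import numpy as np

def skewness(x):
    """Third central moment divided by the cube of the standard deviation."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def kurtosis(x):
    """Fourth central moment divided by the fourth power of the standard deviation."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

# Hypothetical samples: a Gaussian track and a right-skewed track.
rng = np.random.default_rng(5)
gaussian = rng.normal(size=200_000)
skewed = rng.gamma(shape=2.0, size=200_000)
```

For the Gaussian sample the kurtosis comes out close to 3 and the skewness close to 0, while the gamma sample has clearly positive skewness.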
Skewness and Kurtosis • The 1st moment is always normalized • Define: Generalized skewness of odd order L • L is not necessarily 3 • Similar meaning to skewness (skew to the right or left), except in the sense of the L-th moment • Define: Generalized kurtosis of even order N • N is not necessarily 4 • Similar meaning to kurtosis (peaked or flat), except in the sense of the N-th moment
Skewness and Kurtosis • Normalizing an odd order moment constrains the pdf to be symmetric about the origin • In the sense of the L-th moment • Normalizing an even order moment constrains the pdf to be “equally flat with tails of equal size” • In the sense of the N-th moment
Generalized Moments • The orders of the normalized moments need not be integers • Generalized moment • Type 1: • Reduces to the odd order moment when u is an odd integer L (e.g. L=1 or 3) • Type 2: • Reduces to the even order moment when u is an even integer N (e.g. N=2 or 4) • HOCMN with non-integer moment orders
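The definitions of the two generalized moments are not preserved in the slide text. The sketch below uses one pair of definitions that satisfies the stated reduction properties, namely E[sign(X)|X|^u] for Type 1 and E[|X|^u] for Type 2; these forms are an assumption, and the function names are hypothetical.

```python
import numpy as np

def generalized_moment_type1(x, u):
    """Assumed Type 1: E[sign(X) * |X|^u]; equals E[X^u] for odd integer u."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.sign(x) * np.abs(x) ** u)

def generalized_moment_type2(x, u):
    """Assumed Type 2: E[|X|^u]; equals E[X^u] for even integer u."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x) ** u)

# Hypothetical mean-normalized track.
rng = np.random.default_rng(6)
track = rng.normal(size=1000)
track = track - track.mean()
m_odd_like = generalized_moment_type1(track, 2.5)    # non-integer order
m_even_like = generalized_moment_type2(track, 3.5)   # non-integer order
```

Taking absolute values before raising to the power u keeps both quantities real for non-integer u, which a plain `x ** u` would not do for negative samples.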
Experimental Setup • Aurora2 database • Training: Clean condition training • Testing: Set A, B and C • Development: All from clean training data • 39-dimension feature coefficients • C0~C12 MFCC, Δ, Δ2 • Normalization performed on C0~C12
Experimental Results • Normalizing higher order moments yields more robust features • Normalizing only three moment orders performs better than normalizing the full distribution
PDF Analysis • [Figure: pdfs of the original C0 and C1, after HEQ, and after HOCMN] • HEQ • Overfits to the Gaussian • Loses the original statistics • HOCMN • Fits the generalized skewness and kurtosis • Retains more of the nature of speech
Distance Analysis • Distance definition: • HOCMN yields a smaller distance between clean and noisy speech • The distance reduction follows a trend similar to the error rate reduction
Experimental Results • Slight improvement for HOCMN with non-integer order moments • Especially for lower SNR values • Other robust techniques can be combined with it
Experimental Results • For multi-condition training: • HOCMN performs better than CMVN for all SNR values • Better than HEQ for higher SNR values
Conclusions • We proposed a unified framework for higher order cepstral moment normalization • Normalizing higher order moments gives more robust features • The parameter set can be appropriately selected using a development set • Skewness/kurtosis/distance analysis further illustrates the concepts behind the normalization techniques