Robust Speech Feature
Decorrelated and Liftered Filter-Bank Energies (DLFBE)
Proposed by K.K. Paliwal, in EuroSpeech 99
DLFBE --- Preliminary
* MFCC is very successful in speech recognition
* MFCC is computed from the speech signal in the following three steps:
1. Compute the FFT power spectrum of the speech signal
2. Apply a Mel-spaced filter-bank to the power spectrum to get N energies (N=20~60)
3. Compute the discrete cosine transform (DCT) of the log filter-bank energies to get uncorrelated MFCCs (M=10)
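The three steps above can be sketched roughly as follows. This is a minimal illustration, not the front end used in the paper: the Hamming window, the triangular filter shape, the mel formula, the choice of 24 filters, and dropping the zeroth coefficient are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced uniformly on the mel scale (assumed shape)."""
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bin_edges = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(n_filters):
        left, centre, right = bin_edges[j], bin_edges[j + 1], bin_edges[j + 2]
        for k in range(left, centre):      # rising slope
            fbank[j, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope
            fbank[j, k] = (right - k) / (right - centre)
    return fbank

def dct_ii(x, n_out):
    """Unnormalised DCT-II, returning the first n_out coefficients."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * k * (i + 0.5) / n) @ x

def mfcc(frame, sample_rate, n_filters=24, n_ceps=10):
    n_fft = len(frame)
    # Step 1: FFT power spectrum (the Hamming window is an assumption).
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # Step 2: N mel-spaced filter-bank energies.
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    # Step 3: DCT of the log energies; c0 is dropped here by choice.
    log_energies = np.log(energies + 1e-10)
    return dct_ii(log_energies, n_ceps + 1)[1:]
```

Calling `mfcc(frame, 8000)` on a 256-sample frame returns M=10 coefficients computed from N=24 log filter-bank energies.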
DLFBE --- Motivation
* MFCC has two drawbacks
1. It does not have any physical interpretation
2. Liftering of cepstral coefficients has no effect in modern speech recognition (discussed later)
* The two problems (i.e., their number and their correlation) of the FBEs used in the 50’s, 60’s and 70’s can be solved today
Liftering --- What and How
* Liftering is the reweighting of the cepstral coefficients, used in the DTW framework of speech recognition
* There, a Euclidean distance measures the dissimilarity between the test vector and the mean vector
Liftering --- What and How (cont’d)
* The quantities involved are the i-th cepstral coefficient, the corresponding liftered coefficient, and the lifter
* The distance can also be written in a more general form
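One way to write the liftered distance described on these two slides is given below. The notation is assumed rather than taken from the slides: c_i is the i-th cepstral coefficient of the test vector, \mu_i the corresponding component of the mean vector, w_i the lifter, \hat{c}_i the liftered coefficient, and M the number of coefficients. The matrix expression is only one plausible reading of "more general form".

```latex
\begin{align}
\hat{c}_i &= w_i\, c_i \\
d(\mathbf{c}, \boldsymbol{\mu})
  &= \sum_{i=1}^{M} \bigl( \hat{c}_i - \hat{\mu}_i \bigr)^2
   = \sum_{i=1}^{M} w_i^2\, (c_i - \mu_i)^2 \\
d(\mathbf{c}, \boldsymbol{\mu})
  &= (\mathbf{c} - \boldsymbol{\mu})^{\mathsf T}\, W^{\mathsf T} W\, (\mathbf{c} - \boldsymbol{\mu}),
  \qquad W = \operatorname{diag}(w_1, \dots, w_M)
\end{align}
```

With all w_i = 1 this reduces to the plain squared Euclidean distance.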
Liftering --- What and How (cont’d)
* The types of lifters are listed below
1. Linear lifter
2. Statistical lifter
3. Sinusoidal lifter
4. Exponential lifter
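The four lifter types can be illustrated with the forms commonly quoted in the speech recognition literature. The formulas on the original slide are not in the text, so the expressions below, the parameter defaults (`s`, `tau`, `length`), and the function names are assumptions for illustration only.

```python
import numpy as np

def linear_lifter(m):
    """w_i = i for i = 1..m."""
    return np.arange(1, m + 1, dtype=float)

def statistical_lifter(train_ceps):
    """w_i = 1 / sigma_i, with sigma_i estimated from training cepstra.

    train_ceps has shape (n_frames, m).
    """
    return 1.0 / np.std(train_ceps, axis=0)

def sinusoidal_lifter(m, length=None):
    """w_i = 1 + (L / 2) * sin(pi * i / L); L defaults to m (assumed)."""
    length = m if length is None else length
    i = np.arange(1, m + 1)
    return 1.0 + 0.5 * length * np.sin(np.pi * i / length)

def exponential_lifter(m, s=1.5, tau=5.0):
    """w_i = i**s * exp(-i**2 / (2 * tau**2)); s and tau are illustrative."""
    i = np.arange(1, m + 1, dtype=float)
    return i ** s * np.exp(-i ** 2 / (2.0 * tau ** 2))
```

Each function returns a weight vector that multiplies the cepstral vector element by element, for example `liftered = sinusoidal_lifter(10) * cepstrum`.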
Liftering --- Discussion and Why
* Multiplicative weighting in the cepstral domain is equivalent to convolution in the spectral domain
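The equivalence follows from the DFT product property, since the cepstrum and the log spectrum are related by a Fourier transform. A small numerical check of that property on generic sequences is sketched below; it is a demonstration of the transform identity only, and `circular_convolve` is a helper introduced here for illustration.

```python
import numpy as np

def circular_convolve(x, y):
    """Circular convolution of two equal-length sequences."""
    n = len(x)
    return np.array([sum(x[m] * y[(k - m) % n] for m in range(n))
                     for k in range(n)])

rng = np.random.default_rng(0)
cepstrum_like = rng.standard_normal(16)   # stands in for a cepstral sequence
weights = rng.standard_normal(16)         # stands in for a lifter

# The DFT of an element-wise product ...
lhs = np.fft.fft(cepstrum_like * weights)
# ... equals the circular convolution of the two DFTs, scaled by 1/N.
rhs = circular_convolve(np.fft.fft(cepstrum_like), np.fft.fft(weights)) / 16

agreement = np.allclose(lhs, rhs)
```

In other words, a lifter acts on the log spectrum as a smoothing or sharpening kernel whose shape is the transform of the weight sequence.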
Liftering on CDHMM (??) --- Why
* The distance measure is a Mahalanobis distance, because of the form of the observation probability
Liftering on CDHMM (??) --- Why
* Liftering matrix for MFCC: liftering multiplies the MFCC vector by a diagonal matrix of lifter weights
Liftering on CDHMM (??) --- Why
* Thus, cepstral liftering has no effect on the recognition process when used with continuous-observation Gaussian-density HMMs
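The argument can be written out as follows, under assumed notation: c is the MFCC vector, \mu and \Sigma are the mean and covariance of a Gaussian state density, and W is an invertible diagonal liftering matrix. Liftering the features transforms the model parameters estimated from them in the same way, so the Mahalanobis distance is unchanged.

```latex
\begin{align}
\hat{\mathbf{c}} &= W \mathbf{c}, \qquad
\hat{\boldsymbol{\mu}} = W \boldsymbol{\mu}, \qquad
\hat{\Sigma} = W \Sigma W^{\mathsf T} \\
(\hat{\mathbf{c}} - \hat{\boldsymbol{\mu}})^{\mathsf T}\,
\hat{\Sigma}^{-1}\,
(\hat{\mathbf{c}} - \hat{\boldsymbol{\mu}})
 &= (\mathbf{c} - \boldsymbol{\mu})^{\mathsf T}\, W^{\mathsf T}
    \bigl( W^{-\mathsf T}\, \Sigma^{-1}\, W^{-1} \bigr)\,
    W\, (\mathbf{c} - \boldsymbol{\mu}) \\
 &= (\mathbf{c} - \boldsymbol{\mu})^{\mathsf T}\, \Sigma^{-1}\,
    (\mathbf{c} - \boldsymbol{\mu})
\end{align}
```

The normalisation term changes only by \log\lvert\det W\rvert^{2}, which is the same for every state and therefore does not alter which model scores best.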
Decorrelation of FBE --- Why/How
* FBEs are correlated => we can’t use them directly with a CDHMM
* We can use LP techniques to remedy this defect
* The required coefficients can be obtained by the covariance method of LP analysis
Liftering of FBE --- How
* FIR filter
* N=M+L
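A rough sketch of LP-based decorrelation with an FIR filter is given below. It assumes that N is the number of log filter-bank energies per frame, L the order of the FIR prediction-error filter, and M = N - L the number of output features; the slides do not define these symbols in the text, so this reading is a guess. The predictor runs along the filter-bank channel index, and the function names are hypothetical.

```python
import numpy as np

def fit_lp_covariance(log_fbe, order):
    """Least-squares (covariance-method) predictor along the channel index.

    log_fbe has shape (n_frames, N). No windowing or zero padding is used:
    only channels n = order..N-1, which have a full set of predecessors,
    contribute to the fit.
    """
    rows, targets = [], []
    for x in log_fbe:
        for n in range(order, len(x)):
            rows.append(x[n - order:n][::-1])   # x[n-1], ..., x[n-order]
            targets.append(x[n])
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return a

def decorrelate(x, a):
    """Apply the FIR prediction-error filter e[n] = x[n] - sum_k a_k x[n-k].

    Returns M = N - L values, where N = len(x) and L = len(a).
    """
    h = np.concatenate(([1.0], -a))
    return np.convolve(x, h, mode="valid")
```

For example, with N=5 energies per frame and a first-order filter (L=1), `decorrelate` returns M=4 values per frame.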
DLFBE --- Experiment
* Speaker-independent (SI), isolated-word recognition using the ISOLET spoken letter database
* 90 training utterances from 90 speakers (45 females, 45 males)
* 30 testing utterances from 30 speakers (15 females, 15 males)
Robust Speech Feature
Noise-Invariant Representation for Speech Signal
Group Delay Function (GDF) Method
Proposed by Bayya & Yegnanarayana in EuroSpeech ‘99
GDF --- Motivation
* Background noise is a prominent source of mismatch, and is only roughly eliminated by the following methods
1. Compensation: causes overestimation and underestimation side effects
GDF --- Motivation (cont’d)
2. New features: not completely noise resistant
* All of the above use power/amplitude as the speech feature
* Why not use phase information as a feature? Phase information may be helpful in speech recognition.
GDF --- What/How
* The GDF is defined by (#.1) in terms of the normalized autocorrelation of a short segment of a signal
GDF --- What/How (cont’d)
* (#.2)
* Compare (#.1) and (#.2)
GDF --- What/How (cont’d)
* Easy to implement
* Truncated version of GDF
GDF --- What/How (cont’d)
* where the window used for truncation is a Hanning window
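A generic way to compute a group delay function from a truncated, Hanning-windowed autocorrelation is sketched below. It relies on the standard identity that the group delay of a sequence x[n] equals Re{ FFT(n x[n]) / FFT(x[n]) }. The exact definitions (#.1) and (#.2) from the slides are not in the text, so this is not a reproduction of the authors' formulation; the number of lags, the FFT size, the one-sided window, and the function names are all assumptions.

```python
import numpy as np

def group_delay(x, n_fft=512, eps=1e-12):
    """Group delay -d(phase)/d(omega) of x, via the n*x[n] identity."""
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)
    Y = np.fft.rfft(n * x, n_fft)
    # Re{Y / X} written without complex division; eps guards spectral zeros.
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

def truncated_autocorr_gdf(frame, n_lags, n_fft=512):
    """Group delay of the normalized, truncated, windowed autocorrelation."""
    # One-sided autocorrelation (lags 0..len-1), normalized so that r[0] = 1.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    r = r / r[0]
    # Decaying half of a Hanning window: 1 at lag 0, 0 at the last kept lag.
    window = np.hanning(2 * n_lags - 1)[n_lags - 1:]
    return group_delay(r[:n_lags] * window, n_fft)
```

A 5 ms frame at an assumed 8 kHz sampling rate has 40 samples, so a call might look like `truncated_autocorr_gdf(frame, n_lags=16, n_fft=64)`.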
GDF --- Why & Experiment
* Frame length = 5 ms, frame shift = 1 ms
* The modified autocorrelation sequence is averaged over 20 frames, then the GDF is computed as defined above
GDF --- Experiment *Isolated-digit recognition ? Due to large dynamic range