Noise Reduction in Speech Recognition
Professor: Jian-Jiun Ding
Student: Yung Chang
2011/05/06
Outline
• Mel Frequency Cepstral Coefficients (MFCC)
• Mismatch in speech recognition
• Feature-based: CMS, CMVN, HEQ
• Feature-based: RASTA, data-driven
• Speech enhancement: spectral subtraction, Wiener filtering
• Conclusions and applications
Mel Frequency Cepstral Coefficients (MFCC)
• The most commonly used feature in speech recognition
• Advantages: high accuracy and low complexity
• 39 dimensions
Mel Frequency Cepstral Coefficients (MFCC)
• The framework of feature extraction:
[Figure: block diagram. Speech signal x(n) → pre-emphasis → x'(n) → window → x_t(n) → DFT → A_t(k) → Mel filter-bank → Y_t(m) → log(|·|^2) → Y_t'(m) → IDFT → y_t(j); the energy e_t and the derivatives are appended to form the MFCC]
Pre-emphasis
• Pre-emphasis of the spectrum at higher frequencies
[Figure: x[n] → pre-emphasis → x'[n]]
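The slide does not give the filter itself. A minimal sketch, assuming the common first-order form x'[n] = x[n] - a·x[n-1]; the coefficient a = 0.97 and the function name `pre_emphasis` are illustrative assumptions, not taken from the slides:

```python
def pre_emphasis(x, a=0.97):
    """Return x'[n] = x[n] - a * x[n-1], treating x[-1] as 0.

    The first-order form and a = 0.97 are assumed (a commonly used choice).
    """
    out = []
    prev = 0.0
    for sample in x:
        out.append(sample - a * prev)  # subtracting the previous sample boosts high frequencies
        prev = sample
    return out


# A constant (low-frequency) input is strongly attenuated after the first sample.
flattened = pre_emphasis([1.0, 1.0, 1.0], a=0.5)
```

With a close to 1, slowly varying components nearly cancel while rapid sample-to-sample changes pass through, which is the high-frequency emphasis the slide describes.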
End-point Detection (Voice Activity Detection)
[Figure: waveform segmented into speech and noise (silence)]
Windowing
• Rectangular window
• Hamming window
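As a rough illustration of the windowing step, the sketch below splits a signal into overlapping frames and applies a Hamming window to each. The 0.54/0.46 coefficients are the standard Hamming definition; the helper names `hamming` and `frame_signal`, and the frame length and hop, are assumptions for the example:

```python
import math


def hamming(n_points):
    """Standard Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames and taper each frame with a Hamming window."""
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append([x[start + n] * w[n] for n in range(frame_len)])
    return frames


# Hypothetical sizes: 4-sample frames with a 2-sample hop.
frames = frame_signal([1.0] * 10, frame_len=4, hop=2)
```

A rectangular window corresponds to skipping the multiplication by `w`; the Hamming taper reduces the spectral leakage caused by the abrupt frame edges.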
Mel-filter bank
• After the DFT we get the spectrum
[Figure: spectrum, amplitude versus frequency]
Mel-filter bank
• Triangular shape in frequency (overlapped)
• Uniformly spaced below 1 kHz
• Logarithmic scale above 1 kHz
[Figure: triangular filters, amplitude versus frequency]
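One way to build such a filter bank is sketched below. It assumes the widely used mel approximation mel(f) = 2595·log10(1 + f/700), which is roughly linear below 1 kHz and logarithmic above; the slides do not state which mapping was used, and the function names and parameter values are illustrative:

```python
import math


def hz_to_mel(f):
    """Assumed mel mapping (a common approximation, not given in the slides)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(num_filters, n_fft, sample_rate):
    """Return num_filters overlapping triangular filters over n_fft // 2 + 1 DFT bins."""
    num_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies, equally spaced on the mel axis.
    edges_hz = [mel_to_hz(mel_max * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(num_bins)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, centre, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        weights = []
        for f in bin_hz:
            if lo < f <= centre:
                weights.append((f - lo) / (centre - lo))   # rising edge
            elif centre < f < hi:
                weights.append((hi - f) / (hi - centre))   # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


# Hypothetical settings: 10 filters, 512-point DFT, 16 kHz sampling.
bank = mel_filter_bank(10, 512, 16000)
```

Each filter output Y_t(m) is then the weighted sum of the DFT power spectrum under one triangle; neighbouring triangles share edges, which gives the overlap noted on the slide.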
Delta Coefficients
• 1st/2nd order differences
• 13 dimensions → 39 dimensions (13 static + 13 1st order + 13 2nd order)
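The exact difference formula is not shown on the slide. The sketch below assumes a simple central difference, d[t] = (c[t+1] - c[t-1]) / 2 with the edge frames replicated, and stacks static, 1st order, and 2nd order terms; the names `delta` and `add_deltas` are hypothetical:

```python
def delta(frames):
    """Central difference over time: d[t] = (c[t+1] - c[t-1]) / 2, edges replicated.

    This particular difference formula is an assumption; regression-based
    deltas over a wider window are also common.
    """
    num_frames = len(frames)
    out = []
    for t in range(num_frames):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, num_frames - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_deltas(frames):
    """Concatenate static, 1st order, and 2nd order coefficients per frame."""
    d1 = delta(frames)
    d2 = delta(d1)
    return [c + a + b for c, a, b in zip(frames, d1, d2)]


# Five frames of 13 static coefficients become five frames of 39.
static = [[float(t)] * 13 for t in range(5)]
features = add_deltas(static)
```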
Mismatch in Statistical Speech Recognition
• Possible approaches for acoustic environment mismatch: speech enhancement, feature-based approaches, model-based approaches
[Figure: the original speech x[n] is corrupted by additive noise n1(t), n2(t) and convolutional noise h[n] (acoustic reception, microphone distortion, phone/wireless channel), giving the input signal y[n]. Feature extraction produces the feature vectors O = o1o2…oT, and the search, using the acoustic models, lexicon, and language model built from the speech corpus and text corpus, outputs the sentences W = w1w2...wR. The acoustic models are trained on x[n] while recognition runs on y[n].]
Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN)
• Cepstral Mean Subtraction (CMS): convolutional noise
• Convolutional noise in the time domain becomes additive in the cepstral domain
• $y[n] = x[n] * h[n] \Rightarrow y = x + h$, where $x$, $y$, $h$ on the right-hand side are in the cepstral domain
• Most convolutional noise changes only very slightly over some reasonable time interval, so $x = y - h$
• Cepstral Mean Subtraction (CMS)
• Assuming $E[x] = 0$, then $E[y] = h$
• $x_{CMS} = y - E[y]$
[Figure: distributions P(x) and P(y) before and after CMS]
Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN)
• CMVN: the variance is normalized as well
• $x_{CMVN} = x_{CMS} / [\mathrm{Var}(x_{CMS})]^{1/2}$
[Figure: distributions P(x) and P(y) after CMS and after CMVN]
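The two normalizations can be sketched per utterance as follows, with E[y] and Var estimated over the frames of that utterance. The function names, the per-utterance averaging, and the small `eps` guard against zero variance are assumptions made for the example:

```python
import math


def cms(frames):
    """Cepstral mean subtraction: x_CMS = y - E[y], with E[y] averaged over frames."""
    num_frames, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


def cmvn(frames, eps=1e-10):
    """Mean and variance normalization: x_CMVN = x_CMS / sqrt(Var(x_CMS))."""
    centred = cms(frames)
    num_frames, dim = len(centred), len(centred[0])
    std = [math.sqrt(sum(f[d] ** 2 for f in centred) / num_frames)
           for d in range(dim)]
    # eps (an assumed safeguard) avoids division by zero for constant dimensions.
    return [[f[d] / max(std[d], eps) for d in range(dim)] for f in centred]


# A zero-mean "clean" cepstrum plus a constant channel offset h.
clean = [[1.0, -2.0], [-1.0, 2.0], [3.0, 0.0], [-3.0, 0.0]]
h = [5.0, -1.0]
noisy = [[c + hd for c, hd in zip(frame, h)] for frame in clean]
recovered = cms(noisy)  # the constant offset h is removed
```

Because the toy clean features have zero mean, CMS recovers them exactly here; on real speech E[x] = 0 only holds approximately, which is the assumption stated on the slide.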
Feature-based Approach: HEQ (Histogram Equalization)
• The whole distribution is equalized
• $y = CDF_y^{-1}[CDF_x(x)]$
[Figure: CDF_x and CDF_y; at P = 0.2, x = 3 maps to y = 3.5]
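A minimal sketch of the mapping y = CDF_y^{-1}[CDF_x(x)] using empirical CDFs built from samples. Treating the reference distribution as a plain list of values, and the function name `heq`, are assumptions for illustration:

```python
import bisect


def heq(test_values, reference_values):
    """Map each test value to the reference value at the same empirical CDF level."""
    sorted_test = sorted(test_values)
    sorted_ref = sorted(reference_values)
    n_test, n_ref = len(sorted_test), len(sorted_ref)
    out = []
    for v in test_values:
        # rank / n_test is the empirical CDF_x(v); rank counts test values <= v.
        rank = bisect.bisect_right(sorted_test, v)
        # Inverse empirical CDF_y: smallest reference index whose CDF reaches
        # that level (integer ceiling division avoids floating-point rounding).
        idx = (rank * n_ref + n_test - 1) // n_test - 1
        out.append(sorted_ref[idx])
    return out


# Test values on a shifted, stretched scale are pulled back onto the reference scale.
equalized = heq([30, 10, 20], [1.0, 2.0, 3.0])
```

Unlike CMS and CMVN, which match only the first one or two moments, this mapping matches every quantile of the distribution.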
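Feature-based Approach: RASTA
• Perform filtering on the time trajectories of the spectral amplitudes (temporal filtering)
[Figure: amplitude versus frequency f for successive frames; the variation over time is described by the modulation frequency]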
Feature-based Approach: RASTA (Relative Spectral Temporal filtering)
• Assume the rate of change of the noise often lies outside the typical rate of change of the vocal tract shape
• A specially designed temporal filter emphasizes speech
[Figure: filter response versus modulation frequency (Hz)]
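The slides do not list the filter coefficients. The sketch below assumes the commonly cited RASTA band-pass form H(z) = 0.1·(2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98·z^-1), applied along the time trajectory of one log-spectral or cepstral coefficient; the pole value and the function name are assumptions:

```python
import math


def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one feature trajectory over time (frame index t).

    Assumed transfer function (not from the slides):
    H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1)
    """
    b = [0.2, 0.1, 0.0, -0.1, -0.2]  # FIR part; taps sum to zero, so DC is blocked
    out = []
    for t in range(len(trajectory)):
        acc = 0.0
        for k, bk in enumerate(b):
            if t - k >= 0:
                acc += bk * trajectory[t - k]
        if t >= 1:
            acc += pole * out[t - 1]  # single-pole recursive part
        out.append(acc)
    return out


# A constant offset (e.g. a fixed channel) decays away, while a 4 Hz modulation
# at an assumed 100 frames per second passes with little attenuation.
channel_only = rasta_filter([1.0] * 1000)
speech_like = rasta_filter([math.sin(2.0 * math.pi * 4.0 * t / 100.0)
                            for t in range(1000)])
```

The design choice mirrors the slide's assumption: very slow changes (stationary channel or noise) and very fast changes fall outside the pass band, while modulation rates typical of vocal tract movement are kept.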
Data-driven Temporal Filtering
• PCA (Principal Component Analysis)
[Figure: data points in the x-y plane with the principal axis e]
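Data-driven Temporal Filtering
• We should not guess our filter, but get it from data
[Figure: the original feature stream y_t is cut along the frame index into segments of length L, z_k(1), z_k(2), z_k(3), ...; the filters B1(z), B2(z), ..., Bn(z) derived from them are applied to the stream by convolution]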
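One way to read this slide is sketched below: collect all length-L segments of a feature trajectory, take (an approximation of) their first principal component, and use it as the filter taps. The use of power iteration, the starting vector, and the helper names are assumptions for illustration, not the authors' procedure:

```python
import math


def pca_temporal_filter(trajectory, seg_len, iters=200):
    """Approximate the first principal component of all length-seg_len segments."""
    segs = [trajectory[i:i + seg_len]
            for i in range(len(trajectory) - seg_len + 1)]
    n = len(segs)
    mean = [sum(s[j] for s in segs) / n for j in range(seg_len)]
    centred = [[s[j] - mean[j] for j in range(seg_len)] for s in segs]
    cov = [[sum(c[a] * c[b] for c in centred) / n for b in range(seg_len)]
           for a in range(seg_len)]
    # Power iteration from an arbitrary non-symmetric starting vector.
    v = [1.0 / (j + 1) for j in range(seg_len)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(seg_len)) for a in range(seg_len)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case, e.g. a constant trajectory
        v = [x / norm for x in w]
    return v


def apply_filter(trajectory, taps):
    """Project each segment onto the taps (a correlation; reverse the taps
    to get textbook convolution)."""
    seg_len = len(taps)
    return [sum(taps[j] * trajectory[t + j] for j in range(seg_len))
            for t in range(len(trajectory) - seg_len + 1)]


# Hypothetical trajectory of one feature dimension over 200 frames.
traj = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(200)]
taps = pca_temporal_filter(traj, seg_len=5)
filtered = apply_filter(traj, taps)
```

In practice the segments would be pooled over a training corpus rather than one trajectory, so that the filter reflects the temporal structure of speech in general.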
Speech Enhancement: Spectral Subtraction (SS)
• Producing a better signal by trying to remove the noise
• For listening purposes or recognition purposes
• Noise n[n] changes fast and unpredictably in the time domain, but its spectrum N(w) changes relatively slowly in the frequency domain
[Figure: speech and noise amplitude versus frequency f, and versus time t]
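Because the noise spectrum varies slowly, it can be estimated from noise-only frames and subtracted from every frame. The sketch below works on power spectra directly; the averaging estimator, the spectral floor of 1% of the input power, and the function names are assumptions, since the slides do not give the formula:

```python
def estimate_noise(noise_frames):
    """Average the power spectrum over frames assumed to contain only noise."""
    num_bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames)
            for k in range(num_bins)]


def spectral_subtract(power_frames, noise_power, floor=0.01):
    """Subtract the noise estimate per bin, flooring at floor * input power.

    The floor (an assumed safeguard) keeps the result non-negative when the
    noise estimate exceeds the observed power in a bin.
    """
    out = []
    for frame in power_frames:
        out.append([max(p - n, floor * p) for p, n in zip(frame, noise_power)])
    return out


# Two noise-only frames give the estimate; one noisy frame is then cleaned.
noise = estimate_noise([[1.0, 2.0], [3.0, 2.0]])
cleaned = spectral_subtract([[10.0, 1.0]], noise)
```

The noise-only frames would typically come from the silence segments found by end-point detection.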
Conclusions
• We gave a general framework for extracting speech features
• We introduced the mainstream robustness techniques
• Many other noise reduction methods exist (left to the references)