Noise Reduction in Speech Recognition

Professor: Jian-Jiun Ding • Student: Yung Chang • 2011/05/06

Outline • Mel Frequency Cepstral Coefficients (MFCC) • Mismatch in speech recognition • Feature-based: CMS, CMVN, HEQ • Feature-based: RASTA, data-driven • Speech enhancement: spectral subtraction, Wiener filtering • Conclusions and applications

Mel Frequency Cepstral Coefficients (MFCC) • The most commonly used feature in speech recognition • Advantages: high accuracy and low complexity • 39-dimensional feature vector

Mel Frequency Cepstral Coefficients (MFCC) • The framework of feature extraction: speech signal $x(n)$ → pre-emphasis → $x'(n)$ → window → $x_t(n)$ → DFT → $A_t(k)$ → mel filter-bank → $Y_t(m)$ → $\log(|\cdot|^2)$ → $Y'_t(m)$ → IDFT → $y_t(j)$ (MFCC), together with the frame energy $e_t$ and the derivatives
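The last two stages of this chain, the log and the IDFT, turn filter-bank outputs into cepstral coefficients. A minimal sketch, assuming the IDFT is realized as a DCT-II (a common practical choice) and that 13 coefficients are kept; the function name `cepstrum_from_fbank` is hypothetical.

```python
import numpy as np

def cepstrum_from_fbank(fbank_energies, num_ceps=13, eps=1e-10):
    """Log-compress mel filter-bank energies, then apply a DCT-II.

    fbank_energies: shape (num_frames, num_filters), non-negative.
    Returns shape (num_frames, num_ceps). The DCT-II stands in for the
    IDFT step of the slide (an assumption, not the slide's exact transform).
    """
    log_e = np.log(np.maximum(fbank_energies, eps))  # clamp to avoid log(0)
    M = log_e.shape[1]
    m = np.arange(M)                                 # filter index
    j = np.arange(num_ceps)[:, None]                 # cepstral index
    basis = np.cos(np.pi * j * (m + 0.5) / M)        # DCT-II basis, (num_ceps, M)
    return log_e @ basis.T
```

A flat log spectrum puts everything into $c_0$, which is why $c_0$ acts like an overall energy term.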

Pre-emphasis • Pre-emphasis of the spectrum at higher frequencies: $x[n]$ → pre-emphasis → $x'[n]$
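Pre-emphasis is usually implemented as a first-order high-pass filter $x'[n] = x[n] - \alpha x[n-1]$. A minimal sketch; the slide gives no coefficient, so $\alpha = 0.97$ below is an assumed, commonly used value, and `pre_emphasis` is a hypothetical name.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter: x'[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed typical value. The first sample is passed
    through unchanged (x[-1] is taken as 0).
    """
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return x.copy()
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]  # attenuates low frequencies relative to high
    return y
```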

End-point Detection (Voice Activity Detection) • Separates speech segments from noise (silence) segments
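One simple way to label frames as speech or noise is a short-time energy threshold. This is a sketch of an energy-based detector, not necessarily the method behind the slide; the 400-sample frames with a 160-sample hop (25 ms / 10 ms at an assumed 16 kHz rate) and the 40 dB margin are assumptions, and `energy_vad` is hypothetical.

```python
import numpy as np

def energy_vad(x, frame_len=400, hop=160, margin_db=40.0):
    """Label each frame as speech (True) or noise/silence (False).

    A frame counts as speech when its log energy lies within `margin_db`
    of the most energetic frame (an assumed rule). Trailing samples that
    do not fill a whole frame are ignored.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        return np.zeros(0, dtype=bool)
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                       for i in range(n_frames)])
    log_energy = 10.0 * np.log10(energy + 1e-12)  # small offset avoids log(0)
    return log_energy > log_energy.max() - margin_db
```

A fixed relative threshold is crude in real noise; it only illustrates the speech/silence split.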

Windowing • Rectangular window vs. Hamming window
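Framing and windowing can be sketched as below. The Hamming window is $w[n] = 0.54 - 0.46\cos(2\pi n/(N-1))$, which tapers the frame edges where a rectangular window cuts them off abruptly. The frame length, hop, and the function name `frame_signal` are assumptions.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160, window="hamming"):
    """Split x into overlapping frames and apply a window to each.

    window: "hamming" or "rectangular".
    Returns shape (num_frames, frame_len).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        return np.zeros((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = x[idx]
    n = np.arange(frame_len)
    if window == "hamming":
        w = 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (frame_len - 1))
    elif window == "rectangular":
        w = np.ones(frame_len)
    else:
        raise ValueError(f"unknown window: {window}")
    return frames * w
```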

Mel-filter bank • After the DFT we get the spectrum [figure: amplitude vs. frequency]

Mel-filter bank • [figure: filter amplitude vs. frequency] • Triangular filters in frequency (overlapping) • Uniformly spaced below 1 kHz • Logarithmically spaced above 1 kHz
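A triangular, overlapping filter bank can be sketched as below. It places filter edges uniformly on the mel scale $\mathrm{mel}(f) = 2595\log_{10}(1 + f/700)$, a common smooth approximation to the linear-below / logarithmic-above 1 kHz spacing on the slide. The formula choice, 26 filters, 512-point FFT, 16 kHz rate, and function names are all assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(num_filters=26, n_fft=512, sample_rate=16000):
    """Triangular filters with peaks equally spaced on the mel scale.

    Returns shape (num_filters, n_fft // 2 + 1), to be applied to the
    power spectrum of each frame.
    """
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2),
                             num_filters + 2)
    hz_points = mel_to_hz(mel_points)                 # filter edges in Hz
    bin_freqs = np.linspace(0.0, sample_rate / 2, n_bins)  # rfft bin frequencies
    fb = np.zeros((num_filters, n_bins))
    for i in range(num_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))  # triangle
    return fb
```

Given a frame's power spectrum `p` of length 257, `p @ mel_filterbank().T` yields the 26 filter-bank outputs $Y_t(m)$.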

Delta Coefficients • 1st- and 2nd-order differences of the static coefficients • 13 static dimensions + 13 first-order + 13 second-order = 39 dimensions
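The slide does not give the difference formula. A common choice is the regression $d_t = \sum_{n=1}^{N} n\,(c_{t+n} - c_{t-n}) \big/ \left(2\sum_{n=1}^{N} n^2\right)$, applied once for deltas and again on the deltas for delta-deltas. $N = 2$, edge padding, and the function names are assumptions in this sketch.

```python
import numpy as np

def deltas(feats, N=2):
    """Regression-based differences along time.

    feats: shape (num_frames, dim). Edge frames are repeated as padding.
    """
    feats = np.asarray(feats, dtype=float)
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feats)
    for n in range(1, N + 1):
        # padded[N + n + t] is c[t + n]; padded[N - n + t] is c[t - n]
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom

def add_deltas(static):
    """Stack static, first-order, and second-order coefficients."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return np.hstack([static, d1, d2])  # 13 + 13 + 13 = 39 for 13 static
```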

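Mismatch in Statistical Speech Recognition • Possible Approaches for Acoustic Environment Mismatch • [figure: recognition system and distortion model] • Training: a text corpus and a speech corpus feed feature extraction and model training, producing the acoustic models, lexicon, and language model • Recognition: the input signal goes through feature extraction to give feature vectors $O = o_1 o_2 \ldots o_T$, and search produces the output sentence $W = w_1 w_2 \ldots w_R$ • Distortion: the original speech $x[n]$ is corrupted by additive noise $n_1(t)$ at acoustic reception, convolutional noise $h[n]$ from microphone distortion and the phone/wireless channel, and further additive noise $n_2(t)$, giving $y[n]$ • Acoustic models are trained on features of $x[n]$ but recognition uses features of $y[n]$, which causes the mismatch • Speech enhancement acts on the signal $y[n]$, feature-based approaches act on the features, and model-based approaches act on the acoustic models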

Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN) • Cepstral Mean Subtraction (CMS) for convolutional noise • Convolutional noise in the time domain becomes additive in the cepstral domain: $y[n] = x[n] * h[n] \Rightarrow y = x + h$, with $x$, $y$, $h$ in the cepstral domain • Most convolutional noise changes only very slightly over a reasonable time interval, so $h$ can be treated as constant and $x = y - h$ • Assuming $E[x] = 0$, then $E[y] = h$, so $x_{CMS} = y - E[y]$ • [figure: distributions $P(y)$ and $P(x)$ before and after CMS]
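In practice $E[y]$ must be estimated. This sketch assumes the per-utterance average of the cepstral vectors; running or sliding-window means are also used. The name `cms` is hypothetical.

```python
import numpy as np

def cms(cepstra):
    """Cepstral mean subtraction: x_CMS = y - E[y].

    cepstra: shape (num_frames, dim). E[y] is estimated as the mean over
    all frames of the utterance (an assumption).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

Because a fixed channel term $h$ adds the same vector to every frame, it shifts the mean by exactly $h$ and is removed by the subtraction.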

Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN) • CMVN: the variance is normalized as well • $x_{CMVN} = x_{CMS} / \sqrt{\mathrm{Var}(x_{CMS})}$ • [figure: distributions $P(y)$ and $P(x)$ after CMS and after CMVN]
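CMVN adds a per-dimension division by the standard deviation after CMS. A minimal sketch, again assuming per-utterance statistics; `cmvn` and the small `eps` guard are hypothetical.

```python
import numpy as np

def cmvn(cepstra, eps=1e-10):
    """x_CMVN = x_CMS / sqrt(Var(x_CMS)), per feature dimension.

    Mean and variance are taken over the frames of one utterance (an
    assumption). eps guards against division by zero.
    """
    c = np.asarray(cepstra, dtype=float)
    x_cms = c - c.mean(axis=0, keepdims=True)
    return x_cms / np.sqrt(x_cms.var(axis=0, keepdims=True) + eps)
```

Each dimension ends up with zero mean and unit variance, so both an additive offset and a scale change of the features are removed.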

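Feature-based Approach: HEQ (Histogram Equalization) • The whole distribution is equalized, not only its mean and variance • $y = \mathrm{CDF}_y^{-1}[\mathrm{CDF}_x(x)]$ • [figure: $\mathrm{CDF}_x$ and $\mathrm{CDF}_y$; the value $x = 3$ with cumulative probability $P = 0.2$ maps to $y = 3.5$, which has the same cumulative probability]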
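The mapping $y = \mathrm{CDF}_y^{-1}[\mathrm{CDF}_x(x)]$ can be sketched per feature dimension. This assumes the empirical (rank-based) CDF of one utterance stands in for $\mathrm{CDF}_x$ and a standard normal is the reference $\mathrm{CDF}_y$; other references, such as the training-data distribution, are possible. `statistics.NormalDist.inv_cdf` is from the Python standard library (3.8+); `heq` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def heq(feats):
    """Histogram equalization y = CDF_y^{-1}[CDF_x(x)] per dimension.

    CDF_x: empirical CDF from the frames of this utterance.
    CDF_y: standard normal reference (an assumption).
    """
    feats = np.asarray(feats, dtype=float)
    T, D = feats.shape
    ref = NormalDist(0.0, 1.0)
    out = np.empty_like(feats)
    for d in range(D):
        ranks = np.argsort(np.argsort(feats[:, d]))  # 0 .. T-1
        p = (ranks + 0.5) / T  # empirical CDF, kept strictly inside (0, 1)
        out[:, d] = [ref.inv_cdf(float(pi)) for pi in p]
    return out
```

Since only ranks matter, any monotonic distortion of a feature (not just a shift or scale) is undone.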

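Feature-based Approach: RASTA • [figure: spectra (amplitude vs. $f$) of successive frames] • Each feature value, tracked across frames, forms a signal over time; perform filtering on these signals (temporal filtering) • The frequency axis of these time trajectories is the modulation frequency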

Feature-based Approach: RASTA (Relative Spectral Temporal filtering) • Assumes the rate of change of noise often lies outside the typical rate of change of the vocal tract shape • A specially designed temporal filter emphasizes speech • [figure: filter response vs. modulation frequency (Hz)]
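A sketch of such a temporal filter, assuming the transfer function commonly quoted for RASTA processing, $H(z) = 0.1\,z^4\,(2 + z^{-1} - z^{-3} - 2z^{-4}) / (1 - 0.98\,z^{-1})$, applied to each feature trajectory. The $z^4$ advance is dropped to make it causal (a 4-frame delay); the pole value 0.98 is an assumption, as other values are also used, and `rasta_filter` is hypothetical.

```python
import numpy as np

def rasta_filter(traj, pole=0.98):
    """Band-pass filter each feature trajectory along the frame axis.

    Causal form with zero initial state (4-frame delay):
        y[t] = pole * y[t-1] + 0.1 * (2 x[t] + x[t-1] - x[t-3] - 2 x[t-4])
    traj: shape (num_frames, dim), e.g. log filter-bank energies.
    """
    x = np.asarray(traj, dtype=float)
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # FIR part, sums to zero
    y = np.zeros_like(x)
    prev = np.zeros(x.shape[1])
    for t in range(x.shape[0]):
        acc = pole * prev  # recursive (pole) part
        for k in range(len(b)):
            if t - k >= 0:
                acc = acc + b[k] * x[t - k]
        y[t] = acc
        prev = acc
    return y
```

Because the numerator coefficients sum to zero, a constant offset in a trajectory (such as a fixed channel term in the log-spectral domain) decays away after the initial transient.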

Data-driven Temporal Filtering • PCA (Principal Component Analysis) • [figure: data in the $(x, y)$ plane with principal direction $e$]

Data-driven Temporal Filtering • We should not guess our filter, but derive it from data • [figure: the original feature stream $y_t$ over frame index is convolved with filters $B_1(z), B_2(z), \ldots, B_n(z)$ of length $L$, producing $z_k(1), z_k(2), z_k(3), \ldots$]
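One way to get a temporal filter from data with PCA: treat every run of $L$ consecutive values of a feature trajectory as a vector and take the leading principal component as the filter's coefficients. Further eigenvectors would give further filters. The slides do not specify the procedure beyond PCA, so $L = 11$ and the function names are hypothetical.

```python
import numpy as np

def pca_temporal_filter(trajectories, L=11):
    """Learn an L-tap temporal filter for one feature dimension.

    trajectories: iterable of 1-D arrays, the same feature dimension taken
    from many training utterances. Returns the leading eigenvector of the
    covariance of all length-L segments (unit norm, arbitrary sign).
    """
    segs = []
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        for i in range(len(traj) - L + 1):
            segs.append(traj[i:i + L])
    X = np.array(segs)
    X = X - X.mean(axis=0)                   # center the segments
    cov = X.T @ X / len(X)                   # L x L covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, -1]                    # direction of largest variance

def apply_temporal_filter(traj, h):
    """z[i] = sum_k h[k] * traj[i + k]: project each segment onto h."""
    h = np.asarray(h, dtype=float)
    return np.convolve(np.asarray(traj, dtype=float), h[::-1], mode="valid")
```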

Speech Enhancement: Spectral Subtraction (SS) • Produces a better signal by trying to remove the noise • For listening purposes or recognition purposes • Noise $n[n]$ changes quickly and unpredictably in the time domain, but relatively slowly in the frequency domain, $N(\omega)$ • [figure: amplitude of speech and noise vs. time $t$ and vs. frequency $f$]
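Because $N(\omega)$ varies slowly, an average noise magnitude spectrum can be subtracted from every frame. A minimal magnitude-subtraction sketch: it assumes the first few frames are noise only (the slides do not say how $N(\omega)$ is estimated), a Hann window with 512/256 framing, a spectral floor of 0.01, reuse of the noisy phase, and overlap-add resynthesis. All of these and the name `spectral_subtraction` are assumptions.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=512, hop=256, noise_frames=10, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from each frame.

    Assumes the first `noise_frames` frames contain noise only.
    """
    x = np.asarray(noisy, dtype=float)
    if len(x) < frame_len:
        return x.copy()
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.array([x[i * hop:i * hop + frame_len] * win for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    noise_mag = mag[:noise_frames].mean(axis=0)           # estimate of |N(w)|
    clean_mag = np.maximum(mag - noise_mag, floor * mag)  # subtract, keep a floor
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    # Weighted overlap-add; dividing by sum(win^2) undoes the double windowing.
    out = np.zeros(len(x))
    norm = np.zeros(len(x))
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += clean[i] * win
        norm[i * hop:i * hop + frame_len] += win ** 2
    return out / np.maximum(norm, 1e-8)
```

The floor keeps magnitudes from going negative; the remaining random peaks are the familiar "musical noise" of spectral subtraction.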

Conclusions • We gave a general framework for extracting speech features • We introduced mainstream robustness techniques • Numerous other noise reduction methods exist (see the references)

References

Q & A
