Noise Reduction. Two-Stage Mel-Warped Wiener Filter Approach. Intellectual Property. Advanced front-end feature extraction algorithm ETSI ES 202 050 V1.1.3 (2003-11) European Telecommunications Standards Institute
Noise Reduction: Two-Stage Mel-Warped Wiener Filter Approach
Intellectual Property • Advanced front-end feature extraction algorithm • ETSI ES 202 050 V1.1.3 (2003-11) • European Telecommunications Standards Institute • ETSI Technical Committee Speech Processing, Transmission and Quality Aspects (STQ).
Noise Reduction • Based on Wiener filter theory • Noise reduction is performed in two stages • The input signal is de-noised in the first stage. • The second stage applies dynamic noise reduction based on the SNR of the processed signal.
First Stage (block diagram) • Spectrum Estimation • PSD Mean • WF Design • Mel Filter-Bank • Mel IDCT • Apply Filter • VADNest • Output goes to the second stage
Second Stage (block diagram) • Input comes from the first stage • Spectrum Estimation • PSD Mean • WF Design • Mel Filter-Bank • Gain Factorization • Mel IDCT • Apply Filter • OFF (offset compensation) • Output
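Buffering • 1 frame = 80 samples • 1 buffer = 4 frames (slots 0 to 3) • [Figure: Buffer 1 holds frames A B C D and Buffer 2 holds frames E F G H; after a new frame arrives, Buffer 1 holds B C D new and Buffer 2 holds F G H A. Figure labels: De-noised (1st Stage), De-noised (output).]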
Spectrum Estimation • The input signal is divided into overlapping frames of Nin = 200 samples. • A 25 ms frame length and a 10 ms frame shift (80 samples) are used. • Each frame Sw(n) is windowed with a Hanning window of length Nin.
Spectrum Estimation • The windowed frame is zero-padded from Nin up to NFFT - 1, with NFFT = 256.
Spectrum Estimation • Frequency representation: • Power spectrum: • Smoothing:
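The framing, windowing, zero-padding and power-spectrum steps can be sketched as follows. This is a minimal illustration rather than the standard's reference code: the function names are hypothetical, the window is the textbook symmetric Hann window (the exact window definition is not given on the slides), the transform is a naive DFT for readability, and the smoothing step is left out because its formula did not survive on the slide.

```python
import cmath
import math

N_IN = 200   # frame length in samples (from the slides)
SHIFT = 80   # frame shift in samples (from the slides)
N_FFT = 256  # transform length after zero-padding (from the slides)


def frames(signal):
    """Yield overlapping frames of N_IN samples, advancing by SHIFT samples."""
    for start in range(0, len(signal) - N_IN + 1, SHIFT):
        yield signal[start:start + N_IN]


def hann(n_len):
    """Textbook symmetric Hann window (assumption: the standard's window may differ)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]


def power_spectrum(frame):
    """Window the frame, zero-pad to N_FFT, return |X(bin)|^2 for bins 0..N_FFT/2."""
    window = hann(len(frame))
    padded = [s * w for s, w in zip(frame, window)] + [0.0] * (N_FFT - len(frame))
    spectrum = []
    for k in range(N_FFT // 2 + 1):
        # Naive DFT kept for clarity; a real implementation would use an FFT.
        x_k = sum(v * cmath.exp(-2j * math.pi * k * n / N_FFT)
                  for n, v in enumerate(padded))
        spectrum.append(abs(x_k) ** 2)
    return spectrum
```

Each call to `power_spectrum` returns 129 values, one per bin from 0 to NFFT/2.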
Power Spectral Density Mean • Compute for each Pin(bin) the mean over the last TPSD = 2 frames.
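A per-bin running mean over the last TPSD = 2 frames might look like the sketch below. The class name `PsdMean` is hypothetical, and averaging over a single frame at start-up is an assumption, since the slide does not say how the first frame is handled.

```python
from collections import deque

T_PSD = 2  # number of frames averaged (from the slides)


class PsdMean:
    """Running per-bin mean of the most recent T_PSD power spectra."""

    def __init__(self):
        self.history = deque(maxlen=T_PSD)  # oldest frame drops out automatically

    def update(self, p_in):
        """Add one power spectrum and return the per-bin mean of the stored frames."""
        self.history.append(list(p_in))
        n = len(self.history)  # 1 on the very first frame (assumed start-up rule)
        return [sum(frame[b] for frame in self.history) / n
                for b in range(len(p_in))]
```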
Wiener Filter Design • A forgetting factor (weight) λNSE is computed for each frame t:
  If t < 100 frames: λNSE = 1 - 1/t
  Else: λNSE = 0.99
Wiener Filter Design • The first-stage noise spectrum estimate is updated based on the VAD flag:
  If flag = 0 (non-speech frame tn): Pnoise^(1/2)(bin,tn) = max(λNSE × Pnoise^(1/2)(bin,tn-1) + (1 - λNSE) × PSDmean, exp(-10))
  If flag = 1 (speech frame t): Pnoise^(1/2)(bin,t) = Pnoise^(1/2)(bin,tn), the estimate from the last non-speech frame
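The forgetting factor and the VAD-gated first-stage update can be sketched together. The function names are hypothetical, the frame index t is assumed to start at 1, and exp(-10) is treated as a lower bound (floor) on the estimate, which is the interpretation under which the constant keeps the estimate from reaching zero.

```python
import math

EPS = math.exp(-10)  # small constant from the slide, used here as a floor


def forgetting_factor(t):
    """lambda_NSE for a 1-based frame index t, as on the slide."""
    return 1.0 - 1.0 / t if t < 100 else 0.99


def update_noise_stage1(noise_prev, psd_mean, t, vad_flag):
    """Return the new per-bin noise estimate for the first stage.

    noise_prev is the estimate from the last non-speech frame.
    """
    if vad_flag:  # speech frame: hold the last non-speech estimate
        return list(noise_prev)
    lam = forgetting_factor(t)
    return [max(lam * n + (1.0 - lam) * p, EPS)
            for n, p in zip(noise_prev, psd_mean)]
```

For small t the factor weights new frames heavily (at t = 2 the old and new values are averaged equally), so the estimate converges quickly at start-up before settling to slow tracking at 0.99.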
Wiener Filter Design • In the second stage, the noise estimate is updated on every frame, with no VAD gating:
  If t < 11: Pnoise(bin,t) = λNSE × Pnoise(bin,t-1) + (1 - λNSE) × PSDmean
  Else:
    update = 0.9 + 0.1 × PinPSD(bin,t) / (PinPSD(bin,t) + Pnoise(bin,t-1)) × (1 + 1/(1 + 0.1 × PinPSD(bin,t) / PinPSD(bin,t-1)))
    Pnoise(bin,t) = Pnoise(bin,t-1) × update
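The second-stage rule can be transcribed directly. The sketch below follows the slide's formula as written and should be checked against the standard before use. The function name is hypothetical, the same input spectrum is used for PSDmean and PinPSD (an assumption), and all spectra are assumed strictly positive so that no division by zero occurs.

```python
def update_noise_stage2(noise_prev, psd, psd_prev, t):
    """Per-bin second-stage noise update, transcribed from the slide.

    noise_prev: Pnoise(bin, t-1)
    psd:        input PSD for frame t
    psd_prev:   input PSD for frame t-1 (as the slide's last term is written)
    t:          1-based frame index
    """
    if t < 11:
        lam = 1.0 - 1.0 / t  # lambda_NSE in its start-up range (t < 100)
        return [lam * n + (1.0 - lam) * p for n, p in zip(noise_prev, psd)]
    out = []
    for n, p, q in zip(noise_prev, psd, psd_prev):
        update = 0.9 + 0.1 * p / (p + n) * (1.0 + 1.0 / (1.0 + 0.1 * p / q))
        out.append(n * update)  # multiplicative update of the previous estimate
    return out
```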
Wiener Filter Design • The noiseless (de-noised) spectrum is estimated:
  Pden^(1/2)(bin,t) = 0.98 × Pden^(1/2)(bin,t-1) + (1 - 0.98) × T[PSDmean - Pnoise^(1/2)(bin,t)]
  where the threshold function T is
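The definition of T did not survive on this slide. As a rough sketch only, the code below assumes T is a half-wave rectifier (negative differences are clamped to zero), which is a common choice for spectral subtraction but is an assumption here; the function names are hypothetical.

```python
def rectify(z):
    """Assumed threshold function T: keep positive values, clamp the rest to zero."""
    return z if z > 0.0 else 0.0


def update_denoised(den_prev, psd_mean, noise, gamma=0.98):
    """Recursive per-bin smoothing of the de-noised spectrum estimate."""
    return [gamma * d + (1.0 - gamma) * rectify(p - n)
            for d, p, n in zip(den_prev, psd_mean, noise)]
```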
Wiener Filter Design • The a priori SNR is calculated: • The filter transfer function is:
Wiener Filter Design • The filter transfer function is used to improve the noiseless signal estimate: • The improved a priori SNR is:
Voice Activity Detection • VAD is used to detect noise-only frames. • Set the long-term energy factor (LTE) from the frame index t:
  If t < 10: LTE = 1 - 1/t
  Else: LTE = 0.97
• Calculate the frame energy:
Voice Activity Detection • Use the frame energy to update the mean energy:
  If (frameEn - meanEn) < 20 (SNR threshold) or t < 10:
    If (frameEn < meanEn) or (t < 10): meanEn = meanEn + (1 - LTE) × (frameEn - meanEn)
    Else: meanEn = meanEn + (1 - 0.99) × (frameEn - meanEn)
    If meanEn < 80: meanEn = 80
Voice Activity Detection • Is the current frame speech?
  If t > 4:
    If (frameEn - meanEn) > 15: IT IS SPEECH, nbSpeechFrame++
    Else:
      If nbSpeechFrame > 4: hangover = 15
      nbSpeechFrame = 0
      If hangover != 0: hangover--, IT IS SPEECH
      Else: IT IS NOT SPEECH
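The three VAD slides can be combined into one small state machine. This is a sketch under stated assumptions: the class name `VadNest` is hypothetical, the frame energy is supplied by the caller because its formula did not survive on the slide, the mean energy starts at 0, frames with t <= 4 are reported as non-speech, and the hangover counter is decremented once per hangover frame so that the hangover eventually expires.

```python
class VadNest:
    """Energy-based VAD following the slides' pseudocode."""

    def __init__(self):
        self.t = 0                # 1-based frame counter
        self.mean_en = 0.0        # long-term mean energy (assumed initial value)
        self.nb_speech_frame = 0  # consecutive frames judged as speech
        self.hangover = 0         # remaining hangover frames

    def update(self, frame_en):
        """Process one frame energy; return True if the frame is flagged as speech."""
        self.t += 1
        t = self.t
        lte = 1.0 - 1.0 / t if t < 10 else 0.97
        # Update the mean energy only when the frame is not far above it.
        if (frame_en - self.mean_en) < 20 or t < 10:
            if frame_en < self.mean_en or t < 10:
                self.mean_en += (1.0 - lte) * (frame_en - self.mean_en)
            else:
                self.mean_en += (1.0 - 0.99) * (frame_en - self.mean_en)
            if self.mean_en < 80:
                self.mean_en = 80.0
        speech = False
        if t > 4:
            if (frame_en - self.mean_en) > 15:
                speech = True
                self.nb_speech_frame += 1
            else:
                if self.nb_speech_frame > 4:
                    self.hangover = 15  # long enough burst: arm the hangover
                self.nb_speech_frame = 0
                if self.hangover != 0:
                    self.hangover -= 1
                    speech = True
        return speech
```

With this logic, a burst of more than four speech frames is followed by 15 frames that are still flagged as speech, which protects trailing low-energy speech from being treated as noise.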
Mel Filter Bank • The linear-frequency Wiener filter coefficients are smoothed and transformed to the Mel-frequency scale. • The mel scale is a scale of pitches judged by listeners to be equally spaced from one another.
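For reference, one widely used formula for converting between Hz and mel is sketched below. The slide does not state which mel formula is used, so this particular definition and the function names are assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

Under this mapping the scale is close to linear below roughly 1 kHz and compresses higher frequencies, so a filter bank with equal mel spacing has wider bands at high frequencies.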
Mel IDCT • The time-domain impulse response of the Wiener filter is computed from the Mel-Wiener filter coefficients by using Mel-warped inverse Discrete Cosine Transform:
Gain Factorization • Factorization of the Wiener filter Mel-warped coefficients is performed to control the aggression of noise reduction in the second stage. • The de-noised frame signal energy is calculated as:
Gain Factorization • The noise energy of the current frame is estimated as:
Gain Factorization • The smoothed SNR is evaluated using 3 de-noised frame energies and the noise energy:
  If Ratio > 0.0001: SNRavg(t) = 6.67 × log10(Ratio)
  Else: SNRavg(t) = -33.3
Gain Factorization • To decide the degree of aggression, the SNR is tracked:
  If (SNRavg(t) - SNRlow-track(t-1)) < 10 or t < 10:
    calculate λSNR(t)
    SNRlow-track(t) = λSNR(t) × SNRlow-track(t-1) + (1 - λSNR(t)) × SNRavg(t)
  Else: SNRlow-track(t) = SNRlow-track(t-1)
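The SNR averaging and low-SNR tracking rules can be sketched as two small functions. The function names are hypothetical. Both Ratio and λSNR(t) are taken as inputs, because the slides do not show how either is computed.

```python
import math


def snr_avg(ratio):
    """Smoothed SNR from the energy ratio, with the slide's floor for tiny ratios."""
    return 6.67 * math.log10(ratio) if ratio > 0.0001 else -33.3


def track_low_snr(snr_low_prev, snr_now, t, lam):
    """Update the low-SNR tracker; lam is lambda_SNR(t), supplied by the caller."""
    if (snr_now - snr_low_prev) < 10 or t < 10:
        return lam * snr_low_prev + (1.0 - lam) * snr_now
    return snr_low_prev  # large upward jump: keep the previous track
```

The tracker follows the SNR during start-up and whenever it stays within 10 of the current track, and it ignores sudden large increases, so it stays close to the SNR level of noise-only frames.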
Gain Factorization • Gain factorization applies more aggressive noise reduction to purely noisy frames and less to frames containing speech. • The aggression coefficient takes on a value of 10% for speech + noise frames and 80% for noise frames.
Apply Filter • The causal impulse response is obtained, truncated and weighted by a Hanning window. • The input signal is filtered with the filter impulse response to produce the noise-reduced signal.
Offset Compensation • A filter is used to remove the DC offset over the frame length interval (80 samples). Where Snr is the noise reduced signal
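The filter equation did not survive on this slide. As an illustration of the general idea only, the sketch below uses a generic first-order DC-blocking filter; the function name and the pole value are assumptions, not values taken from the slides.

```python
def remove_dc_offset(samples, pole=1.0 - 1.0 / 1024.0, prev_in=0.0, prev_out=0.0):
    """First-order DC blocker: y[n] = x[n] - x[n-1] + pole * y[n-1].

    pole is a hypothetical value close to 1; prev_in and prev_out carry the
    filter state across frames when the signal is processed 80 samples at a time.
    """
    out = []
    for x in samples:
        y = x - prev_in + pole * prev_out
        out.append(y)
        prev_in, prev_out = x, y
    return out
```

A constant input decays toward zero at the output, while components well above DC pass through almost unchanged.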
Results Noisy test file: After de-noise:
Results Footloose: Not Footloose:
Results: why didn’t this work? Hair dryer: Still there?!?!:
Results Hair dryer: Gone: