Meeting 6
Esfandiar Zavarehei
Department of Electronic and Computer Engineering
Brunel University
6 July, 2005
Contents
• Review of Noise Reduction Methods (more Results)
  • Review of the methods
  • DFT-Kalman, a new method for parameter estimation
  • Evaluation results and sample speech signals
• FTLP-HNM Model
  • FTLP-HNM for gap restoration
• Noise Station
  • An Interface for the programs
Review of Noise Reduction Methods
• Most noise reduction systems fit this block diagram:
  [Block diagram]
• The de-noising method is based on:
  • Spectral subtraction, or
  • Bayesian estimation
Spectral Subtraction
• The spectral subtraction method is generally formulated as:
  [Equation]
  where S, X and N are the speech, noisy speech and noise spectral amplitudes, k is the frequency index, α is the power exponent, A and B are the attenuation and subtraction coefficients respectively, and T is the dynamic threshold.
• Spectral subtraction methods differ in how A and B are estimated.
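A form consistent with the symbols defined above is sketched here. It is an assumption, not necessarily the slide's exact equation: in particular, the placement of A on the noisy spectrum and the way the threshold T is applied are guesses.

```latex
\[
|\hat{S}(k)|^{\alpha} =
\begin{cases}
A(k)\,|X(k)|^{\alpha} - B(k)\,|N(k)|^{\alpha}, & \text{if this difference exceeds } T(k),\\[4pt]
T(k), & \text{otherwise.}
\end{cases}
\]
```

Under this reading, A = 1, B = 1, T = 0 gives plain subtraction with negative results clamped to zero, in the amplitude domain for α = 1 and the power domain for α = 2.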
Spectral Subtraction
• Simple SS: constant A and B (e.g. A=1, B=1, T=0, α=1 or 2)
• Adaptive Spectral Subtraction:
  • Using a posteriori SNR (uses only the speech information in the current frame)
  • Using a priori SNR (tracks the fluctuations of speech in successive frames)
  • Using a posteriori and a priori SNRs (e.g. optimized to give the MMSE)
• Different algorithms are used for calculation of the threshold
• The number of negative values resulting from spectral subtraction can be large and depends on the noise spectrum and the SNR
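The simple SS case can be sketched in a few lines of MATLAB. This is a minimal illustration, assuming the subtraction is carried out as A|X|^α − B|N|^α and that values below T are clamped to T; the spectra below are made-up numbers, not data from the evaluation.

```matlab
% Simple spectral subtraction on one frame (illustrative values only).
alpha = 2;              % power exponent (1: amplitude, 2: power)
A = 1; B = 1; T = 0;    % simple SS settings (constant A and B)
X = [4 1 3 0.5];        % hypothetical noisy spectral amplitudes |X(k)|
N = [1 2 1 1];          % hypothetical noise amplitude estimate |N(k)|

D = A*X.^alpha - B*N.^alpha;   % subtraction in the alpha-power domain
numNegative = sum(D < 0);      % bins where the noise estimate exceeds the noisy spectrum
D(D < T) = T;                  % assumed thresholding rule: clamp to T
S = D.^(1/alpha);              % estimated speech amplitude |S(k)|
```

Here two of the four bins go negative before clamping, which shows how the count of negative values depends on the noise spectrum relative to the noisy spectrum.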
Bayesian Estimation
• Frames are independent:
  • Estimation of ST-DFT components (real and imaginary)
    • Gaussian-Gaussian (Wiener)
    • Other distributions for speech and noise (various estimators by Martin)
  • Estimation of the amplitude, using the noisy phase
    • Amplitude, log-amplitude, power (different parameters to be estimated)
    • Gaussian, Gaussian mixtures (needs training), Laplacian (computationally not feasible)
    • Criteria: MMSE, MAP, joint phase and amplitude MAP, etc.
  • Methods for parameter estimation use inter-frame information
• Frames are not independent: DFT-Kalman
Bayesian Estimation
• Wiener: speech always suppressed
• Distributions vary from phoneme to phoneme and frequency to frequency
[Figure: Average Symmetric Kullback-Leibler Distance]
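The remark that the Wiener estimator always suppresses speech can be seen from its gain. The sketch below assumes the standard Wiener gain ξ/(1+ξ), where ξ is the a priori SNR on a linear scale; the SNR values are hypothetical.

```matlab
% Wiener gain as a function of the a priori SNR (illustrative values only).
xi = [0.1 1 10 100];     % hypothetical a priori SNR values (linear scale)
G  = xi ./ (1 + xi);     % Wiener gain per frequency bin
allBelowOne = all(G < 1) % the gain never reaches 1 for any finite SNR
```

Even at high SNR the gain stays strictly below 1, so every spectral component is attenuated to some degree.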
DFT-Kalman
• Incorporate the AR model of the short-time DFT trajectories for estimation
• Gaussian distribution
• Noise in each ST-DFT channel is assumed to be WGN
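The model behind these bullets can be written out as follows. This is a sketch in assumed notation (the symbols s, x, v, e, a and the order p do not appear on the slide): the clean trajectory of one ST-DFT channel k follows an AR model across frames n and is observed in additive white Gaussian noise.

```latex
\[
s_k(n) = \sum_{i=1}^{p} a_{k,i}\, s_k(n-i) + e_k(n),
\qquad
x_k(n) = s_k(n) + v_k(n),
\]
```

Here e_k(n) is the AR excitation (LP error) and v_k(n) is the channel noise; the real and imaginary parts of each channel would be treated as separate trajectories.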
DFT-Kalman
• During noise-only periods the output converges to zero, which drives the whole output to zero.
• To avoid too-small values of the LP error covariance, Q, during speech-active periods:
  Q = max(Q, m×|X(k)|²), with (0.05)² < m < (0.30)²
• Small values of m give further reduction of the background noise but more distortion of the speech signal.
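The effect of the floor on Q can be illustrated with a scalar Kalman filter. This is a simplified sketch, not the method as implemented: it assumes a first-order AR model with a fixed coefficient, applies the floor at every sample rather than only during speech-active periods, and uses made-up values throughout.

```matlab
% Scalar (AR(1)) Kalman filter on one ST-DFT trajectory, with the Q floor.
a = 0.9;            % hypothetical AR(1) coefficient of the trajectory
R = 1.0;            % hypothetical noise variance in this channel (WGN)
Q = 1e-6;           % hypothetical, very small LP error variance
m = 0.1^2;          % floor factor, within (0.05)^2 < m < (0.30)^2
x = [2.0 1.5 1.8];  % hypothetical noisy trajectory samples (real part)

sHat = 0; P = 1;    % initial state estimate and error variance
out = zeros(size(x));
for n = 1:numel(x)
    Qn    = max(Q, m*abs(x(n))^2);       % floor on the LP error variance
    sPred = a*sHat;                      % state prediction
    PPred = a^2*P + Qn;                  % prediction error variance
    K     = PPred/(PPred + R);           % Kalman gain
    sHat  = sPred + K*(x(n) - sPred);    % measurement update
    P     = (1 - K)*PPred;               % updated error variance
    out(n) = sHat;
end
```

With a larger m the floor keeps the Kalman gain higher, so more of the noisy input passes through; with a smaller m the filter trusts its prediction more, which removes more noise at the cost of more speech distortion.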
DFT-Kalman
• Another method is based on spectral subtraction of the ST-DFT trajectories.
• An autocorrelation vector is obtained using spectral subtraction at the start of the speech after long noise-only periods:
  [Equation]
  where L+1 is the number of samples used in the calculation of the autocorrelation vector and Xr(n) is the real component of the ST-DFT trajectory at frame n and an arbitrary frequency. Similar equations hold for the imaginary components.
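One way such an estimate could be computed is sketched below. It is an assumption rather than the slide's equation: because the channel noise is taken to be white, the sketch subtracts the noise variance from the lag-0 term only, and it uses a biased autocorrelation estimate. All values and names (xr, sigma2, p) are hypothetical.

```matlab
% Spectral-subtraction-style autocorrelation of one noisy trajectory.
xr = [1 -2 3 -1 2 0 1];   % hypothetical real parts X_r(n) of one noisy trajectory
sigma2 = 0.5;             % hypothetical noise variance in this channel
p = 2;                    % hypothetical AR model order
L = numel(xr) - 1;        % L+1 samples are used

r = zeros(1, p+1);
for j = 0:p
    % biased autocorrelation estimate at lag j
    r(j+1) = sum(xr(1+j:end) .* xr(1:end-j)) / (L+1);
end
r(1) = max(r(1) - sigma2, 0);   % white noise contributes only at lag 0
```

The resulting vector r could then feed a standard AR fit (for example Levinson-Durbin) to obtain the model used by the Kalman filter.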
DFT-Kalman
• This autocorrelation is linearly combined with the estimated autocorrelation obtained from previous estimated samples:
  [Equation]
  where n1 is the frame index of the first speech segment detected.
• Regardless of the presence of speech, if the variance of the excitation of the AR model is lower than a fixed threshold, a weighted average of the spectral subtraction-based autocorrelation and the autocorrelation of the previous estimates of the ST-DFT trajectories is used:
  [Equation]
Evaluation of the methods
• The correlation coefficient between different distortion measures and the mean opinion score (MOS) is calculated over 90 sentences (noisy, clean and de-noised), with 10 listeners
• PESQ has the highest correlation with the MOS results
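The correlation between an objective measure and MOS can be computed with MATLAB's `corrcoef`. The sketch below assumes the Pearson correlation coefficient is meant and uses invented scores, not the evaluation data.

```matlab
% Pearson correlation between an objective measure and MOS (made-up scores).
pesq = [1.5 2.0 2.5 3.0];   % hypothetical PESQ scores for four sentences
mos  = [2.0 2.5 3.5 4.0];   % hypothetical mean opinion scores for the same sentences

c   = corrcoef(pesq, mos);  % 2x2 correlation matrix
rho = c(1,2);               % correlation between the two measures
```

Repeating this for each distortion measure and comparing the resulting coefficients is one way to rank the measures by how well they track the listening test.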
[Figure: PESQ – Car Noise]
Legend:
• SASS: Simple Amplitude SS
• BPSS: a posteriori Power SS
• MBSS: Multiband SS
• SSAPR: a priori Amplitude SS
• PSS: Parametric SS
• MMSE STSA: Ephraim’s Amplitude Estimator
• MMSE LSA: Ephraim’s Log-Amplitude Estimator
• GGDFT: Martin’s Gamma-Gamma DFT Estimator
[Figure: PESQ – Train Noise]
[Figure: Mean Opinion Score – Car Noise]
[Figure: Mean Opinion Score – Train Noise]
Sample Speech Signals
• Car Noise: Noisy, SASS, BPSS, MBSS, SSAPR, PSS, Wiener, MMSE STSA, MMSE LSA, GGDFT, DFTK, DFTSS
• Train Noise: Noisy, SASS, BPSS, MBSS, SSAPR, PSS, Wiener, MMSE STSA, MMSE LSA, GGDFT, DFTK, DFTSS
• Clean Signal
Future and Present Work
• Investigate the effect of incorporating a noise AR model in the Kalman formulation:
  [Equation]
  where the F’s are the state transition matrices of speech and noise.
• Clean speech would be a by-product of the Kalman filtering
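A common way to write such a formulation is an augmented state that stacks the speech and noise AR states. The version below is a sketch in assumed notation (the vectors s, v, e and the selection vectors h are not from the slide); only the two transition matrices F are named in the source.

```latex
\[
\begin{bmatrix}\mathbf{s}(n)\\ \mathbf{v}(n)\end{bmatrix}
=
\begin{bmatrix}\mathbf{F}_s & \mathbf{0}\\ \mathbf{0} & \mathbf{F}_v\end{bmatrix}
\begin{bmatrix}\mathbf{s}(n-1)\\ \mathbf{v}(n-1)\end{bmatrix}
+
\begin{bmatrix}\mathbf{e}_s(n)\\ \mathbf{e}_v(n)\end{bmatrix},
\qquad
x(n) =
\begin{bmatrix}\mathbf{h}_s^{T} & \mathbf{h}_v^{T}\end{bmatrix}
\begin{bmatrix}\mathbf{s}(n)\\ \mathbf{v}(n)\end{bmatrix}
\]
```

In this form the observation carries no separate measurement noise term, and the speech part of the estimated state gives the clean speech estimate directly.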
Future and Present Work
• Develop the FTLP-HNM model together with the group and explore its potential for:
  • Gap restoration,
  • Speech enhancement, and
  • (possibly) Coding
• The problem with phase in gap restoration
• Sample
Future and Present Work
• Further development of the Noise Station program
Future and Present Work
• Current capabilities:
  • Open/Close/Save/Amplify/Play/Resample wave signals
  • Frame-by-frame and overall viewing of signal/FFT/LP spectrum/excitation/formants/pitch frequency/harmonics
  • Add noise/De-noise (different methods)/Distortion measurement
  • Formant/pitch/harmonic tracking and viewing
• Future capabilities:
  • An option for adding new methods (de-noising, pitch tracking, etc.) easily
Future and Present Work
Template for the Programs

function output=MMSESTSA84_NS(signal,fs,P)
% output=MMSESTSA84_NS(signal,fs,P)
% HELP AND DIRECTIONS APPEAR HERE
% Author: -
% Date: Dec-04

% INITIALIZE ALL THE PARAMETERS HERE
IS=.25;     %INITIAL SILENCE LENGTH
alpha=.99;  %DECISION DIRECTED PARAMETER

if (nargin>=3 && isstruct(P)) %EXTRACTING PARAMETERS
    if isfield(P,'alpha')
        alpha=P.alpha;  %DECISION DIRECTED PARAMETER (read from P, not IS)
    else
        alpha=.99;      %DECISION DIRECTED PARAMETER
    end
    if isfield(P,'IS')
        IS=P.IS;
    else
        IS=.25;         %INITIAL SILENCE LENGTH
    end
end

%THE PROGRAM STARTS HERE...............
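The parameter handling in the template could also be written as a generic merge of defaults with the optional struct P, which may make it easier to add parameters to new methods. This is one possible variant, not part of the template itself; the names `defaults` and `params` are hypothetical.

```matlab
% Generic merge of default parameters with an optional override struct.
defaults = struct('IS', 0.25, 'alpha', 0.99);  % same defaults as the template
P = struct('alpha', 0.98);                     % caller overrides only alpha

params = defaults;
names = fieldnames(P);
for i = 1:numel(names)
    if isfield(defaults, names{i})             % ignore fields the method does not know
        params.(names{i}) = P.(names{i});
    end
end
```

After the loop, `params.alpha` holds the caller's value and `params.IS` keeps its default, which matches what the if/else blocks in the template do for each field.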