Speech Enhancement Using a Minimum Mean Square Error Short-Time Spectral Amplitude Estimation method
Segmenting of Signal • The signal is divided into frames 25 ms long, with a shift of 40% of the frame length (10 ms). • The Window Length in samples is the frame duration (25 ms) multiplied by the Sampling Frequency. • Example • Sampling Frequency = 8000 samples/s • Window Length = 0.025 s * 8000 samples/s = 200 samples • Each frame is then windowed using a Hamming window.
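The segmentation step can be sketched in Python as below. This is a minimal, stdlib-only illustration under stated assumptions: the function names `frame_parameters`, `hamming`, and `segment` are hypothetical, and dropping a trailing partial frame is a choice of this sketch that the slides do not specify.

```python
import math


def frame_parameters(fs, frame_s=0.025, shift_fraction=0.4):
    """Return (window_length, shift) in samples for sampling rate fs."""
    window_length = round(frame_s * fs)            # 0.025 s * 8000 = 200 samples
    shift = round(shift_fraction * window_length)  # 40% of 200 = 80 samples (10 ms)
    return window_length, shift


def hamming(n):
    """Symmetric Hamming window of length n (n > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def segment(signal, window_length, shift):
    """Split signal into overlapping Hamming-windowed frames.

    A trailing partial frame is dropped (an assumption of this sketch)."""
    w = hamming(window_length)
    return [
        [signal[start + i] * w[i] for i in range(window_length)]
        for start in range(0, len(signal) - window_length + 1, shift)
    ]


window_length, shift = frame_parameters(8000)
frames = segment([0.0] * 8000, window_length, shift)  # one second of (silent) audio
print(window_length, shift, len(frames))  # 200 80 98
```

With these values, one second of audio at 8000 samples/s yields (8000 - 200) // 80 + 1 = 98 full frames.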
Initial Silence Segments • The initial silence (speech inactivity) period is assumed to be 250 ms. • This provides enough data to estimate the Noise Spectrum before Voice Activity Detection (VAD) is attempted. • The Number of Initial Silence Segments (NISS) = (Initial Silence * Sampling Frequency - Window Length)/(Shift Percentage * Window Length). • Example • Using the previous values: • NISS = (0.25 s * 8000 samples/s - 200 samples)/(0.4 * 200 samples) = 1800/80 = 22.5 • The value is rounded down to the nearest whole number, giving NISS = 22.
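A minimal sketch of the NISS formula, assuming the same 25 ms frames and 40% shift as the previous slide. The helper name `initial_silence_segments` is hypothetical; integer floor division implements the rounding down.

```python
def initial_silence_segments(fs, initial_silence_s=0.25, frame_s=0.025, shift_fraction=0.4):
    """NISS = (IS * fs - W) / (SP * W), rounded down to a whole number of frames."""
    window_length = round(frame_s * fs)              # W: 200 samples at 8 kHz
    shift = round(shift_fraction * window_length)    # SP * W: 80 samples
    silence_samples = round(initial_silence_s * fs)  # IS * fs: 2000 samples
    # Floor division rounds down: (2000 - 200) // 80 = 22 (from 22.5)
    return (silence_samples - window_length) // shift


print(initial_silence_segments(8000))  # 22
```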
Phase Calculation using FFT • The Fast Fourier Transform of each frame is calculated. • The phase component of the FFT is calculated for use in reconstruction of the enhanced signal.
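The magnitude/phase split can be illustrated as follows. To stay dependency-free, this sketch uses a direct O(N²) DFT in place of an FFT (a library routine such as `numpy.fft.fft` computes the same values much faster); `dft` and `magnitude_and_phase` are hypothetical names.

```python
import cmath
import math


def dft(frame):
    """Direct O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitude_and_phase(frame):
    """Return per-bin magnitudes and phases (radians) of one windowed frame."""
    spectrum = dft(frame)
    magnitudes = [abs(c) for c in spectrum]
    phases = [cmath.phase(c) for c in spectrum]  # kept for reconstruction
    return magnitudes, phases


# A cosine completing one cycle per 8-sample frame puts its energy in bins 1 and 7.
x = [math.cos(2 * math.pi * t / 8) for t in range(8)]
mags, phases = magnitude_and_phase(x)
print([round(m, 6) for m in mags])  # [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0]
```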
Noise Power Spectrum • An initial Noise Power Spectrum and Noise Power Spectrum Variance λd(k) are calculated from the mean values of the FFT over the NISS frames (λd(k) is the mean squared FFT magnitude in frequency bin k). • For each frame in the NISS, the Noise Power Spectrum and the Noise Power Spectrum Variance are updated. • The frames after the NISS are evaluated using a Voice Activity Detector (VAD), which uses the Noise Power Spectrum as its reference. • If a frame is determined to contain only noise, the Noise Power Spectrum and the Noise Power Spectrum Variance are updated with that frame.
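One way to realize the initial noise estimate, the VAD, and the noise update is sketched below. The slides give neither the VAD rule nor the update weights, so both are assumptions: the hypothetical VAD flags a frame as noise when its mean positive log-spectral distance from the noise estimate is below a hypothetical 3 dB threshold, and the update is a recursive average with a hypothetical weight `noise_length = 9`. All function names are illustrative.

```python
import math


def initial_noise_estimate(magnitudes, niss):
    """Average the first `niss` magnitude frames.

    Returns (noise_mag, lambda_d): the mean magnitude and the mean squared
    magnitude (noise variance) in each frequency bin."""
    frames = magnitudes[:niss]
    bins = len(frames[0])
    noise_mag = [sum(f[k] for f in frames) / len(frames) for k in range(bins)]
    lambda_d = [sum(f[k] ** 2 for f in frames) / len(frames) for k in range(bins)]
    return noise_mag, lambda_d


def is_noise_frame(frame_mag, noise_mag, threshold_db=3.0, floor=1e-12):
    """Hypothetical VAD: mean positive dB excess over the noise estimate below threshold."""
    excess = [
        max(0.0, 20 * math.log10(max(x, floor) / max(n, floor)))
        for x, n in zip(frame_mag, noise_mag)
    ]
    return sum(excess) / len(excess) < threshold_db


def update_noise(noise_mag, lambda_d, frame_mag, noise_length=9):
    """Recursive average: the new frame gets weight 1 / (noise_length + 1)."""
    a = noise_length
    new_mag = [(a * n + x) / (a + 1) for n, x in zip(noise_mag, frame_mag)]
    new_lambda = [(a * lam + x * x) / (a + 1) for lam, x in zip(lambda_d, frame_mag)]
    return new_mag, new_lambda
```

In a full loop, `update_noise` would be called for each frame where `is_noise_frame` returns True, and the updated `lambda_d` feeds the SNR calculation that follows.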
Signal to Noise Ratio • Using the noise estimates, the a priori SNR (ξk) and the a posteriori SNR (γk) are calculated. • a posteriori SNR: • $\gamma_k(n) = R_k^2(n)/\lambda_d(k)$ • where $R_k$ is the magnitude of the noisy (signal plus noise) spectral component k • a priori SNR (decision-directed estimate): • $\hat{\xi}_k(n) = \alpha\, G_k^2(n-1)\,\gamma_k(n-1) + (1-\alpha)\, P[\gamma_k(n) - 1]$ • where α = 0.99 is a smoothing factor, • $G_k(n-1)$ is the MMSE gain applied to bin k in the previous frame, • and $P[x] = x$ if $x > 0$, and 0 otherwise.
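The two SNR formulas translate directly into per-bin code. In this sketch, `a_posteriori_snr` and `a_priori_snr` are hypothetical names operating on lists indexed by frequency bin; initializing the previous gain and previous a posteriori SNR for the first frame is left to the caller (for example, setting both to 1, an assumption not taken from the slides).

```python
def a_posteriori_snr(frame_mag, lambda_d):
    """gamma_k = R_k^2 / lambda_d(k) for each frequency bin."""
    return [r * r / lam for r, lam in zip(frame_mag, lambda_d)]


def a_priori_snr(gamma, prev_gain, prev_gamma, alpha=0.99):
    """Decision-directed xi_k(n) = alpha*G_k(n-1)^2*gamma_k(n-1) + (1-alpha)*P[gamma_k(n)-1]."""
    return [
        alpha * g * g * gp + (1 - alpha) * max(gn - 1.0, 0.0)  # max(., 0) implements P[x]
        for gn, g, gp in zip(gamma, prev_gain, prev_gamma)
    ]


gamma = a_posteriori_snr([2.0, 0.5], [1.0, 1.0])  # [4.0, 0.25]
xi = a_priori_snr(gamma, prev_gain=[1.0, 1.0], prev_gamma=[1.0, 1.0])
```

With α close to 1, the estimate is dominated by the previous frame's enhanced amplitude, which smooths ξk over time.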
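Gain Calculation • The MMSE short-time spectral amplitude gain (G) for each frequency bin k is then computed from the two Signal to Noise Ratios: • $G_k = \frac{\sqrt{\pi}}{2}\,\frac{\sqrt{\eta_k}}{\gamma_k}\,\exp\left(-\frac{\eta_k}{2}\right)\left[(1+\eta_k)\,I_0\left(\frac{\eta_k}{2}\right) + \eta_k\,I_1\left(\frac{\eta_k}{2}\right)\right]$ • where $\eta_k = \frac{\xi_k}{1+\xi_k}\gamma_k$ • and $I_0$ and $I_1$ are the modified Bessel functions of the first kind of order zero and one.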
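The gain formula can be evaluated as sketched below. `scipy.special.i0e` and `i1e` (exponentially scaled Bessel functions) would be the natural library choice; this stdlib-only sketch uses a truncated power series instead, and `bessel_i` and `mmse_stsa_gain` are hypothetical names. For large η the gain approaches the Wiener gain ξ/(1+ξ), so the sketch switches to that limit above a cutoff of 500, an assumption chosen to avoid floating-point overflow.

```python
import math


def bessel_i(order, x):
    """Modified Bessel function I_order(x) via its power series (stand-in for SciPy)."""
    term = (x / 2) ** order / math.factorial(order)  # k = 0 term
    total = term
    # Terms peak near k = x/2; summing to k = x + 60 is well past convergence.
    for k in range(int(x) + 60):
        term *= (x / 2) ** 2 / ((k + 1) * (k + 1 + order))
        total += term
    return total


def mmse_stsa_gain(xi, gamma):
    """MMSE short-time spectral amplitude gain for one bin; gamma must be > 0."""
    eta = xi * gamma / (1 + xi)
    if eta > 500:
        # Large-eta limit is the Wiener gain; also avoids overflow in the series.
        return xi / (1 + xi)
    half = eta / 2
    m = math.exp(-half) * ((1 + eta) * bessel_i(0, half) + eta * bessel_i(1, half))
    return (math.sqrt(math.pi) / 2) * math.sqrt(eta) / gamma * m


for xi in (0.1, 1.0, 10.0):
    print(xi, round(mmse_stsa_gain(xi, 1.0), 3))
```

For a fixed a posteriori SNR the gain rises with ξk, so bins judged to contain more speech are attenuated less.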
Signal Enhancement and Reconstruction • Each frame is enhanced by multiplying the magnitude of its FFT by the gain G in every frequency bin. • The enhanced magnitude is combined with the phase of the noisy FFT, transformed back with an inverse FFT, and the frames are reassembled using the overlap-add method.
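Enhancement and overlap-add reconstruction can be sketched as follows, assuming one list of per-bin gains per frame. The helper names are hypothetical, a direct DFT again stands in for the FFT, and dividing by the summed Hamming windows is an added normalization choice: plain overlap-add of Hamming frames at a 40% shift does not sum to a constant.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n (n > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft(x, sign=-1):
    """Direct DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def enhance_and_reconstruct(signal, gains, window_length=200, shift=80):
    """Apply per-bin gains to each Hamming-windowed frame and overlap-add.

    gains[m][k] is the gain for bin k of frame m (for example, the MMSE gain)."""
    w = hamming(window_length)
    out = [0.0] * len(signal)
    window_sum = [0.0] * len(signal)
    starts = range(0, len(signal) - window_length + 1, shift)
    for start, frame_gains in zip(starts, gains):
        frame = [signal[start + i] * w[i] for i in range(window_length)]
        spectrum = dft(frame)
        # Enhanced magnitude G * |X| combined with the noisy phase of X.
        enhanced = [
            cmath.rect(g * abs(c), cmath.phase(c)) for g, c in zip(frame_gains, spectrum)
        ]
        # Inverse DFT; the imaginary part is ~0 when gains[m][k] == gains[m][N - k].
        time = [v.real / window_length for v in dft(enhanced, sign=+1)]
        for i in range(window_length):
            out[start + i] += time[i]
            window_sum[start + i] += w[i]
    # Normalize by the summed windows; samples outside every frame stay 0.
    return [o / s if s > 0 else 0.0 for o, s in zip(out, window_sum)]


signal = [math.sin(0.05 * n) for n in range(200)]
unity = [[1.0] * 40 for _ in range(11)]  # 11 frames of 40 samples at a 16-sample shift
restored = enhance_and_reconstruct(signal, unity, window_length=40, shift=16)
```

With all gains equal to 1, the function returns the input (up to floating-point error) for every sample covered by a frame, which is a useful check of the framing and normalization.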
References • Ephraim, Yariv, and David Malah. “Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator.” IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 32, No. 6, December 1984.