Speech Enhancement

Speech Enhancement Using a Minimum Mean Square Error Short-Time Spectral Amplitude Estimation Method

Process Flow

Segmenting of Signal
• The signal is divided into frames 25 ms long, with a frame shift of 40% of the frame length (10 ms).
• The window length in samples is the frame duration (25 ms) times the sampling frequency.
• Example
  • Sampling frequency = 8000 samples/s
  • Window length = 0.025 s × 8000 samples/s = 200 samples
• Each frame is then windowed using a Hamming window.
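The framing step can be sketched in a few lines of Python. This is only an illustration under the slide's stated values (25 ms frames, 40% shift); the function name `frame_signal`, its parameters, and the symmetric form of the Hamming window are assumptions of this sketch, not taken from the presentation.

```python
import math

def frame_signal(signal, fs, frame_s=0.025, shift_fraction=0.4):
    """Split a signal into overlapping, Hamming-windowed frames."""
    win_len = int(round(frame_s * fs))          # 0.025 s * 8000 samples/s = 200
    hop = int(round(shift_fraction * win_len))  # 40% of 200 = 80 samples (10 ms)
    # Symmetric Hamming window: 0.54 - 0.46 * cos(2*pi*n / (N - 1))
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (win_len - 1))
              for n in range(win_len)]
    frames = []
    # Keep only frames that fit entirely inside the signal
    for start in range(0, len(signal) - win_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(win_len)])
    return frames, win_len, hop

# One second of a constant signal at 8000 samples/s
frames, win_len, hop = frame_signal([1.0] * 8000, fs=8000)
print(win_len, hop, len(frames))  # 200 80 98
```

Trailing samples that do not fill a whole frame are dropped here; zero-padding the last frame is an equally reasonable choice.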

Initial Silence Segments
• The initial silence (speech inactivity) period is assumed to be 250 ms.
• This provides enough noise-only data to estimate the Noise Spectrum before attempting Voice Activity Detection (VAD).
• Number of Initial Silence Segments (NISS) = (Initial Silence × Sampling Frequency − Window Length) / (Shift Percentage × Window Length).
• Example, using the previous values:
  • NISS = (0.25 s × 8000 samples/s − 200 samples) / (0.4 × 200 samples) = 22.5
  • The value is rounded down to the nearest whole number, giving NISS = 22.
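The NISS arithmetic can be checked with a short Python sketch. The function name and argument names are assumptions chosen for this example; the formula and default values are the ones given on the slide.

```python
import math

def initial_silence_segments(fs, initial_silence_s=0.25,
                             frame_s=0.025, shift_fraction=0.4):
    """Number of whole frames that fit in the assumed initial silence."""
    win_len = round(frame_s * fs)        # window length in samples
    hop = shift_fraction * win_len       # frame shift in samples
    # (silence samples - one window) / shift, rounded down
    return math.floor((initial_silence_s * fs - win_len) / hop)

niss = initial_silence_segments(8000)
print(niss)  # 22
```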

Phase Calculation using FFT
• The Fast Fourier Transform (FFT) of each windowed frame is calculated.
• The phase of each FFT bin is kept for use in reconstructing the enhanced signal.
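The split of a spectrum into magnitude and phase can be illustrated as follows. To stay self-contained, this sketch uses a direct O(N²) DFT rather than an FFT routine; both produce the same values, and a real implementation would call an FFT library. All function names here are assumptions of the example.

```python
import cmath
import math

def dft(frame):
    """Direct DFT; an FFT routine returns the same values much faster."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def magnitude_and_phase(spectrum):
    """Separate each complex bin into its modulus and its phase angle."""
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]

# A cosine that falls exactly on bin 2 of a 16-sample frame
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
spectrum = dft(frame)
mag, phase = magnitude_and_phase(spectrum)

# Magnitude and phase together recover the original complex bins
rebuilt = [m * cmath.exp(1j * p) for m, p in zip(mag, phase)]
```

Only the magnitude is modified during enhancement; the stored phase is reused unchanged when the frame is rebuilt.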

Noise Power Spectrum
• An initial Noise Power Spectrum and Noise Power Spectrum Variance (λd) are calculated from the mean values of the FFT over the NISS initial frames.
• For each frame in the NISS, the Noise Power Spectrum and the Noise Power Spectrum Variance are updated.
• The frames after the NISS are evaluated by a Voice Activity Detector (VAD), which uses the Noise Power Spectrum.
• If a frame is determined to contain only noise, the Noise Power Spectrum and the Noise Power Spectrum Variance are updated.
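One possible way to keep these estimates is sketched below. The slides do not give the update rule or the VAD itself, so the weighted running average and its `memory` weight are assumptions of this example, and the VAD decision is left to the caller: `update_noise_estimate` should only be called for frames judged to be noise-only.

```python
def initial_noise_estimate(magnitudes, niss):
    """Average the first `niss` magnitude spectra, assumed to be noise-only."""
    n_bins = len(magnitudes[0])
    head = magnitudes[:niss]
    noise_mag = [sum(f[k] for f in head) / niss for k in range(n_bins)]
    lambda_d = [sum(f[k] ** 2 for f in head) / niss for k in range(n_bins)]
    return noise_mag, lambda_d

def update_noise_estimate(noise_mag, lambda_d, frame_mag, memory=9):
    """Blend one noise-only frame into the running estimates.

    `memory` is a hypothetical weight (how many past frames the old
    estimate counts for); it is not a value from the slides.
    """
    new_mag = [(memory * n + m) / (memory + 1)
               for n, m in zip(noise_mag, frame_mag)]
    new_lambda = [(memory * l + m * m) / (memory + 1)
                  for l, m in zip(lambda_d, frame_mag)]
    return new_mag, new_lambda

# Two initial frames with two frequency bins each
noise_mag, lambda_d = initial_noise_estimate([[1.0, 2.0], [3.0, 2.0]], niss=2)
noise_mag2, lambda_d2 = update_noise_estimate(noise_mag, lambda_d, [2.0, 2.0])
```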

Signal to Noise Ratio
• Using the Noise Power Spectrum, the a priori SNR ($\xi_k$) and the a posteriori SNR ($\gamma_k$) are calculated.
• a posteriori SNR:
  • $\gamma_k = R_k^2 / \lambda_d(k)$
  • where $R_k$ is the modulus of the noisy (signal plus noise) spectral component
• a priori SNR:
  • $\xi_k(n) = \alpha\, G^2\, \gamma_k(n-1) + (1-\alpha)\, P[\gamma_k(n) - 1]$
  • where $n$ is the frame index and $\alpha = 0.99$ is a smoothing factor,
  • $G$ is the Gain Function from the MMSE estimator, taken from the previous frame,
  • and $P[x]$ is defined as $x$ if $x > 0$ and 0 otherwise.
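The two SNR estimates can be written out per frequency bin as in the sketch below, where the a posteriori SNR is the noisy power divided by λd and the a priori SNR is the smoothed estimate built from the previous frame's gain. The function names and list-based layout are assumptions of this example; α = 0.99 is the value given on the slide.

```python
def a_posteriori_snr(mag, lambda_d):
    """gamma_k = R_k^2 / lambda_d(k) for every frequency bin k."""
    return [r * r / l for r, l in zip(mag, lambda_d)]

def a_priori_snr(gain_prev, gamma_prev, gamma_now, alpha=0.99):
    """xi_k(n) = alpha * G^2 * gamma_k(n-1) + (1 - alpha) * P[gamma_k(n) - 1]."""
    return [alpha * g * g * gp + (1.0 - alpha) * max(gn - 1.0, 0.0)
            for g, gp, gn in zip(gain_prev, gamma_prev, gamma_now)]

# Two bins: noisy magnitudes 2 and 1, noise variances 1 and 2
gamma = a_posteriori_snr([2.0, 1.0], [1.0, 2.0])      # [4.0, 0.5]
xi = a_priori_snr([1.0, 1.0], [1.0, 1.0], gamma)      # about [1.02, 0.99]
```

In the second bin the a posteriori SNR is below 1, so P[·] clips the instantaneous term to zero and only the smoothed history contributes.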

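Gain Calculation
• The gain (G) applied to each spectral component is then updated using the two Signal to Noise Ratios.
• $G = \dfrac{\xi_k}{1 + \xi_k} \exp\left( \tfrac{1}{2} E_1(\eta) \right)$
• where $\eta = \gamma_k \, \dfrac{\xi_k}{1 + \xi_k}$
• and $E_1(\eta) = \int_{\eta}^{\infty} \dfrac{e^{-t}}{t}\, dt$ is the exponential integral.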
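A gain rule of this kind can be sketched in plain Python. This example assumes the log-spectral amplitude form G = ξ/(1+ξ)·exp(½·E1(η)) with η = γ·ξ/(1+ξ), where E1 is the exponential integral; that choice of form, the helper `exp_integral_e1`, and the function names are assumptions of this sketch rather than something the slides spell out. E1 is evaluated with a power series for small arguments and a continued fraction otherwise.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp_integral_e1(x):
    """Exponential integral E1(x) for x > 0."""
    if x <= 0.0:
        raise ValueError("E1 is only evaluated for x > 0 here")
    if x <= 1.0:
        # Power series: -gamma - ln(x) + sum_{k>=1} (-1)^(k+1) x^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 60):
            term *= -x / k          # term = (-x)^k / k!
            total -= term / k
        return -EULER_GAMMA - math.log(x) + total
    # Continued fraction evaluated with the modified Lentz method
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)

def gain(xi, gamma):
    """Gain for one bin from the a priori (xi) and a posteriori (gamma) SNR."""
    wiener = xi / (1.0 + xi)
    eta = wiener * gamma
    return wiener * math.exp(0.5 * exp_integral_e1(eta))

g = gain(1.0, 2.0)   # eta = 1, so G = 0.5 * exp(0.5 * E1(1))
```

For large η the exponential term tends to 1 and the gain approaches the Wiener-style factor ξ/(1+ξ).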

Signal Enhancement and Reconstruction
• The signal is then cleaned by multiplying the FFT magnitude of each frame by the gain.
• The enhanced signal is reconstructed with the overlap-add method, using the phase of the original (noisy) FFT.
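The two reconstruction steps can be sketched as follows. The inverse FFT that turns each enhanced spectrum back into a time-domain frame is omitted for brevity, so `overlap_add` expects frames that have already been inverse-transformed. Function names are assumptions of this example.

```python
import cmath

def apply_gain(gain, mag, phase):
    """Scale each bin's magnitude by its gain and reattach the noisy phase."""
    return [g * m * cmath.exp(1j * p) for g, m, p in zip(gain, mag, phase)]

def overlap_add(frames, hop):
    """Sum time-domain frames into one signal, each shifted by `hop` samples."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        start = i * hop
        for n, sample in enumerate(frame):
            out[start + n] += sample
    return out

# One bin: gain 0.5 on magnitude 2 with zero phase gives 1 + 0j
enhanced_bins = apply_gain([0.5], [2.0], [0.0])

# Two 4-sample frames with a 2-sample shift overlap in the middle
signal = overlap_add([[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]], hop=2)
print(signal)  # [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
```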

Sample – Hair Dryer Background

Sample – Jack Hammer Background

Sample – Air Conditioner Background

Sample – Cafeteria Background

Sample – Automobile Background

Sample – Coffee Grinder Background

Sample – Fan Background

Sample – Feedback Background

Sample – White Noise Background

Sample – Static Background

References
• Ephraim, Y., and Malah, D. “Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator.” IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 32, No. 6, December 1984.
