Advanced Speech Enhancement in Noisy Environments

Advanced Speech Enhancement in Noisy Environments Qiming Zhu Supervisor: Prof. John Soraghan Centre for excellence in Signal and Image Processing Dept Electronic and Electrical Engineering q.zhu@strath.ac.uk

Presentation structure • Introduction • Speech Enhancement • Improved Minima Controlled Recursive Averaging (IMCRA) • Robust Voice Activity Detection (VAD) • 1-D Local Binary Pattern (LBP) • 1-D LBP of energy based VAD • Performance Evaluation • Improved IMCRA • Performance Evaluation • Discussion & Conclusion

Introduction • Automatic speech recognition (ASR) • A speech recognition system aims to create intelligent machines that can ‘hear’, ‘understand’ and ‘comply with’ speech input. • Speech enhancement and VAD are integral parts of an ASR system. • Aim of current research • Improve recognition performance against a babble-noise background.

IMCRA • Figure: IMCRA processing * Israel Cohen, ‘Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging’, IEEE Trans. on Speech and Audio Processing, 2003
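
The slide shows only the IMCRA block diagram, so the following is a rough illustration of the "recursive averaging" part of the name rather than the full algorithm. It sketches a noise-spectrum update whose smoothing factor is driven by a speech-presence probability, in the general form used by Cohen's method. The function name `update_noise_psd`, the default `alpha_d = 0.85`, and the idea of passing the probability in as a ready-made input are assumptions for illustration; the minima-controlled estimation of that probability is not reproduced here.

```python
import numpy as np

def update_noise_psd(noise_psd, noisy_power, p_speech, alpha_d=0.85):
    """One recursive-averaging step of a noise power spectrum estimate.

    noise_psd   : current noise estimate per frequency bin
    noisy_power : |Y(k, l)|^2 of the current noisy frame
    p_speech    : speech-presence probability per bin, in [0, 1] (assumed given)
    alpha_d     : base smoothing factor (assumed value)
    """
    noise_psd = np.asarray(noise_psd, dtype=float)
    noisy_power = np.asarray(noisy_power, dtype=float)
    p_speech = np.asarray(p_speech, dtype=float)
    # The smoothing factor moves towards 1 when speech is likely present,
    # which freezes the noise estimate in those bins.
    alpha_tilde = alpha_d + (1.0 - alpha_d) * p_speech
    return alpha_tilde * noise_psd + (1.0 - alpha_tilde) * noisy_power
```

With `p_speech = 1` the estimate is left unchanged, and with `p_speech = 0` it is a plain first-order average of the noisy power.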

IMCRA with babble • IMCRA Performance • Clean Signal: • Noisy Signal at 0 dB: • Enhanced by IMCRA:

1-D LBP • 2-D LBP • Extensively used in 2-D image processing • 1-D LBP • Used for 1-D signal processing (Navin Chatlani, EUSIPCO 2010; Qiming Zhu, EUSIPCO 2012) • LBP code calculation: where P is the number of neighbouring samples used. The sign function is: • Onset detection of myoelectric signals (Paul McCool, EUSIPCO 2012)

1-D LBP code • 1-D LBP calculates the LBP code by thresholding the neighbouring samples against the centre sample. Figure: LBP code calculation for P = 8 * Navin Chatlani et al., ‘Local binary patterns for 1-D signal processing’ (EUSIPCO 2010)
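
The LBP equation itself is not preserved in this transcript, so the sketch below is one plausible reading of the thresholding step: each of the P neighbours (P/2 on each side of the centre sample) contributes one bit, set when the neighbour is greater than or equal to the centre, and the bits are weighted by powers of two. The function name `lbp_1d` and the bit ordering (earliest neighbour as least significant bit) are assumptions, not taken from the slides.

```python
import numpy as np

def lbp_1d(x, P=8):
    """Return a 1-D LBP code for every sample with P/2 neighbours on each side."""
    x = np.asarray(x, dtype=float)
    half = P // 2
    if len(x) <= 2 * half:
        return np.zeros(0, dtype=int)
    weights = 2 ** np.arange(P)  # bit r is weighted by 2**r (assumed ordering)
    codes = np.zeros(len(x) - 2 * half, dtype=int)
    for n, i in enumerate(range(half, len(x) - half)):
        # P/2 samples before and P/2 samples after the centre sample x[i].
        neighbours = np.concatenate((x[i - half:i], x[i + 1:i + half + 1]))
        # Sign function: 1 where (neighbour - centre) >= 0, otherwise 0.
        bits = (neighbours - x[i] >= 0).astype(int)
        codes[n] = int(np.sum(bits * weights))
    return codes
```

For example, a local peak gives code 0 (every neighbour is smaller than the centre) and a local trough gives the all-ones code, 255 for P = 8.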

1-D LBP histogram • The distribution of the LBP codes can be collected into a histogram that describes the signal over a window of size N: where B is the number of histogram bins and δ is the Kronecker delta function. Figure: 1-D LBP histogram formed from the windowed data. Figure: overview of the 1-D LBP procedure on a histogram
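
As a hedged illustration of the histogram step, the sketch below counts how often each LBP code occurs inside consecutive windows of N codes, which is what a sum of Kronecker deltas amounts to. It assumes B = 2^P bins (one per possible code) and non-overlapping windows; neither choice is stated on the slide, and the function names are hypothetical.

```python
import numpy as np

def lbp_histogram(codes, P=8):
    """Count occurrences of each LBP code; bin k holds the number of codes equal to k."""
    B = 2 ** P  # assumed number of bins: one per possible P-bit code
    return np.bincount(np.asarray(codes, dtype=int), minlength=B)

def windowed_histograms(codes, N, P=8):
    """Histogram of LBP codes in consecutive, non-overlapping windows of N codes."""
    codes = np.asarray(codes, dtype=int)
    starts = range(0, len(codes) - N + 1, N)
    return np.array([lbp_histogram(codes[s:s + N], P) for s in starts])
```

Each row of the result describes one window, so a change in the signal's local structure shows up as a change in the row's shape.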

1-D LBP of energy • Short-time energy and the histogram. Figure: speech signals and the short-time energy: a) energy of clean speech signal, b) energy of noisy speech signal, c) histogram of clean speech energy, d) histogram of noisy speech energy.
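
The short-time energy that feeds the LBP stage can be sketched as below. The slides do not give the exact definition, so this assumes the simplest form: the sum of squared samples over non-overlapping frames. The 5 ms default follows the energy window size quoted in the experimental setup later in the deck; the function name is hypothetical.

```python
import numpy as np

def short_time_energy(x, fs, win_ms=5.0):
    """Sum of squared samples in consecutive, non-overlapping frames of win_ms milliseconds."""
    x = np.asarray(x, dtype=float)
    win = max(1, int(round(fs * win_ms / 1000.0)))  # frame length in samples
    n_frames = len(x) // win
    # Drop the incomplete tail so the signal reshapes into whole frames.
    frames = x[:n_frames * win].reshape(n_frames, win)
    return np.sum(frames ** 2, axis=1)
```

At a 16 kHz sampling rate a 5 ms frame holds 80 samples, so one second of speech yields 200 energy values.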

1-D LBP of energy with offset value • LBP code of the energy with different offset values. Figure: a) noisy signal, b) to f) LBP codes for different offset values (values not preserved).

1-D LBP of energy based VAD • System block diagram VAD block diagram

VAD performance • Experimental background • Test speech sampling frequency is 16 kHz. The total length of the test set is 73 seconds. Mixed with babble noise at SNRs from 0 to 20 dB. One parameter (symbol not preserved) is set to 0.03. • VAD 1: 1-D LBP of energy based VAD. • VAD 0: VAD proposed by Navin Chatlani. • G.729: G.729 B standard VAD. • HR0: speech absence hit-rate: • FAR0: speech absence false alarm rate:
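
The formulas for HR0 and FAR0 are missing from the transcript. The sketch below uses a common convention from the VAD literature, which may differ from the one on the original slide: HR0 is the fraction of true non-speech frames labelled as non-speech, and FAR0 is the fraction of true speech frames wrongly labelled as non-speech. The function name `vad_rates` and the frame-wise boolean labels are assumptions.

```python
import numpy as np

def vad_rates(reference, decision):
    """Return (HR0, FAR0) for frame-wise labels, where True (or 1) means speech."""
    reference = np.asarray(reference, dtype=bool)
    decision = np.asarray(decision, dtype=bool)
    nonspeech = ~reference
    # HR0: non-speech frames correctly detected as non-speech.
    hr0 = np.sum(nonspeech & ~decision) / np.sum(nonspeech)
    # FAR0: speech frames falsely detected as non-speech (assumed definition).
    far0 = np.sum(reference & ~decision) / np.sum(reference)
    return float(hr0), float(far0)
```

A good detector under this convention has HR0 close to 1 and FAR0 close to 0 at the same time.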

VAD performance • Figure: VAD performance

Improved IMCRA • Experimental background • 198 samples from the VoxForge database, from 9 speakers: 6 male and 3 female. Sampling frequency is 16 kHz. • Babble noise from the NOISEX-92 database added at SNRs from -10 dB to 10 dB. • Energy window size set to 5 ms, P = 2, histogram size set to 30 ms. • Segmental SNR and weighted spectrum slope (WSS) are used to compare the performance. * Klatt et al., ‘Prediction of perceived phonetic distance from critical band spectra’, IEEE International Conference on Acoustics, Speech, and Signal Processing, 1982
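
Segmental SNR, one of the two measures named above, can be sketched as the mean of per-frame SNR values in dB. The slides do not give the frame length or any limits, so the 320-sample frame (20 ms at 16 kHz) and the clamping of each frame to the range -10 dB to 35 dB are assumptions borrowed from common practice; the function name is hypothetical.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=320, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB between a clean reference and an enhanced signal."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    eps = 1e-12  # avoids division by zero and log of zero
    snrs = []
    for m in range(n_frames):
        s = clean[m * frame_len:(m + 1) * frame_len]
        e = enhanced[m * frame_len:(m + 1) * frame_len]
        snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum((s - e) ** 2) + eps))
        # Clamp so silent or perfectly reconstructed frames do not dominate the mean.
        snrs.append(np.clip(snr, lo, hi))
    return float(np.mean(snrs))
```

Higher values indicate an enhanced signal that is closer to the clean reference, frame by frame.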

Improved IMCRA with babble noise • Performance • Clean signal: • Noisy signal (SNR at 0 dB): • IMCRA: • Improved IMCRA:

Improved IMCRA with babble noise • Performance Segmental SNR

Improved IMCRA with babble noise • Performance Weighted spectrum slope

Discussion • Conclusions from the results • 1-D LBP in the energy domain can distinguish the voiced and unvoiced components of noisy speech signals. • LBP in the energy domain is shown to be superior to the G.729 VAD and Navin’s LBP VAD. • Improved IMCRA is superior to IMCRA, with enhanced segmental SNR and higher likelihood. • Future work • Apply this algorithm as the pre-processing stage of an ASR system.

Acknowledgements • Thanks to Prof. John Soraghan for the idea of babble noise reduction. • Thanks to Paul and Navin for their previous work on 1-D LBP.

Thank you! Any questions?
