Advanced Speech Enhancement in Noisy Environments Qiming Zhu Supervisor: Prof. John Soraghan Centre for excellence in Signal and Image Processing Dept Electronic and Electrical Engineering q.zhu@strath.ac.uk
Presentation structure • Introduction • Speech Enhancement • Improved Minima Controlled Recursive Averaging (IMCRA) • Robust Voice Activity Detection (VAD) • 1-D Local Binary Pattern (LBP) • 1-D LBP of energy based VAD • Performance Evaluation • Improved IMCRA • Performance Evaluation • Discussion & Conclusion
Introduction • Automatic speech recognition (ASR) • A speech recognition system aims to create intelligent machines that can 'hear', 'understand' and 'comply with' speech input. • Speech enhancement and VAD are integral parts of an ASR system. • Aim of current research • Improve recognition performance against babble-noise backgrounds.
IMCRA • [Figure: IMCRA processing]* *Israel Cohen, 'Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging', IEEE Transactions on Speech and Audio Processing, 2003.
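IMCRA builds on minima controlled recursive averaging (MCRA): the noisy power is smoothed, its minimum is tracked, the ratio to that minimum drives a speech-presence probability, and that probability slows the noise update while speech is likely. The sketch below illustrates only this core idea for a single frequency bin. It is a simplified MCRA-style tracker, not Cohen's IMCRA (which uses two iterations of smoothing and minimum tracking and a different speech-presence estimate). The sliding-window minimum, the function name `mcra_noise_track` and all parameter values are illustrative assumptions.

```python
from collections import deque


def mcra_noise_track(power, alpha_s=0.8, alpha_p=0.2, alpha_d=0.95,
                     delta=5.0, window=50):
    """Simplified MCRA-style noise power tracking for ONE frequency bin.

    power: |Y(k, l)|^2 over frames l for a fixed bin k.
    Returns one noise power estimate per frame.
    Parameter values are illustrative assumptions, not taken from the slides.
    """
    smoothed = power[0]
    recent = deque(maxlen=window)  # recent smoothed powers for minimum tracking
    p_speech = 0.0
    noise = power[0]
    estimates = []
    for y2 in power:
        # 1) Recursive smoothing of the noisy power.
        smoothed = alpha_s * smoothed + (1 - alpha_s) * y2
        recent.append(smoothed)
        # 2) Minimum over a sliding window (simplification of MCRA's block minimum).
        s_min = min(recent)
        # 3) Hard speech-presence decision from the ratio to the minimum.
        indicator = 1.0 if smoothed > delta * s_min else 0.0
        # 4) Recursively smoothed speech-presence probability.
        p_speech = alpha_p * p_speech + (1 - alpha_p) * indicator
        # 5) Time-varying smoothing: the noise update nearly freezes when speech is likely.
        alpha_tilde = alpha_d + (1 - alpha_d) * p_speech
        noise = alpha_tilde * noise + (1 - alpha_tilde) * y2
        estimates.append(noise)
    return estimates


# Steady noise, a short loud burst, then noise again.
track = mcra_noise_track([1.0] * 100 + [100.0] * 5 + [1.0] * 10)
```

With plain recursive averaging at 0.95 the burst would pull the estimate far above the noise floor; here the speech-presence probability keeps it close to 1.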
IMCRA with babble • IMCRA performance • [Audio examples: clean signal; noisy signal at 0 dB; signal enhanced by IMCRA]
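1-D LBP • 2-D LBP • Extensively used in 2-D image processing • 1-D LBP • Used for 1-D signal processing (Navin Chatlani, EUSIPCO 2010; Qiming Zhu, EUSIPCO 2012) • LBP code calculation: $LBP_P(x[i]) = \sum_{r=0}^{P/2-1}\left\{ S\left[x[i+r-P/2]-x[i]\right]2^{r} + S\left[x[i+r+1]-x[i]\right]2^{r+P/2} \right\}$, where P is the number of neighbouring samples used. The sign function is: $S[x] = 1$ if $x \ge 0$, and $S[x] = 0$ if $x < 0$. • Onset detection of myoelectric signals (Paul McCool, EUSIPCO 2012)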
1-D LBP code • 1-D LBP computes the LBP code by thresholding the neighbouring samples against the centre sample. • [Figure: LBP code calculation for P = 8] *N. Chatlani et al., 'Local binary patterns for 1-D signal processing', EUSIPCO 2010
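The LBP code formula can be sketched directly: each of the P/2 left and P/2 right neighbours contributes one bit, set when the neighbour is at least as large as the centre sample. This is a minimal illustration; the names `step` and `lbp_code` are hypothetical, and indexing is 0-based.

```python
def step(v):
    """Sign (step) function S[v]: 1 if v >= 0, else 0."""
    return 1 if v >= 0 else 0


def lbp_code(x, i, P=8):
    """1-D LBP code of sample x[i] using P neighbours (P/2 on each side)."""
    if P % 2:
        raise ValueError("P must be even")
    half = P // 2
    if i < half or i + half >= len(x):
        raise ValueError("x[i] needs P/2 neighbours on each side")
    code = 0
    for r in range(half):
        # Left neighbours x[i-P/2] .. x[i-1] set bits 0 .. P/2-1.
        code += step(x[i + r - half] - x[i]) << r
        # Right neighbours x[i+1] .. x[i+P/2] set bits P/2 .. P-1.
        code += step(x[i + r + 1] - x[i]) << (r + half)
    return code


ramp = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(lbp_code(ramp, 4))  # 240: only the four right neighbours reach x[4]
```

A constant signal gives all bits set (255 for P = 8), because S[0] = 1.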
1-D LBP histogram • The distribution of the LBP codes forms a histogram that describes the signal within a window of N samples: $H_b = \sum_{i=P/2}^{N-P/2-1} \delta\left(LBP_P(x[i]), b\right)$, where $b = 0, 1, \dots, B-1$ and B is the number of histogram bins. $\delta(\cdot,\cdot)$ is the Kronecker delta function. • [Figure: 1-D LBP forms the histogram from the windowed data] • [Figure: overview of the 1-D LBP procedure on a histogram]
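The histogram simply counts how often each LBP code occurs among the samples of a window that have a full set of neighbours. The sketch below assumes one bin per possible code (B = 2^P), which the slide does not state explicitly; `lbp_histogram` is a hypothetical name.

```python
def lbp_code(x, i, P):
    """1-D LBP code of x[i]: bit set when a neighbour is >= the centre sample."""
    half = P // 2
    code = 0
    for r in range(half):
        code += int(x[i + r - half] >= x[i]) << r
        code += int(x[i + r + 1] >= x[i]) << (r + half)
    return code


def lbp_histogram(x, P=8):
    """Histogram of 1-D LBP codes over one window x of N samples.

    Assumption: B = 2**P bins, one per possible code.
    """
    half = P // 2
    hist = [0] * (2 ** P)
    # Only samples with P/2 neighbours on each side are coded.
    for i in range(half, len(x) - half):
        # The Kronecker delta adds 1 to the bin equal to the code.
        hist[lbp_code(x, i, P)] += 1
    return hist


flat = lbp_histogram([0.0] * 10, P=2)  # every code is 3 (both neighbours equal)
```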
1-D LBP of energy • Short-time energy and its histogram • [Figure: speech signals and their short-time energy: a) energy of the clean speech signal, b) energy of the noisy speech signal, c) histogram of the clean speech energy, d) histogram of the noisy speech energy]
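Applying LBP in the energy domain first requires a short-time energy sequence. A minimal sketch, assuming energy is the sum of squared samples over consecutive non-overlapping frames with no window function; the slides do not specify these details, and `short_time_energy` is a hypothetical name. The 5 ms frame matches the energy window used in the Improved IMCRA experiments.

```python
def short_time_energy(x, frame_len):
    """Sum of squared samples in consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    n_frames = len(x) // frame_len
    return [
        sum(v * v for v in x[m * frame_len:(m + 1) * frame_len])
        for m in range(n_frames)
    ]


fs = 16000                   # sampling frequency in Hz
frame_len = fs * 5 // 1000   # 5 ms -> 80 samples at 16 kHz
energy = short_time_energy([1, 1, 2, 2, 3], 2)  # [2, 8]; the last sample is dropped
```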
1-D LBP of energy with offset value • LBP codes of the energy with different offset values • [Figure: a) the noisy signal, b) to f) LBP codes of its energy for different offset values]
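The slide shows the effect of an offset but not its formula. One plausible reading, stated here purely as an assumption, is that a neighbour sets its bit only when it exceeds the centre energy by at least the offset, so small fluctuations no longer flip bits. The function `lbp_code_offset` and the offset value 0.05 are hypothetical.

```python
def lbp_code_offset(e, i, P=2, offset=0.0):
    """Hypothetical offset variant of the 1-D LBP code.

    Assumption: a neighbour sets its bit only if it exceeds the centre
    value e[i] by at least `offset` (offset = 0 gives the plain LBP code).
    """
    half = P // 2
    code = 0
    for r in range(half):
        code += int(e[i + r - half] - e[i] >= offset) << r
        code += int(e[i + r + 1] - e[i] >= offset) << (r + half)
    return code


small_rise = [1.0, 1.0, 1.02]   # tiny fluctuation in energy
large_rise = [1.0, 1.0, 1.5]    # clear rise in energy
```

Under this assumption, `lbp_code_offset(small_rise, 1)` gives 3, but with `offset=0.05` it drops to 0, while `large_rise` keeps its right-neighbour bit (code 2).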
1-D LBP of energy based VAD • System block diagram • [Figure: VAD block diagram]
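VAD performance • Experimental background • The test speech is sampled at 16 kHz, and the total length of the test set is 73 seconds. Babble noise is added at SNRs from 0 dB to 20 dB. The parameter … is set to 0.03. • VAD 1: 1-D LBP of energy based VAD. • VAD 0: VAD proposed by Navin Chatlani. • G.729: ITU-T G.729 Annex B standard VAD. • HR0: speech-absence hit rate, $\mathrm{HR0} = N_{0,0}/N_0^{\mathrm{ref}}$, the fraction of reference non-speech frames correctly classified as non-speech. • FAR0: speech-absence false alarm rate, $\mathrm{FAR0} = N_{0,1}/N_1^{\mathrm{ref}}$, the fraction of reference speech frames wrongly classified as non-speech. Here $N_0^{\mathrm{ref}}$ and $N_1^{\mathrm{ref}}$ are the numbers of non-speech and speech frames in the reference labels.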
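Both metrics are frame-level ratios against reference labels. A minimal sketch, assuming per-frame labels with 1 = speech and 0 = non-speech; `vad_rates` is a hypothetical name.

```python
def vad_rates(reference, decision):
    """Return (HR0, FAR0) from per-frame labels (1 = speech, 0 = non-speech)."""
    if len(reference) != len(decision):
        raise ValueError("reference and decision must have the same length")
    n0_ref = sum(1 for r in reference if r == 0)  # reference non-speech frames
    n1_ref = len(reference) - n0_ref              # reference speech frames
    # Non-speech frames correctly classified as non-speech.
    n00 = sum(1 for r, d in zip(reference, decision) if r == 0 and d == 0)
    # Speech frames wrongly classified as non-speech.
    n01 = sum(1 for r, d in zip(reference, decision) if r == 1 and d == 0)
    hr0 = n00 / n0_ref if n0_ref else float("nan")
    far0 = n01 / n1_ref if n1_ref else float("nan")
    return hr0, far0


ref = [0, 0, 0, 0, 1, 1, 1, 1]
dec = [0, 0, 0, 1, 1, 1, 0, 1]
hr0, far0 = vad_rates(ref, dec)  # (0.75, 0.25)
```

A good VAD has HR0 close to 1 and FAR0 close to 0.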
VAD performance • [Figure: VAD performance]
Improved IMCRA • Experimental background • 198 samples from the VoxForge database, spoken by 9 speakers (6 male, 3 female), sampled at 16 kHz. • Babble noise from the NOISEX-92 database added at SNRs from -10 dB to 10 dB. • Energy window size set to 5 ms, P = 2, histogram window size set to 30 ms. • Segmental SNR and weighted spectral slope (WSS)* are used to compare performance. *Klatt, 'Prediction of perceived phonetic distance from critical-band spectra: a first step', IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1982
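Segmental SNR averages the SNR of the enhanced signal frame by frame. The sketch assumes non-overlapping frames and clamps each frame's SNR to [-10, 35] dB, a common convention in speech-enhancement evaluation; the slides state neither the frame length nor the limits, and `segmental_snr` is a hypothetical name. WSS needs a critical-band filterbank and is not sketched here.

```python
import math


def segmental_snr(clean, enhanced, frame_len, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB, each frame clamped to [lo, hi] (assumed limits)."""
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError("signals shorter than one frame")
    eps = 1e-12  # avoids division by zero and log of zero
    total = 0.0
    for m in range(n_frames):
        s = clean[m * frame_len:(m + 1) * frame_len]
        e = enhanced[m * frame_len:(m + 1) * frame_len]
        signal_energy = sum(v * v for v in s)
        error_energy = sum((a - b) ** 2 for a, b in zip(s, e))
        snr = 10 * math.log10((signal_energy + eps) / (error_energy + eps))
        total += min(max(snr, lo), hi)
    return total / n_frames


one_frame = segmental_snr([1.0] * 4, [0.9] * 4, 4)  # about 20 dB
perfect = segmental_snr([1.0] * 4, [1.0] * 4, 4)    # clamped to 35 dB
```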
Improved IMCRA with babble noise • Performance • [Audio examples: clean signal; noisy signal (SNR 0 dB); signal enhanced by IMCRA; signal enhanced by improved IMCRA]
Improved IMCRA with babble noise • Performance • [Figure: segmental SNR]
Improved IMCRA with babble noise • Performance • [Figure: weighted spectral slope]
Discussion • Conclusions from the results • 1-D LBP in the energy domain can distinguish the voiced and unvoiced components of noisy speech signals. • The energy-domain LBP VAD is shown to outperform the G.729B VAD and Navin Chatlani's LBP VAD. • Improved IMCRA outperforms IMCRA, with higher segmental SNR and higher likelihood. • Future work • Apply this algorithm as a pre-processing stage of an ASR system.
Acknowledgements • Thanks to Prof. John Soraghan for the idea of babble noise reduction. • Thanks to Paul McCool and Navin Chatlani for their previous work on 1-D LBP.