Advanced Speech Enhancement in Noisy Environments

Qiming Zhu. Supervisor: Prof. John Soraghan. Centre for Excellence in Signal and Image Processing, Dept. of Electronic and Electrical Engineering. q.zhu@strath.ac.uk

Presentation Transcript


  1. Advanced Speech Enhancement in Noisy Environments • Qiming Zhu • Supervisor: Prof. John Soraghan • Centre for Excellence in Signal and Image Processing, Dept. of Electronic and Electrical Engineering • q.zhu@strath.ac.uk

  2. Presentation structure • Introduction • Speech Enhancement • Improved Minima Controlled Recursive Averaging (IMCRA) • Robust Voice Activity Detection (VAD) • 1-D Local Binary Pattern (LBP) • 1-D LBP of energy based VAD • Performance Evaluation • Improved IMCRA • Performance Evaluation • Discussion & Conclusion

  3. Introduction • Automatic speech recognition (ASR) • A speech recognition system aims to create intelligent machines that can ‘hear’, ‘understand’ and ‘comply with’ speech input. • Speech enhancement and voice activity detection (VAD) are integral parts of an ASR system. • Aim of current research • Improve recognition performance against babble-noise backgrounds.

  4. IMCRA • IMCRA: [Figure: IMCRA processing] * Israel Cohen, ‘Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging’, IEEE Transactions on Speech and Audio Processing, 2003
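IMCRA itself (Cohen, 2003) uses two iterations of smoothing and minimum tracking together with a conditional speech-presence probability. As an illustration of the underlying idea only, the sketch below implements a simplified, single-pass minima-controlled recursive averaging (MCRA-style) noise estimator on a power spectrogram. It is not Cohen's IMCRA: frequency smoothing is omitted, and the function name `mcra_noise_psd` and all parameter defaults (`alpha_s`, `alpha_d`, `alpha_p`, `delta`, `win_len`) are illustrative assumptions.

```python
import numpy as np


def mcra_noise_psd(power_spec, alpha_s=0.8, alpha_d=0.95, alpha_p=0.2,
                   delta=5.0, win_len=50):
    """Simplified MCRA-style noise PSD estimate (illustrative sketch, not full IMCRA).

    power_spec: array of shape (n_frames, n_bins) holding |Y(k, l)|^2.
    Returns an array of the same shape with the running noise PSD estimate.
    """
    power_spec = np.asarray(power_spec, dtype=float)
    n_frames, n_bins = power_spec.shape
    noise_psd = np.empty_like(power_spec)
    s = power_spec[0].copy()      # recursively smoothed power S(k, l)
    s_min = s.copy()              # minimum of S over the search window
    s_tmp = s.copy()              # running minimum within the current window
    lam = s.copy()                # noise PSD estimate lambda_d(k, l)
    p = np.zeros(n_bins)          # smoothed speech-presence probability
    for frame in range(n_frames):
        y = power_spec[frame]
        # 1) Recursive averaging of the noisy power (time smoothing only).
        s = alpha_s * s + (1.0 - alpha_s) * y
        # 2) Minimum tracking: restart the window search every win_len frames.
        if (frame + 1) % win_len == 0:
            s_min = np.minimum(s_tmp, s)
            s_tmp = s.copy()
        else:
            s_min = np.minimum(s_min, s)
            s_tmp = np.minimum(s_tmp, s)
        # 3) Speech is deemed present where S exceeds its tracked minimum by delta.
        present = (s > delta * s_min).astype(float)
        p = alpha_p * p + (1.0 - alpha_p) * present
        # 4) Presence-controlled smoothing: update the noise slowly where speech is likely.
        a_d = alpha_d + (1.0 - alpha_d) * p
        lam = a_d * lam + (1.0 - a_d) * y
        noise_psd[frame] = lam
    return noise_psd
```

For a stationary noise-only input the estimate settles near the true noise power, since the speech-presence indicator rarely fires and the noise estimate reduces to plain recursive averaging.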

  5. IMCRA with babble noise • IMCRA performance [examples not preserved] • Clean signal • Noisy signal at 0 dB SNR • Enhanced by IMCRA

  6. 1-D LBP • 2-D LBP • Extensively used in 2-D image processing • 1-D LBP • Used for 1-D signal processing (Navin Chatlani, EUSIPCO 2010; Qiming Zhu, EUSIPCO 2012) • LBP code calculation: [equation not preserved], where P is the number of neighbouring samples used. The Sign function is: [equation not preserved] • Onset detection of myoelectric signals (Paul McCool, EUSIPCO 2012)
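The equations on this slide were lost in extraction. Following the 1-D LBP formulation of Chatlani et al. (EUSIPCO 2010) as we understand it, the code for sample x[i] compares the P/2 samples before it and the P/2 samples after it with x[i], assigning the preceding neighbours bits 0 to P/2 − 1 and the following neighbours bits P/2 to P − 1:

```latex
\mathrm{LBP}_P(x[i]) = \sum_{r=0}^{P/2-1} \Big\{ \mathrm{Sign}\big(x[i+r-P/2]-x[i]\big)\,2^{r}
  + \mathrm{Sign}\big(x[i+r+1]-x[i]\big)\,2^{r+P/2} \Big\},
\qquad
\mathrm{Sign}(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}
```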

  7. 1-D LBP code • 1-D LBP computes the LBP code by thresholding the neighbouring samples against the centre sample. • [Figure: LBP code calculation for P = 8] *Navin Chatlani et al., ‘Local binary patterns for 1-D signal processing’, EUSIPCO 2010
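A minimal NumPy sketch of the 1-D LBP code computation, assuming the Chatlani et al. formulation (each neighbour sets its bit when it is greater than or equal to the centre sample). The function name `lbp_1d` is hypothetical, and skipping edge samples that lack a full neighbourhood is an assumption about boundary handling.

```python
import numpy as np


def lbp_1d(x, P=8):
    """1-D LBP codes for samples i = P/2 .. len(x) - P/2 - 1.

    Bits 0..P/2-1 come from the P/2 preceding neighbours, bits P/2..P-1
    from the P/2 following neighbours; a bit is 1 when neighbour >= centre.
    Edge samples without a full neighbourhood are skipped.
    """
    x = np.asarray(x, dtype=float)
    half = P // 2
    n = len(x)
    centre = x[half:n - half]
    codes = np.zeros(n - 2 * half, dtype=np.int64)
    for r in range(half):
        left = x[r:n - 2 * half + r]                  # x[i + r - P/2]
        right = x[half + r + 1:n - half + r + 1]      # x[i + r + 1]
        codes += (left >= centre).astype(np.int64) << r
        codes += (right >= centre).astype(np.int64) << (r + half)
    return codes
```

For example, `lbp_1d([1, 5, 2, 3, 4, 6, 0, 7, 3])` yields a single code, 82: with centre value 4, only x[1] = 5 on the left (bit 1) and x[5] = 6 and x[7] = 7 on the right (bits 4 and 6) pass the threshold, giving 2 + 16 + 64.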

  8. 1-D LBP histogram • The distribution of the LBP codes over a window of N samples forms a histogram that describes the signal locally: [equation not preserved], where … and B is the number of histogram bins; … is the Kronecker delta function. • 1-D LBP builds the histogram from the windowed data. • [Figure: Overview of the 1-D LBP procedure producing a histogram]
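A sketch of the windowed LBP histogram. It assumes B = 2^P bins (one per possible code) and consecutive, non-overlapping windows of N codes; the slide's exact windowing was not preserved, so both are assumptions, and `lbp_histograms` is a hypothetical name. `np.bincount` performs the Kronecker-delta counting, adding one to bin k for every code equal to k.

```python
import numpy as np


def lbp_histograms(codes, N, P=8):
    """Histogram of LBP codes over consecutive non-overlapping windows of N codes.

    Returns an array of shape (n_windows, 2**P); a trailing partial window is dropped.
    """
    codes = np.asarray(codes, dtype=np.int64)
    B = 2 ** P                       # assumed: one bin per possible LBP code
    n_win = len(codes) // N
    hists = np.zeros((n_win, B), dtype=np.int64)
    for w in range(n_win):
        # bincount sums the Kronecker delta over the window for every bin k
        hists[w] = np.bincount(codes[w * N:(w + 1) * N], minlength=B)
    return hists
```

With P = 2 there are only four possible codes, so `lbp_histograms([0, 3, 3, 1, 2, 2], N=3, P=2)` gives the two rows [1, 0, 0, 2] and [0, 1, 2, 0].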

  9. 1-D LBP of energy • Short-time energy and its histogram • [Figure: Speech signals and their short-time energy: a) energy of clean speech signal, b) energy of noisy speech signal, c) histogram of clean speech energy, d) histogram of noisy speech energy]
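A sketch of the short-time energy that feeds the energy-domain LBP. The 5 ms window matches the energy window size quoted for the improved IMCRA experiments (80 samples at 16 kHz); rectangular, non-overlapping frames and the name `short_time_energy` are assumptions.

```python
import numpy as np


def short_time_energy(x, fs, win_ms=5.0):
    """Sum of squared samples over non-overlapping rectangular frames.

    A trailing partial frame is dropped.
    """
    x = np.asarray(x, dtype=float)
    frame_len = int(round(fs * win_ms / 1000.0))   # 80 samples at 16 kHz, 5 ms
    n_frames = len(x) // frame_len
    frames = x[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sum(frames ** 2, axis=1)
```

The resulting energy sequence can then be passed to a 1-D LBP coder in place of the raw waveform.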

  10. 1-D LBP of energy with offset value • LBP codes of the energy computed with an offset value • [Figure: LBP codes of the energy with different offset values: a) … of noisy signal, b) to f) … (panel values not preserved)]
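The slide does not preserve how the offset enters the code. One plausible reading, shown here purely as a hypothetical sketch, is that a neighbour sets its bit only when it exceeds the centre energy by at least the offset. Small energy fluctuations in noise then no longer toggle bits. The function name `lbp_with_offset` and this thresholding rule are assumptions, not the authors' confirmed method.

```python
import numpy as np


def lbp_with_offset(e, P=2, offset=0.0):
    """1-D LBP codes of an energy sequence with a hypothetical additive offset.

    A neighbour sets its bit only if e[neighbour] - e[i] >= offset;
    with offset = 0 this reduces to the plain 1-D LBP code.
    """
    e = np.asarray(e, dtype=float)
    half = P // 2
    n = len(e)
    centre = e[half:n - half]
    codes = np.zeros(n - 2 * half, dtype=np.int64)
    for r in range(half):
        left = e[r:n - 2 * half + r]                  # e[i + r - P/2]
        right = e[half + r + 1:n - half + r + 1]      # e[i + r + 1]
        codes += (left - centre >= offset).astype(np.int64) << r
        codes += (right - centre >= offset).astype(np.int64) << (r + half)
    return codes
```

For the energy sequence [0.10, 0.11, 0.12, 0.50, 0.12] with P = 2, offset 0 gives codes [2, 2, 0], while offset 0.05 gives [0, 2, 0]: the 0.01 rise between the first frames is suppressed, and the large jump to 0.50 still registers.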

  11. 1-D LBP of energy based VAD • System block diagram • [Figure: VAD block diagram]

  12. VAD performance • Experimental background • Test speech is sampled at 16 kHz, and the test set is 73 seconds long in total. • The speech is mixed with babble noise at SNRs from 0 to 20 dB. • … is set to 0.03 (parameter symbol not preserved). • VAD 1: 1-D LBP of energy based VAD. • VAD 0: VAD proposed by Navin Chatlani. • G.729: G.729 Annex B standard VAD. • HR0: speech-absence hit rate [equation not preserved] • FAR0: speech-absence false alarm rate [equation not preserved]
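The HR0 and FAR0 equations were not preserved. The sketch below assumes the definitions commonly used in the VAD literature. HR0 is the fraction of reference non-speech frames that the VAD labels non-speech. FAR0 is the fraction of reference speech frames that the VAD wrongly labels non-speech. The function name `vad_hr0_far0` is hypothetical, and frame labels are assumed to be 1 for speech and 0 for non-speech.

```python
import numpy as np


def vad_hr0_far0(reference, decision):
    """Speech-absence hit rate (HR0) and false alarm rate (FAR0).

    reference, decision: per-frame labels, 1 = speech, 0 = non-speech.
    Both classes must be present in the reference to avoid division by zero.
    """
    ref = np.asarray(reference).astype(bool)
    dec = np.asarray(decision).astype(bool)
    # HR0: non-speech frames correctly labelled non-speech / all reference non-speech frames
    hr0 = np.sum(~ref & ~dec) / np.sum(~ref)
    # FAR0: speech frames wrongly labelled non-speech / all reference speech frames
    far0 = np.sum(ref & ~dec) / np.sum(ref)
    return float(hr0), float(far0)
```

A good VAD has HR0 close to 1 (noise-only frames are recognised) and FAR0 close to 0 (speech is rarely clipped).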

  13. VAD performance • [Figure: VAD performance results]

  14. Improved IMCRA • Experimental background • 198 speech samples from the VoxForge database, spoken by 9 people: 6 male and 3 female. Sampling frequency of 16 kHz. • Babble noise from the NOISEX-92 database added at SNRs from -10 dB to 10 dB. • Energy window size set to 5 ms, P = 2, histogram size set to 30 ms. • Segmental SNR and weighted spectral slope (WSS) are used to compare performance. *Klatt, ‘Prediction of perceived phonetic distance from critical-band spectra’, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1982
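A sketch of two steps in this set-up: scaling babble noise so that the mixture has a target SNR, and computing segmental SNR between clean and enhanced signals. The function names, the 20 ms (320-sample) frame length and the [-10, 35] dB per-frame clipping range are assumptions (the clipping range is a common convention, not something stated in the slides). The noise is assumed to be at least as long as the clean signal.

```python
import numpy as np


def add_noise_at_snr(clean, noise, snr_db):
    """Scale noise so that 10*log10(P_clean / P_noise) == snr_db, then add it."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


def segmental_snr(clean, enhanced, frame_len=320, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB, each clipped to [lo, hi] (assumed convention)."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = len(clean) // frame_len
    snrs = []
    for k in range(n_frames):
        s = clean[k * frame_len:(k + 1) * frame_len]
        err = s - enhanced[k * frame_len:(k + 1) * frame_len]
        num = np.sum(s ** 2)
        den = np.sum(err ** 2) + 1e-12          # guard against a perfect frame
        snrs.append(np.clip(10 * np.log10(num / den + 1e-12), lo, hi))
    return float(np.mean(snrs))
```

Clipping each frame stops silent frames (very low SNR) and near-perfect frames (very high SNR) from dominating the average.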

  15. Improved IMCRA with babble noise • Performance [examples not preserved] • Clean signal • Noisy signal (SNR of 0 dB) • IMCRA • Improved IMCRA

  16. Improved IMCRA with babble noise • Performance • [Figure: Segmental SNR]

  17. Improved IMCRA with babble noise • Performance • [Figure: Weighted spectral slope]

  18. Discussion • Conclusion for the results • 1-D LBP in the energy domain can distinguish the voiced and unvoiced components of noisy speech signals. • The energy-domain LBP VAD is shown to outperform the G.729 VAD and Navin Chatlani's LBP VAD. • Improved IMCRA outperforms IMCRA, with higher segmental SNR and higher likelihood. • Future work • Apply this algorithm as pre-processing for an ASR system.

  19. Acknowledgements • Thanks to Prof. John Soraghan for the idea of babble noise reduction. • Thanks to Paul McCool and Navin Chatlani for their previous work on 1-D LBP.

  20. Thank you! Any questions?
