Multiplicative Update of AR gains in Codebook-driven Speech Enhancement

Qi He (1), Changchun Bao (1), and Feng Bao (2). (1) Beijing University of Technology, China; (2) The University of Auckland, New Zealand. 2016-3-25.

Presentation Transcript


  1. Multiplicative Update of AR gains in Codebook-driven Speech Enhancement. Qi He (1), Changchun Bao (1), and Feng Bao (2); (1) Beijing University of Technology, China; (2) The University of Auckland, New Zealand; 2016-3-25

  2. Outline • Speech Enhancement Review • Background • Traditional Methods • Multiplicative Update of AR Gains in Codebook-driven Speech Enhancement • Estimation of spectral shape of noise • Estimation of AR gains • Bayesian MMSE estimation • Codebook-driven Wiener filter • Experimental Results http://www.bjut.edu.cn/sci/voice/index.htm

  3. Background • Noise is everywhere: street noise, factory noise, office noise, babble noise

  4. Background • Speech enhancement applications: mobile phones and communication; robust speech, speaker, and language recognition; hearing aids; etc.

  5. Background • Speech enhancement aims at • suppressing the noise in noisy speech • improving the quality and intelligibility of enhanced speech. Diagram: speech plus noise gives noisy speech, which speech enhancement turns into enhanced speech. Eq. (1), where n is the frame index.

  6. Traditional Methods • Traditional speech enhancement methods (no a priori information) • Spectral subtraction • Wiener filtering • Subspace method • … • Performance of these methods: • For stationary noise: good • For non-stationary noise: poor

  7. Traditional Methods • Codebook-based methods (AR: auto-regressive) • Codebook-based method using ML estimator [1]. • Codebook-based method using Bayesian MMSE estimator [2]. Block diagram: speech corpus -> speech codebook; noise corpus -> noise codebook; noisy speech -> FFT -> noisy spectrum -> AR gains estimate -> ML or Bayesian MMSE estimation -> Wiener filter -> IFFT -> enhanced speech

  8. Traditional Methods • Traditional method for AR gains estimation For each pair of code-words from the speech and noise codebooks, the corresponding AR gains are obtained by Eq. (2), whose two arguments are the observed noisy spectrum and the modeled noisy spectrum.
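
Eq. (2) itself is not preserved in this transcript. A plausible form is sketched below, assuming the Itakura-Saito (IS) distortion used in the cited codebook-driven work [1]. The symbols $P_y$, $\hat{P}_y^{ij}$, $A_x^i$, $A_w^j$, $g_x$ and $g_w$ are assumed notation, not copied from the slides.

```latex
\begin{align}
\{g_x^{ij}, g_w^{ij}\} &= \arg\min_{g_x,\, g_w}\; d_{IS}\!\left(P_y(\omega),\, \hat{P}_y^{ij}(\omega)\right), \\
\hat{P}_y^{ij}(\omega) &= \frac{g_x}{|A_x^{i}(\omega)|^{2}} + \frac{g_w}{|A_w^{j}(\omega)|^{2}}, \\
d_{IS}(P, \hat{P}) &= \frac{1}{2\pi}\int_{-\pi}^{\pi}
  \left( \frac{P(\omega)}{\hat{P}(\omega)} - \ln\frac{P(\omega)}{\hat{P}(\omega)} - 1 \right) d\omega .
\end{align}
```

Here $i$ and $j$ index the speech and noise code-words, and $1/|A(\omega)|^2$ is the spectral envelope implied by a set of AR coefficients.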

  9. Traditional Methods • Traditional method for AR gains estimation Since there is no closed-form solution for the optimal speech and noise AR gains, the conventional codebook-driven methods obtain the AR gains indirectly from the log-spectral (LS) distortion, which has a closed-form solution after applying a series expansion. That is Eq. (3).

  10. Traditional Methods • Traditional method for AR gains estimation By differentiating Eq. (3) with respect to the AR gains and setting the results to zero, the AR gains can be calculated by Eq. (4). After obtaining the AR gains for each code-word combination, the ML estimator or the Bayesian MMSE estimator of Eq. (5) yields the AR parameters of speech and noise.
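
The closed-form step described on slides 9 and 10 can be sketched as a weighted least-squares problem. This is one reading of the series-expansion approach in [1], not the slides' own Eq. (4): it assumes the approximation reduces to minimizing the relative error `((g_x*s_x + g_w*s_w - p) / p)**2` summed over frequency bins. The function name and the inputs `p_obs`, `s_x`, `s_w` (observed noisy spectrum and the unit-gain speech and noise envelopes) are hypothetical.

```python
def conventional_gains(p_obs, s_x, s_w):
    """Solve the 2x2 normal equations of sum(((g_x*s_x + g_w*s_w - p) / p)**2).

    p_obs: observed noisy power spectrum (positive values, one per bin)
    s_x, s_w: unit-gain spectral envelopes of the speech and noise code-words
    """
    # Entries of the symmetric 2x2 system matrix.
    c_xx = sum((sx / p) ** 2 for sx, p in zip(s_x, p_obs))
    c_ww = sum((sw / p) ** 2 for sw, p in zip(s_w, p_obs))
    c_xw = sum(sx * sw / p ** 2 for sx, sw, p in zip(s_x, s_w, p_obs))
    # Right-hand side.
    d_x = sum(sx / p for sx, p in zip(s_x, p_obs))
    d_w = sum(sw / p for sw, p in zip(s_w, p_obs))
    det = c_xx * c_ww - c_xw ** 2
    if det == 0.0:
        raise ValueError("speech and noise envelopes are collinear")
    # Cramer's rule for the two gains.
    g_x = (d_x * c_ww - d_w * c_xw) / det
    g_w = (d_w * c_xx - d_x * c_xw) / det
    return g_x, g_w


# Synthetic check: build a spectrum from known gains and recover them.
s_x = [1.0, 0.5, 0.1]
s_w = [0.2, 0.3, 0.4]
p_obs = [2.0 * sx + 3.0 * sw for sx, sw in zip(s_x, s_w)]
g_x, g_w = conventional_gains(p_obs, s_x, s_w)
```

Because nothing constrains the solution to be non-negative, a real system would need a rule for negative gains; that handling is left out of this sketch.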

  11. Traditional Methods • Traditional method for de-noising A Wiener filter constructed from the estimated AR parameters of speech and noise is used to enhance the noisy speech, Eq. (6). Although the codebook-driven speech enhancement methods are better suited to suppressing non-stationary noise, some problems remain to be addressed. • Noise classification; • The accuracy of gain estimation can be further improved; • The residual noise between the harmonics of noisy speech should be further suppressed.
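
A minimal sketch of a Wiener filter built from AR parameters follows, assuming the standard form H = P_x / (P_x + P_w) with each envelope given by gain / |A(w)|^2. Eq. (6) is not preserved in the transcript, so this is an illustration of the general technique rather than the slides' exact formula; `ar_envelope` and `wiener_gain` are hypothetical helper names.

```python
import cmath
import math


def ar_envelope(ar_coeffs, gain, n_bins):
    """Return gain / |A(w)|^2 on n_bins frequencies in [0, pi).

    A(w) = 1 + sum_k a_k * exp(-j*w*k), with a_k = ar_coeffs[k-1].
    """
    envelope = []
    for b in range(n_bins):
        w = math.pi * b / n_bins
        a = 1 + sum(c * cmath.exp(-1j * w * (k + 1))
                    for k, c in enumerate(ar_coeffs))
        envelope.append(gain / abs(a) ** 2)
    return envelope


def wiener_gain(speech_env, noise_env):
    """Per-bin Wiener gain from speech and noise power envelopes."""
    return [s / (s + n) for s, n in zip(speech_env, noise_env)]


# Example: a first-order low-pass speech model against flat (white) noise.
speech = ar_envelope([-0.9], gain=1.0, n_bins=8)
noise = ar_envelope([], gain=0.5, n_bins=8)
h = wiener_gain(speech, noise)
h_clean = wiener_gain(speech, ar_envelope([], gain=0.0, n_bins=8))
```

The gain approaches 1 where the speech envelope dominates and falls toward 0 where the noise dominates. Because both envelopes are smooth, the filter cannot follow the harmonic fine structure, which is the residual-noise problem listed on the slide.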

  12. Proposed Method • The estimation of spectral shape of noise To solve the problem of noise classification, the proposed method estimates the spectral shape of the noise online with the Minima Controlled Recursive Averaging (MCRA) algorithm, Eqs. (7) and (8).
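
The idea behind minima-controlled recursive averaging can be sketched for a single frequency bin as below. This is a simplified illustration, not a reconstruction of Eqs. (7) and (8): the smoothing constants, the threshold `delta`, the window length `win`, and the function name `mcra_bin` are all assumed values and names chosen for the example. How the slides turn the noise power estimate into a spectral shape is not shown in the transcript and is not modeled here.

```python
def mcra_bin(power_frames, alpha_s=0.8, alpha_d=0.95, alpha_p=0.2,
             delta=5.0, win=100):
    """Track the noise power of one frequency bin from a sequence of |Y|^2 values."""
    first = power_frames[0]
    smoothed = local_min = tmp_min = noise = first
    presence = 0.0
    estimates = [noise]
    for t, y2 in enumerate(power_frames[1:], start=1):
        # Recursive smoothing of the noisy power.
        smoothed = alpha_s * smoothed + (1 - alpha_s) * y2
        # Minimum tracking, restarted every `win` frames.
        if t % win == 0:
            local_min = min(tmp_min, smoothed)
            tmp_min = smoothed
        else:
            local_min = min(local_min, smoothed)
            tmp_min = min(tmp_min, smoothed)
        # Speech is flagged when the smoothed power is far above the minimum.
        indicator = 1.0 if smoothed > delta * local_min else 0.0
        presence = alpha_p * presence + (1 - alpha_p) * indicator
        # The noise update slows down when speech is likely present.
        alpha_eff = alpha_d + (1 - alpha_d) * presence
        noise = alpha_eff * noise + (1 - alpha_eff) * y2
        estimates.append(noise)
    return estimates


steady = mcra_bin([2.0] * 50)
burst = mcra_bin([1.0] * 20 + [100.0] * 5)
```

With a steady input the estimate stays at the input power; during a short high-energy burst the estimate moves only slightly, because the presence probability pushes the effective smoothing factor toward 1.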

  13. Proposed Method • The estimation of AR gains In this paper, we use a multiplicative update rule [3-4] to obtain an approximately closed-form solution for the Itakura-Saito (IS) distortion. Since only the shape codebook of the speech spectrum is trained offline and the spectral shape of the noise is estimated online, the modeled noisy spectrum can be rewritten for each speech code-word as Eq. (9). Eq. (9) is then expressed in matrix form.

  14. Proposed Method • The estimation of AR gains The IS distortion is rewritten as Eq. (10). Differentiating Eq. (10) with respect to the gain matrices gives Eq. (11) [3-4], where the symbol ‘.’ indicates point-wise multiplication. Simplifying Eq. (11) gives Eq. (12).

  15. Proposed Method • The estimation of AR gains The speech and noise AR gains are obtained by iterating the multiplicative rules of Eq. (13) to minimize the IS distortion. Then we have Eq. (14).
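
The multiplicative rules referenced as Eq. (13) are not preserved in the transcript. The sketch below uses the standard multiplicative update for the IS divergence from the NMF literature [3-4], with the two spectral shapes held fixed and only the two gains updated. Treat it as an assumption about the general technique, not as the authors' exact rule; all function and variable names are hypothetical.

```python
import math


def is_distortion(p_obs, p_model):
    """Mean Itakura-Saito distortion between two positive spectra."""
    return sum(p / q - math.log(p / q) - 1
               for p, q in zip(p_obs, p_model)) / len(p_obs)


def model_spectrum(g_x, g_w, s_x, s_w):
    """Modeled noisy spectrum: gain-scaled speech shape plus noise shape."""
    return [g_x * sx + g_w * sw for sx, sw in zip(s_x, s_w)]


def multiplicative_update(p_obs, s_x, s_w, g_x=1.0, g_w=1.0, n_iter=50):
    """Iterate g <- g * (sum s*p/q^2) / (sum s/q) for both gains at once."""
    for _ in range(n_iter):
        q = model_spectrum(g_x, g_w, s_x, s_w)
        num_x = sum(sx * p / qk ** 2 for sx, p, qk in zip(s_x, p_obs, q))
        den_x = sum(sx / qk for sx, qk in zip(s_x, q))
        num_w = sum(sw * p / qk ** 2 for sw, p, qk in zip(s_w, p_obs, q))
        den_w = sum(sw / qk for sw, qk in zip(s_w, q))
        # Positive factors keep both gains positive at every iteration.
        g_x *= num_x / den_x
        g_w *= num_w / den_w
    return g_x, g_w


s_x = [1.0, 0.5, 0.1]
s_w = [0.2, 0.2, 0.2]
p_obs = model_spectrum(2.0, 3.0, s_x, s_w)
d_start = is_distortion(p_obs, model_spectrum(1.0, 1.0, s_x, s_w))
g_x, g_w = multiplicative_update(p_obs, s_x, s_w)
d_end = is_distortion(p_obs, model_spectrum(g_x, g_w, s_x, s_w))
fixed = multiplicative_update(p_obs, s_x, s_w, g_x=2.0, g_w=3.0, n_iter=1)
```

Each factor is a ratio of the negative and positive parts of the gradient, so the gains stay positive without any explicit constraint, and a point where the model matches the observation exactly is left unchanged by the update.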

  16. Proposed Method • The estimation of AR gains An example of the average IS distortion is illustrated in Fig. 1. The average IS distortion is defined in terms of Nx, the size of the speech codebook. The AR gains are estimated by the conventional and proposed methods, respectively. The speech material is corrupted by white noise at an SNR of 5 dB. Fig. 1: Comparison of the average IS distortion
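
The definition of the average IS distortion is missing from the transcript. One plausible reading, given that Nx is the speech codebook size, is an average over the speech code-words; the notation below is assumed, and any additional averaging over frames is not shown.

```latex
\bar{d}_{IS} = \frac{1}{N_x} \sum_{i=1}^{N_x} d_{IS}\!\left(P_y(\omega),\, \hat{P}_y^{\,i}(\omega)\right)
```

Here $\hat{P}_y^{\,i}$ denotes the modeled noisy spectrum for the $i$-th speech code-word with its estimated gains.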

  17. Proposed Method • Bayesian MMSE estimation Let θx denote the random variable corresponding to the speech AR coefficients, and let gx and gw denote the random variables corresponding to the speech and noise AR gains, respectively. Let θ=[θx, gx, gw] denote the set of random variables. After obtaining the parameter set for each speech code-word, the desired Bayesian MMSE estimate can be written as Eq. (15).
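
A Bayesian MMSE estimate over a codebook is a likelihood-weighted average of the candidate parameter sets. The sketch below assumes a uniform prior over code-words and a likelihood proportional to exp(-(N/2) * d_IS), the usual link between the Gaussian AR likelihood and the IS distortion in the cited work [2]. Eq. (15) is not preserved in the transcript, so the weighting, the function name, and the arguments are assumptions.

```python
import math


def mmse_estimate(candidates, distortions, frame_len):
    """Weighted average of candidate parameter vectors.

    candidates: list of equal-length parameter vectors, one per code-word
    distortions: IS distortion of each candidate against the observation
    frame_len: number of samples per frame (N in the assumed likelihood)
    """
    log_weights = [-0.5 * frame_len * d for d in distortions]
    peak = max(log_weights)  # subtract the peak to avoid overflow
    weights = [math.exp(lw - peak) for lw in log_weights]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(candidates[0])
    return [sum(w * c[k] for w, c in zip(weights, candidates))
            for k in range(dim)]


# Two equally likely candidates average; a much better fit dominates.
tie = mmse_estimate([[1.0], [3.0]], [0.2, 0.2], frame_len=256)
clear = mmse_estimate([[1.0], [3.0]], [0.0, 10.0], frame_len=256)
```

Unlike an ML estimator, which would keep only the single best candidate, the weighted average blends neighboring code-words when their distortions are close.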

  18. Proposed Method Fig. 2 AR gain estimation of clean speech

  19. Proposed Method • Modified codebook-driven Wiener filter The conventional codebook-driven Wiener filter is constructed from the estimated spectral envelopes of speech and noise, which usually fit the spectra between the harmonics of speech inaccurately. Consequently, residual noise remains between the harmonics of the enhanced speech. In this section, we introduce the speech presence probability (SPP) to modify the traditional codebook-driven Wiener filter and suppress the residual noise, Eq. (16).

  20. Performance Evaluation • Experiments Noise types: white, babble, office, and street. Test material: 9 utterances from 4 female and 5 male speakers. Sampling rate: 8 kHz. Speech codebook size: 6 bits. Table 1: PESQ test results

  21. Performance Evaluation Table 2: SSNR improvement test results. Table 3: LSD test results

  22. Demos (white noise, SNR = 10 dB): clean speech; noisy speech; enhanced speech using ML-CB; enhanced speech using MMSE-CB; enhanced speech using our method without SPP; enhanced speech using our method with SPP.

  23. Demos (babble noise, SNR = 10 dB): clean speech; noisy speech; enhanced speech using ML-CB; enhanced speech using MMSE-CB; enhanced speech using our method without SPP; enhanced speech using our method with SPP.

  24. References [1] S. Srinivasan, J. Samuelsson, and W. B. Kleijn, “Codebook driven short-term predictor parameter estimation for speech enhancement,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 1, pp. 163–176, Jan. 2006. [2] S. Srinivasan, J. Samuelsson, and W. B. Kleijn, “Codebook-based Bayesian speech enhancement for nonstationary environments,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 2, pp. 441–452, Feb. 2007. [3] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” in NIPS, 2000, pp. 556–562. [4] C. Févotte, N. Bertin, and J. L. Durrieu, “Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis,” Neural Comput., vol. 21, pp. 793–830, 2009.

  25. Thank You ! Q & A
