Multiplicative Update of AR Gains in Codebook-driven Speech Enhancement
Qi He¹, Changchun Bao¹, and Feng Bao²
¹Beijing University of Technology, China
²The University of Auckland, New Zealand
2016-3-25
Outline
• Speech Enhancement Review
  • Background
  • Traditional Methods
• Multiplicative Update of AR Gains in Codebook-driven Speech Enhancement
  • Estimation of the spectral shape of noise
  • Estimation of AR gains
  • Bayesian MMSE estimation
  • Codebook-driven Wiener filter
• Experimental Results
http://www.bjut.edu.cn/sci/voice/index.htm
Background
Noises exist everywhere:
• Street noise
• Factory noise
• Office noise
• Babble noise
Background
• Speech enhancement applications:
  • Mobile phone / communication
  • Robust speech, speaker, and language recognition, etc.
  • Hearing aids
Background
• Speech enhancement aims at:
  • suppressing the noise in noisy speech;
  • improving the quality and intelligibility of the enhanced speech.

Noisy speech → [Speech enhancement] → Enhanced speech

The noisy speech is modeled as clean speech plus additive noise, (1), where n is the frame index.
Traditional Methods
• Traditional speech enhancement methods (no a priori information):
  • Spectral subtraction
  • Wiener filtering
  • Subspace method
  • ……
• Performance of these methods:
  • Stationary noises: good
  • Non-stationary noises: poor
Traditional Methods
• Codebook-based methods (AR: auto-regressive)
  • Codebook-based method using the ML estimator [1].
  • Codebook-based method using the Bayesian MMSE estimator [2].

A speech codebook and a noise codebook are trained offline from a speech corpus and a noise corpus, respectively. At run time:
noisy speech → FFT → noisy spectrum → AR gain estimation → ML or Bayesian MMSE estimation → Wiener filter → IFFT → enhanced speech.
Traditional Methods
• Traditional method for AR gains estimation
For each pair of code-words from the speech and noise codebooks, the corresponding AR gains are obtained by minimizing a distortion measure (2) between the observed noisy spectrum and the modeled noisy spectrum.
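The distortion minimized between the observed and modeled noisy spectra is the Itakura-Saito (IS) divergence (the same measure reported in Fig. 1 later). A minimal sketch of the discretized IS divergence between two power spectra (function and parameter names are my own, not the paper's):

```python
import numpy as np

def is_divergence(p_obs, p_model, eps=1e-12):
    """Itakura-Saito divergence between an observed power spectrum p_obs
    and a modeled power spectrum p_model, averaged over frequency bins.
    Zero iff the spectra match; sensitive to relative (not absolute) error."""
    r = (p_obs + eps) / (p_model + eps)
    return float(np.mean(r - np.log(r) - 1.0))
```

Because the IS divergence depends only on the ratio of the two spectra, it penalizes mismatch equally in low- and high-energy regions, which is why it is favored for speech spectral envelopes.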
Traditional Methods
• Traditional method for AR gains estimation
Since there is no closed-form solution for the optimal speech and noise AR gains, conventional codebook-driven methods estimate the AR gains indirectly from the log-spectral (LS) distortion (3), which admits a closed-form solution after a series expansion.
Traditional Methods
• Traditional method for AR gains estimation
By differentiating Eq. (3) with respect to the AR gains and setting the results to zero, the AR gains can be calculated in closed form (4). After obtaining the AR gains for each code-word combination, the ML estimator or the Bayesian MMSE estimator (5) is used to obtain the AR parameters of speech and noise.
Traditional Methods
• Traditional method for de-noising
A Wiener filter (6) constructed from the estimated AR parameters of speech and noise is used to enhance the noisy speech.
Although codebook-driven speech enhancement methods are well suited to eliminating non-stationary noise, several problems remain:
• noise classification;
• the accuracy of the gain estimation can be further improved;
• the residual noise between the harmonics of noisy speech should be further suppressed.
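The Wiener filter of Eq. (6) is built from the two estimated AR spectra. A minimal sketch, assuming each AR model contributes a power spectrum g² / |A(e^{jω})|² and the filter is the usual ratio of speech power to total power (names and the FFT size are illustrative):

```python
import numpy as np

def ar_power_spectrum(a, g2, n_fft=512):
    """Power spectrum g^2 / |A(e^{jw})|^2 of an AR model with
    coefficients a = [1, a1, ..., ap] and squared gain g2."""
    A = np.fft.rfft(a, n_fft)                 # frequency response of A(z)
    return g2 / (np.abs(A) ** 2 + 1e-12)

def wiener_filter(a_x, g2_x, a_w, g2_w, n_fft=512):
    """Wiener gain P_speech / (P_speech + P_noise) on n_fft/2+1 bins."""
    Ps = ar_power_spectrum(a_x, g2_x, n_fft)
    Pw = ar_power_spectrum(a_w, g2_w, n_fft)
    return Ps / (Ps + Pw)
```

The resulting gain lies in (0, 1) per frequency bin and is applied to the noisy FFT spectrum before the IFFT, as in the block diagram above.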
Proposed Method
• Estimation of the spectral shape of noise
To avoid the problem of noise classification, the spectral shape of the noise is estimated online by the Minima Controlled Recursive Averaging (MCRA) algorithm (7), (8) in the proposed method.
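The idea behind MCRA can be sketched as follows: smooth the noisy periodogram over time, track its minimum, and update the noise estimate only in bins where the smoothed spectrum stays close to that minimum (i.e. speech is judged absent). This is a simplified sketch, not the exact recursion of Eqs. (7)-(8); all parameter values are illustrative:

```python
import numpy as np

def mcra_noise_psd(frames, alpha_s=0.8, alpha_d=0.95, win=50, delta=5.0):
    """Simplified MCRA-style noise PSD tracker.
    frames: (T, K) array of per-frame power spectra."""
    S = frames[0].copy()        # smoothed periodogram
    S_min = frames[0].copy()    # running minimum of S
    noise = frames[0].copy()    # noise PSD estimate
    for t in range(1, frames.shape[0]):
        S = alpha_s * S + (1.0 - alpha_s) * frames[t]
        # restart minimum tracking every `win` frames so it can rise again
        S_min = S.copy() if t % win == 0 else np.minimum(S_min, S)
        speech_absent = S < delta * S_min      # crude presence decision
        # recursively average only where speech is judged absent
        alpha = np.where(speech_absent, alpha_d, 1.0)
        noise = alpha * noise + (1.0 - alpha) * frames[t]
    return noise
```

Because the minimum statistic follows the noise floor even during speech activity, the tracker adapts to non-stationary noise without any explicit noise classification.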
Proposed Method
• Estimation of AR gains
In this paper, we use a multiplicative update rule [3-4] to obtain an approximately closed-form solution to the IS distortion minimization. Since only the spectral-shape codebook of speech is trained offline and the spectral shape of the noise is estimated online, for each speech code-word the modeled noisy spectrum can be rewritten as (9). Expressing Eq. (9) in matrix form gives the model used below.
Proposed Method
• Estimation of AR gains
The IS distortion is rewritten as (10). Differentiating Eq. (10) with respect to the gain matrices gives (11) [3-4], where '·' denotes point-wise multiplication. Simplifying this expression yields (12).
Proposed Method
• Estimation of AR gains
The speech and noise AR gains are obtained by iterating the following multiplicative rules (13) to minimize the IS distortion. Then we have (14).
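The multiplicative rules follow the Itakura-Saito NMF updates of [3-4]. A minimal sketch, assuming the modeled noisy power spectrum is B @ g, where the columns of B hold the unit-gain speech and noise spectral shapes and g holds the two squared gains (the variable names and iteration count are my own):

```python
import numpy as np

def update_gains(v, B, n_iter=1000, eps=1e-12):
    """Multiplicative updates minimizing the IS divergence between the
    observed noisy power spectrum v (shape (K,)) and the model B @ g.
    B: (K, 2) nonnegative matrix of speech and noise spectral shapes.
    Returns the squared gains g = [g_x^2, g_w^2]."""
    g = np.ones(B.shape[1])
    for _ in range(n_iter):
        vhat = B @ g + eps                     # current modeled spectrum
        # standard IS-NMF (beta = 0) multiplicative rule for the activations
        g *= (B.T @ (v / vhat ** 2)) / (B.T @ (1.0 / vhat) + eps)
    return g
```

The update keeps the gains nonnegative by construction and has the exact solution as a fixed point: when B @ g equals v, the numerator and denominator coincide and g stops changing.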
Proposed Method
• Estimation of AR gains
An example of the average IS distortion is illustrated in Fig. 1, where the average IS distortion is defined as the IS distortion averaged over the Nx speech code-words, Nx being the size of the speech codebook. The AR gains are estimated by the conventional method and the proposed method, respectively. The speech material is corrupted by white noise at an SNR of 5 dB.
Fig. 1 Comparison of the average IS distortion
Proposed Method
• Bayesian MMSE estimation
Let θx denote the random variable corresponding to the speech AR coefficients, and let gx and gw denote the random variables corresponding to the speech and noise AR gains, respectively. Let θ = [θx, gx, gw] denote the set of random variables. After obtaining the gains for each code-word, the desired Bayesian MMSE estimate can be written as (15).
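The MMSE estimate of Eq. (15) is a posterior-weighted average of the per-codeword parameter candidates. The exact likelihood and prior of the paper are not reproduced here; this sketch assumes weights proportional to exp(-β · distortion) times an optional prior, which is one common way such codebook posteriors are formed (all names and the β parameter are illustrative):

```python
import numpy as np

def mmse_estimate(candidates, distortions, prior=None, beta=1.0):
    """Weighted average of per-codeword parameter vectors.
    candidates: (N, D) array, one parameter vector per code-word.
    distortions: (N,) fit distortions; lower distortion -> higher weight."""
    d = np.asarray(distortions, dtype=float)
    w = np.exp(-beta * (d - d.min()))      # subtract min for stability
    if prior is not None:
        w = w * np.asarray(prior, dtype=float)
    w /= w.sum()                           # normalize to a posterior
    return np.tensordot(w, np.asarray(candidates, dtype=float), axes=1)
```

When one code-word fits far better than the rest, the estimate collapses to that candidate; when several fit comparably, it interpolates between them, which is the practical advantage of MMSE over the hard ML pick.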
Proposed Method
Fig. 2 AR gain estimation of clean speech
Proposed Method
• Modified codebook-driven Wiener filter
The conventional codebook-driven Wiener filter is constructed from the estimated spectral envelopes of speech and noise, which usually fits the spectrum between the harmonics of speech inaccurately; consequently, residual noise remains between the harmonics of the enhanced speech. In this section, we introduce the speech presence probability (SPP) to modify the traditional codebook-driven Wiener filter (16) and suppress this residual noise.
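The exact modification of Eq. (16) is not reproduced here. One common way an SPP modifies a gain function, shown only as an illustrative sketch, is to interpolate geometrically between the Wiener gain and a small spectral floor (the floor value g_min is a hypothetical parameter, not from the paper):

```python
import numpy as np

def spp_modified_wiener(H, spp, g_min=0.1):
    """Attenuate the Wiener gain H where the per-bin speech presence
    probability spp is low, flooring the gain at g_min.
    spp = 1 leaves H unchanged; spp = 0 applies the full floor g_min."""
    return (H ** spp) * (g_min ** (1.0 - spp))
```

Between the speech harmonics the SPP is low, so the gain is pushed toward the floor there, suppressing residual noise while leaving the harmonic peaks (where the SPP is high) untouched.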
Performance Evaluation
• Experiments
  • Noise types: white, babble, office, and street
  • Test material: 9 utterances from 4 female speakers and 5 male speakers
  • Sampling rate: 8 kHz
  • Speech codebook size: 6 bits
TABLE 1. Test results of PESQ
Performance Evaluation
TABLE 2. Test results of SSNR improvement
TABLE 3. Test results of LSD
Demos
• clean speech
• noisy speech (white noise, SNR = 10 dB)
• enhanced speech using ML-CB
• enhanced speech using MMSE-CB
• enhanced speech using our method without SPP
• enhanced speech using our method with SPP
Demos
• clean speech
• noisy speech (babble noise, SNR = 10 dB)
• enhanced speech using ML-CB
• enhanced speech using MMSE-CB
• enhanced speech using our method without SPP
• enhanced speech using our method with SPP
References
[1] S. Srinivasan, J. Samuelsson, and W. B. Kleijn, “Codebook driven short-term predictor parameter estimation for speech enhancement,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 1, pp. 163–176, Jan. 2006.
[2] S. Srinivasan, J. Samuelsson, and W. B. Kleijn, “Codebook-based Bayesian speech enhancement for nonstationary environments,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 2, pp. 441–452, Feb. 2007.
[3] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix factorization,” in NIPS, 2000, pp. 556–562.
[4] C. Févotte, N. Bertin, and J.-L. Durrieu, “Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis,” Neural Comput., vol. 21, pp. 793–830, 2009.
Thank You! Q & A