SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS

Emad M. Grais, Hakan Erdogan. 17th International Conference on Digital Signal Processing, 2011. Jain-De Lee

Outline • INTRODUCTION • NON-NEGATIVE MATRIX FACTORIZATION • SIGNAL SEPARATION AND MASKING • EXPERIMENTS AND DISCUSSION • CONCLUSION

Introduction • This work has two main stages • Training stage • Separation stage • Use NMF with different types of masks to improve the separation process • Make the separation process faster • Run NMF with fewer iterations

Introduction • Problem formulation • We observe a signal x(t), which is the mixture of two sources s(t) and m(t) • Assume the sources have the same phase angle as the mixed signal, so that the magnitude spectrograms add: X = S + M, where X(t, f) is the STFT of x(t)

Non-negative Matrix Factorization • Non-negative matrix factorization algorithm • Minimization problem, subject to elements of B, W ≥ 0 • Different cost functions C of NMF • Euclidean distance • KL divergence
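The minimization named on this slide can be sketched with the standard Lee-Seung multiplicative updates for the KL divergence cost. This is a minimal illustration, not the authors' code: the function names `nmf_kl` and `kl_divergence`, the random initialization, the iteration count, and the small constant `eps` are all assumptions made here, and the slide itself does not say which update rule was used.

```python
import numpy as np

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence between nonnegative matrices V and V_hat."""
    return np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat)

def nmf_kl(V, n_basis, n_iter=200, seed=0, eps=1e-12):
    """Factor V ~ B @ W with B, W >= 0 (hypothetical helper, KL cost)."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Random nonnegative initialization (an assumption of this sketch).
    B = rng.random((n_freq, n_basis)) + eps
    W = rng.random((n_basis, n_frames)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Multiplicative updates keep every element nonnegative.
        W *= (B.T @ (V / (B @ W + eps))) / (B.T @ ones + eps)
        B *= ((V / (B @ W + eps)) @ W.T) / (ones @ W.T + eps)
    return B, W
```

Because the updates only multiply by nonnegative ratios, the constraint B, W ≥ 0 holds without any explicit projection step; lowering `n_iter` trades approximation quality for speed.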

Non-negative Matrix Factorization • The magnitude spectrograms S and M are calculated by NMF • A larger number of basis vectors leads to: • Lower approximation error • A redundant set of basis vectors • More computation time

Signal Separation and Masking • NMF is used to decompose the magnitude spectrogram matrix X • The initial spectrogram estimates for the speech and music signals are calculated from the corresponding parts of the decomposition, where WS and WM are submatrices of matrix W
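One way to read this slide is that the bases learned in the training stage are stacked side by side and held fixed, only the weights W are updated for the mixture X, and W is then split row-wise into WS and WM. The sketch below follows that reading. The names `B_speech`, `B_music`, and `initial_estimates`, the KL update rule, and the iteration count are assumptions for illustration.

```python
import numpy as np

def initial_estimates(X, B_speech, B_music, n_iter=100, seed=0, eps=1e-12):
    """Return initial speech and music spectrogram estimates for mixture X."""
    # Stack the trained bases; they stay fixed during separation.
    B = np.hstack([B_speech, B_music])
    rng = np.random.default_rng(seed)
    W = rng.random((B.shape[1], X.shape[1])) + eps
    denom = B.T @ np.ones_like(X) + eps
    for _ in range(n_iter):
        # Only the weights are updated (multiplicative KL update).
        W *= (B.T @ (X / (B @ W + eps))) / denom
    # Split W into the rows that weight speech bases and music bases.
    k = B_speech.shape[1]
    W_S, W_M = W[:k], W[k:]
    return B_speech @ W_S, B_music @ W_M
```

With fewer iterations the two estimates need not sum exactly to X, which is one motivation for applying a mask to X afterwards.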

Signal Separation and Masking • Use the initial estimated spectrograms of speech and music to build a mask • Source signals reconstruction, where 1 is a matrix of ones and the multiplication is element-wise

Signal Separation and Masking • Two specific values of p correspond to special masks • Wiener filter (soft mask) • Hard mask
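The mask equations did not survive in this transcript. A common form for a mask family with an exponent p, assumed here rather than copied from the slides, is H = S0^p / (S0^p + M0^p) taken element-wise, where S0 and M0 are the initial speech and music estimates. Under that assumption p = 2 gives the Wiener filter and p → ∞ gives the hard (binary) mask. The function names and the tie-breaking rule in the hard mask are illustrative choices.

```python
import numpy as np

def spectral_mask(S0, M0, p, eps=1e-12):
    """Element-wise mask in [0, 1] built from initial estimates S0 and M0."""
    if np.isinf(p):
        # Hard mask: each bin goes entirely to the larger estimate.
        # Ties are assigned to speech here (an arbitrary choice).
        return (S0 >= M0).astype(float)
    Sp, Mp = S0 ** p, M0 ** p
    return Sp / (Sp + Mp + eps)

def separate(X, S0, M0, p):
    """Apply the mask to the mixture magnitude spectrogram X."""
    H = spectral_mask(S0, M0, p)
    S_hat = H * X            # speech estimate
    M_hat = (1.0 - H) * X    # music estimate; 1 broadcasts as a matrix of ones
    return S_hat, M_hat
```

Since the two masks H and 1 − H sum to one, the separated spectrograms always add back up to X. The mask depends on S0 and M0 only through their ratio, and a larger p pushes its values toward 0 or 1.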

Signal Separation and Masking • Figure: the value of the mask versus the linear ratio for different values of p

Experiments and Discussion • Simulation • 16 kHz sampling rate • Speech • Training speech data: 540 short utterances • Testing speech data: 20 utterances • Music • 38 pieces for training • One piece for testing • Hamming window: 512 points • FFT size: 512 points
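With the settings listed above, a magnitude spectrogram has 257 non-negative-frequency bins spaced 16000 / 512 = 31.25 Hz apart, and each 512-sample window covers 32 ms. A minimal sketch of such a front end follows. The hop size of 256 samples is an assumption, since the slide does not state it, and `magnitude_spectrogram` is a hypothetical helper name.

```python
import numpy as np

def magnitude_spectrogram(x, win_len=512, n_fft=512, hop=256):
    """Magnitude STFT with a Hamming window; rows are frequency bins."""
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    # Slice the signal into overlapping windowed frames.
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    # rfft keeps only the n_fft // 2 + 1 non-negative-frequency bins.
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)).T

fs = 16_000
t = np.arange(fs) / fs                      # one second of samples
X = magnitude_spectrogram(np.sin(2 * np.pi * 1000 * t))
```

For this one-second 1 kHz test tone, `X` has 257 rows, and the energy in each frame peaks at bin 1000 / 31.25 = 32.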

Experiments and Discussion

Conclusion • The family of masks has a parameter that controls the saturation level • The proposed algorithm gives better results and helps speed up the separation process
