SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS. Emad M. Grais, Hakan Erdogan. 17th International Conference on Digital Signal Processing, 2011. Jain-De, Lee.
Outline • INTRODUCTION • NON-NEGATIVE MATRIX FACTORIZATION • SIGNAL SEPARATION AND MASKING • EXPERIMENTS AND DISCUSSION • CONCLUSION
Introduction • The work has two main stages • Training stage • Separation stage • NMF is combined with different types of masks to improve the separation process • Masking also makes the separation process faster, because NMF can be run with fewer iterations
Introduction • Problem formulation • We observe a signal x(t), which is the mixture of two sources s(t) and m(t) • Let X(t, f) be the STFT of x(t) • Assume the sources have the same phase angle as the mixture, so that the magnitude spectrograms satisfy X = S + M
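The formulation on this slide can be written out step by step. This is a sketch: the time-domain and STFT-domain sums follow from linearity of the STFT, while the magnitude relation is the slide's approximation under the equal-phase assumption. The explicit |·| notation is added here for clarity; the slide writes X, S and M directly for the magnitude spectrograms.

```latex
\begin{align}
x(t) &= s(t) + m(t) \\
X(t,f) &= S(t,f) + M(t,f) \\
|X(t,f)| &\approx |S(t,f)| + |M(t,f)|
  \quad \text{(sources assumed to share the phase of the mixture)}
\end{align}
```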
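Non-negative Matrix Factorization • The non-negative matrix factorization algorithm approximates a non-negative matrix by the product BW of a basis matrix B and a weight matrix W • Minimization problem: minimize the cost C over B and W, subject to elements of B, W ≥ 0 • Different cost functions C of NMF • Euclidean distance • KL divergence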
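A minimal sketch of NMF under both cost functions, assuming Lee and Seung's multiplicative update rules (a common choice; the slide does not say which update rules are used). The names `V`, `nmf` and `kl_divergence` are illustrative and do not come from the slides.

```python
import numpy as np

def nmf(V, n_basis, n_iter=200, cost="kl", seed=0, eps=1e-12):
    """Factorize a non-negative matrix V (freq x frames) as V ~= B @ W."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Random positive initialization keeps every later update non-negative.
    B = rng.random((n_freq, n_basis)) + eps
    W = rng.random((n_basis, n_frames)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        if cost == "kl":
            # Multiplicative updates for the KL divergence cost.
            W *= (B.T @ (V / (B @ W + eps))) / (B.T @ ones + eps)
            B *= ((V / (B @ W + eps)) @ W.T) / (ones @ W.T + eps)
        elif cost == "euclidean":
            # Multiplicative updates for the squared Euclidean distance cost.
            W *= (B.T @ V) / (B.T @ B @ W + eps)
            B *= (V @ W.T) / (B @ W @ W.T + eps)
        else:
            raise ValueError("cost must be 'kl' or 'euclidean'")
    return B, W

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence between V and its approximation V_hat."""
    return np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat)
```

Because every update multiplies by a ratio of non-negative terms, B and W stay non-negative without an explicit projection step.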
Non-negative Matrix Factorization • The magnitude spectrograms S and M are decomposed by NMF • A larger number of basis vectors gives • Lower approximation error • A redundant set of basis vectors • More computation time
Signal Separation and Masking • NMF is used to decompose the magnitude spectrogram matrix X of the mixture • The initial spectrogram estimates for the speech and music signals are then calculated from the speech and music bases and their weights, where W_S and W_M are the submatrices of W that correspond to the speech and music bases
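The decomposition step can be sketched as follows, assuming the speech and music bases were learned in the training stage and are held fixed while only W is updated (with KL multiplicative updates, as an assumption). The names `B_speech`, `B_music` and `initial_estimates` are hypothetical.

```python
import numpy as np

def initial_estimates(X, B_speech, B_music, n_iter=50, seed=0, eps=1e-12):
    """Decompose the mixture magnitude spectrogram X on fixed, trained bases."""
    # Stack the trained bases side by side; only the weights W are learned here.
    B = np.hstack([B_speech, B_music])
    rng = np.random.default_rng(seed)
    W = rng.random((B.shape[1], X.shape[1])) + eps
    denom = B.T @ np.ones_like(X) + eps  # constant because B is fixed
    for _ in range(n_iter):
        # KL multiplicative update for W only.
        W *= (B.T @ (X / (B @ W + eps))) / denom
    # Split W into the rows that weight speech bases and music bases.
    k = B_speech.shape[1]
    W_S, W_M = W[:k], W[k:]
    S_init = B_speech @ W_S  # initial speech spectrogram estimate
    M_init = B_music @ W_M   # initial music spectrogram estimate
    return S_init, M_init
```

The two estimates need not sum exactly to X, which is one motivation for the masking step on the following slides.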
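Signal Separation and Masking • Use the initial estimated spectrograms of speech and music to build a mask with parameter p • Source signal reconstruction: the speech estimate is the mask multiplied element-wise by X, and the music estimate is (1 - mask) multiplied element-wise by X, where 1 is a matrix of ones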
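A minimal sketch of the masking step, assuming the mask has the ratio form H = S^p / (S^p + M^p) computed element-wise from the initial estimates. The function and variable names are illustrative, and the rescaling by the element-wise maximum is an added numerical safeguard, not part of the slides.

```python
import numpy as np

def separate_with_mask(X, S_init, M_init, p=2.0, eps=1e-12):
    """Build a spectral mask from initial estimates and apply it to X."""
    # Dividing both estimates by their element-wise maximum leaves the ratio
    # unchanged but avoids overflow or underflow when p is large.
    scale = np.maximum(S_init, M_init) + eps
    S_p = (S_init / scale) ** p
    M_p = (M_init / scale) ** p
    H = S_p / (S_p + M_p + eps)   # mask values lie in [0, 1]
    S_hat = H * X                 # element-wise multiplication
    M_hat = (1.0 - H) * X         # the "matrix of ones" minus the mask
    return H, S_hat, M_hat
```

By construction the two masked estimates add up to X, so no energy of the mixture is lost or duplicated.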
Signal Separation and Masking • Two specific values of p correspond to special masks • Wiener filter (soft mask) • Hard mask
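The slide does not list the two values of p. Assuming the mask has the element-wise ratio form below (the symbols H, S-tilde and M-tilde for the mask and the initial estimates are introduced here), the usual reading is that p = 2 gives the Wiener-filter form and the limit p → ∞ gives a binary mask:

```latex
\begin{align}
H &= \frac{\tilde{S}^{p}}{\tilde{S}^{p} + \tilde{M}^{p}}
   = \frac{1}{1 + \left(\tilde{M} / \tilde{S}\right)^{p}} \\
p = 2: \quad H &= \frac{\tilde{S}^{2}}{\tilde{S}^{2} + \tilde{M}^{2}}
  \quad \text{(Wiener filter, soft mask)} \\
p \to \infty: \quad H &\to
  \begin{cases}
    1, & \tilde{S} > \tilde{M} \\
    0, & \tilde{S} < \tilde{M}
  \end{cases}
  \quad \text{(hard mask)}
\end{align}
```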
Signal Separation and Masking • Figure: the value of the mask versus the linear ratio for different values of p
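The curves in this figure can be approximated numerically. This is a sketch that assumes the "linear ratio" is the ratio of the initial speech estimate to the initial music estimate, and that the mask has the ratio form r^p / (1 + r^p); the name `mask_vs_ratio` and the chosen values of p are illustrative.

```python
import numpy as np

def mask_vs_ratio(ratio, p):
    """Mask value as a function of the ratio r = speech / music estimate."""
    ratio = np.asarray(ratio, dtype=float)
    r_p = ratio ** p
    return r_p / (1.0 + r_p)

# Evaluate the mask on a few ratios for several values of p.
ratios = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
for p in (1, 2, 4, 16):
    print(p, np.round(mask_vs_ratio(ratios, p), 3))
```

Every curve passes through 0.5 at a ratio of 1, and larger p pushes the mask toward 0 or 1 more quickly on either side, which is the saturation behavior that p controls.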
Experiments and Discussion • Simulation • 16 kHz sampling rate • Speech • Training speech data: 540 short utterances • Testing speech data: 20 utterances • Music • 38 pieces for training • One piece for testing • Hamming window: 512 points • FFT size: 512 points
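The analysis settings above can be sketched with NumPy alone. The sampling rate, window type, window length and FFT size come from the slide; the hop size of 256 samples (50% overlap) and the synthetic test tone are assumptions made only for illustration.

```python
import numpy as np

FS = 16000    # sampling rate in Hz (from the slide)
WIN = 512     # Hamming window length in samples (from the slide)
NFFT = 512    # FFT size (from the slide)
HOP = 256     # hop size: an assumption, the slide does not state the overlap

def stft(x, win=WIN, nfft=NFFT, hop=HOP):
    """Short-time Fourier transform, returned as (freq bins x frames)."""
    window = np.hamming(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win] * window for i in range(n_frames)],
        axis=1,
    )
    return np.fft.rfft(frames, n=nfft, axis=0)

# Illustrative input: one second of a 440 Hz tone instead of real audio.
t = np.arange(FS) / FS
x = np.sin(2 * np.pi * 440 * t)

X_complex = stft(x)
X_mag = np.abs(X_complex)      # magnitude spectrogram used by NMF
X_phase = np.angle(X_complex)  # mixture phase, reusable at reconstruction
```

With a 512-point FFT of a real signal, each frame has 257 frequency bins spaced 31.25 Hz apart.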
Conclusion • The family of masks has a parameter that controls the saturation level • The proposed algorithm gives better separation results and helps speed up the separation process