SPECTRO-TEMPORAL POST-SMOOTHING IN NMF BASED SINGLE-CHANNEL SOURCE SEPARATION
Emad M. Grais and Hakan Erdogan
Sabancı University, Istanbul, Turkey
COMPARISON WITH REGULARIZED NMF
• We compare enforcing temporal smoothness with post-smoothed spectral masks against enforcing it with regularized NMF.
• The regularized NMF cost is defined as
$$C(G) = D_{KL}(X \,\|\, B_d G) + \alpha R(G)$$
where $B_d = [B_{speech}, B_{music}]$, $\alpha$ is the regularization parameter, and $R(G)$ is the continuity prior penalty term [definition missing in the source].
• In this work, we use different regularization parameters: $\alpha_s$ for speech and $\alpha_m$ for music.
• Table 1 shows the separation results when the regularized NMF is used to enforce smoothness on the estimated source signals.
• Tables 2 and 3 show the separation results when smoothness is enforced with smoothed spectral masks.
• The tables show that enforcing smoothness with smoothed masks gives better separation results than enforcing it with regularized NMF.

INTRODUCTION
• Single-channel source separation aims to estimate the source signals when only a single mixture of them is available.

NONNEGATIVE MATRIX FACTORIZATION
• NMF decomposes a nonnegative matrix V into a low-rank nonnegative basis-vector matrix B and a nonnegative weights matrix G.
• B and G can be found by minimizing the generalized Kullback-Leibler divergence
$$D_{KL}(V \,\|\, BG) = \sum_{m,n} \left( V_{m,n} \log \frac{V_{m,n}}{(BG)_{m,n}} - V_{m,n} + (BG)_{m,n} \right)$$
subject to the elements of B and G being nonnegative.
• The update rules for B and G are
$$B \leftarrow B \otimes \frac{\frac{V}{BG} G^{T}}{\mathbf{1} G^{T}}, \qquad G \leftarrow G \otimes \frac{B^{T} \frac{V}{BG}}{B^{T} \mathbf{1}}$$
where $\otimes$ and the divisions are element-wise and $\mathbf{1}$ is a matrix of ones with the same size as V.

SIGNALS RECONSTRUCTION AND SMOOTHED MASKS
• The initial estimates $B_z G_z$ of the sources are used to build a spectral mask for each source z as
$$H_z = \frac{(B_z G_z)^{p}}{\sum_{j} (B_j G_j)^{p}}$$
where the power and the division are element-wise.
• Changing p leads to different types of mask: for example, p = 2 gives a Wiener-filter-like mask, and a large p approaches a binary mask.
• The spectral mask can be used to estimate each source by element-wise multiplication with the spectrogram X of the mixed signal:
$$\hat{S}_z = H_z \otimes X$$
• To add temporal smoothness to the estimated source spectrograms, the spectral mask is smoothed by a 2-D smoothing filter with dimensions (a, b):
$$\bar{H}_z = \mathcal{F}(H_z, a, b)$$
• Here $\mathcal{F}$ is a smoothing filter, which can be:
  • The median filter.
  • The moving-average low-pass filter.
  • The Hamming-windowed moving-average filter (Hamming filter).
• The smoothing is applied along the horizontal (time) direction of the spectrograms.
• The final estimate for each source is obtained by element-wise multiplication of its smoothed mask with the spectrogram X of the mixed signal:
$$\hat{S}_z = \bar{H}_z \otimes X$$
where $\bar{H}_z$ is the smoothed mask of source z.

PROBLEM FORMULATION
• The observed mixed signal $x(t)$ is a mixture of multiple source signals $s_z(t)$.
• In the short-time Fourier transform (STFT) domain this can be written as
$$X(t, f) = \sum_{z} S_z(t, f)$$
• This can be approximated by a sum of magnitude spectrograms:
$$|X(t, f)| \approx \sum_{z} |S_z(t, f)|$$
• The magnitude spectrograms can be written as nonnegative matrices:
$$X \approx \sum_{z} S_z$$

NMF FOR SOURCE SEPARATION
• In the training stage:
  • The magnitude spectrogram of each source's training data is used to build a dictionary $B_z$ for that source using NMF.
• In the testing stage:
  • NMF is used to decompose the magnitude spectrogram X of the mixed signal into a nonnegative weighted linear combination of the trained dictionaries:
$$X \approx [B_{speech}, B_{music}] \begin{bmatrix} G_{speech} \\ G_{music} \end{bmatrix}$$
  • The initial estimate for each source is found as:
$$\tilde{S}_{speech} = B_{speech} G_{speech}, \qquad \tilde{S}_{music} = B_{music} G_{music}$$

EXPERIMENTS AND RESULTS
• The proposed algorithm is used to separate a speech signal from a background piano music signal.
• The STFT uses a 512-point FFT, of which only the first 257 points are used (the rest are redundant for real-valued signals); the sampling rate is 16 kHz.
• We train 128 basis vectors for each source dictionary, so each dictionary matrix B has size 257 x 128.
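
The steps described above (training a dictionary per source, decomposing the mixture with the trained dictionaries, building the spectral masks, and smoothing them along time) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the random toy data and its small sizes, the iteration counts, keeping the dictionaries fixed during the testing stage, the edge padding, and the use of a 1 x b filter with odd b are all choices made here for the example.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero


def kl_nmf(V, n_basis, B=None, n_iter=100, seed=0):
    """KL-divergence NMF with multiplicative updates.

    If a dictionary B is supplied, it is kept fixed and only G is updated
    (assumed behaviour for the testing stage); n_basis is then ignored.
    """
    rng = np.random.default_rng(seed)
    update_B = B is None
    if update_B:
        B = rng.random((V.shape[0], n_basis)) + EPS
    G = rng.random((B.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        G *= (B.T @ (V / (B @ G + EPS))) / (B.T @ ones + EPS)
        if update_B:
            B *= ((V / (B @ G + EPS)) @ G.T) / (ones @ G.T + EPS)
    return B, G


def spectral_masks(estimates, p=2.0):
    """Masks (B_z G_z)^p / sum_j (B_j G_j)^p, computed element-wise."""
    powered = [np.maximum(S, EPS) ** p for S in estimates]
    total = sum(powered)
    return [P / total for P in powered]


def smooth_time(mask, b=5, kind="hamming"):
    """Smooth every frequency row along time with a length-b filter (b odd)."""
    pad = b // 2
    padded = np.pad(mask, ((0, 0), (pad, pad)), mode="edge")
    if kind == "median":
        windows = np.lib.stride_tricks.sliding_window_view(padded, b, axis=1)
        return np.median(windows, axis=-1)
    # "hamming": Hamming-windowed moving average; anything else: plain average
    kernel = np.hamming(b) if kind == "hamming" else np.ones(b)
    kernel = kernel / kernel.sum()
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, padded
    )


def separate(X, B_speech, B_music, p=2.0, b=5, kind="hamming"):
    """Return smoothed-mask estimates of the two source spectrograms."""
    B_d = np.hstack([B_speech, B_music])
    _, G = kl_nmf(X, B_d.shape[1], B=B_d)
    k = B_speech.shape[1]
    initial = [B_speech @ G[:k], B_music @ G[k:]]
    masks = [smooth_time(H, b, kind) for H in spectral_masks(initial, p)]
    return [H * X for H in masks]


# Toy data standing in for magnitude spectrograms (257 frequency bins).
rng = np.random.default_rng(1)
train_speech = rng.random((257, 40))
train_music = rng.random((257, 40))
B_speech, _ = kl_nmf(train_speech, 8, n_iter=50)  # 8 bases here, not 128
B_music, _ = kl_nmf(train_music, 8, n_iter=50)
X = rng.random((257, 30))
S_speech, S_music = separate(X, B_speech, B_music)
```

With the two linear filters (moving average and Hamming), the smoothed masks still sum to one at every time-frequency point, so the two estimates add up to the mixture spectrogram; the median filter does not guarantee this.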