Generalized Sparse Signal Mixing Model and Application to Noisy Blind Source Separation

Justinian Rosca, Christian Borss #, Radu Balan • Siemens Corporate Research, Princeton, USA • # Presently: University of Braunschweig

ICA/BSS Scenario • L sources: s1, s2, …, sL-1, sL • Noise: n • D microphones: x1, x2, …, xD • Signal processor output: S1, S2, …, SL

Motivation • Solve the ICA problem in realistic scenarios • In the presence of noise. Is this really feasible? • When the mixing matrix A is “fat” (degenerate: more sources than sensors) • DUET/time-frequency masking is a successful approach and implementation [Rickard et al. 2000, 2001] • Can we do better if we relax the DUET assumption about the number of sources “active” at any time-frequency point?

Sparseness in TF • DUET assumption: the maximum number of sources active at any time-frequency point in a mixture of signals is one

Example Voice Signal and TF Representation J.Rosca et al. – Scalable BSS under Noise – DAGA, Aachen 2003 Siemens Corporate Research

Sparseness in TF • Sources s1, s2, s3 hop from one set of frequencies to another over time, with no collisions (at most one source active at any time-freq. point)

Generalized Sparseness in TF • Sources s1, s2, s3 hop from one set of frequencies to another over time, with collisions (at most N sources active at any time-freq. point) • Example shown: N=2, L=3
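The two occupancy pictures can be mimicked with a toy example. The sketch below is illustrative only: the hop patterns, the grid size, and the helper names `occupancy` and `max_active` are assumptions, not taken from the slides. It counts how many sources are active at each time-frequency point and reports the largest count, which is 1 under the DUET assumption and N under the generalized one.

```python
import numpy as np

def occupancy(hops, n_freq):
    """Boolean array of shape (L, T, F): True where a source is active.

    hops[l][t] is the frequency index used by source l at time frame t
    (a hypothetical, one-bin-per-frame hopping pattern).
    """
    hops = np.asarray(hops)
    n_src, n_time = hops.shape
    occ = np.zeros((n_src, n_time, n_freq), dtype=bool)
    for l in range(n_src):
        occ[l, np.arange(n_time), hops[l]] = True
    return occ

def max_active(occ):
    """Largest number of simultaneously active sources over all TF points."""
    return int(occ.sum(axis=0).max())

# L=3 sources, 4 time frames, 3 frequency bins
no_collisions = occupancy([[0, 1, 2, 0], [1, 2, 0, 1], [2, 0, 1, 2]], n_freq=3)
with_collisions = occupancy([[0, 1, 2, 0], [0, 2, 2, 1], [2, 2, 1, 1]], n_freq=3)

print(max_active(no_collisions))    # 1: DUET-style sparseness
print(max_active(with_collisions))  # 2: generalized sparseness with N=2
```

In practice the boolean occupancy would come from thresholding STFT magnitudes; the threshold choice is outside the scope of this sketch.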

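The Independence Assumption • Assume the TF coefficient S(k,ω) is modeled as a product of a Bernoulli (0/1) r.v., V, and a continuous r.v., G • The p.d.f. of S is then a mixture of a point mass at zero (source inactive) and the p.d.f. of G (source active) • For L independent signals the joint source p.d.f. is the product of the L per-source p.d.f.s; expanded in powers of q, the probability that a source is active, its n-th order term covers exactly n active sources • W-Disjoint Orthogonality (DUET): q very small → retain first two terms; at most one source is active at any time-freq. point • Generalized W-Disj. Orth.: q very small → retain first N+1 terms; at most N sources are active at any time-freq. point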
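The expansion behind the two truncation rules can be written out explicitly. This is a reconstruction under stated assumptions: q is taken to be the probability that a source is active, P(V=1)=q, and p_G denotes the p.d.f. of G; both symbols are inferred from the slide text rather than recovered from it.

```latex
\begin{align*}
S(k,\omega) &= V\,G, \qquad P(V=1)=q,\quad P(V=0)=1-q \\
p_S(s) &= (1-q)\,\delta(s) + q\,p_G(s) \\
p(S_1,\dots,S_L) &= \prod_{l=1}^{L}\bigl[(1-q)\,\delta(S_l) + q\,p_G(S_l)\bigr] \\
 &= (1-q)^{L}\prod_{l}\delta(S_l)
  + q\,(1-q)^{L-1}\sum_{l} p_G(S_l)\prod_{j\neq l}\delta(S_j)
  + \dots + q^{L}\prod_{l} p_G(S_l)
\end{align*}
```

The term of order q^n collects the configurations with exactly n active sources, so keeping the terms n = 0, …, N corresponds to at most N active sources at a time-frequency point.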

Signal Model (1) • Assumptions: • L sources, D sensors (numbered 1, 2, …, D) • Far-field • Direct-path • Noises i.i.d. Gaussian, zero mean, variance σ²

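Signal Model (2) • Mixing model: each sensor signal is a sum of delayed sources plus noise (far-field, direct-path) • Source sparseness in TF: at most N sources are active at any time-freq. point • Let those be, e.g., Sa and Sb for N=2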
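A hedged sketch of what such a mixing model looks like in the time-frequency domain, consistent with the far-field, direct-path, additive-noise assumptions. The delay symbols τ_{d,l}, the unit gains, and the index notation a_1, …, a_N for the active sources are assumptions introduced here; only the matrix name M appears in the slides.

```latex
\begin{align*}
X_d(k,\omega) &= \sum_{l=1}^{L} e^{-i\omega\tau_{d,l}}\,S_l(k,\omega) + N_d(k,\omega),
  \qquad d = 1,\dots,D \\
X(k,\omega) &= M(\omega)\,S(k,\omega) + N(k,\omega),
  \qquad M_{d,l}(\omega) = e^{-i\omega\tau_{d,l}} \\
X(k,\omega) &= \sum_{n=1}^{N} M_{a_n}(\omega)\,S_{a_n}(k,\omega) + N(k,\omega)
  \qquad \text{(at most $N$ active sources $a_1,\dots,a_N$)}
\end{align*}
```

Here M_{a_n} denotes column a_n of M, so under the sparseness assumption the noiseless measurement lies in the span of N columns of M.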

Example • Let N=2: two sources active at any time-freq. point
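The N=2 case can be simulated at a single time-frequency point. The following is a minimal sketch, not the authors' setup: the uniform-array delays `d * tau_l`, the unit gains, the numeric values, and the function names `steering_matrix` and `mix_tf_point` are all assumptions chosen for illustration.

```python
import numpy as np

def steering_matrix(omega, taus, D):
    """D x L matrix with entries exp(-i * omega * d * tau_l), d = 0..D-1.

    Assumes a uniform linear array with unit gains (hypothetical geometry).
    """
    d = np.arange(D)[:, None]
    return np.exp(-1j * omega * d * np.asarray(taus)[None, :])

def mix_tf_point(M, active, values, sigma, rng):
    """Return (X, S) with X = M @ S + noise and S zero outside `active`."""
    D, L = M.shape
    S = np.zeros(L, dtype=complex)
    S[list(active)] = values
    # Circular complex Gaussian noise with total variance sigma**2 per sensor
    noise = (sigma / np.sqrt(2)) * (
        rng.standard_normal(D) + 1j * rng.standard_normal(D)
    )
    return M @ S + noise, S

rng = np.random.default_rng(0)
M = steering_matrix(omega=1.0, taus=[0.3, -0.7, 1.1, 2.0], D=3)  # L=4, D=3
values = np.array([1.0 + 0.5j, -0.8 + 0.2j])

# Sources 0 and 2 are the N=2 active ones at this TF point
X, S = mix_tf_point(M, active=(0, 2), values=values, sigma=0.1, rng=rng)
X_clean, _ = mix_tf_point(M, active=(0, 2), values=values, sigma=0.0, rng=rng)
```

Without noise, `X_clean` is exactly a combination of columns 0 and 2 of `M`, which is the structure the estimators later exploit.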

BSS Problem • Given measurements {x(t)}, 1 ≤ t ≤ T, from D sensors • Determine estimates of the parameters: • Mapping (which sources are active at each time-freq. point) and source signals • Mixing parameters • Note: L>D, degenerate BSS problem

Approach: Two Steps • Estimate mixing parameters, e.g. using the stronger constraint of W-disjoint orthogonality • Estimate the source signals under the generalized W-disjoint orthogonality assumption

Solution Sketch (Ad-Hoc) • Employ principle of coherence (e.g. N=2) • Given a pair of sources Sa and Sb active at some time-freq. point, we know what we should measure at all microphone pairs (i, j)! • Sa and Sb are the true ones if they result in minimum variance across all microphone pairs, i.e. coherent measurements • Note: For N=2 and L=4 there are 6 pairs of sources to be tested! (1,2),(1,3),(1,4),(2,3),(2,4),(3,4)
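One possible reading of the coherence test, sketched for N=2: for each candidate source pair, solve a 2x2 system on every microphone pair and measure how much the resulting (Sa, Sb) estimates disagree. The scoring rule (summed variance of the estimates), the array geometry, and the names `coherence_score` and `pick_active_pair` are assumptions made for illustration, not the authors' exact procedure. Indices are 0-based in the code, so the slide's pair (1,3) is `(0, 2)` here.

```python
from itertools import combinations
import numpy as np

def coherence_score(M, X, src_pair):
    """Spread of the (Sa, Sb) estimates obtained from every microphone pair."""
    a, b = src_pair
    estimates = []
    for i, j in combinations(range(M.shape[0]), 2):
        A = M[np.ix_([i, j], [a, b])]          # 2x2 system for mics i, j
        estimates.append(np.linalg.solve(A, X[[i, j]]))
    estimates = np.array(estimates)            # shape: (mic pairs, 2)
    # np.var uses |x - mean|**2 for complex input, so the score is real
    return float(np.var(estimates, axis=0).sum())

def pick_active_pair(M, X):
    """Return the source pair with the most coherent estimates, plus all candidates."""
    candidates = list(combinations(range(M.shape[1]), 2))
    scores = [coherence_score(M, X, c) for c in candidates]
    return candidates[int(np.argmin(scores))], candidates

# Hypothetical setup: D=4 mics, L=4 sources, pure-delay mixing at omega = 1
taus = np.array([0.3, -0.7, 1.1, 2.0])
M = np.exp(-1j * np.arange(4)[:, None] * taus[None, :])
X = M[:, [0, 2]] @ np.array([1.0 + 0.5j, -0.8 + 0.2j])   # noiseless, sources 0 and 2

best_pair, candidates = pick_active_pair(M, X)
print(len(candidates))  # 6 candidate pairs for L=4, N=2
print(best_pair)        # (0, 2)
```

The exhaustive search grows as L-choose-N, which is cheap for L=4 and N=2 but motivates small N relative to L.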

Solution Sketch (ML-1) • Maximize the likelihood function L(Π,R)=p(X|Π,R), with Π the mapping of active sources and R the source signals

Solution Sketch (ML-2) • max L(Π,R), after taking log

Solution Sketch (ML-3) • After substituting R:
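One way to spell out the three ML steps, assuming circular complex Gaussian noise of variance σ² per sensor. The symbol Π for the selection of active sources and the notation M_Π for the D×N matrix of the selected columns of M are assumptions made here for illustration.

```latex
\begin{align*}
\log L(\Pi,R) &= -\frac{1}{\sigma^{2}} \sum_{k,\omega}
  \bigl\lVert X(k,\omega) - M_{\Pi}(\omega)\,R(k,\omega) \bigr\rVert^{2} + \text{const} \\
\hat{R}(k,\omega) &= \bigl(M_{\Pi}^{H} M_{\Pi}\bigr)^{-1} M_{\Pi}^{H}\, X(k,\omega) \\
\hat{\Pi}(k,\omega) &= \arg\max_{\Pi}\;
  \bigl\lVert M_{\Pi}\bigl(M_{\Pi}^{H} M_{\Pi}\bigr)^{-1} M_{\Pi}^{H}\, X(k,\omega) \bigr\rVert^{2}
\end{align*}
```

The second line is the least-squares solution for R with Π fixed. Substituting it leaves the residual ‖X‖² − ‖P_Π X‖², with P_Π the orthogonal projector onto the span of the selected columns, so minimizing the residual is the same as maximizing the projected energy in the third line.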

C^D Interpretation of Solution • Criterion: projection of X onto the span of columns of M • Solution (“coherent” measurements): the N-dim subspace of C^D closest to X among all L-choose-N subspaces spanned by different combinations of N columns of the matrix M • Existence iff N≤D-1
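The subspace search can be sketched directly: try every combination of N columns of M and keep the one whose span captures the most energy of X. This is a minimal brute-force illustration under assumed values; the function names `projection_energy` and `ml_active_set`, the delays, and the source values are hypothetical.

```python
from itertools import combinations
import numpy as np

def projection_energy(M_sub, X):
    """Squared norm of the projection of X onto span(columns of M_sub),
    together with the least-squares source values."""
    R_hat, *_ = np.linalg.lstsq(M_sub, X, rcond=None)
    return float(np.linalg.norm(M_sub @ R_hat) ** 2), R_hat

def ml_active_set(M, X, N):
    """Pick the N columns of M whose span is closest to X."""
    D, L = M.shape
    if N > D - 1:
        # With N = D, N independent columns span all of C^D, so every
        # candidate fits X equally well and nothing can be decided.
        raise ValueError("N must be at most D - 1")
    best_energy, best_subset, best_R = -1.0, None, None
    for subset in combinations(range(L), N):
        energy, R_hat = projection_energy(M[:, list(subset)], X)
        if energy > best_energy:
            best_energy, best_subset, best_R = energy, subset, R_hat
    return best_subset, best_R

# Hypothetical degenerate case: D=3 mics, L=4 sources, N=2 active
taus = np.array([0.3, -0.7, 1.1, 2.0])
M = np.exp(-1j * np.arange(3)[:, None] * taus[None, :])
true_values = np.array([1.0 + 0.5j, -0.8 + 0.2j])
X = M[:, [1, 3]] @ true_values            # noiseless, sources 1 and 3 active

subset, R_hat = ml_active_set(M, X, N=2)
print(subset)   # (1, 3)
```

The guard on N mirrors the existence condition N≤D-1 stated on the slide.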

Experimental Results (1) • Algorithm applied to realistic synthetic mixtures • From anechoic, low echoic, echoic to strongly echoic • 16 kHz data, 256-sample window, 50% overlap, coherent noise, SIR (-5 dB, 10 dB), 30 gradient steps/iteration (Step 2), 5 iterations • Evaluation: SIR gain, segmental SNR, distortion
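The time-frequency grid implied by these settings follows from simple arithmetic. The sketch below assumes the FFT length equals the window length and that no padding is applied at the signal edges; neither is stated on the slide, and the variable names are illustrative.

```python
fs_hz = 16_000        # sampling rate from the slide
window_len = 256      # samples per analysis window
overlap = 0.5         # 50% overlap

hop = int(window_len * (1 - overlap))   # 128 samples between frames
window_ms = 1000 * window_len / fs_hz   # 16.0 ms per window
hop_ms = 1000 * hop / fs_hz             # 8.0 ms between frames
bin_hz = fs_hz / window_len             # 62.5 Hz frequency resolution
n_bins = window_len // 2 + 1            # 129 one-sided frequency bins

def n_frames(n_samples):
    """Number of full frames that fit in n_samples (no padding assumed)."""
    if n_samples < window_len:
        return 0
    return 1 + (n_samples - window_len) // hop

print(n_frames(fs_hz))  # 124 frames in one second of audio
```

Each of these frames contributes `n_bins` time-frequency points at which the active-source selection is made.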

Example • Sources L=4, Mics D=2, N=2 • Panels: Sources, Mixing, Estimates

Discussion • Is L=4 sources, D=2 mics too simple a case? • In some simulations, the N=2 assumption helps • Conjecture: approach is useful when N is a small fraction of L

Conclusion • Contribution: ML approach to noisy BSS problem under generalized sparseness assumptions, addressing degenerate case D<L • Estimation problem can be addressed using sparse decomposition techniques: progress is needed

Thank you! Real speech separation demo for those interested after the session!

Outline • Generalized sparseness assumption • Signal model and assumptions • BSS problem definition • Solution sketch: Ad-hoc and ML estimators • Geometrical interpretation of solution • Experimental results • Conclusion
