Blind Separation of Speech Mixtures

Vaninirappuputhenpurayil Gopalan REJU, School of Electrical and Electronic Engineering, Nanyang Technological University

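Introduction: Convolutive Blind Source Separation
• Mixing process (sources s1, s2): $x_p(n) = \sum_{q=1}^{Q} \sum_{l=0}^{L-1} h_{pq}(l)\, s_q(n-l)$
• Unmixing process: $y_q(n) = \sum_{p=1}^{P} \sum_{l=0}^{L-1} w_{qp}(l)\, x_p(n-l)$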

Introduction: Convolutive Blind Source Separation vs. Instantaneous Blind Source Separation

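Introduction: convolutive mixtures are difficult to separate; instantaneous mixtures are easy to separate.
• In the frequency domain, the convolutive mixture becomes an instantaneous mixture in each frequency bin k: $\mathbf{X}(k,t) = \mathbf{H}(k)\,\mathbf{S}(k,t)$, where $\mathbf{H}(k)$ is the $P \times Q$ matrix of frequency responses $H_{pq}(k)$.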
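The per-bin instantaneous model follows from the convolution theorem. A minimal sketch with toy random filters (all sizes and values are illustrative assumptions) checks that the spectrum of a 2 × 2 convolutive mixture equals H(f)S(f) bin by bin. With the whole signal zero-padded the identity is exact; with a short-time Fourier transform it holds only approximately, when the filters are short compared with the frame length.

```python
import numpy as np

rng = np.random.default_rng(0)
P, Q, L, T = 2, 2, 16, 256           # mics, sources, filter taps, samples (toy sizes)
h = rng.standard_normal((P, Q, L))   # h[p, q] = impulse response from source q to mic p
s = rng.standard_normal((Q, T))      # source signals

# Time domain: x_p(n) = sum_q (h_pq * s_q)(n)
x = np.array([sum(np.convolve(h[p, q], s[q]) for q in range(Q)) for p in range(P)])

# Frequency domain: zero-pad to the full convolution length so no wrap-around occurs
n = L + T - 1
Hf = np.fft.rfft(h, n, axis=-1)       # shape (P, Q, F)
Sf = np.fft.rfft(s, n, axis=-1)       # shape (Q, F)
Xf = np.einsum('pqf,qf->pf', Hf, Sf)  # X(f) = H(f) S(f) in every bin f

print(np.allclose(np.fft.rfft(x, n, axis=-1), Xf))
```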

Introduction: types of mixing
• No. of sources < No. of sensors: overdetermined mixing (easy to separate)
• No. of sources = No. of sensors: determined mixing (easy to separate)
• No. of sources > No. of sensors: underdetermined mixing (difficult to separate)

Approaches for BSS of Speech Signals
Types of mixing: instantaneous mixing and convolutive mixing.

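Approaches for BSS of Speech Signals: Instantaneous Mixing
Step 1: Selection of a cost function.
Step 2: Minimization or maximization of the cost function.
(Figure: sources s1, s2 pass through the mixing matrix H to give mixtures x1, x2; the unmixing matrix W produces outputs y1, y2. Are they separated?)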

Approaches for BSS of Speech Signals: Instantaneous Mixing, Selection of the Cost Function
• Statistical independence: signals from two different sources are independent.
• Information-theoretic measures.
• Non-Gaussianity. By the central limit theorem, a mixture of two or more sources is more Gaussian than its individual components. Non-Gaussianity measures: kurtosis, negentropy, nonlinear cross moments.
• Temporal structure of speech.
• Non-stationarity of speech.
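The central limit theorem argument can be checked numerically with kurtosis. The sketch below assumes Laplacian signals as a super-Gaussian stand-in for speech (an illustrative choice, not speech data); the theoretical excess kurtosis is 3 for a Laplacian source, 3(0.6^4 + 0.8^4) ≈ 1.62 for this mixture, and 0 for a Gaussian, so mixing moves the signal toward Gaussianity.

```python
import numpy as np

def excess_kurtosis(y):
    """Excess kurtosis E[y^4] / E[y^2]^2 - 3 of y after removing its mean (0 for Gaussian)."""
    y = y - y.mean()
    return float(np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0)

rng = np.random.default_rng(1)
N = 200_000
s1 = rng.laplace(size=N)    # Laplacian: an assumed super-Gaussian stand-in for speech
s2 = rng.laplace(size=N)
mix = 0.6 * s1 + 0.8 * s2   # instantaneous mixture of the two sources
gauss = rng.standard_normal(N)
print(excess_kurtosis(s1), excess_kurtosis(mix), excess_kurtosis(gauss))
```

An ICA algorithm that maximizes |kurtosis| of its outputs is therefore pushed away from mixtures and toward the individual sources.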

Approaches for BSS of Speech Signals: Instantaneous Mixing, Minimization or Maximization of the Cost Function
• Simple gradient method
• Natural gradient method, e.g. the Infomax ICA algorithm
• Newton's method, e.g. FastICA
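FastICA is usually described as an approximate Newton (fixed-point) iteration. A minimal sketch of the symmetric variant with the common nonlinearity g(u) = tanh(u) is shown here; the toy sources, mixing matrix and iteration limits are assumptions for illustration, and this is not the code used in the work presented.

```python
import numpy as np

def whiten(x):
    """Center and whiten x (n_signals x n_samples); return z and the whitening matrix V."""
    x = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(x))
    V = E @ np.diag(d ** -0.5) @ E.T
    return V @ x, V

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so that the rows of W are orthonormal."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def fastica(x, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with g(u) = tanh(u); returns the unmixing matrix for centered x."""
    z, V = whiten(x)
    n, N = z.shape
    W = sym_decorrelate(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(n_iter):
        g = np.tanh(W @ z)
        g_prime = 1.0 - g ** 2
        # fixed-point step w <- E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once
        W_new = sym_decorrelate(g @ z.T / N - np.diag(g_prime.mean(axis=1)) @ W)
        # converged when each new row equals the old one up to sign
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return W @ V

# Toy usage: two Laplacian (super-Gaussian) sources and an assumed 2 x 2 mixing matrix
rng = np.random.default_rng(2)
s = rng.laplace(size=(2, 20_000))
H = np.array([[1.0, 0.6], [0.5, 1.0]])
W = fastica(H @ s)
G = W @ H   # global matrix; ideally a scaled permutation matrix
print(np.round(np.abs(G) / np.abs(G).max(axis=1, keepdims=True), 3))
```

The global matrix G is only a scaled permutation: ICA cannot fix the order or scale of its outputs, which is the root of the permutation problem discussed for the frequency domain.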

Approaches for BSS of Speech Signals: Convolutive Mixing
Time domain:
• Advantage: no permutation problem.
• Disadvantages: slow convergence; high computational cost for long filters.
Frequency domain:
• Advantages: low computational cost; fast convergence.
• Disadvantage: the permutation problem.
(Figure: sources s1, s2 pass through the mixing filters H to give mixtures x1, x2; the unmixing filters W produce the outputs y1, y2 or, in permuted order, y2, y1.)

Permutation Problem in Frequency Domain BSS
(Figure: the mixed signals x1, x2, x3 are transformed with a K-point FFT, and an instantaneous ICA algorithm (BSS) separates each frequency bin f1, f2, …, fK into outputs y1, y2, y3. Because of the permutation problem, the output labelled y3 in one bin may correspond to a different source than y3 in another bin, so after the K-point IFFT the signals are still mixed. Solving the permutation problem before the IFFT yields the separated signals.)

Motivation
BSS
• Determined/overdetermined (# mixtures ≥ # sources)
  • Instantaneous
  • Convolutive: time domain; frequency domain (frequency bin-wise separation, permutation problem)
• Underdetermined (# mixtures < # sources)
  • Instantaneous: mixing matrix estimation; source estimation
  • Convolutive: time domain; frequency domain (frequency bin-wise separation, permutation problem, automatic detection of no. of sources)

My Contribution - I: solving the permutation problem in frequency domain BSS of determined/overdetermined convolutive mixtures.

Algorithm for Solving the Permutation Problem
(Figure: mixed signals x1, x2, x3 → K-point FFT → instantaneous ICA (BSS) in each frequency bin f1, f2, …, fK → solving the permutation problem → K-point IFFT → separated signals y1, y2, y3, with the permutation problem solved.)

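Existing Method for Solving the Permutation Problem
Direction of arrival (DOA) method: the direction of each output is estimated in every frequency bin (e.g. direction of y1 = -30°, direction of y2 = 20°), and the outputs are ordered by direction. The estimate is obtained from the inter-sensor phase differences and depends on the position of the pth sensor, $d_p$, and the velocity of sound, $c$.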
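A minimal sketch of DOA estimation from one column of an estimated mixing matrix for a two-sensor array. It assumes a far-field, anechoic plane-wave model $a_p = e^{j 2\pi f d_p \sin\theta / c}$ with θ measured from broadside, a speed of sound of 343 m/s, and a 4 cm spacing; the function name `doa_from_column` is hypothetical. The unknown complex scale of the column cancels in the ratio of its entries.

```python
import numpy as np

C_SOUND = 343.0  # assumed velocity of sound in air (m/s)

def doa_from_column(a, d, f, c=C_SOUND):
    """Estimate a source direction (degrees from broadside) in frequency bin f.

    a: one column of the estimated mixing matrix (its complex scale cancels);
    d: positions of sensors 0 and 1 along the array axis (m). Assumes the
    plane-wave model a_p = exp(j 2 pi f d_p sin(theta) / c)."""
    phase = np.angle(a[1] / a[0])  # inter-sensor phase difference
    sin_theta = phase * c / (2 * np.pi * f * (d[1] - d[0]))
    return float(np.degrees(np.arcsin(np.clip(sin_theta, -1.0, 1.0))))

d = np.array([0.0, 0.04])  # two sensors 4 cm apart (toy geometry)
f = 1000.0                 # frequency of the bin (Hz)
theta = np.radians(-30.0)
a = 0.7j * np.exp(1j * 2 * np.pi * f * d * np.sin(theta) / C_SOUND)
print(round(doa_from_column(a, d, f), 3))
```

The phase difference here is about 0.37 rad at 1 kHz but only about 0.037 rad at 100 Hz, so the same phase measurement error produces a much larger angle error at low frequencies.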

Existing Method for Solving the Permutation Problem
Direction of arrival (DOA) method
• Disadvantages:
  • Fails at lower frequencies.
  • Fails when the sources are close together.
  • Degrades under room reverberation.
  • Sensor positions must be known.
• Reasons for failure at lower frequencies:
  • The sensor spacing is small relative to the wavelength, so the phase difference is small and errors in measuring it become significant.
  • The relation is an approximation that assumes a plane wavefront under anechoic conditions.

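Existing Method for Solving the Permutation Problem
Adjacent bands correlation method: outputs that belong to the same source show high correlation between adjacent frequency bins, while outputs from different sources show low correlation.
(Figure: mixed signals x1, x2, x3 → K-point FFT → BSS in bins f1, f2, …, fK → solving the permutation problem → K-point IFFT → separated signals y1, y2, y3.)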

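Existing Method for Solving the Permutation Problem
Adjacent bands correlation method: for two sources s1 and s2, the outputs in neighbouring bins (…, K-1, K, K+1, K+2, K+3, …) are compared through the correlation matrix
$\mathbf{R} = \begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix}$,
where $r_{ij}$ is the correlation between output i in one bin and output j in the adjacent bin. When the cross terms $r_{12}, r_{21}$ dominate the direct terms $r_{11}, r_{22}$, the permutation is changed; otherwise it is left unchanged. Examples are shown with and without a confidence check.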
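A minimal sketch of sequential adjacent-bin alignment for two outputs, assuming the decision rule "swap when $r_{12} + r_{21} > r_{11} + r_{22}$" on output envelopes and no confidence check. The envelopes are synthetic (every bin shares the same source envelope shape), which makes the toy much easier than real speech; the function names are hypothetical.

```python
import numpy as np

def corr(a, b):
    """Pearson correlation coefficient between two envelope sequences."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def align_adjacent(env):
    """Sequentially fix the permutation of two outputs across frequency bins.

    env: (K, 2, T) output envelopes |Y_i(k, t)|. Bin 0 is the reference; bin k is
    swapped when r12 + r21 > r11 + r22 with respect to the aligned bin k-1.
    No confidence check is applied."""
    env = env.copy()
    for k in range(1, env.shape[0]):
        prev, cur = env[k - 1], env[k]
        keep = corr(prev[0], cur[0]) + corr(prev[1], cur[1])  # r11 + r22
        swap = corr(prev[0], cur[1]) + corr(prev[1], cur[0])  # r12 + r21
        if swap > keep:
            env[k] = cur[::-1].copy()
    return env

# Toy usage: synthetic envelopes of two sources, randomly swapped in some bins
rng = np.random.default_rng(3)
K, T = 64, 500
src = rng.gamma(0.5, size=(2, T))
truth = np.stack([src * rng.uniform(0.5, 2.0, (2, 1)) + 0.05 * rng.random((2, T))
                  for _ in range(K)])
flips = rng.random(K) < 0.5
flips[0] = False  # bin 0 is the reference
scrambled = truth.copy()
scrambled[flips] = scrambled[flips][:, ::-1]
aligned = align_adjacent(scrambled)
print(np.allclose(aligned, truth))
```

Because each bin is aligned against the previous one, a single wrong decision is inherited by every later bin.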

Existing Method for Solving the Permutation Problem
Adjacent bands correlation method, disadvantage: the method is not robust. Because the bins are aligned one after another, a wrong decision in one bin propagates to all the following bins.

Existing Method for Solving the Permutation Problem
Combination of the DOA and correlation methods: DOA + harmonic correlation + adjacent bands correlation.
Advantage: increased robustness.

Proposed Algorithm: Partial Separation Method (Parallel Configuration)
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “Partial separation method for solving permutation problem in frequency domain blind source separation of speech signals,” Neurocomputing, Vol. 71, No. 10–12, June 2008, pp. 2098–2112.
(Figure: time domain stage and frequency domain stage.)


Partial Separation Method (Cascade Configuration)
(Figure: cascade arrangement of the time domain stage and the frequency domain stage, shown alongside the parallel configuration.)

Advantages of the Partial Separation Method
• Robustness
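The slides give only the block diagrams of the two stages. As a hypothetical illustration (not the published algorithm), the sketch aligns each frequency bin independently to reference envelopes, assumed here to be the STFT magnitudes of partially separated time-domain outputs, by choosing the output permutation with the largest total correlation. Since every bin is decided on its own, an error in one bin does not propagate to the others. Names, the 30% crosstalk level and the synthetic data are assumptions.

```python
import itertools
import numpy as np

def align_to_reference(env, ref):
    """Permute the outputs of each bin independently to best match reference envelopes.

    env: (K, Q, T) envelopes of the frequency-domain BSS outputs.
    ref: (K, Q, T) reference envelopes in the same bins (assumed to come from
         partially separated signals of a time-domain stage)."""
    K, Q, _ = env.shape
    out = np.empty_like(env)
    for k in range(K):
        r = np.corrcoef(ref[k], env[k])[:Q, Q:]  # r[i, j] = corr(ref_i, output_j)
        best = max(itertools.permutations(range(Q)),
                   key=lambda p: sum(r[i, p[i]] for i in range(Q)))
        out[k] = env[k][list(best)]
    return out

# Toy usage: Q = 3 sources; references still contain 30% crosstalk
rng = np.random.default_rng(4)
K, Q, T = 32, 3, 400
src = rng.gamma(0.5, size=(Q, T))
truth = np.stack([src * rng.uniform(0.5, 2.0, (Q, 1)) for _ in range(K)])
leak = np.eye(Q) + 0.3 * (1 - np.eye(Q))
ref = np.stack([leak @ src for _ in range(K)])
scrambled = np.stack([truth[k][rng.permutation(Q)] for k in range(K)])
print(np.allclose(align_to_reference(scrambled, ref), truth))
```

The brute-force search over permutations is only practical for a small number of sources.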

Comparison with Adjacent Bands Correlation Method

Comparison with the DOA Method
• PS: partial separation method with confidence check
• C1: correlation between adjacent bins without confidence check
• C2: correlation between adjacent bins with confidence check
• Ha: correlation between harmonic components with confidence check
• PS1: partial separation method alone, without confidence check

My Contribution - II: mixing matrix estimation for underdetermined instantaneous mixtures.

Underdetermined Blind Source Separation of Instantaneous Mixtures

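Mathematical Representation of Instantaneous Mixing
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “An algorithm for mixing matrix estimation in instantaneous blind source separation,” Signal Processing, Vol. 89, Issue 9, September 2009, pp. 1762–1773.
Time domain: $\mathbf{x}(n) = \mathbf{H}\,\mathbf{s}(n)$, where $\mathbf{H}$ is the $P \times Q$ mixing matrix, P = no. of mixtures and Q = no. of sources.
Time-frequency domain: $\mathbf{X}(k,t) = \mathbf{H}\,\mathbf{S}(k,t) = \sum_{q=1}^{Q} \mathbf{h}_q\, S_q(k,t)$, where $\mathbf{h}_q$ is the qth column of $\mathbf{H}$.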

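Single Source Points in the Time-Frequency Domain
At single source point 1, $(k_1,t_1)$, only source 1 is active: $S_2(k_1,t_1) = 0$. At single source point 2, $(k_2,t_2)$, only source 2 is active: $S_1(k_2,t_2) = 0$.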


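Single Source Points in the Time-Frequency Domain
At single source point 1: $\mathbf{X}(k_1,t_1) = \mathbf{h}_1 S_1(k_1,t_1)$, so $\Re\{\mathbf{X}(k_1,t_1)\} = \mathbf{h}_1 \Re\{S_1(k_1,t_1)\}$ and $\Im\{\mathbf{X}(k_1,t_1)\} = \mathbf{h}_1 \Im\{S_1(k_1,t_1)\}$, where $\Re\{S_1\}$ and $\Im\{S_1\}$ are scalars.
At single source point 2: $\Re\{\mathbf{X}(k_2,t_2)\} = \mathbf{h}_2 \Re\{S_2(k_2,t_2)\}$ and $\Im\{\mathbf{X}(k_2,t_2)\} = \mathbf{h}_2 \Im\{S_2(k_2,t_2)\}$.
Therefore, at a single source point the real and imaginary parts of the mixture vector are both scaled versions of the same column of H.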

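Scatter Diagram of the Mixtures When the Sources Are Perfectly Sparse
Example: (figure; at every point at most one source is non-zero, so every mixture sample lies on a line along one column of H.)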

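Scatter Diagram of the Mixtures When the Sources Are Not Perfectly Sparse
Example: (figure; several sources can be non-zero at the same point, so many mixture samples fall between the column directions.)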

Scatter Diagram of the Mixtures When the Sources Are Sparse (no. of sources = 6, no. of mixtures = 2)

Scatter Diagram of the Mixtures When the Sources Are Sparse, After Clustering (no. of sources = 6, no. of mixtures = 2)
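The slides do not state which clustering method is used. A minimal sketch, assuming a simple k-lines clustering (a variant of k-means on directions, where v and -v count as the same direction), estimates the columns of H as the principal directions of the clustered samples. The function name, toy matrix and noise level are assumptions.

```python
import numpy as np

def estimate_mixing_matrix(points, Q, n_iter=50):
    """Estimate Q unit-norm columns of H from single-source samples by clustering directions.

    points: (P, M) real vectors, e.g. Re{X(k,t)} at single source points.
    Column order and sign of the result are arbitrary."""
    norms = np.linalg.norm(points, axis=0)
    v = points / (norms + 1e-12)                  # directions on the unit sphere
    strong = v[:, norms >= np.median(norms)]      # initialise from strong points only
    C = [strong[:, 0]]
    for _ in range(Q - 1):                        # farthest-point initialisation
        sim = np.max(np.abs(np.array(C) @ strong), axis=0)
        C.append(strong[:, np.argmin(sim)])
    C = np.array(C).T                             # (P, Q) current column estimates
    for _ in range(n_iter):
        labels = np.argmax(np.abs(C.T @ v), axis=0)
        for q in range(Q):
            members = points[:, labels == q]
            if members.shape[1] > 0:
                # leading eigenvector of the energy-weighted scatter matrix of the cluster
                _, vecs = np.linalg.eigh(members @ members.T)
                C[:, q] = vecs[:, -1]
    return C

# Toy usage: P = 2 mixtures, Q = 3 sources, perfectly sparse samples plus small noise
rng = np.random.default_rng(5)
ang = np.radians([20.0, 60.0, 120.0])
H = np.vstack([np.cos(ang), np.sin(ang)])
lab = rng.integers(0, 3, 3000)                    # which source is active at each point
pts = H[:, lab] * rng.standard_normal(3000) + 0.01 * rng.standard_normal((2, 3000))
H_hat = estimate_mixing_matrix(pts, 3)
print(np.round(np.abs(H.T @ H_hat), 3))
```

Weighting the scatter matrix by sample energy lets low-energy, noise-dominated points contribute little to the estimated directions.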

Scatter Diagram of the Mixtures When the Sources Are Not Perfectly Sparse (no. of sources = 6, no. of mixtures = 2)
Objective: estimation of the single source points.

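Principle of the Proposed Algorithm for the Detection of Single Source Points
At single source points 1 and 2, the real and imaginary parts of $\mathbf{X}(k,t)$ are scalar multiples of the same column ($\mathbf{h}_1$ or $\mathbf{h}_2$), so they point in the same or opposite direction. At a multi-source point, $\Re\{\mathbf{X}\} = \mathbf{h}_1 \Re\{S_1\} + \mathbf{h}_2 \Re\{S_2\}$ and $\Im\{\mathbf{X}\} = \mathbf{h}_1 \Im\{S_1\} + \mathbf{h}_2 \Im\{S_2\}$, which in general point in different directions.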

Principle of the Proposed Algorithm for the Detection of Single Source Points
(Figure: SSP vs. MSP, averaged over 15 pairs of speech utterances, each 10 s long.)

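Proposed Algorithm for the Detection of Single Source Points
A point is taken as an SSP when the angle between $\Re\{\mathbf{X}(k,t)\}$ and $\Im\{\mathbf{X}(k,t)\}$ is close to 0° or 180°:
$\dfrac{\left|\Re\{\mathbf{X}(k,t)\}^{T}\,\Im\{\mathbf{X}(k,t)\}\right|}{\|\Re\{\mathbf{X}(k,t)\}\|\,\|\Im\{\mathbf{X}(k,t)\}\|} > \cos(\Delta\theta)$,
where $\Delta\theta$ is a small angle threshold. (Figure: SSP and MSP.)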

Elimination of Outliers
SSP detection → clustering → outlier elimination.

Experimental Results (no. of mixtures = 2, no. of sources = 6)

Detected Single Source Points, Three Mixtures (no. of mixtures = 3, no. of sources = 6)

Comparison with Classical Algorithms for the Determined Case (average of 500 experimental results; no. of mixtures = 2, no. of sources = 2)

Comparison with the Method Proposed in [1], Underdetermined Case
Normalized mean square error (NMSE) in mixing matrix estimation (dB) for mixing matrices of order P × Q (P = no. of mixtures, Q = no. of sources).
[1] Y. Li, S. Amari, A. Cichocki, D. W. C. Ho, and S. Xie, “Underdetermined blind source separation based on sparse representation,” IEEE Transactions on Signal Processing, vol. 54, pp. 423–437, Feb. 2006.
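The exact NMSE definition used in the comparison is not given on the slide. A minimal sketch of one common form, assumed here: normalize the columns of both matrices, resolve the column order and sign ambiguities by brute force, and report the squared error relative to the energy of the true matrix in dB. The toy matrices are illustrative.

```python
import itertools
import numpy as np

def nmse_db(H, H_hat):
    """NMSE (dB) between H and H_hat after normalising columns and resolving
    the column order and sign ambiguities (an assumed, commonly used definition)."""
    Hn = H / np.linalg.norm(H, axis=0)
    Gn = H_hat / np.linalg.norm(H_hat, axis=0)
    best = np.inf
    for perm in itertools.permutations(range(H.shape[1])):
        G = Gn[:, list(perm)]
        G = G * np.sign(np.sum(G * Hn, axis=0))  # flip columns to match signs
        best = min(best, float(np.sum((Hn - G) ** 2)))
    return 10.0 * np.log10(best / np.sum(Hn ** 2))

rng = np.random.default_rng(6)
ang = np.radians([20.0, 60.0, 120.0])
H = np.vstack([np.cos(ang), np.sin(ang)])             # toy 2 x 3 mixing matrix
H_hat = H[:, [2, 0, 1]] * np.array([-1.0, 2.0, 0.5])  # permuted and rescaled estimate
H_hat = H_hat + 1e-3 * rng.standard_normal(H.shape)   # plus small estimation error
print(round(nmse_db(H, H_hat), 1))
```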

Advantages of the Proposed Algorithm
1) Much simpler constraint: the algorithm does not require a “single source zone”.
2) Separation performance is better than that of the compared methods.
3) The algorithm is extremely simple but effective:
Step 1: Convert x from the time domain to the TF domain to get X.
Step 2: Check the SSP condition, i.e. whether the real and imaginary parts of X(k, t) point in the same or opposite direction.
Step 3: If the condition is satisfied, X(k, t) is a sample at an SSP and is kept for mixing matrix estimation; otherwise, discard the point.
Step 4: Repeat Steps 2 and 3 for all points in the TF plane, or until a sufficient number of SSPs is obtained.

My Contributions – III, IV and V: underdetermined convolutive mixtures in the frequency domain (frequency bin-wise separation, the permutation problem and automatic detection of the number of sources).

Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking,” IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 1, Jan. 2010, pp. 101–116.
(Figure: the signal from each microphone, Mic 1 to Mic P, is transformed by an STFT; masks are estimated from the mixtures in the TF domain and applied to give the separated signals in the TF domain.)

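Mathematical Representation
Time domain: $x_p(n) = \sum_{q=1}^{Q} \sum_{l=0}^{L-1} h_{pq}(l)\, s_q(n-l)$, $p = 1, \dots, P$, where P = no. of mixtures and Q = no. of sources.
Frequency domain: $\mathbf{X}(k,t) \approx \sum_{q=1}^{Q} \mathbf{h}_q(k)\, S_q(k,t)$, where $\mathbf{h}_q(k)$ is the vector of frequency responses from source q to the P sensors in bin k.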
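A minimal sketch of binary time-frequency masking under the frequency-domain model: each TF point is assigned to the source whose mixing vector $\mathbf{h}_q(k)$ is most parallel to $\mathbf{X}(k,t)$, and the resulting mask is applied to microphone 1. This is not the paper's mask estimation. The per-bin mixing vectors are assumed to be known already (a hypothetical input `A`), and the toy uses frequency-flat mixing, a degenerate convolutive case, with sinusoidal sources.

```python
import numpy as np
from scipy.signal import stft, istft

def tf_mask_separate(x, A, fs, nperseg=512):
    """Binary TF masking with given per-bin mixing vectors.

    x: (P, N) mixtures. A: (F, P, Q) mixing vectors h_q(k) for every bin k, assumed
    to be already estimated (hypothetical input; F = nperseg // 2 + 1).
    Returns a (Q, N') array of separated signals as observed at microphone 1."""
    _, _, X = stft(x, fs=fs, nperseg=nperseg)                    # (P, F, T)
    Xn = X / (np.linalg.norm(X, axis=0, keepdims=True) + 1e-12)  # unit vector per TF point
    An = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-12)
    # |h_q(k)^H X(k,t)| after normalisation: how parallel X(k,t) is to each h_q(k)
    score = np.abs(np.einsum('fpq,pft->qft', An.conj(), Xn))     # (Q, F, T)
    labels = np.argmax(score, axis=0)                            # winning source per point
    masks = labels[None] == np.arange(A.shape[2])[:, None, None] # (Q, F, T) binary masks
    _, y = istft(masks * X[0][None], fs=fs, nperseg=nperseg)     # mask mic 1, invert
    return y

# Toy usage: frequency-flat mixing, so h_q(k) = h_q in every bin; P = 2, Q = 3
fs = 16000
t = np.arange(fs) / fs
s = np.vstack([np.sin(2 * np.pi * f0 * t) for f0 in (440.0, 1250.0, 3000.0)])
ang = np.radians([20.0, 60.0, 120.0])
H = np.vstack([np.cos(ang), np.sin(ang)])
A = np.broadcast_to(H, (512 // 2 + 1, 2, 3))
y = tf_mask_separate(H @ s, A, fs)
print([round(abs(np.corrcoef(y[q, :fs], s[q])[0, 1]), 3) for q in range(3)])
```

With real reverberant speech the mixing vectors differ from bin to bin and must be estimated and aligned across bins, which is where the permutation problem reappears.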
