Blind Separation of Speech Mixtures. Vaninirappuputhenpurayil Gopalan REJU, School of Electrical and Electronic Engineering, Nanyang Technological University.
Introduction Blind Source Separation Convolutive • Mixing process: s1 s2 • Unmixing process:
Introduction
• Convolutive blind source separation: difficult to separate
• Instantaneous blind source separation: easy to separate
• In the frequency domain, convolutive mixing in time becomes (approximately) instantaneous mixing in each frequency bin.
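The frequency-domain relation can be checked numerically. The following is a minimal sketch, not the author's code: the filter length, signal length and random 2×2 filters are illustrative assumptions. It mixes two sources convolutively in the time domain and verifies that, after a sufficiently long FFT, every frequency bin obeys an instantaneous mixing model X(k) = H(k) S(k).

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_mic, n_taps, n_samp = 2, 2, 8, 256   # illustrative sizes (assumed)

s = rng.standard_normal((n_src, n_samp))       # source signals
h = rng.standard_normal((n_mic, n_src, n_taps))  # mixing filters h_pq(l)

# Time domain: x_p(t) = sum_q sum_l h_pq(l) s_q(t - l)
n_fft = n_samp + n_taps - 1                    # long enough for linear convolution
x = np.zeros((n_mic, n_fft))
for p in range(n_mic):
    for q in range(n_src):
        x[p] += np.convolve(h[p, q], s[q])

# Frequency domain: one small mixing matrix H(k) per bin k
S = np.fft.fft(s, n_fft, axis=-1)              # shape (n_src, n_fft)
H = np.fft.fft(h, n_fft, axis=-1)              # shape (n_mic, n_src, n_fft)
X = np.fft.fft(x, n_fft, axis=-1)              # shape (n_mic, n_fft)

X_binwise = np.einsum("pqk,qk->pk", H, S)      # X(k) = H(k) S(k) in each bin
max_err = np.max(np.abs(X - X_binwise))
print("largest deviation from bin-wise instantaneous mixing:", max_err)
```

With short-time Fourier frames, as used in practice, the relation holds only approximately and requires frames that are long compared with the mixing filters.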
Introduction
• No. of sources < no. of sensors: overdetermined mixing (easy to separate)
• No. of sources = no. of sensors: determined mixing (easy to separate)
• No. of sources > no. of sensors: underdetermined mixing (difficult to separate)
Approaches for BSS of Speech Signals Types of mixing Instantaneous mixing Convolutive mixing
Approaches for BSS of Speech Signals: Instantaneous mixing
Step 1: Selection of a cost function
Step 2: Minimization or maximization of the cost function
(Diagram: sources S1, S2 → mixing system H → mixtures X1, X2 → unmixing system W → outputs Y1, Y2. Separated?)
Approaches for BSS of Speech Signals: Instantaneous mixing, selection of cost function
• Statistical independence: signals from two different sources are independent
• Information theoretic
• Non-Gaussianity: by the central limit theorem, a mixture of two or more independent sources tends to be closer to Gaussian than the individual sources. Non-Gaussianity measures: kurtosis, negentropy
• Nonlinear cross moments
• Temporal structure of speech
• Non-stationarity of speech
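The non-Gaussianity argument can be illustrated with kurtosis. This is a small sketch under stated assumptions: the sources are synthetic Laplacian signals (a common stand-in for speech amplitudes, not data from the talk), and `excess_kurtosis` is a helper written for this example.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: about 0 for Gaussian data, > 0 for super-Gaussian data."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

rng = np.random.default_rng(0)
n = 200_000
s1 = rng.laplace(size=n)           # super-Gaussian source (assumed model)
s2 = rng.laplace(size=n)           # second, independent source
mix = (s1 + s2) / np.sqrt(2.0)     # equal-weight mixture with the same variance

k_s1, k_s2, k_mix = (excess_kurtosis(v) for v in (s1, s2, mix))
print(f"source 1: {k_s1:.2f}, source 2: {k_s2:.2f}, mixture: {k_mix:.2f}")
```

The mixture's excess kurtosis lies closer to zero than that of either source, which is why maximizing non-Gaussianity of the outputs drives them back toward the individual sources.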
Approaches for BSS of Speech Signals: Instantaneous mixing, minimization or maximization of the cost function
• Simple gradient method
• Natural gradient method, e.g. the Infomax ICA algorithm
• Newton’s method, e.g. FastICA
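As an illustration of the Newton-type option, here is a compact sketch of a symmetric, kurtosis-based FastICA iteration on a 2×2 instantaneous mixture. It is a textbook-style simplification, not the implementation used in the talk; the mixing matrix `h`, the Laplacian sources and the helper names are assumptions made for the example.

```python
import numpy as np

def whiten(x):
    """Remove the mean and decorrelate so that the covariance becomes identity."""
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    v = e @ np.diag(d**-0.5) @ e.T
    return v @ x, v

def sym_decorrelate(w):
    """Symmetric orthogonalization: w <- (w w^T)^(-1/2) w."""
    d, e = np.linalg.eigh(w @ w.T)
    return e @ np.diag(d**-0.5) @ e.T @ w

def fastica_kurtosis(z, n_iter=50, seed=0):
    """Fixed-point update w <- E[z (w^T z)^3] - 3 w on whitened data z."""
    rng = np.random.default_rng(seed)
    n_src, n_samp = z.shape
    w = sym_decorrelate(rng.standard_normal((n_src, n_src)))
    for _ in range(n_iter):
        y = w @ z
        w = sym_decorrelate((y**3) @ z.T / n_samp - 3.0 * w)
    return w

rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 20_000))          # synthetic super-Gaussian sources
h = np.array([[1.0, 0.6], [0.4, 1.0]])     # hypothetical mixing matrix
x = h @ s                                  # instantaneous mixtures

z, v = whiten(x)
w = fastica_kurtosis(z)
g = np.abs(w @ v @ h)                      # global system: ideally a scaled permutation
leak = np.sort(g, axis=1)[:, 0] / np.sort(g, axis=1)[:, 1]
print("global matrix |W V H|:\n", g)
print("residual leakage per output:", leak)
```

Each row of the global matrix should be dominated by a single entry, which reflects the usual scaling and ordering ambiguity of ICA.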
Approaches for BSS of Speech Signals: Convolutive mixing
Time domain:
• Advantage: no permutation problem
• Disadvantages: slow convergence; high computational cost for long filters (many taps)
Frequency domain:
• Advantages: low computational cost; fast convergence
• Disadvantage: permutation problem
(Diagram: sources S1, S2 → H → mixtures X1, X2 → W → outputs in either order, (Y1, Y2) or (Y2, Y1))
Permutation Problem in Frequency Domain BSS
(Block diagram: mixed signals x1, x2, x3 → K-point FFT → BSS with an instantaneous ICA algorithm in each frequency bin f1, f2, …, fK → solving the permutation problem → K-point IFFT → separated signals y1, y2, y3.)
Before the permutation problem is solved, the signals are still mixed: because of the permutation problem, the bins of one output can correspond to different sources (in the diagram, one bin of an output actually corresponds to y3).
Motivation
BSS
• Determined/overdetermined (# mixtures ≥ # sources)
  • Instantaneous
  • Convolutive
    • Time domain
    • Frequency domain: frequency bin-wise separation, permutation problem
• Underdetermined (# mixtures < # sources)
  • Instantaneous: mixing matrix estimation, source estimation
  • Convolutive
    • Time domain
    • Frequency domain: frequency bin-wise separation, permutation problem
  • Automatic detection of no. of sources
My Contribution - I (the BSS taxonomy diagram from the Motivation slide, repeated)
Algorithm for Solving the Permutation Problem
(Block diagram: mixed signals x1, x2, x3 → K-point FFT → BSS with an instantaneous ICA algorithm in each frequency bin f1, f2, …, fK, where the permutation problem arises → solving the permutation problem → K-point IFFT → separated signals y1, y2, y3, with the permutation problem solved.)
Existing Method for Solving the Permutation Problem
Direction of arrival (DOA) method:
• Example: direction of y1 = −30°, direction of y2 = 20°
• Quantities used in the DOA formula: position of the pth sensor, velocity of sound
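The idea behind the DOA method can be sketched for one frequency bin and one sensor pair. This is an illustration under assumptions that are not taken from the slide: a far-field plane wave in anechoic conditions, angles measured from broadside (hence the sine), a 4 cm sensor spacing, and the helper names `steering` and `doa_from_pair`.

```python
import numpy as np

C = 343.0  # assumed velocity of sound in m/s

def steering(theta_deg, f, d):
    """Far-field, anechoic response at sensor positions d (metres) for direction theta."""
    return np.exp(1j * 2 * np.pi * f * d * np.sin(np.deg2rad(theta_deg)) / C)

def doa_from_pair(h1, h2, f, d1, d2):
    """Direction estimated from the phase difference between two sensor responses."""
    phase = np.angle(h1 / h2)
    arg = phase * C / (2 * np.pi * f * (d1 - d2))
    return np.rad2deg(np.arcsin(np.clip(arg, -1.0, 1.0)))

d = np.array([0.0, 0.04])   # sensor positions (assumed 4 cm spacing)
f = 1000.0                  # frequency of the bin in Hz

for theta in (-30.0, 20.0):
    h = steering(theta, f, d)
    est = doa_from_pair(h[0], h[1], f, d[0], d[1])
    print(f"true direction {theta:6.1f} deg, estimated {est:6.1f} deg")
```

Outputs in different bins that share a direction are then assigned to the same source. At low frequencies the phase difference between the sensors is small, so small phase errors translate into large direction errors.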
Existing Method for Solving the Permutation Problem
Direction of arrival (DOA) method:
• Disadvantages:
  • Fails at lower frequencies.
  • Fails when sources are near.
  • Affected by room reverberation.
  • Sensor positions must be known.
• Reasons for failure at lower frequencies:
  • Small sensor spacing (relative to the wavelength) causes errors in the phase-difference measurement.
  • The relation is an approximation that assumes a plane wavefront under anechoic conditions.
Existing Method for Solving the Permutation Problem
Adjacent bands correlation method:
(Block diagram: mixed signals x1, x2, x3 → K-point FFT → BSS in each frequency bin f1, f2, …, fK → solving the permutation problem → K-point IFFT → separated signals y1, y2, y3. Annotations between adjacent bins: high correlation, low correlation, low correlation.)
Existing Method for Solving the Permutation Problem
Adjacent bands correlation method:
(Figure: outputs s1 and s2 across bins …, K−1, K, K+1, K+2, K+3, …, with the correlation matrix [r11 r12; r21 r22] computed between adjacent bins. Two examples: with confidence, change the permutation; without confidence, no change.)
Existing Method for Solving the Permutation Problem
Adjacent bands correlation method:
(Figure: outputs s1 and s2 across bins …, K−1, K, K+1, K+2, K+3, …, with the correlation matrix [r11 r12; r21 r22] computed between adjacent bins.)
Disadvantage: the method is not robust.
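A minimal sketch of adjacent-bin correlation alignment for two outputs follows. It is a simplified illustration, not the method evaluated in the talk: it works on synthetic magnitude envelopes, omits the confidence check, and the function name and data layout (`env[k, i, t]`) are assumptions.

```python
import numpy as np

def align_by_adjacent_correlation(env):
    """env[k, i, t]: magnitude envelope of output i in bin k. Aligns bins to bin 0."""
    env = env.copy()
    n_bins = env.shape[0]
    swapped = np.zeros(n_bins, dtype=bool)
    for k in range(1, n_bins):
        # r[i, j] = correlation between output i of bin k-1 and output j of bin k
        r = np.corrcoef(env[k - 1], env[k])[:2, 2:]
        if r[0, 1] + r[1, 0] > r[0, 0] + r[1, 1]:
            env[k] = env[k, ::-1].copy()   # change the permutation in this bin
            swapped[k] = True
    return env, swapped

# Synthetic test data: two independent envelopes shared by all bins
rng = np.random.default_rng(1)
n_bins, n_frames = 16, 400
base = np.abs(rng.standard_normal((2, n_frames)))
gains = rng.uniform(0.5, 2.0, (n_bins, 2, 1))
env_true = base[None] * gains + 0.05 * rng.random((n_bins, 2, n_frames))

true_swap = rng.random(n_bins) < 0.5
true_swap[0] = False                       # bin 0 serves as the reference
env_obs = env_true.copy()
env_obs[true_swap] = env_obs[true_swap][:, ::-1]

aligned, swapped = align_by_adjacent_correlation(env_obs)
print("bins corrected:", np.flatnonzero(swapped))
```

Because every decision depends on the previously aligned bin, one wrong decision propagates to all later bins, which is one way to understand the lack of robustness noted on the slide.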
Existing Method for Solving the Permutation Problem
Combination of DOA and correlation methods: DOA + harmonic correlation + adjacent bands correlation
Advantage: increased robustness
Proposed algorithm: Partial separation method (Parallel configuration)
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “Partial separation method for solving permutation problem in frequency domain blind source separation of speech signals,” Neurocomputing, vol. 71, no. 10–12, June 2008, pp. 2098–2112.
(Block diagram: time domain stage, frequency domain stage)
Partial separation method (Parallel configuration) (Block diagram: time domain stage, frequency domain stage)
Partial separation method (Cascade configuration) (Block diagram: parallel configuration, frequency domain stage, time domain stage)
Advantages of Partial Separation method • Robustness
Comparison with DOA method
Legend:
• PS: partial separation method with confidence check
• C1: correlation between adjacent bins without confidence check
• C2: correlation between adjacent bins with confidence check
• Ha: correlation between the harmonic components with confidence check
• PS1: partial separation method alone, without confidence check
My Contribution - II (the BSS taxonomy diagram from the Motivation slide, repeated)
Underdetermined Blind Source Separation of Instantaneous Mixtures
Mathematical Representation of Instantaneous Mixing
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “An algorithm for mixing matrix estimation in instantaneous blind source separation,” Signal Processing, vol. 89, no. 9, September 2009, pp. 1762–1773.
Time domain: P – No. of mixtures, Q – No. of sources
Time-Frequency domain:
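For orientation, the standard instantaneous mixing model in this notation can be written as follows. The slide's own equations are not recoverable here, so this is the usual textbook form, with the symbols A, a_q, k (frequency bin) and t (time frame) introduced as assumptions.

```latex
% Time domain: P mixtures, Q sources, real mixing matrix A of size P x Q
\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t), \qquad
\mathbf{x}(t) = [x_1(t), \dots, x_P(t)]^{T}, \quad
\mathbf{s}(t) = [s_1(t), \dots, s_Q(t)]^{T}

% Time-frequency domain: the same matrix A applies at every TF point (k, t)
\mathbf{X}(k,t) = \mathbf{A}\,\mathbf{S}(k,t) = \sum_{q=1}^{Q} \mathbf{a}_q\, S_q(k,t)
```

Here a_q denotes the qth column of A. Since the transform is linear, the mixing matrix is unchanged by the move to the time-frequency domain.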
Single Source Points in Time-Frequency domain (Figure: time-frequency points marked as single source point 1 and single source point 2; the remaining source component at each of these points is 0.)
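Single Source Points in Time-Frequency domain (Equations for single source point 1 and single source point 2, with the source coefficients annotated as scalars. ∴ At single source point 1 and at single source point 2, the mixture vector is a scalar multiple of one column of the mixing matrix.)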
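A worked form of the single-source-point relation, assuming the usual instantaneous model X(k,t) = A S(k,t) with a real mixing matrix whose columns are a_1, …, a_Q (the point labels (k_1, t_1) and (k_2, t_2) are introduced here for illustration):

```latex
% At single source point 1 only source 1 is active:
\mathbf{X}(k_1,t_1) = \mathbf{a}_1\, S_1(k_1,t_1)
\;\Rightarrow\;
\operatorname{Re}\{\mathbf{X}(k_1,t_1)\} = \mathbf{a}_1 \operatorname{Re}\{S_1(k_1,t_1)\}, \quad
\operatorname{Im}\{\mathbf{X}(k_1,t_1)\} = \mathbf{a}_1 \operatorname{Im}\{S_1(k_1,t_1)\}

% At single source point 2 only source 2 is active:
\mathbf{X}(k_2,t_2) = \mathbf{a}_2\, S_2(k_2,t_2)
```

The real and imaginary parts of S_q are real scalars, so at a single source point both the real and the imaginary part of the mixture vector point along the same column of A.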
Scatter Diagram of the Mixtures When Sources Are Perfectly Sparse (Example figure)
Scatter Diagram of the Mixtures When Sources Are Not Perfectly Sparse (Example figure)
Scatter Diagram of the Mixtures when Sources are Sparse. No. of sources = 6, no. of mixtures = 2
Scatter Diagram of the Mixtures when Sources are Sparse, After Clustering. No. of sources = 6, no. of mixtures = 2
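To make the clustering step concrete for two mixtures, the sketch below estimates the mixing directions from sparse scatter data. It deliberately swaps in a simple one-dimensional gap-based grouping of the scatter angles, which is not the clustering used in the talk, and it assumes at least two sources and that no cluster straddles the 0°/180° boundary. All data are synthetic.

```python
import numpy as np

def estimate_mixing_2d(points, n_sources):
    """points: (2, N) real samples at single source points. Returns unit-norm columns."""
    # Direction of each sample modulo sign, as an angle in [0, pi)
    ang = np.sort(np.mod(np.arctan2(points[1], points[0]), np.pi))
    gaps = np.diff(ang)
    # Split the sorted angles at the (n_sources - 1) largest gaps
    cut = np.sort(np.argsort(gaps)[-(n_sources - 1):]) + 1
    centres = np.array([c.mean() for c in np.split(ang, cut)])
    return np.vstack([np.cos(centres), np.sin(centres)])

rng = np.random.default_rng(0)
true_angles = np.deg2rad([15.0, 45.0, 75.0, 105.0, 135.0, 165.0])  # 6 sources
a_true = np.vstack([np.cos(true_angles), np.sin(true_angles)])     # 2 mixtures

n = 600
labels = rng.integers(0, 6, n)                       # the active source at each sample
amp = rng.uniform(0.5, 2.0, n) * rng.choice([-1.0, 1.0], n)
pts = a_true[:, labels] * amp + 0.01 * rng.standard_normal((2, n))

a_est = estimate_mixing_2d(pts, n_sources=6)
print(np.rad2deg(np.arctan2(a_est[1], a_est[0])).round(1))
```

Each cluster centre gives one column of the mixing matrix up to scale and sign, which is the usual ambiguity in mixing matrix estimation.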
Scatter Diagram of the Mixtures when Sources are Not Perfectly Sparse. Objective: estimation of the single source points. No. of sources = 6, no. of mixtures = 2
Principle of the Proposed Algorithm for the Detection of Single Source Points (Equations for single source point 1, single source point 2 and a multi source point, with the source coefficients at the single source points annotated as scalars)
Principle of the Proposed Algorithm for the Detection of Single Source Points (Figure: average over 15 pairs of speech utterances, each 10 s long, comparing single source points (SSP) and multi source points (MSP))
Proposed Algorithm for the Detection of Single Source Points (Figure: SSP and MSP)
Elimination of Outliers: SSP detection → clustering → outlier elimination
Experimental Results. No. of mixtures = 2, no. of sources = 6
Detected Single Source Points, Three Mixtures. No. of mixtures = 3, no. of sources = 6
Comparison with Classical Algorithms for Determined Case. Average of 500 experimental results. No. of mixtures = 2, no. of sources = 2
Comparison with Method Proposed in [1], Underdetermined Case
(Table: normalized mean square error (NMSE) in mixing matrix estimation, in dB, against the order of the mixing matrices (P×Q). P – No. of mixtures, Q – No. of sources.)
[1] Y. Li, S. Amari, A. Cichocki, D. W. C. Ho, and S. Xie, “Underdetermined blind source separation based on sparse representation,” IEEE Transactions on Signal Processing, vol. 54, pp. 423–437, Feb. 2006.
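For reference, one common way to compute an NMSE in dB between a true and an estimated mixing matrix is sketched below. The exact definition used for the table is not shown in this transcript, so the formula and the function name `nmse_db` are assumptions; the sketch also presumes the estimated columns have already been matched to the true ones in order, sign and scale.

```python
import numpy as np

def nmse_db(a_true, a_est):
    """10 log10 of the squared estimation error normalized by the energy of a_true."""
    err = np.sum((a_est - a_true) ** 2)
    return 10.0 * np.log10(err / np.sum(a_true ** 2))

a_true = np.array([[0.8, 0.3, -0.5],
                   [0.6, 0.9, 0.7]])      # hypothetical 2x3 mixing matrix
a_est = 1.1 * a_true                      # estimate with a 10 % error in every entry

print(f"NMSE = {nmse_db(a_true, a_est):.1f} dB")
```

Lower (more negative) values indicate a more accurate estimate of the mixing matrix.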
Advantages of the Proposed Algorithm
1) Much simpler constraint: the algorithm does not require a “single source zone”.
2) Separation performance is better.
3) The algorithm is extremely simple but effective:
Step 1: Convert x in the time domain to the TF domain to get X.
Step 2: Check the condition.
Step 3: If the condition is satisfied, then X(k, t) is a sample at an SSP, and this sample is kept for mixing matrix estimation; otherwise, discard the point.
Step 4: Repeat Steps 2 and 3 for all the points in the TF plane or until a sufficient number of SSPs are obtained.
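Steps 2 to 4 can be sketched as follows. The condition itself is not legible in this transcript; the sketch assumes the test is that the real and imaginary parts of X(k, t) point in the same direction to within a small angle, which is what the single-source-point principle implies for a real mixing matrix. The threshold `delta_theta_deg`, the function name and the synthetic TF data are assumptions.

```python
import numpy as np

def detect_ssp(X, delta_theta_deg=1.0, eps=1e-12):
    """X: (P, N) complex TF samples. Returns a boolean mask of detected SSPs."""
    re, im = X.real, X.imag
    num = np.abs(np.sum(re * im, axis=0))
    den = np.linalg.norm(re, axis=0) * np.linalg.norm(im, axis=0)
    valid = den > eps                       # skip points with (almost) no energy
    cos_angle = np.zeros(X.shape[1])
    cos_angle[valid] = num[valid] / den[valid]
    return valid & (cos_angle > np.cos(np.deg2rad(delta_theta_deg)))

# Synthetic TF data: 3 sources, 2 mixtures, each source active at about 30 % of points
rng = np.random.default_rng(0)
n_points = 2000
A = np.array([[0.9, 0.5, -0.4],
              [0.3, 0.8, 0.9]])             # hypothetical real mixing matrix
active = rng.random((3, n_points)) < 0.3
S = active * (rng.standard_normal((3, n_points))
              + 1j * rng.standard_normal((3, n_points)))
X = A @ S

mask = detect_ssp(X)
single = active.sum(axis=0) == 1            # ground truth: exactly one source active
multi = active.sum(axis=0) >= 2
print("SSPs detected:", mask[single].mean(), "MSPs wrongly kept:", mask[multi].mean())
```

The retained samples `X[:, mask]` would then be passed on to clustering for mixing matrix estimation. A tighter angle threshold rejects more multi-source points at the cost of discarding some noisy single-source points.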
My Contributions – III, IV and V (the BSS taxonomy diagram from the Motivation slide, repeated)
Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking,” IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 1, Jan. 2010, pp. 101–116.
(Block diagram: Mic 1 … Mic P → STFT → mixtures in the TF domain → mask estimation → apply mask → separated signals in the TF domain)
Mathematical Representation
Time domain: P – No. of mixtures, Q – No. of sources
Frequency domain:
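The usual form of the convolutive model in this notation is given below. The slide's equations are not recoverable here, so this is the standard textbook statement; the filter length L, the impulse responses h_pq and the per-bin vectors H_q(k) are symbols introduced as assumptions.

```latex
% Time domain: each mixture is a sum of filtered sources
x_p(t) = \sum_{q=1}^{Q} \sum_{l=0}^{L-1} h_{pq}(l)\, s_q(t-l), \qquad p = 1, \dots, P

% Frequency (STFT) domain: approximately instantaneous mixing in each bin k
\mathbf{X}(k,t) \approx \mathbf{H}(k)\,\mathbf{S}(k,t) = \sum_{q=1}^{Q} \mathbf{H}_q(k)\, S_q(k,t)
```

Here H_q(k) is the qth column of the P×Q complex mixing matrix H(k) at bin k. The approximation assumes STFT frames that are long compared with the mixing filters.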