
Spatial Covariance Models For Under-Determined Reverberant Audio Source Separation

N. Duong, E. Vincent and R. Gribonval, METISS project team, IRISA/INRIA, France, Oct. 2009.


Presentation Transcript


  1. Spatial Covariance Models For Under-Determined Reverberant Audio Source Separation N. Duong, E. Vincent and R. Gribonval METISS project team, IRISA/INRIA, France Oct. 2009

  2. Content • Under-determined source separation • Spatial covariance models • Model parameter estimation • Experimental evaluation • Conclusion

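  3. Under-determined source separation • Use $I$ recorded mixture signals $\mathbf{x}(t)$ to separate $J$ sources $s_j(t)$, where $J > I$ • Convolutive mixing model: denote by $\mathbf{h}_j(\tau)$ the vector of mixing filters from source $j$ to the microphone array. The contribution $\mathbf{c}_j(t)$ of source $j$ to all microphones and the vector of mixture signals $\mathbf{x}(t)$ are computed as: $\mathbf{c}_j(t) = \sum_{\tau} \mathbf{h}_j(\tau)\, s_j(t-\tau)$ and $\mathbf{x}(t) = \sum_{j=1}^{J} \mathbf{c}_j(t)$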
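The convolutive mixing model on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the random signals, and the exponentially decaying random filters are all assumptions chosen only to make the example run.

```python
import numpy as np

def source_image(s_j, h_j):
    """Contribution c_j(t) of one source to all I microphones.

    s_j: (T,) source signal; h_j: (I, L) mixing filters, one row per microphone.
    The convolution is truncated to the source length T.
    """
    return np.stack([np.convolve(s_j, h_ij)[: len(s_j)] for h_ij in h_j])

def mix(sources, filters):
    """Mixture x(t) = sum_j c_j(t). sources: (J, T); filters: (J, I, L)."""
    images = np.stack([source_image(s, h) for s, h in zip(sources, filters)])
    return images.sum(axis=0), images

# Hypothetical toy data: 3 sources, 2 microphones (under-determined, J > I).
rng = np.random.default_rng(0)
J, I, T, L = 3, 2, 1000, 64
sources = rng.standard_normal((J, T))
filters = rng.standard_normal((J, I, L)) * np.exp(-np.arange(L) / 16.0)
x, images = mix(sources, filters)   # x: (I, T), images: (J, I, T)
```

The separation task is then to recover `images` (the source spatial images) from `x` alone.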

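  4. BSS approaches • Short-time Fourier transform (STFT) • Sparsity assumption: only FEW sources are active at each time-frequency point - Binary masking: only ONE source is active at each time-frequency point - L1-norm minimization: the source coefficients at each time-frequency point are chosen to minimize their L1 norm while reproducing the mixture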
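Binary masking can be illustrated with a small sketch. It assumes, purely for illustration, that a mixing vector is known for each source and is the same at every time-frequency point; each point is then assigned entirely to the source whose mixing direction best matches the observed mixture. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def binary_mask(X, H):
    """Assign every time-frequency point to a single source.

    X: (I, P) mixture STFT coefficients at P time-frequency points.
    H: (J, I) one mixing vector per source (assumed known here).
    Returns (J, I, P) estimated source images; exactly one source is
    non-zero at each point.
    """
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)   # unit-norm directions
    score = np.abs(Hn.conj() @ X)                        # (J, P) match with each direction
    winner = score.argmax(axis=0)                        # dominant source per point
    mask = np.arange(H.shape[0])[:, None] == winner[None, :]
    return mask[:, None, :] * X[None, :, :]

# Toy example: 3 sources, 2 channels, 2 time-frequency points.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X = np.array([[2.0, 3.0], [2.0, 0.0]])
images = binary_mask(X, H)
```

Because the masks are disjoint, the estimated images always sum back to the mixture, but any point where several sources overlap is attributed to only one of them.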

  5. Beamforming model • The mixing vector $\mathbf{h}_j(f)$ is denoted as $\mathbf{a}_j(f)$ and approximated from the distances between each source and the microphones [T. Gustafsson et al.], here for a stereo mixture • Covariance matrix of the source images = source variance × spatial covariance matrix (rank 1) modeling the mixing process

  6. Spatial covariance models • Purpose of the paper: explore extensions of the Gaussian framework, i.e. zero-mean Gaussian source images $\mathbf{c}_j(n,f)$ whose covariance is the product of a source variance $v_j(n,f)$ and a spatial covariance matrix $\mathbf{R}_j(f)$, that better account for reverberation • We evaluate the potential separation performance by estimating the spatial model parameters from training data • Source separation by Wiener filtering • Models for the spatial covariance matrix: - Rank-1 convolutive model - Rank-1 anechoic model - Full-rank direct+diffuse model - Full-rank unconstrained model
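Separation by Wiener filtering under this Gaussian framework can be sketched as follows. The sketch assumes the standard multichannel Wiener filter, in which each source image is estimated as its own covariance times the inverse mixture covariance times the mixture; the function name and the random test data are hypothetical.

```python
import numpy as np

def wiener_separate(x, v, R):
    """Multichannel Wiener filter at one time-frequency point.

    x: (I,) mixture STFT coefficients.
    v: (J,) source variances.
    R: (J, I, I) spatial covariance matrices.
    Returns (J, I) estimated source images.
    """
    R_c = v[:, None, None] * R      # covariance of each source image
    R_x = R_c.sum(axis=0)           # mixture covariance (sources assumed uncorrelated)
    y = np.linalg.solve(R_x, x)     # R_x^{-1} x without forming the inverse
    return np.stack([Rc @ y for Rc in R_c])

# Hypothetical parameters: 3 sources, 2 channels, full-rank Hermitian R_j.
rng = np.random.default_rng(0)
J, I = 3, 2
A = rng.standard_normal((J, I, I)) + 1j * rng.standard_normal((J, I, I))
R = A @ A.conj().transpose(0, 2, 1) + 1e-3 * np.eye(I)
v = rng.uniform(0.5, 2.0, J)
x = rng.standard_normal(I) + 1j * rng.standard_normal(I)
c_hat = wiener_separate(x, v, R)
```

By construction the estimated images sum to the mixture, so the choice of spatial covariance model only changes how the mixture is shared among the sources.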

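  7. Rank-1 models • Rank-1 anechoic model: $\mathbf{R}_j(f) = \mathbf{a}_j(f)\,\mathbf{a}_j^H(f)$, where $\mathbf{a}_j(f)$ is the steering vector specified in the beamforming approach • Rank-1 convolutive model: $\mathbf{R}_j(f) = \mathbf{h}_j(f)\,\mathbf{h}_j^H(f)$, where $\mathbf{h}_j(f)$ is the Fourier transform of the mixing filters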
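A rank-1 anechoic spatial covariance can be built from a steering vector as in the sketch below. The free-field form of the steering vector used here (gain inversely proportional to distance, pure propagation delay) and the speed of sound of 343 m/s are assumptions for illustration; the slide's own formula was not recoverable from the transcript.

```python
import numpy as np

def steering_vector(f, r, c=343.0):
    """Free-field steering vector (assumed form, not taken from the slides).

    f: frequency in Hz; r: (I,) source-to-microphone distances in metres;
    c: speed of sound in m/s.
    """
    gain = 1.0 / (np.sqrt(4 * np.pi) * r)   # spherical spreading loss
    delay = r / c                            # propagation delay in seconds
    return gain * np.exp(-2j * np.pi * f * delay)

def rank1_covariance(a):
    """Spatial covariance R = a a^H, which has rank 1 by construction."""
    return np.outer(a, a.conj())

# Hypothetical stereo geometry: source at 1.20 m and 1.25 m from the two microphones.
a = steering_vector(1000.0, np.array([1.20, 1.25]))
R_anechoic = rank1_covariance(a)
```

The rank-1 convolutive model has the same outer-product structure, with the steering vector replaced by the Fourier transform of the measured mixing filters.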

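  8. Full-rank direct+diffuse model • Assuming that the direct part and the reverberant part are uncorrelated and that the reverberant part is diffuse: $\mathbf{R}_j(f) = \mathbf{a}_j(f)\,\mathbf{a}_j^H(f) + \sigma_{\mathrm{rev}}^2\,\boldsymbol{\Psi}(f)$, where $\mathbf{a}_j(f)$ is the steering vector of the direct path • The reverberant power $\sigma_{\mathrm{rev}}^2$ and the diffuse-field covariance $\boldsymbol{\Psi}(f)$ can be specified from statistical room acoustics, i.e. they depend on the microphone distance $d$, the wall area $A$, and the wall reflection coefficient $\beta$ • In a rectangular room, these quantities have closed-form expressions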
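The direct+diffuse construction can be sketched for two microphones as below. The sketch assumes the classical diffuse-field coherence sin(2πfd/c)/(2πfd/c) between two omnidirectional microphones, and it treats the reverberant power as a free input parameter because the slide's room-acoustics formula did not survive extraction. All names and numeric values are hypothetical.

```python
import numpy as np

def diffuse_coherence(f, d, c=343.0):
    """2x2 diffuse-field covariance for two microphones spaced d metres apart.

    Assumes the classical sinc coherence of a spherically diffuse field.
    np.sinc(x) computes sin(pi x) / (pi x), hence the argument 2 f d / c.
    """
    omega = np.sinc(2.0 * f * d / c)
    return np.array([[1.0, omega], [omega, 1.0]])

def direct_plus_diffuse(a, sigma2_rev, f, d, c=343.0):
    """R = a a^H + sigma2_rev * Psi(f): direct path plus uncorrelated diffuse part."""
    return np.outer(a, a.conj()) + sigma2_rev * diffuse_coherence(f, d, c)

# Hypothetical values: arbitrary steering vector, 20 cm spacing, 1 kHz.
a = np.array([0.25, 0.22 * np.exp(-0.9j)])
R_dd = direct_plus_diffuse(a, sigma2_rev=0.05, f=1000.0, d=0.20)
```

Unlike the rank-1 models, the resulting matrix is full rank whenever the reverberant power is positive and the coherence magnitude is below one.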

  9. Full-rank unconstrained model - A more general model than the previous ones, where the coefficients of the spatial covariance matrix $\mathbf{R}_j(f)$ are not related a priori - Allows more flexible modeling of the mixing process, since in practice the reverberant part is rarely diffuse and is correlated with the direct part - Expected to improve separation performance on real-world convolutive mixtures.

  10. Model parameter estimation • We investigate the potential separation performance achievable via each model in two contexts: • Semi-blind context: the spatial covariance matrices $\mathbf{R}_j(f)$ are estimated from the true source images, but the source variances $v_j(n,f)$ are blindly estimated from the mixture in the ML sense, using a criterion based on the Kullback-Leibler (KL) divergence between the empirical covariance matrices and the model-based matrices • Oracle context: both $\mathbf{R}_j(f)$ and $v_j(n,f)$ are estimated from the true source images.
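The KL divergence between an empirical covariance matrix and a model-based one can be computed as in this sketch. It uses the standard closed form for two zero-mean Gaussians; the factor 1/2 follows the real-valued Gaussian convention and is an assumption here, since the slide's exact expression was lost. The function name and test matrices are hypothetical.

```python
import numpy as np

def kl_gaussian_cov(R_emp, R_model):
    """KL divergence between zero-mean Gaussians with covariances R_emp and R_model.

    D = 0.5 * ( tr(R_model^{-1} R_emp) - log det(R_model^{-1} R_emp) - I )
    Both matrices are assumed Hermitian positive definite.
    """
    dim = R_emp.shape[0]
    M = np.linalg.solve(R_model, R_emp)     # R_model^{-1} R_emp
    _, logdet = np.linalg.slogdet(M)         # log|det M|, numerically stable
    return 0.5 * (np.trace(M).real - logdet - dim)

R_emp = np.array([[2.0, 0.5], [0.5, 1.0]])
R_model = np.eye(2)
d_mismatch = kl_gaussian_cov(R_emp, R_model)   # positive: the model does not fit
d_match = kl_gaussian_cov(R_emp, R_emp)        # zero up to rounding: perfect fit
```

In a semi-blind setting, a divergence of this kind would be summed over time-frequency points and minimized with respect to the source variances while the spatial covariance matrices stay fixed.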

  11. Experiment • Purpose: compare the source separation performance of the model-based algorithms • Criteria: SDR, SIR, SAR • Experimental setup: - Room dimensions: 4.45 x 3.35 x 2.5 m - Source and microphone height: 1.4 m - Microphone distance: d = 20 cm or 5 cm - Source-to-microphone distance: 120 cm or 50 cm - Speech length: 5 seconds - Sampling rate: 16 kHz - Sine window for STFT with a length of 1024 taps • [Figure: room layout showing sources s1, s2, s3 at distance r from microphones m1 and m2, with marked dimensions of 1.8 m and 1.5 m]
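The STFT analysis window from the setup can be sketched as follows. At a 16 kHz sampling rate, 1024 taps correspond to 64 ms frames. The half-sample offset in the window definition and the 50% overlap (hop of 512 samples) are assumptions, as the slide specifies only the window type and length.

```python
import numpy as np

def sine_window(N):
    """Sine window w[n] = sin(pi * (n + 0.5) / N) for n = 0..N-1."""
    n = np.arange(N)
    return np.sin(np.pi * (n + 0.5) / N)

fs = 16000                 # sampling rate from the experimental setup, in Hz
N = 1024                   # window length from the experimental setup, in taps
hop = N // 2               # assumed 50% overlap
w = sine_window(N)
frame_duration = N / fs    # 0.064 s

# With 50% overlap, the squared windows of adjacent frames sum to one,
# so using the same window for analysis and synthesis reconstructs perfectly.
overlap_sum = w[:hop] ** 2 + w[hop:] ** 2
```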

  12. Experimental result

  13. Conclusion • - Proposed to model the convolutive mixing process by full-rank spatial covariance matrices • - Experimental results confirm that full-rank spatial covariance matrices better account for reverberation and potentially improve separation performance compared to rank-1 matrices. • Work in progress • - Validation of the proposed algorithms on real-world recordings with small source movements (demo session) • - Blind context: learning the model parameters from the recorded mixture (submitted to ICASSP 2010). • Future work: • - Consider separation of diffuse and semi-diffuse sources

  14. Thanks for your attention! See you again in the demo session tonight. Your comments?
