
A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

A talk on a single-channel signal separation method that uses known sounds to extract individual sources. A latent variable model combined with a sparse approximation achieves high-quality separation without a separate training step: the training data themselves serve directly as the dictionary, and an entropy-minimizing sparsity prior keeps the source estimates plausible.


Presentation Transcript


  1. A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj (NIPS 2009)

  2. Introduction • Problem: single-channel signal separation • Separating out the signals of individual sources from a mixed recording • General approach • Derive a generalizable model that captures the salient features of each source • Separation is achieved by extracting components from the mixed signal that conform to the characterization of the individual sources

  3. Physical Intuition • Recover the sources by reweighting the frequency subbands of a single recording

  4. Latent Variable Model • Given the magnitude spectrogram of a single source, each spectral frame is modeled as a histogram of repeated draws from a multinomial distribution over the frequency bins • At a given time frame t, P_t(f) represents the probability of drawing frequency f • The model assumes that P_t(f) is a mixture of bases indexed by a latent variable z, i.e. P_t(f) = Σ_z P_t(z) P(f|z)
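As a rough illustration of this histogram view (not code from the paper), a single spectral frame can be simulated by drawing repeatedly from a multinomial over frequency bins; the bin count, the number of draws, and the use of NumPy are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: F frequency bins, N quanta of energy drawn per frame
F, N = 513, 10000

# P_t(f): a distribution over frequency bins for one time frame t
Pt_f = rng.random(F)
Pt_f /= Pt_f.sum()

# The frame itself is modelled as a histogram of N draws from P_t(f)
v_t = rng.multinomial(N, Pt_f)
```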

  5. Latent Variable Model (Contd.) • Now let the matrix V (of size F×T, with entries v_ft) represent the magnitude spectrogram of the mixture sound, and let v_t represent time frame t (the t-th column vector of V) • First, assume we have an already trained model in the form of basis vectors P_s(f|z) • These bases represent a dictionary of spectra that best describe each source

  6. Source separation • Decompose a new mixture of these known sources in terms of the contributions of the dictionaries of each source: P_t(f) = Σ_s P_t(s) Σ_z P_t(z|s) P_s(f|z) • Use the EM algorithm to estimate the mixture weights P_t(z|s) and the source priors P_t(s) • The reconstruction of the contribution of source s in the mixture is then given by v̂_ft^(s) = v_ft · [P_t(s) Σ_z P_s(f|z) P_t(z|s)] / [Σ_s' P_t(s') Σ_z P_s'(f|z) P_t(z|s')]
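The slide only names the quantities that EM estimates, so the following is a minimal NumPy sketch of what such an EM separation could look like when the per-source bases are held fixed; the function name separate, its argument names, and the iteration count are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def separate(V, dicts, n_iter=200, eps=1e-12):
    """Sketch of EM separation of a mixture magnitude spectrogram V (F x T)
    given fixed per-source dictionaries.

    dicts: list with one F x Z_s array per source; each column is a basis
           P_s(f|z), i.e. it sums to 1 over frequency."""
    S = len(dicts)
    T = V.shape[1]
    rng = np.random.default_rng(0)

    # mixture weights P_t(z|s) (random init) and source priors P_t(s) (uniform init)
    Pzs = [rng.random((W.shape[1], T)) for W in dicts]
    Pzs = [P / P.sum(0, keepdims=True) for P in Pzs]
    Ps = np.full((S, T), 1.0 / S)

    for _ in range(n_iter):
        # per-source contribution P_t(s) * sum_z P_s(f|z) P_t(z|s)
        contrib = [Ps[s] * (dicts[s] @ Pzs[s]) for s in range(S)]
        Pf = sum(contrib) + eps            # the model's P_t(f)
        ratio = V / Pf                     # shared E-step factor v_ft / P_t(f)
        new_Ps = np.empty_like(Ps)
        for s in range(S):
            gain = dicts[s].T @ ratio      # sum_f P_s(f|z) v_ft / P_t(f)
            num = Pzs[s] * gain            # unnormalised posterior counts over z
            new_Ps[s] = Ps[s] * num.sum(0)
            Pzs[s] = num / (num.sum(0, keepdims=True) + eps)
        Ps = new_Ps / (new_Ps.sum(0, keepdims=True) + eps)

    # Wiener-style reconstruction of each source's contribution to the mixture
    contrib = [Ps[s] * (dicts[s] @ Pzs[s]) for s in range(S)]
    Pf = sum(contrib) + eps
    return [V * c / Pf for c in contrib]
```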

  7. Contribution of this paper • Use the training data directly as a dictionary • The authors argue that, given any sufficiently large collection of data from a source, the best possible characterization of that data is quite simply the data themselves (cf. non-parametric density estimation with Parzen windows) • This side-steps the need for a separate model-training step • The large dictionary provides a better description of the sources than less expressive learned basis models • Source estimates are guaranteed to lie on the source manifold, as opposed to trained approaches, which can produce arbitrary outputs that are not necessarily plausible source estimates

  8. Using Training Data as the Dictionary • Use each frame of the spectrograms of the training sequences as the bases P_s(f|z) • Let W^(s) (of size F×T^(s)) be the training spectrogram from source s; the latent variable z for source s then takes T^(s) values, and the z-th basis is given by the (normalized) z-th column vector of W^(s) • With the above model one would ideally want to use only one dictionary element per source at any point in time • This ensures that the output lies on the source manifold • It is similar to a nearest-neighbor model, but an explicit search is computationally very expensive • Instead, the authors propose using sparsity
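A minimal sketch of this construction, assuming NumPy and the hypothetical helper name dictionary_from_training: every training frame becomes one basis, normalized so that it is a distribution over frequency.

```python
import numpy as np

def dictionary_from_training(W_train, eps=1e-12):
    """Use every frame of a training magnitude spectrogram (F x T_s) of one
    source directly as a basis: normalising each column turns the z-th frame
    into a distribution P_s(f|z) over frequency."""
    return W_train / (W_train.sum(0, keepdims=True) + eps)

# Hypothetical usage with the EM sketch above (the spectrograms are assumed inputs):
# dicts = [dictionary_from_training(W_speaker1), dictionary_from_training(W_speaker2)]
# estimates = separate(V_mixture, dicts)
```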

  9. Entropic prior • Given a probability distribution θ, the entropic prior is defined as P(θ) ∝ e^(−α·H(θ)), where H(θ) = −Σ_i θ_i log θ_i • α is a weighting factor that determines the level of sparsity • A sparse representation has low entropy (since only a few elements are 'active') • Imposing this prior during MAP estimation is a way to minimize entropy during estimation, which results in a sparse representation of θ
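The slides do not show how the resulting MAP problem is solved; one standard route for entropic-prior MAP estimation is a Lambert-W fixed point (due to Brand), sketched below under the assumption that SciPy is available. The function name entropic_map and the fixed-point iteration count are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.special import lambertw

def entropic_map(omega, alpha, n_fix=25, eps=1e-12):
    """Sketch of a MAP estimate of a distribution theta under the entropic
    prior P(theta) ∝ exp(-alpha * H(theta)), given non-negative expected
    counts omega (what a plain EM M-step would simply normalise).
    alpha > 0 favours sparse (low-entropy) solutions."""
    omega = np.maximum(np.asarray(omega, dtype=float), eps)
    theta = omega / omega.sum()                    # start from the ML estimate
    lam = -omega.sum() - alpha                     # initial Lagrange multiplier
    for _ in range(n_fix):
        arg = -(omega / alpha) * np.exp(1.0 + lam / alpha)
        arg = np.maximum(arg, -1.0 / np.e + 1e-12)  # keep the W_{-1} branch real-valued
        theta = -omega / (alpha * np.real(lambertw(arg, k=-1)))
        theta = np.maximum(theta, eps)
        theta /= theta.sum()                       # re-impose normalisation
        lam = -omega.sum() - alpha * (1.0 + np.sum(theta * np.log(theta)))
    return theta
```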

  10. Sparse approximation • We would like to minimize the entropies of both the source-dependent mixture weights and the source priors at every frame • However, H(P_t(z, s)) = H(P_t(z|s)) + H(P_t(s)) • Thus, reducing the entropy of the joint distribution P_t(z, s) is equivalent to reducing the conditional entropy of the source-dependent mixture weights and the entropy of the source priors

  11. Sparse approximation (Contd.) • The model written in terms of this joint parameter is P_t(f) = Σ_s Σ_z P_s(f|z) P_t(z, s) • To impose sparsity, we apply the entropic prior P(P_t(z, s)) ∝ e^(−α·H(P_t(z, s))) • Apply EM to estimate P_t(z, s) • The reconstructed source is given by v̂_ft^(s) = v_ft · [Σ_z P_s(f|z) P_t(z, s)] / [Σ_s' Σ_z P_s'(f|z) P_t(z, s')]
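As a hypothetical single-frame illustration of where the sparse step sits (reusing the entropic_map sketch above, with made-up numbers), the posterior-weighted counts for all (z, s) pairs are passed through the entropic MAP step instead of being normalized directly, and the source priors and mixture weights are then read back off the sparse joint.

```python
import numpy as np

# entropic_map is the sketch from the previous example.
# Made-up counts for one frame t: 2 sources x 3 bases each; in practice these
# are the posterior-weighted counts the plain M-step would simply normalise.
omega = np.array([5.0, 0.3, 0.1,     # source 0, bases z = 0..2
                  0.2, 4.0, 0.4])    # source 1, bases z = 0..2
Pt_zs = entropic_map(omega, alpha=2.0).reshape(2, 3)  # sparse joint P_t(z, s), rows = sources
Pt_s = Pt_zs.sum(axis=1)                              # P_t(s) = sum_z P_t(z, s)
Pt_z_given_s = Pt_zs / Pt_s[:, None]                  # P_t(z|s) recovered from the joint
```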

  12. Results on real data

  13. Results on real data

  14. Comments • The use of sparsity ensures that the output is a plausible speech signal, devoid of artifacts like distortion and musical noise • An unfortunate side effect is the need to use a very large dictionary • However, a significant reduction in dictionary size can be achieved by using an energy threshold to select only the loudest frames of the training spectrogram as bases • Outperforms trained basis models of the same size
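A minimal sketch of that energy-threshold idea, assuming NumPy; the helper name prune_dictionary and the keep_fraction parameter are illustrative choices, not values from the paper.

```python
import numpy as np

def prune_dictionary(W_train, keep_fraction=0.2, eps=1e-12):
    """Sketch of the energy-threshold idea: keep only the loudest training
    frames (largest total magnitude) as dictionary elements, then normalise
    each kept column into a distribution over frequency."""
    energy = W_train.sum(0)
    k = max(1, int(keep_fraction * W_train.shape[1]))
    loudest = np.argsort(energy)[-k:]
    kept = W_train[:, loudest]
    return kept / (kept.sum(0, keepdims=True) + eps)
```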
