Learn about probabilistic latent variable decomposition techniques for signal extraction from mixed audio recordings. Explore methods like Blind Source Separation and Informed Source Separation for improved signal separation and reconstruction.
Single-Channel Audio Source Separation Based on Probabilistic Latent Variable Decomposition
Ph.D. Student Jeongsoo Park
Introduction
[Figure labels: Spectrogram, Spectral basis, Weight, Reconstruction, Desired signal, Background signal, Residuals]
• Signal separation from mixed single-channel recordings
• Time-frequency components of the desired signal are selected from the mixed signal and used to reconstruct it
• Blind Source Separation (BSS): no information about the desired source is available
• Informed Source Separation (ISS): spectral and temporal information about the desired source is available
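The selection-and-reconstruction step above can be sketched as a soft time-frequency mask. This is only an illustrative sketch, not the method of the slides: it assumes magnitude-spectrogram estimates of the desired and background signals are already available (for example from a decomposition), and the function name `soft_mask_separate` and its arguments are hypothetical.

```python
import numpy as np

def soft_mask_separate(mixture_mag, desired_est, background_est, eps=1e-12):
    """Split a mixture magnitude spectrogram with a ratio (soft) mask.

    All inputs are non-negative arrays of shape (n_freq, n_time).
    `desired_est` and `background_est` are assumed to be model estimates
    of the two sources; they are not produced by this function.
    """
    # Fraction of each time-frequency bin attributed to the desired source
    mask = desired_est / (desired_est + background_est + eps)
    desired = mask * mixture_mag        # selected time-frequency components
    residual = mixture_mag - desired    # what is left for the background
    return desired, residual, mask
```

The desired and residual parts add up to the mixture by construction, so no energy is invented; a time-domain signal would still require the mixture phase and an inverse STFT, which are omitted here.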
Conventional approaches on BSS (1/5)
• Non-negative Matrix Factorization (NMF)
  • A group of algorithms in multivariate analysis and linear algebra in which a matrix is factorized into (usually) two matrices
  • NMF minimizes the error between a non-negative matrix V and the product WH while restricting W and H to be entry-wise non-negative
  • Two commonly used cost functions (Lee & Seung, 2001): the squared Euclidean distance and the generalized Kullback-Leibler divergence between V and WH
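As a minimal sketch of the factorization described above, the following uses the multiplicative update rules of Lee & Seung (2001) for both cost functions. The function name `nmf`, the random initialization, the iteration count, and the small constant `eps` are assumptions made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, cost="euclidean", eps=1e-12, seed=0):
    """Factorize a non-negative matrix V (n_freq x n_time) as V ~ W @ H.

    cost="euclidean" reduces ||V - WH||^2; cost="kl" reduces the
    generalized Kullback-Leibler divergence D(V || WH).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    # Random non-negative initialization (an assumption of this sketch)
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    for _ in range(n_iter):
        if cost == "euclidean":
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        elif cost == "kl":
            WH = W @ H + eps
            H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        else:
            raise ValueError("cost must be 'euclidean' or 'kl'")
    return W, H
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative without any explicit projection. For audio, V would typically be a magnitude spectrogram, the columns of W spectral bases, and the rows of H their weights over time.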
Conventional approaches on BSS (2/5)
• Independent Component Analysis (ICA)
  • Separates a multivariate signal into additive subcomponents, assuming that the source signals are non-Gaussian and mutually statistically independent
  • When the independence assumption holds, blind ICA separation of a mixed signal gives very good results
  • Definitions of independence for ICA
    • Minimization of Mutual Information (MMI)
    • Maximization of non-Gaussianity
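The non-Gaussianity criterion above can be illustrated with a small FastICA-style fixed-point iteration. This is a hedged sketch rather than the presenter's implementation: it assumes a multi-channel input with as many channels as sources, uses the tanh nonlinearity, and the names `fast_ica` and `_sym_decorrelate` are hypothetical.

```python
import numpy as np

def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so that the rows of W become orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(X, n_iter=200, tol=1e-8, seed=0):
    """Estimate independent components from mixtures X (n_channels x n_samples)."""
    X = X - X.mean(axis=1, keepdims=True)
    # Whitening: Z has identity covariance
    d, E = np.linalg.eigh(X @ X.T / X.shape[1])
    Z = (E / np.sqrt(d)).T @ X
    n, n_samples = Z.shape
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)                  # nonlinearity g(w^T z)
        G_prime = 1.0 - G ** 2              # its derivative g'(w^T z)
        # Fixed-point update: E{z g(w^T z)} - E{g'(w^T z)} w, for every row w
        W_new = G @ Z.T / n_samples - np.diag(G_prime.mean(axis=1)) @ W
        W_new = _sym_decorrelate(W_new)
        delta = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W @ Z  # components, recovered up to order, sign, and scale
```

ICA of this kind needs at least as many observed channels as sources, so it does not apply directly to a single-channel recording.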
Conventional approaches on BSS (3/5)
[Figure: Source separation from sound mixtures. Panels: Training (speech sample of the target speaker), Result]
• Probabilistic Latent Component Analysis (PLCA)
  • If we have sufficient information about a source (speaker), we can extract that source's signal from sound mixtures
Conventional approaches on BSS (4/5)
[Figure labels: Frequency distribution, How they appear in time, Weight, Probability mass function, latent variables z1 and z2]
• Probabilistic Latent Component Analysis (PLCA)
  • Interprets the time-frequency representation of an audio signal (spectrogram) as a 2D histogram (outcomes of a discrete random process)
  • A vector is interpreted as a weighted sum of the latent variables' distributions
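The histogram interpretation above can be sketched with the symmetric PLCA model P(f, t) = Σ_z P(z) P(f|z) P(t|z), fitted by EM. This is a minimal sketch under that assumption; the slides do not state which PLCA variant was used, and the function name `plca`, the initialization, and `eps` are illustrative choices.

```python
import numpy as np

def plca(V, n_components, n_iter=100, eps=1e-12, seed=0):
    """Fit symmetric PLCA to a non-negative spectrogram V (n_freq x n_time).

    Returns P(z) with shape (Z,), P(f|z) with shape (n_freq, Z), and
    P(t|z) with shape (n_time, Z).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    Pz = np.full(n_components, 1.0 / n_components)       # weights
    Pf_z = rng.random((n_freq, n_components))             # frequency distributions
    Pt_z = rng.random((n_time, n_components))             # how they appear in time
    Pf_z /= Pf_z.sum(axis=0, keepdims=True)
    Pt_z /= Pt_z.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z | f, t), shape (n_freq, n_time, Z)
        joint = Pz[None, None, :] * Pf_z[:, None, :] * Pt_z[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + eps)
        # M-step: re-estimate each distribution from the weighted counts
        weighted = V[:, :, None] * post
        Pf_z = weighted.sum(axis=1)
        Pt_z = weighted.sum(axis=0)
        Pz = weighted.sum(axis=(0, 1))
        Pf_z /= Pf_z.sum(axis=0, keepdims=True) + eps
        Pt_z /= Pt_z.sum(axis=0, keepdims=True) + eps
        Pz /= Pz.sum() + eps
    return Pz, Pf_z, Pt_z
```

Each column of `Pf_z` plays the role of a spectral basis and each column of `Pt_z` its activation over time. The posterior tensor has size n_freq × n_time × Z, so this direct form is only practical for small spectrograms.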
Conventional approaches on BSS (5/5)
• Performance evaluation
Conventional approaches on ISS (1/2)
[Figure panels: User-guided signal learning, Mixed signal separation]
• Separation by Humming (Smaragdis et al., 2009)
  • A user-provided guide signal (e.g., humming) is given to indicate the desired signal
Conventional approaches on ISS (2/2)
• Performance evaluation
Ideas (1/2)
[Figure: spectrogram with Time and Frequency axes; components are dense near 0 Hz and sparse near 5513 Hz]
• Goal
  • Speeding up the EM algorithm
• Approach
  • 1. Exploit the sparsity of high-frequency components
  • 2. Apply Zwicker's model
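One possible reading of the second approach is to pool frequency bins into critical bands so that EM runs on fewer rows. The slide does not say how Zwicker's model would be applied, so everything here is an assumption: the sketch uses the Zwicker and Terhardt approximation of the Bark scale, and the function names `hz_to_bark` and `pool_to_bark_bands` are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of the Bark (critical band) scale."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def pool_to_bark_bands(V, sample_rate):
    """Sum the rows of a spectrogram V (n_bins x n_time) within each Bark band.

    Assumes the bins are linearly spaced from 0 Hz to sample_rate / 2.
    Returns the pooled spectrogram and the band index of every bin.
    """
    n_bins = V.shape[0]
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    band = np.floor(hz_to_bark(freqs)).astype(int)
    pooled = np.zeros((band.max() + 1, V.shape[1]))
    np.add.at(pooled, band, V)   # accumulate every bin into its band
    return pooled, band
```

Since critical bands widen with frequency, many sparse high-frequency bins collapse into a few rows while low-frequency resolution is largely kept, which is one way the two approaches on this slide could work together.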
Ideas (2/2)
[Figure: a sequence of user taps ("tap tap tap ..."), followed by Formant extraction and Rearrange steps]
• Goal
  • Tapping-based ISS
• Approach
  • Extract formants based on the tapping information
  • Using the formant and temporal information, we might be able to extract the desired source