BGU
Multiple Pitch Tracking for Blind Source Separation Using a Single Microphone
Joseph Tabrikian
Dept. of Electrical and Computer Engineering, Ben-Gurion University of the Negev
Workshop on Speech Enhancement and Multichannel Audio Processing, Technion, 22.2.2007
Outline
• Motivation
• Single source pitch estimation and tracking
• Multiple source pitch estimation and tracking
• Experiments
• Conclusion
Motivation
• Speech enhancement
• Sensitivity of many audio processing algorithms to interference. For example:
  • Automatic speech/speaker recognition
  • Speech/music compression
• Single microphone blind source separation (BSS)
• Karaoke
Single Source - Modeling
• Voiced frames - harmonic model with additive Gaussian noise:
• In matrix notation:
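A harmonic model of this kind is commonly written as in the sketch below. The notation is an assumption made for illustration and is not necessarily the one used on the slide: y[n] is a voiced frame of N samples, ω0 the fundamental (pitch) frequency in radians per sample, K the number of harmonics, a_k and b_k the unknown harmonic amplitudes, and v[n] the additive Gaussian noise.

```latex
% Assumed notation: harmonic model for one voiced frame
y[n] = \sum_{k=1}^{K} \bigl[ a_k \cos(k\omega_0 n) + b_k \sin(k\omega_0 n) \bigr] + v[n],
\qquad n = 0, \dots, N-1

% The same model in matrix notation
\mathbf{y} = \mathbf{A}(\omega_0)\,\mathbf{b} + \mathbf{v},
\qquad
[\mathbf{A}(\omega_0)]_{n,k} = \cos(k\omega_0 n), \quad
[\mathbf{A}(\omega_0)]_{n,K+k} = \sin(k\omega_0 n),
\qquad
\mathbf{b} = [a_1, \dots, a_K, b_1, \dots, b_K]^T
```

In this form the pitch enters only through the matrix A(ω0), while the amplitudes in b enter linearly, which is what makes a search over ω0 with closed-form amplitude estimates possible.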
Single Source - Pitch Tracking
• Maximum Likelihood (ML) estimator:
• Pitch tracking:
  • The data vector at the mth frame:
  • The pitch is modeled as a first-order Markov process:
• Maximum A-posteriori Probability (MAP) pitch tracking via the Viterbi algorithm (Tabrikian-Dubnov-Dickalov 2004)
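The two steps named on this slide, a per-frame ML search and a Viterbi pass over a first-order Markov pitch model, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes white Gaussian noise of known variance, a discrete grid of candidate pitches, a uniform prior on the first frame, and a Gaussian random-walk transition between frames. The function names, the `sigma_f0` parameter and the synthetic test signal are all hypothetical.

```python
import numpy as np

def harmonic_matrix(f0, n_harm, n_samples, fs):
    """Columns are cos/sin of the first n_harm harmonics of f0."""
    n = np.arange(n_samples)
    k = np.arange(1, n_harm + 1)
    phase = 2.0 * np.pi * f0 / fs * np.outer(n, k)
    return np.hstack([np.cos(phase), np.sin(phase)])

def frame_loglik(frame, f0_grid, n_harm, fs, noise_var):
    """Log-likelihood of each candidate f0, up to an additive constant.

    Assumption: white Gaussian noise with known variance; the unknown
    amplitudes are replaced by their least-squares estimates.
    """
    scores = np.empty(len(f0_grid))
    for i, f0 in enumerate(f0_grid):
        A = harmonic_matrix(f0, n_harm, len(frame), fs)
        coef, *_ = np.linalg.lstsq(A, frame, rcond=None)
        # Energy of the projection of the frame onto the harmonic subspace.
        scores[i] = np.sum((A @ coef) ** 2) / (2.0 * noise_var)
    return scores

def viterbi_map_track(loglik, f0_grid, sigma_f0):
    """MAP pitch track for loglik of shape (frames, candidates).

    Assumption: Gaussian random-walk transitions with std sigma_f0 (Hz)
    and a uniform prior on the first frame.
    """
    n_frames, n_cand = loglik.shape
    diff = f0_grid[:, None] - f0_grid[None, :]
    log_trans = -0.5 * (diff / sigma_f0) ** 2          # [to, from]
    log_trans -= np.log(np.sum(np.exp(log_trans), axis=0, keepdims=True))
    delta = loglik[0].copy()
    back = np.zeros((n_frames, n_cand), dtype=int)
    for m in range(1, n_frames):
        cand = log_trans + delta[None, :]
        back[m] = np.argmax(cand, axis=1)               # best predecessor
        delta = loglik[m] + np.max(cand, axis=1)
    path = np.empty(n_frames, dtype=int)
    path[-1] = np.argmax(delta)
    for m in range(n_frames - 1, 0, -1):
        path[m - 1] = back[m, path[m]]
    return f0_grid[path]

# Synthetic example: 5 frames of a 200 Hz tone with 3 harmonics plus noise.
fs, n_harm, frame_len, true_f0 = 8000.0, 3, 256, 200.0
rng = np.random.default_rng(0)
t = np.arange(5 * frame_len) / fs
clean = sum(np.cos(2 * np.pi * k * true_f0 * t) for k in range(1, n_harm + 1))
noisy = clean + 0.1 * rng.standard_normal(t.size)
frames = noisy.reshape(5, frame_len)

f0_grid = np.arange(100.0, 401.0, 5.0)
loglik = np.array([frame_loglik(f, f0_grid, n_harm, fs, noise_var=0.01)
                   for f in frames])
ml_per_frame = f0_grid[np.argmax(loglik, axis=1)]       # frame-by-frame ML
track = viterbi_map_track(loglik, f0_grid, sigma_f0=10.0)  # MAP track
```

The per-frame ML estimate ignores continuity between frames. The Viterbi pass adds the transition log-probabilities, so an isolated octave error in one frame is penalised when its neighbours agree on a different pitch.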
Single Source - Voicing Decision
• Unvoiced model
  • Colored Gaussian noise model:
• Voiced/unvoiced decision by the Generalized Likelihood Ratio Test (GLRT): (Fisher-Tabrikian-Dubnov 2006)
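To illustrate how a GLRT voicing decision works, here is a deliberately simplified sketch. It swaps the colored-noise unvoiced model named on the slide for white Gaussian noise of unknown variance, so it is not the test from the cited work. Under that assumption the log generalized likelihood ratio reduces to (N/2) log of the ratio between the frame energy and the residual energy left after fitting the best harmonic model. The function names, the pitch grid and the threshold value are hypothetical.

```python
import numpy as np

def harmonic_matrix(f0, n_harm, n_samples, fs):
    """Columns are cos/sin of the first n_harm harmonics of f0."""
    n = np.arange(n_samples)
    k = np.arange(1, n_harm + 1)
    phase = 2.0 * np.pi * f0 / fs * np.outer(n, k)
    return np.hstack([np.cos(phase), np.sin(phase)])

def glrt_voicing(frame, f0_grid, n_harm, fs, threshold):
    """Return (is_voiced, statistic) for one frame.

    Simplifying assumption: white Gaussian noise with unknown variance
    under both hypotheses (the slide uses a colored-noise model).
    """
    total = np.sum(frame ** 2)
    best_proj = 0.0
    for f0 in f0_grid:
        A = harmonic_matrix(f0, n_harm, len(frame), fs)
        coef, *_ = np.linalg.lstsq(A, frame, rcond=None)
        best_proj = max(best_proj, np.sum((A @ coef) ** 2))
    # Residual energy under the voiced hypothesis, guarded against zero.
    residual = max(total - best_proj, np.finfo(float).tiny)
    stat = 0.5 * len(frame) * np.log(total / residual)
    return stat > threshold, stat

# Synthetic check: one harmonic frame and one noise-only frame.
fs, n_harm, frame_len = 8000.0, 3, 256
rng = np.random.default_rng(1)
t = np.arange(frame_len) / fs
voiced_frame = (sum(np.cos(2 * np.pi * k * 200.0 * t) for k in range(1, 4))
                + 0.1 * rng.standard_normal(frame_len))
noise_frame = 0.1 * rng.standard_normal(frame_len)

f0_grid = np.arange(100.0, 401.0, 5.0)
voiced_decision, voiced_stat = glrt_voicing(voiced_frame, f0_grid, n_harm, fs, 50.0)
noise_decision, noise_stat = glrt_voicing(noise_frame, f0_grid, n_harm, fs, 50.0)
```

In practice the threshold would be set from a target false-alarm rate rather than fixed by hand as it is here.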
Multiple Sources
• ML estimator under a model with unknown signal and unknown (Gaussian) noise covariance (Harmanci-Tabrikian-Krolik 2000)
Multiple Sources
• Voiced model:
  • v includes other interferences.
  • The covariance of v is unknown.
• Using J overlapping subframes of size Ls (2K+1 < J < Ls), the jth column of the data matrix:
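One way to arrange a frame into J overlapping subframes of size Ls is sketched below. The hop between consecutive subframes is not stated on the slide, so the `hop` parameter (default one sample) is an assumption, as are the function name and the example values.

```python
import numpy as np

def subframe_matrix(frame, sub_len, n_sub, hop=1):
    """Return an (sub_len x n_sub) matrix whose jth column is the jth subframe.

    Assumption: subframe j starts at sample j * hop of the frame.
    """
    frame = np.asarray(frame)
    needed = sub_len + hop * (n_sub - 1)
    if needed > frame.size:
        raise ValueError(f"frame has {frame.size} samples, {needed} needed")
    # idx[i, j] = i + j * hop: row i of column j is sample i of subframe j.
    idx = np.arange(sub_len)[:, None] + hop * np.arange(n_sub)[None, :]
    return frame[idx]

# Example: a 10-sample frame, J = 3 subframes of size Ls = 4.
Y = subframe_matrix(np.arange(10), sub_len=4, n_sub=3)
```

Each column of `Y` then plays the role of one data vector, and the J columns together can be used to form sample statistics for a frame whose noise covariance is unknown.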
Multiple Sources
• Pitch tracking:
  • The data vector at the mth frame:
  • The pitch is modeled as a first-order Markov process
• Maximum A-posteriori Probability (MAP) pitch tracking via the Viterbi algorithm
Multiple Sources - Voicing Decision
• Unvoiced model
  • Colored Gaussian noise model:
• Voiced/unvoiced decision by the GLRT: (Fisher-Tabrikian-Dubnov 2007)
Multiple Source Models
• Exact ML for the strongest voiced signal, and “locally ML” for other voiced signals
• Likelihood function
Experiments - Single Source
Experiments - Two Sources
Experiments - Voicing Decision
Experiments - Voicing Decision
Conclusions
• ML pitch estimators for single and multiple sources have been developed under the harmonic model for voiced frames.
• The derived likelihood functions under the two models allow implementation of the Viterbi algorithm for MAP pitch tracking.
• The GLRT for voicing decision is derived under the two models.
• Future work:
  • Development of multiple hypothesis tracking methods for single microphone BSS
  • Adaptive estimation of the number of harmonics