SNR-Dependent Mixture of PLDA for Noise Robust Speaker Verification

Man-Wai Mak. Interspeech 2014. Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China.


Presentation Transcript


  1. SNR-Dependent Mixture of PLDA for Noise Robust Speaker Verification • Man-Wai Mak • Interspeech 2014 • Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China

  2. Contents • Motivation of Work • Conventional PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions

  3. Motivation • Conventional i-vector/PLDA systems use a single PLDA model to handle all SNR conditions. [Diagram labels: Enrollment Utterances, I-Vector/PLDA Scoring, PLDA Score]

  4. Motivation • We argue that a PLDA model should focus on a small range of SNR. [Diagram labels: PLDA Model 1, PLDA Model 2, PLDA Model 3, each with its own PLDA Score]

  5. Distribution of SNR in SRE12 • Each SNR region is handled by a PLDA model.

  6. Proposed Solution • The full spectrum of SNRs is handled by a mixture of PLDA in which the posteriors of the indicator variables depend on the utterance’s SNR. [Diagram labels: SNR Estimator, SNR Posterior Estimator, PLDA Model 1, PLDA Model 2, PLDA Model 3, PLDA Score]

  7. Key Features of Proposed Solution • Verification scores depend not only on the same-speaker and different-speaker likelihoods but also on the posterior probabilities of SNR.

  8. Contents • Motivation of Work • Conventional PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions

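  9. Probabilistic LDA (PLDA) • In PLDA, the i-vectors x are modeled by a factor analyzer of the form: $x_{ij} = m + V z_i + \epsilon_{ij}$, where $x_{ij}$ is the i-vector extracted from the j-th session of the i-th speaker, $m$ is the global mean of all i-vectors, $V$ is the speaker factor loading matrix, $z_i$ is the speaker factor, and $\epsilon_{ij}$ is residual noise with covariance $\Sigma$ • Density of x is $p(x) = \mathcal{N}(x \mid m, VV^{\mathsf{T}} + \Sigma)$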
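The factor-analysis model on this slide can be illustrated by sampling from it. The following is a minimal sketch, assuming the usual form x_ij = m + V z_i + eps_ij with a standard-normal prior on the speaker factor z_i; the function names and the toy dimensions are illustrative, not taken from the presentation.

```python
import numpy as np

def sample_plda(m, V, Sigma, n_speakers, n_sessions, rng):
    """Draw i-vectors x_ij = m + V z_i + eps_ij (assumed PLDA form).

    m: (D,) global mean, V: (D, Q) speaker loading matrix,
    Sigma: (D, D) residual covariance.
    Returns x with shape (n_speakers, n_sessions, D) and z with shape (n_speakers, Q).
    """
    D, Q = V.shape
    z = rng.standard_normal((n_speakers, Q))            # z_i ~ N(0, I), one per speaker
    chol = np.linalg.cholesky(Sigma)                     # Sigma = chol @ chol.T
    eps = rng.standard_normal((n_speakers, n_sessions, D)) @ chol.T  # eps_ij ~ N(0, Sigma)
    x = m + (z @ V.T)[:, None, :] + eps                  # all sessions of a speaker share z_i
    return x, z

def marginal_cov(V, Sigma):
    """Covariance of x after integrating out z: V V^T + Sigma."""
    return V @ V.T + Sigma
```

With one session per speaker, the sample covariance of the drawn i-vectors should approach `marginal_cov(V, Sigma)` as the number of speakers grows, which is the density stated on the slide.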

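  10. Probabilistic LDA (PLDA) • The PLDA parameters ω={m, V, Σ} are estimated with the EM algorithm by maximizing the auxiliary function $Q(\omega' \mid \omega) = \mathbb{E}_{Z}\left\{ \sum_{ij} \ln\left[ p(x_{ij} \mid z_i, \omega')\, p(z_i) \right] \,\middle|\, \mathcal{X}, \omega \right\}$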
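To make the estimation step concrete, here is a sketch of the E-step for conventional PLDA under the standard factor-analysis assumptions (z_i ~ N(0, I), residual covariance Σ shared by all sessions). The slide's own equations are not preserved in the transcript, so this follows the textbook posterior of the speaker factor; the function name `plda_e_step` is hypothetical.

```python
import numpy as np

def plda_e_step(X_i, m, V, Sigma):
    """Posterior moments of the speaker factor z_i given all sessions of speaker i.

    X_i: (H, D) i-vectors of one speaker, m: (D,), V: (D, Q), Sigma: (D, D).
    Returns E[z_i | X_i] with shape (Q,) and E[z_i z_i^T | X_i] with shape (Q, Q).
    """
    H = X_i.shape[0]
    Q = V.shape[1]
    Sinv_V = np.linalg.solve(Sigma, V)                   # Sigma^{-1} V
    L = np.eye(Q) + H * (V.T @ Sinv_V)                   # posterior precision of z_i
    Ez = np.linalg.solve(L, Sinv_V.T @ (X_i - m).sum(axis=0))
    Ezz = np.linalg.inv(L) + np.outer(Ez, Ez)            # second moment
    return Ez, Ezz
```

The M-step would then re-estimate m, V and Σ from these moments accumulated over all speakers; it is omitted here to keep the sketch short.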

  11. Contents • Motivation of Work • Conventional PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions

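  12. Mixture of PLDA • Model parameters of mPLDA: $\theta = \{\pi_k, \mu_k, \sigma_k, m_k, V_k, \Sigma_k\}_{k=1}^{K}$, where $\{\pi_k, \mu_k, \sigma_k\}$ are for modeling the SNR of utterances and $\{m_k, V_k, \Sigma_k\}$ are for modeling SNR-dependent i-vectors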

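  13. Generative Model for mPLDA • $p(x_{ij} \mid \ell_{ij}) = \sum_{k=1}^{K} P(y_{ijk}=1 \mid \ell_{ij})\, \mathcal{N}(x_{ij} \mid m_k, V_k V_k^{\mathsf{T}} + \Sigma_k)$, where $\ell_{ij}$ is the SNR in dB and the posterior prob. of SNR is $P(y_{ijk}=1 \mid \ell_{ij}) = \dfrac{\pi_k\, \mathcal{N}(\ell_{ij} \mid \mu_k, \sigma_k^2)}{\sum_{k'=1}^{K} \pi_{k'}\, \mathcal{N}(\ell_{ij} \mid \mu_{k'}, \sigma_{k'}^2)}$, with mixture weights $\pi_k$, SNR means $\mu_k$ and SNR standard deviations $\sigma_k$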
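The SNR posterior on this slide can be sketched as the responsibilities of a one-dimensional Gaussian mixture over the utterance SNR. This is an illustration under that assumption; the parameter names (`pi`, `mu`, `sigma`) and the function name are chosen here for readability and are not confirmed by the transcript.

```python
import numpy as np

def snr_posteriors(snr_db, pi, mu, sigma):
    """Posterior probability of each mixture component given an utterance SNR (dB).

    snr_db: scalar or (N,) array of SNRs, pi/mu/sigma: (K,) mixture weights,
    SNR means and SNR standard deviations (assumed 1-D GMM over SNR).
    Returns an (N, K) array whose rows sum to 1.
    """
    snr = np.atleast_1d(np.asarray(snr_db, dtype=float))[:, None]
    log_w = (np.log(pi)
             - 0.5 * np.log(2.0 * np.pi * sigma ** 2)
             - 0.5 * ((snr - mu) / sigma) ** 2)          # log pi_k + log N(snr | mu_k, sigma_k^2)
    log_w -= log_w.max(axis=1, keepdims=True)            # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)
```

For example, with three components centred at hypothetical SNRs of 0, 10 and 20 dB, an utterance at 0 dB is assigned mostly to the first component and one at 20 dB mostly to the third.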

  14. PLDA vs mPLDA Generative Model

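  15. Likelihood-Ratio Scores of mPLDA • Same-speaker likelihood: $p(x_s, x_t \mid \text{same-speaker}, \ell_s, \ell_t)$, where $x_s, x_t$ are the i-vectors of the target and test speakers and $\ell_s, \ell_t$ are the SNRs of the target and test utterances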

  16. Likelihood-Ratio Scores of mPLDA • Different-speaker likelihood • Verification Score = Same-speaker likelihood / Different-speaker likelihood
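The likelihood ratio can be sketched in code as follows. The slide equations are missing from the transcript, so this is one plausible reading rather than the author's exact formula: under the same-speaker hypothesis the target and test i-vectors share one speaker factor z but may fall in different mixture components, and each component pair is weighted by the two SNR posteriors; under the different-speaker hypothesis each i-vector is scored by its own SNR-weighted mixture. All names are hypothetical.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

def logsumexp(values):
    values = np.asarray(values, dtype=float)
    top = values.max()
    return top + np.log(np.exp(values - top).sum())

def mplda_llr(xs, xt, gs, gt, m, V, Sigma):
    """Log-likelihood ratio for a target/test pair under an assumed mPLDA model.

    xs, xt: (D,) i-vectors; gs, gt: (K,) SNR posteriors of the two utterances;
    m, V, Sigma: length-K lists of component means, loadings and covariances.
    """
    K = len(m)
    total = [V[k] @ V[k].T + Sigma[k] for k in range(K)]  # per-component marginal covariance
    same = []
    for ks in range(K):
        for kt in range(K):
            cross = V[ks] @ V[kt].T                       # correlation induced by the shared z
            cov = np.block([[total[ks], cross], [cross.T, total[kt]]])
            mean = np.concatenate([m[ks], m[kt]])
            same.append(np.log(gs[ks]) + np.log(gt[kt])
                        + log_gauss(np.concatenate([xs, xt]), mean, cov))
    diff_s = logsumexp([np.log(gs[k]) + log_gauss(xs, m[k], total[k]) for k in range(K)])
    diff_t = logsumexp([np.log(gt[k]) + log_gauss(xt, m[k], total[k]) for k in range(K)])
    return logsumexp(same) - diff_s - diff_t
```

With a single component (K = 1) this reduces to the conventional PLDA log-likelihood ratio, which is a convenient sanity check on the sketch.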

  17. PLDA vs mPLDA Auxiliary Function • PLDA: • Mixture of PLDA: • Quantities labeled on the slide: no. of mixtures; latent indicator variables; latent speaker factors; SNR of training utterances; session indexes; speaker indexes

  18. PLDA vs mPLDA E-Step

  19. PLDA versus mPLDA M-Step

  20. Contents • Motivation of Work • Conventional PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions

  21. Experiments • Evaluation dataset: Common evaluation condition 2 of NIST SRE 2012 core set • Parameterization: 19 MFCCs together with energy plus their 1st and 2nd derivatives → 60-dim • UBM: gender-dependent, 1024 mixtures • Total Variability Matrix: gender-dependent, 500 total factors • I-Vector Preprocessing: whitening by WCCN then length normalization, followed by LDA (500-dim → 200-dim) and WCCN
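Two of the preprocessing steps listed above, WCCN and length normalization, can be sketched as follows. This is a generic illustration rather than the exact recipe used in the experiments: the WCCN projection is taken as the Cholesky factor of the inverse average within-speaker covariance, and the LDA step is omitted. Function names are hypothetical.

```python
import numpy as np

def within_class_cov(X, labels):
    """Average within-speaker covariance of i-vectors X (N, D)."""
    classes = np.unique(labels)
    W = np.zeros((X.shape[1], X.shape[1]))
    for c in classes:
        Xc = X[labels == c]
        Xc = Xc - Xc.mean(axis=0)                         # centre each speaker's i-vectors
        W += Xc.T @ Xc / len(Xc)
    return W / len(classes)

def wccn_matrix(X, labels):
    """Projection B with B B^T = W^{-1}; apply it to row vectors as X @ B."""
    return np.linalg.cholesky(np.linalg.inv(within_class_cov(X, labels)))

def length_normalize(X):
    """Scale every i-vector to unit Euclidean norm."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)
```

After `X @ wccn_matrix(X, labels)` the within-speaker covariance of the training data becomes the identity, and `length_normalize` then places every i-vector on the unit sphere.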

  22. Experiments • In NIST 2012 SRE, training utterances from telephone channels are clean, but some of the test utterances are noisy. • We used the FaNT tool to add babble noise to the clean training utterances. [Diagram labels: Babble noise, Utterances from microphone channels, FaNT, From telephone channels]
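The idea of corrupting clean speech at a chosen SNR can be illustrated with a few lines of NumPy. This is not FaNT and does not reproduce its filtering or speech-level measurement; it is a simplified sketch that scales the noise using plain average signal power, with a hypothetical function name.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Mix noise into speech so that the power ratio equals snr_db (in dB).

    speech, noise: 1-D float arrays; the noise is repeated or truncated
    to the length of the speech signal.
    """
    noise = np.resize(noise, speech.shape)                # tile/truncate to match length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

Calling it with several target SNRs on the same clean utterance yields training material that spans a range of noise levels.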

  23. Performance on SRE12 • Train on tel+mic speech and test on noisy tel speech (CC4) • Train on tel+mic speech and test on tel speech recorded in noisy environments (CC5) • Use FaNT and a VAD to determine the SNR of test utts. See our ISCSLP14 paper

  24. Performance on SRE12 • Train on tel+mic speech and test on noisy tel speech (CC4) • Use FaNT and a VAD to determine the SNR of test utts. [Figure: Female and Male panels comparing PLDA and mPLDA]

  25. Conclusions • Mixture of SNR-dependent PLDA is a flexible model that can handle noisy speech with a wide range of SNR • The contributions of the mixtures are probabilistically combined based on the SNR of the test utterances and the target speaker’s utterances • Results show that the mixture PLDA performs better than conventional PLDA whenever the SNR of test utterances varies widely.

  26. Hard-Decision Mixture of PLDA

  27. Training of mPLDA • Auxiliary function: • Quantities labeled on the slide: no. of mixtures; latent indicator variables; latent speaker factors; SNR of training utterances; session indexes; speaker indexes

  28. PLDA Scoring xs and xt share the same z

  29. Probabilistic LDA (PLDA) • PLDA example: 2-D data in 1-D subspace • Take a sample z according to p(z) • Source: S. Prince, “Computer vision: models, learning and inference”, 2012
