290 likes | 674 Views
SNR-Dependent Mixture of PLDA for Noise Robust Speaker Verification. Man-Wai Mak. Interspeech 2014. Department of Electronic and Information Engineering The Hong Kong Polytechnic University, Hong Kong SAR, China. Contents. Motivation of Work Conventional PLDA
E N D
SNR-Dependent Mixture of PLDA for Noise Robust Speaker Verification Man-Wai Mak Interspeech 2014 Department of Electronic and Information Engineering The Hong Kong Polytechnic University, Hong Kong SAR, China
Contents • Motivation of Work • Conventional PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions 2
Motivation I-Vector/PLDA Scoring PLDA Score Enrollment Utterances Conventional i-vector/PLDA systems use a single PLDA model to handle all SNR conditions.
Motivation PLDA Model 1 PLDA Score PLDA Model 2 PLDA Score PLDA Model 3 PLDA Score We argue that a PLDA model should focus on a small range of SNR.
Distribution of SNR in SRE12 Each SNR region is handled by a PLDA Model
Proposed Solution PLDA Model 1 PLDA Score PLDA Model 2 SNR Estimator SNR Posterior Estimator PLDA Model 3 The full spectrum of SNRs is handled by a mixture of PLDA in which the posteriors of the indicator variables depend on the utterance’s SNR.
Key Features of Proposed Solution • Verification scores depend not only on the same-speaker and different-speaker likelihoods but also on the posterior probabilities of SNR.
Contents • Motivation of Work • Conventional PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions
Probabilistic LDA (PLDA) Residual noise with covariance Σ Speaker factor i-vector extracted from the j-th session of the i-th speaker Global mean of all i-vectors Speaker factor loading matrix • Density of x is • In PLDA, the i-vectors x are modeled by a factor analyzer of the form:
Probabilistic LDA (PLDA) • The PLDA parameters ω={m, V, Σ} are estimated by maximizing
Contents • Motivation of Work • Conventional PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions
Mixture of PLDA For modeling SNR of utts. For modeling SNR-dependent i-vectors • Model Parameters of mPLDA: 2
Generative Model for mPLDA : SNR in dB where the posterior prob of SNR is Posterior of SNR
PLDA vs mPLDA Generative Model
Likelihood-Ratio Scores of mPLDA • Same-speaker likelihood: SNR of target and test utterances i-vectors of target and test speakers
Likelihood-Ratio Scores of mPLDA • Different-speaker likelihood: Same-speaker likelihood • Verification Score = Different-speaker likelihood 16
PLDA vs mPLDA Auxiliary Function PLDA: Mixture of PLDA: No. of mixtures Latent indicator variables: Latent speaker factors: SNR of training utterances: Session indexes Speaker indexes
PLDA vs mPLDA E-Step
PLDA versus mPLDA M-Step
Contents • Motivation of Work • Conventional PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions
Experiments Evaluation dataset:Common evaluation condition 2 of NIST SRE 2012 core set. Parameterization: 19 MFCCs together with energy plus their 1st and 2nd derivatives 60-Dim UBM: gender-dependent, 1024 mixtures Total Variability Matrix: gender-dependent, 500 total factors I-Vector Preprocessing: Whitening by WCCN then length normalization Followed by LDA (500-dim 200-dim) and WCCN
Experiments • In NIST 2012 SRE, training utterances from telephone channels are clean, but some of the test utterances are noisy. • We used the FaNT tool to add babble noise to the clean training utterances Babble noise Utterances from microphone channels FaNT From telephone channels
Performance on SRE12 • Train on tel+mic speech and test on noisy tel speech (CC4) • Train on tel+mic speech and test on tel speech recorded in noisy environments (CC5) • Use FaNT and a VAD to determine the SNR of test utts. See our ISCSLP14 paper
Performance on SRE12 • Train on tel+mic speech and test on noisy tel speech (CC4) • Use FaNT and a VAD to determine the SNR of test utts. Female Male PLDA PLDA mPLDA mPLDA
Conclusions • Mixture of SNR-dependent PLDA is a flexible model that can handle noisy speech with a wide range of SNR • The contribution of the mixtures are probabilistically combined based on the SNR of the test utterances and the target-speaker’s utterances • Results show that the mixture PLDA performs better than conventional PLDA whenever the SNR of test utterances varies widely.
Training of mPLDA • Auxiliary function: No. of mixtures where Latent indicator variables: Latent speaker factors: SNR of training utterances: Session indexes Speaker indexes
PLDA Scoring xs and xt share the same z
Probabilistic LDA (PLDA) z Take a sample according to p(z) Source: S. Prince, “Computer vision: models, learning and inference”, 2012 • PLDA example: 2-D data in 1-D subspace