Fast Scoring for Mixture of PLDA in I-Vector/PLDA Speaker Verification

Man-Wai Mak, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University. APSIPA 2015.

Contents • Motivation of Work • Conventional PLDA vs. Mixture of PLDA • Fast Scoring for Mixture of PLDA • Experiments on NIST 2012 SRE • Conclusions

Motivation • Conventional i-vector/PLDA systems use a single PLDA model to handle all SNR conditions. [Figure labels: enrollment i-vectors; PLDA model; PLDA score]

Motivation • We argue that a PLDA model should focus on a small range of SNR. [Figure labels: PLDA Models 1 to 3, each with its own PLDA score]

Proposed Solution • The full spectrum of SNRs is handled by a mixture of PLDA in which the posteriors of the indicator variables depend on the utterance's SNR (Mak, Interspeech14; Mak et al., T-ASLP 16). [Figure labels: SNR estimator; SNR posterior estimator; PLDA Models 1 to 3; PLDA score] • Reference: M.W. Mak, X.M. Pang and J.T. Chien, "Mixture of PLDA for Noise Robust I-Vector Speaker Verification", IEEE/ACM Trans. on Audio, Speech and Language Processing, vol. 24, no. 1, pp. 130-142, Jan. 2016.

Key Features of Proposed Solution • Mixture of PLDA was found to perform much better than conventional PLDA when the test utterances exhibit a wide range of SNR. • However, the scoring function of this model is significantly more complex than that of conventional PLDA. • This paper proposes a method to reduce the scoring time by up to 60%.

Contents • Motivation of Work • Conventional iVector-PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions

I-Vectors • A low-dimensional representation of the entire utterance. • Factor analysis model: $\boldsymbol{\mu}_s = \boldsymbol{\mu}^{(b)} + \mathbf{T}\mathbf{x}_s$, where $\boldsymbol{\mu}_s$ is the speaker- and channel-dependent supervector, $\boldsymbol{\mu}^{(b)}$ is the UBM supervector, $\mathbf{T}$ is the low-rank total variability matrix, and $\mathbf{x}_s$ is the speaker- and channel-dependent latent factor. • Given $\mathbf{T}$ and an utterance of speaker s, the posterior mean of the latent factor $\mathbf{x}_s$ is the i-vector representing speaker s. • Do the same for test speakers. • Training is totally unsupervised. • I-vectors contain both speaker and channel information.

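Probabilistic LDA (PLDA) • In PLDA, the i-vectors x are modeled by a factor analyzer of the form: $\mathbf{x}_{ij} = \mathbf{m} + \mathbf{V}\mathbf{z}_i + \boldsymbol{\epsilon}_{ij}$, where $\mathbf{x}_{ij}$ is the i-vector extracted from the j-th session of the i-th speaker, $\mathbf{m}$ is the global mean of all i-vectors, $\mathbf{V}$ is the low-rank speaker factor loading matrix, $\mathbf{z}_i$ is the speaker factor, and $\boldsymbol{\epsilon}_{ij}$ is residual noise with covariance $\boldsymbol{\Sigma}$. • V is trained using the i-vectors of many speakers, each with multiple sessions. • Speaker labels are used in training. • The aim is to suppress channel effects on the verification scores.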

PLDA Scoring
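The scoring equation on this slide did not survive in the transcript. As a rough illustration only, the sketch below computes the standard log-likelihood ratio for a simplified Gaussian PLDA model x = m + V z + e, with e having covariance Sigma. The function names `log_gauss` and `plda_llr` and the toy parameters are assumptions made for this example, not the author's implementation.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log-density of a multivariate Gaussian N(x | mean, cov)."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)

def plda_llr(x_s, x_t, m, V, Sigma):
    """Log-likelihood ratio: same-speaker vs. different-speaker hypothesis.

    Assumes x = m + V z + e with z ~ N(0, I) and e ~ N(0, Sigma).
    """
    sigma_ac = V @ V.T              # across-session (speaker) covariance
    sigma_tot = sigma_ac + Sigma    # total covariance of one i-vector
    joint_cov = np.block([[sigma_tot, sigma_ac],
                          [sigma_ac, sigma_tot]])
    # Same speaker: both i-vectors share one latent speaker factor z.
    same = log_gauss(np.concatenate([x_s, x_t]),
                     np.concatenate([m, m]), joint_cov)
    # Different speakers: the two i-vectors are independent.
    diff = log_gauss(x_s, m, sigma_tot) + log_gauss(x_t, m, sigma_tot)
    return same - diff

# Toy example with hypothetical dimensions (4-dim i-vectors, 2 speaker factors).
rng = np.random.default_rng(0)
D, Q = 4, 2
m = rng.standard_normal(D)
V = rng.standard_normal((D, Q))
Sigma = 0.5 * np.eye(D)
x_s = rng.standard_normal(D)
x_t = rng.standard_normal(D)
score = plda_llr(x_s, x_t, m, V, Sigma)
print(score)
```

The score is symmetric in the two i-vectors, and it collapses to zero when V is zero, because the model then carries no speaker information.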

Contents • Motivation of Work • Conventional PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions

Mixture of PLDA (mPLDA) • Generative model: [equations not recoverable from the transcript; figure axes: i-vector vs. SNR (dB)] • Model parameters: one set for modeling the SNR of utterances and one set for modeling SNR-dependent i-vectors.
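To make the generative view concrete, here is a minimal sampling sketch. It assumes a common form of SNR-dependent mixture: a component k is drawn with prior pi[k], the utterance SNR is drawn from a 1-D Gaussian with mean mu[k] and standard deviation sigma[k], and the i-vector is drawn from a PLDA component with parameters m[k], V[k] and Sigma[k]. All symbol names and numeric values here are assumptions for illustration, since the slide's equations are missing.

```python
import numpy as np

def sample_mplda(rng, n_sessions, pi, mu, sigma, m, V, Sigma):
    """Draw n_sessions (SNR, i-vector) pairs for one speaker.

    pi, mu, sigma model the SNR of utterances; m, V, Sigma model the
    SNR-dependent i-vectors (assumed parameterization).
    """
    Q = V[0].shape[1]
    z = rng.standard_normal(Q)          # speaker factor shared by all sessions
    snrs, ivecs, comps = [], [], []
    for _ in range(n_sessions):
        k = rng.choice(len(pi), p=pi)   # mixture component of this session
        snr = rng.normal(mu[k], sigma[k])
        x = rng.multivariate_normal(m[k] + V[k] @ z, Sigma[k])
        snrs.append(snr)
        ivecs.append(x)
        comps.append(k)
    return np.array(snrs), np.vstack(ivecs), np.array(comps)

# Hypothetical toy parameters: 3 SNR regions, 4-dim i-vectors, 2 speaker factors.
rng = np.random.default_rng(0)
K, D, Q = 3, 4, 2
pi = np.array([0.3, 0.4, 0.3])
mu = np.array([0.0, 8.0, 15.0])         # SNR means in dB (made up)
sigma = np.array([2.0, 2.0, 2.0])       # SNR standard deviations in dB (made up)
m = [rng.standard_normal(D) for _ in range(K)]
V = [rng.standard_normal((D, Q)) for _ in range(K)]
Sigma = [0.1 * np.eye(D) for _ in range(K)]
snrs, ivecs, comps = sample_mplda(rng, 5, pi, mu, sigma, m, V, Sigma)
print(snrs.shape, ivecs.shape, comps)
```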

Graphical Model of mPLDA [Figure: graphical model; annotations: SNR of the j-th utterance from the i-th speaker; parameters for modeling the SNR of utterances; parameters for modeling SNR-dependent i-vectors]

Likelihood-Ratio Scores of mPLDA • Different-speaker likelihood: [equation not recoverable from the transcript] • Verification score = same-speaker likelihood / different-speaker likelihood • For the full derivation, see http://bioinfo.eie.polyu.edu.hk/mPLDA/SuppMaterials.pdf

Complexity Analysis [Table not recoverable from the transcript; surviving annotation: dimension of i-vectors]

Sparseness Analysis of SNR Posteriors • Key idea: If the posterior probabilities of SNR are sparse, we may drop the combinations of mixture components that lead to small posteriors.

Sparseness Analysis of SNR Posteriors [Figure: combinations of target-speaker utterance and test utterance pairs, sorted by SNR posterior probability]
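The sparseness idea can be illustrated with a small sketch. It assumes that the SNR posterior of each mixture component comes from a 1-D Gaussian mixture over SNR, and that a (target, test) component pair is weighted by the product of the two posteriors. The threshold value, the function names and the toy SNR parameters are hypothetical.

```python
import numpy as np

def snr_posteriors(snr, pi, mu, sigma):
    """Posterior of each mixture component given an utterance SNR (dB)."""
    # The constant -0.5*log(2*pi) is the same for all components and cancels.
    log_p = np.log(pi) - np.log(sigma) - 0.5 * ((snr - mu) / sigma) ** 2
    log_p -= log_p.max()                # stabilize before exponentiating
    p = np.exp(log_p)
    return p / p.sum()

def retained_pairs(gamma_s, gamma_t, threshold=1e-3):
    """Component pairs (ks, kt) whose joint posterior weight exceeds threshold."""
    w = np.outer(gamma_s, gamma_t)
    return [(int(ks), int(kt), float(w[ks, kt]))
            for ks, kt in np.argwhere(w > threshold)]

# Hypothetical 3-component SNR model.
pi = np.array([1 / 3, 1 / 3, 1 / 3])
mu = np.array([0.0, 8.0, 15.0])
sigma = np.array([2.0, 2.0, 2.0])
gamma_s = snr_posteriors(0.0, pi, mu, sigma)    # noisy target utterance
gamma_t = snr_posteriors(15.0, pi, mu, sigma)   # cleaner test utterance
pairs = retained_pairs(gamma_s, gamma_t)
print(len(pairs), "of", len(pi) ** 2, "pairs retained")
```

When the SNR components are well separated, most of the K x K pair weights are close to zero, so only a few likelihood terms need to be evaluated.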

Fast mPLDA Scoring [Equations not recoverable from the transcript]

PLDA vs. Fast mPLDA Scoring • PLDA: scoring function and complexity • Fast mPLDA: scoring function and complexity [expressions not recoverable from the transcript]
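Since the fast-scoring equations are missing from the transcript, the following is only a sketch of the general idea: skip likelihood terms whose SNR posterior weight is small. It assumes the same-speaker likelihood is a posterior-weighted sum over component pairs (ks, kt) of a joint Gaussian, and the different-speaker likelihood is a product of two posterior-weighted marginals. This is a plain mixture marginalization written for illustration and is not claimed to be the authors' algorithm. The names `mplda_llr` and `threshold` are hypothetical.

```python
import numpy as np

def log_gauss(x, mean, cov):
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet
                   + d @ np.linalg.solve(cov, d))

def log_marginal(x, gamma, m, V, Sigma):
    """log sum_k gamma[k] N(x | m[k], V[k] V[k]^T + Sigma[k]), zero weights skipped."""
    terms = [np.log(gamma[k]) + log_gauss(x, m[k], V[k] @ V[k].T + Sigma[k])
             for k in range(len(m)) if gamma[k] > 0.0]
    return np.logaddexp.reduce(terms)

def mplda_llr(x_s, x_t, gamma_s, gamma_t, m, V, Sigma, threshold=0.0):
    """Return (log-likelihood ratio, number of component pairs evaluated)."""
    w = np.outer(gamma_s, gamma_t)
    keep = w > threshold
    keep[np.unravel_index(np.argmax(w), w.shape)] = True  # never drop every pair
    x = np.concatenate([x_s, x_t])
    terms = []
    for ks, kt in np.argwhere(keep):
        # Joint covariance when both i-vectors share one speaker factor.
        cov = np.block([[V[ks] @ V[ks].T + Sigma[ks], V[ks] @ V[kt].T],
                        [V[kt] @ V[ks].T, V[kt] @ V[kt].T + Sigma[kt]]])
        mean = np.concatenate([m[ks], m[kt]])
        terms.append(np.log(w[ks, kt]) + log_gauss(x, mean, cov))
    same = np.logaddexp.reduce(terms)
    diff = (log_marginal(x_s, gamma_s, m, V, Sigma)
            + log_marginal(x_t, gamma_t, m, V, Sigma))
    return same - diff, len(terms)

# Toy comparison with made-up parameters and sparse SNR posteriors.
rng = np.random.default_rng(0)
K, D, Q = 3, 4, 2
m = [rng.standard_normal(D) for _ in range(K)]
V = [rng.standard_normal((D, Q)) for _ in range(K)]
Sigma = [np.eye(D) for _ in range(K)]
x_s, x_t = rng.standard_normal(D), rng.standard_normal(D)
gamma_s = np.array([0.9996, 0.0004, 0.0])
gamma_t = np.array([0.0, 0.002, 0.998])
full_score, n_full = mplda_llr(x_s, x_t, gamma_s, gamma_t, m, V, Sigma)
fast_score, n_fast = mplda_llr(x_s, x_t, gamma_s, gamma_t, m, V, Sigma,
                               threshold=1e-3)
print(n_full, n_fast, full_score, fast_score)
```

Pruning only removes positive terms from the same-speaker sum, so the pruned score can never exceed the full score. How close the two are depends on how sparse the posteriors are.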

Contents • Motivation of Work • Conventional PLDA • Mixture of PLDA for Noise Robust Speaker Verification • Experiments on SRE12 • Conclusions

Experiments • Evaluation dataset: common evaluation conditions 3 and 4 of the NIST SRE 2012 core set. • Parameterization: 19 MFCCs together with energy plus their 1st and 2nd derivatives → 60-dim. • UBM: gender-dependent, 1024 mixtures. • Total variability matrix: gender-dependent, 500 total factors. • I-vector preprocessing: whitening by WCCN then length normalization, followed by LDA (500-dim → 200-dim) and WCCN. • PLDA and mPLDA with 150 speaker factors.
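The first two preprocessing steps (WCCN followed by length normalization) can be sketched as below. This is a generic textbook formulation on toy data, written under the assumption that WCCN uses the Cholesky factor of the inverse within-class covariance. The LDA step is omitted, and all function names and dimensions are hypothetical.

```python
import numpy as np

def within_class_cov(X, labels):
    """Average of the per-speaker covariance matrices (rows of X are i-vectors)."""
    classes = np.unique(labels)
    W = np.zeros((X.shape[1], X.shape[1]))
    for c in classes:
        Xc = X[labels == c]
        Xc = Xc - Xc.mean(axis=0)
        W += Xc.T @ Xc / len(Xc)
    return W / len(classes)

def wccn_matrix(X, labels):
    """Return B such that B B^T equals the inverse within-class covariance."""
    W = within_class_cov(X, labels)
    return np.linalg.cholesky(np.linalg.inv(W))

def length_normalize(X):
    """Scale every i-vector to unit Euclidean length."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Toy data: 10 speakers, 8 sessions each, 5-dim vectors (made-up sizes).
rng = np.random.default_rng(0)
n_spk, n_sess, D = 10, 8, 5
labels = np.repeat(np.arange(n_spk), n_sess)
spk_means = rng.standard_normal((n_spk, D))
scales = np.array([0.5, 1.0, 1.5, 2.0, 2.5])   # unequal session variability
X = spk_means[labels] + rng.standard_normal((n_spk * n_sess, D)) * scales
B = wccn_matrix(X, labels)
X_wccn = X @ B                   # each row becomes B^T x
X_norm = length_normalize(X_wccn)
```

After the WCCN projection the within-class covariance of the training vectors is the identity matrix, and after length normalization every vector has unit norm.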

Evaluation Conditions [Table: descriptions of common conditions CC3 and CC4, not recoverable from the transcript]

Comparing Scoring Time: Common Condition 3 [Figure: EER (%) and scoring time (sec.) for K = 4, K = 3 and K = 2]

Comparing Scoring Time: Common Condition 4 [Figure: EER (%) and scoring time (sec.) for K = 4, K = 3 and K = 2]

Conclusions • Mixture of SNR-dependent PLDA (mPLDA) is a flexible model that can handle noisy speech with a wide range of SNR. • This paper reduces the scoring time of mPLDA by half with minor degradation in performance. • This is achieved by omitting the computation of likelihood terms whose corresponding SNR posterior probabilities are small. • Further information: http://bioinfo.eie.polyu.edu.hk/mPLDA/SuppMaterials.pdf

Performance on SRE12

Distribution of SNR in SRE12 • Each SNR region is handled by a PLDA model. [Figure: distribution of SNR in SRE12]

Graphical Model of PLDA

Likelihood-Ratio Scores of mPLDA • Same-speaker likelihood: [equation not recoverable from the transcript; its arguments are the i-vectors of the target and test speakers and the SNR of the target and test utterances]

Training Data • In NIST 2012 SRE, training utterances from telephone channels are clean, but some of the test utterances are noisy. • We used the FaNT tool to add babble noise to the clean training utterances. [Figure labels: babble noise; utterances from microphone channels; FaNT; from telephone channels]
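The core operation behind this kind of noise corruption, scaling a noise signal so that the mixture reaches a target SNR, can be sketched in a few lines. This is not a reimplementation of FaNT. It measures power as a plain mean square over the whole signal, and the function name and test signals are made up for illustration.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Mix noise into speech so that the speech-to-noise power ratio is snr_db."""
    noise = np.resize(noise, speech.shape)     # repeat or truncate the noise
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain g such that p_speech / (g^2 * p_noise) = 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

# Synthetic stand-ins for a clean utterance and a babble-noise recording.
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 8000.0)
noise = rng.standard_normal(5000)
noisy = add_noise_at_snr(speech, noise, snr_db=6.0)
measured = 10.0 * np.log10(np.mean(speech ** 2)
                           / np.mean((noisy - speech) ** 2))
print(measured)
```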
