3. Applications to Speaker Verification

Outline of the presentation
3. Applications to Speaker Verification
3.1 Feature extraction
3.2 Speaker models
3.3 Scoring normalization
3.4 Video demo

Pre-processing and Feature Extraction
A/D converter → Hamming windowing → LPC analysis → cepstrum and delta cepstrum coefficients
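The front end above can be sketched in NumPy as follows. This is a minimal illustration, not the original system: the frame length (256 samples), hop (128), LPC order (12), number of cepstra (12), the ±2-frame delta window and all function names are assumptions, and A/D conversion is taken as already done (the input is a digitized array).

```python
import numpy as np


def frame_signal(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)


def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """Levinson-Durbin recursion; returns a_1..a_p of A(z) = 1 + sum_k a_k z^-k."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12  # small floor guards against silent (all-zero) frames
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a: np.ndarray, n_ceps: int) -> np.ndarray:
    """Cepstrum c_1..c_n of the all-pole model 1/A(z) (gain term omitted)."""
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def delta(feats: np.ndarray, N: int = 2) -> np.ndarray:
    """Regression-based delta coefficients over +/-N neighbouring frames."""
    T = len(feats)
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    num = sum(n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
              for n in range(1, N + 1))
    return num / denom


def extract_features(signal, frame_len=256, hop=128, order=12, n_ceps=12):
    """Cepstra followed by delta cepstra, one row per frame."""
    ceps = np.array([lpc_to_cepstrum(lpc(f, order), n_ceps)
                     for f in frame_signal(signal, frame_len, hop)])
    return np.hstack([ceps, delta(ceps)])
```

For one second of 16 kHz audio these settings give 124 frames of 24 coefficients (12 cepstra plus 12 deltas).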

Pre-processing and Feature Extraction
• Spectral envelope reconstructed from different feature parameters
[Figure: amplitude (dB) against frequency (Hz), comparing the FFT-based signal spectrum, the LP spectrum, and the spectrum derived from the LP-cepstrum (cepstral processing)]

Enrollment
• RBF network: feature vectors → K-means → function centers; K-nearest neighbor → function widths; linear regression → output weights W
• EBF network: feature vectors → K-means → function centers; covariance analysis or EM → covariance matrices; linear regression → output weights W
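The two enrollment pipelines above can be sketched as follows. This is a simplified illustration under stated assumptions: plain Lloyd's k-means, an RBF width set to the RMS distance from each center to its k = 2 nearest other centers, a small ridge term added to each cluster's sample covariance, and least-squares output weights with a bias row. EM re-estimation of the EBF parameters is not shown, and all function names are hypothetical.

```python
import numpy as np


def kmeans(X, M, iters=50, seed=0):
    """Plain Lloyd's k-means; returns M centers and the final cluster labels."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), M, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(M):
            if np.any(labels == j):  # empty clusters keep their old center
                centers[j] = X[labels == j].mean(0)
    labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    return centers, labels


def rbf_widths(centers, k=2):
    """RBF width = RMS distance from each center to its k nearest other centers."""
    d2 = ((centers[:, None] - centers[None]) ** 2).sum(-1)
    nearest = np.sort(d2, axis=1)[:, 1:k + 1]  # column 0 is the center itself
    return np.maximum(np.sqrt(nearest.mean(1)), 1e-6)


def ebf_covariances(X, centers, labels, reg=1e-3):
    """EBF covariance matrices: sample covariance of each k-means cluster."""
    D = X.shape[1]
    covs = np.empty((len(centers), D, D))
    for j, mu in enumerate(centers):
        diff = X[labels == j] - mu
        covs[j] = diff.T @ diff / max(len(diff), 1) + reg * np.eye(D)  # ridge keeps it invertible
    return covs


def rbf_activations(X, centers, widths):
    """Gaussian RBF outputs, one column per center."""
    d2 = ((X[:, None] - centers[None]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * widths ** 2))


def output_weights(Phi, targets):
    """Output weights W by linear least squares; row 0 of W is the bias."""
    Phi_b = np.hstack([np.ones((len(Phi), 1)), Phi])
    W, *_ = np.linalg.lstsq(Phi_b, targets, rcond=None)
    return W
```

In this sketch the targets would be one-hot vectors marking each training frame as coming from the enrolling speaker or from the background speakers.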

Enrollment 0(Bias) Output weights  Background speakers´centers Speaker centers x1 x2 xD Input (Feature vectors)

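Verification
[Figure: verification network with inputs x1, x2, …, xD; the outputs ŷ(x) pass through a softmax, are averaged, and the averages are combined (+/−) into the verification score]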
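One plausible reading of the verification diagram: per frame, the claimed speaker's output and the background output are normalized with a softmax, each is averaged over all frames of the utterance, and the background average is subtracted from the speaker average to give a score that is compared with a threshold. The sketch below assumes exactly that (column 0 = claimed speaker, column 1 = background); the names `verification_score` and `accept` are hypothetical.

```python
import numpy as np


def verification_score(Y):
    """Y: (T, 2) raw network outputs per frame; column 0 = speaker, column 1 = background."""
    Z = Y - Y.max(axis=1, keepdims=True)                  # stabilize the softmax
    P = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)  # per-frame softmax
    return P[:, 0].mean() - P[:, 1].mean()                # averaged speaker minus averaged background


def accept(Y, threshold):
    """Accept the claimant if the averaged score exceeds the threshold."""
    return verification_score(Y) > threshold
```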

Verification
• Distributions of the average network outputs
[Figure: panels for the RBF and EBF networks]

Verification • Error rates against decision threshold
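Error-rate curves of this kind are typically obtained by sweeping the threshold over the scores of true-speaker (client) and impostor trials. The sketch below assumes the rule "accept if score > threshold", takes the false acceptance rate (FAR) and false rejection rate (FRR) at each threshold, and approximates the equal error rate (EER) at the observed threshold where the two are closest. Function names are hypothetical.

```python
import numpy as np


def error_rates(client_scores, impostor_scores, thresholds):
    """FAR and FRR at each threshold, using the rule 'accept if score > threshold'."""
    client = np.asarray(client_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    far = np.array([(impostor > t).mean() for t in thresholds])  # impostors accepted
    frr = np.array([(client <= t).mean() for t in thresholds])   # true speakers rejected
    return far, frr


def equal_error_rate(client_scores, impostor_scores):
    """Approximate EER: the observed threshold where FAR and FRR are closest."""
    thresholds = np.sort(np.concatenate([client_scores, impostor_scores]))
    far, frr = error_rates(client_scores, impostor_scores, thresholds)
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2, thresholds[i]
```

Raising the threshold trades false acceptances for false rejections; the EER summarizes the crossing point in a single number.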

Verification Results (TIMIT)
[Figure/table: verification results against the number of centers per network]

Verification Results
• Decision boundaries
[Figure: panels for EBF (diagonal covariance matrices) and EBF (full covariance matrices)]

Conclusion
• EBF networks with full covariance matrices trained by the EM algorithm outperform those whose basis function parameters are estimated by the k-means algorithm and sample covariance.
• RBF networks are the poorest performers in terms of verification accuracy.

Conclusion
• EBF networks with full covariance matrices achieve the lowest error rates when networks with the same number of free parameters are compared.
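To make "the same number of free parameters" concrete, the count below assumes that an RBF unit has a center plus one scalar width, a diagonal EBF unit has a center plus D variances, a full EBF unit has a center plus a symmetric D×D covariance, and each of the K outputs has one weight per hidden unit plus a bias. These counting conventions are assumptions, not taken from the slides; the function name is hypothetical.

```python
def n_free_params(M, D, K=2, kind="rbf"):
    """Free parameters of a network with M basis functions, D inputs and K outputs."""
    per_center = {
        "rbf": D + 1,                      # center + one scalar width
        "ebf_diag": D + D,                 # center + D variances
        "ebf_full": D + D * (D + 1) // 2,  # center + symmetric covariance
    }[kind]
    return M * per_center + (M + 1) * K    # hidden layer + output weights incl. bias
```

With a hypothetical D = 24, a full-covariance EBF center costs 24 + 300 = 324 hidden-layer parameters against 25 for an RBF center, so at an equal budget the RBF network gets many more centers.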

4. Bonus Materials: Scoring Normalization for Speaker Verification

Purpose of Scoring Normalization
Speech X with a claimed speaker ID → feature extraction → score from the speaker model of the claimed ID (S_c) minus a normalization term computed from impostor models

Purpose of Scoring Normalization
• If log L(X) > Threshold: accept the claimant
• If log L(X) ≤ Threshold: reject the claimant
[Figure: probability against x, with x1 (accept) and x2 (reject) on either side of the threshold]
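A minimal sketch of the normalized score and decision rule, assuming per-frame log-likelihoods are available from the claimed speaker's model and from B impostor models, and assuming the normalization term is the log of the average impostor likelihood per frame (other choices, such as the maximum impostor likelihood or a single background model, are equally possible). The names are hypothetical.

```python
import numpy as np


def normalized_score(speaker_ll, impostor_ll):
    """log L(X): frame average of log p(x_t | claimed) - log mean_b p(x_t | impostor b).

    speaker_ll has shape (T,); impostor_ll has shape (B, T); both are log-likelihoods.
    """
    speaker_ll = np.asarray(speaker_ll, dtype=float)
    impostor_ll = np.asarray(impostor_ll, dtype=float)
    m = impostor_ll.max(axis=0)  # log-sum-exp trick for numerical stability
    norm = m + np.log(np.exp(impostor_ll - m).mean(axis=0))
    return float(np.mean(speaker_ll - norm))


def accept_claimant(speaker_ll, impostor_ll, threshold):
    """Accept if log L(X) > threshold, otherwise reject."""
    return normalized_score(speaker_ll, impostor_ll) > threshold
```

Subtracting the impostor term cancels effects that raise or lower all model likelihoods together (for example, channel or utterance length), so one threshold can serve many speakers.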

EBFN-based normalization
• Speaker models: elliptical basis function networks (EBFN) containing speaker centers and anti-speaker centers
