3. Applications to Speaker Verification
Outline of the presentation
3. Applications to Speaker Verification
3.1 Feature extraction
3.2 Speaker models
3.3 Scoring normalization
3.4 Video demo
Pre-processing and Feature Extraction
• Processing chain (figure): A/D converter → Hamming windowing → LPC analysis → cepstral processing → cepstrum and delta cepstrum coefficients
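The processing chain on this slide can be sketched in Python with NumPy. This is a minimal illustration, not the authors' front end: the frame length (240 samples), hop (80 samples), LPC order (12), number of cepstral coefficients (12) and the ±2-frame delta window are assumed values, and pre-emphasis is omitted. LPC analysis uses the autocorrelation method with the Levinson-Durbin recursion, and the cepstrum is computed from the predictor coefficients with the standard LPC-to-cepstrum recursion.

```python
import numpy as np

def hamming_frames(signal, frame_len=240, hop=80):
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def lpc(frame, order=12):
    """Autocorrelation-method LPC via Levinson-Durbin.
    Returns (a, err) for the predictor s[n] ~ sum_k a[k-1] * s[n-k]."""
    N = len(frame)
    r = np.correlate(frame, frame, mode="full")[N - 1:N + order]  # lags 0..order
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] - k * a_prev[i - 1:0:-1]
        err *= 1.0 - k * k                                  # prediction error power
    return a[1:], err

def lpc_to_cepstrum(a, n_ceps=12):
    """LP-cepstrum c_1..c_n_ceps from predictor coefficients (c_0 = log gain omitted)."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def delta(feats, width=2):
    """Regression-based delta coefficients over +/- width frames (edge frames repeated)."""
    T = len(feats)
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    num = sum(th * (padded[width + th:width + th + T] - padded[width - th:width - th + T])
              for th in range(1, width + 1))
    return num / (2 * sum(th * th for th in range(1, width + 1)))

def extract_features(signal, frame_len=240, hop=80, order=12, n_ceps=12):
    """One row per frame: LP-cepstrum followed by delta cepstrum."""
    frames = hamming_frames(signal, frame_len, hop)
    ceps = np.array([lpc_to_cepstrum(lpc(f, order)[0], n_ceps) for f in frames])
    return np.hstack([ceps, delta(ceps)])
```

With the assumed settings, each frame yields a 24-dimensional vector (12 cepstral plus 12 delta cepstral coefficients).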
Pre-processing and Feature Extraction
• Spectral envelope reconstructed from different feature parameters
[Figure: amplitude (dB) vs. frequency (Hz) for the FFT-based signal spectrum, the LP spectrum, and the spectrum derived from the LP-cepstrum]
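To illustrate how an envelope can be reconstructed from LP-cepstral coefficients, the sketch below compares the LP spectrum 20 log10|G / A(e^jω)| with the envelope rebuilt from the cepstrum using ln|H(e^jω)| = c0 + Σ(n≥1) cn cos(nω). The second-order predictor coefficients are made up for the example, and unit gain (c0 = 0) is assumed. With enough cepstral terms the two curves coincide; a short truncation gives a close, smoothed approximation.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """LP-cepstrum c_1..c_n from predictor coefficients a_1..a_p."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def lp_envelope_db(a, gain, w):
    """LP spectrum 20*log10|G / A(e^jw)| with A(z) = 1 - sum_k a_k z^-k."""
    k = np.arange(1, len(a) + 1)
    A = 1.0 - np.exp(-1j * np.outer(w, k)) @ a
    return 20.0 * np.log10(gain / np.abs(A))

def cepstral_envelope_db(c, log_gain, w):
    """Envelope from the LP-cepstrum: ln|H| = c_0 + sum_n c_n cos(n w), converted to dB."""
    n = np.arange(1, len(c) + 1)
    return (log_gain + np.cos(np.outer(w, n)) @ c) * 20.0 / np.log(10.0)

# Illustrative stable 2nd-order predictor (made-up values), unit gain so c_0 = ln 1 = 0
a = np.array([1.2, -0.5])
w = np.linspace(0.0, np.pi, 256)
exact = lp_envelope_db(a, 1.0, w)
smooth = cepstral_envelope_db(lpc_to_cepstrum(a, 12), 0.0, w)  # truncated series
full = cepstral_envelope_db(lpc_to_cepstrum(a, 100), 0.0, w)   # enough terms to match closely
```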
Enrollment
• RBF network: feature vectors → K-means → function centers; K-nearest neighbor → function widths; linear regression → output weights W
• EBF network: feature vectors → K-means → function centers; covariance analysis or EM → covariance matrices; linear regression → output weights W
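The center, width and covariance estimation steps on the enrollment slide can be sketched as follows. This is a simplified, assumed implementation: plain Lloyd's k-means for the function centers, RBF widths set to the mean distance from each center to its k nearest other centers (one common K-nearest-neighbor heuristic; k = 2 is an assumption), and EBF covariance matrices taken as the sample covariance of the vectors assigned to each center, plus a small ridge term (also an assumption) that keeps them invertible.

```python
import numpy as np

def kmeans(X, M, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns centers (M, D) and labels (N,)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=M, replace=False)].copy()
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1), axis=1)
        for j in range(M):
            if np.any(labels == j):  # leave an empty cluster's center where it is
                C[j] = X[labels == j].mean(axis=0)
    return C, labels

def knn_widths(C, k=2):
    """RBF widths: mean distance from each center to its k nearest other centers."""
    d = np.sqrt(((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1))
    return np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)  # column 0 is the zero self-distance

def cluster_covariances(X, C, labels, reg=1e-3):
    """EBF covariance matrices: sample covariance of the vectors assigned to each center."""
    D = X.shape[1]
    covs = np.empty((len(C), D, D))
    for j in range(len(C)):
        diff = X[labels == j] - C[j]
        n = max(len(diff) - 1, 1)
        covs[j] = diff.T @ diff / n + reg * np.eye(D)  # ridge keeps the matrix invertible
    return covs
```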
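Enrollment
[Figure: network architecture. Input layer: feature vectors with components x1, x2, ..., xD. Hidden layer: basis functions placed at the speaker centers and at the background speakers' centers. Output layer: output weights, with a bias unit labelled 0.]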
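A minimal sketch of the network in this figure, under stated assumptions: two outputs (speaker and background) trained with 1-of-2 targets, elliptical basis functions whose centers and covariance matrices are already given (speaker centers and background speakers' centers stacked together), and output weights fitted by linear least squares, with a constant input of 1 acting as the bias (weight row 0). The function names and the target coding are illustrative, not taken from the slides.

```python
import numpy as np

def ebf_hidden(X, centers, covs):
    """Elliptical basis functions: exp(-0.5 * (x - mu_j)^T inv(Sigma_j) (x - mu_j))."""
    diff = X[:, None, :] - centers[None, :, :]             # (N, M, D)
    inv = np.linalg.inv(covs)                              # (M, D, D)
    maha = np.einsum("nmd,mde,nme->nm", diff, inv, diff)   # squared Mahalanobis distances
    return np.exp(-0.5 * maha)

def design_matrix(X, centers, covs):
    """Prepend a constant column so weight row 0 acts as the bias."""
    return np.hstack([np.ones((len(X), 1)), ebf_hidden(X, centers, covs)])

def train_output_weights(X_spk, X_bkg, centers, covs):
    """Least-squares output weights; targets [1, 0] for speaker, [0, 1] for background frames."""
    X = np.vstack([X_spk, X_bkg])
    T = np.zeros((len(X), 2))
    T[:len(X_spk), 0] = 1.0
    T[len(X_spk):, 1] = 1.0
    W, *_ = np.linalg.lstsq(design_matrix(X, centers, covs), T, rcond=None)
    return W                                               # shape (M + 1, 2)

def network_outputs(X, centers, covs, W):
    return design_matrix(X, centers, covs) @ W             # shape (N, 2)
```

Because the two targets sum to one and the bias column is included, the least-squares outputs also sum to one for every input; this follows from the fit itself rather than from an extra constraint.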
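Verification
[Figure: the components x1, x2, ..., xD of each feature vector feed the network; its outputs pass through a softmax and are each averaged, and the averaged outputs are combined with + and − signs to form the score ŷ(x).]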
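The verification path can be sketched as below, under assumptions the slide does not spell out: column 0 of the network output holds the speaker output and column 1 the background (anti-speaker) output, the softmax is applied frame by frame, averaging runs over the frames of the test utterance, and the claimant is accepted when the averaged difference exceeds a threshold (0 here, chosen arbitrarily).

```python
import numpy as np

def softmax(Y):
    """Row-wise softmax of raw network outputs, shape (T, K)."""
    Z = np.exp(Y - Y.max(axis=1, keepdims=True))  # subtract the row max for stability
    return Z / Z.sum(axis=1, keepdims=True)

def verification_score(Y):
    """Average the softmax outputs over the utterance; speaker minus anti-speaker."""
    avg = softmax(np.asarray(Y, dtype=float)).mean(axis=0)
    return avg[0] - avg[1]

def verify(Y, threshold=0.0):
    """Accept the claimant if the averaged score exceeds the threshold."""
    return bool(verification_score(Y) > threshold)
```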
Verification
• Distributions of the average network outputs [Figure panels: RBF, EBF]
Verification
• Error rates against decision threshold
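Curves of error rate against decision threshold can be computed from two sets of verification scores: scores of true speakers (a false rejection when the score is at or below the threshold) and scores of impostors (a false acceptance when the score is above it). The sketch also reports the equal error rate, a common one-number summary that the slide does not itself name; the sample scores used to exercise it are made up.

```python
import numpy as np

def error_rates(client_scores, impostor_scores, thresholds):
    """False acceptance and false rejection rates at each threshold (accept if score > threshold)."""
    client = np.asarray(client_scores, dtype=float)[:, None]
    impostor = np.asarray(impostor_scores, dtype=float)[:, None]
    t = np.asarray(thresholds, dtype=float)[None, :]
    far = (impostor > t).mean(axis=0)   # impostors wrongly accepted
    frr = (client <= t).mean(axis=0)    # true speakers wrongly rejected
    return far, frr

def equal_error_rate(client_scores, impostor_scores):
    """Approximate EER: the candidate threshold where FAR and FRR are closest."""
    thresholds = np.sort(np.concatenate([client_scores, impostor_scores]))
    far, frr = error_rates(client_scores, impostor_scores, thresholds)
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2, thresholds[i]
```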
Verification Results (TIMIT)
• Results versus the number of centers per network (figure)
Verification Results
• Decision boundaries [Figure panels: EBF (diagonal covariance matrices); EBF (full covariance matrices)]
Conclusion
• EBF networks with full covariance matrices trained with the EM algorithm outperform those whose basis function parameters are estimated by the k-means algorithm and sample covariance.
• RBF networks are found to be the poorest performers in terms of verification accuracy.
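For the EM training referred to in the first conclusion, the following sketch runs standard Gaussian-mixture EM with full covariance matrices on one set of feature vectors, starting from given centers (for example, k-means centers). It is an assumed, generic formulation rather than necessarily the exact procedure used in the study; the ridge term `reg` and the fixed iteration count are illustrative choices.

```python
import numpy as np

def log_gauss(X, mu, cov):
    """Log-density of a full-covariance Gaussian at each row of X (via Cholesky)."""
    D = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mu).T)             # whitened residuals, shape (D, N)
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (D * np.log(2 * np.pi) + log_det + (z ** 2).sum(axis=0))

def em_full_covariance(X, init_centers, n_iter=30, reg=1e-6):
    """EM estimates of centers, full covariance matrices and priors for M basis functions."""
    N, D = X.shape
    M = len(init_centers)
    mu = np.array(init_centers, dtype=float)
    cov = np.repeat((np.cov(X, rowvar=False) + reg * np.eye(D))[None], M, axis=0)
    prior = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        # E-step: posterior responsibility of basis function j for vector n
        log_r = np.column_stack([np.log(prior[j]) + log_gauss(X, mu[j], cov[j])
                                 for j in range(M)])
        log_r -= log_r.max(axis=1, keepdims=True)
        R = np.exp(log_r)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted priors, means and covariances
        Nj = R.sum(axis=0)
        prior = Nj / N
        mu = (R.T @ X) / Nj[:, None]
        for j in range(M):
            diff = X - mu[j]
            cov[j] = (R[:, j:j + 1] * diff).T @ diff / Nj[j] + reg * np.eye(D)
    return mu, cov, prior
```

Unlike the k-means plus sample-covariance route, each vector here contributes to every basis function in proportion to its responsibility, so centers and covariance matrices are refined jointly.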
Conclusion
• EBF networks with full covariance matrices achieve the lowest error rates when networks with the same number of free parameters are compared.
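Comparing networks with the same number of free parameters requires a counting convention. The sketch below assumes one width per RBF center, D variances per center for diagonal-covariance EBF, D(D+1)/2 covariance entries per center for full-covariance EBF, and K output weights per hidden unit plus one bias per output. The values D = 24 (12 cepstra plus 12 delta cepstra) and K = 2 are hypothetical.

```python
def n_free_params(kind, M, D, K=2):
    """Free parameters of a basis function network with M centers, D inputs, K outputs.
    Assumed convention: centers + spreads + output weights including one bias per output."""
    spreads = {
        "rbf": M,                          # one width per center
        "ebf_diag": M * D,                 # diagonal covariance per center
        "ebf_full": M * D * (D + 1) // 2,  # symmetric full covariance per center
    }[kind]
    return M * D + spreads + K * (M + 1)

def centers_for_budget(kind, budget, D, K=2):
    """Largest M whose parameter count does not exceed the budget (at least 1)."""
    M = 1
    while n_free_params(kind, M + 1, D, K) <= budget:
        M += 1
    return M

budget = n_free_params("ebf_full", 8, 24)  # 2610 under these assumptions
for kind in ("rbf", "ebf_diag", "ebf_full"):
    M = centers_for_budget(kind, budget, 24)
    print(kind, M, n_free_params(kind, M, 24))
# rbf 96 2594
# ebf_diag 52 2602
# ebf_full 8 2610
```

Under this convention, the budget of eight full-covariance centers buys about 96 RBF centers or 52 diagonal-covariance EBF centers.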
4. Bonus Materials: Scoring Normalization for Speaker Verification
Purpose of Scoring Normalization
[Figure: speech X with a claimed speaker ID → feature extraction → score from the speaker model of the claimed ID, Sc, minus a normalization term computed from impostor models]
Purpose of Scoring Normalization
• If log L(X) > threshold, accept the claimant; otherwise, reject the claimant.
[Figure: probability vs. x, with x1 (accept) and x2 (reject) marked]
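The normalized score and threshold decision can be sketched as below. The speaker and impostor models are simple diagonal-covariance Gaussians, and the normalization term is the log of the average impostor likelihood per frame; both are assumptions for illustration (cohort and world-model schemes use other forms of this term). The score is averaged over frames, and the claimant is accepted only when it exceeds the threshold.

```python
import numpy as np

def diag_gauss_loglik(X, mean, var):
    """Per-frame log-density under a diagonal-covariance Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var).sum(axis=1)

def normalized_score(X, speaker, impostors):
    """log L(X): mean over frames of log p(x | S_c) minus log(average impostor likelihood)."""
    ll_spk = diag_gauss_loglik(X, *speaker)                               # (T,)
    ll_imp = np.stack([diag_gauss_loglik(X, *m) for m in impostors])      # (I, T)
    norm = np.logaddexp.reduce(ll_imp, axis=0) - np.log(len(impostors))   # log of mean likelihood
    return float(np.mean(ll_spk - norm))

def decide(score, threshold=0.0):
    return "accept" if score > threshold else "reject"
```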
EBFN-based normalization
• Speaker models: elliptical basis function networks (EBFN)
[Figure: speaker centers and anti-speaker centers]