3. Applications to Speaker Verification

rcowley

Presentation Transcript


  1. 3. Applications to Speaker Verification

  2. Outline of the presentation
     3. Applications to Speaker Verification
        3.1 Feature extraction
        3.2 Speaker models
        3.3 Scoring normalization
        3.4 Video demo

  3. Pre-processing and Feature Extraction
     [Block diagram with the labels: A/D converter; Hamming windowing; LPC analysis; cepstrum and delta-cepstrum coefficients]
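
The blocks named on this slide can be sketched in code. The following is a minimal illustration, not the presenters' implementation: it assumes autocorrelation-method LPC solved by the Levinson-Durbin recursion, the standard LPC-to-cepstrum recursion, and a regression-style delta over neighboring frames. The function names, the LPC order, and the delta window size are all hypothetical choices; the slides do not give these details.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a[1..p] for the predictor s[n] ~ sum_k a_k * s[n-k].

    Assumption: Hamming window + autocorrelation method + Levinson-Durbin.
    """
    w = frame * np.hamming(len(frame))
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)          # a[0] is unused
    err = r[0]                       # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for model order i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        new_a = a.copy()
        new_a[i] = k
        new_a[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1 / (1 - sum a_k z^-k)."""
    p = len(a)
    c = np.zeros(n_ceps + 1)         # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def delta(ceps, half_window=2):
    """Delta cepstrum: least-squares slope over +/- half_window frames.

    ceps has shape (n_frames, n_ceps); edges are padded by repetition.
    """
    n_frames = len(ceps)
    padded = np.pad(ceps, ((half_window, half_window), (0, 0)), mode="edge")
    num = sum(
        k * (padded[half_window + k:half_window + k + n_frames]
             - padded[half_window - k:half_window - k + n_frames])
        for k in range(1, half_window + 1)
    )
    return num / (2.0 * sum(k * k for k in range(1, half_window + 1)))
```

A typical use would compute `lpc_to_cepstrum(lpc(frame, 12), 12)` for every frame, stack the results into a matrix, and append `delta(...)` of that matrix to form the feature vectors.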

  4. Pre-processing and Feature Extraction
     • Spectral envelope reconstructed from different feature parameters
     [Figure: amplitude (dB) versus frequency (Hz); curves: FFT-based signal spectrum, LP spectrum, spectrum derived from LP-cepstrum; label: cepstral processing]

  5. Enrollment
     [Diagram: two training pipelines, each starting from feature vectors]
     • RBF network: K-means; K-nearest neighbor; linear regression
     • EBF network: K-means; covariance analysis or EM; linear regression
     • Labeled results: function centers; function widths; covariance matrices; output weights W
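
As a rough illustration of the RBF branch of this diagram, the sketch below estimates function centers with K-means, sets each function width from the distances to the nearest other centers, and solves for the output weights by linear least squares. It is a sketch under stated assumptions: Gaussian basis functions, a two-output network (speaker versus background), and the helper names `kmeans`, `knn_widths`, `hidden_layer`, and `enroll_rbf` are all hypothetical; the slides do not specify these details.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means; returns a (k, D) array of function centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def knn_widths(centers, n_neighbors=2):
    """Width of each function: mean distance to its nearest other centers."""
    dist = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1))
    dist.sort(axis=1)                      # column 0 is the zero self-distance
    return dist[:, 1:n_neighbors + 1].mean(axis=1)

def hidden_layer(X, centers, widths):
    """Gaussian basis outputs with a leading bias column of ones."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-sq / (2.0 * widths[None, :] ** 2))
    return np.hstack([np.ones((len(X), 1)), phi])

def enroll_rbf(X_speaker, X_background, k=2):
    """Assumed setup: k centers per class, one-hot targets, least-squares weights."""
    centers = np.vstack([kmeans(X_speaker, k), kmeans(X_background, k)])
    widths = knn_widths(centers)
    X = np.vstack([X_speaker, X_background])
    targets = np.zeros((len(X), 2))
    targets[:len(X_speaker), 0] = 1.0      # output 0: speaker class
    targets[len(X_speaker):, 1] = 1.0      # output 1: background class
    W, *_ = np.linalg.lstsq(hidden_layer(X, centers, widths), targets, rcond=None)
    return centers, widths, W
```

The EBF branch would replace the scalar widths with per-center covariance matrices, estimated either from the K-means clusters or with EM; that part is not sketched here.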

  6. Enrollment 0(Bias) Output weights  Background speakers´centers Speaker centers x1 x2 xD Input (Feature vectors)

  7. Verification
     [Network diagram with the labels: inputs x1, x2, ..., xD; softmax; averaging (on each of two branches); a summing node with + and - inputs; output ŷ(x)]

  8. Verification
     • Distributions of the average network outputs
     [Figure panels: RBF, EBF]

  9. Verification
     • Error rates against decision threshold
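
Curves of this kind can be computed from two sets of scores. The sketch below assumes the usual definitions, which the slide itself does not spell out: the false acceptance rate (FAR) is the fraction of impostor scores above the threshold, and the false rejection rate (FRR) is the fraction of genuine-speaker scores at or below it. The function names are hypothetical.

```python
import numpy as np

def error_rates(genuine_scores, impostor_scores, thresholds):
    """Return (FAR, FRR) arrays, one value per threshold.

    Assumed rule: a claimant is accepted when score > threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)[:, None]
    impostor = np.asarray(impostor_scores, dtype=float)[:, None]
    thr = np.asarray(thresholds, dtype=float)[None, :]
    far = (impostor > thr).mean(axis=0)    # impostors wrongly accepted
    frr = (genuine <= thr).mean(axis=0)    # genuine speakers wrongly rejected
    return far, frr

def equal_error_point(genuine_scores, impostor_scores, thresholds):
    """Threshold (among those given) where FAR and FRR are closest, and their mean."""
    thr = np.asarray(thresholds, dtype=float)
    far, frr = error_rates(genuine_scores, impostor_scores, thr)
    i = int(np.argmin(np.abs(far - frr)))
    return thr[i], 0.5 * (far[i] + frr[i])
```

Sweeping `thresholds` over a fine grid, for example `np.linspace(-1, 1, 201)`, and plotting both arrays against it gives the two crossing curves; the crossing point is the equal error rate.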

  10. Verification Results (TIMIT)
      [Results shown against the number of centers per network]

  11. Verification Results
      • Decision boundaries
      [Figure panels: EBF (diagonal covariance matrices); EBF (full covariance matrices)]

  12. Conclusion
      • EBF networks with full covariance matrices trained with the EM algorithm outperform those whose basis function parameters are estimated by the k-means algorithm and sample covariance.
      • RBF networks were the poorest performers in terms of verification accuracy.

  13. Conclusion
      • EBF networks with full covariance matrices achieve the lowest error rates when networks with the same number of free parameters are compared.

  14. 4. Bonus Materials: Scoring Normalization for Speaker Verification

  15. Purpose of Scoring Normalization
      [Block diagram with the labels: speech with claimed speaker ID; feature extraction; X; speaker model of claimed ID, giving the score Sc; impostor models, giving the normalization term; a subtraction (-) node]

  16. Purpose of Scoring Normalization
      • If log L(X) > Threshold: accept the claimant
      • If log L(X) ≤ Threshold: reject the claimant
      [Figure: probability versus x, with example points x1 (accept) and x2 (reject)]
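
The decision rule on this slide can be written out as a short sketch. It assumes, as slide 15 suggests, that the normalized score is the claimed speaker's score minus a normalization term derived from impostor models. How the normalization term is formed is not stated in the slides; averaging the impostor models' frame log-likelihoods is one common choice and is used here purely as an assumption. The function names are hypothetical.

```python
import numpy as np

def normalized_score(claimed_loglik, impostor_loglik):
    """Normalized log-score for an utterance X.

    claimed_loglik:  shape (T,), frame log-likelihoods under the claimed speaker's model.
    impostor_loglik: shape (M, T), frame log-likelihoods under M impostor models.
    Assumption: the normalization term is the mean over impostor models and frames.
    """
    s_c = float(np.mean(claimed_loglik))
    norm_term = float(np.mean(impostor_loglik))
    return s_c - norm_term

def verify(claimed_loglik, impostor_loglik, threshold):
    """Accept the claimant only if the normalized score exceeds the threshold."""
    return normalized_score(claimed_loglik, impostor_loglik) > threshold
```

Subtracting the normalization term makes the score relative rather than absolute, so a single threshold can be less sensitive to utterance-level effects that shift all model scores together.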

  17. EBFN-based normalization
      • Speaker models: elliptical basis function networks (EBFN)
      [Diagram labels: speaker centers; anti-speaker centers]

  18. References
      [1] Mak, M.W. and Kung, S.Y. (2000). "Estimation of elliptical basis function parameters by the EM algorithm with application to speaker verification," IEEE Trans. on Neural Networks, 11 (4), pp. 961-969.
      [2] Yiu, K.K., Mak, M.W. and Li, C.K. (1999). "Gaussian mixture models and probabilistic decision-based neural networks for pattern classification: A comparative study," Neural Computing and Applications, 8, pp. 235-245.
      [3] Zhang, W.D., Mak, M.W. and He, M.X. (2000). "A two-stage scoring method combining world and cohort models for speaker verification," Proc. ICASSP, Vol. 2, pp. 1193-1196.
      [4] Lin, S.H., Kung, S.Y. and Lin, L.J. (1997). "Face recognition/detection by probabilistic decision-based neural network," IEEE Trans. on Neural Networks, 8 (1), pp. 114-132.
      [5] Mak, M.W. et al. (1994). "Speaker Identification using Multi Layer Perceptrons and Radial Basis Functions Networks," Neurocomputing, 6 (1), pp. 99-118.