Speaker Verification System using SVM

Speaker Verification System using SVM. Jun-Won Suh, Intelligent Electronic Systems, Human and Systems Engineering, Department of Electrical and Computer Engineering. Outline: Summary of the Ph.D. Dissertation of Vincent Wan. Speaker verification system. Extracting features

Presentation Transcript


  1. Speaker Verification System using SVM • Jun-Won Suh • Intelligent Electronic Systems, Human and Systems Engineering, Department of Electrical and Computer Engineering

  2. Outline: Summary of the Ph.D. Dissertation of Vincent Wan • Speaker verification system • Extracting features • Creating models of speakers • Generative models, discriminative models • Making generative models discriminative • Developing speaker verification using SVMs • My interests for improving our system

  3. Speaker verification system • Authenticate a person’s claimed identity • Text-dependent and text-independent • The system models the sound of the client’s voice (based on the physical characteristics of the client’s vocal tract). • Feature extraction • Enrolment: creates a model of the client’s voice • Pattern matching • Decision theory • [Figure: A generic speaker verification system]
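The stages listed above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration, not the system from the slides: the hypothetical `enrol` function stores the mean feature vector as the client model, `match_score` uses a negative mean squared distance, and `decide` compares the score against a made-up threshold.

```python
from typing import List, Sequence

Vector = Sequence[float]

def enrol(frames: List[Vector]) -> List[float]:
    """Enrolment (toy): the client model is just the mean feature vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def match_score(model: Vector, frames: List[Vector]) -> float:
    """Pattern matching (toy): negative mean squared distance to the model,
    so a higher score means the test frames look more like the client."""
    total = 0.0
    for f in frames:
        total += sum((a - b) ** 2 for a, b in zip(f, model))
    return -total / len(frames)

def decide(score: float, threshold: float) -> bool:
    """Decision: accept the claimed identity if the score clears the threshold."""
    return score >= threshold

client_model = enrol([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]])
genuine = [[1.1, 2.1], [0.9, 1.9]]
impostor = [[3.0, -1.0], [2.8, -0.5]]
threshold = -0.5  # hypothetical operating point

print(decide(match_score(client_model, genuine), threshold))   # True
print(decide(match_score(client_model, impostor), threshold))  # False
```

A real system would replace the mean vector with a statistical or discriminative model (GMM, SVM) and would typically tune the threshold on development data.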

  4. Extracting features • Building models of speakers depends on frequency analysis of the speaker’s voice. • Linear predictive coding (LPC) • LPC assumes that speech can be modelled as the output of a linear (all-pole) filter excited by periodic pulses or random noise. • The LPC coefficients are obtained by minimizing the mean squared error (MSE) of the prediction. • Perceptual linear prediction (PLP) • PLP combines LPC analysis with psychophysical knowledge of the human auditory system. • Example: the human ear has higher frequency resolution at low frequencies.
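The MSE minimization mentioned above is commonly solved with the autocorrelation method and the Levinson-Durbin recursion. The sketch below assumes that method (the slides do not say which solver was used) and, for brevity, works on a single unwindowed signal rather than short analysis frames.

```python
from typing import List, Tuple

def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve for predictor coefficients a[1..p] in x[n] ~ sum_k a[k] * x[n - k]
    that minimize the mean squared prediction error (autocorrelation method).
    Returns (coefficients, residual error energy)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction error energy shrinks at every stage
    return a[1:], err

# Usage: a decaying signal x[n] = 0.9 ** n, which a first-order predictor fits exactly
signal = [0.9 ** n for n in range(200)]
coeffs, residual = levinson_durbin(autocorrelation(signal, 2), 2)
print([round(c, 3) for c in coeffs])
```

For this signal the recovered coefficients are close to 0.9 and 0, i.e. the predictor rediscovers the one-step decay. A practical front end would typically window each short frame before computing its autocorrelation.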

  5. Creating models of speakers • Generative models • Gaussian Mixture Model (GMM), Hidden Markov Model (HMM) • Models are probability density estimators that attempt to capture all of the fluctuations and variations of the data. • Discriminative models • Polynomial classifiers, Support Vector Machines (SVM) • Models are optimized to minimize the error on a set of training samples. • Models draw the boundary between classes and ignore the fluctuations within each class. • Generative vs. discriminative models • Generative models are used to estimate the within-class probability densities and do not minimize a classification error. • Discriminative models achieve the highest performance in classification tasks.
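The contrast between the two model families can be seen on toy one-dimensional data. This sketch is an assumption-laden illustration: a single Gaussian per class stands in for a GMM (generative), and a threshold chosen to minimize training errors stands in for a discriminative classifier such as an SVM. The names `fit_gaussian`, `generative_classify` and `train_threshold` are hypothetical.

```python
import math
from typing import List, Tuple

def fit_gaussian(xs: List[float]) -> Tuple[float, float]:
    """Generative: maximum-likelihood mean and variance of one class."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def log_gauss(x: float, mu: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def generative_classify(x: float, params_a: Tuple[float, float],
                        params_b: Tuple[float, float]) -> str:
    """Pick the class whose density gives x the higher likelihood."""
    return "A" if log_gauss(x, *params_a) >= log_gauss(x, *params_b) else "B"

def train_threshold(xs_a: List[float], xs_b: List[float]) -> float:
    """Discriminative: choose the threshold t (predict A if x < t) that
    minimizes the number of training errors. Assumes class A lies lower."""
    candidates = sorted(xs_a + xs_b)
    best_t, best_err = candidates[0], float("inf")
    for i in range(len(candidates) - 1):
        t = 0.5 * (candidates[i] + candidates[i + 1])
        errors = sum(x >= t for x in xs_a) + sum(x < t for x in xs_b)
        if errors < best_err:
            best_t, best_err = t, errors
    return best_t

class_a = [0.0, 0.5, 1.0, 1.5]
class_b = [2.5, 3.0, 3.5, 10.0]
params_a, params_b = fit_gaussian(class_a), fit_gaussian(class_b)
print(generative_classify(1.2, params_a, params_b))  # A
print(train_threshold(class_a, class_b))             # 2.0
```

The generative route never looks at classification errors during training; it only fits densities. The discriminative route optimizes the training error directly and ignores how the samples are spread within each class (note the outlier at 10.0 does not affect the threshold).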

  6. Making generative models discriminative • GMM-LR/SVM combination • GMM likelihood ratio • Bengio proposed that, because the probability estimates are not perfect, a better version would be • Bayes decision rule • The input to the SVM is the two-dimensional vector made up of the log likelihoods of the client and world models. • A limitation of these approaches arises from frame-based discrimination.
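A minimal sketch of the GMM likelihood-ratio score and of the two-dimensional SVM input follows. It assumes one-dimensional features, hand-picked toy GMM parameters instead of trained client and world models, and frame-averaged log likelihoods (a common normalization, assumed here). The hypothetical `weighted_decision` helper illustrates a linear rule of the form a1 * log p(X|client) - a2 * log p(X|world) + b > 0, the kind of weighted decision that Bengio and Mariéthoz learn in place of the plain ratio; with a1 = a2 = 1 and b = -threshold it reduces to the ordinary thresholded likelihood ratio.

```python
import math
from typing import List, Tuple

# Toy 1-D GMM: list of (weight, mean, variance) components.
GMM = List[Tuple[float, float, float]]

def frame_loglik(x: float, gmm: GMM) -> float:
    """log p(x | GMM), computed with log-sum-exp for numerical stability."""
    logs = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def avg_loglik(xs: List[float], gmm: GMM) -> float:
    """Frame-averaged log likelihood of a sequence."""
    return sum(frame_loglik(x, gmm) for x in xs) / len(xs)

def llr_score(xs: List[float], client: GMM, world: GMM) -> float:
    """GMM likelihood-ratio score: log p(X|client) - log p(X|world)."""
    return avg_loglik(xs, client) - avg_loglik(xs, world)

def svm_input(xs: List[float], client: GMM, world: GMM) -> Tuple[float, float]:
    """Two-dimensional input vector (client LL, world LL) for the SVM."""
    return avg_loglik(xs, client), avg_loglik(xs, world)

def weighted_decision(xs: List[float], client: GMM, world: GMM,
                      a1: float, a2: float, b: float) -> bool:
    """Hypothetical linear decision a1 * LL_client - a2 * LL_world + b > 0."""
    ll_c, ll_w = svm_input(xs, client, world)
    return a1 * ll_c - a2 * ll_w + b > 0

client = [(0.5, 1.0, 0.5), (0.5, 2.0, 0.5)]
world = [(0.5, 0.0, 2.0), (0.5, 4.0, 2.0)]
genuine = [1.0, 1.5, 2.0]
impostor = [-2.0, 5.5, 6.0]
print(llr_score(genuine, client, world) > 0)   # True
print(llr_score(impostor, client, world) > 0)  # False
```

Each score here summarizes a whole utterance, but the underlying likelihoods are still accumulated frame by frame from models trained without any discriminative objective.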

  7. Importance of kernels • Early SVMs using polynomial and RBF kernels • The optimization problems required significant computational resources and were unsustainable. • Clustering algorithms used to reduce the training data also reduced the accuracy. • Frame-level training inputs discard useful speaker classification information. • SVMs using score-space kernels • Variable-length utterances can be classified at the sequence level.
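The two early kernels, and the reason frame-level training becomes expensive, can be sketched as follows. The frame rate and amount of speech are assumptions chosen only to show the quadratic growth of a dense Gram matrix; the function names are hypothetical.

```python
import math
from typing import Sequence

def poly_kernel(x: Sequence[float], y: Sequence[float],
                degree: int = 3, c: float = 1.0) -> float:
    """Polynomial kernel (x . y + c) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree

def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def gram_entries(num_frames: int) -> int:
    """Entries in a dense Gram matrix when every frame is a training input."""
    return num_frames * num_frames

frames_per_second = 100  # assumed 10 ms frame shift
seconds_of_speech = 60   # hypothetical amount of training speech
n = frames_per_second * seconds_of_speech
print(n, gram_entries(n))                                   # 6000 36000000
print(poly_kernel([1.0, 2.0], [3.0, 4.0], degree=2))        # 144.0
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))                   # 1.0
```

One minute of speech already implies tens of millions of kernel evaluations, which is why reducing the data (for example by clustering) was tempting even though it cost accuracy.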

  8. Classifying sequences using score-space kernels • The score-space kernel enables SVMs to classify whole sequences. • A variable-length sequence of input vectors is mapped explicitly onto a single point in a space of fixed dimension. • The score-space is derived from the likelihood score. • The likelihood ratio score-space
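The key property, a fixed-size output for any sequence length, can be shown with the simplest score-space: the frame-averaged gradient of a single Gaussian's log likelihood with respect to its mean and variance (a Fisher-score mapping). The single Gaussian, the averaging over frames and the name `fisher_score` are assumptions for illustration.

```python
from typing import List, Tuple

def fisher_score(xs: List[float], mu: float, var: float) -> Tuple[float, float]:
    """Map a variable-length sequence to a fixed 2-D point: the frame-averaged
    gradient of log N(x | mu, var) with respect to (mu, var)."""
    n = len(xs)
    d_mu = sum((x - mu) / var for x in xs) / n
    d_var = sum(-0.5 / var + 0.5 * (x - mu) ** 2 / var ** 2 for x in xs) / n
    return d_mu, d_var

short_utt = [0.2, -0.1, 0.4]
long_utt = [0.3, 0.1, -0.2, 0.5, 0.0, 0.2, 0.1]
print(len(fisher_score(short_utt, 0.0, 1.0)), len(fisher_score(long_utt, 0.0, 1.0)))  # 2 2
```

A 3-frame and a 7-frame utterance both land on a point in the same 2-D space, so an SVM can compare them directly.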

  9. Computing the score-space vectors • Define the global likelihood of a sequence X = {x_1, …, x_{N_l}}
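A sketch of the standard form, under the usual assumption that frames are independent given the model M, with each frame likelihood given by a K-component GMM:

```latex
P(X \mid M) = \prod_{n=1}^{N_l} p(\mathbf{x}_n \mid M), \qquad
\log P(X \mid M) = \sum_{n=1}^{N_l} \log p(\mathbf{x}_n \mid M), \qquad
p(\mathbf{x}_n \mid M) = \sum_{k=1}^{K} w_k \, \mathcal{N}(\mathbf{x}_n; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)
```

Working in the log domain turns the global likelihood into a sum over frames, which is what the score-space gradients are taken of.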

  10. Computing the score-space vectors • The fixed-length vectors of the likelihood ratio kernel can be expressed as • The final likelihood ratio kernel is • The dimensionality of the score-space is equal to the total number of parameters in the generative models, so the SVM can classify complete utterance sequences.
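A sketch of the likelihood ratio score-space vector and kernel follows. Several simplifications are assumptions, not details from the slides: features are one-dimensional, only the GMM means are differentiated (so the vector has one entry for the log-likelihood ratio plus one per mean), the vector is divided by the number of frames, and the kernel's metric is approximated by a diagonal `scale` that defaults to the identity. All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

GMM = List[Tuple[float, float, float]]  # (weight, mean, variance), 1-D toy model

def _component_logs(x: float, gmm: GMM) -> List[float]:
    return [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in gmm]

def _logsumexp(vals: List[float]) -> float:
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def loglik_and_mean_grad(xs: List[float], gmm: GMM) -> Tuple[float, List[float]]:
    """Return log p(X|M) and d log p(X|M) / d mu_k for every component k."""
    total, grad = 0.0, [0.0] * len(gmm)
    for x in xs:
        logs = _component_logs(x, gmm)
        lse = _logsumexp(logs)
        total += lse
        for k, (_, m, v) in enumerate(gmm):
            gamma = math.exp(logs[k] - lse)  # posterior probability of component k
            grad[k] += gamma * (x - m) / v
    return total, grad

def lr_score_vector(xs: List[float], client: GMM, world: GMM) -> List[float]:
    """[log LR, grad wrt client means, minus grad wrt world means] / frames."""
    n = len(xs)
    ll_c, g_c = loglik_and_mean_grad(xs, client)
    ll_w, g_w = loglik_and_mean_grad(xs, world)
    vec = [ll_c - ll_w] + g_c + [-g for g in g_w]
    return [v / n for v in vec]

def lr_kernel(xs: List[float], ys: List[float], client: GMM, world: GMM,
              scale: Optional[List[float]] = None) -> float:
    """Inner product of score vectors with a diagonal metric (identity by default)."""
    px = lr_score_vector(xs, client, world)
    py = lr_score_vector(ys, client, world)
    scale = scale or [1.0] * len(px)
    return sum(a * b / s for a, b, s in zip(px, py, scale))

client = [(0.5, 1.0, 0.5), (0.5, 2.0, 0.5)]
world = [(0.5, 0.0, 2.0), (0.5, 4.0, 2.0)]
utt_a = [1.0, 1.5, 2.0]
utt_b = [0.8, 1.2, 1.9, 2.1, 1.4, 1.6]
print(len(lr_score_vector(utt_a, client, world)), len(lr_score_vector(utt_b, client, world)))  # 5 5
print(lr_kernel(utt_a, utt_b, client, world))
```

Because every utterance maps to a vector of the same length, an SVM can be trained with one vector per utterance rather than one per frame.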

  11. Experiment Results on PolyVar • The data is noisy. • The data contains many more client tests than YOHO.

  12. Conclusion • Add the GMM-LR/SVM model to our verification system • Add score-space kernels to the SVM • Need to compare the computational requirements of the Fisher and LR kernels

  13. References • V. Wan, Speaker Verification using Support Vector Machines, Ph.D. thesis, University of Sheffield, June 2003 • V. Wan, Building Sequence Kernels for Speaker Verification and Speech Recognition, University of Sheffield • S. Bengio and J. Mariéthoz, Learning the Decision Function for Speaker Verification, IDIAP, 2001
