Speaker Verification System using SVM
Jun-Won Suh
Intelligent Electronic Systems, Human and Systems Engineering
Department of Electrical and Computer Engineering
Outline – Summary of the Ph.D. dissertation of Vincent Wan
• Speaker verification system
• Extracting features
• Creating models of speakers
  • Generative models, discriminative models
  • Making generative models discriminative
• Developing speaker verification using SVMs
• Ideas for improving our system
Speaker verification system
• Authenticates a person’s claimed identity
• Can be text-dependent or text-independent
• The system models the sound of the client’s voice (based on the physical characteristics of the client’s vocal tract).
• Feature extraction
• Enrolment
  • Creates a model of the client’s voice
• Pattern matching
• Decision theory
[Figure: a generic speaker verification system]
Extracting features
• Building models of speakers depends on frequency analysis of the speaker’s voice.
• Linear predictive coding (LPC)
  • LPC assumes that speech can be modelled as the output of a linear filter excited by periodic pulses or random noise.
  • The LPC coefficients are obtained by minimizing the mean squared error (MSE) of the prediction.
• Perceptual linear prediction (PLP)
  • PLP combines LPC analysis with psychophysical knowledge of the human auditory system.
  • Example: the human ear has a higher frequency resolution at low frequencies.
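The MSE-minimizing step for LPC can be illustrated with a minimal sketch. It assumes the autocorrelation method (one common way to set up the least-squares problem, not necessarily the one used in the dissertation); the function name `lpc_coefficients` and the synthetic signal are made up for illustration.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate predictor coefficients a[1..order] such that
    x[t] ~ sum_k a[k] * x[t-k], minimising the mean squared
    prediction error (autocorrelation method, an assumption here)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation values r[0..order] of the frame.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Normal equations R a = r[1:], where R is the Toeplitz autocorrelation matrix.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

# Synthetic check signal (hypothetical): x[t] = 0.5*x[t-1] + 0.3*x[t-2] + noise.
rng = np.random.default_rng(0)
noise = rng.standard_normal(20000)
x = np.zeros_like(noise)
for t in range(2, len(x)):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + noise[t]

a = lpc_coefficients(x, order=2)
print(a)  # should be close to the generating coefficients 0.5 and 0.3
```

In a real front end the same estimate would be computed on short windowed frames of speech rather than on one long synthetic signal.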
Creating models of speakers
• Generative models
  • Gaussian Mixture Model (GMM), Hidden Markov Model (HMM)
  • These models are probability density estimators that attempt to capture all of the fluctuations and variations of the data.
• Discriminative models
  • Polynomial classifiers, Support Vector Machines (SVM)
  • These models are optimized to minimize the error on a set of training samples.
  • They draw the boundary between classes and ignore the fluctuations within each class.
• Generative versus discriminative models
  • Generative models estimate the within-class probability densities and do not minimize a classification error.
  • Discriminative models tend to achieve the highest performance in classification tasks.
Making generative models discriminative
• GMM-LR/SVM combination
  • The baseline decision uses the GMM likelihood ratio (Bayes decision rule).
  • Bengio argued that, because the probability estimates are not perfect, a learned decision function would do better than the plain likelihood ratio.
  • The input to the SVM is the two-dimensional vector made up of the log likelihoods of the client and world models.
• A limitation of these approaches arises from frame-level discrimination.
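The two decision rules on this slide can be written out as follows. This is a hedged reconstruction, not a copy of the slide's formulas: the symbols M_client, M_world, the threshold τ, and the weights a_1, a_2, b are notation assumed here, and the linear form is only the simplest learned decision function on the two-dimensional log-likelihood input.

```latex
\begin{align}
  % Likelihood ratio test (Bayes decision rule with threshold \tau): accept the claim if
  \log p(X \mid M_{\mathrm{client}}) - \log p(X \mid M_{\mathrm{world}}) &> \tau \\
  % Learned linear decision function on the two log likelihoods: accept the claim if
  a_1 \log p(X \mid M_{\mathrm{client}}) + a_2 \log p(X \mid M_{\mathrm{world}}) + b &> 0
\end{align}
```

The likelihood ratio test is the special case a_1 = 1, a_2 = -1, b = -τ, so a classifier trained on the two-dimensional input can only match or adjust that baseline boundary.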
Importance of kernels
• Early SVMs using polynomial and RBF kernels
  • The optimization problems required computational resources that were unsustainable.
  • Clustering algorithms were employed to reduce the amount of training data, at a cost in accuracy.
  • Frame-level training inputs discard useful speaker classification information.
• SVMs using score-space kernels
  • Variable-length utterances can be classified at the sequence level.
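To make the cost of frame-level training concrete, here is a small sketch of the polynomial and RBF kernels evaluated on individual frames. The kernel parameters (`degree`, `coef0`, `gamma`) and the frame count and feature size are arbitrary values chosen for illustration, not settings from the dissertation.

```python
import numpy as np

def polynomial_kernel(A, B, degree=2, coef0=1.0):
    """K(a, b) = (a . b + coef0) ** degree for every pair of rows."""
    return (A @ B.T + coef0) ** degree

def rbf_kernel(A, B, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2 * A @ B.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 12))  # 200 frames of 12-dim features (made-up sizes)

K_poly = polynomial_kernel(frames, frames)
K_rbf = rbf_kernel(frames, frames)
print(K_rbf.shape)  # one kernel entry per pair of frames
```

The Gram matrix has one entry per pair of frames, so its size grows quadratically with the number of frames, which is why frame-level training quickly becomes expensive.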
Classifying sequences using score-space kernels
• The score-space kernel enables SVMs to classify whole sequences.
• A variable-length sequence of input vectors is mapped explicitly onto a single point in a space of fixed dimension.
• The score-space is derived from the likelihood score.
• The score-space used here is the likelihood ratio score-space.
Computing the score-space vectors
Define the global likelihood of a sequence X = {x_1, …, x_{N_l}}
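A plausible form of the global likelihood is sketched below. It assumes the frames are conditionally independent given the model M with parameters θ (the usual assumption for GMMs); the slide's own formula is not preserved, so this should be read as an illustration rather than the original definition.

```latex
\begin{align}
  p(X \mid M, \theta) &= \prod_{i=1}^{N_l} p(x_i \mid M, \theta) \\
  \log p(X \mid M, \theta) &= \sum_{i=1}^{N_l} \log p(x_i \mid M, \theta)
\end{align}
```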
Computing the score-space vectors
• The likelihood ratio kernel represents each sequence as a fixed-length score vector.
• The final likelihood ratio kernel is computed from pairs of these vectors.
• The dimensionality of the score-space is equal to the total number of parameters in the generative models.
• Hence the SVM can classify complete utterance sequences.
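The mapping from a variable-length utterance to a fixed-length score vector can be sketched as follows. This is a simplified illustration under stated assumptions: each generative model is a single diagonal Gaussian standing in for a GMM, the score is the gradient of the length-normalised log likelihood ratio with respect to all model parameters, and the kernel is a plain dot product with no extra metric or normalisation. All names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_score(X, mean, var):
    """Gradient of the frame-averaged log likelihood of sequence X under a
    diagonal Gaussian, taken w.r.t. its mean and variance parameters."""
    diff = X - mean
    d_mean = (diff / var).mean(axis=0)
    d_var = (diff**2 / (2 * var**2) - 1 / (2 * var)).mean(axis=0)
    return np.concatenate([d_mean, d_var])

def lr_score_vector(X, client, world):
    """Score vector of the log likelihood ratio: client gradients, then the
    negated world gradients (the world model enters the ratio with a minus sign)."""
    return np.concatenate([gaussian_score(X, *client), -gaussian_score(X, *world)])

def lr_kernel(X, Y, client, world):
    """Dot product of two score vectors (identity metric assumed)."""
    return float(lr_score_vector(X, client, world) @ lr_score_vector(Y, client, world))

dim = 3
client = (np.full(dim, 1.0), np.full(dim, 1.0))  # (mean, variance), made-up values
world = (np.zeros(dim), np.full(dim, 2.0))       # (mean, variance), made-up values

rng = np.random.default_rng(0)
short_utt = rng.standard_normal((50, dim)) + 1.0   # 50 frames
long_utt = rng.standard_normal((400, dim)) + 1.0   # 400 frames

phi_short = lr_score_vector(short_utt, client, world)
phi_long = lr_score_vector(long_utt, client, world)
k = lr_kernel(short_utt, long_utt, client, world)
print(phi_short.shape, phi_long.shape)  # both have 4 * dim entries
```

Both utterances map to vectors with one entry per model parameter (mean and variance for each of the two models), regardless of how many frames they contain, so a matrix of such kernel values over whole utterances can be handed to an SVM.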
Experiment results on PolyVar
• The data is noisy.
• PolyVar has many more client tests than YOHO.
Conclusion
• Add the GMM-LR/SVM model to our verification system.
• Add a score-space kernel to the SVM.
• Compare the computational requirements of the Fisher and LR kernels.
References
• V. Wan, Speaker Verification using Support Vector Machines, University of Sheffield, June 2003
• V. Wan, Building Sequence Kernels for Speaker Verification and Speech Recognition, University of Sheffield
• S. Bengio and J. Mariéthoz, Learning the Decision Function for Speaker Verification, IDIAP, 2001