Support Vector Machines Based Text-Dependent Speaker Verification Using HMM Supervectors Chengyu Dong France Telecom R&D Beijing 2008-01-21
Outline • Introduction • HMM supervectors • Normalized scores using SI HMM supervectors • Experimental results • Conclusions
Introduction • Subword-based HMMs are the state-of-the-art technology for text-dependent speaker verification (TDSV) systems. • Support vector machines (SVMs) using the GMM supervector linear (GSL) kernel have proven to be an effective method for text-independent tasks. • These two popular techniques inspire the ideas and methods explored here for TDSV tasks.
HMM Baseline Systems • Forced alignment: a test utterance with observation sequence $O = (o_1, \ldots, o_T)$ is first segmented into $N$ segments, where frames $o_{b_i}$ through $o_{e_i}$ belong to the $i$-th phone. • Phone LLR scores: $s_i = \frac{1}{e_i - b_i + 1} \sum_{t=b_i}^{e_i} \left[ \log p(o_t \mid \lambda_i^{\mathrm{SD}}) - \log p(o_t \mid \lambda_i^{\mathrm{SI}}) \right]$ • Final verification score: the average of the per-phone LLRs, $S = \frac{1}{N} \sum_{i=1}^{N} s_i$.
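A minimal sketch of this baseline scoring under the definitions above; the per-frame likelihood interfaces `sd_loglik`/`si_loglik` and the alignment output format are hypothetical placeholders for whatever the HMM toolkit actually provides:

```python
import numpy as np

def phone_llr_scores(frames, segments, sd_loglik, si_loglik):
    """Per-phone log-likelihood-ratio scores after forced alignment.

    segments: list of (begin, end, phone_id) tuples from forced alignment,
              frame indices inclusive.
    sd_loglik / si_loglik: callables giving the per-frame log-likelihood of
              a frame under the speaker-dependent / speaker-independent HMM
              of the given phone (hypothetical interfaces).
    """
    scores = []
    for b, e, phone in segments:
        seg = frames[b:e + 1]
        # duration-normalized LLR for the i-th phone segment
        llr = np.mean([sd_loglik(x, phone) - si_loglik(x, phone) for x in seg])
        scores.append(llr)
    return scores

def verification_score(scores):
    # final score: average of the per-phone LLRs
    return float(np.mean(scores))
```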
Support Vector Machines • An SVM is a two-class classifier and another widely used, powerful modeling method. In the standard formulation, the SVM discriminant function is given by $f(x) = \sum_{i=1}^{L} \alpha_i y_i K(x, x_i) + b$, where the $x_i$ are the support vectors, $y_i \in \{+1, -1\}$ their labels, and $K(\cdot, \cdot)$ the kernel function. • Each speaker is modeled by a set of support vectors that form a two-class separating hyperplane.
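As a sketch of how such a per-speaker model could be trained on supervectors, here is a linear-kernel SVM in scikit-learn; the supervector dimensionality and the random data are placeholders, not the paper's setup:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical supervectors: a few enrollment utterances of the target
# speaker (class +1) against background speakers (class -1).
target_sv = rng.normal(size=(2, 1024))
background_sv = rng.normal(size=(200, 1024))

X = np.vstack([target_sv, background_sv])
y = np.concatenate([np.ones(len(target_sv)), -np.ones(len(background_sv))])

# Linear-kernel SVM: f(x) = sum_i alpha_i * y_i * K(x, x_i) + b
clf = SVC(kernel="linear").fit(X, y)

test_sv = rng.normal(size=(1, 1024))
score = clf.decision_function(test_sv)[0]  # signed distance to the hyperplane
```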
HMM Supervectors [Figure: block diagram of HMM supervector extraction]
HMM Supervectors • The Kullback-Leibler divergence (KLD) of two HMM models $\lambda_a$ and $\lambda_b$ is defined as $D(\lambda_a \,\|\, \lambda_b) = \int p(O \mid \lambda_a) \log \frac{p(O \mid \lambda_a)}{p(O \mid \lambda_b)} \, dO$. • The KLD between HMMs has no closed form, so a good upper-bound estimation is used: when only the Gaussian means are adapted, $D(\lambda_a \,\|\, \lambda_b) \le \frac{1}{2} \sum_i w_i (\mu_i^a - \mu_i^b)^T \Sigma_i^{-1} (\mu_i^a - \mu_i^b)$, summing over all Gaussians of all states. • We finally deduce the conclusion that this bound is a squared Euclidean distance between supervectors of weight- and variance-normalized means, which induces a linear kernel.
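A short sketch of how the KLD upper bound turns into a supervector: scaling each Gaussian mean by its weight and covariance makes the plain dot product of two supervectors match the bound-induced kernel. The exact weighting below follows the standard GSL construction and is an assumption, not the paper's exact recipe:

```python
import numpy as np

def hmm_supervector(means, diag_covs, weights):
    """Stack the weight- and variance-normalized Gaussian means of all HMM
    states into one supervector, so that the plain dot product of two such
    supervectors matches the KLD upper-bound kernel.

    means:     (G, D) mixture means over all states
    diag_covs: (G, D) diagonal covariances
    weights:   (G,)   Gaussian weights
    """
    scaled = np.sqrt(weights)[:, None] * means / np.sqrt(diag_covs)
    return scaled.ravel()
```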
HMM Supervectors • Linear kernel: $K(\lambda_a, \lambda_b) = \sum_i \left( \sqrt{w_i}\, \Sigma_i^{-1/2} \mu_i^a \right)^T \left( \sqrt{w_i}\, \Sigma_i^{-1/2} \mu_i^b \right)$, i.e., the inner product of the two supervectors. • Dynamic Time Alignment (DTA) kernel: the optimization function maximizes the accumulated frame-level kernel along an alignment path found by dynamic programming, subject to monotonicity and continuity constraints on the path.
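A simplified dynamic-programming sketch of a DTA-style kernel; the symmetric unit step weights and the final length normalization are simplifying assumptions, not necessarily the exact formulation used in the paper:

```python
import numpy as np

def dta_kernel(X, Y, local_kernel):
    """Dynamic-time-alignment kernel (simplified): maximize the accumulated
    frame-level kernel along a monotonic, continuous alignment path, then
    normalize by the path length."""
    n, m = len(X), len(Y)
    acc = np.full((n, m), -np.inf)   # best accumulated kernel value
    length = np.zeros((n, m))        # path length for normalization
    acc[0, 0] = local_kernel(X[0], Y[0])
    length[0, 0] = 1
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            k = local_kernel(X[i], Y[j])
            # allowed predecessors: horizontal, vertical, diagonal steps
            candidates = []
            if i > 0:
                candidates.append((acc[i - 1, j], length[i - 1, j]))
            if j > 0:
                candidates.append((acc[i, j - 1], length[i, j - 1]))
            if i > 0 and j > 0:
                candidates.append((acc[i - 1, j - 1], length[i - 1, j - 1]))
            best_acc, best_len = max(candidates)
            acc[i, j] = best_acc + k
            length[i, j] = best_len + 1
    return acc[-1, -1] / length[-1, -1]
```

Here `local_kernel` can be, for example, `lambda x, y: float(np.dot(x, y))` for a linear frame-level kernel.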
HMM Supervectors • Linear kernel function: $K(u_a, u_b) = u_a^T u_b$, applied directly to the supervectors. • Nonlinear kernel function: a nonlinear kernel, e.g., the RBF kernel $K(u_a, u_b) = \exp(-\gamma \, \| u_a - u_b \|^2)$, can be applied to the supervectors instead.
Normalized scores using SI HMM supervectors • The concept of normalizing the SVM score comes from zero normalization (Z-Norm). • The SVM discriminant function can be summarized as $f(u) = w^T u + b$. • Normalization form: $\tilde{f}(u) = f(u) - f(u_{\mathrm{SI}})$, where the supervector $u_{\mathrm{SI}}$ derives from the background SI HMMs.
Normalized scores using SI HMM supervectors • The reason we use the normalized score is the lack of training data. Suppose $u = [u_a^T, m_s^T]^T$, where $u_a$ denotes the dimensions that are adapted and $m_s$ is the remaining part of the SI HMM means: some dimensions are never visited during MAP adaptation, so the corresponding SI HMM mean components remain unchanged in the supervector.
Normalized scores using SI HMM supervectors • Un-normalized SVM score: $f(u) = w_a^T u_a + w_s^T m_s + b$. The unadapted part $w_s^T m_s$ carries no speaker discrimination; it only shifts the score in the input dimension space. • Normalized SVM score: $\tilde{f}(u) = f(u) - f(u_{\mathrm{SI}}) = w_a^T (u_a - m_a)$, which removes the nuisance part of the supervectors.
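Continuing the scikit-learn sketch from above, the normalization is a single subtraction; `clf` and the SI supervector `si_sv` are the hypothetical objects from the earlier snippets:

```python
def normalized_svm_score(clf, test_sv, si_sv):
    """Z-Norm-style normalization: subtract the score of the SI HMM
    supervector. For a linear SVM f(u) = w.u + b, the unadapted
    dimensions (still equal to the SI means) cancel, leaving only the
    adapted, speaker-discriminative part w_a.(u_a - m_a)."""
    f_test = clf.decision_function(test_sv.reshape(1, -1))[0]
    f_si = clf.decision_function(si_sv.reshape(1, -1))[0]
    return f_test - f_si
```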
Experimental Results • 134 speakers were involved in the evaluations, giving 5292 target trials and 7840 impostor trials in total. Each participant was required to utter one password twice. The impostors were assumed to know the exact password of the target speaker. • SD HMMs are constructed by MAP adaptation with the relevance factor set to 1. Context-independent phone units are used as a universal phone set. • The acoustic features are the first 12 PLP coefficients together with the log-energy of each frame, calculated every 10 ms using a 25 ms Hamming window. The features are processed through a RASTA channel-equalization filter. Including the first and second derivatives over a ±2 frame span yields the final 39-dimensional feature vectors.
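The PLP/RASTA front end itself is toolkit-specific, but the ±2-frame regression deltas can be sketched directly in NumPy; the 13-dimensional static input is assumed to come from an external PLP extractor:

```python
import numpy as np

def add_deltas(feats, span=2):
    """Append first and second derivatives computed by linear regression
    over +/- `span` frames, turning (T, 13) static PLP + log-energy
    features into (T, 39) vectors."""
    def delta(x):
        pad = np.pad(x, ((span, span), (0, 0)), mode="edge")
        num = sum(k * (pad[span + k:len(x) + span + k]
                       - pad[span - k:len(x) + span - k])
                  for k in range(1, span + 1))
        return num / (2 * sum(k * k for k in range(1, span + 1)))
    d1 = delta(feats)       # first derivative
    d2 = delta(d1)          # second derivative
    return np.hstack([feats, d1, d2])
```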
Experimental Results • System fusion: the fused score is a weighted combination of the individual system scores, with the weighting factor determined by a discriminant-analysis procedure (like LDA) that follows Fisher's discrimination criterion. • Systems fused: HMM, GMM and SVM.
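A sketch of such LDA-based fusion, with synthetic placeholder scores standing in for the real development trials:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Hypothetical development data: rows are trials, columns the HMM, GMM
# and SVM system scores; labels mark target (1) vs impostor (0) trials.
scores = rng.normal(size=(300, 3))
labels = rng.integers(0, 2, size=300)

# Fisher's criterion (via LDA) yields the fusion weights; the fused
# score is the projection of the score vector onto the discriminant.
lda = LinearDiscriminantAnalysis().fit(scores, labels)
fused = scores @ lda.coef_.ravel()
```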
Experimental Results [Figure: 3-D distribution of the scores for target and impostor trials (HMM, GMM and SVM scores)] [Figure: 2-D distribution of the scores for target and impostor trials (HMM and SVM scores)]
Conclusions • SVMs using HMM supervectors provide additional evidence for TDSV systems. • The DTA kernel performs slightly better than the linear kernel, but at a much higher computational cost. • The normalized output score remarkably improves the performance of the SVM system. • Fusion of the HMM and SVM systems yields excellent results: the EER is reduced from 4.01% to 3.47%. • When the GMM system is also incorporated into the fusion, the EER is further reduced to 2.95%.