Speaker Verification via Kernel Methods
Speaker: Yi-Hsiang Chao  Advisor: Hsin-Min Wang
OUTLINE • Current Methods for Speaker Verification • Proposed Methods for Speaker Verification • Kernel Methods for Speaker Verification • Experiments • Conclusions
What is speaker verification?
• Goal: To determine whether a speaker is who he or she claims to be.
• Speaker verification is a hypothesis testing problem. Given an input utterance U, two hypotheses are considered:
H0: U is from the target speaker. (the null hypothesis)
H1: U is not from the target speaker. (the alternative hypothesis)
• The Likelihood Ratio (LR) test:
L(U) = p(U|λ) / p(U|λ̄)  { ≥ θ  accept H0;  < θ  accept H1 }   (1)
• Mathematically, H0 and H1 can be represented by parametric models denoted as λ and λ̄, respectively. λ̄ is often called an anti-model.
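A minimal sketch of the LR decision in Eq. (1), computed in the log domain. It assumes the target model λ and anti-model λ̄ are scikit-learn GaussianMixture objects and that U is a matrix of feature frames; the function names and the use of scikit-learn are illustrative assumptions, not part of the original slides.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def utterance_log_likelihood(model: GaussianMixture, frames: np.ndarray) -> float:
    # log p(U | model) as the sum of per-frame log-likelihoods
    return float(model.score_samples(frames).sum())

def lr_test(target_model, anti_model, frames, log_theta: float = 0.0) -> bool:
    # Eq. (1) in the log domain: accept H0 if log p(U|lambda) - log p(U|lambda_bar) >= log(theta)
    log_lr = (utterance_log_likelihood(target_model, frames)
              - utterance_log_likelihood(anti_model, frames))
    return log_lr >= log_theta
```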
Current Methods for Speaker Verification
• The anti-model λ̄ is usually ill-defined, since H1 does not involve any specific speaker and thus lacks explicit data for modeling.
• Many approaches have been proposed in attempts to characterize H1:
• One simple approach is to train a single speaker-independent model, named the world model or the Universal Background Model (UBM) [D. A. Reynolds, et al., 2000]. The resulting LR measure is
L1(U) = p(U|λ) / p(U|Ω), where Ω denotes the UBM.
• The UBM training data are collected from a large number of speakers, generally unrelated to the clients.
Current Methods for Speaker Verification
• Instead of using a single model, an alternative way is to train a set of cohort models {λ1, λ2,…, λB}. This gives the following possibilities in computing the LR:
• Picking the likelihood of the most competitive model [A. Higgins, et al., 1991]:
L2(U) = p(U|λ) / max_{1≤i≤B} p(U|λi)
• Averaging the likelihoods of the B cohort models arithmetically [D. A. Reynolds, 1995]:
L3(U) = p(U|λ) / [ (1/B) Σ_{i=1}^{B} p(U|λi) ]
• Averaging the likelihoods of the B cohort models geometrically [C. S. Liu, et al., 1996]:
L4(U) = p(U|λ) / [ Π_{i=1}^{B} p(U|λi) ]^{1/B}
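The cohort-based measures differ only in how the denominator of Eq. (1) aggregates the cohort likelihoods. The sketch below computes all three in the log domain; it assumes each model is a scikit-learn GaussianMixture (an assumption for illustration; any model exposing log-likelihoods would do).

```python
import numpy as np
from scipy.special import logsumexp

def cohort_log_lrs(target_model, cohort_models, frames):
    ll_target = target_model.score_samples(frames).sum()
    ll_cohort = np.array([m.score_samples(frames).sum() for m in cohort_models])
    B = len(cohort_models)
    return {
        # L2: most competitive cohort model (Higgins et al., 1991)
        "L2": ll_target - ll_cohort.max(),
        # L3: arithmetic mean of the cohort likelihoods (Reynolds, 1995)
        "L3": ll_target - (logsumexp(ll_cohort) - np.log(B)),
        # L4: geometric mean of the cohort likelihoods (Liu et al., 1996)
        "L4": ll_target - ll_cohort.mean(),
    }
```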
Current Methods for Speaker Verification
• Selection of the cohort set
• Two cohort selection methods [D. A. Reynolds, 1995] are used:
• One selects the B closest speakers to each client (used with L2, L3, and L4).
• The other selects the B/2 closest speakers to, plus the B/2 farthest speakers from, each client (used with L3).
• The selection is based on the speaker distance measure [D. A. Reynolds, 1995], computed by
d(λi, λj) = log [ p(Ui|λi) / p(Ui|λj) ] + log [ p(Uj|λj) / p(Uj|λi) ],
where λi and λj are speaker models trained using the i-th speaker's training utterances Ui and the j-th speaker's training utterances Uj, respectively.
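A sketch of cohort selection based on the distance measure above; the helper names and the way models and training utterances are stored are assumptions made for illustration.

```python
import numpy as np

def speaker_distance(model_i, model_j, frames_i, frames_j):
    # d(i, j) = log[p(Ui|li)/p(Ui|lj)] + log[p(Uj|lj)/p(Uj|li)] in the log domain
    return ((model_i.score_samples(frames_i).sum() - model_j.score_samples(frames_i).sum())
            + (model_j.score_samples(frames_j).sum() - model_i.score_samples(frames_j).sum()))

def select_cohort(client, models, frames, B, mix_farthest=False):
    """Return indices of the cohort speakers for one client:
    the B closest speakers, or B/2 closest plus B/2 farthest."""
    others = [k for k in range(len(models)) if k != client]
    d = np.array([speaker_distance(models[client], models[k], frames[client], frames[k])
                  for k in others])
    order = np.argsort(d)                    # smallest distance = closest speaker
    picked = (list(order[:B // 2]) + list(order[-(B // 2):])) if mix_farthest else list(order[:B])
    return [others[k] for k in picked]
```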
Current Methods for Speaker Verification
• The Null Hypothesis Characterization
• The client model λ is represented by a Gaussian Mixture Model (GMM):
p(u|λ) = Σ_{c=1}^{C} wc N(u; μc, Σc),
where wc, μc, and Σc are the weight, mean vector, and covariance matrix of the c-th mixture component.
• λ can be trained via the ML criterion by using the Expectation-Maximization (EM) algorithm.
• λ can also be derived from the UBM using MAP adaptation (the adapted GMM).
• The adapted GMM together with the L1 measure is what we term the GMM-UBM system [D. A. Reynolds, et al., 2000].
• Currently, GMM-UBM is the state-of-the-art approach.
• This method is appropriate for the Text-Independent (TI) task.
• Advantage: the UBM provides coverage of unseen data.
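A sketch of how the adapted GMM can be obtained from the UBM by mean-only MAP adaptation, a common recipe in GMM-UBM systems; the relevance factor, the use of scikit-learn, and the mean-only restriction are assumptions, not necessarily the authors' exact setup.

```python
import numpy as np
from copy import deepcopy
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm: GaussianMixture, frames: np.ndarray, relevance: float = 16.0):
    post = ubm.predict_proba(frames)                      # frame-component posteriors (T x C)
    n_c = post.sum(axis=0)                                # soft occupation counts per component
    first = (post.T @ frames) / np.maximum(n_c, 1e-10)[:, None]   # E_c[x] per component
    alpha = n_c / (n_c + relevance)                       # data-dependent adaptation coefficients
    client = deepcopy(ubm)
    client.means_ = alpha[:, None] * first + (1.0 - alpha)[:, None] * ubm.means_
    return client                                         # adapted GMM for the claimed speaker
```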
Proposed Methods for Speaker Verification
• Motivation:
• None of the LR measures developed so far has proved to be absolutely superior to the others across all tasks and applications.
• We propose two perspectives in an attempt to better characterize the ill-defined alternative hypothesis.
• Perspective 1: Optimal combination of the existing LRs.
• Perspective 2: A novel characterization of the alternative hypothesis.
Perspective 1: The Proposed Combined LR (ICPR2006)
• The pros and cons of different LR measures motivate us to combine them into a unified framework, by virtue of the complementary information that each LR can contribute.
• Given N different LR measures Li(U), i = 1, 2,…, N, we define a combined LR measure by
f(x) = w^T x + b,   (2)
where x = [L1(U), L2(U),…, LN(U)]^T is an N×1 vector in the space RN, w = [w1, w2,…, wN]^T is an N×1 weight vector, and b is a bias.
Linear Discriminant Classifier
• f(x) in Eq. (2) forms a so-called linear discriminant classifier.
• This classifier translates the goal of solving an LR measure into the optimization of w and b, such that the utterances of clients and impostors can be separated.
• To realize this classifier, three distinct data sets are needed:
• One for generating each client's model.
• One for generating each client's anti-models.
• One for optimizing w and b.
Linear Discriminant Classifier
• The bias b plays the same role as the decision threshold θ of the LR defined in Eq. (1); it can be determined through a trade-off between false acceptance and false rejection.
• The main goal here is to find w.
• f(x) can be solved via linear discriminant training algorithms, such as:
• Fisher's Linear Discriminant (FLD).
• Linear Support Vector Machine (Linear SVM).
• Perceptron.
Linear Discriminant Classifier
• Using Fisher's Linear Discriminant (FLD)
• Suppose the i-th class has ni data samples x_1^(i), x_2^(i),…, x_ni^(i), i = 1, 2.
• The goal of FLD is to seek a direction w such that the following Fisher's criterion function J(w) is maximized:
J(w) = (w^T Sb w) / (w^T Sw w),
where Sb and Sw are, respectively, the between-class scatter matrix and the within-class scatter matrix, defined as
Sb = (m1 − m2)(m1 − m2)^T,
Sw = Σ_{i=1,2} Σ_{j=1}^{ni} (x_j^(i) − mi)(x_j^(i) − mi)^T,
where mi is the mean vector of the i-th class.
Linear Discriminant Classifier
• Using Fisher's Linear Discriminant (FLD)
• The solution for w, which maximizes the Fisher's criterion J(w), is the leading eigenvector of Sw^{-1} Sb.
• Since Sb has rank one, w can be directly calculated as
w = Sw^{-1} (m1 − m2).   (3)
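A compact NumPy sketch of Eq. (3); the two input matrices hold the client-class and impostor-class vectors (combined-LR vectors here, characteristic vectors later).

```python
import numpy as np

def fld_direction(X1: np.ndarray, X2: np.ndarray) -> np.ndarray:
    # w = Sw^{-1} (m1 - m2), Eq. (3)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = np.zeros((X1.shape[1], X1.shape[1]))
    for X, m in ((X1, m1), (X2, m2)):
        D = X - m
        Sw += D.T @ D                        # within-class scatter
    return np.linalg.solve(Sw, m1 - m2)
```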
Analysis of the Alternative Hypothesis
• The LR approaches that have been proposed to characterize H1 can be collectively expressed in the following general form:
L(U) = p(U|λ) / F( p(U|λ1), p(U|λ2),…, p(U|λN) ),   (4)
where F(·) is some function of the likelihood values from a set of so-called background models {λ1, λ2,…, λN}.
• For example, F(·) can be the average function for L3(U), the maximum for L2(U), or the geometric mean for L4(U), and the background model set here can be obtained from a cohort.
• A special case arises when F(·) is an identity function and N = 1. In this instance, a single background model is used, which gives L1(U).
Perspective 2: The Novel Alternative Hypothesis Characterization (submitted to ISCSLP2006)
• We redesign the function F(·) as
F(·) = Π_{i=1}^{N} p(U|λi)^{wi},   (5)
where w = [w1, w2,…, wN]^T is an N×1 weight vector and wi is the weight of the likelihood p(U|λi), i = 1, 2,…, N.
• This function gives the N background models different weights according to their individual contributions to the alternative hypothesis.
• It is clear that Eq. (5) is equivalent to a geometric mean function when wi = 1/N for all i.
• It is also clear that Eq. (5) will reduce to a maximum function when wi = 1 for the background model with the largest likelihood and wi = 0 for all the others.
Perspective 2: The Novel Alternative Hypothesis Characterization (submitted to ISCSLP2006)
• By substituting Eq. (5) into Eq. (4), taking the logarithm, and normalizing the weights to sum to one, the LR test becomes
f(x) = w^T x  { ≥ log θ  accept H0;  < log θ  accept H1 },   (6)
where w = [w1, w2,…, wN]^T is an N×1 weight vector and x is an N×1 vector in the space RN, expressed by
x = [ log p(U|λ) − log p(U|λ1), log p(U|λ) − log p(U|λ2),…, log p(U|λ) − log p(U|λN) ]^T.   (7)
Perspective 2: The Novel Alternative Hypothesis Characterization (submitted to ISCSLP2006)
• The implicit idea in Eq. (7) is that the speech utterance U can be represented by a characteristic vector x.
• If we replace the threshold in Eq. (6) with a bias b, the equation can be rewritten as
f(x) = w^T x + b,   (8)
which is analogous to the combined LR method in Eq. (2).
• f(x) in Eq. (8) forms a linear discriminant classifier again, which can be solved via linear discriminant training algorithms, such as FLD.
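A small sketch of Eqs. (7) and (8) in the log domain, assuming the client and background models expose utterance log-likelihoods as in the earlier sketches; the helper names are illustrative.

```python
import numpy as np

def characteristic_vector(target_model, background_models, frames) -> np.ndarray:
    # Eq. (7): one component per background model, x_i = log p(U|lambda) - log p(U|lambda_i)
    ll_target = target_model.score_samples(frames).sum()
    return np.array([ll_target - m.score_samples(frames).sum() for m in background_models])

def linear_verify(x: np.ndarray, w: np.ndarray, b: float) -> bool:
    # Eq. (8): accept the claimed speaker if f(x) = w^T x + b >= 0
    return float(w @ x + b) >= 0.0
```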
Perspective 2: The Novel Alternative Hypothesis Characterization (submitted to ISCSLP2006)
• Relation to Perspective 1: the combined LR measure.
• If anti-models λ̄1, λ̄2,…, λ̄N are used instead of the background models in the characteristic vector x defined in Eq. (7), each component of x becomes an individual log LR, and f(x) forms a linear combination of N different LR measures, which is the same as the combined LR measure.
Kernel Methods for Speaker Verification
• f(x) can be solved via linear discriminant training algorithms.
• However, such methods are based on the assumption that the observed data of different classes is linearly separable, which is not the case in most practical applications.
• We therefore hope that data from different classes, which is not linearly separable in the original input space RN, can be separated linearly in an implicit higher dimensional (maybe infinite dimensional) feature space F via a nonlinear mapping Φ.
• Let Φ(x) denote a vector obtained by mapping x from RN to F. f(x) can be re-defined as
f(x) = w^T Φ(x) + b,   (9)
which constitutes a linear discriminant classifier in F.
Kernel Methods for Speaker Verification
• In practice, it is difficult to determine what kind of mapping Φ is applicable; hence, the computation of Φ(x) can be infeasible.
• We propose using the kernel method: it characterizes the relationship between the data samples in F, instead of computing Φ(x) directly.
• This is achieved by introducing a kernel function
k(x, y) = Φ(x)^T Φ(y),   (10)
which is the inner product of two vectors Φ(x) and Φ(y) in F.
Kernel Methods for Speaker Verification
• The kernel function k(·) must be symmetric, positive definite, and conform to Mercer's condition. For example:
• The dot product kernel: k(x, y) = x^T y.
• The d-th degree polynomial kernel: k(x, y) = (x^T y + 1)^d.
• The Radial Basis Function (RBF) kernel: k(x, y) = exp(−||x − y||^2 / (2σ^2)), where σ is a tunable parameter.
• Existing kernel-based classification techniques can be applied to implement f(x), such as:
• Support Vector Machine (SVM).
• Kernel Fisher Discriminant (KFD).
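Minimal implementations of the three example kernels; the exact polynomial form used in the original work is not shown on the slides, so the "+1" offset below is an assumption.

```python
import numpy as np

def dot_kernel(x, y):
    return float(np.dot(x, y))

def poly_kernel(x, y, d=2):
    # One common d-th degree polynomial kernel
    return float((np.dot(x, y) + 1.0) ** d)

def rbf_kernel(x, y, sigma=1.0):
    # Radial Basis Function kernel; sigma is the tunable width parameter
    return float(np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2.0 * sigma ** 2)))
```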
Kernel Methods for Speaker Verification
• Support Vector Machine (SVM)
• Techniques based on SVM have been successfully applied to many classification and regression tasks.
• Conventional LR: if the probabilities are perfectly estimated (which is usually not the case), then the Bayes decision rule is the optimal decision.
• However, a better solution should in theory be to use a discriminant framework [V. N. Vapnik, 1995].
• [S. Bengio, et al., 2001] argued that the probability estimates are not perfect and that a better decision function would be
a1 log p(U|λ) + a2 log p(U|λ̄) + b > 0 (accept),
where a1, a2, and b are adjustable parameters estimated using an SVM.
Kernel Methods for Speaker Verification
• Support Vector Machine (SVM)
• [S. Bengio, et al., 2001] incorporated the two scores obtained from the GMM and the UBM with an SVM.
• Comparison with our approach:
• [S. Bengio, et al., 2001] only used one simple background model, the UBM, as the alternative hypothesis characterization.
• Our approach integrates multiple background models for the alternative hypothesis characterization in a more effective and robust way.
Kernel Methods for Speaker Verification
• Support Vector Machine (SVM)
• The goal of SVM is to seek a separating hyperplane in the feature space F that maximizes the margin between classes.
[Figure: two linear classifiers (a) and (b), with support vectors, the optimal margin, and the optimal hyperplane marked; the classifier in (b) has a greater separation distance than (a).]
Kernel Methods for Speaker Verification
• Support Vector Machine (SVM)
• Following the theory of SVM, w can be expressed as
w = Σ_{j=1}^{l} αj yj Φ(xj),
which yields
f(x) = Σ_{j=1}^{l} αj yj k(xj, x) + b,
where each training sample xj belongs to one of the two classes identified by the label yj ∈ {−1, 1}, j = 1, 2,…, l.
Kernel Methods for Speaker Verification
• Support Vector Machine (SVM)
• Let α^T = [α1, α2,…, αl]. Our goal now changes from finding w to finding α.
• We can find the coefficients αj by maximizing the objective function
Q(α) = Σ_{j=1}^{l} αj − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} αi αj yi yj k(xi, xj),
subject to the constraints
Σ_{j=1}^{l} αj yj = 0 and 0 ≤ αj ≤ C, j = 1, 2,…, l,
where C is a penalty parameter.
• The above optimization problem can be solved using quadratic programming techniques.
Kernel Methods for Speaker Verification
• Support Vector Machine (SVM)
• Note that most αj are equal to zero; the training samples with non-zero αj are called support vectors.
• A few support vectors act as the key to deciding the optimal margin between classes in the SVM.
• An SVM with a dot product kernel function, i.e., k(x, y) = x^T y, is known as a linear SVM.
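A sketch of the SVM realization of f(x) using scikit-learn, assuming the characteristic vectors of client utterances (label +1) and impostor utterances (label −1) from the development data are already available; the variable names and the RBF-kernel choice are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def train_svm_verifier(X_dev: np.ndarray, y_dev: np.ndarray, C: float = 1.0, sigma: float = 1.0):
    # gamma = 1 / (2 sigma^2) maps sklearn's RBF parameterization onto the slides' notation
    clf = SVC(C=C, kernel="rbf", gamma=1.0 / (2.0 * sigma ** 2))
    clf.fit(X_dev, y_dev)
    return clf

def svm_score(clf, x: np.ndarray) -> float:
    # Signed distance to the separating hyperplane; compare with a tuned threshold/bias
    return float(clf.decision_function(np.atleast_2d(x))[0])
```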
Kernel Methods for Speaker Verification
• Kernel Fisher Discriminant (KFD)
• Alternatively, f(x) in Eq. (9) can be solved with KFD.
• In fact, the purpose of KFD is to apply FLD in the feature space F. We also need to maximize the Fisher's criterion:
J(w) = (w^T Sb^Φ w) / (w^T Sw^Φ w),
where Sb^Φ and Sw^Φ are, respectively, the between-class and the within-class scatter matrices in F, i.e.,
Sb^Φ = (m1^Φ − m2^Φ)(m1^Φ − m2^Φ)^T,
Sw^Φ = Σ_{i=1,2} Σ_{j=1}^{ni} (Φ(x_j^(i)) − mi^Φ)(Φ(x_j^(i)) − mi^Φ)^T,
where mi^Φ is the mean vector of the i-th class in F.
Kernel Methods for Speaker Verification
• Kernel Fisher Discriminant (KFD)
• Let the training set be {x1, x2,…, xl} with l = n1 + n2, and let K denote the l×l kernel matrix with entries Kjk = k(xj, xk).
• According to the theory of reproducing kernels, the solution w must lie in the span of all training data samples mapped into F, so w can be expressed as
w = Σ_{j=1}^{l} αj Φ(xj).
• Accordingly, f(x) in Eq. (9) can be re-written as
f(x) = Σ_{j=1}^{l} αj k(xj, x) + b.
• Let α^T = [α1, α2,…, αl]. Our goal therefore changes from finding w to finding α, which maximizes
J(α) = (α^T M α) / (α^T N α),
Kernel Methods for Speaker Verification
• Kernel Fisher Discriminant (KFD)
where M = (M1 − M2)(M1 − M2)^T with (Mi)j = (1/ni) Σ_{k=1}^{ni} k(xj, x_k^(i)), and
N = Σ_{i=1,2} Ki (Ini − 1ni) Ki^T,
where Ki is the l×ni kernel matrix between all training samples and the samples of the i-th class, Ini is an ni×ni identity matrix, and 1ni is an ni×ni matrix with all entries 1/ni.
• The solution for α is analogous to FLD in Eq. (3):
α = N^{-1} (M1 − M2),
which is also the leading eigenvector of N^{-1}M.
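A compact NumPy sketch of the KFD solution α = N^{-1}(M1 − M2); the small ridge added to N for numerical stability and the generic kernel argument are assumptions made for illustration.

```python
import numpy as np

def kfd_train(X: np.ndarray, y: np.ndarray, kernel, reg: float = 1e-3):
    """X: (l x N) training vectors, y: labels in {0, 1}, kernel(a, b) -> scalar.
    Returns alpha and a scoring function f(x) = sum_j alpha_j k(x_j, x)."""
    l = X.shape[0]
    K = np.array([[kernel(a, b) for b in X] for a in X])   # l x l kernel matrix
    M, Nmat = [], np.zeros((l, l))
    for c in (0, 1):
        idx = np.where(y == c)[0]
        K_c = K[:, idx]                                     # l x n_c block
        M.append(K_c.mean(axis=1))                          # class kernel mean M_c
        n_c = len(idx)
        H = np.eye(n_c) - np.full((n_c, n_c), 1.0 / n_c)
        Nmat += K_c @ H @ K_c.T                             # within-class matrix N
    alpha = np.linalg.solve(Nmat + reg * np.eye(l), M[0] - M[1])
    score = lambda x: float(sum(a * kernel(xj, x) for a, xj in zip(alpha, X)))
    return alpha, score
```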
Experiments
• Formation of the Characteristic Vector
• In our methods, we use B+1 background models, consisting of B cohort models and one world model, to form the characteristic vector x.
• Two cohort selection methods are used in the experiments:
• B closest speakers.
• B/2 closest speakers + B/2 farthest speakers.
• These yield the following two (B+1)×1 characteristic vectors:
x = [ log p(U|λ) − log p(U|Ω), log p(U|λ) − log p(U|λ1c),…, log p(U|λ) − log p(U|λBc) ]^T,
x = [ log p(U|λ) − log p(U|Ω), log p(U|λ) − log p(U|λ1c),…, log p(U|λ) − log p(U|λ(B/2)c), log p(U|λ) − log p(U|λ1f),…, log p(U|λ) − log p(U|λ(B/2)f) ]^T,
where Ω is the world model, and λic and λif are, respectively, the i-th closest model and the i-th farthest model of the client model λ.
Experiments
• Detection Cost Function (DCF)
• The NIST Detection Cost Function (DCF) reflects the performance at a single operating point on the DET curve. The DCF is defined as
DCF = CMiss × PMiss × PTarget + CFalseAlarm × PFalseAlarm × (1 − PTarget),
where
• PMiss and PFalseAlarm are the miss probability and the false-alarm probability, respectively.
• CMiss and CFalseAlarm are the respective relative costs of detection errors.
• PTarget is the a priori probability of the specific target speaker.
• A special case of the DCF is known as the Half Total Error Rate (HTER), where CMiss and CFalseAlarm are both equal to 1 and PTarget = 0.5, i.e.,
HTER = (PMiss + PFalseAlarm) / 2.
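A direct transcription of the two cost formulas; the default cost and prior values in the DCF function are illustrative only, since the slides do not specify the ones used in the evaluations.

```python
def dcf(p_miss: float, p_fa: float,
        c_miss: float = 10.0, c_fa: float = 1.0, p_target: float = 0.01) -> float:
    # NIST Detection Cost Function at a single operating point
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

def hter(p_miss: float, p_fa: float) -> float:
    # Half Total Error Rate: DCF with c_miss = c_fa = 1 and p_target = 0.5
    return 0.5 * (p_miss + p_fa)
```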
Experiments - XM2VTSDB
• "Training" subset: used to build each client's model and anti-models.
• "Evaluation" subset: used to estimate w, b, and the decision threshold.
• "Test" subset: used for the performance evaluation.
• Each recording session contains three sentences:
1. "0 1 2 3 4 5 6 7 8 9".
2. "5 0 6 9 2 8 1 3 7 4".
3. "Joe took father's green shoe bench out".
Experimental results (ICPR2006) - XM2VTSDB
• For perspective 1: the proposed combined LR.
• Further analysis of the results via the equal error rate (EER) showed that a 13.2% relative improvement was achieved by KFD (EER = 4.6%), compared with the 5.3% EER of L3(U).
• Figure 1. Baselines vs. the combined LRs: DET curves for the "Test" subset.
Experimental results (submitted to ISCSLP2006) - XM2VTSDB
• For perspective 2: the novel alternative hypothesis characterization.
• A 30.68% relative improvement was achieved by KFD_w_20c, compared with L3_10c_10f, the best baseline system.
Experimental results (submitted to ISCSLP2006) - XM2VTSDB
• For perspective 2: the proposed novel alternative hypothesis characterization.
• Figure 2. Best baselines vs. our proposed LRs: DET curves for the "Test" subset.
Evaluation on the ISCSLP2006-SRE database
• For perspective 2: the proposed novel alternative hypothesis characterization, in the text-independent speaker verification task.
• We observe that KFD_w_50c_50f achieved a 34.08% relative improvement over GMM-UBM.
Evaluation on the ISCSLP2006-SRE database
• We participated in the text-independent speaker verification task of the ISCSLP2006 Speaker Recognition Evaluation (SRE) plan.
• The evaluation results are given above.
Conclusions
• We have introduced current LR systems for speaker verification.
• We have presented two proposed LR systems:
• The combined LR system.
• The new LR system with the novel alternative hypothesis characterization.
• Both proposed LR systems can be formulated as a linear or non-linear discriminant classifier.
• Non-linear classifiers can be implemented by using kernel methods:
• Kernel Fisher Discriminant (KFD).
• Support Vector Machine (SVM).
• Experiments conducted on two speaker verification tasks, the XM2VTSDB task and the ISCSLP2006-SRE task, demonstrated the superiority of our methods over conventional approaches.