Combining Gaussian Mixture Models with Support Vector Machines for Text-independent Speaker Verification
Jamal Kharroubi, Dijana Petrovska-Delacrétaz, Gérard Chollet
(kharroub, petrovsk, chollet)@tsi.enst.fr
Eurospeech, 3-7 September 2001
Overview • Motivations • SVM principles • Previous experiments with SVMs for speaker identification • Our method: combining GMMs and SVMs for speaker verification • Database • Experimental protocol • Results • Conclusions and perspectives
Motivations • Gaussian Mixture Models (GMMs) • State of the art for speaker verification • Support Vector Machines (SVMs) • A new and promising technique in statistical learning theory • Discriminative method • Good performance in image processing and multi-modal authentication • Seem to be "à la mode" at Eurospeech 2001 • Combine GMMs and SVMs for speaker verification
SVM Principles • Pattern classification problem: given a set of labelled training data, learn to classify unlabelled test data • Find decision boundaries that separate the classes while minimising the number of classification errors • SVMs are: • Binary classifiers • Capable of automatically determining the complexity of the decision boundary
SVM principles, cont'd. [Figure: separating hyperplanes H in input space and the optimal hyperplane Ho; the mapping from input space to feature space, where y(X) determines Class(X)]
SVM Theory [Figure: mapping from input space to feature space and the resulting classification function]
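The figure's content reduces to the standard SVM formulation; as a hedged reconstruction of what the slide likely showed (the details are not preserved in the residue): given labelled training data (x_i, y_i) with y_i ∈ {−1, +1} and a mapping Φ from input space to feature space, the classification function is

```latex
\mathrm{Class}(X) = \operatorname{sign}\Big( \sum_{i \in \mathrm{SV}} \alpha_i \, y_i \, K(x_i, X) + b \Big),
\qquad K(x_i, X) = \Phi(x_i) \cdot \Phi(X)
```

where the support vectors SV are the training points with non-zero Lagrange multipliers α_i, and the kernel K evaluates feature-space dot products without computing Φ explicitly.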
Previous experiment with SVMs for Speaker Recognition. Speaker identification with SVMs: Schmidt and Gish, 1996 • Goal: identify one speaker among a given closed set of speakers • The input vectors of the SVMs are the spectral parameters • Slightly better performance with SVMs, using the pairwise classifier • Why these disappointing results? • Too short train/test durations • GMMs perhaps better suited to model the data • GMMs perhaps more robust to channel variation
SVM and Speaker Verification • Difficulty: mismatch in the quantity of labelled data, with more data available for impostor accesses than for true target accesses • Our preliminary test, with speech feature frames as input to the SVM, gave no satisfactory results • Present approach: combine GMMs and SVMs
Our method of combining GMMs and SVMs for Speaker Verification • Usual GMM modelling • An additional labelled development set is needed for the SVM • Construction of the SVM input vector from the GMM models • Model globally, with the SVM, the client-client against the client-impostor accesses (on the development data)
GMM speaker modelling [Diagram: world data → front-end → GMM modelling → world GMM model; target speaker → front-end → GMM modelling or adaptation → target GMM model]
Baseline GMM method [Diagram: test speech → front-end → scored against the hypothesised target GMM model and the world GMM model → LLR score]
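The slide only names the LLR score; the standard formulation for the GMM/world-model setup, given here as a reconstruction (the per-frame averaging convention is an assumption), is

```latex
\mathrm{LLR}(X) = \frac{1}{T} \sum_{t=1}^{T}
\big[ \log p(x_t \mid \lambda_{\mathrm{target}}) - \log p(x_t \mid \lambda_{\mathrm{world}}) \big]
```

where X = (x_1, …, x_T) is the test segment and λ_target, λ_world are the hypothesised target and world GMMs.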
Construction of the SVM input vectors: SVM(max) method • For each frame t of the test segment, frame scores are computed under the hypothesised target GMM model and under the world GMM model • If Gaussian component i gives the maximum score St for frame t, St is added to the ith component of that model's accumulator vector
Construction of the SVM input vectors, cont'd. • The SVM input vector is the concatenation of the target-model and world-model accumulator vectors • Summation and normalization of the SVM input vector by the number of frames T of the test segment (a sketch of this construction follows)
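A minimal numpy sketch of this construction, assuming diagonal-covariance GMMs passed as (weights, means, variances) tuples; the function names are mine, and taking the weighted component likelihood as the frame score St is an assumption where the slides leave the details open:

```python
import numpy as np

def component_likelihoods(frames, weights, means, variances):
    """Weighted likelihood of each frame under each diagonal-covariance
    Gaussian component; frames: (T, d), output: (T, M)."""
    d = frames.shape[1]
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, d)
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.log(variances).sum(axis=1))
    log_like = log_norm[None, :] - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    return weights[None, :] * np.exp(log_like)

def svm_max_vector(frames, target_gmm, world_gmm):
    """SVM(max) input vector: for each frame, the score of the best-scoring
    component is added to that component's slot, separately for the target
    and world models; the concatenation is normalised by the frame count T."""
    parts = []
    for gmm in (target_gmm, world_gmm):
        scores = component_likelihoods(frames, *gmm)                 # (T, M)
        acc = np.zeros(scores.shape[1])
        best = scores.argmax(axis=1)                                 # winning component per frame
        np.add.at(acc, best, scores[np.arange(len(best)), best])
        parts.append(acc)
    return np.concatenate(parts) / len(frames)                       # dimension 2 * M
```

With the 128-component models used in the experiments, this yields a 256-dimensional SVM input vector.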
SVM: Train / Test [Diagram: training uses SVM input vectors of the client class and the impostor class to build the SVM classifier; at test time, the test speech goes through SVM input vector construction and the classifier outputs a decision score]
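An illustrative train/test sketch, using scikit-learn's LinearSVC as a stand-in for the SvmFu package mentioned below; the data shapes and variable names are placeholders:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Stand-in development data: one SVM(max) vector per access (2 x 128 dims),
# labelled +1 for client (true target) accesses and -1 for impostor accesses.
rng = np.random.default_rng(0)
X_dev = rng.normal(size=(200, 256))
y_dev = np.r_[np.ones(100), -np.ones(100)]

clf = LinearSVC(C=1.0)                  # linear kernel, as in the experiments
clf.fit(X_dev, y_dev)

# At test time, one vector per test segment; the signed distance to the
# separating hyperplane serves as the decision score.
x_test = rng.normal(size=(1, 256))
score = clf.decision_function(x_test)[0]
accept = score > 0
```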
Database. NIST '99 data split into: • Development data = 100 speakers • 2 min of speech per GMM model • Corresponding test data to train the SVM classifier (519 true and 5190 impostor accesses) • World data = 200 speakers • 4 sex/handset-dependent world models • Evaluation data = 100 speakers = 449 true and 4490 impostor accesses
Experimental Protocol: Feature Extraction • LFCC parametrization (32.5 ms windows every 10 ms) • Cepstral mean subtraction for channel compensation • Feature vector dimension is 33 (16 cep, 16 dcep, log E) (delta cepstral features computed on 5-frame windows; both steps are sketched below) • A frame removal algorithm is applied to the feature vectors to discard non-significant frames (bimodal energy distributions)
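A small numpy sketch of the cepstral mean subtraction and the 5-frame delta computation; the LFCC front-end itself is assumed to come from elsewhere, and the regression formula for the deltas is the standard one, not given on the slide:

```python
import numpy as np

def cms(cep):
    """Cepstral mean subtraction: remove the per-utterance mean of each
    coefficient as a simple channel compensation."""
    return cep - cep.mean(axis=0, keepdims=True)

def deltas(cep, half_win=2):
    """Delta features over a (2*half_win + 1)-frame window (5 frames for
    half_win=2), using the usual regression formula with edge padding."""
    T = len(cep)
    idx = np.arange(T)
    num = np.zeros_like(cep)
    for k in range(1, half_win + 1):
        num += k * (cep[np.minimum(idx + k, T - 1)] - cep[np.maximum(idx - k, 0)])
    return num / (2 * sum(k * k for k in range(1, half_win + 1)))

# 16 cepstra + 16 deltas + log energy = 33-dimensional feature vectors.
```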
Experimental Protocol, cont. • GMM speaker and world models • with 128 mixtures • diagonal covariance matrices • Standard EM algorithm with a maximum of 20 iterations • ML and MAP methods to construct the client model • with the ELISA Consortium toolkit • The SVM model was trained on part of the development corpus • A linear kernel is used • with the SvmFu package (a stand-in sketch of the GMM training follows)
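An illustrative stand-in for the GMM training step, using scikit-learn in place of the ELISA Consortium toolkit, with the slide's settings (128 mixtures, diagonal covariances, at most 20 EM iterations); the feature arrays are placeholders:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
world_features = rng.normal(size=(5000, 33))    # placeholder feature frames
client_features = rng.normal(size=(1000, 33))

def train_gmm(frames):
    """128-mixture, diagonal-covariance GMM trained by EM (max 20 iterations)."""
    return GaussianMixture(n_components=128, covariance_type='diag',
                           max_iter=20).fit(frames)

world_gmm = train_gmm(world_features)
client_gmm_ml = train_gmm(client_features)      # ML training, independent of world
```

MAP training of the client model (adapting the world model's parameters) is not covered by this scikit-learn stand-in.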
[Results figure: GMM(ml) vs. hybrid GMM(ml)+SVM(max), no normalization]
[Results figure: GMM(map) vs. GMM(ml), no normalization]
[Results figure: GMM(map) vs. GMM(map)+SVM(max), no normalization]
Why no improvement with GMM(map)+SVM(max)? • ML training: the speaker model is trained independently of the world model • MAP training: the speaker model is derived by updating the world model's parameters via adaptation • Hence a stronger coupling between the speaker and world models
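The slide does not spell the adaptation out; for reference, the usual relevance-MAP mean update (Reynolds-style adaptation, given here as standard background rather than the authors' exact recipe) is

```latex
\hat{\mu}_i = \alpha_i \, E_i(x) + (1 - \alpha_i) \, \mu_i^{\mathrm{world}},
\qquad \alpha_i = \frac{n_i}{n_i + r}
```

where n_i is the soft count of client frames assigned to component i, E_i(x) their posterior-weighted mean, and r the relevance factor. Since every client component is interpolated with its world counterpart, the two models stay tightly coupled, which is the coupling the slide refers to.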
New SVM input vector representation: SVM(maxRatio) • As in SVM(max), the score St of frame t is added to the ith component of the accumulator vector for the test segment, but St now relates the hypothesised target GMM model to the world GMM model (one reading is given below)
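The residue does not preserve the exact definition of the new score; one plausible reading, stated here as an assumption, is that the accumulated frame score becomes the target-to-world likelihood ratio

```latex
S_t = \frac{p(x_t \mid \lambda_{\mathrm{target}})}{p(x_t \mid \lambda_{\mathrm{world}})}
```

so that the SVM input reflects the same target-versus-world contrast the MAP-adapted models encode.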
[Results figure: GMM(map), GMM(map)+SVM(max), and GMM(map)+SVM(maxRatio), no normalization]
Conclusions and Perspectives • Improvements with our hybrid GMM+SVM method are dependent on the GMM adaptation method • Better (or similar) results are obtained with the GMM+SVM method if an appropriate SVM input vector representation is chosen • Different kernel types and features will be experimented with • Normalization techniques will be applied