Speaker Verification: Is it Industrial Strength?
Fergus McInnes - CCIR
Outline of Presentation
• Introduction to speaker verification:
  • what it is
  • how it works
• Large-scale evaluation of Nuance Verifier on British speakers
• Conclusions and recommendations
Speaker Verification: What is it?
• Test of claimed identity based on voice
• Aims to answer the question: “Is the speaker who he/she claims to be?”
• Can be used in combination with other security checks
• Compare and contrast:
  • speaker identification - no prior identity claim
  • other biometrics - not usable over the phone
Speaker Verification: How does it work?
• Enrolment phase: client provides speech to build a speaker model
• Verification phase:
  • compare the caller’s speech with the client’s model
  • if it matches well enough, accept the caller as being the client (assuming other security checks are OK)
  • otherwise reject the caller’s identity claim
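The verification-phase decision above can be sketched as a simple threshold test. This is an illustrative sketch only: the function name, the numeric scores, and the `other_checks_ok` flag are hypothetical, not part of any real verifier API.

```python
def verify(match_score: float, threshold: float,
           other_checks_ok: bool = True) -> bool:
    """Accept the caller as the claimed client only if the speech
    matches the client's model well enough AND any other security
    checks (e.g. a PIN) have also passed."""
    return other_checks_ok and match_score >= threshold

# A caller scoring 0.82 against a threshold of 0.75 is accepted;
# the same score fails if another security check has already failed.
print(verify(0.82, 0.75))                         # True
print(verify(0.82, 0.75, other_checks_ok=False))  # False
```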
Tuning a Verifier
[Figure: false acceptance (FA) and false rejection (FR) rates plotted against the threshold setting. A lower-security threshold gives low FR but high FA; a higher-security threshold gives low FA but high FR. The curves cross at the Equal Error Rate (EER) threshold, where FA = FR.]
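The tuning idea can be made concrete: sweep the threshold over the observed scores and pick the setting where FA and FR are closest, i.e. the EER operating point. The score lists below are made-up illustrative data, not results from the study.

```python
def fa_fr(client_scores, impostor_scores, threshold):
    """FA = fraction of impostor scores at/above threshold (accepted);
    FR = fraction of client scores below threshold (rejected)."""
    fr = sum(s < threshold for s in client_scores) / len(client_scores)
    fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return fa, fr

def eer_threshold(client_scores, impostor_scores):
    """Pick the candidate threshold where |FA - FR| is smallest."""
    candidates = sorted(set(client_scores) | set(impostor_scores))
    return min(candidates,
               key=lambda t: abs(fa_fr(client_scores, impostor_scores, t)[0]
                                 - fa_fr(client_scores, impostor_scores, t)[1]))

clients   = [0.9, 0.8, 0.75, 0.6, 0.85]   # genuine-speaker scores (hypothetical)
impostors = [0.3, 0.5, 0.65, 0.4, 0.2]    # impostor scores (hypothetical)
t = eer_threshold(clients, impostors)
print(t, fa_fr(clients, impostors, t))    # 0.65 (0.2, 0.2): FA = FR = 20%
```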
Enrolment Phase
• How many enrolment sessions? - a single session per client for convenience, or multiple sessions for improved modelling
• What words or phrases to use? - digits, application keywords, passwords etc
• How many utterances per word or phrase? - should be more than one, to ensure consistency and representativeness
Verification Phase
• What words or phrases? - for best results, use the same words as in enrolment
• How many utterances per verification bid? - more for accuracy, or fewer for speed
• How many verification bids to allow?
• What threshold setting? - tradeoff between false acceptances and false rejections
• What to do on rejection? - pass to an agent?
Large-Scale Evaluation of Nuance Verifier
• Speech collected over the telephone network from >1000 employees of participating companies
• Tests run at CCIR using Nuance Verifier Version 6.2.4-pre (1999/2000):
  • main series: 779 speakers with sufficient data
  • comparative tests on excluded speakers
  • tests on identical twins and other related speakers
• Edinburgh impostor study (deliberate mimicry)
Main Series of Tests
• 779 speakers (445 male, 334 female)
• Digits, digit strings, names, banking words
• Enrolment data: 1, 2 or 3 utterances per word or phrase, from a registration call
• Test data: 1 utterance per word or phrase from each of 3 simulated banking-service calls
• Equal numbers of client and impostor bids
• Random or same-sex impostors
Test Procedure
[Flowchart: speakers are divided into two sets, X and Y. For each set: enrolment, then client and impostor bids are scored; a threshold is estimated and applied to the scores to give verification decisions, from which FA and FR rates are computed and averaged into a per-set Half Total Error Rate, HTER(X) and HTER(Y). These are then averaged into the overall Half Total Error Rate (HTER).]
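The averaging in the test procedure can be sketched in a few lines: each set's HTER is the mean of its FA and FR rates, and the overall figure is the mean across the two sets. The rates below are illustrative placeholders, not the study's measured values.

```python
def hter(fa_rate: float, fr_rate: float) -> float:
    """Half Total Error Rate: the mean of the FA and FR rates."""
    return (fa_rate + fr_rate) / 2

hter_x = hter(0.014, 0.010)        # set X: 1.4% FA, 1.0% FR (hypothetical)
hter_y = hter(0.010, 0.014)        # set Y: 1.0% FA, 1.4% FR (hypothetical)
overall = (hter_x + hter_y) / 2    # overall HTER across the two sets
print(overall)                     # 0.012, i.e. 1.2%
```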
False Rejection / False Acceptance Tradeoff (all digit phrases, pooled ×3 enrolment)
[Tradeoff curve not reproduced.]
Results with Adaptation (HTER, with pooled ×3 enrolment)
• After adaptation on client bids (using all digit phrases): 1.2% (no adaptation), 0.9% (1 adaptation), 0.9% (2 adaptations)
• After adaptation on client and same-impostor bids: 1.2%, 1.4%, 1.7%
• Adapted digit + unadapted non-digit models:
  • client adaptation: 0.9%, 0.8%, 0.7%
  • client + same-impostor adaptation: 0.9%, 1.3%, 1.5%
Related Impostors (FA on all digits, with pooled ×3 enrolment and standard same-sex EER threshold)
• Identical twin: 58.7% [19 speaker pairs]
• Other sibling: 0.0% [10]
• Parent: 15.5% [11]
• Child: 11.1% [10]
• (Unrelated same-sex impostor: 1.4% [779])
HTER on identical twins with adjusted threshold: 17.1%
Impostor Study - Deliberate Mimicry (combined digit/non-digit bids, pooled ×3 enrolment, standard same-sex threshold)
[Results chart not reproduced.]
Summary of Results
• Accuracy improved by using digits, by increasing the amount of enrolment data, and by increasing the amount of speech used for verification
• EER ~1% for verification on 10s of speech, after enrolment on 3 utterances per word or phrase
• False acceptance / false rejection tradeoff: 0.1% FR at 5% FA
• Adaptation helps if there are no persistent impostors
• Identical twins not reliably distinguished
Speaker Verification - Conclusions
• Recommend using speaker verification in addition to other security checks
  • SV with 1% false acceptance reduces an impostor’s chance of success from 1% to 0.01% (PIN not known), or from 100% to 1% (PIN known)
• Adding verification will always increase security, at the cost of some rate of false rejection for genuine speakers
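The combined-security arithmetic in the conclusion follows because an impostor must pass both checks independently, so the chances multiply. A minimal check of the figures quoted on the slide:

```python
sv_fa     = 0.01   # verifier false-acceptance rate: 1%
pin_guess = 0.01   # chance of guessing an unknown PIN: 1% (as in the slide)

# PIN not known: must guess the PIN AND defeat the verifier.
unknown_pin = pin_guess * sv_fa   # 0.0001 = 0.01%

# PIN known (chance 100%): only the verifier stands in the way.
known_pin = 1.0 * sv_fa           # 0.01 = 1%

print(unknown_pin, known_pin)     # 0.0001 0.01
```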