Speaker Verification: Is it Industrial Strength?
Fergus McInnes - CCIR
Outline of Presentation
• Introduction to speaker verification:
  • what it is
  • how it works
• Large-scale evaluation of Nuance Verifier on British speakers
• Conclusions and recommendations
Speaker Verification: What is it?
• Test of claimed identity based on voice
• Aims to answer the question: “Is the speaker who he/she claims to be?”
• Can be used in combination with other security checks
• Compare and contrast:
  • speaker identification - no prior identity claim
  • other biometrics - not usable over the phone
Speaker Verification: How does it work?
• Enrolment phase: client provides speech to build a speaker model
• Verification phase:
  • compare the caller’s speech with the client’s model
  • if it matches well enough, accept the caller as being the client (assuming other security checks are OK)
  • otherwise reject the caller’s identity claim
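The verification-phase decision above can be sketched as a simple threshold test. This is an illustrative sketch only: the function name, the numeric scores, and the `other_checks_ok` flag are hypothetical, not part of any real verifier API.

```python
def verify(match_score: float, threshold: float,
           other_checks_ok: bool = True) -> bool:
    """Accept the caller as the claimed client only if the speech
    matches the client's model well enough AND any other security
    checks (e.g. a PIN) have also passed."""
    return other_checks_ok and match_score >= threshold

# A caller scoring 0.82 against a threshold of 0.75 is accepted;
# the same score fails if another security check has already failed.
print(verify(0.82, 0.75))                         # True
print(verify(0.82, 0.75, other_checks_ok=False))  # False
```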
Tuning a Verifier
[Figure: false acceptance (FA) and false rejection (FR) rates plotted against the threshold setting. A lower-security threshold gives low FR but high FA; a higher-security threshold gives low FA but high FR. The curves cross at the Equal Error Rate (EER) threshold, where FA = FR.]
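The tuning idea can be made concrete: sweep the threshold over the observed scores and pick the setting where FA and FR are closest, i.e. the EER operating point. The score lists below are made-up illustrative data, not results from the study.

```python
def fa_fr(client_scores, impostor_scores, threshold):
    """FA = fraction of impostor scores at/above threshold (accepted);
    FR = fraction of client scores below threshold (rejected)."""
    fr = sum(s < threshold for s in client_scores) / len(client_scores)
    fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return fa, fr

def eer_threshold(client_scores, impostor_scores):
    """Pick the candidate threshold where |FA - FR| is smallest."""
    candidates = sorted(set(client_scores) | set(impostor_scores))
    return min(candidates,
               key=lambda t: abs(fa_fr(client_scores, impostor_scores, t)[0]
                                 - fa_fr(client_scores, impostor_scores, t)[1]))

clients   = [0.9, 0.8, 0.75, 0.6, 0.85]   # genuine-speaker scores (hypothetical)
impostors = [0.3, 0.5, 0.65, 0.4, 0.2]    # impostor scores (hypothetical)
t = eer_threshold(clients, impostors)
print(t, fa_fr(clients, impostors, t))    # 0.65 (0.2, 0.2): FA = FR = 20%
```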
Enrolment Phase
• How many enrolment sessions? - a single session per client for convenience, or multiple sessions for improved modelling
• What words or phrases to use? - digits, application keywords, passwords etc
• How many utterances per word or phrase? - should be more than one, to ensure consistency and representativeness
Verification Phase
• What words or phrases? - for best results, use the same words as in enrolment
• How many utterances per verification bid? - more for accuracy, or fewer for speed
• How many verification bids to allow?
• What threshold setting? - tradeoff between false acceptances and false rejections
• What to do on rejection? - pass to an agent?
Large-Scale Evaluation of Nuance Verifier
• Speech collected over the telephone network from >1000 employees of participating companies
• Tests run at CCIR using Nuance Verifier Version 6.2.4-pre (1999/2000):
  • main series: 779 speakers with sufficient data
  • comparative tests on excluded speakers
  • tests on identical twins and other related speakers
• Edinburgh impostor study (deliberate mimicry)
Main Series of Tests
• 779 speakers (445 male, 334 female)
• Digits, digit strings, names, banking words
• Enrolment data: 1, 2 or 3 utterances per word or phrase, from a registration call
• Test data: 1 utterance per word or phrase from each of 3 simulated banking-service calls
• Equal numbers of client and impostor bids
• Random or same-sex impostors
Test Procedure
[Flowchart: speakers are divided into two sets, X and Y. For each set: enrolment, then client and impostor bids are scored; a threshold is estimated and applied to the scores to give verification decisions, from which FA and FR rates are computed and averaged into a per-set Half Total Error Rate, HTER(X) and HTER(Y). These are then averaged into the overall Half Total Error Rate (HTER).]
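The averaging in the test procedure can be sketched in a few lines: each set's HTER is the mean of its FA and FR rates, and the overall figure is the mean across the two sets. The rates below are illustrative placeholders, not the study's measured values.

```python
def hter(fa_rate: float, fr_rate: float) -> float:
    """Half Total Error Rate: the mean of the FA and FR rates."""
    return (fa_rate + fr_rate) / 2

hter_x = hter(0.014, 0.010)        # set X: 1.4% FA, 1.0% FR (hypothetical)
hter_y = hter(0.010, 0.014)        # set Y: 1.0% FA, 1.4% FR (hypothetical)
overall = (hter_x + hter_y) / 2    # overall HTER across the two sets
print(overall)                     # 0.012, i.e. 1.2%
```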
False Rejection / False Acceptance Tradeoff (all digit phrases, pooled ×3 enrolment)
[Tradeoff curve not reproduced.]
Results with Adaptation (HTER, with pooled ×3 enrolment)
• After adaptation on client bids (using all digit phrases): 1.2% (no adaptation), 0.9% (1 adaptation), 0.9% (2 adaptations)
• After adaptation on client and same-impostor bids: 1.2%, 1.4%, 1.7%
• Adapted digit + unadapted non-digit models:
  • client adaptation: 0.9%, 0.8%, 0.7%
  • client + same-impostor adaptation: 0.9%, 1.3%, 1.5%
Related Impostors (FA on all digits, with pooled ×3 enrolment and standard same-sex EER threshold)
• Identical twin: 58.7% [19 speaker pairs]
• Other sibling: 0.0% [10]
• Parent: 15.5% [11]
• Child: 11.1% [10]
• (Unrelated same-sex impostor: 1.4% [779])
HTER on identical twins with adjusted threshold: 17.1%
Impostor Study - Deliberate Mimicry (combined digit/non-digit bids, pooled ×3 enrolment, standard same-sex threshold)
[Results chart not reproduced.]
Summary of Results
• Accuracy improved by using digits, by increasing the amount of enrolment data, and by increasing the amount of speech used for verification
• EER ~1% for verification on 10s of speech, after enrolment on 3 utterances per word or phrase
• False acceptance / false rejection tradeoff: 0.1% FR at 5% FA
• Adaptation helps if there are no persistent impostors
• Identical twins not reliably distinguished
Speaker Verification - Conclusions
• Recommend using speaker verification in addition to other security checks
  • SV with 1% false acceptance reduces an impostor’s chance of success from 1% to 0.01% (PIN not known), or from 100% to 1% (PIN known)
• Adding verification will always increase security, at the cost of some rate of false rejection for genuine speakers
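The combined-security arithmetic in the conclusion follows because an impostor must pass both checks independently, so the chances multiply. A minimal check of the figures quoted on the slide:

```python
sv_fa     = 0.01   # verifier false-acceptance rate: 1%
pin_guess = 0.01   # chance of guessing an unknown PIN: 1% (as in the slide)

# PIN not known: must guess the PIN AND defeat the verifier.
unknown_pin = pin_guess * sv_fa   # 0.0001 = 0.01%

# PIN known (chance 100%): only the verifier stands in the way.
known_pin = 1.0 * sv_fa           # 0.01 = 1%

print(unknown_pin, known_pin)     # 0.0001 0.01
```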