This chapter provides an introduction to evaluating biometric systems, including measuring performance and error rates. It covers technology evaluations, scenario evaluations, operational evaluations, and the comparison of methods.
Performance Testing – “Guide to Biometrics”, Chapter 7, and “An Introduction to Evaluating Biometric Systems” by Phillips et al., IEEE Computer, February 2000, pp. 56-63. Presented by: Xavier Palathingal, September 21st, 2005
Background • Two biometric capabilities (matching and ranking) and biometric system errors • Chapter 5 – 1:1 Biometric Matching • Chapter 6 – 1:m Biometric Searching • Relate “error quotes” to error definitions • Look at accuracy numbers and reconstruct and interpret them
Overview • Measuring performance • Technology Evaluations • Scenario Evaluations • Operational Evaluations • Comparison of the methods • Limits to Evaluations
Overview (cont…) • Implications of error rates • Biometric Authentication - “Why does it reject me?” • Biometric Screening – “Why does it point to me?” • Face, Finger and Voice • Iris and Hand geometry • Signature • Summary of verification accuracies
Overview (cont…) • Identification System Testing • Biometric data and ROC and CMC • Biometric search engines • 1:m search engine testing • Face Recognition Vendor Test 2000 [FRVT 2000] • FRVT 2002 • Face, Finger and Voice
Measuring Performance: Evaluation protocols • Why measure performance? • The protocol determines how you test the system, select the data and measure performance • An evaluation shouldn’t be too hard or too easy • It is just right when it spreads performance over a range that lets you distinguish between the systems under test
Measuring Performance 1 - Technology Evaluations • On laboratory or prototype algorithms • “testing on databases” • Move from general to specific • “training” data A • Sequestered “test” data Q • Two phases • Training phase • Competitive testing phase
Phase 1 of Technology Evaluation - Training phase • The algorithm is trained using “training” data A = (A1 ∪ A2 ∪ …) • It is then tested on newly released, sequestered “test” data Q
Phase 2 of Technology Evaluation - Competitive testing phase • Using database D, for each matcher Z a set of match (genuine) scores X = {X1, X2, …, XN} and a set of non-match scores Y = {Y1, Y2, …, YN} are generated. • FMR and FNMR [FAR and FRR] are calculated as a function of threshold T
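A minimal sketch of this calculation in Python, assuming similarity scores where higher means a better match; the score values below are made up purely for illustration:

    # Competitive-testing sketch: FMR/FNMR as a function of threshold T.
    # Assumed: similarity scores in [0, 1]; illustrative values, not real data.
    import numpy as np

    genuine_scores = np.array([0.91, 0.84, 0.78, 0.95, 0.66])   # X: match scores
    impostor_scores = np.array([0.12, 0.35, 0.48, 0.22, 0.55])  # Y: non-match scores

    def error_rates(threshold):
        """FMR = fraction of non-match scores accepted; FNMR = fraction of match scores rejected."""
        fmr = np.mean(impostor_scores >= threshold)
        fnmr = np.mean(genuine_scores < threshold)
        return fmr, fnmr

    for t in np.linspace(0.0, 1.0, 11):
        fmr, fnmr = error_rates(t)
        print(f"T={t:.1f}  FMR={fmr:.2f}  FNMR={fnmr:.2f}")

Sweeping the threshold T in this way traces out the ROC trade-off between false matches and false non-matches.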
Measuring Performance 2 - Scenario Evaluations • Tests complete biometric systems under conditions that model real world applications • Combination of sensors and algorithms • “office environment”, “user tests”
Measuring Performance3 - Operational Evaluations • Similar to scenario evaluations • Scenario test – class of applications • Operational test – specific algorithm for a specific application • Performed at the actual site • Using actual subjects/areas • Usually not very repeatable
Comparison of methods • Academia tends to use databases, i.e., technology evaluations • Scenario evaluations also exercise the acquisition procedures • The user population is closed in scenario evaluations • Neither technology nor scenario evaluations are “double blind”
Limits to evaluation • Biometric authentication should be mandatory for the whole user population • The user population should be fairly represented • Subjects should be unaware of the matching decision • The only realistic form of testing is operational evaluation • One cannot measure the true FAR or true FRR – nobody except the actual subject knows the ground truth • Any attempt to measure these “hidden” system parameters amounts to trying to defeat the biometric system
Implication of error rates - Biometric Authentication “Why does it reject me?” • Verification protocol – frequent flyer smartcard with biometric - fingerprint template on a smartcard - unique frequent flyer no. and smartcard - FRR = 3% (typical for fingerprint) - 5,000 people per hour [Newark airport] over a 14-hour day: 0.03 x 5000 x 14 = 2100 falsely rejected travellers per day, who will have to be handled through exception-handling procedures
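The exception-handling load follows directly from the numbers on this slide; a one-line sanity check in Python, using the example's illustrative values rather than measured data:

    # Expected daily false rejects (values taken from the slide's example).
    frr = 0.03                # false reject rate, typical for fingerprint
    passengers_per_hour = 5000
    hours_per_day = 14
    false_rejects_per_day = frr * passengers_per_hour * hours_per_day
    print(false_rejects_per_day)   # 2100 travellers routed to exception handling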
Implication of error rates - Biometric Screening “Why does it point to me?” • Screening protocol – passenger face images matched against a government face-image database - a system that checks each face against a negative database N of n = 25 alleged terrorists - FPR = 0.1% - 300 people request access to a flight: 25 x 300 = 7500 matches, 7500 x 0.001 = 7.5, i.e., roughly 7-8 false positives
Implication of error rates - Biometric Screening “Why does it point to me?” • The number of false positives follows from FPR(n) ≈ n x FPR(1) • Matching a positive data set M of m subjects requires m matches against a database N of n terrorists • m = 300 • n = 25 • # false positives per plane = m x FPR(n) ≈ m x n x FPR(1)
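A small Python sketch of this estimate, using the slide's illustrative values for m, n and FPR(1):

    # Screening estimate: FPR(n) ~ n * FPR(1), so the expected number of
    # false positives per flight is roughly m * n * FPR(1).
    fpr_1 = 0.001   # false positive rate of a single 1:1 comparison
    n = 25          # size of the negative (watch-list) database N
    m = 300         # passengers screened for one flight
    comparisons = m * n                     # 7500 one-to-one comparisons
    false_positives = comparisons * fpr_1   # ~7.5 expected false positives
    print(comparisons, false_positives)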
Face, Finger and Voice • Technology evaluations • Systems are operating at FARs around 10%
Iris • “normal office environment”, with 200 volunteers over a period of 3 months • In identification mode, not in verification mode • High FRR may be due to environmental error, reflection from glasses, user difficulty
Hand Geometry • Group of 50 users. • 200 volunteers over a 3 month period
Signature • Does not have the characteristic of permanence • Accept = genuine, reject = forgery • Zero-effort forgery, Home-improved forgery, Over-the-shoulder forgery, Professional forgery
Signature (cont….) • Improvement of two-try over one-try indicates poor habituation of the biometric on that particular device
Summary of verification accuracies • Best error rates found in the literature • One key caveat is the volume of test data behind each number
Identification system testing: Biometric data and ROC, CMC • Biometric capabilities like ranking and matching need to be developed by modeling biometric data and training on biometric data • Two different biometric statistics – ROC and CMC • ROC – measures the capabilities of a match engine s(B’, B) at some fixed threshold t0, or as a function of the operating threshold T • CMC – measures the capabilities of a rank engine R((B1, B2, …), B’l) with ordered entries (B1, B2, …) ∈ M and some unknown sample B’l
Biometric search engines • A hybrid approach - ranking followed by scoring • Input to the 1:m search engine - B’l, the biometric sample • Output - a candidate vector CK(B’l) = (ID(1), …, ID(K))T • The 1:m search engine with an enrollment database M is defined as: CK = (B(1), B(2), …, B(K))T = (ID(1), ID(2), …, ID(K))T
Biometric search engines (cont…) • A possible architecture: - A biometric rank engine, which determines some reordering Cm of the vector M by repeatedly applying ranking - A biometric match engine, which determines, using a scoring function s(B’l, B(k)) and a decision threshold t0(B’l), a short candidate vector CK of the K top candidates
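A minimal Python sketch of this rank-then-score architecture; the names (rank_score, match_score, gallery) are hypothetical and only illustrate the control flow, not the book's implementation:

    # Hypothetical rank-then-score 1:m search sketch (illustrative only).
    def search_1_to_m(probe, gallery, rank_score, match_score, t0, K):
        """Return up to K candidate IDs for probe B'_l against enrollment database M."""
        # 1. Rank engine: reorder the whole gallery with a fast, coarse similarity.
        ranked = sorted(gallery,
                        key=lambda e: rank_score(probe, e["template"]),
                        reverse=True)
        # 2. Match engine: rescore only the top of the list and keep
        #    candidates whose score clears the decision threshold t0.
        return [e["id"] for e in ranked[:K]
                if match_score(probe, e["template"]) >= t0]

The point of the hybrid design is that the coarse ranker prunes the database cheaply, so the accurate (and slower) matcher only needs to score a short candidate list.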
1:m search engine testing • The big distinction between a 1:m search engine and a 1:1 matcher is the prerequisite of an enrollment database M = (B1, B2, …, BM)T • We select the first m samples as database samples [9] • For the other samples, denoted {B’l, l = m+1, …} = D \ M, a rank ř(B’l) is estimated as follows: 1. Compute the set of scores Xl = {s(B’l, Bi); i = 1, …, m} for each l = m+1, …
1:m search engine testing (cont..) 2. Sort these scores: X~l = (s(B’l, B(1)), s(B’l, B(2)), …, s(B’l, B(m)))T such that s(B’l, B(k)) > s(B’l, B(k+1)), 1 ≤ k < m 3. If (B’l, B(k)) is the mated pair, i.e., if Bi = B(k) matches B’l, then ř(B’l) = k
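A short Python sketch of this rank-estimation procedure and the CMC statistic built from it; random stand-in scores are used in place of a real matcher s(·,·), so the numbers are illustrative only:

    # Rank estimation and CMC sketch (random stand-in scores, not real data).
    import numpy as np

    rng = np.random.default_rng(0)
    m, n_probes = 50, 200
    scores = rng.random((n_probes, m))           # s(B'_l, B_i) for i = 1..m
    mates = rng.integers(0, m, size=n_probes)    # index of the mated gallery entry
    scores[np.arange(n_probes), mates] += 0.5    # make mated scores tend to win

    # Rank of the mated entry: 1 + number of gallery scores that beat it.
    mate_scores = scores[np.arange(n_probes), mates]
    ranks = 1 + np.sum(scores > mate_scores[:, None], axis=1)

    # CMC(k) = fraction of probes whose mate appears at rank k or better.
    cmc = [np.mean(ranks <= k) for k in range(1, m + 1)]
    print(f"rank-1 rate = {cmc[0]:.2f}, rank-5 rate = {cmc[4]:.2f}")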
Face Recognition Vendor Test 2000 (FRVT 2000) • First attempt to characterize these performance measures • 5 participating vendors had to compute an all-against-all match of a database of 13,872 face images • Some results: • Compression does not adversely affect performance • Pose changes of up to 25 degrees were handled by the algorithms; beyond 40 degrees performance degrades sharply • Images taken 12 or more months apart are difficult to recognize • The distance between the camera and the person matters a lot • Identification is more sensitive to expression changes than verification is
FRVT 2002 • An increase in database size • Differences in results for plain verification tasks • K = sorted-list size, m = gallery size
Thank you! Especially to: Dr. Bebis for suggesting the additional paper, and Reza and Chang for help with the scanner