Biometric Authentication Revisited: Understanding the Impact of Wolves in Sheep's Clothing. Lucas Ballard, Fabian Monrose, Daniel Lopresti. USENIX Security Symposium, 2006. Presenter: Tao Li.
Motivation • To argue that the previous assumption that forgers are minimally motivated and that attacks can only be mounted by hand is too optimistic, and even dangerous • To show that the standard evaluation approach significantly overestimates the security of a handwriting-based key-generation system
What did the authors do? • Described their initial steps toward developing evaluation methodologies for behavioral biometrics that take into account threat models that have largely been ignored • Presented a generative attack model based on concatenative synthesis that can provide a rapid indication of the security afforded by the system
Outline • Background Information • Experimental Design • Human Evaluation • Generative Evaluation • Conclusion
Background Information • Obtaining human input as a system security measure • The input should not be reproducible by attackers • E.g., passwords • Online attacks: limited to a number of wrong attempts • Offline attacks: limited only by the attacker's resources (time and memory) • When passwords are used to derive cryptographic keys, they are susceptible to offline attacks
What is a biometric? • An alternative form of user input intended to be difficult for attackers to reproduce • A technique for a user to authenticate to a reference monitor based on biometric characteristics • A means for generating user-specific cryptographic keys. Can it survive offline attacks? Not sure • Password hardening: password + biometric
What makes a good biometric feature? • Traditional procedure for biometrics as an authentication paradigm • Sample an input from the user • Extract an appropriate set of features • Compare with previously stored templates • Confirm or deny the claimed identity • Good features exhibit • Large inter-class variability • Small intra-class variability
How to evaluate biometric systems? • The standard model • Enroll some users by collecting training samples, e.g., handwriting or speech • False Reject Rate (FRR): the rate at which users' attempts to recreate the biometric within a predetermined tolerance fail • False Accept Rate (FAR): the rate at which attempts to fool the system succeed • Equal Error Rate (EER): the point where FRR = FAR • The lower the EER, the higher the accuracy
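The FRR, FAR, and EER definitions above can be illustrated with a minimal sketch. It assumes each attempt has already been reduced to an error count and that an attempt is accepted when that count is at or below a threshold; the function names and the sample data are hypothetical and not taken from the paper.

```python
def error_rates(genuine_errors, forgery_errors, threshold):
    """Return (FRR, FAR), assuming a sample is accepted iff its error count <= threshold."""
    # FRR: fraction of legitimate attempts that are rejected.
    frr = sum(e > threshold for e in genuine_errors) / len(genuine_errors)
    # FAR: fraction of forgery attempts that are accepted.
    far = sum(e <= threshold for e in forgery_errors) / len(forgery_errors)
    return frr, far


def approximate_eer(genuine_errors, forgery_errors, max_threshold):
    """Scan integer thresholds; return (threshold, FRR, FAR) where |FRR - FAR| is smallest."""
    best = None
    for t in range(max_threshold + 1):
        frr, far = error_rates(genuine_errors, forgery_errors, t)
        if best is None or abs(frr - far) < abs(best[1] - best[2]):
            best = (t, frr, far)
    return best


# Made-up error counts, for illustration only.
genuine = [0, 1, 1, 2, 3, 5]
forgery = [2, 4, 6, 7, 8, 9]
threshold, frr, far = approximate_eer(genuine, forgery, max_threshold=10)
print(threshold, frr, far)
```

With discrete thresholds FRR and FAR may never be exactly equal, so the sketch reports the threshold where they are closest.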
How to evaluate biometric systems? • Forgeries are commonly divided into naïve forgeries and skilled forgeries • Generative models for creating synthetic forgeries are missing • Evaluation under such weak security assumptions is misleading because it underestimates the FAR
Handwriting Biometrics • As a first step toward a strong methodology for evaluating performance, the authors developed a prototype toolkit using handwriting dynamics as a case in point.
Handwriting Biometrics • Offline handwriting • A 2-D bitmap, e.g., a scan of a paper • Only spatial information • Extracted features include bounding boxes and aspect ratios, stroke densities in a particular region, and curvature measurements • Online handwriting • Samples the position of a stylus tip over time on a digitizing tablet or pen computer • Temporal and spatial information • Features include all of the offline features plus timing and stroke-order information
Experimental Design • Collect data over 2 months, analyzing 6 different forgery styles • Three standard evaluation metrics • Naïve: not really forgeries; one user's natural writing of the passphrase used against another user • Static: created after seeing a static rendering of the target user's passphrase • Dynamic: created after seeing a real-time rendering • Three more realistic metrics • Naïve*: similar to naïve, except that the attacker has a similar writing style • Trained: forgeries created after the attackers are trained • Generative: exploit information to algorithmically generate forgeries
Data Collection • 11,038 handwriting samples collected on digitized pen tablet computers from 50 users during 3 rounds
Data Collection • Round 1: 1 hour, two data sets • First set established a baseline of “typical” user writing • 5 different phrases (two-word oxymorons), ten times each • Used to establish biometric templates for authentication • Provides samples for naïve and naïve* forgeries • Second data set, the “generative corpus” • Used to create the generative forgeries • Consists of a set of 65 oxymorons
Data Collection • Round 2: 90 min, 2 weeks later • The same users wrote the 5 phrases from round 1 ten times, then forged representative samples from round 1 to create 2 sets of 17 forgeries • Static forgeries: seeing only a static representation • Dynamic forgeries: seeing a real-time rendering of the phrase
Data Collection • Round 3: select nine users and train them • Selected users exhibited a natural tendency to produce better forgeries • 3 skilled but untrained users for each writing style: cursive, mixed, block • Training: forge 15 samples from their own writing style with real-time reproduction of the target sample
Authentication System • A user's writing sample on the electronic tablet is represented by 3 signals over time • x(t), y(t) for the location of the pen • p(t) for pen up or down at time t • The tablet computes a set of n statistical features (f1, f2, …, fn) over the signals
Authentication System • Based on the variation of feature values in a passphrase written m times, and on natural human variation, generate an n×2 matrix template {{l1,h1}, …, {ln,hn}} • Compare the feature values f1, f2, …, fn of the user's sample against the template; each fj < lj or fj > hj counts as an error
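The template comparison above can be sketched as follows. The slides do not say how the bounds lj and hj are derived, so this sketch assumes mean ± k standard deviations over the m enrollment samples; that rule, the parameter k, and the function names are illustrative assumptions, not the authors' method. Only the error-counting rule (fj < lj or fj > hj) comes from the slide.

```python
from statistics import mean, pstdev


def build_template(enrollment, k=2.0):
    """Build an n x 2 template [(l_1, h_1), ..., (l_n, h_n)].

    enrollment: m feature vectors, each of length n.
    ASSUMPTION: bounds are mean +/- k * standard deviation per feature.
    """
    template = []
    for values in zip(*enrollment):  # iterate feature by feature
        mu, sigma = mean(values), pstdev(values)
        template.append((mu - k * sigma, mu + k * sigma))
    return template


def count_errors(template, features):
    """Count features that fall outside their [l_j, h_j] interval."""
    return sum(1 for (low, high), f in zip(template, features)
               if f < low or f > high)


# Two enrollment samples with two features each (made-up values).
template = build_template([[1.0, 10.0], [3.0, 10.0]])
print(template)                             # per-feature (low, high) bounds
print(count_errors(template, [5.0, 11.0]))  # both features fall outside
```

A sample would then be accepted or rejected by comparing the error count with a tolerance, which is what the FAR figures "at seven errors" later in the talk appear to refer to.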
Feature analysis • What matters is not only the entropy of each feature, but also how difficult the feature is to forge • For each feature f • Rf: proportion of times that f was missed by legitimate users • Af: proportion of times that f was missed by forgers from round 2 • Q(f) = (Af - Rf + 1)/2 • The closer Q(f) is to 1, the more desirable the feature
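A small sketch of the quality score Q(f) defined above, to show its range: it is 1 when forgers always miss the feature and legitimate users never do, 0.5 when both miss it equally often, and 0 in the reverse case. The function name is hypothetical.

```python
def feature_quality(a_f, r_f):
    """Q(f) = (A_f - R_f + 1) / 2.

    a_f: proportion of forger attempts that miss feature f.
    r_f: proportion of legitimate attempts that miss feature f.
    """
    return (a_f - r_f + 1) / 2


print(feature_quality(1.0, 0.0))  # ideal feature: forgers always miss it
print(feature_quality(0.5, 0.5))  # feature does not separate users from forgers
print(feature_quality(0.0, 1.0))  # worst case: only legitimate users miss it
```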
Feature analysis • Divide the feature set into temporal and spatial groups and order each group by Q(f) • Choose the top 40 from each group and discard any with an FRR greater than 10% • Result: 15 spatial and 21 temporal features
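The selection procedure above can be sketched as a rank-then-filter step. The record layout (`name`, `group`, `q`, `frr`), the feature names, and the values are hypothetical; the slide only gives the rule (top 40 per group by Q(f), then drop features with FRR above 10%).

```python
def select_features(features, top_k=40, max_frr=0.10):
    """Rank each group by Q(f), keep the top_k, then drop features with FRR > max_frr."""
    selected = {}
    for group in ("spatial", "temporal"):
        ranked = sorted((f for f in features if f["group"] == group),
                        key=lambda f: f["q"], reverse=True)
        selected[group] = [f["name"] for f in ranked[:top_k]
                           if f["frr"] <= max_frr]
    return selected


# Made-up candidate features, for illustration only.
candidates = [
    {"name": "width", "group": "spatial", "q": 0.90, "frr": 0.05},
    {"name": "height", "group": "spatial", "q": 0.80, "frr": 0.20},
    {"name": "area", "group": "spatial", "q": 0.70, "frr": 0.01},
    {"name": "duration", "group": "temporal", "q": 0.95, "frr": 0.02},
]
print(select_features(candidates, top_k=2))
```

Because the FRR filter runs after the top-k cut, fewer than top_k features can survive per group, which is consistent with 40 candidates per group shrinking to 15 and 21.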
Human Evaluation • At seven errors, the trained mixed, block and cursive forgers improved their FAR by 0.47, 0.34 and 0.18, respectively • These improvements resulted from less than 2 hours of training
Generative Evaluation • Finding and training skilled forgers is time-consuming • Goal: explore an automated approach using generative models as a supplementary technique for evaluating behavioral biometrics • Goal: investigate whether an automated approach, using a limited number of writing samples from the target, could match the false accept rates observed for the trained forgers
Generative Evaluation • The approach to synthesizing handwriting is to assemble a collection of basic units (n-grams) that can be combined in a concatenative fashion to mimic authentic handwriting • The basic units are obtained from • General population statistics • Statistics specific to a demographic of the targeted user • Data gathered from the targeted user
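To make the idea of concatenating n-gram units concrete, here is a text-only sketch that splits a target phrase into the longest n-grams available in an inventory. It is not the authors' algorithm: the greedy longest-match rule, the `max_n` limit, and the inventory contents are assumptions, and a real attack would also have to select and stitch the pen signals recorded for each unit.

```python
def segment_phrase(phrase, inventory, max_n=4):
    """Greedily cover `phrase` with the longest n-grams found in `inventory`.

    Returns the list of n-grams, or None if some character cannot be covered.
    """
    units = []
    i = 0
    while i < len(phrase):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_n, len(phrase) - i), 0, -1):
            candidate = phrase[i:i + n]
            if candidate in inventory:
                units.append(candidate)
                i += n
                break
        else:
            return None  # no unit covers the character at position i
    return units


# Hypothetical inventory of n-grams for which writing samples exist.
inventory = {"cr", "ue", "l", "k", "in", "d"}
units = segment_phrase("cruel", inventory)
print(units, sum(len(u) for u in units) / len(units))
```

The average unit length printed here is the same kind of statistic as the average n-gram length reported for the generative attack.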
Generative Evaluation • Figure: a generated signature assembled from basic units in the database described above, with the original signature shown below it for comparison
Generative Evaluation • Inputs limited to 15 of the target user's 65 samples and 15 samples from users with the same writing style • Result: on average, a generative attempt used only 6.67 of the target user's writing samples, and the average length of an n-gram was 1.64 characters
Conclusion • The authors argued in detail that current evaluations of biometric system security are not accurate and underestimate the threat • To support this, they analyzed a handwriting-based key-generation system and showed that the standard evaluation approach significantly overestimates its security
Conclusion • Presented a generative attack model based on concatenative synthesis that automatically produces generative forgeries • The generative approach matches or exceeds the effectiveness of forgeries rendered by trained humans
Weaknesses & Where to improve • Studying the handwriting-based key-generation system requires many participants and a lot of manual work • It remains unclear to what extent these forgeries would fool human judges, especially forensic examiners • The generative algorithm needs improvement, for example by incorporating additional parameters to make it more accurate
Thanks! Any Questions?