Methods of Deriving Biometric ROC Curves from the k-NN Classifier

Robert S. Zack, May 8, 2010

Agenda • Introduction to ROC Curves • Classification • Multi-Class Issues and Solutions • New Derivation Methods • Weak and Strong System Training • Use Cases • Search for a Topic • Publications • Dissertation Status • Questions

Introduction to ROC Curves • Used for binary decisions • Signal detection – signal / no signal • Medical diagnosis – disease / no disease • Biometric authentication – you are the person you claim to be / you are not • In biometrics the ROC curve varies from FAR=1 & FRR=0 at one end to FAR=0 & FRR=1 at the other • FAR = False Accept Rate – the rate at which an imposter is falsely accepted • FRR = False Reject Rate – the rate at which the correct person is falsely rejected • ROC charts are expressed in terms of percentages (0-100%) or probabilities (0-1). These are used interchangeably.
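The FAR and FRR definitions above can be made concrete with a minimal sketch. It assumes each authentication attempt produces a match score and that a sample is accepted when its score is at or above a threshold; the function name, the scores, and the thresholds are all hypothetical illustration values, not data from the study.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) when a sample is accepted if score >= threshold."""
    # An impostor that reaches the threshold is a false accept.
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    # A genuine user that falls below the threshold is a false reject.
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))

# Hypothetical match scores (higher = more similar to the claimed user).
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]

liberal = far_frr(genuine, impostor, 0.0)        # everyone is accepted
conservative = far_frr(genuine, impostor, 1.01)  # no one is accepted
middle = far_frr(genuine, impostor, 0.55)        # one error of each kind
```

Sweeping the threshold between the liberal and conservative extremes gives one (FAR, FRR) point per setting, which is the ROC curve.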

Authentication Analogy • Supreme Court – nine judges • Usual procedure – majority required to make decision • Like 9NN needing majority to authenticate a user • ROC Curve – creates many potential procedures • Need 9 votes to make decision (very conservative) • Need 8, 7, 6 votes to make decision (conservative) • Need 5 votes to make decision (majority) • Need 4, 3, 2 votes to make decision (liberal) • Need 1 or even 0 votes to make decision (very liberal)

Anatomy of a Biometric ROC Curve • Conservative is too restrictive. • Positive classification requires strong evidence. • Liberal is too open. • Requires weak evidence.

Parametric Procedures • Parametric techniques are well studied. • Data is assumed to follow a normal (Gaussian) distribution. • Vary a threshold to obtain the tradeoff between FAR and FRR. • Probability density functions can be calculated without estimation.
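As an illustration of the parametric case, the sketch below assumes genuine and impostor match scores are each normally distributed and that a sample is accepted when its score is at or above the threshold. The means, standard deviations, and threshold grid are hypothetical; the normal CDF is computed with the standard library's `math.erf`.

```python
import math

def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def parametric_roc(genuine, impostor, thresholds):
    """genuine, impostor: (mean, std) pairs. Returns (t, FAR, FRR) points.

    Accept when score >= t, so FAR = P(impostor >= t), FRR = P(genuine < t).
    """
    points = []
    for t in thresholds:
        far = 1.0 - normal_cdf(t, *impostor)
        frr = normal_cdf(t, *genuine)
        points.append((t, far, frr))
    return points

# Hypothetical distributions: genuine scores centred at 2, impostors at 0.
thresholds = [x * 0.5 for x in range(-8, 13)]  # -4.0 to 6.0 in steps of 0.5
roc = parametric_roc((2.0, 1.0), (0.0, 1.0), thresholds)
```

With equal standard deviations the two error rates are equal at the midpoint between the means (here t = 1.0), which is the equal error rate point of this toy curve.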

Parametric ROC Derivation

Classification • The k-NN classifier is well studied. • Biometrics classification problems can have many classes. • It is easier to work with a large or unknown population if the data is converted from a multi-class to a two-class decision. • Cha Dichotomy Model.

k-NN Nonparametric Classifier • k-NN is nonparametric. • A vector-difference model is used to convert a many-class problem into a two-class, binary problem. • Uses Euclidean distance. Figure: k-NN Classification Procedure for k=5, adapted from Pattern Classification, Duda et al.

Cha Dichotomy Model • Simplifies complexity. • Transforms a feature space into a distance vector space. • Uses distance measures. Figure: Multi-class to Two-class Transformation Process, adapted from Yoon et al. (2005)
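The transformation from feature space to distance vector space can be sketched as follows. This is a minimal illustration that assumes the distance vector is the element-wise absolute difference of two feature vectors, labeled "within" when both samples come from the same user and "between" otherwise; the function name and the sample data are hypothetical.

```python
from itertools import combinations

def dichotomy_transform(samples):
    """samples: list of (user_id, feature_vector).

    Returns a list of (distance_vector, label) with label "within" or "between".
    """
    pairs = []
    for (user_a, feat_a), (user_b, feat_b) in combinations(samples, 2):
        # Element-wise absolute difference (an assumed distance measure).
        diff = tuple(abs(a - b) for a, b in zip(feat_a, feat_b))
        label = "within" if user_a == user_b else "between"
        pairs.append((diff, label))
    return pairs

# Hypothetical two-feature samples from two users.
samples = [("alice", (1.0, 2.0)), ("alice", (1.5, 2.5)),
           ("bob", (4.0, 0.0)), ("bob", (4.0, 1.0))]
pairs = dichotomy_transform(samples)
```

However many users are enrolled, the output always has exactly two classes, which is what lets a two-class ROC analysis be applied to a many-class population.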

m-kNN Method • Pure rank method. • Evaluate the top k nearest neighbors (here, the top 7 NN). • The query Q is authenticated if the number of within-class (W) matches is >= the decision threshold m (here, 4 of the 7 NN). • Unweighted: all W’s are equal in weight.
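A minimal sketch of the pure rank method, under stated assumptions: training data are labeled difference vectors, neighbors are found by Euclidean distance, and a query is accepted when at least m of its k nearest neighbors are within-class. The function names and the one-dimensional toy data are hypothetical. The value m = k+1 is included only so the curve reaches the FAR=0, FRR=1 endpoint.

```python
import math

def nearest_labels(query, training, k):
    """Labels of the k training vectors nearest to query, nearest first."""
    ranked = sorted(training, key=lambda item: math.dist(query, item[0]))
    return [label for _, label in ranked[:k]]

def m_knn_roc(queries, training, k):
    """queries: (difference_vector, is_genuine). Returns (m, FAR, FRR) points."""
    genuine_votes, impostor_votes = [], []
    for vector, is_genuine in queries:
        votes = nearest_labels(vector, training, k).count("within")
        (genuine_votes if is_genuine else impostor_votes).append(votes)
    points = []
    for m in range(k + 2):  # m = 0 accepts everyone, m = k + 1 accepts no one
        far = sum(v >= m for v in impostor_votes) / len(impostor_votes)
        frr = sum(v < m for v in genuine_votes) / len(genuine_votes)
        points.append((m, far, frr))
    return points

# Hypothetical 1-D difference vectors: small differences are within-class.
training = [((0.1,), "within"), ((0.2,), "within"),
            ((0.3,), "within"), ((0.4,), "within"),
            ((0.6,), "between"), ((0.7,), "between"),
            ((0.8,), "between"), ((0.9,), "between")]
queries = [((0.15,), True), ((0.52,), True),
           ((0.85,), False), ((0.43,), False)]
roc = m_knn_roc(queries, training, k=3)
```

Each value of m corresponds to one of the "votes needed" procedures in the Supreme Court analogy, from very liberal (m = 0) to very conservative (m = k).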

wm-kNN Method • Rank method weighted by rank order. • Authenticate if the weighted score of the W (within-class) choices meets the weighted match threshold m. • Score varies from 0 to k(k+1)/2; for k=5 the maximum is 5+4+3+2+1 = 15. • Each value of m yields one FAR/FRR pair, i.e. one ROC point. • If m=0, FAR=1, FRR=0: all users are accepted. • If m=15, FAR is small and FRR is large: few Q’s are accepted.
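The rank-weighted score can be sketched as below. The sketch assumes the nearest neighbor carries weight k and the k-th neighbor weight 1, and that a query is accepted when its score is >= m (chosen so that m = 0 accepts every query, as the slide states). Function names and the neighbor lists are hypothetical.

```python
def weighted_score(neighbor_is_within, k):
    """Sum of rank weights (k for the nearest, 1 for the k-th) of W neighbors."""
    return sum(k - rank
               for rank, is_within in enumerate(neighbor_is_within[:k])
               if is_within)

def wm_knn_roc(queries, k):
    """queries: (neighbor_is_within, is_genuine). Returns (m, FAR, FRR) points."""
    max_score = k * (k + 1) // 2
    genuine = [weighted_score(n, k) for n, g in queries if g]
    impostor = [weighted_score(n, k) for n, g in queries if not g]
    points = []
    for m in range(max_score + 1):
        far = sum(s >= m for s in impostor) / len(impostor)
        frr = sum(s < m for s in genuine) / len(genuine)
        points.append((m, far, frr))
    return points

# Hypothetical neighbor lists, nearest first; True marks a within-class neighbor.
queries = [([True, True, True, True, True], True),       # score 15
           ([True, False, True, False, False], True),    # score 5 + 3 = 8
           ([False, False, False, False, False], False), # score 0
           ([False, True, False, False, True], False)]   # score 4 + 1 = 5
roc = wm_knn_roc(queries, k=5)
```

Compared with the unweighted method, the weighted score gives k(k+1)/2 + 1 threshold settings instead of k + 1, so the resulting ROC curve has more points.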

m-kNN and wm-kNN ROCs, LapFree – Weak Training

m-kNN and wm-kNN ROCs, DeskFree – Weak Training

t-kNN Method • A distance threshold method. • A positive vote is within a distance threshold from the user’s sample. • Uses feature vector space distances only. • At t=0, no distance vectors are authenticated: FAR=0, FRR=100%. At t=100, all distance vectors are authenticated: FAR=100%, FRR=0.
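The threshold sweep can be illustrated with a simplified sketch. The slide does not specify how distances are scaled or how the votes of the k neighbors are combined, so this example assumes a single distance per query on a 0 to 100 scale and accepts the query when that distance is <= t. All names and values are hypothetical.

```python
def distance_threshold_roc(genuine_distances, impostor_distances, thresholds):
    """Return (t, FAR, FRR) points, accepting a query when distance <= t."""
    points = []
    for t in thresholds:
        far = sum(d <= t for d in impostor_distances) / len(impostor_distances)
        frr = sum(d > t for d in genuine_distances) / len(genuine_distances)
        points.append((t, far, frr))
    return points

# Hypothetical distances on an assumed 0-100 scale (smaller = closer match).
genuine = [5.0, 12.0, 20.0, 35.0]
impostor = [30.0, 55.0, 70.0, 90.0]
roc = distance_threshold_roc(genuine, impostor, range(0, 101, 10))
```

Unlike the rank methods, the threshold here is continuous, so the curve can be sampled as finely as desired.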

t-kNN Method DeskFree (left) and LapFree (right) Data

ht-kNN Method • Weighted vote based on distances to the kNN. • Hybrid of rank method and vector space distances. • For each test sample, the within-class weight (WCW) is calculated based on the distance vectors. Figure: DeskFree (left) and LapFree (right) Data
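The slide does not give the WCW formula, so the sketch below substitutes one plausible choice purely for illustration: each of the k neighbors gets an inverse-distance weight, and WCW is the share of the total weight held by within-class neighbors. The formula, the function names, the `eps` guard, and the data are all assumptions, not the method used in the study.

```python
def within_class_weight(neighbors, eps=1e-9):
    """neighbors: (distance, is_within) for the k nearest neighbors.

    Assumed formula: inverse-distance weights; returns the within-class
    share of the total weight, a value in [0, 1].
    """
    weights = [(1.0 / (d + eps), is_within) for d, is_within in neighbors]
    total = sum(w for w, _ in weights)
    return sum(w for w, is_within in weights if is_within) / total

def ht_knn_roc(queries, thresholds):
    """queries: (neighbors, is_genuine). Accept when WCW >= t."""
    genuine = [within_class_weight(n) for n, g in queries if g]
    impostor = [within_class_weight(n) for n, g in queries if not g]
    return [(t,
             sum(w >= t for w in impostor) / len(impostor),
             sum(w < t for w in genuine) / len(genuine))
            for t in thresholds]

# Hypothetical neighbor lists for k = 3.
queries = [([(1.0, True), (2.0, False), (4.0, True)], True),    # WCW ~ 0.71
           ([(1.0, True), (1.0, True), (1.0, True)], True),     # WCW = 1
           ([(1.0, False), (2.0, False), (4.0, True)], False),  # WCW ~ 0.14
           ([(1.0, False), (1.0, False), (1.0, False)], False)] # WCW = 0
roc = ht_knn_roc(queries, [0.0, 0.25, 0.5, 0.75, 1.0])
```

The point of the hybrid is that a close within-class neighbor counts for more than a distant one, while the decision still depends only on the k nearest neighbors.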

New Nonparametric ROC Methods • m-kNN: need m votes out of k for a decision; pure rank method. • wm-kNN: need wm votes for a decision, but some judges get more than one vote (weighted method); rank method weighted by rank order. • t-kNN: a positive vote is within a distance threshold from the user’s sample; uses feature vector space distances only. • ht-kNN: weighted vote based on distances to the kNN; hybrid of rank method and vector space distances.

Weak & Strong Training • Weak Training: people used in testing are not used in training, so the sets of users for testing and training are independent. • Strong Training: people used in testing are also used in training, usually to augment the training data from different people, but new difference vectors are used to authenticate. • For example, users provide 8 samples: 5 for training and 3 to match against for authentication.
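The two training regimes differ in how samples are partitioned, which the following sketch illustrates. It assumes samples are stored per user; the function names, user IDs, and sample labels are hypothetical, and the 5/3 split follows the example above.

```python
def weak_split(samples_by_user, train_users):
    """Weak training: disjoint sets of users for training and testing."""
    train = {u: s for u, s in samples_by_user.items() if u in train_users}
    test = {u: s for u, s in samples_by_user.items() if u not in train_users}
    return train, test

def strong_split(samples_by_user, n_train):
    """Strong training: same users in both sets, but different samples."""
    train = {u: s[:n_train] for u, s in samples_by_user.items()}
    test = {u: s[n_train:] for u, s in samples_by_user.items()}
    return train, test

# Hypothetical data: four users with 8 samples each.
data = {user: [f"{user}_sample{i}" for i in range(8)]
        for user in ("u1", "u2", "u3", "u4")}

weak_train, weak_test = weak_split(data, train_users={"u1", "u2"})
strong_train, strong_test = strong_split(data, n_train=5)
```

In both cases no individual sample appears in both sets; the difference is whether the system has seen the test users at all during training.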

Weak & Strong Training

Use Cases • On-line test taking – Authentication Application • Enroll students at the start of a class. Collect biometric samples. • Verify that users are who they should be, using off-line batch processing. • Corporate Compliance Training/Test Administration • Enroll employees at some point prior to the training or test administration. Collect biometric samples. Refresh them at designated intervals. • Verify that users are who they should be.

Future Work • Real-time authentication. • Accuracy Improvements. • Error Cost Analysis. • Measurement Error.

Initial Search for a Topic • Started program in Fall 2008. • Entered DPS with an idea to research a topic in the area of mobile computing. Quickly discarded the idea. • Continued to search for ideas by participating as a customer for IT691/CS691 projects. Became exposed to facial and keystroke biometrics. • Continued working with keystroke biometrics and eventually found a topic with the help of Dr. Tappert.

Idea Vetting • The first few presentations of the topic met with a lot of resistance. It took some time to develop the “so what.” • Every Research Seminar was recorded so that I could go back and listen to criticisms. • Participated as co-author of several papers on the subject. Some papers were peer-reviewed and submitted for publication.

Publications • [1] J. Abbazio, S. Perez, D. Silva, R. Tesoriero, F. Penna, and R. S. Zack, "Face Biometric Systems," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2009, pp. C1.1-C1.8. • [2] A. Amatya, J. Aliperti, T. Mariutto, A. Shah, M. Warren, R. S. Zack, and C. C. Tappert, "Keystroke Biometric Authentication System Experimentation," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2009, pp. C4.1-C4.8. • [3] A. C. Caicedo, K. Chan, D. A. Germosen, S. Indukuri, M. N. Malik, D. Tulasi, M. C. Wagner, R. S. Zack, and C. C. Tappert, "Keystroke Biometric: Data/Feature Experiments," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010. • [4] K. Doller, S. Chebiyam, S. Ranjan, E. Little-Tores, and R. S. Zack, "Keystroke Biometric System Test Taker Setup and Data Collection," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010. • [5] S. Janapala, S. Roy, J. John, L. Columbu, J. Carrozza, R. S. Zack, and C. C. Tappert, "Refactoring a Keystroke Biometric System," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010, pp. B1.1-B1.8. • [6] M. Lam, U. Patel, M. Schepp, T. Taylor, and R. S. Zack, "Keystroke Biometric: Data Capture Resolution Accuracy," in Student-Faculty Research Day, CSIS, Pace University, White Plains, 2010. • [7] C. C. Tappert, S.-H. Cha, M. Villani, and R. S. Zack, "A Keystroke Biometric System for Long-Text Input," International Journal of Information Security and Privacy, Pending Publication, 2010. • [8] R. S. Zack, C. C. Tappert, S.-H. Cha, J. Aliperti, A. Amatya, T. Mariutto, A. Shah, and M. Warren, "Obtaining Biometric ROC Curves from a Non-Parametric Classifier in a Long-Text-Input Keystroke Authentication Study," vol. 268, Pace University, 2009.

Questions
