Keystroke Biometric Authentication System Spring 2009
Team Members Spring '09 Team Members • Alpha Amatya • James Aliperti • Thomas Mariutto • Ankoor Shah • Michael Warren
Spring 2009 Focus • Web Based Authentication continued development of the test-taker authentication application • Development of New Concepts weighted & unweighted top n choices strong vs. weak enrollment • Modify Existing & Write New Programs to simulate various scoring procedures • Run Experiments to produce various scenarios
Reasons for Study • Keystroke biometrics is one of the least studied biometric applications used for user authentication • Most studies use short input: passwords & user names • This study focuses on long text input – Free/Copy • Typing characteristics are said to be: 1) Unique to an individual 2) Difficult to duplicate • Very important for online test-taking systems • Important for overall computer system security • No special equipment is needed
Contents of System A PHP Website registers the user. A modified Java applet captures 300 keystrokes and produces two files: a raw data file and a text file. A Java program, BioFeature++, extracts 230 feature measurements. A Java program, Biometric Authentication System (BAS), performs authentication tests.
4 Quadrant Data Collection 36 Subjects 4 Quadrants 5 samples per quadrant Types of Data Collected • Copy Text • Free Text Entry Modes • Desktop • Laptop
Deliverables • Recreate authentication experiment from Keystroke Book Chapter • Rewrite user and technical manuals • Modify classifier program to produce top n Within/Between choice and distances • Create 1st, 3rd and 5th Nearest Neighbor output tables • Create output file of top 3 choices from Classifier program and obtain FRR, FAR and Performance • Create ROC curves for each of the 4 quadrant data samples • Run two small-training, strong-enrollment authentication experiments • Run big-training, strong-enrollment authentication experiments, incrementally increase training sizes • Write detailed descriptions of data formats • Investigate discrepancy between 230 and 239 Linguistic-model features
Hierarchical Fallback Models • Touch-type Model: based on keys struck by touch typists; 254 distinctive measurements • Linguistic Model: based on language and most frequently used keys; 230 distinctive measurements • Increased performance results found utilizing the Linguistic Model
Top n=10 W/B Choices and Distance • The implementation compared each sample from the dichotomized test data with every sample from the dichotomized train data. • The n=10 shortest Euclidean distances were taken as the top choices. • Each distance and its choice class, Within (W) or Between (B), was recorded. • This program was run for all four quadrants. • Each output contained 180 Within + 3825 Between = 4005 choice tables.
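A minimal sketch of the top-n selection described on this slide, assuming each dichotomized sample is a plain numeric vector labelled "W" or "B". The function names are hypothetical and not taken from the team's classifier. The pair counts are reproduced under the further assumption that the test set holds 18 subjects with 5 samples each, which is consistent with the 180 + 3825 = 4005 figure.

```python
import math


def pair_counts(num_subjects, samples_per_subject):
    """Count within-class and between-class sample pairs (assumed layout)."""
    total_samples = num_subjects * samples_per_subject
    total_pairs = math.comb(total_samples, 2)
    within = num_subjects * math.comb(samples_per_subject, 2)
    return within, total_pairs - within


def top_n_choices(test_vec, train, n=10):
    """Return the n nearest training items as (distance, label) pairs.

    train is a list of (vector, label) tuples, label being "W" or "B".
    Distances are Euclidean, nearest first.
    """
    scored = [(math.dist(test_vec, vec), label) for vec, label in train]
    scored.sort(key=lambda item: item[0])
    return scored[:n]


print(pair_counts(18, 5))  # (180, 3825)
```

Sorting the full distance list keeps the sketch short; a heap-based selection would avoid sorting all training samples when the training set is large.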
Overall Accuracy for n=10 Output File Using 1st, 3rd & 5th Nearest Neighbors • Implemented a program to check overall accuracy on the outputs created in Deliverable 3. • Calculated FRR, FAR and performance for all the experiments in the 4 quadrants. • The 1-Nearest Neighbor results matched the Deliverable 1 outputs exactly, confirming that the earlier experiment was reproduced correctly. • Using 3 & 5 nearest neighbors gave a slight improvement, as expected.
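The slide does not say how the 1st, 3rd and 5th nearest neighbors are combined. The sketch below assumes the usual k-nearest-neighbor majority vote over the first k entries of a top-n choice list; `knn_decision` is a hypothetical name.

```python
def knn_decision(choices, k):
    """Majority vote over the first k labels of a nearest-first choice list.

    choices is a list of "W"/"B" labels; k is expected to be odd (1, 3, 5).
    """
    top = choices[:k]
    within_votes = sum(1 for label in top if label == "W")
    # Strict majority of the labels actually inspected (assumed rule).
    return "W" if within_votes > len(top) / 2 else "B"


choices = ["B", "W", "W", "B", "W", "B", "B", "B", "B", "B"]
for k in (1, 3, 5):
    print(k, knn_decision(choices, k))
```

In this made-up example the single nearest neighbor is a Between match, while the 3- and 5-neighbor votes decide Within, which illustrates how a larger k can change the outcome.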
1st, 3rd and 5th Nearest Neighbor Within and Between Choices derived using Euclidean distance
Receiver Operating Characteristics (ROC curve) Graphical representation of FAR and FRR • FAR (False Acceptance Rate): authenticating an imposter • FRR (False Rejection Rate): rejecting a valid user Top n Nearest Neighbor Responses • Unweighted: each output choice counted equally • Weighted: first output choice (more valuable) is scored higher
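As an illustration of the two error rates, the sketch below computes FAR and FRR from a list of decisions. It assumes that a "W" (Within) decision means the user is accepted and a "B" (Between) decision means the user is rejected; `far_frr` is a hypothetical helper.

```python
def far_frr(results):
    """Compute (FAR, FRR) from (true_label, decided_label) pairs.

    Labels are "W" (same user) or "B" (different users).
    FAR = impostor comparisons accepted / all impostor comparisons.
    FRR = genuine comparisons rejected / all genuine comparisons.
    """
    genuine = [decided for truth, decided in results if truth == "W"]
    impostor = [decided for truth, decided in results if truth == "B"]
    frr = sum(decided == "B" for decided in genuine) / len(genuine)
    far = sum(decided == "W" for decided in impostor) / len(impostor)
    return far, frr


results = [("W", "W"), ("W", "B"),
           ("B", "B"), ("B", "B"), ("B", "W"), ("B", "B")]
print(far_frr(results))  # (0.25, 0.5)
```

The function expects at least one genuine and one impostor comparison; an empty class would cause a division by zero.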
Receiver Operating Characteristics (ROC curve) Implementation: Weighted • Taking the n=10 W/B choice output file as input, authenticated a user if 1 or more of the 10 choices is Within (W). • Each match was scored using the formula score += (10 - j + 1), applied at each position j = 1 to 10 whose choice is W; the score ranges from 0 to 55 • Maximum score = 55 • Minimum score = 0 • FRR and FAR were calculated for i = 0 to 55 and the ROC curve plotted.
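The weighted formula can be sketched as follows, assuming the choice list is ordered nearest first, so that position j = 1 earns 10 points and position j = 10 earns 1 point. `weighted_score` is a hypothetical name.

```python
def weighted_score(choices):
    """Sum (n - j + 1) over every 1-based position j whose label is "W"."""
    n = len(choices)
    return sum(n - j + 1
               for j, label in enumerate(choices, start=1)
               if label == "W")


print(weighted_score(["W"] * 10))          # 55
print(weighted_score(["B"] * 10))          # 0
print(weighted_score(["W"] + ["B"] * 9))   # 10
```

With ten choices the maximum is 10 + 9 + ... + 1 = 55, which matches the score range stated on the slide.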
Receiver Operating Characteristics (ROC curve) Implementation: Unweighted • Taking the n=10 W/B choice output file as input, authenticated a user if 1 or more of the 10 choices is Within (W). • Each match was scored using the formula score += 1, applied at each position j = 1 to 10 whose choice is W; the score ranges from 0 to 10 • Maximum score = 10 • Minimum score = 0 • FRR and FAR were calculated for i = 0 to 10 and the ROC curve plotted.
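A sketch of the unweighted score and of the sweep that yields one (FAR, FRR) point per score value. The slide does not state the acceptance rule used at each value of i, so the code assumes a user is accepted when the score is at least the threshold. Both function names are hypothetical.

```python
def unweighted_score(choices):
    """Count the "W" labels, each contributing 1 point."""
    return sum(1 for label in choices if label == "W")


def roc_points(scored, max_score):
    """Return (threshold, FAR, FRR) for thresholds 0..max_score.

    scored is a list of (true_label, score) pairs, true_label being
    "W" (genuine) or "B" (impostor).
    """
    genuine = [score for truth, score in scored if truth == "W"]
    impostor = [score for truth, score in scored if truth == "B"]
    points = []
    for threshold in range(max_score + 1):
        # Assumed rule: accept when score >= threshold.
        frr = sum(score < threshold for score in genuine) / len(genuine)
        far = sum(score >= threshold for score in impostor) / len(impostor)
        points.append((threshold, far, frr))
    return points


scored = [("W", 10), ("W", 4), ("B", 0), ("B", 5)]
for point in roc_points(scored, max_score=10):
    print(point)
```

Passing `max_score=55` together with weighted scores would give the corresponding sweep for the weighted variant.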
2 big training, strong enrollment authentication experiments Train on 36 subjects and test on 18 subjects
Linguistic Features Identify Discrepancies Feature measurements • Duration – calculates the average response time and the standard deviation • Transition – divided into two types: Type I (short transition) is the time between a key release and the next press; Type II (long transition) is the time between a key press and the next press • Percentage – expressed as the ratio of the total number of occurrences to the total number of keystrokes Discrepancy between 239 and 230 feature measurements • Addition of a feature group for 6 other least frequent consonants (q, v, j, x, z, k) • Removal of a feature group for 15 long transitions (Type II): th, st, nd, an, in, er, es, on, at, en, or, he, re, ti, ea • Net effect: 239 + 6 - 15 = 230
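To make the three measurement types concrete, the sketch below derives them from raw key events. The event format (key, press time, release time in milliseconds) and the function names are assumptions for illustration and are not taken from BioFeature++. The population standard deviation is used because the slide does not say which variant applies.

```python
from statistics import mean, pstdev


def key_features(events, key):
    """Duration mean, duration standard deviation and percentage for one key.

    events is a list of (key, press_ms, release_ms) tuples in typing order.
    """
    durations = [release - press for k, press, release in events if k == key]
    return {
        "mean_duration": mean(durations),
        "std_duration": pstdev(durations),
        "percentage": len(durations) / len(events),
    }


def transitions(events):
    """Return (type1, type2) transition times for consecutive key pairs.

    Type I  = next press minus current release (short transition).
    Type II = next press minus current press (long transition).
    """
    pairs = list(zip(events, events[1:]))
    type1 = [nxt[1] - cur[2] for cur, nxt in pairs]
    type2 = [nxt[1] - cur[1] for cur, nxt in pairs]
    return type1, type2


events = [("t", 0, 80), ("h", 120, 190), ("e", 250, 310), ("t", 400, 500)]
print(key_features(events, "t"))
print(transitions(events))
```

A Type I value can be negative when the next key is pressed before the current one is released, which is one reason the two transition types are measured separately.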
Small training, strong enrollment authentication experiments
Big training, strong enrollment authentication experiments. 5000, 10000 and 20000 inter-class samples
Documentation Creation • User Manual • Technical Manual http://utopia.csis.pace.edu/cs691/2008-2009/team4/
Future Work • Experiment to determine why Laptop input is less consistent than Desktop input • Possible reasons: different keyboard layouts; different body positioning during typing; desktops are more fixed and consistent • Data should be stored in a database as opposed to individual files • Older code should be refactored in order to run more efficiently • Combine last semester's and this semester's work into one project
Conclusion • Thank you for your time and attention