Binning and Indexing Biometric Records • Sharat S. Chikkerur • CUBS, University at Buffalo • ssc5@eng.buffalo.edu
Problem Description • Biometrics are being deployed for immigration and national ID applications • US-VISIT program • Voter ID and national ID programs [3] • Databases can potentially grow to millions of records • The largest study by NIST considers only 620,000 records [4] • Apart from accuracy, speed and efficiency also become important at this scale • Only biometric identification (1:N matching) can prevent duplicate enrollments
Problem Description (cont.) • Biometric templates have no natural order by which the records can be sorted • Biometric templates are inherently high-dimensional • Semantic features are not stored in the template
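Identification Problem • Let FAR and FRR be the false acceptance rate and false rejection rate for 1:1 matching • For 1:N matching against N records, assuming independent comparisons, the system false acceptance rate is $FAR_N = 1 - (1 - FAR)^N \approx N \cdot FAR$ • For small FAR, $N \cdot FAR$ is also the expected number of false accepts per search • Even if FAR = 0.0001% ($10^{-6}$), $N \cdot FAR = 0.1$ for N = 100,000, i.e. about 1 false accept in every 10 searches (lower bound) • No single biometric can meet this security requirement on its own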
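A quick numerical check of the figures on this slide, as a sketch that assumes the N comparisons are statistically independent (an assumption not stated on the slide). It compares the exact probability of at least one false accept, 1 - (1 - FAR)^N, with the first-order approximation N · FAR. The function names are illustrative.

```python
def far_n_exact(far: float, n: int) -> float:
    """P(at least one false accept) over n independent non-matching comparisons."""
    return 1.0 - (1.0 - far) ** n


def expected_false_accepts(far: float, n: int) -> float:
    """Expected number of false accepts per 1:N search; also the first-order
    approximation of far_n_exact when far is small."""
    return n * far


far, n = 1e-6, 100_000  # FAR = 0.0001%, N = 100,000
print(f"{far_n_exact(far, n):.3f} {expected_false_accepts(far, n):.3f}")  # 0.095 0.100
```

The exact value is slightly below N · FAR because 1 - (1 - p)^N never exceeds N · p; at this scale the two agree to about one part in twenty.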
Uses of Indexing and Binning • Ways to reduce identification errors: • Reduce N • Reduce FAR (limited by technology) • We can reduce N by pruning the records that are searched • Let $P_{SYS}$ be the penetration rate, the expected fraction of the N records actually searched • For 1:N matching with pruning, $FAR_N \approx N \cdot P_{SYS} \cdot FAR$ • State-of-the-art fingerprint systems have $P_{SYS} = 0.5$ [6]
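A sketch of the effect of pruning, assuming the expected false accepts scale as N · P_SYS · FAR. The helper `penetration_from_bins` is hypothetical: it assumes exclusive bins, no binning errors, and queries that fall into each bin with the same probability as enrolled records, in which case the penetration rate is the sum of the squared bin proportions.

```python
from typing import Sequence


def pruned_false_accepts(n: int, far: float, p_sys: float = 1.0) -> float:
    """Expected false accepts per 1:N search when only a fraction p_sys of
    the n records is matched (p_sys = 1.0 means no pruning)."""
    return n * p_sys * far


def penetration_from_bins(bin_sizes: Sequence[int]) -> float:
    """Hypothetical helper: penetration rate for exclusive bins with no binning
    errors, assuming a query lands in bin i with probability p_i = size_i / N
    (same as an enrolled record) and is matched only within that bin."""
    total = sum(bin_sizes)
    return sum((size / total) ** 2 for size in bin_sizes)


print(f"{pruned_false_accepts(100_000, 1e-6):.2f} {pruned_false_accepts(100_000, 1e-6, 0.5):.2f}")  # 0.10 0.05
print(f"{penetration_from_bins([50, 50]):.2f} {penetration_from_bins([90, 10]):.2f}")  # 0.50 0.82
```

Under these assumptions, two equally populated bins halve the search, while a skewed 90/10 split still searches 82% of the database on average, which is why skewed data is a concern for binning.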
Indexing and Binning (cont.) • Will allow us to screen immigrants at airports against a ‘watch list’ • Will make biometric systems more user-friendly by eliminating the need to remember PINs and IDs • Will improve accuracy ($FAR_N$) and performance
Binning Biometric Data: Vector Quantization Approach
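Vector Quantization (cont.) • In general, a biometric template may be represented as a vector • The objective is to classify the vectors into N distinct classes (codebook vectors) • The codebook vectors divide the feature space into N distinct Voronoi regions • Properties of the regions: • Region $i$ is $V_i = \{x : \|x - c_i\| \le \|x - c_j\| \ \forall j \ne i\}$, where $c_i$ is the $i$-th codebook vector • Together the regions cover the whole feature space, $\bigcup_i V_i = \mathbb{R}^d$, and they overlap only on their boundaries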
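A minimal sketch of how a codebook bins records: each template vector goes to the bin of its nearest codebook vector under Euclidean distance, which is exactly membership in that vector's Voronoi region. The function names and the toy 2-D codebook are illustrative, not from the slides.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_codeword(x: Vector, codebook: Sequence[Vector]) -> int:
    """Index i of the codebook vector closest to x (Euclidean distance).

    x lies in the Voronoi region of codebook[i]; ties go to the lowest index.
    """
    best_i, best_d = 0, math.inf
    for i, c in enumerate(codebook):
        d = math.dist(x, c)
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def bin_records(records: Dict[str, Vector], codebook: Sequence[Vector]) -> Dict[int, List[str]]:
    """Group record IDs by the Voronoi region (bin) their template falls in."""
    bins: Dict[int, List[str]] = {i: [] for i in range(len(codebook))}
    for record_id, x in records.items():
        bins[nearest_codeword(x, codebook)].append(record_id)
    return bins


# Toy 2-D codebook and records (illustrative values only).
codebook = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
records = {"r1": (1.0, 2.0), "r2": (9.0, 1.0), "r3": (2.0, 8.0), "r4": (0.5, 0.5)}
print(bin_records(records, codebook))  # {0: ['r1', 'r4'], 1: ['r2'], 2: ['r3']}
```

At identification time, a query would be quantized the same way and matched only against the records in its own bin.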
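Experimental Evaluation • 25 × 10 hand geometry feature vectors used • Each hand print is represented by a 21-D vector • Data divided equally between training and testing • Data is normalized per feature using $x'_k = (x_k - \mu_k)/\sigma_k$, where $\mu_k$ and $\sigma_k$ are the mean and standard deviation of feature $k$ • VQ is implemented using k-means clustering • The codebook vectors learned on the training set are then used to bin the test set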
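The experimental pipeline can be sketched in standard-library Python as below. This assumes per-feature z-score normalization estimated on the training half only, and uses plain Lloyd's k-means with randomly chosen initial centers; the actual initialization, iteration count, and codebook size used in the experiment are not given on the slide. `vq_bin` is a hypothetical helper name, and the toy data stand in for the hand-geometry vectors.

```python
import math
import random
import statistics
from typing import List, Tuple

Vector = List[float]


def fit_zscore(train: List[Vector]) -> Tuple[Vector, Vector]:
    """Per-feature mean and standard deviation, estimated on the training set only."""
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by 0 for constant features
    return means, stds


def apply_zscore(data: List[Vector], means: Vector, stds: Vector) -> List[Vector]:
    """x'_k = (x_k - mu_k) / sigma_k for every feature k."""
    return [[(v - m) / s for v, m, s in zip(x, means, stds)] for x in data]


def nearest(x: Vector, centers: List[Vector]) -> int:
    """Index of the closest center (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))


def kmeans(data: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; the returned centers are the codebook vectors."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(data, k)]  # k randomly chosen training vectors
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for x in data:
            clusters[nearest(x, centers)].append(x)
        # Move each center to the mean of its cluster; keep it in place if the cluster is empty.
        new_centers = [
            [statistics.fmean(col) for col in zip(*cl)] if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    return centers


def vq_bin(train: List[Vector], test: List[Vector], k: int, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper: normalize with training statistics, learn a codebook, bin the test set."""
    means, stds = fit_zscore(train)
    codebook = kmeans(apply_zscore(train, means, stds), k, seed=seed)
    labels = [nearest(x, codebook) for x in apply_zscore(test, means, stds)]
    return codebook, labels


# Toy stand-in for the hand-geometry data (made-up values): two users, 2 features on different scales.
train = [[0.0, 100.0], [0.2, 104.0], [0.1, 98.0], [5.0, 200.0], [5.2, 204.0], [5.1, 198.0]]
test = [[0.15, 101.0], [5.05, 201.0]]
codebook, test_labels = vq_bin(train, test, k=2)
print(test_labels)  # the two test samples fall in different bins
```

Applying the training-set mean and standard deviation to the test set, rather than re-estimating them, keeps the test data in the same coordinate system as the codebook.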
Normalization • Observations • Data normalization spreads the data out • Without normalization, the clusters converge to a single center • Normalization is equivalent to measuring Mahalanobis distance [5] • Different instances of the same hand were misclassified
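A small check of the Mahalanobis observation, under the assumption that the covariance matrix is treated as diagonal (features uncorrelated): Euclidean distance between z-scored vectors then equals the Mahalanobis distance between the raw vectors. With a full covariance matrix the equivalence would instead require whitening with the inverse covariance. The data values are made up for illustration.

```python
import math
import statistics


def zscore(x, means, stds):
    """Per-feature standardization x'_k = (x_k - mu_k) / sigma_k."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def mahalanobis_diag(x, y, variances):
    """Mahalanobis distance assuming a diagonal covariance matrix."""
    return math.sqrt(sum((a - b) ** 2 / var for a, b, var in zip(x, y, variances)))


# Hypothetical 3-feature records on very different scales.
data = [[170.0, 8.1, 0.42], [182.0, 9.0, 0.47], [165.0, 7.6, 0.40], [176.0, 8.8, 0.45]]
cols = list(zip(*data))
means = [statistics.fmean(c) for c in cols]
stds = [statistics.pstdev(c) for c in cols]
variances = [s ** 2 for s in stds]

x, y = data[0], data[1]
d_normalized = math.dist(zscore(x, means, stds), zscore(y, means, stds))
print(math.isclose(d_normalized, mahalanobis_diag(x, y, variances)))  # True
```

Each squared term ((x_k - mu_k)/sigma_k - (y_k - mu_k)/sigma_k)^2 reduces to (x_k - y_k)^2 / sigma_k^2, so the means cancel and only the per-feature scaling remains.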
Indexing Biometric Data: Spatial Access Methods Approach
Introduction to Spatial databases • Relational databases organize and store scalar data • Have a planar organization • Contain scalar data (excluding LOBs and other binary data) • Data can be ordered linearly • Structured Query Language (SQL) is used to retrieve records • Spatial databases • Contain multi-dimensional or vectorial data • Relative positions may be explicit or inferred • Linear proximity does not imply spatial proximity • Multi-dimensional data is used in computer vision, medical imaging, and BIOMETRICS • Original applications • Point sets • CAD • VLSI drawings • Cartography, astronomy
Spatial databases (cont.) • Key difference from pattern classification: QUERIES • Spatial searches • Neighborhood searches • PAM/SAM • Point Access Methods (PAMs) • Used on point databases • Points may be multi-dimensional (vectors) • Points have no spatial extent, so intersection is undefined • Each point is specified uniquely by its d coordinates • Spatial Access Methods (SAMs) • Used on lines, polygons, and solids • Objects have spatial extent, and intersection of objects is well defined • A point may be occupied by more than one object
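The query types above can be stated precisely with a brute-force baseline over a point database: an exact-match (point) query, an axis-aligned region query, and a k-nearest-neighbor (neighborhood) query. This is a sketch, not a point access method; every query scans all N records, which is the linear cost a PAM is meant to beat. Record IDs and coordinates are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def point_query(db: Dict[str, Point], q: Point) -> List[str]:
    """Exact-match query: records stored at exactly the point q."""
    return [rid for rid, p in db.items() if p == q]


def range_query(db: Dict[str, Point], lo: Point, hi: Point) -> List[str]:
    """Region query: records inside the axis-aligned box lo <= p <= hi (inclusive)."""
    return [rid for rid, p in db.items()
            if all(low <= v <= high for v, low, high in zip(p, lo, hi))]


def knn_query(db: Dict[str, Point], q: Point, k: int) -> List[str]:
    """Neighborhood query: the k records nearest to q, ordered by Euclidean distance."""
    scored = ((math.dist(p, q), rid) for rid, p in db.items())
    return [rid for _, rid in heapq.nsmallest(k, scored)]


db = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (5.0, 5.0), "d": (6.0, 5.0)}
print(point_query(db, (1.0, 1.0)))              # ['b']
print(range_query(db, (0.0, 0.0), (2.0, 2.0)))  # ['a', 'b']
print(knn_query(db, (5.2, 5.0), 2))             # ['c', 'd']
```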
Problems with vectorial/spatial data • No standard algebra defined on spatial data • Set operations such as union and intersection are not exactly defined • Data operations are highly application-specific • Operators are not closed • Queries • Need support for spatial queries: point and region queries • No standard spatial query language • No natural ordering • No ordering exists that preserves spatial proximity • There is no mapping from multi-dimensional space to 1-D such that any two points that are close together in the higher-dimensional space are also close in the linear order [1] • Is it possible to do this via PCA/KLT? • Single-key structures such as the B-tree cannot be extended directly
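The ordering problem can be seen with a Z-order (Morton) curve, a common space-filling-curve linearization that is not mentioned on the slide. It interleaves the bits of the coordinates into a single sortable key; the example shows two grid neighbors landing far apart in key order while a more distant pair lands closer.

```python
def morton2(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) key: bits of x go to even positions, bits of y to odd positions."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


a, b, c = (3, 0), (4, 0), (0, 3)
print(morton2(*a), morton2(*b), morton2(*c))  # 5 16 10
```

Here (3, 0) and (4, 0) are neighbors on the grid, yet their keys are 5 and 16, while (0, 3), about 4.24 away from (3, 0), gets key 10. A B-tree built on such keys keeps many, but not all, nearby points together.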
Requirements of a spatial database • Dynamic updates • The structure should remain consistent as data is inserted and deleted • Changes should be tracked • Independence of input data and insertion sequence • Should handle skewed data • Structure should be independent of insertion sequence (compare trees such as the k-d tree, whose shape depends on insertion order) • Scalability • Efficiency • Time efficiency • An efficient design should approach the performance of B-trees • Space efficiency • Indexing overhead should be small
Types of structures • K-d Trees • Binary tree in d-dimensional space • (d-1)-dimensional hyperplanes separate the subspaces • The splitting directions alternate among the d possibilities • Insertion and search are straightforward • Deletion is cumbersome • Structure is sensitive to insertion order
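A minimal k-d tree sketch following the description above: the splitting axis cycles through the d dimensions by depth, and insertion and exact search each follow one root-to-leaf path. Deletion is omitted, since the slide notes it is cumbersome. The class and method names are illustrative. Inserting the same seven points in sorted versus balanced order shows the sensitivity to insertion order through the tree height.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    axis: int                       # splitting dimension at this node
    left: Optional["Node"] = None   # points with point[axis] < this node's value
    right: Optional["Node"] = None  # points with point[axis] >= this node's value


class KDTree:
    """Minimal k-d tree over d-dimensional points (insert, exact search, 1-NN)."""

    def __init__(self, d: int) -> None:
        self.d = d
        self.root: Optional[Node] = None

    def insert(self, p: Point) -> None:
        if self.root is None:
            self.root = Node(p, 0)
            return
        node = self.root
        while True:
            go_left = p[node.axis] < node.point[node.axis]
            child = node.left if go_left else node.right
            if child is None:
                # The new node splits on the next dimension in cyclic order.
                new = Node(p, (node.axis + 1) % self.d)
                if go_left:
                    node.left = new
                else:
                    node.right = new
                return
            node = child

    def contains(self, p: Point) -> bool:
        node = self.root
        while node is not None:
            if node.point == p:
                return True
            node = node.left if p[node.axis] < node.point[node.axis] else node.right
        return False

    def nearest(self, q: Point) -> Optional[Point]:
        best: list = [None, math.inf]  # [best point, best distance]

        def visit(node: Optional[Node]) -> None:
            if node is None:
                return
            dist = math.dist(q, node.point)
            if dist < best[1]:
                best[0], best[1] = node.point, dist
            diff = q[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            visit(near)
            # Cross the splitting hyperplane only if it is closer than the best match so far.
            if abs(diff) < best[1]:
                visit(far)

        visit(self.root)
        return best[0]

    def height(self) -> int:
        def h(node: Optional[Node]) -> int:
            return 0 if node is None else 1 + max(h(node.left), h(node.right))
        return h(self.root)


# Same seven points, two insertion orders.
sorted_tree = KDTree(2)
for i in range(1, 8):
    sorted_tree.insert((float(i), float(i)))

balanced_tree = KDTree(2)
for p in [(4.0, 4.0), (2.0, 2.0), (6.0, 6.0), (1.0, 1.0), (3.0, 3.0), (5.0, 5.0), (7.0, 7.0)]:
    balanced_tree.insert(p)

print(sorted_tree.height(), balanced_tree.height())  # 7 3
print(balanced_tree.nearest((5.2, 4.9)))              # (5.0, 5.0)
```

Sorted insertion degenerates into a chain of height 7, so searches become linear scans, whereas the balanced order gives height 3 for the same points.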
References • [1] Gaede and Günther, “Multidimensional Access Methods”, ACM Computing Surveys, Vol. 30, No. 2, 1998 • [2] www.geocities.com/mohamedqasem/vectorquantization/vq.html • [3] Bolle et al., Guide to Biometrics, Springer-Verlag, 2003 • [4] NIST report to the United States Congress, “Summary of NIST Standards for Biometric Accuracy, Tamper Resistance and Interoperability”, http://www.itl.nist.gov/iad/894.03/NISTAPP_Nov02.pdf • [5] http://www.galactic.com/Algorithms/discrim_mahaldist.htm • [6] Dr. Wayman’s report, NIST
Thank You ssc5@cedar.buffalo.edu