Instructors: Dr. George Bebis and Dr. Ali Erol. Presented by Milind Zirpe. Fall 2005

Chapter 14: A Biometric’s Individuality

Overview • Approaches to individuality • Empirical individuality studies • A partial iris model • Fingerprint individuality • Standards for legal evidence • Expert fingerprint testimony

What does biometric individuality mean? • Given a biometric sample, determine the probability of finding an arbitrary biometric sample from the target population that is sufficiently similar to it. (Pankanti et al. [158]) • In other words, what are the theoretical lower bounds on the FAR and FRR, often called the “intrinsic error rates”?

Approaches to individuality In order to have a meaningful formulation of the individuality problem, we need to define: • Representation of the biometric identifier • A metric of similarity of two biometric samples • Representations of the target population (or their representative samples)

Approaches to individuality E.g.: Fingerprints

Approaches to individuality • Theoretical approach: Model all realistic phenomena affecting between-class and within-class pattern variations in order to estimate the probability of a false match theoretically. • Empirical approach: Collect samples of the biometric identifier and obtain empirical accuracy estimates using a typical matcher.

Approaches to individuality Advantages and limitations of both approaches: • Theoretical: - Realistic modeling of all parameters of a biometric matcher is very hard. • Empirical: - Collecting representative samples is difficult, and the number of samples required may be large. - A fundamental understanding of the underlying system design issues is difficult to obtain.

Empirical individuality studies A) Iris: (Daugman [47]) • Degrees of freedom of iris mismatch score distribution as a measure of individuality. • Hamming distance used as metric for comparing iriscodes. • Daugman’s analysis estimates that the probability of finding two sufficiently similar iriscodes is $2^{-173} \approx 10^{-52}$.

Empirical individuality studies B) Handwriting: (Srihari et al. [207]) • Stages shown on the slide diagram: digitization; feature extraction (document, paragraph, word and character level); automatic document understanding system; 3 samples of natural English handwriting; individuality information estimation. • The handwriting biometric, based on document-level features, has a FAR of about 5%. • The probability of correctly identifying a writer from a database of 1000 writers was estimated to be 80% on average.

A partial iris model • Introduction • FAR modeling • FRR calculation • Numeric evaluation

Introduction • The hypotheses are: $H_0$: $\mathcal{Q} \equiv \mathcal{R}$, i.e. $\mathcal{Q}$ and $\mathcal{R}$ are the same iris; $H_a$: $\mathcal{Q} \ne \mathcal{R}$, i.e. $\mathcal{Q}$ and $\mathcal{R}$ are different irises. • $\mathcal{Q}$ and $\mathcal{R}$ are real-world irises. • The iris $\mathcal{R}$ is photographed, processed and represented as a reference N-bit iriscode R, i.e. $R = R(\mathcal{R})$. • Similarly, an N-bit iriscode Q is extracted from an unknown iris, i.e. $Q = Q(\mathcal{Q})$. • h is the Hamming distance between the two iriscodes Q and R, with $h \in \{0, \dots, N\}$.

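Introduction • $d_T$ is a decision threshold. When $h(Q,R) \le d_T$, Q and R are judged to come from the same iris. When $h(Q,R) > d_T$, Q and R are judged to come from different irises. • From these definitions we get the expressions for FRR and FAR: $\mathrm{FRR}(d_T) = \mathrm{Prob}(h > d_T \mid H_0) = \mathrm{Prob}(h > d_T \mid \mathcal{Q} \equiv \mathcal{R})$ and $\mathrm{FAR}(d_T) = \mathrm{Prob}(h \le d_T \mid H_a) = \mathrm{Prob}(h \le d_T \mid \mathcal{Q} \ne \mathcal{R})$.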
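The decision rule on this slide can be illustrated with a minimal Python sketch. It assumes iriscodes are stored as equal-length strings of '0' and '1'; the function names and the toy 16-bit codes are hypothetical and chosen only for illustration.

```python
def hamming_distance(q: str, r: str) -> int:
    """Number of bit positions in which two equal-length iriscodes differ."""
    if len(q) != len(r):
        raise ValueError("iriscodes must have the same length N")
    return sum(bq != br for bq, br in zip(q, r))


def same_iris(q: str, r: str, d_t: int) -> bool:
    """Accept H0 (same iris) when h(Q, R) <= d_T, as in the FAR/FRR definitions."""
    return hamming_distance(q, r) <= d_t


# Toy 16-bit codes (hypothetical data): the query is the reference with two bits flipped.
reference = "1011001110001011"
query = "1011011110001001"

print(hamming_distance(query, reference))   # 2
print(same_iris(query, reference, d_t=4))   # True
```

A lower threshold makes the rule stricter: with d_t=1 the same pair would be rejected, which is the trade-off between FRR and FAR that the threshold controls.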

FAR modeling • To develop an iris model and estimate FRR and FAR, three things are required: • A mathematical representation of the biometric identifier. • A measure of similarity between two biometric identifiers. • Some formulation of the probability distribution of the iris population, i.e. the distribution of the unenrolled irises. • The iriscode and the Hamming distance cover the first two points. The last point is needed to express FRR(dT) and FAR(dT) as probabilities.

FAR modeling • The probability of a False Accept is: where Q and R are non-matching iriscodes. • The Hamming distance is discrete: h(Q,R) = n, where n can take the values 0,…,N. Hence the above expression can be rewritten as a summation:

FAR modeling • Assume that we have a fixed h(Q,R) = n. • We need to find the chance that a False Accept is generated when a measured iriscode is estimated from Q. • Assume that i bits flip among the n non-matching bits. Similarly, j bits flip among the (N-n) matching bits. • So, to get a False Accept: (n + j - i) ≤ dT. • The probability of i bits flipping in the n non-matching bits is: • The probability of j bits flipping in the (N-n) matching bits is:

FAR modeling • From expressions 14.3 and 14.4, we can write Pn(dT) as: • Now for G(n): let the probability that an individual bit agrees be g and the probability that it does not be (1-g). The probability that (N-n) bits agree and n bits disagree is then: • Assuming g = 1/2 and substituting expressions 14.5 and 14.6 into 14.2, we get:
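The expressions referred to on the FAR modeling slides did not survive in this transcript. The block below is a plausible reconstruction from the surrounding definitions, not the book's verbatim equations. It assumes that each bit of the measured query code, written here as $\hat{Q}$ (an assumed notation), flips independently with probability $p$, and the numbers (14.1) and (14.7) are inferred from the references in the text.

```latex
\begin{align}
\mathrm{FAR}(d_T) &= \mathrm{Prob}\bigl(h(\hat{Q}, R) \le d_T \mid \mathcal{Q} \ne \mathcal{R}\bigr) \tag{14.1}\\
\mathrm{FAR}(d_T) &= \sum_{n=0}^{N} P_n(d_T)\, G(n) \tag{14.2}\\
\mathrm{Prob}(i \text{ of the } n \text{ non-matching bits flip}) &= \binom{n}{i}\, p^{i} (1-p)^{n-i} \tag{14.3}\\
\mathrm{Prob}(j \text{ of the } N-n \text{ matching bits flip}) &= \binom{N-n}{j}\, p^{j} (1-p)^{N-n-j} \tag{14.4}\\
P_n(d_T) &= \sum_{\substack{0 \le i \le n,\; 0 \le j \le N-n \\ n + j - i \,\le\, d_T}}
  \binom{n}{i}\, p^{i} (1-p)^{n-i} \binom{N-n}{j}\, p^{j} (1-p)^{N-n-j} \tag{14.5}\\
G(n) &= \binom{N}{n}\, g^{N-n} (1-g)^{n} \tag{14.6}\\
\mathrm{FAR}(d_T) &= 2^{-N} \sum_{n=0}^{N} \binom{N}{n}\, P_n(d_T) \qquad (g = 1/2) \tag{14.7}
\end{align}
```

Here $P_n(d_T)$ is the probability of a False Accept given that the true codes differ in $n$ bits, and $G(n)$ is the probability that two non-matching codes differ in exactly $n$ bits.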

FRR calculation • A False Reject depends on the probability that i bits flip while reading the true iriscode Q to obtain the measured iriscode. Assuming independence of flip and non-flip events, we have: where (1-p) is the probability of not flipping a bit.
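The FRR expression itself is missing from this transcript. One way to write it, assuming that the reference code equals the true code (only the query reading is noisy) and that each of the N bits flips independently with probability $p$, is sketched below. This is a reconstruction under those assumptions rather than the book's exact formula.

```latex
\begin{align}
\mathrm{Prob}(i \text{ bits flip}) &= \binom{N}{i}\, p^{i} (1-p)^{N-i}, \\
\mathrm{FRR}(d_T) &= \mathrm{Prob}(h > d_T \mid H_0)
  = \sum_{i = d_T + 1}^{N} \binom{N}{i}\, p^{i} (1-p)^{N-i}.
\end{align}
```

Under these assumptions a genuine comparison is rejected exactly when more than $d_T$ bits flip, so the FRR is the upper tail of a binomial distribution.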

Numeric evaluation
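The slide's numeric results are not preserved here. As an illustration only, the sketch below evaluates the model numerically under stated assumptions: a False Reject occurs when more than d_T of N bits flip, P_n(d_T) sums the joint flip probabilities over all (i, j) with n + j - i ≤ d_T, and G(n) is binomial with g = 1/2. The values N = 32, p = 0.05 and the thresholds are hypothetical, not taken from the source.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k flips among n independent bits."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def frr(n_bits: int, p: float, d_t: int) -> float:
    """Prob(more than d_T of the N bits flip) for a genuine comparison."""
    return sum(binom_pmf(i, n_bits, p) for i in range(d_t + 1, n_bits + 1))


def far(n_bits: int, p: float, d_t: int) -> float:
    """Sum over true distances n of G(n) * P_n(d_T), with g = 1/2."""
    total = 0.0
    for n in range(n_bits + 1):
        g_n = comb(n_bits, n) / 2**n_bits          # G(n) for g = 1/2
        p_n = 0.0
        for i in range(n + 1):                     # flips among non-matching bits
            flip_i = binom_pmf(i, n, p)
            for j in range(n_bits - n + 1):        # flips among matching bits
                if n + j - i <= d_t:               # measured distance within threshold
                    p_n += flip_i * binom_pmf(j, n_bits - n, p)
        total += g_n * p_n
    return total


N_BITS, P_FLIP = 32, 0.05   # hypothetical values for illustration
for d_t in (4, 8, 12):
    print(d_t, far(N_BITS, P_FLIP, d_t), frr(N_BITS, P_FLIP, d_t))
```

Raising the threshold lowers the FRR and raises the FAR. In this sketch, with g = 1/2, independent bit flips leave the distribution of the measured distance unchanged, so the computed FAR should not vary with p; that makes a convenient sanity check on the code.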

Fingerprint individuality • Introduction • A simple model • Probabilistic scoring • A more complex model • Model comparison • Imposing structure • Minutiae distributions

Introduction The two fundamental premises on which fingerprint identification is based are: • Fingerprint details are permanent. • Fingerprints of an individual are unique.

A simple model • To compute individuality, an estimate of likelihood of randomly generating a minutiae template that matches the template of an enrolled person is established. [181]

A simple model • Each minutia has matching probability: • Assuming that some constant value can be assigned to P, the chance of exactly matching t of the Q minutiae is: • Taking into consideration the number of ways for selecting t from Q and the fact that matches of m or more minutiae count as a verification, we get:

A simple model • For convenience, let N = Q = R. Since P is fairly small, we can use the Poisson approximation to the previous binomial probability density function to get: • The above summation is heavily dominated by its first term; neglecting all but the first term results in:

A simple model • Because m is moderately large (>10 in practice), we use Stirling’s approximation and rearrange the equation to emphasize its exponential nature: The above equation gives an estimate of the FAR.
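The chain of approximations described above (Poisson tail, first term only, then Stirling) can be checked numerically. The sketch below assumes the Poisson mean is λ = N·P and that the Stirling form replaces m! with sqrt(2πm)(m/e)^m. The per-minutia probability P = 0.025 is a hypothetical value chosen for illustration, while N = 40 and m = 25 are the values used later in the model comparison.

```python
from math import e, exp, factorial, log2, pi, sqrt


def poisson_tail(lam: float, m: int, terms: int = 40) -> float:
    """Sum of e^-lam * lam^t / t! for t = m, ..., m + terms - 1."""
    return sum(exp(-lam) * lam**t / factorial(t) for t in range(m, m + terms))


def first_term(lam: float, m: int) -> float:
    """Leading term of the Poisson tail: e^-lam * lam^m / m!."""
    return exp(-lam) * lam**m / factorial(m)


def stirling_form(lam: float, m: int) -> float:
    """First term with m! replaced by Stirling's sqrt(2*pi*m) * (m/e)^m."""
    return exp(-lam) * (e * lam / m) ** m / sqrt(2 * pi * m)


N_MINUTIAE, P_MATCH, M_REQUIRED = 40, 0.025, 25   # P_MATCH is hypothetical
lam = N_MINUTIAE * P_MATCH

tail = poisson_tail(lam, M_REQUIRED)
lead = first_term(lam, M_REQUIRED)
approx = stirling_form(lam, M_REQUIRED)

print(f"tail={tail:.3e} first={lead:.3e} stirling={approx:.3e}")
print(f"bits (first term) = {-log2(lead):.1f}")
```

For these inputs the three values agree to within a few percent, which is the sense in which the first term dominates the sum and Stirling's formula is adequate for moderately large m.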

A simple model Observations: • If we have other local characteristics (e.g. minutiae count) that can be associated with the minutiae, the probability of a chance match is much lower since w is larger and P of Equation (14.9) is lower. • If the resolution (K) is increased in Equation (14.9), then P is smaller, so the strength of a fingerprint representation (template) increases. • In Equation (14.11), the probability of falsely matching two prints on the basis of m matching minutiae increases with N (as N = Q = R). • When K is fixed, the number N of minutiae that can be reliably detected is bounded by N << K and depends on the noise, thus bounding Expression (14.11). • For best security, N needs to be kept as low as possible, and spurious minutiae from poor fingerprint images are detrimental.

Probabilistic scoring • How should one measure the degree of match between two prints? Construct a scoring function based on the relative “surprise” associated with matching a certain number of minutiae. Based on this, Equation (14.11) can be rewritten as: • In the normal case, R and Q are different, so we replace N with Q in the above expression and use the approximation p = R / (K d). Also, since we are only ranking matches, the last constant term can be omitted.

Probabilistic scoring • For most cases the expression is heavily dominated by its first term. Expanding just this term, regrouping and multiplying through by yields the final result:

A more complex model • Given a query fingerprint containing Q minutiae, the goal again is to compute the probability that an arbitrary reference fingerprint with R minutiae matches by chance. • Fingerprint minutiae are defined by their location (xc, yc) and the angle θ of the ridge on which they reside. • A minutia j in the query fingerprint is considered to match minutia i in the reference template if and only if: where r0 is the tolerance distance and θ0 is the tolerance angle, as shown in the following figure.
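The matching condition itself is missing from this transcript. A common way to express such a test, assumed here rather than taken from the slide, is that the Euclidean distance between the two locations is at most r0 and the angular difference, measured with wrap-around, is at most θ0. The `Minutia` type, the function name and the use of degrees are all illustrative choices.

```python
from math import hypot
from typing import NamedTuple


class Minutia(NamedTuple):
    x: float
    y: float
    theta: float  # ridge angle in degrees, assumed to lie in [0, 360)


def minutiae_match(a: Minutia, b: Minutia, r0: float, theta0: float) -> bool:
    """True when b lies within distance r0 and angle theta0 of a (assumed rule)."""
    distance = hypot(a.x - b.x, a.y - b.y)
    diff = abs(a.theta - b.theta) % 360.0
    angle_gap = min(diff, 360.0 - diff)   # wrap-around: 350 and 5 degrees are 15 apart
    return distance <= r0 and angle_gap <= theta0


reference = Minutia(10.0, 10.0, 350.0)
query = Minutia(12.0, 11.0, 5.0)

print(minutiae_match(reference, query, r0=3.0, theta0=20.0))   # True
print(minutiae_match(reference, query, r0=3.0, theta0=10.0))   # False
```

The wrap-around step matters because angles near 0 and near 360 degrees describe almost the same ridge direction.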

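A more complex model [Figure: tolerance region around a minutia, defined by the tolerance distance r0 and the tolerance angle θ0.]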

A more complex model • Consider only the probabilities of spatial matching of minutiae. The probability that there are exactly ρ minutiae matches based on only the locations of Q and R minutiae is given by: • Once the ρ minutiae positions are matched, probability that t (≤ ρ) minutiae among these also have matching directions is given by:

A more complex model • So, the probability of matching m minutiae in both location as well as direction is given by, which is a theoretical estimate of the FAR.

A more complex model Conclusions: • The probability of a chance match is lower if L is larger. The more features per minutia, the better. • This probability is lower if K, a measure of resolution, is increased. • Because (14.17) is a sum from ρ = m,…, min(R,Q), the probability of falsely matching two prints increases with R and Q. • The probability of false association goes up fast when the noise in feature detection increases. • This model allows us to work with different values of Q and R, say Q < R, which is typically the case when Q is a latent print.

Model comparison Comparing the simple and complex models: • Typical values for a fingerprint scanned at 500 dpi with size 0.75" x 0.6" (about 200 pixels/cm and 1.91 cm x 1.52 cm) are: K = 450 and N = Q = R = 40. • If m = 25 minutiae are required to match in order to declare the two prints a match, then we get: • For the simple model, using (14.10): Thus there are roughly 85 bits of information content in this representation.

Model comparison • For the complex model, using (14.17): Thus there are approximately 62 bits of information content in this representation. • Although the FAR for the complex model is higher than for the simple model, when features are quantized, i.e. noise is introduced, it is important to model the accidental matching of false minutiae and not just the matching of correct minutiae. • Also, the probability $2.5 \times 10^{-19}$ corresponds to odds of about 1 in $10^{19}$, so a chance match remains extremely unlikely for practical purposes.
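The bit counts quoted in this comparison follow from taking the base-2 logarithm of the chance-match probability. A minimal sketch, assuming information content is defined as -log2 of that probability (the function name is illustrative):

```python
from math import log2


def information_bits(probability: float) -> float:
    """Information content, in bits, of an event with the given probability."""
    return -log2(probability)


# The complex-model probability quoted on the slide.
print(round(information_bits(2.5e-19)))   # 62

# Conversely, 85 bits corresponds to a probability of 2^-85.
print(f"{2.0 ** -85:.1e}")                # 2.6e-26
```

The 23-bit gap between the two models therefore corresponds to a factor of about 2^23, i.e. roughly 8 million, between their FAR estimates.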

Imposing structure • By imposing fingerprint structural constraints on the random point configuration space, it is possible to derive a more precise estimate of probability of false association. • w is the ridge width and 2w is the ridge plus valley width. • The distance r0 is the linear separation in minutiae location along a ridge.

Imposing structure Therefore the value should be replaced by

Minutiae distributions Conclusion: • The assumptions about K and d in the simple and complex models are not exactly right. • Locations near the center of the print are more likely to contain minutiae. • Position of minutiae relative to the center of the print predisposes it to more likely be in one direction than another. • So in reality the probabilities of false association are somewhat higher and we are erring on the safe side.

Standards for legal evidence The Supreme Court stated that, when assessing reliability, the following five factors should be considered: • Whether the technique or methodology in question has been subject to statistical hypothesis testing. • Whether its error rate has been established. • Whether standards controlling the technique’s operation exist and have been maintained. • Whether it has been peer reviewed and published. • Whether it has general widespread acceptance.

Expert fingerprint testimony • Matching a latent query print $\mathcal{Q}$, found at some crime scene, to some reference print $\mathcal{R}$.

Expert fingerprint testimony Three phenomena combine to increase the probability of false association: • The number Q of query minutiae in the latent print $\mathcal{Q}$ is smaller than the number R of reference minutiae in the reference print $\mathcal{R}$. • The print $\mathcal{Q}$ is of lesser quality, so the rates of falsely detected and missed minutiae are higher. • The Q minutiae of print $\mathcal{Q}$ have larger localization errors.

Expert fingerprint testimony The 12-point rule: A match consisting of at least 12 minutiae points is considered sufficient evidence in many courts of law.

Expert fingerprint testimony • Effects of fingerprint expert misjudgments when using 12-point rule: • In general, erroneously pairing minutiae has significantly more impact than missing genuine minutiae in the query latent print.

Thank you.
