310 likes | 334 Views
Chapter 15 System Errors Revisited. Ali Erol 10/19/2005. System Errors Revisited. Quantify the accuracy of FAR and FRR estimates . Confidence Intervals, a well known technique used in statistical analysis. See references [22],[23].
E N D
Chapter 15System Errors Revisited Ali Erol 10/19/2005
System Errors Revisited • Quantify the accuracy of FAR and FRR estimates. • Confidence Intervals, a well known technique used in statistical analysis. • See references [22],[23]. • The first three author’s algorithm [23] experimentally demonstrated to provide better Confidence Intervals estimates.
FAR/FRR • Definition: FRR(x)=Prob(smx/H0)=F(x) FAR(y)=Prob(sn>y/Ha)=1-Prob(sn y/Ha)=1-G(y) • We need • F(x)=Dist(x) : Genuine (Matching) score DF • G(y)= Dist(y): Imposter (Non-matching) score DF
FAR/FRR • Instead we have • Set of genuine scores X={X1, X2, …., XM} • Set of imposter scores Y={Y1,Y2, …., YN} • We estimate
Problem • What is the accuracy of these error rates? • The number of biometric samples • The quality of the samples • Data collection procedure (e.g. 10 consecutive samples) • Subjects involved, the acquisition device etc.
An Estimation Problem Given x: A random variable (F(x) denotes Dist(x)) X={X1, X2, …., XM}: Sample set Estimate =E(x) Solution Error (Unbiased estimator*)
Biased/Unbiased Estimators • For an unbiased estimator we have • Example: Gaussian Model: Estimate mean 1and variance 2 using maximum likelihood criterion i.e. maximize Prob(X/ ,) (Unbiased estimator) (Biased estimator) (Unbiased estimator)
Confidence Interval • Assume F(x) is given then Dist(r) can be calculated • r is function of , which is a function of x • Calculate (1-) 100% certainty (Next Slide) r[1(,X), 2(,X)] • Which leads to (1-)100% confidence interval for given by
Confidence Interval • Example • Discard /2 on lower and higher ends • Find the r values corresponding to the interval boundary (called quantile) Dist(r) r Prob(q(/2) r q(1-/2))=1-
Confidence Interval • Interpretation: • Generate sample sets X from F(x) • Calculate confidence intervals for each X • (1-)100% of these intervals contain .
Parametric Method • Xi identically distributed • Assume Xiare independent(not true in general) • Then can be taken to be normal distribution using central limit theorem (large M). • Result: • E.g. For 95% confidence z=1.96 • Smaller interval with increasing M and
Non-Parametric Method Sample Set X • Assume F(x) is available. f(x) Density of Additional Sample Sets Random Variable
Non-Parametric Method • FACT: For large B we have • Define error to be • Calculate Dist(r) • Solution:
Dist(r) r /2 /2 Non-Parametric Method • Interval calculation: Sorting and counting
Bootstrap Method • F(x) is not available; all we have is X • How do we generate ? • Solution (i.e. Bootstrap method): Sampling with replacement from X. • Put the samples in a bag, draw, record and put it back. • Draw M samples from X B times. Some samples Xi may not be in each set.
/2 /2 Bootstrap Method (Imperfections) • Xi are not independent. • In SR the dependence between samples is not replicated. • Effect of dependence for independent samples • Variance of is smaller • Leads to smaller CIs
Subset Bootstrap • Potential sources of dependency • All samples from the same person (e.g. multiple fingers) • All samples from same biometric (e.g. finger) • Partition X into independent subsets • Apply SR on subsets.
Subset Bootstrap (An example) • Fingerprint database • P persons • c fingers per person D=cP Fingers • d samples per finger • DB Size= cPd • Matching pairs • d(d-1) per finger • cd(d-1) per person • cPd(d-1)=Dd(d-1) total • Using a symmetric and asymmetric matcher does not make any difference [23].
Subset Bootstrap (An Example) • X1 X2 • X1: P=10 c=2, D=20, d=8 M=1120 • X2: P=50 c=2, D=100, d=8 M=5600 • Finger based partition: Set subsets to be the samples from the same finger (i.e. D subsets of d(d-1) matching scores) • Person based partition: Set subsets to be the samples from the same person (i.e. P subsets of cd(d-1) matching scores)
Subset Bootstrap (An Example) • We expect • CI1 (light gray) to be larger than CI2 (dark gray) • Because X1 has smaller number of samples • CI2 (dark gray) to be contained in CI1 (light gray) • Because X1 X2 • The intervals are larger for person based partitioning • There is dependency between fingers of the same person
CIs for FAR/FRR • Calculate CIs for each threshold T=t0 and given an
CI for FRR • Given genuine score set X • Generate • Calculate • Sort and count
CI for FAR • Given imposter score set Y • Generate • Calculate • Sort and count
Subset Bootstrap for FAR • Imposter scores Y are not independent • We are using multiple impressions of the same finger. • Let Ixk: kth finger impression from subject x then sim(Ia1,Ib1), sim(Ia1,Ib2), sim(Ia2,Ib3) are not statistically independent • Use a finger only once; for D fingers we have only D/2 such pairs • There is actually dependency between X and Y
Subset Bootstrap for FAR • Fingerprint database • P persons • c fingers per person D=cP Fingers • d samples per finger • DB Size= cPd • Non-matching pairs • N=d2D(D-1)=P[(dc)2(P-1)+d2c(c-1)] • d2(D-1) per finger • (dc)2(P-1)+d2c(c-1) per person
Subset Bootstrap for FAR …. …. DB Partition IN Ii I1 x Y1=IixI1 YN-1=IixIN Ii • Finger (N=D): Take Ii (d elements), match itagainst Iki(d2 pairs) then we have d2(D-1) pairs. Repeat it with all Ii to construct subsets Yk • Person (N=P): Take Ii (cd elements), match itagainst Iki((dc)2 pairs) then we have (dc)2(P-1) pairs. Inside Ii we have d2c(c-1) pairs. Repeat it with all Ii to construct subsets Yk • Not completely independent: We use Ii many times.
Subset Bootstrap for FRR • Person subset is a better estimate
How good are the CIs? • There exists a true confidence interval (At the beginning we assumed F(x) is known) • The CI we calculate is just one estimate. • How accurate is that estimate?
How good are the CIs? • We estimate E(x) • Ideal Test: Assume F(x) is available • Generate • Calculate • Assume and test if
How good are the CIs? • Practical Test (for comparison) • Randomly split X into two subsets Xa and Xb • Calculate and CIa • Test • Repeat 1-3 many times and count the number of hits i.e. probability of falling into the CIa • Hit rate is not equal to the confidence. Assume have normal distribution. • The higher the hit rate is the better the estimates are.
How good are the CIs? • =0.1 • Person based partitioning provide more accurate confidence intervals • 73.10% is very close to the expected value