Chapter 15 System Errors Revisited

Chapter 15System Errors Revisited Ali Erol 10/19/2005

System Errors Revisited • Quantify the accuracy of FAR and FRR estimates. • Confidence Intervals, a well known technique used in statistical analysis. • See references [22],[23]. • The first three author’s algorithm [23] experimentally demonstrated to provide better Confidence Intervals estimates.

FAR/FRR • Definition: FRR(x)=Prob(smx/H0)=F(x) FAR(y)=Prob(sn>y/Ha)=1-Prob(sn y/Ha)=1-G(y) • We need • F(x)=Dist(x) : Genuine (Matching) score DF • G(y)= Dist(y): Imposter (Non-matching) score DF

FAR/FRR • Instead we have • Set of genuine scores X={X1, X2, …., XM} • Set of imposter scores Y={Y1,Y2, …., YN} • We estimate

Problem • What is the accuracy of these error rates? • The number of biometric samples • The quality of the samples • Data collection procedure (e.g. 10 consecutive samples) • Subjects involved, the acquisition device etc.

An Estimation Problem Given x: A random variable (F(x) denotes Dist(x)) X={X1, X2, …., XM}: Sample set Estimate =E(x) Solution Error (Unbiased estimator*)

Biased/Unbiased Estimators • For an unbiased estimator we have • Example: Gaussian Model: Estimate mean 1and variance 2 using maximum likelihood criterion i.e. maximize Prob(X/ ,) (Unbiased estimator) (Biased estimator) (Unbiased estimator)

Confidence Interval • Assume F(x) is given then Dist(r) can be calculated • r is function of , which is a function of x • Calculate (1-) 100% certainty (Next Slide) r[1(,X), 2(,X)] • Which leads to (1-)100% confidence interval for  given by

Confidence Interval • Example • Discard /2 on lower and higher ends • Find the r values corresponding to the interval boundary (called quantile) Dist(r) r Prob(q(/2) r q(1-/2))=1-

Confidence Interval • Interpretation: • Generate sample sets X from F(x) • Calculate confidence intervals for each X • (1-)100% of these intervals contain .

Parametric Method • Xi identically distributed • Assume Xiare independent(not true in general) • Then can be taken to be normal distribution using central limit theorem (large M). • Result: • E.g. For 95% confidence z=1.96 • Smaller interval with increasing M and 

Non-Parametric Method Sample Set X • Assume F(x) is available. f(x) Density of Additional Sample Sets Random Variable

Non-Parametric Method • FACT: For large B we have • Define error to be • Calculate Dist(r) • Solution:

Dist(r) r /2 /2 Non-Parametric Method • Interval calculation: Sorting and counting

Bootstrap Method • F(x) is not available; all we have is X • How do we generate ? • Solution (i.e. Bootstrap method): Sampling with replacement from X. • Put the samples in a bag, draw, record and put it back. • Draw M samples from X B times. Some samples Xi may not be in each set.

/2 /2 Bootstrap Method (Imperfections) • Xi are not independent. • In SR the dependence between samples is not replicated. • Effect of dependence for independent samples • Variance of is smaller • Leads to smaller CIs

Subset Bootstrap • Potential sources of dependency • All samples from the same person (e.g. multiple fingers) • All samples from same biometric (e.g. finger) • Partition X into independent subsets • Apply SR on subsets.

Subset Bootstrap (An example) • Fingerprint database • P persons • c fingers per person  D=cP Fingers • d samples per finger • DB Size= cPd • Matching pairs • d(d-1) per finger • cd(d-1) per person • cPd(d-1)=Dd(d-1) total • Using a symmetric and asymmetric matcher does not make any difference [23].

Subset Bootstrap (An Example) • X1 X2 • X1: P=10 c=2, D=20, d=8  M=1120 • X2: P=50 c=2, D=100, d=8  M=5600 • Finger based partition: Set subsets to be the samples from the same finger (i.e. D subsets of d(d-1) matching scores) • Person based partition: Set subsets to be the samples from the same person (i.e. P subsets of cd(d-1) matching scores)

Subset Bootstrap (An Example) • We expect • CI1 (light gray) to be larger than CI2 (dark gray) • Because X1 has smaller number of samples • CI2 (dark gray) to be contained in CI1 (light gray) • Because X1 X2 • The intervals are larger for person based partitioning • There is dependency between fingers of the same person

CIs for FAR/FRR • Calculate CIs for each threshold T=t0 and given an 

CI for FRR • Given genuine score set X • Generate • Calculate • Sort and count

CI for FAR • Given imposter score set Y • Generate • Calculate • Sort and count

Subset Bootstrap for FAR • Imposter scores Y are not independent • We are using multiple impressions of the same finger. • Let Ixk: kth finger impression from subject x then sim(Ia1,Ib1), sim(Ia1,Ib2), sim(Ia2,Ib3) are not statistically independent • Use a finger only once; for D fingers we have only D/2 such pairs • There is actually dependency between X and Y

Subset Bootstrap for FAR • Fingerprint database • P persons • c fingers per person  D=cP Fingers • d samples per finger • DB Size= cPd • Non-matching pairs • N=d2D(D-1)=P[(dc)2(P-1)+d2c(c-1)] • d2(D-1) per finger • (dc)2(P-1)+d2c(c-1) per person

Subset Bootstrap for FAR …. …. DB Partition IN Ii I1 x Y1=IixI1 YN-1=IixIN Ii • Finger (N=D): Take Ii (d elements), match itagainst Iki(d2 pairs) then we have d2(D-1) pairs. Repeat it with all Ii to construct subsets Yk • Person (N=P): Take Ii (cd elements), match itagainst Iki((dc)2 pairs) then we have (dc)2(P-1) pairs. Inside Ii we have d2c(c-1) pairs. Repeat it with all Ii to construct subsets Yk • Not completely independent: We use Ii many times.

Subset Bootstrap for FRR • Person subset is a better estimate

How good are the CIs? • There exists a true confidence interval (At the beginning we assumed F(x) is known) • The CI we calculate is just one estimate. • How accurate is that estimate?

How good are the CIs? • We estimate E(x) • Ideal Test: Assume F(x) is available • Generate • Calculate • Assume and test if

How good are the CIs? • Practical Test (for comparison) • Randomly split X into two subsets Xa and Xb • Calculate and CIa • Test • Repeat 1-3 many times and count the number of hits i.e. probability of falling into the CIa • Hit rate is not equal to the confidence. Assume have normal distribution. • The higher the hit rate is the better the estimates are.

How good are the CIs? • =0.1 • Person based partitioning provide more accurate confidence intervals • 73.10% is very close to the expected value

Chapter 15 System Errors Revisited

Chapter 15 System Errors Revisited

Presentation Transcript

Chapter 15 Cardiovascular System

Chapter 15 Cardiovascular System

Chapter 15 Digestive System

Chapter 15: Nervous System

Chapter 15 System Implementation

Chapter 15 Respiratory System

Chapter 15: Endocrine System

Chapter 15 System Implementation

Firewall Configuration Errors Revisited

Chapter 15 Urinary System

Firewall Configuration Errors Revisited

Respiratory System Chapter 15

Chapter 15 System Administration

Chapter 15 Cardiovascular System

Metric System - Revisited

SYSTEM ADMINISTRATION Chapter 15

Chapter 15: Endocrine System

Chapter 15 System Errors Revisited

Chapter 15: System Security

Chapter 15 Cardiovascular System

Chapter 4 Revisited

Identification System Errors