Thinking About DNA Database Searches
William C. Thompson
Dept. of Criminology, Law & Society
University of California, Irvine
Value of DNA Match for Proving Identity
Prior Odds x Likelihood Ratio = Posterior Odds
• Prior Odds: may be very low
• Likelihood Ratio = 1 / (RMP + FPP*)
1:1 million x 1 billion:1 = 1000:1
1:1 million x 1 million:1 = 1:1
1:1000 x 10,000:1 = 10:1
*Actually RMP + [FPP x (1-RMP)], see Thompson, Taroni & Aitken, 2003
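The odds arithmetic on this slide can be checked with a short sketch. The function names below are illustrative, not from the slides, and the RMP and FPP values in the last example are hypothetical numbers chosen only to show how a false positive probability can dominate the likelihood ratio.

```python
from fractions import Fraction

def likelihood_ratio(rmp, fpp=Fraction(0)):
    # Footnoted form from the slide: LR = 1 / (RMP + FPP * (1 - RMP)).
    return 1 / (rmp + fpp * (1 - rmp))

def posterior_odds(prior_odds, lr):
    # Prior odds x likelihood ratio = posterior odds.
    return prior_odds * lr

# The three (prior odds, likelihood ratio) rows shown on the slide.
slide_rows = [
    (Fraction(1, 10**6), Fraction(10**9)),
    (Fraction(1, 10**6), Fraction(10**6)),
    (Fraction(1, 1000), Fraction(10**4)),
]
slide_posteriors = [posterior_odds(prior, lr) for prior, lr in slide_rows]

# Hypothetical inputs (assumptions, not from the slides):
# RMP of 1 in 1 billion, FPP of 1 in 10,000.
example_lr = likelihood_ratio(Fraction(1, 10**9), Fraction(1, 10**4))

for (prior, lr), post in zip(slide_rows, slide_posteriors):
    print(f"prior {float(prior):.0e} x LR {float(lr):.0e} = posterior odds {float(post):g}")
print(f"LR with hypothetical RMP and FPP: about {float(example_lr):.1f} to 1")
```

With these hypothetical inputs the likelihood ratio falls just below 10,000 to 1 even though the RMP alone would suggest 1 billion to 1, because the FPP term is far larger than the RMP.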
Mysterious Clusters and the Law of Truly Large Numbers
• In a truly large sample space, seemingly unusual events are bound to occur
• E.g., double lottery winners; cancer clusters
• See Diaconis & Mosteller (1989). Methods for studying coincidences. JASA, 84, 853-861.
Taking Account of Coincidence When Searching Truly Large DNA Databases
Should the frequency of the matching profile be presented to the jury?
Standard answers:
• No
  • NRC I: test additional loci; report only the frequency of those
  • NRC II: multiply the frequency by N (for database)
• Yes
  • Friedman/Donnelly: present LR but keep in mind prior odds may be very low
  • Prosecutors Everywhere: jury should hear most impressive number possible “because it’s relevant”
My Solution: Present Profile Frequency Only When It Equals the RMP*
• Multiple Tests of Different Hypotheses
  • Search unsolved crime evidence against offender database
  • For each offender, p(match|not source) = frequency
• Multiple Tests of Same Hypothesis
  • Search suspect against unsolved crime database to see if he matches any unsolved crime
  • For this suspect, p(match|not source) = Freq. x N
*RMP = p(match|suspect not the source)
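The "Freq. x N" figure for multiple tests of the same hypothesis can be compared with the exact probability of at least one coincidental match. This is a minimal sketch that assumes the N comparisons are independent and share the same profile frequency; the function names and the numbers used are hypothetical, chosen only for illustration.

```python
import math

def p_any_match_exact(freq, n_comparisons):
    # 1 - (1 - freq)^N, computed stably for very small freq.
    # Assumes independent comparisons (an assumption, not stated on the slide).
    return -math.expm1(n_comparisons * math.log1p(-freq))

def p_any_match_approx(freq, n_comparisons):
    # The slide's "Freq. x N" figure.
    return freq * n_comparisons

# Hypothetical numbers: profile frequency 1 in 1 million, 1,000 unsolved crimes.
freq, n = 1e-6, 1_000
small_exact = p_any_match_exact(freq, n)
small_approx = p_any_match_approx(freq, n)

# When Freq. x N is not small, the product overstates the probability
# and can even exceed 1; the exact form cannot.
large_exact = p_any_match_exact(1e-6, 5_000_000)
large_approx = p_any_match_approx(1e-6, 5_000_000)

print(f"N = {n}: exact {small_exact:.6e}, Freq. x N {small_approx:.6e}")
print(f"N = 5,000,000: exact {large_exact:.4f}, Freq. x N {large_approx:.1f}")
```

Under these assumptions the product is a close upper bound whenever Freq. x N is well below 1, which is the situation the slide describes.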
My Solution: Present Profile Frequency Only When It Equals the RMP*
• Testing relatives of people who almost match
  • For most suspects, p(match|not source) = frequency of matching profile
  • For relatives of people who almost match, p(match|not source) >>>> frequency
  • Therefore it is misleading to present the frequency of the matching profile in cases where the suspect is selected because a relative almost matches
Database Searches and the Birthday Problem
• The probability that a randomly chosen person will have my birthday is 1 in 365
• The probability that any two people in a room share a birthday can be far higher
• With 23 people in a room, the likelihood that two will share a birthday exceeds 1 in 2
• With 60 people in the room, the probability is nearly 1 in 1
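The birthday figures above can be reproduced with a few lines of code. This sketch assumes 365 equally likely birthdays and ignores leap years; the function name is illustrative.

```python
def p_shared_birthday(n_people, days=365):
    # Probability that at least two of n_people share a birthday,
    # assuming birthdays are independent and uniform over `days`.
    p_all_distinct = 1.0
    for k in range(n_people):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

for n in (22, 23, 60):
    print(f"{n} people: {p_shared_birthday(n):.4f}")
```

The probability first passes 1 in 2 at 23 people because the number of pairs being compared, not the number of people, drives the result: 23 people form 253 distinct pairs.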
Database Searches and the Birthday Problem
• Suppose the probability of a random match between any two DNA profiles is between 1 in 10 billion and 1 in 1 trillion
• What is the probability of finding a match between two such profiles in a database of:
  • 1,000
  • 100,000
  • 1,000,000
Approximate likelihood that two profiles in a DNA database will match, by profile frequency (table values not preserved)
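Figures of this kind can be approximated by applying the birthday-problem logic to every pair of profiles in the database. The sketch below is not the author's calculation; it assumes every pair matches independently with the same probability, which ignores relatives and population structure, and the function name is illustrative.

```python
import math

def p_any_pair_matches(n_profiles, pair_match_prob):
    # Number of distinct pairs among n_profiles.
    n_pairs = n_profiles * (n_profiles - 1) // 2
    # 1 - (1 - p)^n_pairs, computed stably for very small p.
    # Assumes all pairs are independent (a simplifying assumption).
    return -math.expm1(n_pairs * math.log1p(-pair_match_prob))

database_sizes = (1_000, 100_000, 1_000_000)
pair_match_probs = (1e-10, 1e-12)  # 1 in 10 billion, 1 in 1 trillion

for n in database_sizes:
    for p in pair_match_probs:
        print(f"N = {n:>9,}  pair match prob = {p:.0e}  "
              f"P(at least one matching pair) = {p_any_pair_matches(n, p):.3g}")
```

Under these assumptions a database of 1,000,000 profiles contains roughly 500 billion pairs, so a matching pair becomes close to certain at 1 in 10 billion and remains far from negligible at 1 in 1 trillion.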
Why present a birthday statistic in database cases?
• Because it is relevant…