Thinking About DNA Database Searches

William C. Thompson, Dept. of Criminology, Law & Society, University of California, Irvine

Value of DNA Match for Proving Identity
Prior Odds x Likelihood Ratio = Posterior Odds
• Prior odds: may be very low
• Likelihood ratio: 1 / (RMP + FPP*)
• 1:1 million x 1 billion:1 = 1000:1
• 1:1 million x 1 million:1 = 1:1
• 1:1000 x 10,000:1 = 10:1
*Actually RMP + [FPP x (1-RMP)]; see Thompson, Taroni & Aitken, 2003
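As a rough illustration of the arithmetic on this slide, the sketch below multiplies prior odds by a likelihood ratio and also evaluates the footnoted form 1 / (RMP + FPP x (1 - RMP)). The function names are hypothetical, FPP is taken here to mean the false positive probability, and the FPP value of 1 in 10,000 is an assumption chosen only for illustration, not a figure from the presentation.

```python
from fractions import Fraction


def likelihood_ratio(rmp, fpp):
    """Footnoted form of the likelihood ratio: 1 / (RMP + FPP * (1 - RMP))."""
    return 1 / (rmp + fpp * (1 - rmp))


def posterior_odds(prior_odds, lr):
    """Posterior odds = prior odds x likelihood ratio."""
    return prior_odds * lr


# The three worked examples from the slide, using exact fractions.
print(posterior_odds(Fraction(1, 10**6), 10**9))  # 1000, i.e. 1000:1
print(posterior_odds(Fraction(1, 10**6), 10**6))  # 1, i.e. 1:1
print(posterior_odds(Fraction(1, 1000), 10**4))   # 10, i.e. 10:1

# Assumed values for illustration only: RMP of 1 in 1 billion,
# FPP of 1 in 10,000. The FPP term dominates the denominator.
lr = likelihood_ratio(Fraction(1, 10**9), Fraction(1, 10**4))
print(float(lr))
```

Under these assumed values the likelihood ratio falls just below 10,000 even though the RMP alone would suggest 1 billion, which is why a very small RMP does not by itself guarantee a large likelihood ratio.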

Mysterious Clusters and the Law of Truly Large Numbers
• In a truly large sample space, seemingly unusual events are bound to occur
• E.g., double lottery winners; cancer clusters
• See Diaconis & Mosteller (1989). Methods for studying coincidences, JASA, 84, 853-861.

Taking Account of Coincidence When Searching Truly Large DNA Databases
Should the frequency of the matching profile be presented to the jury? Standard answers:
• No
  • NRC I: test additional loci; report only the frequency of those
  • NRC II: multiply the frequency by N (the number of profiles in the database)
• Yes
  • Friedman/Donnelly: present the LR, but keep in mind that prior odds may be very low
  • Prosecutors Everywhere: jury should hear the most impressive number possible "because it's relevant"

My Solution: Present Profile Frequency Only When It Equals the RMP*
• Multiple Tests of Different Hypotheses
  • Search unsolved crime evidence against offender database
  • For each offender, p(match|not source) = frequency
• Multiple Tests of Same Hypothesis
  • Search suspect against unsolved crime database to see if he matches any unsolved crime
  • For this suspect, p(match|not source) = Freq. x N
*RMP = p(match|suspect not the source)
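The "Freq. x N" figure for multiple tests of the same hypothesis can be checked with a short sketch. Assuming the N comparisons are independent and each has the same coincidental-match probability (a simplification not stated on the slide), the chance of at least one coincidental match is 1 - (1 - Freq.)^N, which is close to Freq. x N whenever that product is small. The function names and the example numbers are hypothetical.

```python
import math


def p_any_match(freq, n):
    """Probability of at least one coincidental match in n independent
    comparisons: 1 - (1 - freq)**n, computed in a numerically stable way."""
    return -math.expm1(n * math.log1p(-freq))


def p_any_match_approx(freq, n):
    """The slide's shortcut, Freq. x N, capped at 1."""
    return min(1.0, freq * n)


# Hypothetical numbers: profile frequency of 1 in 1 billion, suspect
# compared against 10,000 unsolved-crime profiles.
freq, n = 1e-9, 10_000
print(p_any_match(freq, n))
print(p_any_match_approx(freq, n))
```

The two results agree closely for small Freq. x N; the shortcut overstates the probability only as the product approaches 1.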

My Solution: Present Profile Frequency Only When It Equals the RMP*
• Testing relatives of people who almost match
  • For most suspects, p(match|not source) = frequency of matching profile
  • For relatives of people who almost match, p(match|not source) >>>> frequency
  • Therefore it is misleading to present the frequency of the matching profile in cases where the suspect is selected because a relative almost matches

Database Searches and the Birthday Problem
• The probability that a randomly chosen person will have my birthday is 1 in 365
• The probability that at least two people in a room share a birthday can be far higher
• With 23 people in a room, the likelihood that two will share a birthday exceeds 1 in 2
• With 60 people in the room, the probability is nearly 1 in 1
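The birthday figures above can be verified with a minimal sketch. It assumes 365 equally likely birthdays and ignores leap years and seasonal variation; the function name is hypothetical.

```python
def p_shared_birthday(n_people, days=365):
    """Probability that at least two of n_people share a birthday,
    assuming every day is equally likely."""
    p_all_distinct = 1.0
    for k in range(n_people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for n in (22, 23, 60):
    print(n, round(p_shared_birthday(n), 4))
```

With 22 people the probability is still below one half; it crosses one half at 23 and exceeds 0.99 at 60.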

Database Searches and the Birthday Problem
• Suppose the probability of a random match between any two DNA profiles is between 1 in 10 billion and 1 in 1 trillion
• What is the probability of finding a match between two such profiles in a database of:
  • 1,000
  • 100,000
  • 1,000,000

Approximate likelihood that two profiles in a DNA database will match
[Table by profile frequency; values not preserved in this transcript]
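A rough way to explore the question is sketched below. It is not the author's calculation: it assumes every pair of profiles in the database matches independently with the same probability, which ignores relatives and population structure (both of which would raise the chance of a match). Under that assumption, a database of N profiles contains N(N-1)/2 pairs, and the probability of at least one matching pair is approximately 1 - exp(-pairs x p). The function name is hypothetical; the database sizes and match probabilities are the ones posed in the question.

```python
import math


def p_matching_pair(n_profiles, pair_match_prob):
    """Approximate probability that at least one pair of profiles in the
    database matches, treating all pairs as independent (an assumption)."""
    n_pairs = n_profiles * (n_profiles - 1) // 2
    # 1 - exp(-n_pairs * p), computed stably for very small probabilities.
    return -math.expm1(-n_pairs * pair_match_prob)


for n in (1_000, 100_000, 1_000_000):
    for p in (1e-10, 1e-12):  # 1 in 10 billion, 1 in 1 trillion
        print(f"N={n:>9,}  p={p:.0e}  P(some pair matches)={p_matching_pair(n, p):.3g}")
```

Because the number of pairs grows with the square of the database size, the probability of some matching pair rises from negligible at 1,000 profiles to near certainty at 1,000,000 profiles when the pairwise match probability is 1 in 10 billion.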

Why present a birthday statistic in database cases?
• Because it is relevant…
