The birthday problem and database searches: The implications of relatives in offender databases. Jason R. Gilder 8/12/2006. The birthday problem. What is the probability of someone randomly picked having your birthday?
The birthday problem and database searches: The implications of relatives in offender databases Jason R. Gilder 8/12/2006
The birthday problem • What is the probability that a randomly picked person has your birthday? • If births are evenly distributed across the year, the probability is 1 in 365 • This is similar to the random match probability
The birthday paradox • What if we change the question to: how many people do we need to search before we find two people with the same birthday? • Many more comparisons are made • A match is found much more quickly • There is a greater than 50% chance of finding a match among 23 individuals
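The 23-person figure can be checked directly. The following is a minimal sketch, assuming 365 equally likely birthdays and no leap years; the function name is illustrative and not from the slides.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

print(round(prob_shared_birthday(23), 3))  # 0.507
```

With 22 people the same calculation stays below 50%, so 23 is the smallest group that crosses the threshold under these assumptions.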
Combinatorial calculations • How many pairs are present with n individuals? • 23 individuals => 253 pairs
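The pair count on this slide follows from the standard formula for choosing 2 items out of n, written out here with the 23-person case as a check:

```latex
\[
\binom{n}{2} = \frac{n(n-1)}{2},
\qquad
\binom{23}{2} = \frac{23 \cdot 22}{2} = 253
\]
```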
Database searches • The birthday paradox is similar to a database search • Arizona performed a complete pairwise search of its DNA database containing 65,493 profiles • The search found 144 pairs of individuals matching at 9+ loci • Related individuals likely exist in DNA databases
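To give a sense of scale, the number of pairwise comparisons in the Arizona search can be worked out from the database size. This is a small sketch using only the two figures on the slide; the variable names are illustrative, and the resulting rate is plain arithmetic on those figures, not an estimate reported by the author.

```python
from math import comb

profiles = 65_493        # database size reported on the slide
observed_matches = 144   # pairs matching at 9+ loci

total_pairs = comb(profiles, 2)            # every profile compared with every other
match_rate = observed_matches / total_pairs

print(total_pairs)                         # 2144633778
print(f"about 1 in {1 / match_rate:,.0f} pairs matched at 9+ loci")
```

Roughly 2.1 billion comparisons were made, so even a very small per-pair match rate yields more than a hundred matching pairs.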
Simulation studies • Use FBI-published Caucasian genotypes • Generate a database of randomized (unrelated) individuals • Create and add pairs of related individuals: siblings, parent-child, half-siblings, cousins
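The simulation steps above can be sketched as follows. This is a simplified illustration, not the author's code: it assumes 13 loci per profile, uses randomly generated placeholder allele frequencies in place of the FBI data, models only full siblings, and keeps the database small so the exhaustive pairwise search runs quickly. All function names are hypothetical.

```python
import random
from itertools import combinations

N_LOCI = 13  # assumption: 13 loci per profile

def make_frequencies(rng, n_loci=N_LOCI, n_alleles=8):
    """Placeholder allele frequencies (NOT the FBI data): random weights, normalised."""
    freqs = []
    for _ in range(n_loci):
        weights = [rng.random() for _ in range(n_alleles)]
        total = sum(weights)
        freqs.append([w / total for w in weights])
    return freqs

def random_profile(rng, freqs):
    """Unrelated individual: two alleles drawn independently at each locus."""
    profile = []
    for f in freqs:
        a, b = rng.choices(range(len(f)), weights=f, k=2)
        profile.append((min(a, b), max(a, b)))
    return profile

def child_of(rng, mother, father):
    """Child inherits one randomly chosen allele from each parent at each locus."""
    return [tuple(sorted((rng.choice(m), rng.choice(f))))
            for m, f in zip(mother, father)]

def sibling_pair(rng, freqs):
    """Two children of the same pair of simulated, unrelated parents."""
    mother, father = random_profile(rng, freqs), random_profile(rng, freqs)
    return child_of(rng, mother, father), child_of(rng, mother, father)

def loci_matching(p, q):
    """Number of loci at which both alleles agree."""
    return sum(g == h for g, h in zip(p, q))

def count_matches(database, min_loci=9):
    """Exhaustive pairwise search: pairs matching at min_loci or more loci."""
    return sum(loci_matching(p, q) >= min_loci
               for p, q in combinations(database, 2))

def build_database(rng, freqs, n_unrelated, n_sibling_pairs):
    db = [random_profile(rng, freqs) for _ in range(n_unrelated)]
    for _ in range(n_sibling_pairs):
        db.extend(sibling_pair(rng, freqs))
    return db

rng = random.Random(0)
freqs = make_frequencies(rng)
db = build_database(rng, freqs, n_unrelated=200, n_sibling_pairs=50)
print(len(db), "profiles;", count_matches(db), "pairs matching at 9+ loci")
```

Other relationships (parent-child, half-siblings, cousins) would need their own inheritance step in place of `sibling_pair`; the pairwise search stays the same.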
Unrelated individuals • Here, the database of 65k has 109 pairs of individuals matching at 9+ loci
The size of the database vs. the number of matching profiles by alleles • Here, the database of 65k has 139 pairs of individuals matching at 21+ alleles, Florida’s threshold for a familial search
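Locus matches and allele matches are counted differently: a locus counts as matching only when both alleles agree, whereas an allele count also credits partial sharing at a locus. The sketch below shows one way to count shared alleles, assuming each genotype is stored as a pair of allele labels. This counting convention is an assumption for illustration and may differ from the one used for these slides or in Florida's familial search policy.

```python
def alleles_shared(g, h):
    """Shared alleles at one locus (0, 1 or 2), counting each allele once."""
    remaining = list(h)
    shared = 0
    for allele in g:
        if allele in remaining:
            remaining.remove(allele)  # an allele in h can be matched only once
            shared += 1
    return shared

def profile_alleles_shared(p, q):
    """Total shared alleles across all loci of two profiles."""
    return sum(alleles_shared(g, h) for g, h in zip(p, q))

def profile_loci_matching(p, q):
    """Number of loci where both alleles agree, regardless of order."""
    return sum(sorted(g) == sorted(h) for g, h in zip(p, q))

# Toy three-locus example (real profiles would have more loci).
a = [(12, 14), (8, 8), (15, 17)]
b = [(14, 12), (8, 9), (16, 18)]
print(profile_loci_matching(a, b), profile_alleles_shared(a, b))  # 1 3
```

In the toy example only the first locus matches fully, but the second still contributes one shared allele, which is why an allele threshold can flag pairs that a locus threshold misses.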
Effect of adding siblings to a database of 10,000 individuals • An additional 35 matches at 9+ loci with ~1,000 sibling pairs
The number of pairs of siblings vs. the number of matching profiles by alleles • Database of 10,000 individuals
The number of 9+ locus matching profiles within databases containing different sibling ratios
The number of 21+ allele matching profiles within databases containing different sibling ratios
Effect of different degrees of related individuals (9+ locus matches) • Databases contain 10% related individuals • Results are averaged over 5 replicates of the database (average, standard deviation)
Effect of different degrees of related individuals (21+ allele matches) • Databases contain 10% related individuals • Results are averaged over 5 replicates of the database (average, standard deviation)
Theoretical model • Current work is to develop a mathematical model to estimate the number of sibling pairs found in a database • Use the repeat rate to determine the probability of siblings sharing the same genotype • Identity by descent and identity by state
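One standard way to combine identity by descent (IBD) and identity by state is sketched below. It is not necessarily the model under development here. It assumes Hardy-Weinberg proportions, no mutation, and that "repeat rate" means the sum of squared allele frequencies at a locus; the function name is illustrative. Full siblings share 2, 1 or 0 alleles IBD at a locus with probabilities 1/4, 1/2 and 1/4.

```python
def sibling_match_probability(freqs):
    """Probability that two full siblings have the same genotype at one locus."""
    s2 = sum(p ** 2 for p in freqs)   # assumed meaning of "repeat rate"
    s4 = sum(p ** 4 for p in freqs)

    match_given_ibd2 = 1.0                # both alleles inherited in common
    match_given_ibd1 = s2                 # the two non-shared alleles must coincide by state
    match_given_ibd0 = 2 * s2 ** 2 - s4   # same genotype by state only, as for unrelated people

    return 0.25 * match_given_ibd2 + 0.5 * match_given_ibd1 + 0.25 * match_given_ibd0

# Two equally common alleles at a locus (toy frequencies, not real data).
print(sibling_match_probability([0.5, 0.5]))  # 0.59375
```

Per-locus probabilities like this one would then be combined across loci to get the chance that a sibling pair matches at a given number of loci or more.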
Estimating number of sib pairs • Number of sibling pairs needed to find an 11+ locus match • At least 5% chance: 460 pairs • Experimentally, we found an 11+ locus match with a database of 10,000 individuals and 500 pairs of siblings
Estimating number of sib pairs • Number of sibling pairs needed to find a 12+ locus match • At least 5% chance: 696 pairs • Experimentally, we found one 12+ locus match with a database of 10,000 individuals and 1,000 pairs of siblings
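The "at least 5% chance" figures have the form of a standard calculation: if each sibling pair independently matches with probability p, the chance of at least one match among n pairs is 1 - (1 - p)^n. The sketch below shows that relationship and backs out the per-pair probability implied by the 460 and 696 figures. It assumes independent sibling pairs and ignores matches between unrelated profiles; whether the author's model takes this exact form is not stated on the slides, and the function names are hypothetical.

```python
import math

def chance_of_any_match(p_pair, n_pairs):
    """Probability that at least one of n_pairs independent sibling pairs matches."""
    return 1.0 - (1.0 - p_pair) ** n_pairs

def pairs_needed(p_pair, target=0.05):
    """Smallest number of sibling pairs giving at least `target` chance of a match."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_pair))

def implied_pair_probability(n_pairs, target=0.05):
    """Per-pair match probability at which n_pairs gives exactly `target` chance."""
    return 1.0 - (1.0 - target) ** (1.0 / n_pairs)

# Figures from the slides: 460 pairs (11+ loci) and 696 pairs (12+ loci).
for loci, n in ((11, 460), (12, 696)):
    p = implied_pair_probability(n)
    print(f"{loci}+ loci: implied per-pair probability of about 1 in {1 / p:,.0f}")
```

Because the slide figures are rounded up to whole pairs, the implied probabilities are approximate.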
Questions? Jason Gilder Forensic Bioinformatics www.bioforensics.com