Improving Forensic Identification in Bayesian Networks: Accounting for Population Substructure. Amanda B. Hepler
Outline • Population Substructure (PS) • Bayesian Networks • Introduction • Incorporating PS into Paternity Networks • Example
What Is Population Substructure? • Any deviation from random mating • Commonly due to geographical subdivision • Mating pairs often have remote relatives in common • Inbreeding coefficient (θ): measures the extent of common ancestry
Why Should We Account for PS? • Ignoring PS “would unfairly overstate the strength of the evidence against the defendant.” (Balding & Nichols, 1995) • “If the allele frequencies for the subgroup are not available…[forensic] calculations should use the population-structure equations.” (1996 NRC Report)
Assumptions • Population allele frequencies are known • Inbreeding coefficient θ is known • Loci are independent • Within a subpopulation: • Mating is random • Migration and mutation events are independent and constant
What is a Bayesian Network (BN)? • A graphical model that expresses probabilistic relationships among variables or events¹ • HUGIN used to create BNs; a free version is available at http://www.hugin.dk
Graphical portion: Hair Color → Eye Color
Numerical portion:
P(Hair Color): Red 0.5, Brown 0.5
P(Eye Color | Hair Color):
Eye Color    Hair Color = Red    Hair Color = Brown
Blue         0.2                 0.9
Green        0.8                 0.1
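The two-node example can be worked through in a few lines of plain Python. This is only a sketch of the arithmetic a BN engine performs, not HUGIN's interface; the function names are made up for illustration, and the probabilities are the ones shown on the slide.

```python
# Two-node network from the slide: Hair Color -> Eye Color.
p_hair = {"Red": 0.5, "Brown": 0.5}
p_eye_given_hair = {
    "Red": {"Blue": 0.2, "Green": 0.8},
    "Brown": {"Blue": 0.9, "Green": 0.1},
}


def eye_marginal(eye):
    """P(Eye Color = eye), summing over the parent node."""
    return sum(p_hair[h] * p_eye_given_hair[h][eye] for h in p_hair)


def hair_posterior(hair, eye):
    """P(Hair Color = hair | Eye Color = eye) by Bayes' rule."""
    return p_hair[hair] * p_eye_given_hair[hair][eye] / eye_marginal(eye)


print(round(eye_marginal("Blue"), 4))           # 0.55
print(round(hair_posterior("Red", "Blue"), 4))  # 0.1818
```

Observing blue eyes lowers the probability of red hair from 0.5 to about 0.18, which is the kind of evidence propagation the network automates.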
Why Use Bayesian Networks? BNs provide: • Simple representations of complex problems • Automation of complex algebraic manipulations • Communication aid
Notation for Paternity Case • M = mother, C = child, PF = putative father • Hp: PF is the father of C • Hd: Some other man is the father of C • Likelihood ratio, or paternity index (PI): $PI = \Pr(E \mid H_p) / \Pr(E \mid H_d)$, where E denotes the observed genotypes of M, C, and PF • Interpretation: “The evidence is PI times more probable if PF is the father of C than if some other man is the father.”
Genotype and Allele Nodes • One locus, two alleles: A1 and A2 • Observe genotypes of M, C, and PF • Genotype nodes (states A1A1, A1A2, A2A2) for Mother, Putative Father, and Child • Allele/gene nodes (states A1, A2)
Probability Tables for Genotype and Allele Nodes • Nodes shown: PF’s Maternal Gene, PF’s Paternal Gene, PF’s Genotype
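The tables themselves did not survive in this transcript, so the following is a hedged sketch of what such tables typically contain, assuming each founder gene is A1 with probability p1 independently of the others (no substructure) and that a genotype node is a deterministic function of its two gene nodes. The value p1 = 0.10 is borrowed from the worked example later in the talk; the function names are illustrative.

```python
from itertools import product

ALLELES = ("A1", "A2")


def genotype(maternal_gene, paternal_gene):
    """Unordered genotype label, e.g. ('A2', 'A1') -> 'A1A2'."""
    return "".join(sorted((maternal_gene, paternal_gene)))


def genotype_table():
    """Deterministic table: (maternal gene, paternal gene) -> genotype."""
    return {(m, p): genotype(m, p) for m, p in product(ALLELES, repeat=2)}


def genotype_distribution(p1):
    """Genotype probabilities when both genes are independent draws (assumption)."""
    gene_prob = {"A1": p1, "A2": 1.0 - p1}
    dist = {}
    for (m, p), g in genotype_table().items():
        dist[g] = dist.get(g, 0.0) + gene_prob[m] * gene_prob[p]
    return dist


print(genotype_table())
print(genotype_distribution(0.10))
```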
Original Paternity Network² • Hypothesis node with states (Yes, No) • ² A.P. Dawid, J. Mortera, V.L. Pascali, and D. Van Boxel. Probabilistic expert systems for forensic inference from genetic markers. Scandinavian Journal of Statistics, 29:577-595, 2002.
Accounting for Population Substructure • Probability of allele Ai depends on how many Ai alleles have already been observed • Modified allele frequencies³: $\Pr(A_i \mid n_i \text{ of } n \text{ observed alleles are } A_i) = \dfrac{n_i\theta + (1-\theta)\,p_i}{1 + (n-1)\,\theta}$ • pi = frequency of the ith allele in the population • ni = number of observed alleles of type Ai • n = total number of alleles observed • ³ D.J. Balding and R.A. Nichols. DNA profile match probability calculation. Forensic Science International, 64(2-3):125-140, 1994.
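As a small illustration, the modified allele frequency can be written as a one-line function. The sketch assumes the standard Balding-Nichols sampling formula, (ni·θ + (1 − θ)·pi) / (1 + (n − 1)·θ); the function and argument names are chosen here for readability and do not come from the talk.

```python
def modified_allele_frequency(p_i, n_i, n, theta):
    """Probability that the next allele is A_i, given that n_i of the
    n alleles observed so far are A_i (Balding-Nichols sampling formula)."""
    return (n_i * theta + (1.0 - theta) * p_i) / (1.0 + (n - 1) * theta)


# With nothing observed yet, the population frequency is recovered.
print(modified_allele_frequency(0.10, 0, 0, 0.03))

# After seeing three A1 alleles among four, A1 becomes more probable.
print(round(modified_allele_frequency(0.10, 3, 4, 0.03), 4))  # 0.1716
```

Setting theta to 0 returns p_i regardless of the counts, which corresponds to ignoring substructure.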
New Network Nodes • θ and p1 • Keep track of founder genes • Counting nodes: one node holds the value of n1 after founder 2, the next holds the value of n1 after founder 3, etc.
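The role of the counting nodes can be sketched outside any BN tool: walk through the founder genes in order, keep a running count n1 of A1 alleles, and let each gene's probability depend on the counts so far. The update rule below is assumed to be the Balding-Nichols sampling formula, and the function name and return format are illustrative only.

```python
def sequence_probability(founder_genes, p1, theta):
    """Joint probability of an ordered list of founder genes ('A1' or 'A2'),
    plus the running value of n1 after each founder gene."""
    freq = {"A1": p1, "A2": 1.0 - p1}
    counts = {"A1": 0, "A2": 0}  # counts["A1"] plays the role of n1
    prob = 1.0
    n1_trace = []
    for gene in founder_genes:
        n = counts["A1"] + counts["A2"]
        prob *= (counts[gene] * theta + (1.0 - theta) * freq[gene]) / (
            1.0 + (n - 1) * theta
        )
        counts[gene] += 1
        n1_trace.append(counts["A1"])
    return prob, n1_trace


prob, n1_trace = sequence_probability(["A1", "A1", "A2"], p1=0.10, theta=0.03)
print(prob, n1_trace)
```

The joint probability does not depend on the order in which the founder genes are visited, so the ordering of founders in the network is a bookkeeping choice.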
New Probability Table now depends on
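Paternity Calculations By Hand • θ = 0.03, p1 = 0.10 • M = A1A1, C = A1A1, PF = A1A2 • PI for this case⁴: $PI = \dfrac{1 + 3\theta}{2\,[3\theta + (1-\theta)\,p_1]} = \dfrac{1.09}{2(0.187)} \approx 2.91$ • ⁴ I.W. Evett and B.S. Weir. Interpreting DNA Evidence. Sinauer, Sunderland, MA, 1998.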
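One way to see where the by-hand value comes from is sketched below. It assumes the Balding-Nichols sampling formula for the allele contributed by an unknown father, conditioned on the four alleles already observed in M and PF (three A1 and one A2, so n1 = 3 and n = 4). The mother must pass A1 in either case.

```latex
\begin{align*}
\Pr(C = A_1A_1 \mid M, PF, H_p) &= 1 \cdot \tfrac{1}{2}
  && \text{(PF passes } A_1 \text{ with probability } \tfrac{1}{2}\text{)} \\
\Pr(C = A_1A_1 \mid M, PF, H_d) &= 1 \cdot \Pr(A_1 \mid n_1 = 3,\ n = 4)
  = \frac{3\theta + (1-\theta)\,p_1}{1 + 3\theta} \\
PI &= \frac{1/2}{\dfrac{3\theta + (1-\theta)\,p_1}{1 + 3\theta}}
  = \frac{1 + 3\theta}{2\,[3\theta + (1-\theta)\,p_1]} \\
   &= \frac{1.09}{2\,(0.09 + 0.097)} = \frac{1.09}{0.374} \approx 2.91
  && (\theta = 0.03,\ p_1 = 0.10)
\end{align*}
```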
Paternity Calculations Using HUGIN • The same result (PI ≈ 2.91) can be obtained using HUGIN.
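The HUGIN network itself is not reproduced in this transcript. As a stand-in, the brute-force sketch below does in plain Python what a BN engine does implicitly: it sums over every assignment of the founder genes and compares the probability of the evidence under the two hypotheses. It is not HUGIN's API. It assumes the Balding-Nichols sampling formula for founder genes and Mendelian transmission, and it models the alternative father by the single gene he passes on; all names are illustrative.

```python
from itertools import product

ALLELES = ("A1", "A2")


def draw_prob(gene, counts, p1, theta):
    """Probability of the next founder gene given the counts so far."""
    freq = {"A1": p1, "A2": 1.0 - p1}
    n = sum(counts.values())
    return (counts[gene] * theta + (1.0 - theta) * freq[gene]) / (
        1.0 + (n - 1) * theta
    )


def genotype(g1, g2):
    return "".join(sorted((g1, g2)))


def evidence_probability(pf_is_father, p1, theta):
    """Pr(M = A1A1, PF = A1A2, C = A1A1) summed over founder genes."""
    total = 0.0
    # Founder genes: two for M, two for PF, and one gene transmitted by
    # the alternative father (it only matters when pf_is_father is False).
    for genes in product(ALLELES, repeat=5):
        counts = {"A1": 0, "A2": 0}
        prob = 1.0
        for g in genes:
            prob *= draw_prob(g, counts, p1, theta)
            counts[g] += 1
        m1, m2, f1, f2, alt = genes
        if genotype(m1, m2) != "A1A1" or genotype(f1, f2) != "A1A2":
            continue
        paternal_options = (f1, f2) if pf_is_father else (alt, alt)
        # Each (maternal, paternal) transmission pair has probability 1/4.
        for from_m in (m1, m2):
            for from_f in paternal_options:
                if genotype(from_m, from_f) == "A1A1":
                    total += prob * 0.25
    return total


def paternity_index(p1, theta):
    return evidence_probability(True, p1, theta) / evidence_probability(
        False, p1, theta
    )


print(round(paternity_index(0.10, 0.03), 2))  # 2.91
```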
Effect of Introducing θ • Assume no population substructure (θ = 0): $PI = \dfrac{1}{2\,p_1} = \dfrac{1}{2(0.10)} = 5.00$ • With θ = 0.03, the PI of 2.91 is more “conservative” than 5.00
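To see how the paternity index responds to θ for this case (M = A1A1, C = A1A1, PF = A1A2), the closed-form expression can be evaluated over a few values. The sketch assumes PI = (1 + 3θ) / (2[3θ + (1 − θ)p1]), which reproduces both figures on the slide; the θ values other than 0 and 0.03 are arbitrary choices for illustration.

```python
def paternity_index(p1, theta):
    """PI for M = A1A1, C = A1A1, PF = A1A2 (assumed closed form)."""
    return (1.0 + 3.0 * theta) / (2.0 * (3.0 * theta + (1.0 - theta) * p1))


for theta in (0.0, 0.01, 0.03, 0.05):
    print(f"theta = {theta:.2f}  PI = {paternity_index(0.10, theta):.2f}")
```

For this genotype combination the PI falls as θ grows, so allowing for substructure weakens the evidence for paternity.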
Other Examples Considered • Multiple loci case: • Assume loci are independent • Multiply the per-locus PIs • Multiple allele case: • M and PF have at most four distinct alleles • Missing father case: • Brother’s genotype available
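For the multiple-loci case, independence across loci means the overall paternity index is the product of the per-locus values. A minimal sketch follows; the per-locus numbers are hypothetical placeholders and are not results from the talk.

```python
from math import prod


def combined_paternity_index(per_locus_pis):
    """Overall PI under the assumption that loci are independent."""
    return prod(per_locus_pis)


# Hypothetical per-locus PIs, for illustration only.
example_pis = [2.0, 3.0, 0.5]
print(combined_paternity_index(example_pis))  # 3.0
```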
Areas for Future Research • Apply same methodology to other BNs: • Mutation • Cross-Transfer Evidence • Mixtures • Remains Identification • Software improvements • Need software for the forensic scientist • Improvements needed for run time