Merlin: rapid analysis of dense genetic maps using sparse gene flow trees. Tutorial #11 by Anna Tzemach. Background – Lander & Green’s HMM. Recombinations across successive intervals are independent, so sequential computation across loci using the forward-backward algorithm is possible.
Merlin: rapid analysis of dense genetic maps using sparse gene flow trees
Tutorial #11 by Anna Tzemach
Background – Lander & Green’s HMM
• Recombinations across successive intervals are independent, so sequential computation across loci using the forward-backward algorithm is possible.
• The algorithm for computing the probability of the data given an inheritance vector is linear in the number of founders.
• We need to sum over all possible inheritance vectors, whose number is exponential in the number of non-founders.
• Complexity:
  • Linear in the number of loci and in the number of founders.
  • Exponential in the number of non-founders.
Reminder – Compute marker data probability given inheritance data
[Figure: pedigree with individuals 1, 2, 11, 12, 13, 14, 21, 22, 23, 24; genotype labels a/b, a/b, a/b, a/b, b/d, a/c]
• Assume that the descent graph vertices below represent the pedigree in the figure.
• v = ( 1,1; 0,0; 1,1; 1,1; 1,1; 0,0 )
• v = (person 12; person 13; person 21; person 22; person 23; person 24)
Reminder (cont)
[Figure: Descent Graph over nodes 1 to 8, with genotype labels (a,b), (a,c), (b,d)]
[Figure: Founder Graph over nodes 1 to 8]
Reminder (cont) • Alternatively we can rewrite function as
Simultaneous calculation of all vectors
1. j = 0
2. For each meiosis j:
   • Duplicate the founder states.
   • v[j] = 0: add the corresponding edge to the first set of founder alleles.
   • v[j] = 1: add the corresponding edge to the second set of founder alleles.
3. If j < 2 * number of non-founders, go to 2.
4. Calculate the probability of all sets.
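The level-by-level duplication above can be sketched in Python. This is a minimal illustration, not Merlin's implementation: the state layout and the `edge_for` callback (which returns the founder-graph edge implied by a bit choice) are hypothetical names introduced here.

```python
from typing import Callable, Hashable

Bits = tuple[int, ...]
Edges = tuple[Hashable, ...]

def expand_all_vectors(
    num_meioses: int,
    edge_for: Callable[[Bits, int, int], Hashable],
) -> list[tuple[Bits, Edges]]:
    """Grow every inheritance vector at once, one meiosis per level.

    edge_for(bits_so_far, j, bit) is a hypothetical callback that returns
    the founder-graph edge added when v[j] = bit.
    """
    states: list[tuple[Bits, Edges]] = [((), ())]  # one empty root state
    for j in range(num_meioses):
        next_states = []
        for bits, edges in states:
            # Duplicate the state: one copy for v[j] = 0, one for v[j] = 1.
            for bit in (0, 1):
                new_edge = edge_for(bits, j, bit)
                next_states.append((bits + (bit,), edges + (new_edge,)))
        states = next_states
    return states  # each leaf would then be scored for its probability

def toy_edge(bits: Bits, j: int, bit: int) -> Hashable:
    # Toy stand-in: label the edge by (meiosis index, chosen bit).
    return (j, bit)

leaves = expand_all_vectors(3, toy_edge)
print(len(leaves))
```

Without any cutoff the number of leaves doubles at every level, which is what the cutoffs on the following slides are meant to avoid.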
Possible outcome
[Figure legend: node with zero likelihood]
Cutoff of zero nodes
• For each founder allele: ai = ∅ (not yet assigned)
• For each meiosis j (person p = j/2):
  • kj = alleles of person p
  • If ai = ∅ (ai is the allele assignment of the corresponding founder): ai = kj
  • Else: ai = ai ∩ kj
  • If ai = ∅: return 0
  • Go to meiosis j+1
• If j == 2 * number of non-founders: compute the vector probability
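A minimal Python sketch of this cutoff, following the slide's simplified rule only (intersect the set of alleles still possible for a founder allele with each genotype it flows into). The inputs `founder_of_meiosis` and `initial_allowed` are assumptions made for illustration; "unassigned" is represented by a missing dictionary key, so that it cannot be confused with an empty intersection.

```python
from typing import Optional

AlleleSet = frozenset[str]

def first_zero_meiosis(
    founder_of_meiosis: list[int],
    observed: list[Optional[AlleleSet]],
    initial_allowed: Optional[dict[int, AlleleSet]] = None,
) -> Optional[int]:
    """Return the first meiosis j at which the partial vector has zero
    likelihood, or None if every intersection stays non-empty.

    founder_of_meiosis[j]: founder allele transmitted by meiosis j (assumed given).
    observed[p]: alleles seen in non-founder p (None if not genotyped);
                 meiosis j belongs to person p = j // 2.
    """
    allowed = dict(initial_allowed or {})  # founder allele -> alleles still possible
    for j, founder in enumerate(founder_of_meiosis):
        k_j = observed[j // 2]
        if k_j is None:
            continue  # an ungenotyped person constrains nothing
        if founder not in allowed:
            allowed[founder] = k_j  # first constraint on this founder allele
        else:
            allowed[founder] = allowed[founder] & k_j
        if not allowed[founder]:
            return j  # empty intersection: prune this node and its subtree
    return None

start = {1: frozenset("ab")}
genotypes = [frozenset("ac"), frozenset("bd")]
print(first_zero_meiosis([1, 3, 1, 3], genotypes, start))
```

Returning the meiosis index rather than just 0 lets the caller discard the whole subtree rooted at that level.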
Possible outcome
[Figure legend: node with zero likelihood]
Cutoff of symmetric nodes
• A node is symmetric if parent pi is known to be homozygous, or if offspring oi and all of its descendants are not genotyped.
• If we cannot distinguish between v[i] = 0 and v[i] = 1, then P(data | v = ____1_____) = P(data | v = ____0_____).
• We do not want to calculate such nodes twice.
Possible outcome
[Figure legend: node with zero likelihood; node identical to sibling]
Founder Reduction
• There is no way to determine the ordered genotype of a founder.
• So, looking at a child of the founder, we cannot distinguish between v[i] = 0 and v[i] = 1.
• For each founder, 1 bit in the inheritance vector can be held constant.
• Total time/space saving: a factor of 2^f (f = number of founders)
Recombination speed-up
• The transition probability between vector i and vector j
• For small thetas, only transitions with a small number of recombinations have non-negligible probability
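In the Lander & Green model, the transition probability between two inheritance vectors depends only on the number of bits d in which they differ: θ^d (1 − θ)^(n − d), where n is the number of meioses. The sketch below assumes a single recombination fraction θ shared by all meioses (an assumption made here; the slide's own formula did not survive) and shows how quickly the probability falls as d grows.

```python
def hamming_distance(v: int, w: int) -> int:
    """Number of meioses whose outcome differs between vectors v and w."""
    return bin(v ^ w).count("1")

def transition_prob(v: int, w: int, n_bits: int, theta: float) -> float:
    """P(w at the next locus | v at this locus), assuming one recombination
    fraction theta for every meiosis (an assumption of this sketch)."""
    d = hamming_distance(v, w)
    return theta ** d * (1.0 - theta) ** (n_bits - d)

n_bits, theta = 6, 0.01
for d in range(4):
    w = (1 << d) - 1  # a vector that differs from 0 in exactly d bits
    print(d, transition_prob(0, w, n_bits, theta))
```

With a small θ each extra recombination multiplies the probability by roughly θ, so restricting the sum to neighbours within a few recombinations loses very little mass.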
Example
• Number of non-founders = 3
• Number of inheritance vectors = 2^(2*3) = 64
• Number of founders = 3
• Founder reduction = 2^3 = 8
• Reduced number of vectors = 2^(6-3) = 8
Example (cont)
• v[1] = 0
• a1 = {a,b}
No need for v[1] = 1 -> founder reduction
• v[2] = 0
• a3 = {a,b}
No need for v[2] = 1 -> founder reduction
Example (cont)
• v[3] = 0 => 000, a1 = {a,b} ∩ {a,c} = {a}
• v[3] = 1 => 001, a3 = {a,b} ∩ {a,c} = {a}
• v[4] = 0 => 0000, a5 = {a,c}
• v[4] = 0 => 0010, a5 = {a,c}
No need for v[4] = 1 -> founder reduction
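The effect of founder reduction in this example can be sketched by enumerating only the vectors in which the founder-reduced bits stay at 0. The function name and the 0-based `fixed_positions` argument are illustrative choices; positions 0, 1 and 3 correspond to v[1], v[2] and v[4] in the slides' 1-based numbering.

```python
from itertools import product
from typing import Iterator

def reduced_vectors(num_bits: int, fixed_positions: set[int]) -> Iterator[tuple[int, ...]]:
    """Yield inheritance vectors whose bits at fixed_positions are held at 0."""
    free = [i for i in range(num_bits) if i not in fixed_positions]
    for bits in product((0, 1), repeat=len(free)):
        v = [0] * num_bits
        for pos, bit in zip(free, bits):
            v[pos] = bit
        yield tuple(v)

# 3 non-founders -> 6 bits; v[1], v[2], v[4] are fixed by founder reduction.
vectors = list(reduced_vectors(6, {0, 1, 3}))
print(len(vectors))
```

Only the three free bits vary, so 8 of the 64 possible vectors are visited.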