Fast Elimination of Redundant Linear Equations and Reconstruction of Recombination-free Mendelian Inheritance on a Pedigree
Authors: Lan Liu & Tao Jiang, Univ. California, Riverside; Jing Xiao, Lirong Xia, Tsinghua Univ., China
Outline
• Introduction and problem definition
• A new system of linear equations for ZRHC
• An O(mn^3) time algorithm for ZRHC
• An improved algorithm for ZRHC
• Conclusion
Pedigree
• An example: the British Royal Family
Biological Background
• Example: Mendelian experiment
• Mendelian law: one haplotype comes from the father and the other comes from the mother.
• Basic concepts: genotypes 11 and 22 are homozygous; genotype 12 is heterozygous, and its two phasings 1|2 and 2|1 record which allele is paternal and which is maternal.
Notations and Recombinant
[Figure: two father-mother-child trios, each shown as genotypes and as a haplotype configuration; one child's configuration needs 1 recombinant, the other 0 recombinants.]
Haplotype Configuration Reconstruction
• Haplotypes: useful, but expensive to obtain
• Genotypes: less informative, but cheaper to obtain
• In biological applications, genotypes rather than haplotypes are collected.
• How can haplotypes be reconstructed from genotypes?
• Recombination-free assumption
The ZRHC problem
• Problem definition
• Given a pedigree and the genotype information for each member, find a recombination-free haplotype configuration for each member that obeys the Mendelian law of inheritance.
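A minimal sketch of what the recombination-free condition asks of a single father-mother-child trio. It assumes haplotypes are written as strings over the alleles 1 and 2 and that each member is a (paternal, maternal) pair; the function name and this data layout are illustrative assumptions, not part of the slides.

```python
from typing import Tuple

Haplotypes = Tuple[str, str]  # (paternal, maternal) haplotype strings, an assumed layout


def is_recombination_free(child: Haplotypes, father: Haplotypes, mother: Haplotypes) -> bool:
    """Return True if the child's paternal haplotype is an intact copy of one of the
    father's haplotypes and the maternal one an intact copy of one of the mother's."""
    child_pat, child_mat = child
    return child_pat in father and child_mat in mother


# Hypothetical trio: the child copies one of the father's haplotypes intact.
father = ("1111", "2222")
mother = ("2222", "2222")
ok_child = ("2222", "2222")
# Here the paternal haplotype switches between the father's two haplotypes midway.
bad_child = ("1122", "2222")
```

The check only validates a configuration that is already given; finding such a configuration for the whole pedigree is the ZRHC problem itself.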
Previous Work
• Li and Jiang introduced a system of linear equations over F[2] and presented an O(m^3 n^3) time algorithm for ZRHC [LJ03], where m is the number of loci and n is the number of members in the pedigree.
• Several attempts have been made recently, but their authors did not prove the correctness of their algorithms in all cases, especially when the input pedigree has mating loops [CZ04] [LCL06].
• Recently, Chan et al. proposed a linear-time algorithm [CCC+06], which works only for pedigrees without mating loops.
Related work
• Methods based on fast matrix multiplication could achieve an asymptotic running time of O(k^2.376) on k equations with k unknowns.
• The Lanczos and conjugate gradient algorithms are only heuristics [GV96].
• The Wiedemann algorithm has expected quadratic running time [W86].
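Our Result
• We present a much faster algorithm for ZRHC with running time O(mn^2 + n^3 log^2 n log log n).
[Figure: three linear systems Ax = b in sequence, joined by a transformation step and then a redundancy elimination step; size labels O(n), O(n) and O(n log^2 n log log n).]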
The New Linear System
• Parameters
  m: #loci; n: #members in the pedigree
• Unknowns
  p_j: the paternal haplotype vector of a member j.
  h_{j1,j}: the scalar carrying the inheritance information between a parent j1 and a child j.
The New Linear System
[Figure: parents j1 and j2 with child j over four loci. Each member j has paternal haplotype p_j and maternal haplotype p_j + w_j; the edges into j are labeled h_{j1,j} and h_{j2,j}. In the example shown, p_{j1,2} = 1 and p_{j1,3} = 0.]
The Linear System
• O(mn) equations on O(mn) unknowns.
• Given a homozygous locus i on a member j (with a child j1), p_j[i] and p_{j1}[i] are pre-determined.
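A small sketch of how the known parts of the system could be read off a genotype. It assumes an encoding that the slides do not state (allele 1 as 0 and allele 2 as 1 in F[2]), takes w_j[i] = 1 exactly at heterozygous loci so that the maternal haplotype is p_j + w_j, and uses hypothetical helper names.

```python
from typing import Dict, List, Optional

# Assumed encoding (not from the slides): allele "1" -> 0 and allele "2" -> 1 in F[2].
ALLELE_BIT: Dict[str, int] = {"1": 0, "2": 1}


def w_vector(genotype: List[str]) -> List[int]:
    """w[i] = 1 if locus i is heterozygous, so that maternal = paternal + w over F[2]."""
    return [0 if g[0] == g[1] else 1 for g in genotype]


def predetermined_p(genotype: List[str]) -> List[Optional[int]]:
    """p[i] is fixed at homozygous loci and left as None (unknown) elsewhere."""
    return [ALLELE_BIT[g[0]] if g[0] == g[1] else None for g in genotype]


# Hypothetical four-locus genotype: homozygous at the first and last locus.
genotype = ["11", "12", "12", "22"]
w = w_vector(genotype)          # [0, 1, 1, 0]
p = predetermined_p(genotype)   # [0, None, None, 1]
```

The entries left as None are the p-variables that the linear system still has to determine.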
Pedigree Graph
• A pedigree with genotypes
• #edges ≤ 2n
[Figure: a nine-member pedigree annotated with genotypes, next to its pedigree graph G on vertices 1 to 9.]
Locus Graph
• Locus graph G_i: G_i = (V, E_i), where E_i = {(k, j) | k is a parent of j, w_k[i] = 1}
• Example: locus graph for the 3rd locus
[Figure: (a) genotype information at the 3rd locus for members 1 to 9; (b) the resulting locus graph, with edges labeled h_{1,4}, h_{6,8}, h_{4,9} and h_{8,9} and the zero-weight edges marked.]
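The edge set E_i can be built directly from the definition. The sketch below assumes a pedigree stored as a child-to-(father, mother) dictionary and single-locus genotypes stored as two-character strings; both layouts and the function name are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def locus_graph_edges(
    parents: Dict[int, Tuple[int, int]],
    genotype_at_locus: Dict[int, str],
) -> Set[Tuple[int, int]]:
    """Build E_i = {(k, j) | k is a parent of j and w_k[i] = 1}."""
    edges = set()
    for child, (father, mother) in parents.items():
        for parent in (father, mother):
            g = genotype_at_locus[parent]
            if g[0] != g[1]:  # heterozygous parent, i.e. w_k[i] = 1
                edges.add((parent, child))
    return edges


# Hypothetical five-member pedigree: 1 and 2 are the parents of 3; 3 and 4 are the parents of 5.
parents = {3: (1, 2), 5: (3, 4)}
genotypes = {1: "12", 2: "11", 3: "12", 4: "22", 5: "12"}
edges = locus_graph_edges(parents, genotypes)  # {(1, 3), (3, 5)}
```

Only edges leaving a heterozygous parent survive, which is why members 2 and 4 contribute nothing in this toy pedigree.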
An Observation
• For any cycle, or any path connecting two pre-determined vertices, in a locus graph, the sum of the h-variables along the path is a constant.
(Proof sketch) Consider a path j0, j1, …, jk in the locus graph G_i that connects two pre-determined vertices j0 and jk. Along the path:
  p_{j0}[i] = p_{j1}[i] + d_{j0,j1} + h_{j0,j1}
  p_{j1}[i] = p_{j2}[i] + d_{j1,j2} + h_{j1,j2}
  p_{j2}[i] = p_{j3}[i] + d_{j2,j3} + h_{j2,j3}
  …
  p_{jk-1}[i] = p_{jk}[i] + d_{jk-1,jk} + h_{jk-1,jk}
Summing these equations over F[2] cancels the intermediate p-terms, leaving
  h_{j0,j1} + h_{j1,j2} + … + h_{jk-1,jk} = p_{j0}[i] + p_{jk}[i] + d_{j0,j1} + d_{j1,j2} + … + d_{jk-1,jk},
a constant.
We can use paths to denote constraints!
Examples of Linear Constraints
[Figure: three locus graphs on members 1 to 9, each with the linear constraint it yields.]
(a) 1st locus graph: h_{6,8} + h_{8,9} = 1
(b) 2nd locus graph: h_{3,5} + h_{3,6} + h_{2,5} + h_{2,6} = 0
(c) 3rd locus graph: h_{4,9} + h_{2,4} + h_{2,5} + h_{3,5} + h_{3,6} + h_{6,8} = 0
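A sketch of how one path turns into one constraint on h-variables, following the proof sketch of the observation: summing the per-edge equations over F[2] leaves the h-variables on one side and known values on the other. The function name is hypothetical, and the endpoint values and d-terms in the example are made up for illustration rather than read off the slides.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def path_constraint(
    path: List[int],
    p_start: int,
    p_end: int,
    d: Dict[Edge, int],
) -> Tuple[List[Edge], int]:
    """Sum p_a[i] = p_b[i] + d_{a,b} + h_{a,b} along the path over F[2].

    The intermediate p-terms cancel, leaving
    sum of h over the path = p_start + p_end + sum of d over the path (mod 2).
    """
    edges = list(zip(path, path[1:]))
    rhs = (p_start + p_end + sum(d[e] for e in edges)) % 2
    return edges, rhs


# Hypothetical values: endpoints 6 and 9 are pre-determined, all d-terms are 0.
edges, rhs = path_constraint([6, 8, 9], p_start=0, p_end=1, d={(6, 8): 0, (8, 9): 0})
# edges == [(6, 8), (8, 9)] and rhs == 1, i.e. h_{6,8} + h_{8,9} = 1
```

With these assumed inputs the result has the same shape as the constraint shown for the 1st locus graph.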
Linear Constraints
• Obviously, the linear constraints are necessary. We can also show that these constraints are sufficient.
• Moreover, we can upper bound #constraints in each locus graph by O(n), while the trivial analysis gives an upper bound of O(n^2).
• Total #constraints = O(mn).
The ZRHC-PHASE algorithm
Traditional method
• Solve h-variables and p-variables together
• O(mn) equations on O(mn) unknowns: O(mn) p-variables and O(n) h-variables.
Our method
• Solve h-variables and p-variables separately
• O(mn) linear equations on O(n) h-variables.
Algorithm ZRHC_PHASE
input: a pedigree G = (V, E) and genotypes {g_j}
output: a general solution of {p_j}
begin
  Step 1. Preprocessing
  Step 2. Linear constraint generation on h-variables
  Step 3. Solve h-variables by Gaussian elimination
  Step 4. Solve the p-variables by propagation from pre-determined p-variables to others.
end
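Step 3 relies on Gaussian elimination over F[2]. The following is a generic sketch of that technique, not the authors' implementation: each constraint is assumed to be stored as a bitmask over the h-variables plus a right-hand side bit, and the function returns one particular solution (free variables set to 0) rather than the general solution the algorithm outputs.

```python
from typing import List, Optional, Tuple


def solve_gf2(equations: List[Tuple[int, int]], num_vars: int) -> Optional[List[int]]:
    """Gaussian elimination over F[2].

    Each equation is (mask, rhs): bit v of mask is set if variable v occurs.
    Returns one solution with free variables set to 0, or None if inconsistent.
    """
    rows = list(equations)
    pivots = []  # (row index, variable) pairs
    r = 0
    for v in range(num_vars):
        # Find a row at or below r that contains variable v.
        pivot = next((k for k in range(r, len(rows)) if (rows[k][0] >> v) & 1), None)
        if pivot is None:
            continue  # v is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(len(rows)):
            if k != r and (rows[k][0] >> v) & 1:
                rows[k] = (rows[k][0] ^ rows[r][0], rows[k][1] ^ rows[r][1])
        pivots.append((r, v))
        r += 1
    # A row reduced to 0 = 1 means the constraints contradict each other.
    if any(mask == 0 and rhs == 1 for mask, rhs in rows):
        return None
    solution = [0] * num_vars
    for row, v in pivots:
        solution[v] = rows[row][1]
    return solution


# Hypothetical constraints on h0, h1, h2: h0 + h1 = 1, h1 + h2 = 0, h0 + h2 = 1.
solution = solve_gf2([(0b011, 1), (0b110, 0), (0b101, 1)], 3)  # [1, 0, 0]
```

The third equation in the example is the sum of the first two, so it reduces to 0 = 0 during elimination; that kind of redundancy is what the improved algorithm removes before solving.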
Redundant Equation Elimination
• An observation (key lemma)
• Given a cycle j0, j1, …, jk, assume that there are constraints between each pair of vertices.
• Originally, there are O(k^2) constraints. Notice that they are not independent.
• However, we can replace the original constraints by an equivalent set of constraints of size O(k).
[Figure: the cycle j0, j1, j2, …, jk-2, jk-1, jk with constraints j0~j2, j2~jk-1 and j0~jk-1.]
Remove the redundant equations without solving them!
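A rank check that illustrates why pairwise constraints are far from independent. It uses a simplified setting that is an assumption of this sketch: the vertices j0, …, jk lie along a path, the constraint between two vertices is the sum of the h-variables on the edges between them, and only the left-hand sides are compared. The helper name and the value of k are hypothetical.

```python
from itertools import combinations
from typing import Dict, List


def rank_gf2(masks: List[int]) -> int:
    """Rank over F[2] of bitmask rows, via an XOR basis keyed by leading bit."""
    basis: Dict[int, int] = {}  # leading bit position -> basis row
    for row in masks:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]
    return len(basis)


k = 6  # hypothetical number of edges on the path j0, j1, ..., jk
# The constraint between ja and jb (a < b) involves the h-variables of edges a..b-1.
pairwise = [(1 << b) - (1 << a) for a, b in combinations(range(k + 1), 2)]
num_constraints = len(pairwise)   # 21 pairwise constraints
independent = rank_gf2(pairwise)  # only 6 of them are independent
```

The quadratic family collapses to k independent constraints, here the ones between consecutive vertices, which matches the O(k^2) to O(k) reduction stated above.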
Redundant Equation Elimination
• Given a spanning tree, the stretch of an edge (k, j) is defined as the length of the unique path between k and j on the tree.
• Elkin, Emek, Spielman and Teng showed that any graph has a low-stretch spanning tree with average stretch O(log^2 n log log n).
• The number of irredundant constraints can be bounded by the sum of cycle lengths, which is further bounded by the sum of stretches, O(n log^2 n log log n).
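The stretch definition can be made concrete with a breadth-first search on the tree. This sketch only measures stretch for a spanning tree that is already given; it does not construct a low-stretch tree, and the adjacency-list layout and function names are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Tuple


def tree_distance(tree: Dict[int, List[int]], source: int, target: int) -> int:
    """Length of the unique tree path between source and target, found by BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in tree[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("target is not reachable in the tree")


def total_stretch(tree: Dict[int, List[int]], edges: List[Tuple[int, int]]) -> int:
    """Sum of stretches of the given graph edges with respect to the spanning tree."""
    return sum(tree_distance(tree, k, j) for k, j in edges)


# Hypothetical graph: a 4-cycle 0-1-2-3-0 with the spanning path 0-1-2-3 as the tree.
tree = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
graph_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
stretches = [tree_distance(tree, k, j) for k, j in graph_edges]  # [1, 1, 1, 3]
```

Tree edges always have stretch 1, and the single non-tree edge (3, 0) has stretch 3: the tree path it closes into a cycle.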
Conclusion
• We present an efficient algorithm for ZRHC with running time O(mn^2 + n^3 log^2 n log log n).
• It remains open whether the time complexity for ZRHC on general pedigrees can be improved to O(mn^2 + n^3) or lower.
• Another open question is how to use the algorithm to find haplotype configurations on pedigrees that require only a small (constant) number of recombinants.
Thanks for your time and attention!