Genotype and haplotypes (figure): genotype 021200210, explained by the two haplotypes 011000110 and 001100010 (two haplotypes per individual; 0 and 1 mark homozygous loci, 2 marks a heterozygous locus). SNP alleles are shown in upper case:
… ataggtccCtatttcgcgcCgtatacacgggActata …
… ataggtccGtatttcgcgcCgtatacacgggTctata …
… ataggtccCtatttcgcgcCgtatacacgggTctata …

HMM structure:
• Left-to-right HMM similar to the models proposed by [Schwartz 04, Rastas et al. 05, Kimmel & Shamir 05]
• Determined by the number n of SNP loci and a user-specified number K of “founder” states at each SNP (K = 7 in our experiments)
• Each state may emit either allele, but training usually introduces a strong bias towards one of them
• Paths with high transition probability correspond to “founder” haplotypes; transition probabilities capture observed (founder-specific) recombination rates

Efficient likelihood computations:
• A trained HMM M emits haplotypes along left-to-right paths
• P(H|M) = sum, over all possible HMM paths, of the joint probability that M follows the path and emits H; efficiently computed in O(nK) time using the forward algorithm
• P(G|M) = probability with which M emits any two haplotypes that explain G along any pair of paths; efficiently computed in O(nK^3) time by a 2-path extension of the forward algorithm combined with the speed-up idea of [Rastas07]
• A similar speed-up allows the likelihood of genotype trios to be computed in O(nK^5) time
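The two likelihoods above can be illustrated with a minimal sketch in plain Python. Everything named here is an assumption made for illustration, not the authors' implementation: the function names, the table layout (`init[k]`, `trans[j][a][b]`, `emit[j][k][allele]`), the dense K×K transition table between consecutive loci, and the made-up toy parameters. The genotype code follows the example above (0 and 1 homozygous, 2 heterozygous), and P(G|M) is taken over ordered pairs of haplotypes. In this sketch each genotype step advances the two paths one after the other, which costs O(K^3) per locus; whether that factorisation is the same as the [Rastas07] speed-up is an assumption.

```python
from itertools import product

def genotype_of(h1, h2):
    """Combine two haplotypes into a genotype: 0/1 if they agree, 2 if they differ."""
    return [a if a == b else 2 for a, b in zip(h1, h2)]

def haplotype_likelihood(h, init, trans, emit):
    """P(H|M) by the forward algorithm over the K states of each locus."""
    K = len(init)
    f = [init[k] * emit[0][k][h[0]] for k in range(K)]
    for j in range(1, len(h)):
        T = trans[j - 1]
        f = [sum(f[a] * T[a][b] for a in range(K)) * emit[j][b][h[j]]
             for b in range(K)]
    return sum(f)

def pair_emission(e, g):
    """K x K table: probability that states (a, b) jointly emit genotype value g."""
    K = len(e)
    if g == 2:  # heterozygous: the two paths emit different alleles, either order
        return [[e[a][0] * e[b][1] + e[a][1] * e[b][0] for b in range(K)]
                for a in range(K)]
    return [[e[a][g] * e[b][g] for b in range(K)] for a in range(K)]

def genotype_likelihood(g, init, trans, emit):
    """P(G|M) by a 2-path forward pass over pairs of states."""
    K = len(init)
    E = pair_emission(emit[0], g[0])
    F = [[init[a] * init[b] * E[a][b] for b in range(K)] for a in range(K)]
    for j in range(1, len(g)):
        T = trans[j - 1]
        # Advance the first path only: M[b1][a2] = sum_a1 F[a1][a2] * T[a1][b1]
        M = [[sum(F[a1][a2] * T[a1][b1] for a1 in range(K)) for a2 in range(K)]
             for b1 in range(K)]
        # Advance the second path, then apply the pair emission at locus j
        E = pair_emission(emit[j], g[j])
        F = [[sum(M[b1][a2] * T[a2][b2] for a2 in range(K)) * E[b1][b2]
              for b2 in range(K)] for b1 in range(K)]
    return sum(map(sum, F))

# Toy model with made-up numbers: n = 3 loci, K = 2 founder states.
init = [0.6, 0.4]
trans = [[[0.9, 0.1], [0.2, 0.8]],
         [[0.7, 0.3], [0.4, 0.6]]]
emit = [[[0.9, 0.1], [0.2, 0.8]],
        [[0.8, 0.2], [0.1, 0.9]],
        [[0.7, 0.3], [0.3, 0.7]]]

if __name__ == "__main__":
    haps = list(product((0, 1), repeat=3))
    # The haplotype probabilities should sum to 1 up to rounding.
    print(sum(haplotype_likelihood(h, init, trans, emit) for h in haps))
    print(genotype_likelihood([0, 2, 1], init, trans, emit))
```

On a model this small the result can be checked by brute force: for any genotype G, `genotype_likelihood` should equal the sum of P(H1|M)·P(H2|M) over all ordered haplotype pairs whose `genotype_of` is G.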