Pair HMM and the Stepping Stone algorithm. Mani Right Now. Our Pair HMM state diagram. Viterbi algorithm. Time complexity O(mns²), space complexity O(mns), where m is the length of the genomic sequence, n is the length of the cDNA sequence, and s is the number of states (~13 now).
Pair HMM and the Stepping Stone algorithm Mani Right Now
Viterbi algorithm
• Time complexity O(mns²), space complexity O(mns)
• m – length of genomic sequence
• n – length of cDNA sequence
• s – number of states (~13 now)
• Suffice to say that it could be a HUGE number!
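To put a rough number on "HUGE", the two bounds can be evaluated directly. The sketch below is only arithmetic on the stated complexities. The function name `viterbi_cost` and the example lengths (a 100,000-base genomic region and a 2,000-base cDNA) are illustrative assumptions, not values from the talk; only s = 13 comes from the slide.

```python
def viterbi_cost(m, n, s):
    """Return (transition_evaluations, stored_cells) for a full Viterbi fill.

    m: length of the genomic sequence
    n: length of the cDNA sequence
    s: number of pair-HMM states
    """
    cells = m * n * s      # O(mns): one score per (genomic pos, cDNA pos, state)
    steps = cells * s      # O(mns^2): each cell looks at up to s predecessor states
    return steps, cells


# Hypothetical sizes chosen for illustration; s = 13 is the slide's state count.
steps, cells = viterbi_cost(m=100_000, n=2_000, s=13)
print(f"{steps:.2e} transition evaluations, {cells:.2e} stored cells")
```

With these assumed lengths the full matrix already holds billions of cells, which is the cost the stepping stone approach tries to cut.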
Viterbi Matrix (figure; axes: genomic sequence, cDNA sequence)
Stepping Stone – Seed Alignments (figure; axes: genomic sequence, cDNA sequence)
Stepping Stone – Alignment Pins (figure; axes: genomic sequence, cDNA sequence)
Viterbi Submatrices (figure; axes: genomic sequence, cDNA sequence)
Savings: approx. 50%
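One way to see where a saving of this size can come from is to count matrix cells. The sketch below assumes that each pin is a (genomic position, cDNA position) pair the alignment must pass through, and that Viterbi is then run only inside the rectangles between consecutive pins. The slides do not spell out how the submatrices are built, so this geometry, the function names, and the example pins are all assumptions made for illustration.

```python
def submatrix_cells(m, n, pins):
    """Count cells filled when Viterbi is restricted to the rectangles
    between consecutive pins (assumed geometry, see the text above).

    pins: iterable of (genomic_pos, cdna_pos) pairs, hypothetical input.
    """
    corners = [(0, 0)] + sorted(pins) + [(m, n)]
    total = 0
    for (i0, j0), (i1, j1) in zip(corners, corners[1:]):
        if j1 < j0:
            # Pins must increase in both coordinates to form a valid path.
            raise ValueError("pins are not colinear")
        total += (i1 - i0) * (j1 - j0)
    return total


def savings(m, n, pins):
    """Fraction of the full m-by-n matrix that is never filled."""
    return 1.0 - submatrix_cells(m, n, pins) / (m * n)


# Hypothetical example: a single pin in the middle of the matrix.
print(savings(1000, 400, [(500, 200)]))
```

Under this model a single centered pin skips half of the cells and more pins skip more. The number of states s multiplies both the full and the restricted cost, so it cancels out of the ratio.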
MGC test set
• 10634 optimal alignments
• 18000+ stepping stone alignments*
• Compared the 10634
• Only 15 were different (0.14%!)
* Still running
Diff1

BC043644.12665.  474 TAGTAGAGGCGGGGTTTCTCCATGTTGGTCAGGCTGGTCTCGAAATCCCG 523
                     |||||||| | ||||||| ||||||||| ||||||||||| ||| ||| |
BC043644        1556 TAGTAGAGACAGGGTTTCACCATGTTGGCCAGGCTGGTCTTGAACTCCTG 1605
BC043644.12665.  524 ACCTCAGGTGATCTGCCCACCTCAGCCTCCCAAAGTGCTGGGATT 568
                     |||||||||||||  |||||||  || ||||| ||||||||||||
BC043644        1606 ACCTCAGGTGATCCACCCACCTGGGCTTCCCATAGTGCTGGGATT 1650

BC043644.12665.  474 TAGTAGAGGCGGGGTTTCTCCATGTTGGTCAGGCTGGTCTCGAAATCCCG 523
                     |||||||| | ||||||| ||||||||| ||||||||||| ||| ||| |
BC043644        1556 TAGTAGAGACAGGGTTTCACCATGTTGGCCAGGCTGGTCTTGAACTCCTG 1605
BC043644.12665.  524 ACCTCAGGTGATCTGCCCA------------------------------- 542
                     |||||||||||||  ||||
BC043644        1606 ACCTCAGGTGATCCACCCACCTGGGCTTCCCATAGTGCTGGGATTCAATT 1655
--//--
BC043644.12665.  543 ----------------------------------------------CCTC 546
                                                                   ||||
BC043644        3506 GTTGGCCAGGCTGGTCTCGAACTCCTGACATCAGGTGATCCACCTGCCTC 3555
BC043644.12665.  547 AGCCTCCCAAAGTGCTGGGATTAGAGGCGTGAACCAC 583
                      |||||||||||||||||||||| |||||||| ||||
BC043644        3556 GGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCAC 3592
Diff2

BC011727.13323. 18583 GGACTGATGAGGTCTTAACAAAAACCAGTGTGGCAAAAAAAAAAAAAAAA 18632
                      ||||||||||||||||||||||||||||||||||||||||||||||||||
BC011727         1857 GGACTGATGAGGTCTTAACAAAAACCAGTGTGGCAAAAAAAAAAAAAAAA 1906
BC011727.13323. 18633 AAAAAAAAAAAAA 18645
                      |||||||||||||
BC011727         1907 AAAAAAAAAAAAA 1919

Score = 39 (77.8 bits), Expect = 0., Sum P(8) = 0., Group = 1
Identities = 39/39 (100%), Positives = 39/39 (100%), Strand = Plus / Plus
Query:  1917 AAAAAAAAAAAAAAAAAAAAAAAAAAAAATCCTAAAAAC 1955
             |||||||||||||||||||||||||||||||||||||||
Sbjct: 18617 AAAAAAAAAAAAAAAAAAAAAAAAAAAAATCCTAAAAAC 18655

BC011727.13323. 18583 GGACTGATGAGGTCTTAACAAAAACCAGTGTGGCAAAAAAAAAAAAAAAA 18632
                      ||||||||||||||||||||||||||||||||||||||||||||||||||
BC011727         1857 GGACTGATGAGGTCTTAACAAAAACCAGTGTGGCAAAAAAAAAAAAAAAA 1906
BC011727.13323. 18633 AAAAAAAAAAAAATCCTAAAAACAAACAAACAAAAAAAA 18671
                      |||||||||||||    ||||| ||| ||| ||||||||
BC011727         1907 AAAAAAAAAAAAA----AAAAAAAAAAAAAAAAAAAAAA 1941
Placeholder (figure; axes: genomic sequence, cDNA sequence)
Now what?
• Compare with EST_GENOME
• Null model
• Use pairHMM for the ENCODE gene prediction workshop
• Double pins