1 / 36

Improved hit criteria for DNA local alignment

Improved hit criteria for DNA local alignment. JOBIM 2004 Montréal - June 28th Laurent Noé, Gregory Kucherov LORIA, Nancy France. Plan. Introduction Local alignment Heuristic methods Hit criteria Seed Models and extension proposed Single/Multiple hit strategies and extension proposed

illana-wong
Download Presentation

Improved hit criteria for DNA local alignment

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Improved hit criteria for DNA local alignment JOBIM 2004 Montréal - June 28th Laurent Noé, Gregory Kucherov LORIA, Nancy France

  2. Plan • Introduction • Local alignment • Heuristic methods • Hit criteria • Seed Models and extension proposed • Single/Multiple hit strategies and extension proposed • Experiments • Conclusion • Extensions

  3. In practice Local alignment methods • Why being interested in local alignment methods • Improvement needed #sequences  , #users  , ( budget  ) • Dynamic programming (Smith-Waterman) • Give an exact solution • Quadratic cost (Best optimization in [Crochemore et al 02]) • Heuristic Algorithms • Fasta, Blast, PatternHunter, Blastz, Yass,…

  4. ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct Detected seeds Dot plot aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt Detected alignment Seed filtering • Start with small conserved and easily detected fragments (seeds). • Then extend seeds to build possible alignments

  5. ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct Detected seeds Dot plot → 2. Hit criterion aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt Detected alignment Two questions usually asked • seed model: What can serve as a seed? • hit criterion: What is the criterion that witnesses a potential alignment? → 1. Seed model

  6. What can serve as a seed • Exact similarity : Seed Pattern : Contiguous Seed • Example : ATCAGT |||||| ATCAGT ###### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA

  7. Spaced Seed Model [Ma et al. 02: PATTERNHUNTER] • Seed Pattern : ###--#-## ‘#’ : obligatory match position ‘-’ : joker position (“don’t care” position) Weight : 6 [number of #] Span : 9 [number of all symbols] • Example : ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###--#-## ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###--#-## ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###--#-## ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###--#-## ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###--#-## ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###--#-## ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA

  8. ||||||||||||||||| ###### ###### ||||||||||||||||| ###### ###### ||||||||||||||||| ###--#-## ###--#-## ||||||||||||||||| ###--#-## ###--#-## Spaced Seeds Some probabilistic observations: • For spaced seeds, hits at subsequent positions are more independent events • For contiguous vs spaced seeds of the same weight, the expected number of hits is (basically) the same but the probabilities of having at least one hit are very different

  9. ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ATCAGTGCAATGCTCAAGA ||||||||||||||||||| ATCAGTGCAATGCTCAAGA ATCAGTGCAATGCTCAAGA ||||||||||||||||||| ATCAGTGCAATGCTCAAGA ATCAGTGCAATGCTCAAGA |||||.|||||||:||||| ATCAGCGCAATGCGCAAGA ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ATCAGTGCAATGCTCAAGA ||||||||||||||||||| ATCAGTGCAATGCTCAAGA ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ATCAGTGCAATGCTCAAGA |||||.||||||||||||| ATCAGCGCAATGCTCAAGA ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ATCAGTGCAATGCTCAAGA |||||.||||||||||||| ATCAGCGCAATGCTCAAGA ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ATCAGTGCAATGCTCAAGA |||||.||||||||||||| ATCAGCGCAATGCTCAAGA ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ATCAGTGCAATGCTCAAGA |||||.||||||||||||| ATCAGCGCAATGCTCAAGA ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ATCAGTGCAATGCTCAAGA |||||.|||||||:||||| ATCAGCGCAATGCGCAAGA ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ATCAGTGCAATGCTCAAGA |||||.|||||||:||||| ATCAGCGCAATGCGCAAGA ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ###### ATCAGTGCAATGCTCAAGA |||||.|||||||:||||| ATCAGCGCAATGCGCAAGA ATCAGTGCAATGCTCAAGA |||||.|||||||:||||| ATCAGCGCAATGCGCAAGA ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ATCAGTGCAATGCTCAAGA |||||.|||||||:||||| ATCAGCGCAATGCGCAAGA ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ATCAGTGCAATGCTCAAGA |||||.||||||||||||| ATCAGCGCAATGCTCAAGA ATCAGTGCAATGCTCAAGA |||||.|||||||:||||| ATCAGCGCAATGCGCAAGA ATCAGTGCAATGCTCAAGA |||||.||||||||||||| ATCAGCGCAATGCTCAAGA ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ATCAGTGCAATGCTCAAGA ||||||||||||||||||| ATCAGTGCAATGCTCAAGA ATCAGTGCAATGCTCAAGA |||||.||||||||||||| ATCAGCGCAATGCTCAAGA ATCAGTGCAATGCTCAAGA |||||.|||||||:||||| ATCAGCGCAATGCGCAAGA ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ATCAGTGCAATGCTCAAGA ||||||||||||||||||| ATCAGTGCAATGCTCAAGA ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ATCAGTGCAATGCTCAAGA ||||||||||||||||||| ATCAGTGCAATGCTCAAGA ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ATCAGTGCAATGCTCAAGA |||||.||||||||||||| ATCAGCGCAATGCTCAAGA ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## ###--#-## For contiguous vs spaced seeds of the same weight, the expected number of hits is (basically) the same but the probabilities of having at least one hit are very different Some probabilistic observations:

  10. Spaced seeds • Spaced seed model is generally more sensitive than the contiguous seed model • Extend spaced seed model by taking into account DNA substitutionsspecificity

  11. A T G C : transversions ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA . transitions Mutational events Biological properties Transitions are usually over-represented. Regularity phenomenon in coding sequences.  Use those properties to extend the spaced seed model ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA

  12. BLASTZ model[Schwartz et al. 03] • A spaced seed that allows one possible transition substitution over its ‘#’ positions. • Problem : running time seed of large weight to obtain reasonable speed. ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA |||.|||:|||.|||||.||:||||||:||.|||| ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA ###-#--##--#-#--#--## ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA |||.|||:|||.|||||.||:||||||:||.|||| ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA ###-#--##--#-#--#--## ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA |||.|||:|||.|||||.||:||||||:||.|||| ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA ###-#--##--#-#--#--## ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA |||.|||:|||.|||||.||:||||||:||.|||| ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA ###-#--##--#-#--#--## ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA |||.|||:|||.|||||.||:||||||:||.|||| ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA ###-#--##--#-#--#--## ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA |||.|||:|||.|||||.||:||||||:||.|||| ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA ###-#--##--#-#--#--## ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA |||.|||:|||.|||||.||:||||||:||.|||| ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA ###-#--##--#-#--#--## ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA |||.|||:|||.|||||.||:||||||:||.|||| ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA ###-#--##--#-#--#--## ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA |||.|||:|||.|||||.||:||||||:||.|||| ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA ###-#--##--#-#--#--## ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA |||.|||:|||.|||||.||:||||||:||.|||| ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA

  13. YASS model: Transition Constrained Seeds • Seed Pattern: ##@#-#@-### ‘#’ : obligatory match position ‘-’ : joker position (“don’t care” position) ‘@’ : transition constrained position transition constrained position: position that corresponds to either a match or a transition. ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ##@#-#@-### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ##@#-#@-### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ##@#-#@-### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ##@#-#@-### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ##@#-#@-### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA ##@#-#@-### ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA

  14. Transition Constrained Seeds • Seed Pattern: ##@#-#@-### ‘#’ : obligatory match position ‘-’ : joker position (“don’t care” position) ‘@’ : transitions constrained position Weight : 8 [number of # + half number of @] @ carries 1 bit of information whereas # carries 2 bits. @ adapted to GC-rich/poor genomes

  15. Spaced seeds and Transition-Constrained Seeds • Seed pattern ( why##@#-#@-###and not#@-#-#-#@#?) • Not chosen randomly → Need to: • define an alignment model. • search for the best (at least a good) seed pattern according to this model. ( Sensitivity : probability to detect any alignment given by the model ) • Chosen model can drastically change the seed shape… Example Bernoulli model ##@-#@#--#-#-### Markov model ##@##-##@##

  16. 222221221222 X Probabilistic Alignment Models: ATCAGTGCAATGCTCAAGA |||||.||.||||:||||| ATCAGCGCGATGCGCAAGA |||||.||.||||:||||| |||||.||.||||:||||| 2222212212222022222 2222212212222022222 • Bernoulli [Keich et al 02] • Markov [Buhler et al 03] • Automata (M3/M8) and HMMs [Brejova et al 03] • Homogeneous alignments [Kucherov et al 04] P(’2’) = 0.7, P(’1’) = 0.15, P(’0’) = 0.15 Transition has anemission probability for each symbol Ex : P(’2’) = 0.8, P(’1’) = 0.10, P(’0’) = 0.10 “HSP” Alignments found by heuristic algorithms

  17. Seed Design • Alignment Model : Bernoulli • P(match) = 0.7, P(transition)=0.15, P(transversion)=0.15 • alignment length = 64

  18. Seed Design • Alignment Model : Markov • 5th Order, obtained on N.Menengitidis, S.Cerevisiae, Drosophila, and Human sequences.

  19. Experiments • S.Cerevisiae/Neisseiria sequences

  20. To summarize ... • We have presented several seed models (contiguous, “classic” spaced seeds, BLASTZ) • We introduced transition-constrained seeds and showed how they improve the sensitivity • From detected seeds to detected alignments

  21. ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct Detected seeds Dot plot → 2. Hit criterion aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt Detected alignment Hit criterion • What is the criterion that witnesses a potential alignment ? • Restriction : only the information about seeds is available

  22. ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct Dot plot Dot plot Dot plot Dot plot aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt Several methods have been proposed • FASTA: • Several small seeds on proximal diagonals • BLAST: (single hit) • One “large” seed. • Gapped-BLAST: (double hit) • Two seeds on the same diagonal To define a good criterion we have first to define a class of similarities we want to detect : mutation model

  23. ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt Mutation effect on Seeds • Mutation effect • Substitutions : “suppressing seeds” • Indels : “diagonal shifts” • Remaining seeds • Estimation of inter-seeddistances • via a Waiting Time distribution • Estimation of diagonals shifts • via a Random Walk model

  24. |:|||||||:|||:||| ###### ###### |:||||:|||||:|.|. ###--#-## ###--#-## 9 7 YASS hit criterion • According to these parameters, YASS propose: • An intermediate criterion between BLAST single/Gapped Blast double hit criterion. • Overlap controlled multi-hits

  25. Sensitivity ComparisonofBLASTn/Gapped-BLAST/YASShit criteria score 25

  26. Sensitivity (cont) ComparisonofBLASTn/Gapped-BLAST/YASShit criteria score 35

  27. YASS criterion mixed with spaced seeds

  28. Experiments • Local alignment sensitivity • YASS software / BLASTn (2.2.6 package) M.t: M. tuberculosis CDC1551 S.s: Synechocystis sp. PCC 6803 V.p: Vibrio p. RIMD 2210633 I Y.p: Yersinia pestis KIM

  29. Ads

  30. Ads • YASS web page http://www.loria.fr/projects/YASS • YASS can be queried online http://yass.loria.fr • YASS is Open Source

  31. Conclusions • Two improvements: • Transition-constrained spaced seeds • Hit criterion combining statistical models and advantage of single/multi hit strategies. • A tool that implements both of them

  32. Extensions To be done • Multi-seed approach [Li03, Bulher04, Noe04] • Seed design on the fly (non necessary static seeds). • and others …

  33. ? ? ? Questions

  34. |95550 |95540 |95530 |95520 |95510 |95500 |95490 |95480 CAAGTTTATTTCTGTAGAGAGTGTAGAAGACAGTTCGATTTTAGCCTTTTCAGCGGCTTCTCTTATTCTTTGGACAGCC ||.|||:|||||:||::.::|::..::.||||.|||||||||.||.|||||.||.||||||.|.|::|:|||:|:|||| CAGGTTAATTTCGGTTTGCTGGCCGCTGGACAATTCGATTTTGGCTTTTTCGGCAGCTTCTTTCAGGCGTTGTAGAGCC |583630 |583640 |583650 |583660 |583670 |583680 |583690 |583700 |95470 |95460 |95450 |95440 |95430 |95420 |95410 |95400 ATACGGTCATTACTCAAATCGATACCGGTTTCTTTCTTGAAATGAGAAATAATTTCTTGCAACAAATAAATGTCAAAAT ||:::|||:|::.|||||||.||.||:::||||||.|||||:|:.|:.||.||:|::|::|::|....:::|||.||.| ATCACGTCTTGTTTCAAATCAATGCCTTGTTCTTTTTTGAACTCGGCGATGATGTGGTCGATGAGGCGTTGGTCGAAGT |583710 |583720 |583730 |583740 |583750 |583760 |583770 |95390 |95380 |95370 |95360 |95350 |95340 CTTCGCCACCCAAATGGGTGTCACCATTGGTAGATTTAACCTCAAA------GATACC------GTTATCGATGTCCAG ||||.||.|||||.:.|||.||.||.|||||:|:.:::||.||.|| |:..|| |||.:||||:||:|: CTTCACCGCCCAAGAAGGTATCGCCGTTGGTTGCCAATACTTCGAATTGTTTGTCGCCGTCGAGGTTGGCGATTTCGAT |583790 |583800 |583810 |583820 |583830 |583840 |583850 |95320 |95310 |95300 |95290 |95280 |95270 |95260 GATTGAAATATCGAAAGTACCACCGCCCAAGTCGAAAACAGCAAT------GACTTTTGGCTCTGATTTATCTAGACCG |||:|||||||||||||||||.|||||||||||.:|:||.||:|. |:||||::::||:::|||.||.|:|||| GATGGAAATATCGAAAGTACCGCCGCCCAAGTCATATACGGCTACTTTGCGGTCTTTGTTGTCGCCTTTGTCCATACCG |583870 |583880 |583890 |583900 |583910 |583920 |583930 |95250 |95240 |95230 |95220 |95210 |95200 |95190 TAAGCTAGGGCAGCAGCTGTTGGTTCGTTGACAACACGTAATACATTAAGCCCAATAATTTGTCCTGCGTCTTTAGTAG :|:||.|..||.||:||:||.||.|||||||..|..|||::.||.|.:|.:||....||:.|:|||.|||||||.||.| AATGCCAAAGCGGCTGCGGTCGGCTCGTTGATGATGCGTTTCACGTCCAAACCGGCGATACGGCCTACGTCTTTGGTGG |583950 |583960 |583970 |583980 |583990 |584000 |584010 |95170 |95160 |95150 |95140 |95130 |95120 |95110 CTTGTCTTTGGGCATCATTGAAGTAAGCAGGAACGGTGACAACAGCATTTTTGACGCTCTTCGCTAAGTAAGCCTCCGC ||||:|:||||:..||.||||||||.|||||.|||||.|.:||.||:|.::|:||:.|.|.::|.||||||||.||:|| CTTGACGTTGGCTGTCGTTGAAGTAGGCAGGGACGGTAATCACGGCTTCGGTTACTTTTTCGCCCAAGTAAGCTTCGGC |584030 |584040 |584050 |584060 |584070 |584080 |584090 |95090 |95080 |95070 |95060 |95050 |95040 |95030 TGTTTCCTTCATTTTATTTAAGATAAAACCTCCTATTTGGGCGGGGGAGTACGTTCTGTTTCTAGCCTCTACCCAGGCA :|.|||.|||||||||.:.|.||.:::::|::::|||||.|:.||.||::.|:.|.||..|.::||.|.||||||:||. GGCTTCTTTCATTTTACGCAGGACTTCTGCGGAAATTTGAGGAGGAGACAGCTCTTTGCCTTGTGCTTTTACCCATGCG |584110 |584120 |584130 |584140 |584150 |584160 |584170 |95010 |95000 |94990 |94980 |94970 |94960 |94950 TCTCCATTAGAATGCTTGACGATTTTGAAAGGAACCTGATTAATATCTCTTTGGACTTCAGCGTCCTCGAAACGGCGGC ||:||.||.::.::.||||.|||||.||||||:|.::.:|..||.||:|:|||||||||::.|||.||.||:.:|.||| TCGCCGTTGTTGGCTTTGATGATTTCGAAAGGCATAGATTCGATGTCGCGTTGGACTTCTTTGTCTTCAAATTTGTGGC |584190 |584200 |584210 |584220 |584230 |584240 |584250 |94930 |94920 |94910 |94900 |94890 |94880 |94870 CGATTAAACGCTTAGTAGCAAACAAAGTGTTTTCTGAGTTTATGACGGATTGTCGTTTGGCTGGCTCACCAACTAAACG ||||.|||||.||.|..||.:|:|:||||||||.:|:|||:.|:||:|:|||:||||||||:|||:||||.||:|..:: CGATCAAACGTTTGGCGGCGTAAATAGTGTTTTTGGCGTTGGTTACCGCTTGGCGTTTGGCAGGCGCACCGACGAGGAT |584260 |584270 |584280 |584290 |584300 |584310 |584320 |584330 |94850 |94840 |94830 |94820 |94810 |94800 |94790 TTCTCCGTCTTTAGTGAAAGCCACTACAGACGGAGTAGTTCTTGAGCCTTCTGCATTTTCGATAATTCTCGGAACTTTA |||:|||.|:|.:.:.:||||:|.:||.|||||:||.||:|:||:|||||||||.||||||||:|.|.|:|:::::... TTCGCCGCCGTCCAAATAAGCGATAACGGACGGCGTGGTGCGTGCGCCTTCTGCGTTTTCGATCACTTTGGTTTGACCG |584340 |584350 |584360 |584370 |584380 |584390 |584400 |584410 |94780 |94770 |94760 |94750 |94740 CCTTCCATAATAGCTACCGCAGAATTGGTAGTACCTAAATCAATACCGATAAC ..|||:.:|||.||.|::::|||.|||||:||||||||.||.||||||||:|| TTTTCGGAAATGGCCAAACAAGAGTTGGTTGTACCTAAGTCGATACCGATTAC |584420 |584430 |584440 |584450 |584460 *(96264-94728)(582917-584471) Ev: 0 s: 1537/1555 r * S.cerevisiae.V (reverse complementary strand) / gi|12057208|(forward strand) * score = 1073 : bitscore = 491.92 * mutations per triplet 347, 108, 152 (1.79e-36) | ts : 272 tv : 335 |96260 |96250 |96240 |96230 |96220 |96210 |96200 |96190 TTCCGCTTCATTAACCATTCGATCAATCTCCGTATCAGATAGCCCAGACGCTCCGGCAACAGTGATGGAAGAGTCTTTG |||:||:||:||:|||||:||:||.||.||.:.:||.::.|.:||:||:|::||:::.|..||||||::.|:::||||. TTCGGCATCTTTCACCATGCGTTCGATTTCTTCTTCGCTCAAACCTGAAGAACCTTGGATGGTGATGTTGGCTGCTTTA |582920 |582930 |582940 |582950 |582960 |582970 |582980 |96180 |96170 |96160 |96150 |96140 |96130 |96120 TGGCTGGCGAGATCTTTTGCTGAAACGTTGATGATGCCGTTCGCATCGATATCAAAAGTGACTTCAATTTGTGGGGTAC .:|:||:|:::.|||||:||:|||||||::|:|||||||||:||.|||||.||.||.||:|||||.|||||.||:.||| CCGGTGCCTTTGTCTTTGGCGGAAACGTGCAGGATGCCGTTGGCGTCGATGTCGAAGGTTACTTCGATTTGCGGCATAC |583000 |583010 |583020 |583030 |583040 |583050 |583060 |96100 |96090 |96080 |96070 |96060 |96050 |96040 CTTTTGGAGCTGGAGGAATGCCCGCAAGAGTAAAATTACCTATTAATTTGTTATCCTTGACTAACTCCCTCTCACCTTG |:.:.||:||:||:|:.|||.|::|:|..:|.||:|:|||.|::.|||||||.:|:::..|::..||:|:.||.||||| CGCGCGGTGCAGGTGCGATGTCGCCCAAGTTGAACTGACCCAAAGATTTGTTGGCAGAAGCGCGTTCGCGTTCGCCTTG |583080 |583090 |583100 |583110 |583120 |583130 |583140 |96020 |96010 |96000 |95990 |95980 |95970 |95960 GAAAACTTTAACTTCCACCGATGTTTGACCTGATGCCGCAGTTGAAAAAATTTGAGATTTCTTATTGGGAATTGTAGAA :|.:||:|:.|.::..||.|:::||||...:::|:|:||.||:||.||:|.|||:||.:..||.:|:||.||:||.|:. CAGTACGTGGATGGTTACTGCGCTTTGGTTGTCTTCGGCGGTAGAGAACACTTGCGACGCTTTGGTCGGGATGGTGGTG |583160 |583170 |583180 |583190 |583200 |583210 |583220 |95940 |95930 |95920 |95910 |95900 |95890 |95880 TTTCTTGGGATTAATTTTGTAAAAACTCCTCCTAAAGTTTCAATACCCAATGATAGGGGAGTGACATCTAGCAACAAAA ||..|.:|.||.|.|||:||:|::||:||:||.|:.|||||.||||||||:||.||.|||||:||.||.||.|.|||:| TTCTTCTGAATCAGTTTGGTCATCACGCCGCCCATGGTTTCGATACCCAAAGACAGAGGAGTTACGTCCAGTAGCAATA |583240 |583250 |583260 |583270 |583280 |583290 |583300 |95860 |95850 |95840 |95830 |95820 |95810 |95800 CATCGGTAACTTCACCAGACAAGACCGCAGCCTGTATAGCGGCCCCTAAAGCGACTGCTTCATCAGGGTTAACAGCTTT |.|||:|.:::.|.||.::|||:||.:|.:|.||:||:||:||:||||:.||.||:|||||.||||||||:||.:|||| CGTCGCTGCGGCCGCCGCTCAATACTTCGCCTTGGATCGCTGCGCCTACGGCAACGGCTTCGTCAGGGTTCACGTCTTT |583320 |583330 |583340 |583350 |583360 |583370 |583380 |95780 |95770 |95760 |95750 |95740 |95730 |95720 TGATGCATCCTTACCGAATAATTTCTTTACAGTATCTGCAACCTTGGGCATCCTTGACATACCACCAACTAATAAAACA ::..|::||.||.|||||:||::..||:||.|.:|||:::||.||:|||||:|::|||:::||.||.||.||:|::||. GCGCGGTTCTTTGCCGAAGAAGGCTTTAACGGCTTCTTGTACTTTCGGCATACGGGACTGCCCGCCGACCAAGATTACG |583400 |583410 |583420 |583430 |583440 |583450 |583460 |95710 |95700 |95690 |95680 |95670 |95660 |95650 |95640 TCCGATATATCTGAGGCGGTAATTCTTGCGTCTTTCAGTGCTTTTTTGACAGGATCAACCGTTCTATCAATCAATGGGG ||::::||.||:::||.|:|:|::|.:||.|||||||.|||::|||||::|||:||.|.:|::|:.:.|||||.::::: TCGTCGATGTCGCCGGTGCTCAAGCCGGCATCTTTCAATGCAATTTTGCAAGGTTCGATAGAGCGGGTAATCAGGTCTT |583470 |583480 |583490 |583500 |583510 |583520 |583530 |583540 |95630 |95620 |95610 |95600 |95590 |95580 |95570 |95560 CGGTTATATTCTCAAGCTGAACCCTAGAAAAGGGCATACGAATATGCTTTGGGCCTGCAGCATCAGCAGTTATGAAAGG |....|:..|.||.|..|:..|:|:.|:||::::|||::::|:.||.||.|||||:|.:||.||:...||:|||:|:|| CAACCAGGCTTTCGAATTTGGCGCGGGTAATTTTCATCGCCAAGTGTTTCGGGCCGGTTGCGTCCATGGTGATGTACGG |583550 |583560 |583570 |583580 |583590 |583600 |583610 |583620

  35. |95550 |95540 |95530 |95520 |95510 |95500 |95490 |95480 CAAGTTTATTTCTGTAGAGAGTGTAGAAGACAGTTCGATTTTAGCCTTTTCAGCGGCTTCTCTTATTCTTTGGACAGCC ||.|||:|||||:||::.::|::..::.||||.|||||||||.||.|||||.||.||||||.|.|::|:|||:|:|||| CAGGTTAATTTCGGTTTGCTGGCCGCTGGACAATTCGATTTTGGCTTTTTCGGCAGCTTCTTTCAGGCGTTGTAGAGCC |583630 |583640 |583650 |583660 |583670 |583680 |583690 |583700 |95470 |95460 |95450 |95440 |95430 |95420 |95410 |95400 ATACGGTCATTACTCAAATCGATACCGGTTTCTTTCTTGAAATGAGAAATAATTTCTTGCAACAAATAAATGTCAAAAT ||:::|||:|::.|||||||.||.||:::||||||.|||||:|:.|:.||.||:|::|::|::|....:::|||.||.| ATCACGTCTTGTTTCAAATCAATGCCTTGTTCTTTTTTGAACTCGGCGATGATGTGGTCGATGAGGCGTTGGTCGAAGT |583710 |583720 |583730 |583740 |583750 |583760 |583770 |95390 |95380 |95370 |95360 |95350 |95340 CTTCGCCACCCAAATGGGTGTCACCATTGGTAGATTTAACCTCAAA------GATACC------GTTATCGATGTCCAG ||||.||.|||||.:.|||.||.||.|||||:|:.:::||.||.|| |:..|| |||.:||||:||:|: CTTCACCGCCCAAGAAGGTATCGCCGTTGGTTGCCAATACTTCGAATTGTTTGTCGCCGTCGAGGTTGGCGATTTCGAT |583790 |583800 |583810 |583820 |583830 |583840 |583850 |95320 |95310 |95300 |95290 |95280 |95270 |95260 GATTGAAATATCGAAAGTACCACCGCCCAAGTCGAAAACAGCAAT------GACTTTTGGCTCTGATTTATCTAGACCG |||:|||||||||||||||||.|||||||||||.:|:||.||:|. |:||||::::||:::|||.||.|:|||| GATGGAAATATCGAAAGTACCGCCGCCCAAGTCATATACGGCTACTTTGCGGTCTTTGTTGTCGCCTTTGTCCATACCG |583870 |583880 |583890 |583900 |583910 |583920 |583930 |95250 |95240 |95230 |95220 |95210 |95200 |95190 TAAGCTAGGGCAGCAGCTGTTGGTTCGTTGACAACACGTAATACATTAAGCCCAATAATTTGTCCTGCGTCTTTAGTAG :|:||.|..||.||:||:||.||.|||||||..|..|||::.||.|.:|.:||....||:.|:|||.|||||||.||.| AATGCCAAAGCGGCTGCGGTCGGCTCGTTGATGATGCGTTTCACGTCCAAACCGGCGATACGGCCTACGTCTTTGGTGG |583950 |583960 |583970 |583980 |583990 |584000 |584010 |95170 |95160 |95150 |95140 |95130 |95120 |95110 CTTGTCTTTGGGCATCATTGAAGTAAGCAGGAACGGTGACAACAGCATTTTTGACGCTCTTCGCTAAGTAAGCCTCCGC ||||:|:||||:..||.||||||||.|||||.|||||.|.:||.||:|.::|:||:.|.|.::|.||||||||.||:|| CTTGACGTTGGCTGTCGTTGAAGTAGGCAGGGACGGTAATCACGGCTTCGGTTACTTTTTCGCCCAAGTAAGCTTCGGC |584030 |584040 |584050 |584060 |584070 |584080 |584090 |95090 |95080 |95070 |95060 |95050 |95040 |95030 TGTTTCCTTCATTTTATTTAAGATAAAACCTCCTATTTGGGCGGGGGAGTACGTTCTGTTTCTAGCCTCTACCCAGGCA :|.|||.|||||||||.:.|.||.:::::|::::|||||.|:.||.||::.|:.|.||..|.::||.|.||||||:||. GGCTTCTTTCATTTTACGCAGGACTTCTGCGGAAATTTGAGGAGGAGACAGCTCTTTGCCTTGTGCTTTTACCCATGCG |584110 |584120 |584130 |584140 |584150 |584160 |584170 |95010 |95000 |94990 |94980 |94970 |94960 |94950 TCTCCATTAGAATGCTTGACGATTTTGAAAGGAACCTGATTAATATCTCTTTGGACTTCAGCGTCCTCGAAACGGCGGC ||:||.||.::.::.||||.|||||.||||||:|.::.:|..||.||:|:|||||||||::.|||.||.||:.:|.||| TCGCCGTTGTTGGCTTTGATGATTTCGAAAGGCATAGATTCGATGTCGCGTTGGACTTCTTTGTCTTCAAATTTGTGGC |584190 |584200 |584210 |584220 |584230 |584240 |584250 |94930 |94920 |94910 |94900 |94890 |94880 |94870 CGATTAAACGCTTAGTAGCAAACAAAGTGTTTTCTGAGTTTATGACGGATTGTCGTTTGGCTGGCTCACCAACTAAACG ||||.|||||.||.|..||.:|:|:||||||||.:|:|||:.|:||:|:|||:||||||||:|||:||||.||:|..:: CGATCAAACGTTTGGCGGCGTAAATAGTGTTTTTGGCGTTGGTTACCGCTTGGCGTTTGGCAGGCGCACCGACGAGGAT |584260 |584270 |584280 |584290 |584300 |584310 |584320 |584330 |94850 |94840 |94830 |94820 |94810 |94800 |94790 TTCTCCGTCTTTAGTGAAAGCCACTACAGACGGAGTAGTTCTTGAGCCTTCTGCATTTTCGATAATTCTCGGAACTTTA |||:|||.|:|.:.:.:||||:|.:||.|||||:||.||:|:||:|||||||||.||||||||:|.|.|:|:::::... TTCGCCGCCGTCCAAATAAGCGATAACGGACGGCGTGGTGCGTGCGCCTTCTGCGTTTTCGATCACTTTGGTTTGACCG |584340 |584350 |584360 |584370 |584380 |584390 |584400 |584410 |94780 |94770 |94760 |94750 |94740 CCTTCCATAATAGCTACCGCAGAATTGGTAGTACCTAAATCAATACCGATAAC ..|||:.:|||.||.|::::|||.|||||:||||||||.||.||||||||:|| TTTTCGGAAATGGCCAAACAAGAGTTGGTTGTACCTAAGTCGATACCGATTAC |584420 |584430 |584440 |584450 |584460 *(96264-94728)(582917-584471) Ev: 0 s: 1537/1555 r * S.cerevisiae.V (reverse complementary strand) / gi|12057208|(forward strand) * score = 1073 : bitscore = 491.92 * mutations per triplet 347, 108, 152 (1.79e-36) | ts : 272 tv : 335 |96260 |96250 |96240 |96230 |96220 |96210 |96200 |96190 TTCCGCTTCATTAACCATTCGATCAATCTCCGTATCAGATAGCCCAGACGCTCCGGCAACAGTGATGGAAGAGTCTTTG |||:||:||:||:|||||:||:||.||.||.:.:||.::.|.:||:||:|::||:::.|..||||||::.|:::||||. TTCGGCATCTTTCACCATGCGTTCGATTTCTTCTTCGCTCAAACCTGAAGAACCTTGGATGGTGATGTTGGCTGCTTTA |582920 |582930 |582940 |582950 |582960 |582970 |582980 |96180 |96170 |96160 |96150 |96140 |96130 |96120 TGGCTGGCGAGATCTTTTGCTGAAACGTTGATGATGCCGTTCGCATCGATATCAAAAGTGACTTCAATTTGTGGGGTAC .:|:||:|:::.|||||:||:|||||||::|:|||||||||:||.|||||.||.||.||:|||||.|||||.||:.||| CCGGTGCCTTTGTCTTTGGCGGAAACGTGCAGGATGCCGTTGGCGTCGATGTCGAAGGTTACTTCGATTTGCGGCATAC |583000 |583010 |583020 |583030 |583040 |583050 |583060 |96100 |96090 |96080 |96070 |96060 |96050 |96040 CTTTTGGAGCTGGAGGAATGCCCGCAAGAGTAAAATTACCTATTAATTTGTTATCCTTGACTAACTCCCTCTCACCTTG |:.:.||:||:||:|:.|||.|::|:|..:|.||:|:|||.|::.|||||||.:|:::..|::..||:|:.||.||||| CGCGCGGTGCAGGTGCGATGTCGCCCAAGTTGAACTGACCCAAAGATTTGTTGGCAGAAGCGCGTTCGCGTTCGCCTTG |583080 |583090 |583100 |583110 |583120 |583130 |583140 |96020 |96010 |96000 |95990 |95980 |95970 |95960 GAAAACTTTAACTTCCACCGATGTTTGACCTGATGCCGCAGTTGAAAAAATTTGAGATTTCTTATTGGGAATTGTAGAA :|.:||:|:.|.::..||.|:::||||...:::|:|:||.||:||.||:|.|||:||.:..||.:|:||.||:||.|:. CAGTACGTGGATGGTTACTGCGCTTTGGTTGTCTTCGGCGGTAGAGAACACTTGCGACGCTTTGGTCGGGATGGTGGTG |583160 |583170 |583180 |583190 |583200 |583210 |583220 |95940 |95930 |95920 |95910 |95900 |95890 |95880 TTTCTTGGGATTAATTTTGTAAAAACTCCTCCTAAAGTTTCAATACCCAATGATAGGGGAGTGACATCTAGCAACAAAA ||..|.:|.||.|.|||:||:|::||:||:||.|:.|||||.||||||||:||.||.|||||:||.||.||.|.|||:| TTCTTCTGAATCAGTTTGGTCATCACGCCGCCCATGGTTTCGATACCCAAAGACAGAGGAGTTACGTCCAGTAGCAATA |583240 |583250 |583260 |583270 |583280 |583290 |583300 |95860 |95850 |95840 |95830 |95820 |95810 |95800 CATCGGTAACTTCACCAGACAAGACCGCAGCCTGTATAGCGGCCCCTAAAGCGACTGCTTCATCAGGGTTAACAGCTTT |.|||:|.:::.|.||.::|||:||.:|.:|.||:||:||:||:||||:.||.||:|||||.||||||||:||.:|||| CGTCGCTGCGGCCGCCGCTCAATACTTCGCCTTGGATCGCTGCGCCTACGGCAACGGCTTCGTCAGGGTTCACGTCTTT |583320 |583330 |583340 |583350 |583360 |583370 |583380 |95780 |95770 |95760 |95750 |95740 |95730 |95720 TGATGCATCCTTACCGAATAATTTCTTTACAGTATCTGCAACCTTGGGCATCCTTGACATACCACCAACTAATAAAACA ::..|::||.||.|||||:||::..||:||.|.:|||:::||.||:|||||:|::|||:::||.||.||.||:|::||. GCGCGGTTCTTTGCCGAAGAAGGCTTTAACGGCTTCTTGTACTTTCGGCATACGGGACTGCCCGCCGACCAAGATTACG |583400 |583410 |583420 |583430 |583440 |583450 |583460 |95710 |95700 |95690 |95680 |95670 |95660 |95650 |95640 TCCGATATATCTGAGGCGGTAATTCTTGCGTCTTTCAGTGCTTTTTTGACAGGATCAACCGTTCTATCAATCAATGGGG ||::::||.||:::||.|:|:|::|.:||.|||||||.|||::|||||::|||:||.|.:|::|:.:.|||||.::::: TCGTCGATGTCGCCGGTGCTCAAGCCGGCATCTTTCAATGCAATTTTGCAAGGTTCGATAGAGCGGGTAATCAGGTCTT |583470 |583480 |583490 |583500 |583510 |583520 |583530 |583540 |95630 |95620 |95610 |95600 |95590 |95580 |95570 |95560 CGGTTATATTCTCAAGCTGAACCCTAGAAAAGGGCATACGAATATGCTTTGGGCCTGCAGCATCAGCAGTTATGAAAGG |....|:..|.||.|..|:..|:|:.|:||::::|||::::|:.||.||.|||||:|.:||.||:...||:|||:|:|| CAACCAGGCTTTCGAATTTGGCGCGGGTAATTTTCATCGCCAAGTGTTTCGGGCCGGTTGCGTCCATGGTGATGTACGG |583550 |583560 |583570 |583580 |583590 |583600 |583610 |583620

More Related