Linear Sequence Alignment
Travis Hillenbrand
Methods of Comparison
• Dot Matrix
• Dynamic Programming Algorithm
• Greedy X-drop Approach
• Linear Alignment
Dot Matrix Method
http://arbl.cvmbs.colostate.edu/molkit/dnadot/index.html
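In a dot matrix, one sequence runs down the side and the other along the top, and a mark is placed wherever the two residues match, so similar regions show up as diagonal runs of marks. Below is a minimal Python sketch, assuming exact-match comparison with no sliding-window filter (many dot-plot tools add one to suppress noise); `dot_matrix` and the `*`/`.` symbols are illustrative choices, not taken from the tool linked above.

```python
def dot_matrix(a: str, b: str, mark: str = "*", blank: str = ".") -> list[str]:
    """Return one row per residue of a; column j is marked where a[i] == b[j]."""
    return ["".join(mark if x == y else blank for y in b) for x in a]


if __name__ == "__main__":
    seq_a, seq_b = "ATCGATACG", "ATGGATTACG"
    print("  " + seq_b)                      # column header: second sequence
    for residue, row in zip(seq_a, dot_matrix(seq_a, seq_b)):
        print(residue + " " + row)           # row label: first sequence
```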
Sequence Alignment
Sequences: ATCGATACG, ATGGATTACG
3 possibilities at each aligned position:
• Match: …C… over …C… (marked with a | bar)
• Mismatch: …C… over …G…
• Indel: …C… over …-…
Global Pairwise Alignment
Sequences: ATCGATACG, ATGGATTACG
ATCGAT-ACG
|| ||| |||
ATGGATTACG
Matches: +1 +1 +1 +1 +1 +1 +1 +1 = +8
Mismatches: -1 = -1
Gaps: -2 = -2
Total score = +5
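The column-by-column scoring above can be written as a short function. This is a sketch assuming a linear gap cost (every gap character scores -2 on its own), which reproduces the +5 total for the alignment shown; `alignment_score` is a hypothetical name.

```python
def alignment_score(top: str, bottom: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2) -> int:
    """Score a gapped pairwise alignment column by column (linear gap cost)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap        # indel: a residue paired with a gap
        elif x == y:
            score += match      # identical residues
        else:
            score += mismatch   # substitution
    return score


print(alignment_score("ATCGAT-ACG", "ATGGATTACG"))  # 5
```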
Dynamic Programming
Global alignment (Needleman-Wunsch) algorithm
[Figures: the scoring matrix filled cell by cell, each cell taking the maximum (Max) of its candidate scores]
Resulting alignment:
GATC
|| |
GA-C
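A compact Python sketch of the Needleman-Wunsch recurrence, assuming the same scoring as the global pairwise alignment slide (+1 match, -1 mismatch, -2 per gap position). The function name and the traceback tie-breaking order (diagonal, then up, then left) are our own choices, so when several alignments share the optimal score, a different one than on the slides may be reported.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment score and one optimal alignment of a and b."""
    n, m = len(a), len(b)

    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap                      # a[:i] against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap                      # b[:j] against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # diagonal
                          F[i - 1][j] + gap,                        # gap in b
                          F[i][j - 1] + gap)                        # gap in a

    # Traceback from the bottom-right cell: diagonal first, then up, then left
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("GATC", "GAC"))  # (1, 'GATC', 'GA-C')
```

On the earlier pair ATCGATACG / ATGGATTACG the same function returns the optimal score 5.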
Greedy X-drop Alignment
• Aligns sequences that differ mainly by sequencing errors
• Works with a measure of difference (number of mismatches and indels) rather than a similarity score
• Restricts the indel penalty to ind = mis - mat/2
Zhang et al. 2000
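Why the indel penalty is restricted: if an alignment of the first i residues of one sequence with the first j of the other has M matches, X mismatches and G indels, then i + j = 2M + 2X + G, and with ind = mis - mat/2 its score simplifies to (i + j)*mat/2 - d*(mat - mis), where d = X + G is the number of differences. The score then depends only on the end point and d, which is what lets the greedy method work with differences. The Python check below (hypothetical function names) compares both forms on the earlier example, using mat=10 and mis=-6 as in the comparison of alignments.

```python
def indel_penalty(mat: float, mis: float) -> float:
    # Constraint assumed by the greedy method: ind = mis - mat/2
    return mis - mat / 2


def direct_score(top: str, bottom: str, mat: float, mis: float) -> float:
    """Score an alignment column by column, with each indel scored as ind."""
    ind = indel_penalty(mat, mis)
    total = 0.0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += ind
        elif x == y:
            total += mat
        else:
            total += mis
    return total


def score_from_differences(top: str, bottom: str, mat: float, mis: float) -> float:
    """The same score from the end point (i, j) and the difference count d."""
    i = len(top) - top.count("-")          # residues of the first sequence used
    j = len(bottom) - bottom.count("-")    # residues of the second sequence used
    d = sum(1 for x, y in zip(top, bottom) if x != y)   # mismatches + indels
    return (i + j) * mat / 2 - d * (mat - mis)


pair = ("ATCGAT-ACG", "ATGGATTACG")
print(indel_penalty(10, -6))                            # -11.0
print(direct_score(*pair, mat=10, mis=-6))              # 63.0
print(score_from_differences(*pair, mat=10, mis=-6))    # 63.0
```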
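Greedy X-drop Alignment
• X-drop condition saves computation: a partial alignment is abandoned once its score falls more than X below the best score found so far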
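The sketch below follows our reading of the greedy X-drop algorithm of Zhang et al. (2000), simplified. For each difference count d it keeps, per diagonal k = i - j, the furthest point reachable, slides along exact matches, and drops a diagonal once its score S'(i+j, d) falls below T[d'] - X, where T[d'] is the best score seen with at most d' = d - floor((X + mat/2)/(mat - mis)) - 1 differences. Boundary handling is a plain bounds check instead of the paper's L/U bookkeeping, and only the best score is returned (no traceback). The names are hypothetical; treat this as an illustration, not a reference implementation.

```python
import math


def greedy_xdrop(a: str, b: str, mat: int = 10, mis: int = -6,
                 X: float = 2200) -> float:
    """Best X-drop score when extending an alignment from the start of a and b.

    The indel penalty is implied (ind = mis - mat/2), so only mat, mis and X
    are parameters.
    """
    M, N = len(a), len(b)

    def score(i_plus_j: int, d: int) -> float:
        # S'(i+j, d): score of any path to (i, j) that uses d differences
        return i_plus_j * mat / 2 - d * (mat - mis)

    def slide(i: int, k: int) -> int:
        # Follow exact matches along diagonal k (j = i - k) as far as possible
        j = i - k
        while i < M and j < N and a[i] == b[j]:
            i += 1
            j += 1
        return i

    i0 = slide(0, 0)
    R = {0: i0}               # diagonal k -> furthest-reaching i with d differences
    best = score(2 * i0, 0)   # T' in the paper: best score seen so far
    T = [best]                # T[d]: best score seen using at most d differences
    lag = math.floor((X + mat / 2) / (mat - mis)) + 1
    d = 0
    while R:                  # stop once every diagonal has been dropped
        d += 1
        d_lag = d - lag
        # Prune points scoring below T[d'] - X; no pruning while d' < 0
        cutoff = T[d_lag] - X if d_lag >= 0 else -math.inf
        new_R = {}
        for k in range(min(R) - 1, max(R) + 2):
            candidates = []
            if k - 1 in R:
                candidates.append(R[k - 1] + 1)   # a[i] against a gap
            if k in R:
                candidates.append(R[k] + 1)       # mismatch
            if k + 1 in R:
                candidates.append(R[k + 1])       # b[j] against a gap
            # Keep only points that stay inside the M x N grid
            candidates = [i for i in candidates if i <= M and 0 <= i - k <= N]
            if not candidates:
                continue
            i = max(candidates)
            if score(2 * i - k, d) < cutoff:
                continue                          # X-drop: abandon this diagonal
            i = slide(i, k)
            new_R[k] = i
            best = max(best, score(2 * i - k, d))
        T.append(best)
        R = new_R
    return best


print(greedy_xdrop("ATCGATACG", "ATGGATTACG"))       # 63.0
print(greedy_xdrop("ATCGATACG", "ATGGATTACG", X=0))  # 20.0
```

With X = 0 the extension stops at the first difference (score 20, from the two matching leading bases); with X = 2200 it reaches the end of both sequences and returns 63, the same value as the column-by-column score of ATCGAT-ACG / ATGGATTACG under mat=10, mis=-6, ind=-11.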
Linear Alignment
[Figure: ATCGATACG slid along ATGGATTACG at successive offsets, with match bars]
• Index of coincidence: the maximum number of matches between two sequences over all relative offsets
• Ungapped alignment …
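Maximum-coincidence (linear) alignment only slides one sequence along the other and counts matches at each offset; no gaps are inserted. Below is a minimal Python sketch, assuming that a shift s pairs a[i] with b[i + s]. This sign convention is our own and may not match the offsets reported in the comparison, and `best_offset` is a hypothetical name.

```python
def best_offset(a: str, b: str) -> tuple[int, int, int]:
    """Ungapped alignment with the most matches (maximum coincidence).

    A shift s pairs a[i] with b[i + s]. Returns (shift, matches, overlap),
    where overlap is the number of positions compared at that shift.
    """
    best = (0, -1, 0)
    for s in range(-(len(a) - 1), len(b)):
        lo = max(0, -s)                  # first index of a with a partner in b
        hi = min(len(a), len(b) - s)     # one past the last such index
        matches = sum(1 for i in range(lo, hi) if a[i] == b[i + s])
        if matches > best[1]:            # keep the first shift on ties
            best = (s, matches, hi - lo)
    return best


shift, matches, overlap = best_offset("ATCGATACG", "ATGGATTACG")
print(shift, matches, overlap, f"{100 * matches / overlap:.4f}% similarity")
# 0 5 9 55.5556% similarity
```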
Linear Alignment
• Attempt to increase similarity
[Figure: the pair shown unshifted and with a leading gap added to either sequence (-ATCGATACG, -ATGGATTACG), with match bars; window scores 2, -3, -3]
Comparison of alignments
• 9 homologous human/mouse gene CDS (coding sequence) pairs retrieved (Jareborg et al. 1999)
• Greedy alignment run first: mat=10, mis=-6, X=2200 (indel=-11, i.e. mis - mat/2)
• Dynamic programming and linear alignment run on truncated sequences
Comparison of alignments
[Figure: similarity scores]
[Figure: similarity percentage]
Comparison of alignments
Maximum coincidence alignment: offset -72 yielded 1642 matches of 2175 possible (75.4943% similarity), score 6611

ACAGTACTGCTACTTCTCGCCGACTGGGTGCTGCTCCGGACCGCGCTGCCCCGCATATTCTCCCTGCTGGTGCCCACCGCGCTGCCACTGCTCCGGGT
|  |    ||   |       |     |  |||||||     | |      |   |  |    |   || | ||        |  | ||| |
ATGGCTGCGCACGTCTGGCTGGCGGCCGCCCTGCTCCTTCTGGTGGACTGGCTGCTGCTGCGGCCCATGCTCCCGGGAATCTTCTCCCTGTTGGTTCC

ACGGGCCGCCTCACTGACTGGATTCTACAAGATGGCTCAGCCGATACCTTCACTCGAAACTTAACTCTCATGTCCATTCTCACCATAGCCAGTGCAGT
||||||||| |||||||||||||||| || |||    |||    || |||||| || ||| ||   ||||||||||||||||||||||||||  |||
ACGGGCCGCATCACTGACTGGATTCTTCAGGATAAGACAGTTCCTAGCTTCACCCGCAACATATGGCTCATGTCCATTCTCACCATAGCCAGCACAGC

Decreasing the gap penalty allows similar regions to be aligned without using the index of coincidence (IOC).
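The bar lines in these blocks mark identical positions, and the similarity percentage is matches divided by positions compared. Below is a small Python sketch of both, using hypothetical helper names, checked against the 1642-of-2175 figure quoted here and the earlier ATCGAT-ACG example.

```python
def match_line(top: str, bottom: str) -> str:
    """Bar line for two aligned rows: '|' where residues are identical."""
    bars = "".join("|" if x == y and x != "-" else " " for x, y in zip(top, bottom))
    return bars.rstrip()                 # drop trailing blanks


def percent_identity(matches: int, positions: int) -> float:
    """Matches as a percentage of the positions compared."""
    return 100 * matches / positions


print(match_line("ATCGAT-ACG", "ATGGATTACG"))   # || ||| |||
print(f"{percent_identity(1642, 2175):.4f}%")   # 75.4943%
```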
References
Needleman, S. B. and Wunsch, C. D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48: 443-453.
Setubal, J. and Meidanis, J. 1997. Introduction to Computational Molecular Biology. Pacific Grove, California: Brooks/Cole.
Zhang, Z., Schwartz, S., Wagner, L. and Miller, W. 2000. A greedy algorithm for aligning DNA sequences. Journal of Computational Biology 7: 203-214.