
Sequence Alignment

Kun-Mao Chao (趙坤茂), Department of Computer Science and Information Engineering, National Taiwan University, Taiwan. E-mail: kmchao@csie.ntu.edu.tw, WWW: http://www.csie.ntu.edu.tw/~kmchao



  1. Sequence Alignment. Kun-Mao Chao (趙坤茂), Department of Computer Science and Information Engineering, National Taiwan University, Taiwan. E-mail: kmchao@csie.ntu.edu.tw, WWW: http://www.csie.ntu.edu.tw/~kmchao

  2. Bioinformatics

  3. Bioinformatics and Computational Biology-Related Journals: • Bioinformatics (previously called CABIOS) • Bulletin of Mathematical Biology • Genome Research • Genomics • IEEE/ACM Transactions on Computational Biology and Bioinformatics • Journal of Bioinformatics and Computational Biology • Journal of Computational Biology • Journal of Molecular Biology • Nature • Nucleic Acids Research • Science

  4. Bioinformatics and Computational Biology-Related Conferences: • Intelligent Systems for Molecular Biology (ISMB) • Pacific Symposium on Biocomputing (PSB) • The Annual International Conference on Research in Computational Molecular Biology (RECOMB) • Workshop on Algorithms in Bioinformatics (WABI) • The IEEE Computer Society Bioinformatics Conference (CSB)

  5. Bioinformatics and Computational Biology-Related Books: • Calculating the Secrets of Life: Applications of the Mathematical Sciences in Molecular Biology, by Eric S. Lander and Michael S. Waterman (1995) • Introduction to Computational Biology: Maps, Sequences, and Genomes, by Michael S. Waterman (1995) • Introduction to Computational Molecular Biology, by Joao Carlos Setubal and Joao Meidanis (1996) • Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, by Dan Gusfield (1997) • Computational Molecular Biology: An Algorithmic Approach, by Pavel Pevzner (2000) • Introduction to Bioinformatics, by Arthur M. Lesk (2002)

  6. Useful Websites • MIT Biology Hypertextbook • http://www.mit.edu:8001/afs/athena/course/other/esgbio/www/7001main.html • The International Society for Computational Biology: • http://www.iscb.org/ • National Center for Biotechnology Information (NCBI, NIH): • http://www.ncbi.nlm.nih.gov/ • European Bioinformatics Institute (EBI): • http://www.ebi.ac.uk/ • DNA Data Bank of Japan (DDBJ): • http://www.ddbj.nig.ac.jp/

  7. Sequence Alignment

  8. Dot Matrix Sequence A: CTTAACT Sequence B: CGGATCAT [Figure: dot matrix comparing sequence A with sequence B]
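
The dot matrix on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not code from the lecture: the function name `dot_matrix` and the choice to put sequence A on the rows and sequence B on the columns are assumptions made for illustration.

```python
def dot_matrix(a: str, b: str) -> list[str]:
    """Return one text row per symbol of `a`.

    Position j of row i holds '*' when a[i] == b[j], and '.' otherwise.
    """
    return ["".join("*" if x == y else "." for y in b) for x in a]


# Print the matrix for the two sequences used on the slide.
a, b = "CTTAACT", "CGGATCAT"
print("  " + b)
for symbol, row in zip(a, dot_matrix(a, b)):
    print(symbol + " " + row)
```

Runs of `*` along a diagonal correspond to stretches where the two sequences agree without gaps.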

  9. Pairwise Alignment Sequence A: CTTAACT Sequence B: CGGATCAT An alignment of A and B:

          C---TTAACT   (Sequence A)
          CGGATCA--T   (Sequence B)

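  10. Pairwise Alignment Sequence A: CTTAACT Sequence B: CGGATCAT An alignment of A and B:

          C---TTAACT   (Sequence A)
          CGGATCA--T   (Sequence B)

      Each column is a match (e.g. C over C), a mismatch (T over C), part of an insertion gap (a gap symbol in A over a symbol of B), or part of a deletion gap (a symbol of A over a gap symbol in B).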

  11. Alignment Graph Sequence A: CTTAACT Sequence B: CGGATCAT [Figure: alignment graph for A and B, with the path corresponding to the alignment below]

          C---TTAACT
          CGGATCA--T

  12. A simple scoring scheme • Match: +8 (w(x, y) = 8, if x = y) • Mismatch: -5 (w(x, y) = -5, if x ≠ y) • Each gap symbol: -3 (w(-, x) = w(x, -) = -3)

          C  -  -  -  T  T  A  A  C  T
          C  G  G  A  T  C  A  -  -  T
         +8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12 (alignment score)
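
The column-by-column sum above is easy to check mechanically. The sketch below assumes the alignment is given as two equal-length strings with `-` as the gap symbol; the function name `score_alignment` is hypothetical, and the default weights are the ones stated on the slide.

```python
def score_alignment(row_a: str, row_b: str,
                    match: int = 8, mismatch: int = -5, gap: int = -3) -> int:
    """Sum the column scores of an alignment under the simple scheme."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot consist of two gap symbols")
        if x == "-" or y == "-":
            total += gap          # w(-, y) or w(x, -)
        elif x == y:
            total += match        # w(x, y) with x = y
        else:
            total += mismatch     # w(x, y) with x != y
    return total


print(score_alignment("C---TTAACT", "CGGATCA--T"))  # prints 12
```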

  13. An optimal alignment: the alignment of maximum score • Let A = a1a2…am and B = b1b2…bn. • Si,j: the score of an optimal alignment between a1a2…ai and b1b2…bj • With proper initializations, Si,j can be computed as follows:

      $$S_{i,j} = \max\{\, S_{i-1,j} + w(a_i, -),\; S_{i,j-1} + w(-, b_j),\; S_{i-1,j-1} + w(a_i, b_j) \,\}$$

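  14. Computing Si,j [Figure: entry (i, j) of the matrix is reached from (i-1, j-1) with weight w(ai, bj), from (i-1, j) with weight w(ai, -), and from (i, j-1) with weight w(-, bj); the optimal alignment score is Sm,n]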

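  15. Initializations [Figure: alignment matrix for A = CTTAACT and B = CGGATCAT with the boundary entries filled in]

  16. S3,5 = ? [Figure: the same matrix, partially filled]

  17. S3,5 = 5 [Figure: the completed matrix, with the optimal score marked]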

  18. [Figure: the completed matrix for A = CTTAACT and B = CGGATCAT]

          C  T  T  A  A  C  -  T
          C  G  G  A  T  C  A  T
         +8 -5 -5 +8 -5 +8 -3 +8 = 14
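
The matrix fill and the alignment on slides 13 to 18 can be sketched as follows. This is an illustrative implementation under the simple scoring scheme (match +8, mismatch -5, gap symbol -3), not the lecturer's code; the name `global_alignment` and the tie-breaking order in the traceback (aligned pair first, then deletion, then insertion) are assumptions, so a different but equally good alignment may be reported when ties exist.

```python
def global_alignment(a: str, b: str,
                     match: int = 8, mismatch: int = -5, gap: int = -3):
    """Fill S[i][j] for all prefixes and trace back one optimal alignment."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    # Boundary entries: a prefix aligned against nothing costs one gap per symbol.
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + w,   # aligned pair
                          S[i - 1][j] + gap,     # a_i over a gap symbol
                          S[i][j - 1] + gap)     # gap symbol over b_j
    # Trace back from S[m][n] to S[0][0].
    row_a, row_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            w = match if a[i - 1] == b[j - 1] else mismatch
            if S[i][j] == S[i - 1][j - 1] + w:
                row_a.append(a[i - 1]); row_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and S[i][j] == S[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-")
            i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1])
            j -= 1
    return S, "".join(reversed(row_a)), "".join(reversed(row_b))


S, row_a, row_b = global_alignment("CTTAACT", "CGGATCAT")
print(S[3][5], S[7][8])  # prints 5 14
print(row_a)
print(row_b)
```

The same function can be used to check the in-class exercise with A = CAATTGA and B = GAATCTGC.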

  19. Now try this example in class. Sequence A: CAATTGA Sequence B: GAATCTGC What is their optimal alignment?

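  20. Initializations [Figure: alignment matrix for A = CAATTGA and B = GAATCTGC with the boundary entries filled in]

  21. S4,2 = ? [Figure: the same matrix, partially filled]

  22. S5,5 = ? [Figure: the same matrix, partially filled]

  23. S5,5 = 14 [Figure: the completed matrix, with the optimal score marked]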

  24. [Figure: the completed matrix for A = CAATTGA and B = GAATCTGC]

          C  A  A  T  -  T  G  A
          G  A  A  T  C  T  G  C
         -5 +8 +8 +8 -3 +8 +8 -5 = 27

  25. Global Alignment vs. Local Alignment • global alignment: • local alignment:

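  26. An optimal local alignment • Si,j: the score of an optimal local alignment ending at ai and bj • With proper initializations, Si,j can be computed as follows:

      $$S_{i,j} = \max\{\, 0,\; S_{i-1,j} + w(a_i, -),\; S_{i,j-1} + w(-, b_j),\; S_{i-1,j-1} + w(a_i, b_j) \,\}$$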

  27. Local alignment. Match: 8 Mismatch: -5 Gap symbol: -3 [Figure: local-alignment matrix for A = CTTAACT and B = CGGATCAT]

  28. Local alignment. Match: 8 Mismatch: -5 Gap symbol: -3 [Figure: the same matrix, with the best score marked]

  29. [Figure: the local-alignment matrix for A = CTTAACT and B = CGGATCAT, with the best score marked]

          A  -  C  -  T
          A  T  C  A  T
         +8 -3 +8 -3 +8 = 18
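
The local-alignment computation on slides 26 to 29 differs from the global one in two places: every entry is floored at 0, and the traceback starts at the best entry and stops at the first 0. The sketch below illustrates this under the same scoring scheme; the name `local_alignment` and the tie-breaking order are assumptions for illustration.

```python
def local_alignment(a: str, b: str,
                    match: int = 8, mismatch: int = -5, gap: int = -3):
    """Return (best score, row of A, row of B) for one optimal local alignment."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(0,
                          S[i - 1][j - 1] + w,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
            if S[i][j] > best:
                best, best_pos = S[i][j], (i, j)
    # Trace back from the best entry until an entry with score 0 is reached.
    row_a, row_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and S[i][j] > 0:
        w = match if a[i - 1] == b[j - 1] else mismatch
        if S[i][j] == S[i - 1][j - 1] + w:
            row_a.append(a[i - 1]); row_b.append(b[j - 1])
            i -= 1; j -= 1
        elif S[i][j] == S[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-")
            i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))


print(local_alignment("CTTAACT", "CGGATCAT"))  # prints (18, 'A-C-T', 'ATCAT')
```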

  30. Now try this example in class. Sequence A: CAATTGA Sequence B: GAATCTGC What is their optimal local alignment?

  31. Did you get it right? [Figure: local-alignment matrix for A = CAATTGA and B = GAATCTGC]

  32. [Figure: the local-alignment matrix for A = CAATTGA and B = GAATCTGC]

          A  A  T  -  T  G
          A  A  T  C  T  G
         +8 +8 +8 -3 +8 +8 = 37

  33. Affine gap penalties • Match: +8 (w(x, y) = 8, if x = y) • Mismatch: -5 (w(x, y) = -5, if x ≠ y) • Each gap symbol: -3 (w(-, x) = w(x, -) = -3) • Each gap is charged an extra gap-open penalty: -4.

          C  -  -  -  T  T  A  A  C  T
          C  G  G  A  T  C  A  -  -  T
         +8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12

      The alignment contains two gaps, each charged -4. Alignment score: 12 - 4 - 4 = 4

  34. Affine gap penalties • A gap of length k is penalized x + k·y, where x is the gap-open penalty and y is the gap-symbol penalty. • Three cases for alignment endings: 1. ...x over ...x (an aligned pair) 2. ...x over ...- (a deletion) 3. ...- over ...x (an insertion)

  35. Affine gap penalties • Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion. • Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion. • Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

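  36. Affine gap penalties (A gap of length k is penalized x + k·y.)

      $$D(i,j) = \max\{\, D(i-1,j) - y,\; S(i-1,j) - x - y \,\}$$
      $$I(i,j) = \max\{\, I(i,j-1) - y,\; S(i,j-1) - x - y \,\}$$
      $$S(i,j) = \max\{\, S(i-1,j-1) + w(a_i,b_j),\; D(i,j),\; I(i,j) \,\}$$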

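  37. Affine gap penalties [Figure: each matrix entry holds three values D, I and S. D is reached from the D above with weight -y or from the S above with weight -x-y; I is reached from the I to the left with weight -y or from the S to the left with weight -x-y; S is reached from the diagonal S with weight w(ai, bj)]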
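
The three-matrix scheme for affine gap penalties can be sketched as below. This is an illustrative implementation, not the lecturer's code: the name `affine_global_score` is hypothetical, the defaults x = 4 and y = 3 are taken from the example on slide 33, and the boundary initialization (a leading gap of length k costs x + k·y) is an assumption consistent with the penalty definition.

```python
NEG_INF = float("-inf")


def affine_global_score(a: str, b: str, x: int = 4, y: int = 3,
                        match: int = 8, mismatch: int = -5):
    """Optimal global score when a gap of length k is penalized x + k*y."""
    m, n = len(a), len(b)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ends with a deletion
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ends with an insertion
    S[0][0] = 0
    for i in range(1, m + 1):          # a1..ai against nothing: one deletion gap
        D[i][0] = -x - i * y
        S[i][0] = D[i][0]
    for j in range(1, n + 1):          # nothing against b1..bj: one insertion gap
        I[0][j] = -x - j * y
        S[0][j] = I[0][j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Extend an open deletion gap (-y) or open a new one (-x - y).
            D[i][j] = max(D[i - 1][j] - y, S[i - 1][j] - x - y)
            # Same two choices for an insertion gap.
            I[i][j] = max(I[i][j - 1] - y, S[i][j - 1] - x - y)
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + w, D[i][j], I[i][j])
    return S[m][n]


print(affine_global_score("CTTAACT", "CGGATCAT"))  # prints 10
```

For the slide's sequences the optimum is 10: the one-gap alignment CTTAAC-T over CGGATCAT scores 14 under the simple scheme and pays a single gap-open penalty of 4, which beats the two-gap alignment shown on slide 33 (score 4).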

  38. Constant gap penalties • Match: +8 (w(x, y) = 8, if x = y) • Mismatch: -5 (w(x, y) = -5, if x ≠ y) • Each gap symbol: 0 (w(-, x) = w(x, -) = 0) • Each gap is charged a constant penalty: -4.

          C  -  -  -  T  T  A  A  C  T
          C  G  G  A  T  C  A  -  -  T
         +8  0  0  0 +8 -5 +8  0  0 +8 = +27

      The alignment contains two gaps, each charged -4. Alignment score: 27 - 4 - 4 = 19

  39. Constant gap penalties • Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion. • Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion. • Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

  40. Constant gap penalties
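
The formulas of this slide did not survive in the transcript. One plausible reading, offered here as an assumption rather than as the slide's actual content, is that a gap of any length costs a constant x (x = 4 in the example on slide 38) and gap symbols cost nothing, so extending an open gap is free and only opening one is charged:

```latex
\begin{align*}
D(i,j) &= \max\{\, D(i-1,j),\; S(i-1,j) - x \,\} \\
I(i,j) &= \max\{\, I(i,j-1),\; S(i,j-1) - x \,\} \\
S(i,j) &= \max\{\, S(i-1,j-1) + w(a_i,b_j),\; D(i,j),\; I(i,j) \,\}
\end{align*}
```

Under this reading, D, I and S keep the meanings given on slide 39, and the scheme is the affine one with the gap-symbol penalty set to 0.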

  41. Restricted affine gap penalties • A gap of length k is penalized x + f(k)·y, where f(k) = k for k <= c and f(k) = c for k > c • Five cases for alignment endings: 1. ...x over ...x (an aligned pair) 2. ...x over ...- (a deletion) 3. ...- over ...x (an insertion) 4. and 5. for long gaps

  42. Restricted affine gap penalties

  43. D(i, j) vs. D’(i, j) • Case 1: the best alignment ending at (i, j) with a deletion at the end has a last deletion gap of length <= c; then D(i, j) >= D’(i, j) • Case 2: the best alignment ending at (i, j) with a deletion at the end has a last deletion gap of length >= c; then D(i, j) <= D’(i, j)

  44. max{S(i,j) - x - ky, S(i,j) - x - cy} [Figure: the two expressions plotted against the gap length k, with c marked on the k-axis]
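
The restricted affine penalty and the max expression on this slide are two views of the same quantity: charging x + min(k, c)·y is the same as taking the better of the ordinary affine charge and the capped charge. The sketch below checks this numerically; the function name is hypothetical, x = 4 and y = 3 are borrowed from the earlier affine example, and c = 5 is an arbitrary value chosen for illustration.

```python
def restricted_affine_penalty(k: int, x: int = 4, y: int = 3, c: int = 5) -> int:
    """Penalty x + f(k)*y, with f(k) = k for k <= c and f(k) = c for k > c."""
    if k < 1:
        raise ValueError("a gap has length at least 1")
    return x + min(k, c) * y


# Subtracting the penalty equals the larger of the two expressions on the slide.
x, y, c = 4, 3, 5
for k in range(1, 12):
    assert -restricted_affine_penalty(k, x, y, c) == max(-x - k * y, -x - c * y)

print([restricted_affine_penalty(k) for k in (1, 5, 9)])  # prints [7, 19, 19]
```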

  45. k best local alignments • Smith-Waterman (Smith and Waterman, 1981; Waterman and Eggert, 1987) • FASTA (Wilbur and Lipman, 1983; Lipman and Pearson, 1985) • BLAST (Altschul et al., 1990; Altschul et al., 1997)

  46. FASTA • Find runs of identities, and identify regions with the highest density of identities. • Re-score using PAM matrix, and keep top scoring segments. • Eliminate segments that are unlikely to be part of the alignment. • Optimize the alignment in a band.

  47. FASTA Step 1: Find runs of identities, and identify regions with the highest density of identities. [Figure: dot plot of Sequence A against Sequence B]
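
One common way to find runs of identities is to index short words of one sequence and count, per diagonal, how many words of the other sequence hit them. The sketch below is a simplified illustration of that idea, not the FASTA program's implementation; the function name `diagonal_hits` and the word length k = 2 are assumptions.

```python
from collections import defaultdict


def diagonal_hits(a: str, b: str, k: int = 2) -> dict[int, int]:
    """Count identical length-k words per diagonal d = i - j.

    i is the start of the word in `a`, j its start in `b`. Diagonals with
    many hits are candidate regions with a high density of identities.
    """
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)       # word -> start positions in a
    hits = defaultdict(int)
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            hits[i - j] += 1
    return dict(hits)


print(diagonal_hits("CAATTGA", "GAATCTGC"))  # prints {5: 1, 0: 2, -1: 1}
```

In this small example the main diagonal (d = 0) collects the most hits, from the shared words AA and AT.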

  48. FASTA Step 2: Re-score using PAM matrix, and keep top scoring segments.

  49. FASTA Step 3: Eliminate segments that are unlikely to be part of the alignment.

  50. FASTA Step 4: Optimize the alignment in a band.
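
Restricting the dynamic program to a band means filling only the entries close to a chosen diagonal. The sketch below illustrates the idea with a global alignment under the simple scoring scheme and a band centered on the main diagonal (entries with |i - j| <= band). Both choices are simplifying assumptions: FASTA places the band around the region found in the earlier steps, and the function name `banded_global_score` is hypothetical.

```python
NEG_INF = float("-inf")


def banded_global_score(a: str, b: str, band: int,
                        match: int = 8, mismatch: int = -5, gap: int = -3):
    """Best global score over alignments whose path stays within |i - j| <= band.

    Returns -inf when the final entry (len(a), len(b)) lies outside the band.
    """
    m, n = len(a), len(b)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    S[0][0] = 0
    for i in range(m + 1):
        # Only the columns inside the band are visited for row i.
        for j in range(max(0, i - band), min(n, i + band) + 1):
            if i == 0 and j == 0:
                continue
            best = NEG_INF
            if i > 0 and j > 0:
                w = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, S[i - 1][j - 1] + w)
            if i > 0:
                best = max(best, S[i - 1][j] + gap)   # entries outside the band are -inf
            if j > 0:
                best = max(best, S[i][j - 1] + gap)
            S[i][j] = best
    return S[m][n]


print(banded_global_score("CTTAACT", "CGGATCAT", band=1))  # prints 14
```

With a band of width 1 the work drops from (m+1)(n+1) entries to roughly three per row, and for these two sequences the banded score still equals the unrestricted optimum of 14 because an optimal path stays within one diagonal of the main one.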
