
Alignments and Phylogenetic tree

Reading: Introduction to Bioinformatics, Arthur M. Lesk, Fourth Edition, Chapter 5. Topics: sequence alignment, dot matrices, and pairwise alignment.

crodarte

Presentation Transcript


  1. Alignments and Phylogenetic tree. Reading: Introduction to Bioinformatics, Arthur M. Lesk, Fourth Edition, Chapter 5.

  2. Sequence Alignment

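  3. Dot Matrix. Sequence A: CTTAACT. Sequence B: CGGATCAT. [Figure: dot matrix with Sequence B (C G G A T C A T) across the columns and Sequence A (C T T A A C T) down the rows.]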
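As a rough illustration of the dot matrix on this slide, the Python sketch below marks every cell where the residue of Sequence A (rows) equals the residue of Sequence B (columns). The function names `dot_matrix` and `render` are invented for this example and are not taken from the slides.

```python
def dot_matrix(a, b):
    """Rows follow sequence a, columns follow sequence b; True marks identical residues."""
    return [[x == y for y in b] for x in a]


def render(a, b):
    """Return a printable text version of the dot matrix ('*' = identity, '.' = no identity)."""
    lines = ["  " + " ".join(b)]
    for residue, row in zip(a, dot_matrix(a, b)):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("CTTAACT", "CGGATCAT"))
```

Runs of `*` along a diagonal correspond to stretches where the two sequences agree without gaps.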

  4. Pairwise Alignment. Sequence A: CTTAACT. Sequence B: CGGATCAT. An alignment of A and B:
     Sequence A: C---TTAACT
     Sequence B: CGGATCA--T

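  5. Pairwise Alignment. Sequence A: CTTAACT. Sequence B: CGGATCAT. An alignment of A and B:
     Sequence A: C---TTAACT
     Sequence B: CGGATCA--T
     Each column is a match (e.g., C over C), a mismatch (T over C), an insertion gap (a dash in A), or a deletion gap (a dash in B).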

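  6. Alignment Graph. Sequence A: CTTAACT. Sequence B: CGGATCAT. [Figure: grid with Sequence B (C G G A T C A T) across the columns and Sequence A (C T T A A C T) down the rows, showing the alignment C---TTAACT / CGGATCA--T.]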

  7. A simple scoring scheme • Match: +8 (w(x, y) = 8, if x = y) • Mismatch: -5 (w(x, y) = -5, if x ≠ y) • Each gap symbol: -3 (w(-, x) = w(x, -) = -3)
     Sequence A:  C  -  -  -  T  T  A  A  C  T
     Sequence B:  C  G  G  A  T  C  A  -  -  T
     Scores:     +8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12 (alignment score)
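The column-by-column arithmetic of this scoring scheme can be checked with a short Python sketch. The constants mirror the slide (+8, -5, -3); the function names are illustrative only.

```python
MATCH, MISMATCH, GAP = 8, -5, -3


def column_score(x, y):
    """Score one alignment column under the simple scheme from the slide."""
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH


def alignment_score(row_a, row_b):
    """Sum the column scores of two gapped rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(column_score(x, y) for x, y in zip(row_a, row_b))


print(alignment_score("C---TTAACT", "CGGATCA--T"))  # 12, as on the slide
```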

  8. An optimal alignment: the alignment of maximum score • Let A = a1a2…am and B = b1b2…bn. • Si,j: the score of an optimal alignment between a1a2…ai and b1b2…bj • With proper initializations, Si,j can be computed as follows: $S_{i,j} = \max\{\, S_{i-1,j} + w(a_i, -),\; S_{i,j-1} + w(-, b_j),\; S_{i-1,j-1} + w(a_i, b_j) \,\}$.

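  9. Computing Si,j. [Figure: cell (i, j) of the table is reached by a diagonal edge of weight w(ai, bj), a vertical edge of weight w(ai, -), and a horizontal edge of weight w(-, bj); the optimal score is Sm,n.]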
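A minimal Python sketch of this table computation follows, assuming the simple scoring scheme from slide 7 and the usual initialization in which the first row and column accumulate gap scores. The function names `w` and `global_alignment_table` are chosen for this example.

```python
def w(x, y, match=8, mismatch=-5, gap=-3):
    """Column weight w(x, y); '-' stands for a gap symbol."""
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def global_alignment_table(a, b):
    """Fill S so that S[i][j] is the best score for a[:i] aligned with b[:j]."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):            # first column: a[:i] against nothing
        S[i][0] = S[i - 1][0] + w(a[i - 1], "-")
    for j in range(1, n + 1):            # first row: nothing against b[:j]
        S[0][j] = S[0][j - 1] + w("-", b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + w(a[i - 1], b[j - 1]),  # aligned pair
                S[i - 1][j] + w(a[i - 1], "-"),           # a_i against a gap
                S[i][j - 1] + w("-", b[j - 1]),           # b_j against a gap
            )
    return S


S = global_alignment_table("CTTAACT", "CGGATCAT")
print(S[3][5], S[7][8])  # 5 14
```

The two printed entries correspond to S3,5 and to the optimal score Sm,n for the example sequences.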

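  10. Initializations. [Figure: dynamic-programming table with Sequence B (C G G A T C A T) across the columns and Sequence A (C T T A A C T) down the rows, with the first row and first column initialized.]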

  11. S3,5 = ? [Figure: table for Sequence A = CTTAACT (rows) and Sequence B = CGGATCAT (columns), with entry S3,5 to be computed.]

  12. S3,5 = 5. [Figure: table for Sequence A = CTTAACT (rows) and Sequence B = CGGATCAT (columns), with the optimal score marked.]

  13. Sequence A:  C  T  T  A  A  C  -  T
     Sequence B:  C  G  G  A  T  C  A  T
     Scores:     +8 -5 -5 +8 -5 +8 -3 +8 = 14
     [Figure: table for Sequence A = CTTAACT (rows) and Sequence B = CGGATCAT (columns).]
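One way to recover an alignment like the one above is to walk back through the filled table from the bottom-right corner. The sketch below assumes the simple scoring scheme of slide 7; the function name `global_align` and the tie-breaking order (diagonal first) are choices made for this example, so other optimal alignments may exist.

```python
def global_align(a, b, match=8, mismatch=-5, gap=-3):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    def w(x, y):
        return match if x == y else mismatch

    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * gap
    for j in range(1, n + 1):
        S[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(S[i - 1][j - 1] + w(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)

    # Trace back from S[m][n], preferring the diagonal move when several moves tie.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return S[m][n], "".join(reversed(top)), "".join(reversed(bottom))


score, row_a, row_b = global_align("CTTAACT", "CGGATCAT")
print(row_a)
print(row_b)
print(score)
```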

  14. Now try this example in class. Sequence A: CAATTGA. Sequence B: GAATCTGC. Their optimal alignment?

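  15. Initializations. [Figure: dynamic-programming table with Sequence B (G A A T C T G C) across the columns and Sequence A (C A A T T G A) down the rows, with the first row and first column initialized.]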

  16. S4,2 = ? [Figure: table for Sequence A = CAATTGA (rows) and Sequence B = GAATCTGC (columns), with entry S4,2 to be computed.]

  17. S5,5 = ? [Figure: table for Sequence A = CAATTGA (rows) and Sequence B = GAATCTGC (columns), with entry S5,5 to be computed.]

  18. S5,5 = 14. [Figure: table for Sequence A = CAATTGA (rows) and Sequence B = GAATCTGC (columns), with the optimal score marked.]

  19. Sequence A:  C  A  A  T  -  T  G  A
     Sequence B:  G  A  A  T  C  T  G  C
     Scores:     -5 +8 +8 +8 -3 +8 +8 -5 = 27
     [Figure: table for Sequence A = CAATTGA (rows) and Sequence B = GAATCTGC (columns).]

  20. Global Alignment vs. Local Alignment • Global alignment: [figure] • Local alignment: [figure]

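  21. An optimal local alignment • Si,j: the score of an optimal local alignment ending at ai and bj • With proper initializations, Si,j can be computed as follows: $S_{i,j} = \max\{\, 0,\; S_{i-1,j} + w(a_i, -),\; S_{i,j-1} + w(-, b_j),\; S_{i-1,j-1} + w(a_i, b_j) \,\}$.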

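  22. Local alignment. Match: 8. Mismatch: -5. Gap symbol: -3. [Figure: local-alignment table with Sequence B (C G G A T C A T) across the columns and Sequence A (C T T A A C T) down the rows.]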

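  23. Local alignment. Match: 8. Mismatch: -5. Gap symbol: -3. [Figure: local-alignment table with Sequence B (C G G A T C A T) across the columns and Sequence A (C T T A A C T) down the rows, with the best score marked.]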

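  24. Sequence A:  A  -  C  -  T
     Sequence B:  A  T  C  A  T
     Scores:     +8 -3 +8 -3 +8 = 18 (the best score)
     [Figure: local-alignment table for Sequence A = CTTAACT (rows) and Sequence B = CGGATCAT (columns).]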
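The best local score can be reproduced with a small Python sketch of the local-alignment table, in which every entry is floored at 0 and the answer is the largest entry anywhere in the table. The scoring values come from slide 22; the function name is illustrative.

```python
def local_alignment_score(a, b, match=8, mismatch=-5, gap=-3):
    """Best score of any local alignment between a substring of a and a substring of b."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay at 0
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(0, pair, S[i - 1][j] + gap, S[i][j - 1] + gap)
            best = max(best, S[i][j])
    return best


print(local_alignment_score("CTTAACT", "CGGATCAT"))  # 18
```

Recovering the aligned substrings themselves would need a traceback from the best cell back to the first 0, which is omitted here to keep the sketch short.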

  25. Now try this example in class. Sequence A: CAATTGA. Sequence B: GAATCTGC. Their optimal local alignment?

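  26. Did you get it right? [Figure: local-alignment table with Sequence B (G A A T C T G C) across the columns and Sequence A (C A A T T G A) down the rows.]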

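  27. Sequence A:  A  A  T  -  T  G
     Sequence B:  A  A  T  C  T  G
     Scores:     +8 +8 +8 -3 +8 +8 = 37
     [Figure: local-alignment table for Sequence A = CAATTGA (rows) and Sequence B = GAATCTGC (columns).]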

  28. Affine gap penalties • Match: +8 (w(x, y) = 8, if x = y) • Mismatch: -5 (w(x, y) = -5, if x ≠ y) • Each gap symbol: -3 (w(-, x) = w(x, -) = -3) • Each gap is charged an extra gap-open penalty: -4.
     Sequence A:  C  -  -  -  T  T  A  A  C  T
     Sequence B:  C  G  G  A  T  C  A  -  -  T
     Scores:     +8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12
     The alignment contains two gaps, each charged -4. Alignment score: 12 - 4 - 4 = 4
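To make the gap-open bookkeeping concrete, the following Python sketch scores an already-aligned pair of rows: each column is scored as before, and every maximal run of dashes in either row is charged one extra gap-open penalty. The function name and keyword arguments are assumptions made for this example.

```python
import re


def affine_alignment_score(row_a, row_b, match=8, mismatch=-5,
                           gap_symbol=-3, gap_open=-4):
    """Score two gapped rows; each run of '-' in a row counts as one gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap_symbol
        else:
            total += match if x == y else mismatch
    # Count maximal runs of dashes separately in each row.
    gaps = len(re.findall(r"-+", row_a)) + len(re.findall(r"-+", row_b))
    return total + gap_open * gaps


print(affine_alignment_score("C---TTAACT", "CGGATCA--T"))  # 4
```

Setting `gap_symbol=0` turns the same function into a constant-gap scorer, in which only the number of gaps matters and not their lengths.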

  29. Affine gap penalties • A gap of length k is penalized x + k·y, where x is the gap-open penalty and y is the gap-symbol penalty. • Three cases for alignment endings: • ...x over ...x (an aligned pair) • ...x over ...- (a deletion) • ...- over ...x (an insertion)

  30. Affine gap penalties • Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion. • Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion. • Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

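  31. Affine gap penalties (A gap of length k is penalized x + k·y.) $D(i,j) = \max\{\, D(i-1,j) - y,\; S(i-1,j) - x - y \,\}$; $I(i,j) = \max\{\, I(i,j-1) - y,\; S(i,j-1) - x - y \,\}$; $S(i,j) = \max\{\, S(i-1,j-1) + w(a_i, b_j),\; D(i,j),\; I(i,j) \,\}$.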

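  32. Affine gap penalties. [Figure: transitions among the D, I, and S tables. Extending a gap (D to D, or I to I) costs -y; opening a gap (S to D, or S to I) costs -x-y; an aligned pair adds w(ai, bj) to S.]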
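The three tables can be filled together in one pass. The sketch below is a hedged Python illustration that assumes the standard three-state recurrences matching the transition weights listed on this slide (extend: -y, open: -x-y, aligned pair: w(ai, bj)); the defaults x = 4 and y = 3 are taken from the penalties on slide 28, and the function name is invented here.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=8, mismatch=-5, x=4, y=3):
    """Optimal global score when a gap of length k costs x + k*y."""
    m, n = len(a), len(b)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ends with a deletion
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ends with an insertion
    S[0][0] = 0
    for i in range(1, m + 1):          # a[:i] against one long deletion gap
        D[i][0] = S[i][0] = -x - i * y
    for j in range(1, n + 1):          # b[:j] against one long insertion gap
        I[0][j] = S[0][j] = -x - j * y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j] - y, S[i - 1][j] - x - y)
            I[i][j] = max(I[i][j - 1] - y, S[i][j - 1] - x - y)
            pair = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(pair, D[i][j], I[i][j])
    return S[m][n]


print(affine_global_score("CTTAACT", "CGGATCAT"))  # 10
```

With `x=0` the gap-open charge disappears and the result falls back to the simple per-symbol scheme.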

  33. Constant gap penalties • Match: +8 (w(x, y) = 8, if x = y) • Mismatch: -5 (w(x, y) = -5, if x ≠ y) • Each gap symbol: 0 (w(-, x) = w(x, -) = 0) • Each gap is charged a constant penalty: -4.
     Sequence A:  C  -  -  -  T  T  A  A  C  T
     Sequence B:  C  G  G  A  T  C  A  -  -  T
     Scores:     +8  0  0  0 +8 -5 +8  0  0 +8 = +27
     The alignment contains two gaps, each charged -4. Alignment score: 27 - 4 - 4 = 19

  34. Constant gap penalties • Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion. • Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion. • Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

  35. Constant gap penalties
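The recurrences for this slide did not survive in the transcript. As a hedged sketch, one common way to handle constant gap penalties is to reuse the D, I, S tables defined on the previous slide, charging the constant penalty only when a gap is opened and nothing when it is extended. The Python below assumes exactly that; the function name and the parameter `gap_cost` are introduced for this example and are not from the slides.

```python
NEG_INF = float("-inf")


def constant_gap_score(a, b, match=8, mismatch=-5, gap_cost=4):
    """Optimal global score when every gap, whatever its length, costs gap_cost."""
    m, n = len(a), len(b)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ends with a deletion
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ends with an insertion
    S[0][0] = 0
    for i in range(1, m + 1):
        D[i][0] = S[i][0] = -gap_cost    # one gap covering a[:i]
    for j in range(1, n + 1):
        I[0][j] = S[0][j] = -gap_cost    # one gap covering b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j], S[i - 1][j] - gap_cost)  # extend free, open costs
            I[i][j] = max(I[i][j - 1], S[i][j - 1] - gap_cost)
            pair = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(pair, D[i][j], I[i][j])
    return S[m][n]


print(constant_gap_score("CTTAACT", "CGGATCAT"))
```

Because the alignment on slide 33 scores 19 under this scheme, the optimal score returned for the same two sequences cannot be lower than 19.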

  36. Restricted affine gap penalties • A gap of length k is penalized x + f(k)·y, where f(k) = k for k <= c and f(k) = c for k > c. • Five cases for alignment endings: 1. ...x over ...x (an aligned pair) 2. ...x over ...- (a deletion) 3. ...- over ...x (an insertion) 4. and 5. for long gaps
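The penalty function itself is easy to state in code. In the Python sketch below, x = 4 and y = 3 reuse the magnitudes from the earlier affine example, while the cap c = 5 is an arbitrary value picked for illustration.

```python
def restricted_affine_penalty(k, x=4, y=3, c=5):
    """Penalty x + f(k)*y, where f(k) = k for k <= c and f(k) = c for k > c."""
    if k <= 0:
        return 0                      # no gap, no penalty
    return x + min(k, c) * y          # min(k, c) is exactly f(k)


for k in (1, 5, 9):
    print(k, restricted_affine_penalty(k))
```

Beyond length c the penalty stops growing, so very long gaps cost the same as a gap of length c.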

  37. Restricted affine gap penalties

  38. D(i, j) vs. D’(i, j) • Case 1: the best alignment ending at (i, j) with a deletion at the end has the last deletion gap of length <= c D(i, j) >= D’(i, j) • Case 2: the best alignment ending at (i, j) with a deletion at the end has the last deletion gap of length >= c D(i, j) <= D’(i, j)

  39. max{S(i, j) - x - ky, S(i, j) - x - cy} [Figure: the two quantities plotted against gap length k, with c marked on the k axis.]

  40. k best local alignments • Smith-Waterman (Smith and Waterman, 1981; Waterman and Eggert, 1987) • FASTA (Wilbur and Lipman, 1983; Lipman and Pearson, 1985) • BLAST (Altschul et al., 1990; Altschul et al., 1997)

  41. FASTA • Find runs of identities, and identify regions with the highest density of identities. • Re-score using PAM matrix, and keep top scoring segments. • Eliminate segments that are unlikely to be part of the alignment. • Optimize the alignment in a band.

  42. FASTA Step 1: Find runs of identities, and identify regions with the highest density of identities. [Figure: dot plot of Sequence A against Sequence B.]
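A simplified way to picture this step is to look up short identical words (k-tuples) shared by the two sequences and count how many fall on each diagonal i - j. The Python sketch below does only that; it is not the FASTA program's actual implementation, and the function name, the word length k = 2, and the use of the class example sequences are choices made for this illustration.

```python
from collections import Counter, defaultdict


def diagonal_hits(a, b, k=2):
    """Count identical k-tuples per diagonal (diagonal = start in a minus start in b)."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)          # where each word of a starts
    diagonals = Counter()
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):  # every start in a of the same word
            diagonals[i - j] += 1
    return diagonals


hits = diagonal_hits("CAATTGA", "GAATCTGC")
print(hits.most_common())
```

Diagonals with many hits are the candidate regions with the highest density of identities.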

  43. FASTA Step 2: Re-score using PAM matrix, and keep top scoring segments.

  44. FASTA Step 3: Eliminate segments that are unlikely to be part of the alignment.

  45. FASTA Step 4: Optimize the alignment in a band.

  46. BLAST • Basic Local Alignment Search Tool (by Altschul, Gish, Miller, Myers and Lipman) • The central idea of the BLAST algorithm is that a statistically significant alignment is likely to contain a high-scoring pair of aligned words.

  47. The maximal segment pair measure • A maximal segment pair (MSP) is defined to be the highest scoring pair of identical length segments chosen from 2 sequences (for DNA: identities: +5; mismatches: -4). • The MSP score may be computed in time proportional to the product of their lengths. (How?) An exact procedure is too time consuming. • BLAST heuristically attempts to calculate the MSP score.
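One possible answer to the "How?" above is a gap-free variant of the local-alignment table: each cell extends the segment pair ending on the previous diagonal cell, or restarts at 0. The Python sketch below assumes the DNA scores quoted on the slide (+5, -4); the function name is invented, and this is an illustration of the exact quadratic-time computation rather than what BLAST itself does.

```python
def msp_score(a, b, identity=5, mismatch=-4):
    """Highest score of any pair of equal-length, ungapped segments of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)            # scores for the previous row of the table
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend along the diagonal only, so no gaps are ever introduced.
            cur[j] = max(0, prev[j - 1] + (identity if x == y else mismatch))
            best = max(best, cur[j])
        prev = cur
    return best


print(msp_score("CAATTGA", "GAATCTGC"))  # 15, from the shared segment AAT
```

The two nested loops visit every (i, j) pair once, which gives the running time proportional to the product of the sequence lengths.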

  48. BLAST • Build the hash table for Sequence A. • Scan Sequence B for hits. • Extend hits.

  49. BLAST Step 1: Build the hash table for Sequence A (3-tuple example).
     For protein sequences: Seq. A = ELVIS. Add xyz to the hash table if Score(xyz, ELV) ≧ T; add xyz to the hash table if Score(xyz, LVI) ≧ T; add xyz to the hash table if Score(xyz, VIS) ≧ T.
     For DNA sequences: Seq. A = AGATCGAT (positions 1 to 8). Hash table entries from AAA to TTT, with the start positions of each 3-tuple in A: AAA (none), AAC (none), ..., AGA: 1, ..., ATC: 3, ..., CGA: 5, ..., GAT: 2, 6, ..., TCG: 4, ..., TTT (none).
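For the DNA case, the table on this slide can be reproduced with a few lines of Python. The sketch stores only the 3-tuples that actually occur in Sequence A and uses 1-based positions to match the slide; the function name is made up for this example, and the protein case with the threshold T is not covered.

```python
from collections import defaultdict


def build_word_index(seq, k=3):
    """Map every k-tuple of seq to the list of its 1-based start positions."""
    index = defaultdict(list)
    for start in range(len(seq) - k + 1):
        index[seq[start:start + k]].append(start + 1)
    return dict(index)


for word, positions in sorted(build_word_index("AGATCGAT").items()):
    print(word, positions)
```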

  50. BLAST Step 2: Scan sequence B for hits.
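Scanning can be sketched as sliding a window over Sequence B and looking each word up in the table built for Sequence A. In the Python below, Sequence A is the DNA example from the previous slide, while the Sequence B value `CGATCG` is a made-up input, since the transcript gives no Sequence B for this step; the function name is also illustrative.

```python
def scan_for_hits(seq_a, seq_b, k=3):
    """Return (position in A, position in B) pairs, 1-based, for every shared k-tuple."""
    index = {}
    for i in range(len(seq_a) - k + 1):
        index.setdefault(seq_a[i:i + k], []).append(i + 1)
    hits = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], []):
            hits.append((i, j + 1))
    return hits


# "CGATCG" is a hypothetical Sequence B used only for this illustration.
print(scan_for_hits("AGATCGAT", "CGATCG"))
```

Each reported pair is a seed that the next step of the algorithm would try to extend in both directions.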
