Sequence Alignment Tutorial #2

© Ydo Wexler & Dan Geiger. Sequence comparison: much of bioinformatics involves sequences (DNA, RNA, and protein), which we can think of as strings of letters. DNA & RNA: |alphabet|=4; protein: |alphabet|=20. Global alignment.

Presentation Transcript


  1. Sequence Alignment Tutorial #2. © Ydo Wexler & Dan Geiger.

  2. Sequence Comparison
     Much of bioinformatics involves sequences:
     • DNA sequences
     • RNA sequences
     • Protein sequences
     We can think of these sequences as strings of letters:
     • DNA & RNA: |alphabet|=4
     • Protein: |alphabet|=20

  3. Global Alignment
     Input: two sequences over the same alphabet
     Output: an alignment of the two sequences
     Example:
     • GCGCATGGATTGAGCGA and TGCGCCATTGATGACCA
     • A possible alignment:
       -GCGC-ATGGATTGAGCGA
       TGCGCCATTGAT-GACC-A

  4. Global Alignment
     Example (cont.):
       -GCGC-ATGGATTGAGCGA
       TGCGCCATTGAT-GACC-A
     Three elements:
     • Perfect matches
     • Mismatches
     • Insertions & deletions (indels)
     Symmetric view of evolution
     [Figure labels: hypotheses space; best biological explanation; biological data]

  5. Global Alignment scoring scheme
     Score each position independently:
     • Match: +1
     • Mismatch: -1
     • Indel: -2
     The score of an alignment is the sum of its position scores.
     Example:
       -GCGC-ATGGATTGAGCGA
       TGCGCCATTGAT-GACC-A
     Score: (+1x13) + (-1x2) + (-2x4) = 3
       ------GCGCATGGATTGAGCGA
       TGCGCC----ATTGATGACCA--
     Score: (+1x5) + (-1x6) + (-2x12) = -25
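The column-by-column scoring on this slide can be checked mechanically. The following is a minimal sketch, not part of the original tutorial: the function name `score_alignment` and its return shape are illustrative choices, and the default scores are the slide's +1 / -1 / -2 scheme.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, indel=-2):
    """Score a gapped alignment column by column; '-' marks a gap.

    Returns (total score, per-category column counts).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    counts = {"match": 0, "mismatch": 0, "indel": 0}
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            counts["indel"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    total = (match * counts["match"]
             + mismatch * counts["mismatch"]
             + indel * counts["indel"])
    return total, counts


total, counts = score_alignment("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A")
print(total, counts)  # 3 {'match': 13, 'mismatch': 2, 'indel': 4}
```

Returning the counts alongside the total makes it easy to compare against the hand tally for any other candidate alignment of the same two sequences.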

  6. Sequence Alignment Variants
     Two basic variants of sequence alignment:
     • Global alignment (Needleman-Wunsch)
     • Local alignment (Smith-Waterman)
     Today we’ll see:
     • Overlap alignment
     • Affine cost for gaps
     We’ll use the ideas of dynamic programming presented in the lecture.

  7. Overlap Alignment
     Consider the following problem:
     • Find the most significant overlap between two sequences S and T.
     • Possible overlap relations: a. b. [diagrams not included in the transcript]
     Difference from local alignment: here we require alignment between the endpoints of the two sequences.

  8. Overlap Alignment
     Formally: given S[1..n] and T[1..m], find i,j such that
       d = max{ D(S[1..i],T[j..m]), D(S[i..n],T[1..j]), D(S[1..n],T[i..j]), D(S[i..j],T[1..m]) }
     is maximal.
     Solution: same as global alignment, except that we do not penalise overhanging ends.

  9. Overlap Alignment
     • Initialization: V[i,0]=0, V[0,j]=0
     • Recurrence: as in global alignment
     • Score: the maximum value in the bottom row and the rightmost column
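The three ingredients above (zero initialization, the global-alignment recurrence, and a maximum over the last row and last column) can be sketched as follows. This is an illustrative implementation rather than the tutorial's own code: the name `overlap_score` is hypothetical, the recurrence is assumed to be the standard linear-gap one, and the default scores are taken from the PAWHEAE / HEAGAWGHEE example used on the example slides.

```python
def overlap_score(s, t, match=4, mismatch=-1, indel=-5):
    """Best overlap-alignment score of s and t with a linear gap cost."""
    n, m = len(s), len(t)
    # V[i][0] = V[0][j] = 0: a leading overhang of either sequence is free.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(
                V[i - 1][j - 1] + sub,   # align s[i] with t[j]
                V[i - 1][j] + indel,     # s[i] against a gap
                V[i][j - 1] + indel,     # t[j] against a gap
            )
    # A trailing overhang is free too, so the alignment may end anywhere
    # in the bottom row or the rightmost column.
    best_bottom = max(V[n])
    best_right = max(row[m] for row in V)
    return max(best_bottom, best_right)


print(overlap_score("PAWHEAE", "HEAGAWGHEE"))  # 11
```

The only differences from a global-alignment table are the zeroed first row and column and the choice of where to read off the final score; the inner recurrence is unchanged.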

  10. Overlap Alignment (Example)
      S = PAWHEAE
      T = HEAGAWGHEE
      Scoring scheme:
      • Match: +4
      • Mismatch: -1
      • Indel: -5

  11. Overlap Alignment (Example)
      S = PAWHEAE
      T = HEAGAWGHEE
      Scoring scheme:
      • Match: +4
      • Mismatch: -1
      • Indel: -5

  12. Overlap Alignment (Example)
      S = PAWHEAE
      T = HEAGAWGHEE
      Scoring scheme:
      • Match: +4
      • Mismatch: -1
      • Indel: -5

  13. Overlap Alignment (Example)
      Scoring scheme:
      • Match: +4
      • Mismatch: -1
      • Indel: -5
      The best overlap is:
        PAWHEAE------
        ---HEAGAWGHEE
      Pay attention! A different scoring scheme could yield a different result. For example, with Indel: -2:
        ---PAW-HEAE
        HEAGAWGHEE-
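To see why the preferred overlap changes with the scoring scheme, the two alignments on this slide can be scored directly while ignoring the overhanging ends. The sketch below is illustrative only: `score_overlap_alignment` is a hypothetical helper, and it assumes that the stray "-2" on the slide is the alternative indel score.

```python
def score_overlap_alignment(top, bottom, match=4, mismatch=-1, indel=-5):
    """Score a gapped alignment, treating leading/trailing overhangs as free."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    cols = list(zip(top, bottom))

    def overhang(columns):
        # Length of the initial run of gaps in whichever row starts with a gap.
        if not columns or "-" not in columns[0]:
            return 0
        row = columns[0].index("-")
        k = 0
        while k < len(columns) and columns[k][row] == "-":
            k += 1
        return k

    start = overhang(cols)
    end = len(cols) - overhang(cols[::-1])
    total = 0
    for a, b in cols[start:end]:
        if a == "-" or b == "-":
            total += indel      # internal gaps are still penalised
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


first = ("PAWHEAE------", "---HEAGAWGHEE")
second = ("---PAW-HEAE", "HEAGAWGHEE-")
for indel in (-5, -2):
    print(indel,
          score_overlap_alignment(*first, indel=indel),
          score_overlap_alignment(*second, indel=indel))
# -5 11 9
# -2 11 12
```

The first alignment has no internal gap, so its score stays at 11; the second contains one internal gap, so it overtakes the first only once the indel penalty is mild enough.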

  14. Affine gap scores
      • Observation: insertions and deletions often occur in blocks longer than a single nucleotide.
      • Consequence:
        • The current scoring scheme gives a constant penalty per gap unit.
        • This does not score the above phenomenon well.
      Question: how do we modify the scheme to incorporate this?

  15. Alignment with affine gap scores
      • Penalty score for a gap of length g:
        d - penalty for introducing a gap
        e - penalty for elongating the gap by one unit
        Typically d > e
      • Problem: when aligning S[i] to a gap, we do not know whether to penalize by d or by e.
      Solution: we compute 3 matrices simultaneously:
        M(i,j) - the score obtained by aligning S[i] to T[j]
        IS(i,j) - the score obtained by aligning S[i] to a gap
        IT(i,j) - the score obtained by aligning T[j] to a gap
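The effect of d and e on gap blocks can be illustrated with a small comparison. The transcript does not preserve the slide's formula, so this sketch assumes the common convention that a gap of length g costs d + (g-1)·e; the function names and the values d=3, e=1 and 2 per unit are arbitrary illustrative choices.

```python
def linear_gap_penalty(g, per_unit=2):
    """Constant penalty per gap unit (the scheme used so far)."""
    return per_unit * g


def affine_gap_penalty(g, d=3, e=1):
    """Assumed affine form: d for the first gap unit, e for each further unit."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return d + (g - 1) * e


for g in (1, 2, 5):
    print(g, linear_gap_penalty(g), affine_gap_penalty(g))
# 1 2 3
# 2 4 4
# 5 10 7
```

With d > e, a single long gap becomes cheaper than the same number of gap units under the linear scheme, while opening many short gaps becomes more expensive.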

  16. Affine gap scores
      • Initialization: depending on the problem (global, local, …)
      • Recurrence: uses already known values M(i’,j’), IS(i’,j’), IT(i’,j’)
      We assume that a deletion will not be followed directly by an insertion. This can be obtained by using [condition not preserved in the transcript]
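A three-matrix computation along these lines might look as follows for the global case. This is a sketch under stated assumptions, not the tutorial's own recurrence (which the transcript does not preserve): it uses the standard three-state form consistent with the definitions of M, IS and IT, charges d + (g-1)·e for a gap of length g, disallows direct IS-to-IT transitions in line with the stated assumption, and uses illustrative default scores. The function name `affine_global_score` is hypothetical.

```python
NEG_INF = float("-inf")


def affine_global_score(s, t, match=1, mismatch=-1, d=3, e=1):
    """Global alignment score with affine gaps, using matrices M, IS, IT."""
    n, m = len(s), len(t)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # S[i] aligned to T[j]
    IS = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # S[i] aligned to a gap
    IT = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # T[j] aligned to a gap

    # Global initialization (an assumption): leading gaps are penalised.
    M[0][0] = 0
    for i in range(1, n + 1):
        IS[i][0] = -d - (i - 1) * e
    for j in range(1, m + 1):
        IT[0][j] = -d - (j - 1) * e

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # A match/mismatch column may follow any of the three states.
            M[i][j] = max(M[i - 1][j - 1], IS[i - 1][j - 1], IT[i - 1][j - 1]) + sub
            # Open a new gap (cost d) or extend the current one (cost e).
            IS[i][j] = max(M[i - 1][j] - d, IS[i - 1][j] - e)
            IT[i][j] = max(M[i][j - 1] - d, IT[i][j - 1] - e)

    return max(M[n][m], IS[n][m], IT[n][m])


print(affine_global_score("AAATTTCCC", "AAACCC"))  # 1
```

In the usage line, the best alignment keeps all six matches and pays d + 2e = 5 for one gap of length three; keeping separate IS and IT matrices is what lets the recurrence tell an opened gap from an extended one.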

  17. Affine gap scores
      • Simplification: why are two matrices enough?
