Global Sequence Alignment. Fall 2003. Dr. Susan Bridges. Dynamic Programming Algorithm. Algorithm for finding optimal alignment in O(n^2) time and space. Based on an algorithm for text editing. Uses alignment of prefixes to build up a global alignment. Requires a method for scoring alignments.
Global Sequence Alignment Fall 2003 Dr. Susan Bridges Department of Computer Science and Engineering Bioinformatics
Dynamic Programming Algorithm
• Algorithm for finding optimal alignment in O(n^2) time and space
• Based on an algorithm for text editing
• Uses alignment of prefixes to build up a global alignment
• Requires a method for scoring alignments
Example*
• Align the following sequences:
AAAC
AGG
• Use the following for scoring:
• Match +1
• Mismatch -1
• Insert space -2
*Adapted from: Setubal, J. and J. Meidanis, Introduction to Computational Molecular Biology, 1997, Brooks-Cole
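As a minimal sketch of the scoring scheme above, the following Python function scores two already-aligned strings column by column. The function name `score_alignment` and the use of `_` to mark a space are assumptions made for illustration; they are not part of the original slides.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, space=-2):
    """Score two equal-length aligned strings; '_' marks an inserted space.

    Hypothetical helper: the defaults are the example's scores
    (match +1, mismatch -1, insert space -2).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "_" or y == "_":
            total += space      # a column containing a space
        elif x == y:
            total += match      # identical symbols
        else:
            total += mismatch   # two different symbols
    return total


# One possible alignment of the example sequences:
# A/A (+1), A/G (-1), A/G (-1), C/_ (-2)
print(score_alignment("AAAC", "AGG_"))  # -3
```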
Sequences to align AAAC AGG The stars are used to indicate the smallest (empty) prefix of each string. The cost of aligning an empty string with an empty string is 0, as indicated by the entry in a[0,0].
Sequences to align AAAC AGG Now we can fill in the first row and first column. Entry a[0,1] should contain the cost of aligning A with an empty string. That will require the insertion of a space, and the cost is -2.
_
A
Sequences to align AAAC AGG We need to keep track of how we get the value in each entry. In this case, we added a space and an A to the two empty strings. The empty strings had been aligned with a cost of 0 (a[0,0]), and we add -2 to this value to get the -2 entry in a[0,1]. We use arrows to show which previous value was used.
Sequences to align AAAC AGG Next we need to align AG with the empty string. This is done by adding another space to the first string:
_ _
A G
The cost is -2 for the new space + (-2) from the previous alignment, for a total of -4.
Sequences to align AAAC AGG The last entry in the row is filled in a similar fashion. The alignment
_ _
A G
is extended by a space and another G to get
_ _ _
A G G
at a cost of -6.
Sequences to align AAAC AGG We can now fill in the first column in the same manner. For entry a[1,0], we are trying to align an A with the empty sequence. This is done by adding a space to the second sequence:
A
_
Sequences to align AAAC AGG The remainder of the column is completed in a similar manner
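The first row and first column can be set up in a few lines. This is only a sketch: the name `init_table` and the use of `None` for entries that have not been computed yet are assumptions for illustration, with the first sequence indexing the rows and the second indexing the columns as in the slides.

```python
def init_table(s, t, space=-2):
    """Build the table a with only row 0 and column 0 filled in.

    Hypothetical helper: s labels the rows, t labels the columns,
    and None marks an entry that still has to be computed.
    """
    rows, cols = len(s) + 1, len(t) + 1
    a = [[None] * cols for _ in range(rows)]
    a[0][0] = 0  # empty string aligned with empty string
    for j in range(1, cols):
        a[0][j] = a[0][j - 1] + space  # one more space in the first string
    for i in range(1, rows):
        a[i][0] = a[i - 1][0] + space  # one more space in the second string
    return a


a = init_table("AAAC", "AGG")
print(a[0])                    # first row:    [0, -2, -4, -6]
print([row[0] for row in a])   # first column: [0, -2, -4, -6, -8]
```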
For the entry a[1,1] we need to find the best way to extend current alignments in order to take care of the two A’s. We have three choices.
• Use the information in the cell directly above and add an A to the first string and a space to the second string. score = -2 + -2 = -4
_ A
A _
• Use the information in the cell directly to the left: add a space to the first string and an A to the second string. score = -2 + -2 = -4
A _
_ A
• Use the information up and to the left and add an A to both empty strings. score = 0 + 1 = 1
A
A
We select the highest score, record this in the cell, and point back to the cell that was used in the computation. max(-4, -4, 1) = 1
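General procedure: the entry a[i,j] is computed from the three neighboring entries a[i-1,j] (above), a[i-1,j-1] (up and to the left), and a[i,j-1] (left):
a[i,j] = max(a[i-1,j] - 2, a[i-1,j-1] + p(i,j), a[i,j-1] - 2)
where p(i,j) is +1 if the i-th symbol of the first sequence matches the j-th symbol of the second sequence, and -1 otherwise.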
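The three choices described for a[1,1] apply to every interior cell, so the whole table can be filled with two nested loops. The sketch below assumes the example's scores as defaults; the name `fill_table` is hypothetical and not taken from the slides.

```python
def fill_table(s, t, match=1, mismatch=-1, space=-2):
    """Fill the dynamic programming table for sequences s (rows) and t (columns).

    Hypothetical helper illustrating the general procedure.
    """
    rows, cols = len(s) + 1, len(t) + 1
    a = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        a[0][j] = j * space  # first row: t aligned against spaces only
    for i in range(1, rows):
        a[i][0] = i * space  # first column: s aligned against spaces only
    for i in range(1, rows):
        for j in range(1, cols):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = max(
                a[i - 1][j] + space,   # from the cell above
                a[i - 1][j - 1] + p,   # from the cell up and to the left
                a[i][j - 1] + space,   # from the cell to the left
            )
    return a


table = fill_table("AAAC", "AGG")
print(table[1][1])  # 1, the value worked out by hand for a[1,1]
```

Both loops visit each cell once, which is where the O(n^2) time and space comes from.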
Let’s just apply the formula this time. a[1,2] = max(-4 + -2, -2 - 1, 1 - 2) = max(-6, -3, -1) = -1. The maximum value is -1.
Let’s apply the formula again for the next entry. a[1,3] = max(-6 + -2, -4 - 1, -1 - 2) = max(-8, -5, -3) = -3. The maximum value is -3.
a[2,1] = max(1 + -2, -2 + 1, -4 - 2) = max(-1, -1, -6) = -1. The maximum value is -1. But where should we draw the arrow? See next slide.
By convention, the precedence of directions is: 1st up, 2nd diagonal, 3rd left. Since the first -1 came from the cell above, that is where we draw the arrow.
See if you can complete filling out the table before you go on to the next slide.
Now how do we find the actual alignment? We follow the arrows.
• Following an arrow up means take the symbol from the row for the first sequence and a space for the second sequence.
• Following an arrow on the diagonal means take the letter from the row for the first sequence and the letter from the column for the second sequence.
• Following an arrow left means put a space in the first sequence and use the letter from the column for the second.
We know that the score for the final alignment is -3.
Start from the bottom right and build the alignment from right to left:
C
_

A C
G _

A A C
G G _

A A A C
A G G _
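Putting the pieces together, the sketch below fills the table, records one arrow per cell, and then follows the arrows back from the last cell to a[0,0]. It assumes the example's scores and a tie-breaking order of up, diagonal, left; the name `global_align` and the `_` space symbol are illustrative choices rather than anything defined in the slides.

```python
def global_align(s, t, match=1, mismatch=-1, space=-2):
    """Return (score, aligned s, aligned t) for a global alignment of s and t.

    Hypothetical helper: s labels the rows, t labels the columns, and ties
    are broken in the assumed order up, diagonal, left.
    """
    rows, cols = len(s) + 1, len(t) + 1
    a = [[0] * cols for _ in range(rows)]
    arrow = [[None] * cols for _ in range(rows)]
    for j in range(1, cols):
        a[0][j], arrow[0][j] = j * space, "left"
    for i in range(1, rows):
        a[i][0], arrow[i][0] = i * space, "up"
    for i in range(1, rows):
        for j in range(1, cols):
            p = match if s[i - 1] == t[j - 1] else mismatch
            options = [
                (a[i - 1][j] + space, "up"),
                (a[i - 1][j - 1] + p, "diag"),
                (a[i][j - 1] + space, "left"),
            ]
            best = max(score for score, _ in options)
            a[i][j] = best
            # The first option reaching the maximum decides the arrow.
            arrow[i][j] = next(d for score, d in options if score == best)

    # Follow the arrows from the last cell back to a[0][0],
    # collecting alignment columns from right to left.
    top, bottom = [], []
    i, j = len(s), len(t)
    while i > 0 or j > 0:
        direction = arrow[i][j]
        if direction == "up":      # symbol of s over a space
            top.append(s[i - 1])
            bottom.append("_")
            i -= 1
        elif direction == "diag":  # symbol of s over symbol of t
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i -= 1
            j -= 1
        else:                      # space over symbol of t
            top.append("_")
            bottom.append(t[j - 1])
            j -= 1
    return a[len(s)][len(t)], "".join(reversed(top)), "".join(reversed(bottom))


print(global_align("AAAC", "AGG"))  # (-3, 'AAAC', 'AGG_')
```

Storing the arrows in a second table keeps the traceback simple; an alternative is to recompute the three candidates at each step of the walk back and save that memory.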