Final presentation: Tandem Cyclic Alignment
Sequence Alignment • Needleman-Wunsch algorithm – global alignment, fixed gap penalty • Waterman-Smith-Beyer algorithm – local alignment, affine gap penalty function • Gotoh’s algorithm – local alignment, affine gap penalty function
Gotoh’s Algorithm – (Local Alignment) Consider the gapless sequences a and b. Let g(k) = α + kβ be an affine gap penalty function for a gap of length k, and let w(a_i, b_j) be a cost function. D is the distance matrix. P is the matrix of minimal distances over all alignments in which b ends in a gap. Q is the matrix of minimal distances over all alignments in which a ends in a gap.
Gotoh’s Algorithm • Uses dynamic programming with three matrices (D, P and Q) instead of one. • The traceback must track movement through all three matrices.
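The three-matrix scheme can be sketched as follows. This is a minimal illustration, not the presenters' implementation: it assumes the global, distance-minimizing form of the recurrences, a unit mismatch cost for w, and gap constants written as alpha and beta. The function name and default values are hypothetical, and the traceback is omitted.

```python
import math

def gotoh_distance(a, b, alpha=2.0, beta=1.0, mismatch=1.0):
    """Minimal global alignment distance with gap cost g(k) = alpha + k*beta.

    Assumed cost function: w(x, y) = 0 if x == y, else `mismatch`.
    """
    inf = math.inf
    n, m = len(a), len(b)
    # D: best distance overall; P: b ends in a gap; Q: a ends in a gap.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    P = [[inf] * (m + 1) for _ in range(n + 1)]
    Q = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):          # a[:i] aligned against one long gap
        P[i][0] = alpha + i * beta
        D[i][0] = P[i][0]
    for j in range(1, m + 1):          # b[:j] aligned against one long gap
        Q[0][j] = alpha + j * beta
        D[0][j] = Q[0][j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap (cost g(1)) or extend an existing one (cost beta).
            P[i][j] = min(D[i - 1][j] + alpha + beta, P[i - 1][j] + beta)
            Q[i][j] = min(D[i][j - 1] + alpha + beta, Q[i][j - 1] + beta)
            w = 0.0 if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1] + w, P[i][j], Q[i][j])
    return D[n][m]
```

With these defaults, aligning "ACGT" against "AT" costs one gap of length 2, that is alpha + 2*beta = 4, which is cheaper than two separate gaps of length 1.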
Tandem Repeats • Tandem repeats are a special class of repeats with very short repeat units, often only a few nucleotides long. • For example, one tandem repeat in the human genome consists of hundreds of copies of the 6-nucleotide unit TTAGGG. Such repeats are often called microsatellites. • In eukaryotic genomes, repeats with longer units of up to 25 nucleotides (called minisatellites) are also abundant. They are located mostly in non-transcribed regions.
Finding Tandem Repeats • A straightforward way to find tandem repeats with a repeat unit of length k is to look for consecutive exact occurrences of a pattern of length k. This can be done efficiently. • However, some of the repeat units are often mutated, so mismatches must be allowed when looking for these imperfect repeats. • Obtaining an efficient algorithm becomes much more difficult as the number of allowed mismatches increases.
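The exact-match approach in the first bullet can be sketched as a simple left-to-right scan. This is an illustrative, unoptimized version; the function name, the tuple layout of the result, and the `min_copies` parameter are assumptions made for the example, and k is assumed to be at least 1.

```python
def exact_tandem_repeats(s, k, min_copies=2):
    """Find runs of consecutive exact copies of a length-k unit in s.

    Returns a list of (start, unit, copies) tuples, scanning left to right
    and skipping past each run once it has been reported.
    """
    hits = []
    i, n = 0, len(s)
    while i + 2 * k <= n:              # need room for at least two copies
        unit = s[i:i + k]
        copies = 1
        # Extend the run while the next block of k characters equals the unit.
        while s[i + copies * k:i + (copies + 1) * k] == unit:
            copies += 1
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i += copies * k            # continue after the reported run
        else:
            i += 1
    return hits
```

For example, `exact_tandem_repeats("CCTTAGGGTTAGGGTTAGGGAC", 6)` reports one run of three TTAGGG units starting at index 2. A single mutated base inside a unit would split or hide the run, which is the limitation the second bullet points out.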
Finding Tandem Repeats by Alignment • If the dominant repeating pattern is known, another way to locate imperfect repeats is to solve the following alignment problem: • Let p be a pattern of length m (the repeat unit) and s be a sequence of length n (the search string). Let p^n be the concatenation of p with itself n times. Finding an imperfect tandem repeat is equivalent to finding an optimal local alignment between p^n and s. [Figure: repeated copies p p p … aligned against s]
Local alignment [Figure: alignment matrix of S against repeated copies of P, with boundary cells initialized to 0]
Wraparound Method O(mn) • When aligning a sequence with tandem repeats, use the "wraparound" method to reduce the number of calculations. • When implementing the wraparound method, treat the section with tandem repeats separately. • Write the repeated sequence only once in the similarity matrix. • Align as usual, except that on reaching the end of the repeated sequence, the last value is used as the first value in the next row; repeat this procedure for every row.
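A minimal sketch of the wraparound idea is given below. To keep it short it assumes a linear gap penalty (not the affine penalty of Gotoh's algorithm) and a similarity score that is maximized; the function name and the scoring defaults are hypothetical. The pattern p is written only once, and column indices wrap from the last pattern position back to the first.

```python
def wraparound_local_score(s, p, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of s against p repeated indefinitely.

    Uses O(len(s) * len(p)) time by keeping a single copy of p and
    letting the last column feed the first one (wraparound).
    """
    m = len(p)
    prev = [0] * m                     # previous row; index j = position in p
    best = 0
    for ch in s:
        cur = [0] * m
        # First pass: ordinary local-alignment recurrence. For j == 0 the
        # diagonal predecessor prev[j - 1] is prev[-1], the last column of
        # the previous row, which is exactly the wraparound step.
        for j in range(m):
            sub = match if ch == p[j] else mismatch
            diag = prev[j - 1] + sub
            up = prev[j] + gap
            left = cur[j - 1] + gap if j > 0 else 0
            cur[j] = max(0, diag, up, left)
        # Second pass: let a gap in s run past the end of p into column 0
        # and onward; cur[-1] is the last column of the current row.
        for j in range(m):
            cur[j] = max(cur[j], cur[j - 1] + gap)
        best = max(best, max(cur))
        prev = cur
    return best
```

For instance, with p = "ACGT" and s = "ACGTCGT" (the second unit is missing its A), the second pass is what allows the skipped A to be charged as a single gap, giving a score of 12 with the default scores.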
Wraparound Algorithm • When developing a dynamic programming implementation of the wraparound algorithm, there is a problem with determining the Q matrix. • To define Q_{i,1}, it is necessary to know Q_{i,|b|}. • Hence, two passes over each row are needed to determine Q correctly.
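The circular dependency can be made explicit. The slides do not show the recurrences, so the following assumes the usual affine form with the gap penalty written as g(k) = α + kβ, and with b written once as the repeated pattern so that column 0 is identified with column |b|:

```latex
\begin{align*}
Q_{i,j} &= \min\bigl\{\, D_{i,j-1} + g(1),\; Q_{i,j-1} + \beta \,\bigr\},
  & 2 \le j \le |b|, \\
Q_{i,1} &= \min\bigl\{\, D_{i,|b|} + g(1),\; Q_{i,|b|} + \beta \,\bigr\}.
\end{align*}
```

Under this assumption, row i of Q depends on its own last entry. A first pass computes the row without the wrapped term, and a second pass revisits the row using the Q_{i,|b|} and D_{i,|b|} values obtained in the first.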
Cyclic global alignment O(n^2 m) • Given sequences X and Y, • find the best-scoring alignment of X^[i] vs Y over all possible i, • 1 <= i <= |X|, where X^[i] is the cyclic permutation of X starting at position i, and all of Y and exactly one whole (cyclically permuted) copy of X must occur in the alignment. [Figure: X aligned against Y]
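The O(n^2 m) bound corresponds to the direct approach of trying every rotation of X and running one global alignment per rotation. A minimal sketch follows, assuming a linear gap penalty, simple match/mismatch scores, a non-empty X, and 0-based shifts (the slide counts from 1); both function names are hypothetical.

```python
def global_score(x, y, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(y) + 1)]
    for i, cx in enumerate(x, 1):
        cur = [i * gap]
        for j, cy in enumerate(y, 1):
            sub = match if cx == cy else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def cyclic_global_alignment(x, y, **scores):
    """Return (shift, score) for the rotation of x that best aligns with y.

    Runs one O(nm) alignment for each of the n rotations: O(n^2 m) total.
    """
    best_shift, best_score = 0, None
    for i in range(len(x)):
        rotated = x[i:] + x[:i]        # cyclic permutation starting at i
        score = global_score(rotated, y, **scores)
        if best_score is None or score > best_score:
            best_shift, best_score = i, score
    return best_shift, best_score
```

For example, `cyclic_global_alignment("GTAC", "ACGT")` returns shift 2 with score 4, because rotating GTAC by two positions yields ACGT.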
O(nm log n) [Figure: Y aligned against repeated copies of X, with columns C-1, C and C+1 marked]