
Sequence Alignment II


portia


Presentation Transcript


  1. Sequence Alignment II CIS 667 Spring 2004

  2. Optimal Alignments • So we know how to compute the similarity between two sequences • How do we construct an alignment that gives that similarity? • We will use the (already computed) array from the previous algorithm • Start at entry (m, n) and repeat the choices made to get the similarity score • Note that sometimes we had more than one choice giving the same optimal score

  3. Optimal Alignments • Each choice gives one column of the alignment • If we have two or three choices, we systematically choose one of them • We will use a recursive algorithm • The algorithm will produce two arrays - align-s and align-t • The elements of these arrays are either spaces or symbols from the sequences

  4. Algorithm Align
     input: indices i, j, array a given by algorithm Similarity
     output: alignment in align-s, align-t, and length in len
     if i = 0 and j = 0 then
         len ← 0
     else if i > 0 and a[i, j] = a[i - 1, j] + g then
         Align(i - 1, j, len)
         len ← len + 1
         align-s[len] ← s[i]
         align-t[len] ← -
     else if i > 0 and j > 0 and a[i, j] = a[i - 1, j - 1] + p(i, j) then
         Align(i - 1, j - 1, len)
         len ← len + 1
         align-s[len] ← s[i]
         align-t[len] ← t[j]
     else // j > 0 and a[i, j] = a[i, j - 1] + g
         Align(i, j - 1, len)
         len ← len + 1
         align-s[len] ← -
         align-t[len] ← t[j]
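The procedure above can be sketched in Python roughly as follows. This is a minimal sketch, not the course's code: the function names, the 0-based string indexing, and the scoring values (match +1, mismatch -1, g = -2) are assumptions made for illustration. The traceback is written as a loop instead of a recursion, so it collects columns from right to left and reverses them at the end; the order in which the three choices are tried is the same as on the slide.

```python
def pair_score(x, y, match=1, mismatch=-1):
    """p(i, j): score for aligning symbol x with symbol y (assumed values)."""
    return match if x == y else mismatch


def similarity_matrix(s, t, g=-2):
    """Fill the (m+1) x (n+1) array a; a[i][j] is the similarity of s[:i] and t[:j]."""
    m, n = len(s), len(t)
    a = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        a[i][0] = i * g          # s[:i] against the empty prefix: i spaces
    for j in range(1, n + 1):
        a[0][j] = j * g          # empty prefix against t[:j]: j spaces
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a[i][j] = max(
                a[i - 1][j] + g,
                a[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),
                a[i][j - 1] + g,
            )
    return a


def align(s, t, a, g=-2):
    """Walk back from a[m][n], repeating the choice that produced each entry."""
    i, j = len(s), len(t)
    align_s, align_t = [], []
    while i > 0 or j > 0:
        if i > 0 and a[i][j] == a[i - 1][j] + g:
            # s[i] aligned with a space
            align_s.append(s[i - 1])
            align_t.append("-")
            i -= 1
        elif i > 0 and j > 0 and a[i][j] == a[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]):
            # s[i] aligned with t[j]
            align_s.append(s[i - 1])
            align_t.append(t[j - 1])
            i -= 1
            j -= 1
        else:
            # j > 0 and a[i][j] == a[i][j - 1] + g: space aligned with t[j]
            align_s.append("-")
            align_t.append(t[j - 1])
            j -= 1
    # Columns were collected from the last to the first, so reverse them.
    return "".join(reversed(align_s)), "".join(reversed(align_t))
```

Calling `align(s, t, similarity_matrix(s, t))` returns the two rows of one optimal alignment; when several choices tie, the fixed order of the tests decides which one is reported.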

  5. Algorithm Complexity • First algorithm (Similarity) has four loops • O(m), O(n), O(mn) • So complexity is: O(m) + O(n) + O(mn) = O(mn), which is O(n^2) when m and n are about the same size • Second algorithm (Align) is • O(len) = O(m + n)

  6. Local Comparison • A local alignment between s and t is an alignment between a substring of s and a substring of t • We want to find the highest scoring local alignment between two sequences • Modify the original algorithm so that each entry (i, j) of the matrix will hold the highest score of an alignment between a suffix of s[1..i] and a suffix of t[1..j]

  7. Local Comparison • First row and column initialized to 0 • We now fill in the other elements of a as before, choosing the maximum of, now, 4 values • We have the previous three choices, plus a fourth choice - 0 • We always have the choice zero, by aligning the two empty suffixes • Find the alignment same way as before, but stop if we reach an entry with value zero • Start search at the largest value in the array
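The modification described in the two Local Comparison slides could look roughly like this in Python. It is a sketch under assumptions: the function name `local_align` and the scoring values (match +1, mismatch -1, gap -2) are illustrative and do not come from the slides.

```python
def local_align(s, t, match=1, mismatch=-1, g=-2):
    """Return (score, aligned_s, aligned_t) for one best local alignment."""
    m, n = len(s), len(t)
    # First row and column stay 0: the empty suffixes can always be aligned.
    a = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            # The previous three choices plus the fourth choice, 0.
            a[i][j] = max(0, a[i - 1][j] + g, a[i - 1][j - 1] + p, a[i][j - 1] + g)
            if a[i][j] > best:
                best, best_i, best_j = a[i][j], i, j

    # Start at the largest entry and stop at the first entry with value zero.
    i, j = best_i, best_j
    out_s, out_t = [], []
    while a[i][j] != 0:
        p = match if s[i - 1] == t[j - 1] else mismatch
        if a[i][j] == a[i - 1][j] + g:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        elif a[i][j] == a[i - 1][j - 1] + p:
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i -= 1
            j -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(out_s)), "".join(reversed(out_t))
```

A nonzero entry can only occur for i > 0 and j > 0, so the traceback never reads outside the array. With these assumed scores, `local_align("TTACGTAA", "GGACGTCC")` picks out the shared substring `ACGT`.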

  8. Local Alignment with match: +1, mismatch: -1, gap: 0

  9. Semiglobal Comparisons • The basic algorithm compares two sequences in their entirety • A gap penalty is assessed whether the gap is in the middle or at the end of a sequence • Not always desirable • Suppose we want to search for the short sequence ACGT within the longer sequence AAACACGTGTCC:
       AAACACGTGTCC
       ----ACGT----

  10. Semiglobal Comparisons • We don’t want to penalize gaps at the ends as we do those in the middle, since end gaps don’t have biological significance • They usually result from incomplete data acquisition • This approach is known as semiglobal alignment • We can modify the basic algorithm for this type of alignment
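To see what charging for end gaps does to the ACGT example, the alignment can be scored column by column. The helper below is a hypothetical illustration; its name and the scoring values (match +1, mismatch -1, gap -2) are assumptions, not values given in the slides.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, g=-2, charge_end_gaps=True):
    """Sum the column scores of an alignment given as two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("alignment rows must have the same length")
    lo, hi = 0, len(top)
    if not charge_end_gaps:
        # Skip leading and trailing columns that contain a space.
        while lo < hi and "-" in (top[lo], bottom[lo]):
            lo += 1
        while hi > lo and "-" in (top[hi - 1], bottom[hi - 1]):
            hi -= 1
    total = 0
    for x, y in zip(top[lo:hi], bottom[lo:hi]):
        if x == "-" or y == "-":
            total += g
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


top, bottom = "AAACACGTGTCC", "----ACGT----"
print(score_alignment(top, bottom))                         # -12: 4 matches, 8 charged spaces
print(score_alignment(top, bottom, charge_end_gaps=False))  # 4: only the 4 matches count
```

Under these assumed scores the eight end spaces turn a perfect hit into a score of -12, which is the effect semiglobal alignment is meant to remove.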

  11. Semiglobal Comparisons • Suppose we don’t want to charge for spaces after the last character of s • Consider an optimal alignment • Spaces after the end of s are matched with a suffix of t • Removing final part of alignment, we have an alignment between s and a prefix of t • So find optimal alignment between s and a prefix of t - but these are already computed in last row of a! • So take max value from last row of a

  12. Semiglobal Comparisons • Suppose we don’t want to charge for spaces after the last character of t • Consider an optimal alignment • Spaces after the end of t are matched with a suffix of s • Removing final part of alignment, we have an alignment between t and a prefix of s • So find optimal alignment between t and a prefix of s - but these are already computed in last column of a! • So take max value from last column of a

  13. Semiglobal Comparisons • What about spaces at the beginning of s and t? • These are represented by the values in the first row and column of a • So, if we don’t want to charge for them, just initialize this row and column to be all 0 • So the changes to the basic algorithm are: • Initialize the first row and first column of a (index 0) to zero • Look for the maximum in the last row or last column
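Putting the two changes together, a semiglobal score that ignores end spaces in both sequences might be computed as in the sketch below. The function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions made for illustration; to forgive only some of the end spaces, apply only the corresponding change (zero row, zero column, last-row maximum, or last-column maximum).

```python
def semiglobal_score(s, t, match=1, mismatch=-1, g=-2):
    """Best score when spaces before and after either sequence are not charged."""
    m, n = len(s), len(t)
    # Change 1: the first row and first column are all 0 instead of multiples of g.
    a = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = max(a[i - 1][j] + g, a[i - 1][j - 1] + p, a[i][j - 1] + g)
    # Change 2: take the maximum over the last row and the last column
    # instead of reading a[m][n].
    last_row = max(a[m])
    last_col = max(row[n] for row in a)
    return max(last_row, last_col)


print(semiglobal_score("AAACACGTGTCC", "ACGT"))  # 4
```

Unlike the local variant, interior entries may still go negative here; only the borders and the choice of the final entry differ from the basic algorithm.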
