
Bioinformatics Algorithms

Bioinformatics Algorithms. Final presentation: A generalized global alignment algorithm. Computer Science, 3rd year, B90902105, 高孟駿; Electrical Engineering, 4th year, B89901143, 李庭諭. Outline: Introduction, Alignment model, Algorithm (including correctness proof), Complexity analysis, Discussion. Motivation.


Presentation Transcript


  1. Bioinformatics Algorithms. Final presentation: A generalized global alignment algorithm. Computer Science, 3rd year, B90902105, 高孟駿; Electrical Engineering, 4th year, B89901143, 李庭諭

  2. Outline
     • Introduction
     • Alignment model
     • Algorithm (including correctness proof)
     • Complexity analysis
     • Discussion

  3. Motivation: We want to find local alignments separated by difference blocks. For example, homologous sequences may have a much lower global similarity if the differing regions are much longer than the similar regions.

  4. Example:
       GTAGT CATCAT    ATG TGACTGAC G
       TC    CATDOGCAT CC  TGACTGAC A
     We will find this segment using local alignment.
       GTAGT CATCAT    ATGCC    TGACTGAC G
       TC    CATDOGCAT CCTACTAC TGACTGAC A
     The three difference blocks are GTAGT / TC, ATGCC / CCTACTAC, and G / A.

  5. Aligning two strings
       A = GTCATCATATGTGACTGACG
       B = TCCATDOGCATTAACTAACA
     An alignment of A and B:
       A = GT++CAT---CATATGTGACTGACG+
               |||   |||   | ||| ||
       B = ++TCCATDOGCAT+++TAACTAAC+A
     Legend: "|" marks a match, "-" a gap, and "+" a difference block.

  6. How good an alignment is: Let σ(a,b) be the score of aligning residues a and b, i.e. the corresponding entry of the scoring matrix. [Figure: scoring matrix]

  7. How good an alignment is (2)
     In addition:
     • A difference block gets score -d, where d is a non-negative constant.
     • A run of k consecutive gap positions gets -( q + k * r ) points, where q and r are the gap-open and gap-extension constants.

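  8. For example, with ( d, q, r ) = ( 5, 2, 1 ):
       A = GT++CAT---CATATGTGACTGACG+
               |||   |||   | ||| ||
       B = ++TCCATDOGCAT+++TAACTAAC+A
     Annotated partial scores: -2 - 3*1 for the gap of length 3; -5 per difference block; 2+2+2 for each matched CAT; +10 for the last aligned segment (TGACTGAC against TAACTAAC). Total: +2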
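
The total on this slide can be checked with a small scorer. This is only a sketch under stated assumptions: the function name `score_alignment` is hypothetical, "+" is read as a difference-block column and "-" as a gap column, and the match/mismatch scores (+2 / -1) are not stated on the slide but are consistent with its partial sums 2+2+2 and +10.

```python
def score_alignment(row_a, row_b, sigma, d, q, r):
    """Score an annotated alignment given as two equal-length rows.

    '+' marks a difference-block column and '-' a gap column (assumed
    reading of the slide's notation); every other column is an aligned
    pair scored by sigma.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    total = 0
    prev = None  # kind of the previous column
    for x, y in zip(row_a, row_b):
        if x == "+" or y == "+":
            kind = "block"
            if prev != kind:
                total -= d  # each difference block costs d once
        elif x == "-" or y == "-":
            kind = "gap_in_a" if x == "-" else "gap_in_b"
            if prev != kind:
                total -= q  # gap-open, charged once per gap
            total -= r      # gap-extension, charged per gap position
        else:
            kind = "pair"
            total += sigma(x, y)
        prev = kind
    return total


def sigma(x, y):
    # Assumed scoring: +2 for a match, -1 for a mismatch.
    return 2 if x == y else -1


row_a = "GT++CAT---CATATGTGACTGACG+"
row_b = "++TCCATDOGCAT+++TAACTAAC+A"
print(score_alignment(row_a, row_b, sigma, d=5, q=2, r=1))  # prints 2
```

Under these assumptions the three difference blocks contribute -15, the gap -5, the twelve matches +24 and the two mismatches -2, which gives the slide's total of +2.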

  9. Generalized global alignment problem
     • Input
       - two strings A and B; and
       - a set of scoring parameters
     • Output
       - an alignment of A and B with the maximum score
     • Challenge
       - Time = O ( |A| x |B| )
       - Space = O ( |A| )

  10. We're not naïve enough
      Q: Any naïve methods?
        A = GTAGT CATCAT ATG TGAC G
        B = TC CATDOGCAT CC TGAC A
      • Try all possible combinations? It is difficult for us to even figure out a naïve algorithm.

  11. Observations: Any alignment must end with either a match, a mismatch, a gap, or a difference block.

  12. Some ideas
      • We can solve this problem using the standard dynamic programming method.
      • Translate the problem into a max-path problem on a graph.

  13. Original alignment problem (without difference blocks)
      • Each alignment corresponds to a maximal path on the alignment graph.
      • The score of an alignment is the score of its corresponding maximal path.
      [Figure: alignment graph for the strings ccttagtc and attga, with the example alignment]
        cctt-agtc
        a-ttga---

  14. What if there are difference blocks? (e.g. GTAGT, ATGCC)
      • The alignment is no longer simply a path on a 2-dimensional matrix.
      • We shall approach it with several tables.

  15. Definitions
      Let A[1…N] and B[1…M] be the two input strings, and let Ai and Bj denote the prefixes A[1…i] and B[1…j].
      • D(i,j): the maximum score of an alignment of Ai and Bj ending with a deletion
      • I(i,j): the maximum score of an alignment of Ai and Bj ending with an insertion
      • H(i,j): the maximum score of an alignment of Ai and Bj ending with a difference block
      • S(i,j): the maximum score of an alignment of Ai and Bj

  16. An observation on S
      • S(i,j) = max of
        - S(i-1,j-1) + σ(ai,bj)
        - I(i,j)
        - D(i,j)
        - H(i,j)
      • How about I(i,j), D(i,j), and H(i,j)?
      [Figure: S(i,j) is the max over S(i-1,j-1) + σ(ai,bj), I(i,j), D(i,j), and H(i,j)]

  17. How to calculate S, I, D, and H (ai and bj denote the characters A[i] and B[j])
      • S(i,j) = maximum of
        - S(i-1,j-1) + σ(ai,bj)
        - I(i,j)
        - D(i,j)
        - H(i,j)
      • D(i,j) = maximum of
        - D(i-1,j) - r
        - S(i-1,j) - q - r
      • I(i,j) = maximum of
        - I(i,j-1) - r
        - S(i,j-1) - q - r
      • H(i,j) = maximum of
        - H(i,j-1)
        - H(i-1,j)
        - S(i,j-1) - d
        - S(i-1,j) - d

  18. The Algorithm
      • Initialize S, I, D, H with the following values:
        - S(0,0) = 0
          S(i,0) = max{ D(i,0), H(i,0) }, for i > 0
          S(0,j) = max{ I(0,j), H(0,j) }, for j > 0
        - D(0,j) = S(0,j) - q, for j >= 0
          D(i,0) = D(i-1,0) - r, for i > 0
        - I(i,0) = S(i,0) - q, for i >= 0
          I(0,j) = I(0,j-1) - r, for j > 0
        - H(i,j) = -d, for i = 0 or j = 0

  19. The Algorithm (2)
      • for i = 1 to N do {
          for j = 1 to M do {
            compute D(i,j), I(i,j), H(i,j) with the rules given above;
            compute S(i,j);
          }
        }
      • Report S(N,M) as the desired max-score.
      • Report the alignment by backtracing (retracing the path taken).
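
The boundary values and recurrences of the last three slides can be sketched in Python as follows. This is a minimal full-table sketch that returns only the score, not the presenters' implementation: the function name `generalized_global_score`, the callable `sigma`, and the match/mismatch values in the usage lines are assumptions made for illustration, and backtracing is omitted.

```python
NEG_INF = float("-inf")


def generalized_global_score(a, b, sigma, d, q, r):
    """Fill the S, D, I, H tables and return the max-score S(N, M)."""
    n, m = len(a), len(b)
    S = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    I = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    H = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    # Boundary values: H is -d on the whole first row and column.
    for i in range(n + 1):
        H[i][0] = -d
    for j in range(m + 1):
        H[0][j] = -d
    S[0][0] = 0
    D[0][0] = S[0][0] - q
    I[0][0] = S[0][0] - q
    for i in range(1, n + 1):          # first column (j = 0)
        D[i][0] = D[i - 1][0] - r
        S[i][0] = max(D[i][0], H[i][0])
        I[i][0] = S[i][0] - q
    for j in range(1, m + 1):          # first row (i = 0)
        I[0][j] = I[0][j - 1] - r
        S[0][j] = max(I[0][j], H[0][j])
        D[0][j] = S[0][j] - q

    # Interior cells, following the four recurrences.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j] - r, S[i - 1][j] - q - r)
            I[i][j] = max(I[i][j - 1] - r, S[i][j - 1] - q - r)
            H[i][j] = max(H[i - 1][j], H[i][j - 1],
                          S[i - 1][j] - d, S[i][j - 1] - d)
            S[i][j] = max(S[i - 1][j - 1] + sigma(a[i - 1], b[j - 1]),
                          D[i][j], I[i][j], H[i][j])
    return S[n][m]


def sigma(x, y):
    # Assumed scoring for the demo: +2 for a match, -2 for a mismatch.
    return 2 if x == y else -2


# Two 4-letter matching segments separated by one difference block:
# 8 - 5 + 8 = 11.
print(generalized_global_score("ACGTAAAAAATGCA", "ACGTCCCCCCTGCA",
                               sigma, d=5, q=2, r=1))  # prints 11
```

In the usage line, aligning the two dissimilar middle regions column by column would cost 12 and gapping them out would cost 16, so the single difference block at cost 5 wins.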

  20. Justification
      Key idea: Any alignment must end with either a match, a mismatch, a gap, or a difference block.
      • S(i,j) = maximum of
        - S(i-1,j-1) + σ(ai,bj)
        - I(i,j)
        - D(i,j)
        - H(i,j)
      The correctness of S(i-1,j-1), I(i,j), D(i,j), and H(i,j) implies the correctness of S(i,j). In other words, if S(i,j) is not the max, then one of S(i-1,j-1), I(i,j), D(i,j), and H(i,j) is not the max.

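  21. Justification (2)
      [Figure: dependencies between table entries]
      • S(i,j) depends on S(i-1,j-1), D(i,j), I(i,j), and H(i,j)
      • H(i,j) depends on S(i-1,j), H(i-1,j), S(i,j-1), and H(i,j-1)
      • I(i,j) depends on S(i,j-1) and I(i,j-1)
      • D(i,j) depends on S(i-1,j) and D(i-1,j)
      The arguments for I(i,j), D(i,j), and H(i,j) are analogous.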

  22. Justification (3): If S(i,j) is not the max, then one of its predecessors, say D(i,j), is not the max. Then one predecessor of D(i,j), say S(i-1,j), is not the max, and so on. Finally, some entry on the boundary of S, I, D, H is not the max. So we have reduced the correctness of S(i,j) to the correctness of the initial (boundary) values of S, I, D, H.

  23. Justification (4): Moreover, the correctness of the initial values of S, I, D, H can be reduced to the correctness of S(0,0) and of H(i,j) for i = 0 or j = 0 (why?), which are certainly the max. So we have proved the correctness of S(i,j). (Wahaha……)

  24. Estimating d
      [Figure: alignments T and T', with a local alignment #t lying between difference blocks in T]
      • T: an optimal alignment
      • T': the alignment obtained from T by replacing #t with a difference block
      Then we have score(T) = score(T') + score(#t) - d

  25. Estimating d (2)
      • Because T is optimal, score(T) >= score(T').
      • Hence score(#t) - d >= 0, i.e. score(#t) >= d.
      So we conclude that the parameter d should be less than or equal to the minimum score of all desirable local alignments.

  26. Complexity
      Q: Is it good enough?
      • Space = O ( |A| x |B| )
        - Four tables of size |A||B|
      • Time = O ( |A| x |B| )
        - Each table entry requires O(1) computation and traversal time

  27. Further improvement
      Challenge: reducing the space complexity. How?

  28. 2 column method
      • Keep only 2 columns of each table (the previous one and the current one)
      • Time: still O(|A|x|B|)
      • Space: 8*|A| = O(|A|) (4 tables x 2 columns)
      • What's the problem? Backtracing
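
A score-only version of the 2 column idea might look like the sketch below. It assumes the same recurrences and boundary values as the earlier slides; the name `two_column_score` and the demo scoring function are hypothetical, and, as the slide points out, the alignment itself cannot be recovered by backtracing from this data alone.

```python
def two_column_score(a, b, sigma, d, q, r):
    """Return S(N, M) while keeping only two columns of each table."""
    n, m = len(a), len(b)

    # Column j = 0: boundary values, indexed by i = 0 .. N.
    S = [0] * (n + 1)
    D = [-q] * (n + 1)            # D(0,0) = S(0,0) - q
    H = [-d] * (n + 1)
    for i in range(1, n + 1):
        D[i] = D[i - 1] - r
        S[i] = max(D[i], H[i])
    I = [s - q for s in S]        # I(i,0) = S(i,0) - q

    for j in range(1, m + 1):
        prev_S, prev_I, prev_H = S, I, H   # column j - 1
        S, D, I, H = ([0] * (n + 1) for _ in range(4))
        # Row i = 0 of the new column (boundary values).
        I[0] = prev_I[0] - r
        H[0] = -d
        S[0] = max(I[0], H[0])
        D[0] = S[0] - q
        for i in range(1, n + 1):
            D[i] = max(D[i - 1] - r, S[i - 1] - q - r)
            I[i] = max(prev_I[i] - r, prev_S[i] - q - r)
            H[i] = max(H[i - 1], prev_H[i], S[i - 1] - d, prev_S[i] - d)
            S[i] = max(prev_S[i - 1] + sigma(a[i - 1], b[j - 1]),
                       D[i], I[i], H[i])
    return S[n]


def sigma(x, y):
    # Assumed scoring for the demo: +2 for a match, -2 for a mismatch.
    return 2 if x == y else -2


print(two_column_score("ACGTAAAAAATGCA", "ACGTCCCCCCTGCA",
                       sigma, d=5, q=2, r=1))  # prints 11
```

Each column has |A| + 1 entries, so the working memory is proportional to |A| regardless of |B|. D only ever reads the current column, so in this sketch its previous column is simply discarded.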

  29. How to extract an optimal alignment from an optimal score?
      • Let R(A,B) be the optimal alignment of A and B.
      • Idea: find i such that R(Ai, B|B|/2) + R^r(A^r|A|-i, B^r|B|/2) = R(A,B), i.e. i is the separation point (i is still to be determined). The superscript r denotes the reversed string.
      [Figure: the table with rows 1 … |A| and columns 1 … |B|, split at row i and column |B|/2]

  30. Then we have the following cases:
      • Both R(Ai, B|B|/2) and R^r(A^r|A|-i, B^r|B|/2) end with difference blocks:
        score(A,B) = score(Ai, B|B|/2) + score(A^r|A|-i, B^r|B|/2) + d
      • Both R(Ai, B|B|/2) and R^r(A^r|A|-i, B^r|B|/2) end with gaps:
        score(A,B) = score(Ai, B|B|/2) + score(A^r|A|-i, B^r|B|/2) + q
      • Otherwise:
        score(A,B) = score(Ai, B|B|/2) + score(A^r|A|-i, B^r|B|/2)

  31. After finding i:
      • Extract the optimal alignments R(Ai, B|B|/2) and R^r(A^r|A|-i, B^r|B|/2) recursively
      • Time = i*|B|/2 + (|A|-i)*|B|/2 = |A|*|B|/2
      [Figure: the table with rows 1 … |A| and columns 1 … |B|, split at row i and column |B|/2]

  32. Overall complexity
      • Time = O ( |A| x |B| )
        - |A||B| * ( 1 + ½ + ¼ + … ) = O( |A| x |B| )
      • Space = O ( |A| )
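
The geometric sum behind the time bound can be written out explicitly. In this sketch the constant c is an assumption: it stands for the cost per table entry, which is O(1) as stated on the Complexity slide. Each level of recursion handles subproblems whose total area is half that of the level above.

```latex
\begin{align*}
\text{Time}
  &\le c\,|A|\,|B| \left( 1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots \right) \\
  &=   c\,|A|\,|B| \sum_{k=0}^{\infty} 2^{-k} \\
  &=   2c\,|A|\,|B| \;=\; O(|A| \times |B|).
\end{align*}
```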

  33. Discussion
      • Unexpected feature
        - isolated matches at the beginning or the end
      • Modification
        - an extra penalty of d for an alignment that does not begin or end with a difference block
        - d must be estimated by the method illustrated previously

  34. References
      • Xiaoqiu Huang and Kun-Mao Chao, A generalized global alignment algorithm, Bioinformatics, 19: 228-233, 2003.
      • Prof. Hsueh-I Lu, Algorithms in Bioinformatics 2003, Lecture slides 7, November 4, 2003.
      • Introduction to algorithms,

  35. Thanks for your attention. Any questions? The End.
