Local Alignment of RNA Sequences with Arbitrary Scoring Schemes
Rolf Backofen, Danny Hermelin, Gad M. Landau, Oren Weimann
RNA sequences
[Figure: an RNA sequence over the alphabet {A, C, G, U}, drawn together with its base pairs.]
Alignment of Strings
Global alignment:
S1 = U C A C C G _ A _ G
S2 = U C G C G G U A U G
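A minimal sketch of how a global alignment like the one above can be computed with the classic dynamic program over prefixes. The function name `global_align` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; the slides do not fix a scoring scheme.

```python
def global_align(s1, s2, match=1, mismatch=-1, gap=-1):
    """Return (score, row1, row2) for one optimal global alignment.

    The scoring parameters are illustrative assumptions, not from the slides.
    """
    n, m = len(s1), len(s2)
    # score[i][j] = best score for aligning s1[:i] with s2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one alignment.
    row1, row2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                row1.append(s1[i - 1])
                row2.append(s2[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row1.append(s1[i - 1])
            row2.append("_")
            i -= 1
        else:
            row1.append("_")
            row2.append(s2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row1)), "".join(reversed(row2))


score, row1, row2 = global_align("UCACCGAG", "UCGCGGUAUG")
print(score)
print(row1)
print(row2)
```

Under these assumed scores, the alignment shown on the slide has 6 matches, 2 mismatches, and 2 gaps, so an optimal alignment scores at least 2.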
Alignment of RNA sequences
[Figure: two RNA sequences, of lengths n and m, aligned together with their arcs.]
RNA Global Alignment via tree edit distance:
• [SZ 1989]
• [K 1998]
• [DMRW 2006]
Theorem: All these algorithms also compute the edit distance between the substrings spanned by any two arcs, provided these two arcs are matched to each other.
The Alignment graph
[Figure: the alignment graph, a grid whose two axes are labeled R1 = U C A C C G A G and R2 = U C G C G G U A U G.]
Theorem: There is a one-to-one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2.
The Alignment graph
Theorem: There is a one-to-one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2 in which all arcs are deleted.
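The correspondence between grid paths and alignments can be made concrete with a small decoder. This is only a sketch: the vertex convention (vertex (i, j) means i characters of R1 and j characters of R2 have been consumed) and the name `path_to_alignment` are assumptions, and only the three grid edge types are handled, which matches the case where all arcs are deleted.

```python
def path_to_alignment(path, r1, r2, gap="_"):
    """Decode a path of grid vertices into the two rows of an alignment.

    Assumed convention: vertex (i, j) means that i characters of r1 and
    j characters of r2 have been consumed so far.
    """
    row1, row2 = [], []
    for (i, j), (i2, j2) in zip(path, path[1:]):
        step = (i2 - i, j2 - j)
        if step == (1, 1):      # diagonal edge: r1[i] aligned with r2[j]
            row1.append(r1[i])
            row2.append(r2[j])
        elif step == (1, 0):    # edge along r1: r1[i] aligned with a gap
            row1.append(r1[i])
            row2.append(gap)
        elif step == (0, 1):    # edge along r2: r2[j] aligned with a gap
            row1.append(gap)
            row2.append(r2[j])
        else:
            raise ValueError(f"not a grid edge: {(i, j)} -> {(i2, j2)}")
    return "".join(row1), "".join(row2)


r1, r2 = "UCACCGAG", "UCGCGGUAUG"
# Corner-to-corner path: six diagonal steps, then gap, diagonal, gap, diagonal.
full_path = [(k, k) for k in range(7)] + [(6, 7), (7, 8), (7, 9), (8, 10)]
print(path_to_alignment(full_path, r1, r2))
# A path that starts and ends inside the grid aligns two substrings.
print(path_to_alignment([(2, 2), (3, 3), (4, 4)], r1, r2))
```

A path from corner to corner gives a global alignment; any shorter path gives an alignment of the substrings it spans, which is what the local algorithms search over.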
The Alignment graph
Theorem: There is a one-to-one correspondence between HEAVIEST paths in the alignment graph and OPTIMAL alignments of substrings of R1 and R2.
The Local Alignment algorithms
• We use the alignment graph to compute the local similarity between two RNA sequences according to two well-known metrics:
• Smith-Waterman: the highest-scoring alignment between any pair of substrings of the input RNAs.
• Its normalized version.
Standard Local Similarity (Smith-Waterman)
• The score is computed via a dynamic program:
$$\mathrm{Score}(i,j) = \max\Bigl\{\, 0,\ \max_{(i',j') \to (i,j)} \bigl[ \mathrm{Score}(i',j') + \text{weight of the incoming edge from } (i',j') \bigr] \Bigr\}$$
• Time complexity: O(mn) + one run of a global algorithm, where n and m are the lengths of the two sequences.
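The recurrence above can be sketched as follows. The grid edge weights (match +1, mismatch -1, gap -1) are illustrative assumptions. The heavy edges that the slides obtain from the global algorithm are not computed here: they are passed in through the hypothetical `extra_edges` argument, a dict mapping a vertex (i, j) to a list of ((i', j'), weight) pairs.

```python
def local_score(r1, r2, extra_edges=None, match=1, mismatch=-1, gap=-1):
    """Heaviest path in the alignment graph, Smith-Waterman style.

    extra_edges is a hypothetical input standing in for the arc-match
    edges produced by a global algorithm: {(i, j): [((i2, j2), w), ...]},
    where each (i2, j2) must precede (i, j) in row-major order.
    Returns (best score, end vertex of a best path).
    """
    extra_edges = extra_edges or {}
    n, m = len(r1), len(r2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_end = 0, (0, 0)
    for i in range(n + 1):
        for j in range(m + 1):
            candidates = [0]  # the empty path: start a new alignment here
            if i > 0:
                candidates.append(score[i - 1][j] + gap)
            if j > 0:
                candidates.append(score[i][j - 1] + gap)
            if i > 0 and j > 0:
                sub = match if r1[i - 1] == r2[j - 1] else mismatch
                candidates.append(score[i - 1][j - 1] + sub)
            for (pi, pj), w in extra_edges.get((i, j), []):
                candidates.append(score[pi][pj] + w)
            score[i][j] = max(candidates)
            if score[i][j] > best:
                best, best_end = score[i][j], (i, j)
    return best, best_end


print(local_score("UCACCGAG", "UCGCGGUAUG"))
# One hypothetical heavy edge of weight 5 from (0, 0) to (3, 3):
print(local_score("AAA", "CCC", extra_edges={(3, 3): [((0, 0), 5)]}))
```

Each vertex has a constant number of incoming edges as long as every base takes part in at most one arc, which is where the O(mn) term comes from.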
Normalized Local Similarity
• The weakness of the Smith-Waterman approach [AP 2001]: [figure not recovered]
• Solution: look for the substrings (with their arcs) that maximize: [formula not recovered] and some given value.
Normalized Local Similarity
• Again, a dynamic program: define Length(k,i,j) to be the length of the shortest path that ends at vertex (i,j) and has weight equal to k.
• For every k, i, j compute
$$\mathrm{Length}(k,i,j) = \min_{(i',j') \to (i,j)} \bigl\{ \mathrm{Length}(k-w,\,i',j') + (i-i') + (j-j') \bigr\}$$
where w is the weight of the incoming edge from (i',j').
• The best k/Length(k,i,j) over all i, j, k is the normalized score.
• Time complexity: [expression not recovered] + one run of a global algorithm.
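A minimal sketch of the Length(k,i,j) dynamic program, assuming integer edge weights so that path weights can serve as table keys. The grid weights (match +1, mismatch -1, gap -1), the `extra_edges` input for heavy edges, and the additive constant `L` in the denominator are all assumptions for illustration; with the default `L=0` the function returns the best k/Length(k,i,j) as stated on the slide.

```python
def normalized_local_score(r1, r2, extra_edges=None,
                           match=1, mismatch=-1, gap=-1, L=0):
    """Best weight/(length + L) over all paths in the alignment graph.

    length[i][j] maps a path weight k to the length of the shortest path
    that ends at (i, j) with exactly that weight. An edge from (pi, pj)
    to (i, j) has length (i - pi) + (j - pj). extra_edges and L are
    hypothetical parameters, not taken from the slides.
    """
    extra_edges = extra_edges or {}
    n, m = len(r1), len(r2)
    # The empty path ends at every vertex with weight 0 and length 0.
    length = [[{0: 0} for _ in range(m + 1)] for _ in range(n + 1)]
    best = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            incoming = []
            if i > 0:
                incoming.append(((i - 1, j), gap))
            if j > 0:
                incoming.append(((i, j - 1), gap))
            if i > 0 and j > 0:
                sub = match if r1[i - 1] == r2[j - 1] else mismatch
                incoming.append(((i - 1, j - 1), sub))
            incoming.extend(extra_edges.get((i, j), []))

            table = length[i][j]
            for (pi, pj), w in incoming:
                step = (i - pi) + (j - pj)
                for k_prev, len_prev in length[pi][pj].items():
                    k, cand = k_prev + w, len_prev + step
                    if cand < table.get(k, float("inf")):
                        table[k] = cand
            for k, path_len in table.items():
                if path_len > 0:
                    best = max(best, k / (path_len + L))
    return best


print(normalized_local_score("AAA", "AAA"))        # L = 0
print(normalized_local_score("AAA", "AAA", L=2))   # longer alignments favored
```

With `L=0` a single matched pair already attains the maximum ratio, so a positive `L` (or some other length requirement) is what makes longer alignments preferable; how the slides handle this is not recoverable from the text.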
Open Problems
• Arc deletion:
• Improve global tree edit distance
Thank you very much for your attention