Algorithms for Two Versions of LCS Problem for Indeterminate Strings
Goal of this paper
• Study the classic LCS and the Constrained LCS (CLCS) problems for indeterminate strings.
• Present efficient algorithms to solve them.
5-9 Nov 2007, IWOCA 2007
Longest Common Subsequence
• Given two sequences:
  - X = CAAGCTAAGCTAC
  - Y = TCAAGTAGAAC
• Common subsequence: a subsequence common to both X and Y.
• LCS: a common subsequence of maximum length.
LCS-Example
X = CAAGCTAAGCTA
Y = CCGTAT
A common subsequence: CCT (length 3)
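To make the definition concrete, here is a small Python sketch that checks whether a candidate string is a common subsequence. It runs on the strings from the example above. The helper names are our own and do not come from the slides.

```python
def is_subsequence(s, t):
    """True if s can be obtained from t by deleting zero or more characters."""
    it = iter(t)
    # `c in it` consumes the iterator up to and including the first match of c,
    # so the characters of s must appear in t in the same order.
    return all(c in it for c in s)


def is_common_subsequence(s, x, y):
    """True if s is a subsequence of both x and y."""
    return is_subsequence(s, x) and is_subsequence(s, y)


if __name__ == "__main__":
    X, Y = "CAAGCTAAGCTA", "CCGTAT"
    print(is_common_subsequence("CCT", X, Y))     # True
    print(is_common_subsequence("CCGTAT", X, Y))  # False: X runs out before the final T
```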
LCS-Example
X = CAAGCTAAGCGT
Y = CCGTAT
Longest common subsequence: CCTAT (length 5)
Another LCS: CGTAT (length 5)
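The classic quadratic dynamic program computes such an LCS. Below is a plain Python sketch with a traceback; the function name `lcs` is hypothetical. When several LCSs exist, as in this example, the traceback returns only one of them, and the tie-breaking rule determines which one.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y (O(|x|*|y|) time)."""
    n, m = len(x), len(y)
    # T[i][j] = LCS length of the prefixes x[:i] and y[:j]
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                T[i][j] = T[i - 1][j - 1] + 1
            else:
                T[i][j] = max(T[i - 1][j], T[i][j - 1])
    # Trace back from T[n][m] to recover one optimal subsequence
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif T[i - 1][j] >= T[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    best = lcs("CAAGCTAAGCGT", "CCGTAT")
    print(best, len(best))  # length 5; which LCS is printed depends on tie-breaking
```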
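CLCS: A Relatively New Variant
• Given X, Y and a constraint Z, find a longest common subsequence of X and Y that contains Z as a subsequence.
• Example: X = TCCACA, Y = ACCAAG, Z = AC.
• An LCS of X and Y is CCAA (length 4), but it does not contain AC. A longest common subsequence containing AC is ACA (length 3).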
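The slides do not show a CLCS algorithm. As an illustration only, here is a straightforward cubic dynamic program; it is our own sketch, not the paper's method. `L[i][j][k]` is the length of a longest common subsequence of `x[:i]` and `y[:j]` that contains `z[:k]` as a subsequence, with minus infinity meaning "no such subsequence". When x[i] = y[j] = z[k], it is always safe to let that matched pair cover z[k]. The sketch takes O(|X|·|Y|·|Z|) time and space.

```python
NEG = float("-inf")  # marks "no common subsequence containing z[:k]"


def clcs_length(x, y, z):
    """Length of a longest common subsequence of x and y that contains z
    as a subsequence, or None if no such subsequence exists."""
    n, m, r = len(x), len(y), len(z)
    # L[i][j][k]: best length for x[:i], y[:j] with z[:k] as a subsequence
    L = [[[NEG] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(r + 1):
                if i == 0 or j == 0:
                    # Only the empty subsequence is available; it contains z[:0] only
                    L[i][j][k] = 0 if k == 0 else NEG
                    continue
                if x[i - 1] == y[j - 1] and k > 0 and x[i - 1] == z[k - 1]:
                    # The matched character also matches z[k-1]: use it for the constraint
                    L[i][j][k] = 1 + L[i - 1][j - 1][k - 1]
                else:
                    best = max(L[i - 1][j][k], L[i][j - 1][k])
                    if x[i - 1] == y[j - 1]:
                        best = max(best, 1 + L[i - 1][j - 1][k])
                    L[i][j][k] = best
    result = L[n][m][r]
    return None if result == NEG else result


if __name__ == "__main__":
    print(clcs_length("TCCACA", "ACCAAG", "AC"))  # 3, e.g. ACA
    print(clcs_length("TCCACA", "ACCAAG", ""))    # 4, plain LCS, e.g. CCAA
```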
Different Setting
• We study LCS and CLCS for indeterminate strings (i-strings).
• We call the two problems ILCS and CILCS, respectively.
i-strings
• Let Σ = {A, C, G, T}.
• Then there are 2^4 - 1 = 15 non-empty sets of letters.
• At each position of an i-string we have one of those sets.
i-strings: the 15 non-empty sets for Σ = {A, C, G, T}
• {A, C, G, T}
• {A, C, G}, {A, C, T}, {A, G, T}, {C, G, T}
• {A, C}, {A, G}, {A, T}, {C, G}, {C, T}, {G, T}
• {A}, {C}, {G}, {T}
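The count of 15 can be checked by enumerating the subsets directly. The Python sketch below does this; the function name is our own.

```python
from itertools import combinations

SIGMA = "ACGT"


def nonempty_subsets(sigma):
    """All non-empty subsets of sigma, i.e. the possible contents of one i-string position."""
    return [frozenset(c)
            for k in range(1, len(sigma) + 1)
            for c in combinations(sigma, k)]


if __name__ == "__main__":
    subsets = nonempty_subsets(SIGMA)
    print(len(subsets))  # 15 == 2**4 - 1
    print(sorted("".join(sorted(s)) for s in subsets))
```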
i-strings (figure: an example i-string X with positions 1-7, some of which hold more than one letter)
i-strings: Equality/Match
• X[3] =d Y[1]. Why? Because X[3] ∩ Y[1] = {A} ≠ Ø.
• Likewise, Y =d X[1..3], Y =d X[3..5] and Y =d X[4..6], where equal-length i-strings match position by position.
• Interestingly, X[1..3] ≠d X[3..5]!!! Indeterminate equality is not transitive.
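Below is a minimal sketch of =d in Python, with each position stored as a set of letters. Two points are our own assumptions rather than the slides': the position-by-position rule for whole i-strings, and the example sets, which are hypothetical and not the slide's X and Y. The example also shows the non-transitivity noted above.

```python
def eq_d(a, b):
    """Indeterminate equality of two positions: their letter sets intersect."""
    return not set(a).isdisjoint(b)


def match_d(u, v):
    """Assumed reading: equal-length i-strings match iff they match position by position."""
    return len(u) == len(v) and all(eq_d(a, b) for a, b in zip(u, v))


if __name__ == "__main__":
    # Hypothetical positions (not the slide's X and Y)
    a, b, c = {"A"}, {"A", "C"}, {"C"}
    print(eq_d(a, b), eq_d(b, c), eq_d(a, c))  # True True False: =d is not transitive
    print(match_d([{"A"}, {"C", "G"}], [{"A", "T"}, {"G"}]))  # True
```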
ILCS (figure: example i-strings X and Y, positions 1-7, over the letters A, B, C, D, F)
CILCS (figure: the same X and Y together with a constraint i-string Z)
Motivation
• Motivations for LCS and CLCS are well known.
• But why indeterminate strings?
• Indeterminate strings are ubiquitous in biological motifs.
• And both LCS and CLCS are motivated by problems in bioinformatics.
Naive Algorithms
• Using the existing LCS and CLCS algorithms, we can solve ILCS and CILCS easily.
Naive ILCS Algorithm
• We use the basic, well-known O(n^2) DP solution to LCS (Wagner & Fischer), with = replaced by =d:
  T[i][j] = 0, if i = 0 or j = 0
  T[i][j] = T[i-1][j-1] + 1, if X[i] =d Y[j]
  T[i][j] = max(T[i-1][j], T[i][j-1]), otherwise
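Below is a Python sketch of this naive ILCS DP. Positions are sets, and each =d test is a set intersection. The names `ilcs_length` and `istr` are hypothetical. The example inputs are illustrative; with singleton sets, ILCS reduces to ordinary LCS.

```python
def ilcs_length(x, y):
    """Naive ILCS: the Wagner-Fischer LCS recurrence with = replaced by =d.

    x and y are lists of letter sets; each =d test is a set intersection.
    """
    n, m = len(x), len(y)
    # T[i][j] = ILCS length of the prefixes x[:i] and y[:j]
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] & y[j - 1]:  # X[i] =d Y[j]: non-empty intersection
                T[i][j] = T[i - 1][j - 1] + 1
            else:
                T[i][j] = max(T[i - 1][j], T[i][j - 1])
    return T[n][m]


def istr(*positions):
    """Hypothetical helper: istr("A", "CG") -> [{'A'}, {'C', 'G'}]."""
    return [set(p) for p in positions]


if __name__ == "__main__":
    print(ilcs_length(istr("A", "CG", "T"), istr("G", "AT", "C")))  # 2
    # With singleton sets, ILCS is ordinary LCS
    print(ilcs_length(istr(*"CAAGCTAAGCGT"), istr(*"CCGTAT")))  # 5
```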
Naive ILCS Algorithm (cont.)
• We assume the letters in each set of the i-strings are kept in sorted order.
• Then each intersection test can be done in O(|Σ|) time.
• So the total running time is O(|Σ|n^2).
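With sorted letter lists, a merge-style scan decides whether two sets intersect. Here is a sketch of that idea, with a function name of our own. Every step advances at least one pointer, so the scan makes at most len(a) + len(b) ≤ 2|Σ| comparisons.

```python
def intersects_sorted(a, b):
    """Merge-style test for a common letter in two sorted lists, O(|Σ|) time."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return True   # found a shared letter, so the positions match under =d
        if a[i] < b[j]:
            i += 1        # a[i] cannot occur later in b; skip it
        else:
            j += 1        # b[j] cannot occur later in a; skip it
    return False


if __name__ == "__main__":
    print(intersects_sorted(["A", "C", "T"], ["C", "G"]))  # True
    print(intersects_sorted(["A", "G"], ["C", "T"]))       # False
```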
Our Goal
• Our goal is a running time better than O(|Σ|n^2).
Our Strategy
• We want O(1)-time evaluation of =d, i.e., indeterminate equality.
• To achieve that, we preprocess the input i-strings.
• Then we employ existing LCS algorithms.
Preprocessing 1 for ILCS
• We compute Table I, where I[i][j] = 1 if X[i] =d Y[j] and 0 otherwise.
• With this table, indeterminate equality can be evaluated in O(1) time.
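Computation of Table
• Letter indices: Σ ≡ A C G T ≡ 1 2 3 4.
• For each i-string, a 0/1 table with one row per letter and one column per position (1 means the letter occurs at that position).

X:    1 2 3 4
   A  1 0 1 1
   C  0 0 1 0
   G  0 1 0 0
   T  0 0 1 0

Y:    1 2 3 4
   A  1 0 1 0
   C  0 1 1 0
   G  0 0 0 1
   T  1 0 0 1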
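The slides leave open how Table I is obtained. One straightforward option, our own illustration, is a Boolean matrix product of the two letter tables: X[i] =d Y[j] exactly when some letter row has a 1 in column i of X's table and in column j of Y's table. Done directly, this takes O(|Σ|n^2) time, so on its own it does not improve on the naive bound. The table contents below are our reading of the slide, with rows A, C, G, T and columns as positions 1 to 4.

```python
SIGMA = "ACGT"  # A = 1, C = 2, G = 3, T = 4 on the slide (0-based rows here)

# Letter-by-position 0/1 tables as we read them from the slide
TX = [[1, 0, 1, 1],
      [0, 0, 1, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 0]]
TY = [[1, 0, 1, 0],
      [0, 1, 1, 0],
      [0, 0, 0, 1],
      [1, 0, 0, 1]]


def table_i(tx, ty):
    """I[i][j] = 1 iff some letter c has tx[c][i] = ty[c][j] = 1, i.e. X[i] =d Y[j].

    This is the Boolean product of tx transposed with ty: O(|Σ| * n * m) time.
    """
    sigma = len(tx)
    n, m = len(tx[0]), len(ty[0])
    return [[int(any(tx[c][i] and ty[c][j] for c in range(sigma)))
             for j in range(m)]
            for i in range(n)]


if __name__ == "__main__":
    for row in table_i(TX, TY):
        print(row)
```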
Computation of Table (continued; figure: 0/1 tables for X and Y)
Complete Algorithm
• With Table I, we can evaluate =d in O(1) time.
• So the DP requires only O(n^2) time!
• But how much time does it take to compute Table I?
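Here is a sketch of the whole pipeline under one assumption of ours, which is not taken from the slides: each position is packed into a bitmask over Σ, so for DNA (|Σ| = 4) a single AND decides =d. Table I then costs O(n^2) word operations plus O(|Σ|n) for encoding, and the DP does O(1) work per cell. Whether this word-packing suits the paper's setting, for example with larger alphabets, is exactly the question the slide raises. The function names are hypothetical.

```python
def to_mask(letters, sigma="ACGT"):
    """Encode one i-string position as a bitmask: bit k is set iff sigma[k] is present."""
    mask = 0
    for c in letters:
        mask |= 1 << sigma.index(c)
    return mask


def ilcs_with_table(x, y, sigma="ACGT"):
    """ILCS length using a precomputed Table I so that each =d test is a lookup."""
    mx = [to_mask(p, sigma) for p in x]
    my = [to_mask(p, sigma) for p in y]
    # Preprocessing: I[i][j] is True iff X[i+1] =d Y[j+1] (0-based lists)
    I = [[(a & b) != 0 for b in my] for a in mx]
    n, m = len(x), len(y)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if I[i - 1][j - 1]:  # O(1) evaluation of X[i] =d Y[j]
                T[i][j] = T[i - 1][j - 1] + 1
            else:
                T[i][j] = max(T[i - 1][j], T[i][j - 1])
    return T[n][m]


if __name__ == "__main__":
    # The 4-position i-strings as we read them off the Table slide
    print(ilcs_with_table(["A", "G", "ACT", "A"], ["AT", "C", "AC", "GT"]))  # 3
    print(ilcs_with_table(list("CAAGCTAAGCGT"), list("CCGTAT")))  # 5
```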