Simple and fast linear space computation of Longest common subsequences Claus Rick, 1999
A What is the LCS problem? Example strings: A = AABAC, B = ABC. Finding a sequence of greatest possible length that can be obtained from both A and B by deleting zero or more (not necessarily adjacent) symbols.
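To make the definition concrete, here is a minimal sketch in Python using the textbook quadratic-space dynamic program. It is not the linear-space method this presentation is about, and the function names `lcs_length` and `is_subsequence` are illustrative choices.

```python
def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting zero or more symbols."""
    it = iter(t)
    # `ch in it` advances the iterator, so symbols must appear in order.
    return all(ch in it for ch in s)


def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b (textbook DP)."""
    # L[i][j] = LCS length of the prefixes a[:i] and b[:j].
    L = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[len(a)][len(b)]


print(lcs_length("AABAC", "ABC"))       # 3
print(is_subsequence("ABC", "AABAC"))   # True
```

For the example strings, ABC itself is a subsequence of AABAC, so the LCS has length 3.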
A Some boring history…
A Pre-Info • Divide and conquer • Midpoint
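As background for the divide-and-conquer and midpoint idea, the following is a minimal sketch of the classic Hirschberg-style recursion, not Rick's contour-based algorithm. The names `lcs_row` and `hirschberg` are illustrative. For brevity it uses string slices; a strictly linear-space version would pass index ranges instead.

```python
def lcs_row(a: str, b: str) -> list[int]:
    """Last row of the LCS length table for a versus b, in O(len(b)) space."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def hirschberg(a: str, b: str) -> str:
    """Return one LCS of a and b by divide and conquer."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    # left[j]  = LCS length of a[:mid] and b[:j]
    left = lcs_row(a[:mid], b)
    # right[k] = LCS length of a[mid:] and the last k symbols of b
    right = lcs_row(a[mid:][::-1], b[::-1])
    n = len(b)
    # The midpoint: a cut of b where the two halves add up to the full LCS.
    cut = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    return hirschberg(a[:mid], b[:cut]) + hirschberg(a[mid:], b[cut:])


print(hirschberg("AABAC", "ABC"))  # ABC
```

The recursion only ever keeps two table rows at a time, which is where the space saving comes from; the price is that each level recomputes lengths before it can split.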
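A Some basic terms: Ordered pair (i,j), a position i in A paired with a position j in B. For A = AABAC and B = ABC, (2,3) = (A,C).
A Some basic terms: Match, an ordered pair (i,j) whose two symbols are equal. [Figure: matches between AABAC and ABC]
A Some basic terms: Chain, a sequence of matches that is strictly increasing in both components. [Figure: a chain of matches between AABAC and ABC]
A Rank k: a match has rank k if the longest chain ending at it has length k. [Figure: ranks of the matches between AABAC and ABC]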
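The terms above can be sketched in a few lines of Python. This assumes the usual definitions: a match is a pair (i,j) of 1-based positions with equal symbols, and its rank is the length of the longest chain ending at it, which equals the LCS table entry at (i,j). The function name `match_ranks` is an illustrative choice.

```python
def match_ranks(a: str, b: str) -> dict[tuple[int, int], int]:
    """Map every match (i, j), 1-based, to its rank."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    ranks = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # A match extends the best chain strictly above and to the left.
                L[i][j] = L[i - 1][j - 1] + 1
                ranks[(i, j)] = L[i][j]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return ranks


print(match_ranks("AABAC", "ABC"))
# {(1, 1): 1, (2, 1): 1, (3, 2): 2, (4, 1): 1, (5, 3): 3}
```

For the example strings, (1,1), (3,2), (5,3) is a chain of length 3, so the match (5,3) has rank 3.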
A Some basic terms: Matching matrix. [Figure: matching matrix of the strings cbabbacac and abacbcba]
A Some basic terms: Dominant matches, the upper-left matches of each rank. A match of rank k is dominant if no other match of rank k lies above it or to its left.
A Dominant matches [Figure: matching matrix of cbabbacac and abacbcba with the dominant matches of ranks 1 to 5 marked]
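A minimal sketch of how dominant matches could be extracted, assuming the definition that a match of rank k is dominant when no other match of rank k has both coordinates less than or equal to its own. The name `dominant_matches` is illustrative, and the brute-force filter is for clarity only; it is not the efficient contour computation of the paper.

```python
from collections import defaultdict


def dominant_matches(a: str, b: str) -> dict[int, list[tuple[int, int]]]:
    """Map each rank k to the dominant matches (1-based) of that rank."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    by_rank = defaultdict(list)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
                by_rank[L[i][j]].append((i, j))
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    result = {}
    for k, group in sorted(by_rank.items()):
        # Keep a match only if no other match of the same rank
        # sits weakly above and to the left of it.
        result[k] = [
            (i, j) for (i, j) in group
            if not any((i2, j2) != (i, j) and i2 <= i and j2 <= j
                       for (i2, j2) in group)
        ]
    return result


print(dominant_matches("AABAC", "ABC"))
# {1: [(1, 1)], 2: [(3, 2)], 3: [(5, 3)]}
```

For AABAC and ABC, the rank-1 matches (2,1) and (4,1) are dominated by (1,1), so only one match per rank survives.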
A [Figure: the strings AABAC and ABC]
A [Figure: matching matrix of cbabbacac and abacbcba]
A Some last basic terms: FC_k (the kth forward contour), BC_k (the kth backward contour)
A Forward contours (FC) [Figure: matching matrix of cbabbacac and abacbcba with forward contours 1 to 5]
A Backward contours (BC) [Figure: matching matrix of abacbcba and cbabbacac with backward contours 5 to 1]
A Lemma 1 Let p be the length of an LCS between strings A and B. Then for every match (i,j) the following holds: • There is an LCS containing (i,j) if and only if, for some k, (i,j) is on the kth forward contour and on the (p-k+1)st backward contour.
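Lemma 1 can be checked on small inputs with the sketch below. It assumes that "on the kth forward contour" means the longest chain ending at the match has length k, and that the backward contour is the same notion computed on the reversed strings. The names `ranks` and `lcs_matches` are illustrative.

```python
def ranks(a: str, b: str) -> dict[tuple[int, int], int]:
    """Forward rank of every match (i, j), 1-based."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    out = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
                out[(i, j)] = L[i][j]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return out


def lcs_matches(a: str, b: str) -> tuple[int, list[tuple[int, int]]]:
    """Return p and the matches with forward rank k and backward rank p-k+1."""
    m, n = len(a), len(b)
    fwd = ranks(a, b)
    # Backward ranks: forward ranks of the reversed strings, indices mapped back.
    rev = ranks(a[::-1], b[::-1])
    bwd = {(m + 1 - i, n + 1 - j): k for (i, j), k in rev.items()}
    p = max(fwd.values(), default=0)
    on_lcs = sorted(ij for ij in fwd if fwd[ij] + bwd[ij] == p + 1)
    return p, on_lcs


print(lcs_matches("AABAC", "ABC"))
# (3, [(1, 1), (2, 1), (3, 2), (5, 3)])
```

In the example, the match (4,1) has forward rank 1 but backward rank 2, so the longest chain through it has length 2 and it belongs to no LCS.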
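A Lemma 1 - proof: A match on FC_k ends a chain of length k, and a match on BC_(p-k+1) starts a chain of length p-k+1. Joined at (i,j), the two chains form one chain of length k + (p-k+1) - 1 = p. If (i,j) lies on a backward contour with index smaller than p-k+1, every chain through it is shorter than p. [Figure: forward contour k and backward contour p-k+1 meeting at a match]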
A Start calculating: FC_1, BC_1, FC_2, BC_2, … Sooner or later…
A Really really last terms. Define sets M_i as: M_0 = M, M_1 = M_0 \ FC_1, M_2 = M_1 \ BC_1, and in general M_(2i-1) = M_(2(i-1)) \ FC_i, M_(2i) = M_(2i-1) \ BC_i
A [Figure: the match set M in the forward and backward matching matrices of cbabbacac and abacbcba]
A [Figure: the sets M_1, M_2, M_3, M_4, M_5 in the forward and backward matching matrices of cbabbacac and abacbcba]
A Let us call the first empty M_i: M_p'
A Lemma 2 • The length of an LCS is p', and each match in M_(p'-1) is a possible midpoint.
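The peeling process behind Lemma 2 can be sketched as follows. This assumes that removing FC_k means removing every match whose longest chain ending there has length k, and that BC_k is the same notion on the reversed strings. It stores all matches explicitly, which is exactly the expensive approach the algorithm avoids. The names `ranks` and `peel` are illustrative.

```python
def ranks(a: str, b: str) -> dict[tuple[int, int], int]:
    """Forward rank of every match (i, j), 1-based."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    out = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
                out[(i, j)] = L[i][j]
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return out


def peel(a: str, b: str) -> tuple[int, list[tuple[int, int]]]:
    """Return (p', matches of the last non-empty set), or (0, []) if no match."""
    m, n = len(a), len(b)
    fwd = ranks(a, b)
    rev = ranks(a[::-1], b[::-1])
    bwd = {(m + 1 - i, n + 1 - j): k for (i, j), k in rev.items()}
    remaining = set(fwd)            # M_0 = M
    step = 0
    while remaining:
        step += 1
        k = (step + 1) // 2         # steps 1,2 use k=1; steps 3,4 use k=2; ...
        side = fwd if step % 2 == 1 else bwd   # odd: FC_k, even: BC_k
        nxt = {ij for ij in remaining if side[ij] != k}
        if not nxt:                 # M_step is the first empty set
            return step, sorted(remaining)
        remaining = nxt
    return 0, []


print(peel("AABAC", "ABC"))  # (3, [(3, 2)])
```

For AABAC and ABC the sets shrink from five matches to two, then one, then none, so p' = 3 and (3,2) is the midpoint.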
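A Lemma 2 - proof: Each step shortens the longest remaining chain by exactly one. M_0 holds chains of length K = p, M_1 of length K-1, M_2 of length K-2, …, M_(k-1) of length 1, and M_k of length 0, so M_k is empty. [Figure: the nested sets M_0, M_1, M_2, …, M_(k-1), M_k]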
A Little problem… • We can't keep track of every set: that is very expensive.
A [Figure: forward and backward matching matrices of cbabbacac and abacbcba]
A What do we do? Keep only dominant matches… When we see a dominant match below, we are done.
A [Figure: forward and backward matching matrices of cbabbacac and abacbcba, dominant matches only]
A Let's define: • FC_f', BC_b': the contours with the minimal indices f' and b' as stated above
A Lemma 3 • The length of an LCS is b' + f' - 1.
A Complexity. Finding the dominant matches of each contour: O(min(m, n-p)). Number of contours: p. Total: O(min(pm, p(n-p))).
A The End
Simple and fast linear space computation of longest common subsequence. Written by: Claus Rick, 1999. Based on an algorithm by: D. Hirschberg, 1975. Cast: Matrices, Lines, Arrows, Squares, Blue, Red, Brown, Grey, Black, String A, String B. Presentation: Uri Scheiner. No dominant matches were harmed during the making of this presentation.
Appendix: What is the LCS, Divide and Conquer, Match, Chain, Dominant Matches, FC, BC, Lemma 1, Define M…, Lemma 2, Keep just Dominant…, Lemma 3, Complexity