Approximate On-line Palindrome Recognition, and Applications. Amihood Amir Benny Porat. Moskva River. Confluence of 4 Streams. Approximate Matching. Palindrome Recognition. CPM 2014. Online Algorithms. Interchange Matching. Palindrome Recognition.
Approximate On-line Palindrome Recognition, and Applications
Amihood Amir, Benny Porat
Confluence of 4 Streams (CPM 2014)
• Approximate Matching
• Palindrome Recognition
• Online Algorithms
• Interchange Matching
Palindrome Recognition
"Voz'mi-ka slovo ropot," govoril Cincinnatu ego shurin, ostriak, "I prochti obratno. A? Smeshno poluchaetsia?"
Vladimir Nabokov, Invitation to a Beheading
"Take the word ropot [murmur]," Cincinnatus' brother-in-law, the wit, was saying to him, "and read it backwards. Eh? Comes out funny, doesn't it?" [ropot reversed is topor: the axe]
A palindrome is a string that reads the same from right to left and from left to right.
Examples:
• доход (Russian: "income")
• A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal - Panama!
Palindrome Example Ibn Ezra: Medieval Jewish philosopher, poet, Biblical commentator, and mathematician. Was asked:"אבי אל חי שמך למה מלך משיח לא יבא" [ My Father, the Living God, why does the king messiah not arrive?] His response: "דעו מאביכם כי לא בוש אבוש, שוב אשוב אליכם כי בא מועד" [ Know you from your Father that I will not be delayed. I will return to you when the time will come ]
Palindromes in Computer Science Great programming exercise in CS 101. Example of a problem that can be solved by a RAM in linear time, but not by a 1-tape Turing machine. (Can be done in linear time by a 2-tape TM)
Palindrome Concatenation
We may be interested in finding out whether a string is a concatenation of palindromes of length > 1.
Example: ABCCBABBCCBCAACB = ABCCBA | BB | CC | BCAACB
Why would we be interested in such a funny problem? We'll soon see.
Exercise: Do this in linear time.
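The problem statement above can be made concrete with a small dynamic program. This is only a quadratic-time sketch for checking examples, not the linear-time solution the exercise asks for; the function name `is_palindrome_concatenation` is an illustrative assumption.

```python
def is_palindrome_concatenation(s: str) -> bool:
    """Return True if s splits into palindromes, each of length > 1."""
    n = len(s)
    # pal[i][j] is True when s[i..j] (inclusive) is a palindrome.
    pal = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            pal[i][j] = s[i] == s[j] and (j - i < 2 or pal[i + 1][j - 1])
    # ok[j] is True when the prefix s[:j] is a valid concatenation.
    ok = [False] * (n + 1)
    ok[0] = True
    for j in range(2, n + 1):
        # The last piece is s[i:j]; i <= j - 2 keeps its length above 1.
        ok[j] = any(ok[i] and pal[i][j - 1] for i in range(j - 1))
    return ok[n]


print(is_palindrome_concatenation("ABCCBABBCCBCAACB"))
```

The slide's example string is accepted because it splits as ABCCBA, BB, CC, BCAACB.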
Stream 2 - Approximations
As in exact matching, there may be errors.
Find the minimum number of errors that, if fixed, yield a string that is a concatenation of palindromes of length > 1.
Example: ABCCBCBBCCBCABCB differs in two positions from ABCCBABBCCBCAACB, which is a concatenation of palindromes.
For Hamming distance: Amir and Porat [ISAAC 13]: an algorithm running in time $O(n^2)$.
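A minimal offline sketch of the Hamming-distance version follows. It is a straightforward quadratic dynamic program written for illustration and is not claimed to be the [ISAAC 13] algorithm; the name `min_errors` is an assumption.

```python
from typing import Optional


def min_errors(s: str) -> Optional[int]:
    """Fewest character substitutions that turn s into a concatenation
    of palindromes of length > 1 (None if no such split exists)."""
    n = len(s)
    # cost[i][j]: substitutions needed to make s[i..j] a palindrome,
    # i.e. the number of mismatched symmetric pairs. Entries with
    # j <= i are never written and stay 0.
    cost = [[0] * n for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for j in range(i + 1, n):
            cost[i][j] = cost[i + 1][j - 1] + (s[i] != s[j])
    inf = float("inf")
    best = [inf] * (n + 1)  # best[j]: optimum for the prefix s[:j]
    best[0] = 0
    for j in range(2, n + 1):
        for i in range(j - 1):  # last piece s[i:j] has length >= 2
            if best[i] != inf:
                best[j] = min(best[j], best[i] + cost[i][j - 1])
    return None if best[n] == inf else best[n]


print(min_errors("ABCCBABBCCBCAACB"), min_errors("ABCCBCBBCCBCABCB"))
```

The first string needs no fixes; the second needs at least one and at most two, since two substitutions already give the first string.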
Stream 3 - Reversals
Why is this funny problem interesting?
Sorting by reversals: In the evolutionary process a substring may "detach" and "reconnect" in reverse:
ABCA BCDAABC BAD
becomes
ABCA CBAADCB BAD
Sorting by Reversals
What is the minimum number of reversals that, when applied to string A, result in string B?
History:
• Introduced: Bafna & Pevzner [95]
• NP-hard: Caprara [97]
• Approximations: Christie [98]; Berman, Hannenhalli, Karpinski [02]; Hartman [03]
Sorting by Reversals – Polynomial time Relaxations
• Signed reversals:
  • Hannenhalli & Pevzner [99]
  • Kaplan, Shamir, Tarjan [00]
  • Tannier & Sagot [04]
  • . . .
• Disjointness: Swap Matching, Muthu [96]
  • Two constraints:
    • The length of the reversed substring is limited to 2.
    • All swaps are disjoint.
Pattern Matching with Disjoint Reversals
• Reversal Distance (RD): The RD between $s_1$ and $s_2$ is the minimum number k such that there exists $s_2'$ with $HAM(s_2, s_2') = k$ and $s_1$ reversal-matches $s_2'$.
• Example (figure with strings S1 and S2): RD(S1, S2) = 2
Connection between Reversal Matching and Palindrome Matching
S1: A D C B A E D B A
S2: C D A A B A B D E
Interleave the strings, alternating one character of S1 with one character of S2:
A C D D C A B A A B E A D B B D A E
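The connection can be checked directly. In this sketch the values of `s1` and `s2` are obtained by de-interleaving the slide's string (even positions and odd positions), and the helper names are illustrative assumptions.

```python
def interleave(s1: str, s2: str) -> str:
    """Alternate the characters of two equal-length strings."""
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    return "".join(a + b for a, b in zip(s1, s2))


def is_palindrome(s: str) -> bool:
    return s == s[::-1]


s1 = "ADCBAEDBA"  # ADC | BA | EDBA
s2 = "CDAABABDE"  # CDA | AB | ABDE, each block of s1 reversed
woven = interleave(s1, s2)
# A reversed block of length L becomes a palindrome of length 2L.
pieces = [woven[0:6], woven[6:10], woven[10:18]]
print(woven, pieces, all(is_palindrome(p) for p in pieces))
```

Because every block of S2 is the reverse of the matching block of S1, each interleaved block is an even-length palindrome, so reversal matching reduces to recognising a concatenation of even-length palindromes.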
On-line Input
Suppose that we get the input a byte at a time.
For the palindrome problem: A C D D C A B A A B E A D B B D A E
On-line Input
Suppose that we get the input a byte at a time.
For the reversal problem: AC DD CA BA AB EA DB BD AE
Main Idea – Palindrome Fingerprint
For $S = s_0, s_1, s_2, \ldots, s_{m-1}$:
The Rabin-Karp fingerprint: $\Phi(S) = r^{1}s_0 + r^{2}s_1 + \cdots + r^{m}s_{m-1} \bmod p$
The reversal fingerprint: $\Phi^R(S) = r^{-1}s_0 + r^{-2}s_1 + \cdots + r^{-m}s_{m-1} \bmod p$
If $r^{m+1}\Phi^R(S) = \Phi(S)$, then S is a palindrome w.h.p.
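Palindrome Fingerprint
If $r^{m+1}\Phi^R(S) = \Phi(S)$, then S is a palindrome (w.h.p.).
Example: S = A B C B A (m = 5)
$r^{6}\Phi^R(S) = r^{6}(r^{-1}A + r^{-2}B + r^{-3}C + r^{-4}B + r^{-5}A) = r^{5}A + r^{4}B + r^{3}C + r^{2}B + rA = \Phi(S)$
Recall:
$\Phi(S) = r^{1}s_0 + r^{2}s_1 + \cdots + r^{m}s_{m-1} \bmod p$
$\Phi^R(S) = r^{-1}s_0 + r^{-2}s_1 + \cdots + r^{-m}s_{m-1} \bmod p$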
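The two fingerprints and the palindrome test can be sketched as follows. The modulus `P`, the base `R` and the use of `ord` as the character encoding are assumptions made for the illustration; in practice r would be drawn at random. The modular inverse uses Python's three-argument `pow` with exponent -1 (Python 3.8+).

```python
P = (1 << 61) - 1  # assumed prime modulus p
R = 1_000_003      # assumed base r (normally chosen at random)


def fingerprints(s: str, r: int = R, p: int = P):
    """Return (Phi(S), Phi^R(S)) as defined on the slide."""
    r_inv = pow(r, -1, p)  # modular inverse of r
    phi = phi_rev = 0
    power, inv_power = r, r_inv  # r^(i+1) and r^-(i+1) for index i
    for ch in s:
        phi = (phi + power * ord(ch)) % p
        phi_rev = (phi_rev + inv_power * ord(ch)) % p
        power = power * r % p
        inv_power = inv_power * r_inv % p
    return phi, phi_rev


def looks_like_palindrome(s: str, r: int = R, p: int = P) -> bool:
    """Test r^(m+1) * Phi^R(S) == Phi(S) (mod p)."""
    phi, phi_rev = fingerprints(s, r, p)
    return pow(r, len(s) + 1, p) * phi_rev % p == phi


print(looks_like_palindrome("ABCBA"), looks_like_palindrome("ABCBB"))
```

A true palindrome always passes the test; a non-palindrome passes only if an unlucky collision occurs modulo p.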
Simple Online Algorithm for Finding a Palindrome in a Text
Text: $t_1, t_2, t_3, \ldots, t_i, t_{i+1}, \ldots, t_{i+m-1}, t_{i+m}, \ldots, t_n$
For the window of length m that starts at position i:
$\Phi = r^{1}t_i + r^{2}t_{i+1} + \cdots + r^{m}t_{i+m-1} \bmod p$
$\Phi^R = r^{-1}t_i + r^{-2}t_{i+1} + \cdots + r^{-m}t_{i+m-1} \bmod p$
If $r^{m+1}\Phi^R = \Phi$, then (w.h.p.) there is a palindrome of length m starting in the i-th position.
If not, extend the window by the next character:
$\Phi \leftarrow \Phi + r^{m+1}t_{i+m} \bmod p$
$\Phi^R \leftarrow \Phi^R + r^{-(m+1)}t_{i+m} \bmod p$
Note: This algorithm finds online whether a prefix of the text is a palindrome. For finding online whether the text is a concatenation of palindromes, assume even-length palindromes; otherwise, every text is a concatenation of length-1 palindromes.
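A minimal sketch of the online update, reading one character at a time and testing every prefix. The constants `P` and `R`, the `ord` encoding and the generator name are assumptions for the illustration.

```python
P = (1 << 61) - 1  # assumed prime modulus p
R = 1_000_003      # assumed base r (normally chosen at random)


def palindromic_prefix_lengths(stream, r: int = R, p: int = P):
    """Yield every prefix length m that passes the fingerprint test."""
    r_inv = pow(r, -1, p)
    phi = phi_rev = 0
    power, inv_power = 1, 1  # r^m and r^-m for the current length m
    for m, ch in enumerate(stream, start=1):
        power = power * r % p
        inv_power = inv_power * r_inv % p
        # Constant work per character: add the new term to each sum.
        phi = (phi + power * ord(ch)) % p
        phi_rev = (phi_rev + inv_power * ord(ch)) % p
        # Test r^(m+1) * Phi^R == Phi (mod p).
        if power * r % p * phi_rev % p == phi:
            yield m


print(list(palindromic_prefix_lengths("ABACABA")))
```

Only the two running sums and the two running powers are stored, so the space used does not grow with the number of characters read.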
Palindrome with mismatches Start with 1 mismatch case.
1-Mismatch
$S = s_0, s_1, s_2, \ldots, s_{m-1}$
Choose l prime numbers $q_1, \ldots, q_l < m$ such that $\prod_{i=1}^{l} q_i > m$.
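1-Mismatch
$S = s_0, s_1, s_2, \ldots, s_{m-1}$
For each $q_i$ construct $q_i$ subsequences of S as follows: subsequence $S_{q_i,j}$ consists of all elements of S whose index is congruent to j mod $q_i$.
Examples: $q_1 = 2$, $q_2 = 3$
mod 2:
$S_{2,0} = s_0, s_2, s_4, \ldots$
$S_{2,1} = s_1, s_3, s_5, \ldots$
mod 3:
$S_{3,0} = s_0, s_3, s_6, \ldots$
$S_{3,1} = s_1, s_4, s_7, \ldots$
$S_{3,2} = s_2, s_5, s_8, \ldots$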
Example
$S = s_0, s_1, s_2, s_3, s_4, s_5$
mod 2:
$S_{2,0} = s_0, s_2, s_4$
$S_{2,1} = s_1, s_3, s_5$
mod 3:
$S_{3,0} = s_0, s_3$
$S_{3,1} = s_1, s_4$
$S_{3,2} = s_2, s_5$
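1-Mismatch
• We need to compare $s_0, s_1, s_2, \ldots, s_{m-2}, s_{m-1}$ with $s_{m-1}, s_{m-2}, s_{m-3}, \ldots, s_1, s_0$.
• We prove that in the partitioned strings it suffices to compare $S_{q,j}$ with $S^R_{q,(m-1-j) \bmod q}$, where $S^R_{q,k}$ denotes $S_{q,k}$ read backwards.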
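Example
$S = s_0, s_1, s_2, s_3, s_4, s_5$
$S^R = s_5, s_4, s_3, s_2, s_1, s_0$
Pairs to compare (m = 6):
$S_{2,0} = s_0, s_2, s_4$ with $S^R_{2,1} = s_5, s_3, s_1$
$S_{3,0} = s_0, s_3$ with $S^R_{3,2} = s_5, s_2$
$S_{3,1} = s_1, s_4$ with $S^R_{3,1} = s_4, s_1$
The remaining subsequences, $S_{2,1} = s_1, s_3, s_5$ and $S_{3,2} = s_2, s_5$, are paired in the same way.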
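The pairing in this example can be reproduced with string slicing. The sketch compares the subsequences directly instead of comparing fingerprints, and the function names are illustrative assumptions.

```python
def subsequence(s: str, q: int, j: int) -> str:
    """S_{q,j}: the elements of s whose index is congruent to j mod q."""
    return s[j::q]


def reversal_pair(s: str, q: int, j: int) -> str:
    """S^R_{q,(m-1-j) mod q}: subsequence (m-1-j) mod q, read backwards."""
    k = (len(s) - 1 - j) % q
    return s[k::q][::-1]


def is_palindrome_by_partitions(s: str, primes=(2, 3)) -> bool:
    """S is a palindrome iff every subsequence equals its reversal pair."""
    return all(
        subsequence(s, q, j) == reversal_pair(s, q, j)
        for q in primes
        for j in range(q)
    )


S = "012345"  # the digit i stands for s_i
print(subsequence(S, 3, 0), reversal_pair(S, 3, 0))  # s0,s3 and s5,s2
print(is_palindrome_by_partitions("ABCCBA"))
```

With digits standing in for the symbols, the printed pair shows that $S_{3,0}$ is compared position by position with $s_5, s_2$, as on the slide.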
Exact Matching
Lemma: $S = S^R \iff S_{q,j} = S^R_{q,(m-1-j) \bmod q}$ for all q and all $0 \le j < q$.
1-Mismatch
Lemma: There is exactly one mismatch $\iff$ there is exactly one subpattern in each group that does not match.
Proof: by the Chinese Remainder Theorem (C.R.T.).
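Chinese Remainder Theorem
Let n and m be two relatively prime positive integers. Then for any integers a and b, the congruences $x \equiv a \pmod n$ and $x \equiv b \pmod m$ have exactly one solution x with $0 \le x < nm$.
In our case: suppose two indices, i and j, both have an error, and yet for every q only one subsequence is erroneous. Then $i \equiv j \pmod q$ for every q, and since the product of all q's is > m, it means that i = j.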
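The uniqueness argument can be illustrated by recovering an index from its residues. This is a generic sketch of the Chinese Remainder Theorem; the function name and the dictionary input format are assumptions, not part of the talk.

```python
def crt_index(residues: dict) -> int:
    """Given {q: i mod q} for distinct primes q, return the unique i
    with 0 <= i < product of the primes."""
    x, modulus = 0, 1
    for q, a in residues.items():
        # Pick t so that x + modulus * t is congruent to a mod q.
        t = (a - x) * pow(modulus, -1, q) % q
        x += modulus * t
        modulus *= q
    return x


# If the failing subsequences are S_{2,1}, S_{3,2} and S_{5,2}, the
# only index below 2 * 3 * 5 = 30 lying in all three is:
print(crt_index({2: 1, 3: 2, 5: 2}))
```

Since the product of the primes exceeds m, at most one index below m is consistent with a given set of failing residue classes.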
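Complexity
There exists a constant c such that, for any x < m, there are at least x / log x prime numbers between x and cx.
Therefore, choose the prime numbers between log m and c log m: there are at least log m / log log m of them and each is at least log m, so their product is at least $(\log m)^{\log m / \log\log m} = m$.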
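A small sketch of the prime selection. Rather than fixing the constant c, it simply collects consecutive primes starting near $\log_2 m$ until their product exceeds m; this stopping rule and the function names are assumptions made for the illustration. For very small m the chosen primes can exceed m, so the sketch is only meaningful for larger inputs.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division; adequate for primes of size O(log m)."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def choose_primes(m: int) -> list:
    """Consecutive primes starting near log2(m) whose product exceeds m."""
    q = max(2, math.ceil(math.log2(max(m, 2))))
    primes, product = [], 1
    while product <= m:
        if is_prime(q):
            primes.append(q)
            product *= q
        q += 1
    return primes


print(choose_primes(1000))
```

For m = 1000 the search starts at 10 and stops after 11, 13 and 17, whose product 2431 exceeds m.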
Complexity For each qi we compute 2qi different fingerprints: Overall space: Each character participates in exactly two fingerprints (the regular and the reverse). Overall time:
Online
• All fingerprint calculations can be done online.
• At every input character we know the current length m, so the comparisons can be computed.
Conclusion: Our algorithm is online.
k-Mismatches Use Group testing…
k-Mismatches: Group Testing
• Given n items with some positive ones, identify all positive ones by a small number of tests.
• Each test is on a subset of items.
• Test outcome is positive iff there is a positive item in the subset.
k-Mismatch
• Group: a partition of the text.
• Test (using the 1-mismatch algorithm): distinguish between
  • match
  • 1-mismatch
  • more than 1 mismatch
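k-Mismatches
$S = s_0, s_1, s_2, \ldots, s_{m-1}$
Each $S_{q,j}$ is a group in our group testing.
mod 2:
$S_{2,0} = s_0, s_2, s_4, \ldots$
$S_{2,1} = s_1, s_3, s_5, \ldots$
mod 3:
$S_{3,0} = s_0, s_3, s_6, \ldots$
$S_{3,1} = s_1, s_4, s_7, \ldots$
$S_{3,2} = s_2, s_5, s_8, \ldots$
Similar to the 1-mismatch algorithm, just with more prime numbers.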
Our tests
• We define the reversal pair of $S_{q,j}$ to be $S^R_{q,(m-1-j) \bmod q}$.
• Each partition is "tested against" its reversal pair.
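Correctness
$s_0, s_1, s_2, \ldots, s_j, \ldots, s_{m-1}$
For any group of k characters $i_1, i_2, \ldots, i_k$ that contains $s_j$, there exists a partition where $s_j$ appears alone, with none of the other k-1 characters in its subsequence (by the C.R.T.).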
Correctness
$s_0, s_1, s_2, \ldots, s_j, \ldots, s_{m-1}$
If $s_j$ invokes a mismatch, we will catch it.
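The isolation property can be checked by brute force on small inputs. The sketch searches a given list of primes for one under which the target index shares its residue class with none of the other indices; the function name and the sample indices are assumptions for the illustration.

```python
def isolating_prime(target: int, others, primes):
    """Return a prime q such that no index in `others` is congruent to
    `target` mod q, or None if the given primes do not isolate it."""
    for q in primes:
        if all((i - target) % q != 0 for i in others if i != target):
            return q
    return None


# Index 5 shares a class with 7, 9, 11 mod 2 and with 2 mod 3,
# but it is alone in its class mod 5.
print(isolating_prime(5, [2, 9, 7, 11], [2, 3, 5]))
```

When such a prime exists, the subsequence containing the target index holds at most that one mismatch from the group, so the 1-mismatch test on that subsequence exposes it.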
Complexity • Overall space: • Overall time:
Approximate Reversal Distance Using the palindrome up to k-mismatches algorithm, can be solved in time, and space.