An Optimal Algorithm for Online Square Detection Gen-Huey Chen, Jin-Ju Hong, Hsueh-I Lu National Taiwan University CPM 2005
Outline
• The definitions of the square detection problem and the online square detection problem
• The techniques of the algorithm in [Cro86] for the square detection problem
• Our algorithm for the online square detection problem
• Conclusion
Square Detection Problem
• Square: a nonempty string of the form XX
• E.g. “a b c a b c” is a square; “a b c a b c a” is not a square.
• Input: a string S
• Square detection problem: Is there a square in S?
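The definition above can be made concrete with a brute-force check. This is only a sketch of the problem statement, not the algorithm from the talk; the function names `is_square` and `contains_square` are hypothetical.

```python
def is_square(w: str) -> bool:
    """True if w is a nonempty string of the form XX."""
    half, odd = divmod(len(w), 2)
    return len(w) > 0 and odd == 0 and w[:half] == w[half:]


def contains_square(s: str) -> bool:
    """Brute force: test every even-length substring of s (far slower than [Cro86])."""
    n = len(s)
    return any(
        is_square(s[start:start + length])
        for length in range(2, n + 1, 2)
        for start in range(n - length + 1)
    )
```

For the slide's examples, `is_square("abcabc")` holds and `is_square("abcabca")` does not.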
Online Square Detection Problem
• Leung, Peng, and Ting in COCOON’04 [LPT04]
• Input: a string S
• Let m be the unknown smallest integer such that S[1..m] contains a square.
• Online square detection problem: Determine m as soon as S[m] is read.
• An O(m log² m)-time algorithm [LPT04]
• An O(m log β)-time algorithm in our paper (β: the number of distinct characters in S[1..m])
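As a baseline for the online setting, the sketch below reads S one character at a time and reports m at the first position where a square ends. It is a naive quadratic-time illustration of the problem statement, not the O(m log β) algorithm; the name `first_square_end` is hypothetical.

```python
def first_square_end(s: str):
    """Return the smallest m (1-based) such that s[1..m] contains a square, else None."""
    for i in range(1, len(s) + 1):
        # Earlier prefixes were square-free, so only squares ending exactly at i are new.
        for half in range(1, i // 2 + 1):
            if s[i - 2 * half:i - half] == s[i - half:i]:
                return i
    return None
```

The point of the online formulation is visible here: the answer is produced in the iteration that reads S[m], without looking at later characters.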
Algorithm in [Cro86] for Square Detection Problem
for k = 1 to p    // p: number of blocks
{
    if a square ends in B_k then return YES;
}
return NO;
[Figure: S partitioned into blocks B_1 B_2 B_3 B_4 ... B_p]
f-factorization
• Let d_k denote the starting position of the k-th block B_k.
• B_k is S[d_k] if S[d_k] does not occur before d_k; otherwise B_k is the longest prefix of S[d_k..n] that occurs before d_k.
• E.g. S = a a a b b a b a b a a … (positions 1 2 3 … 11 …)
[Figure: the example string divided into blocks B_1 B_2 B_3 B_4 B_5 B_6]
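The block definition can be tried out with a naive sketch. It assumes that "occurs before d_k" means an occurrence that starts before d_k and may overlap the block itself; that reading is an assumption, as is the function name `f_factorization`. The quadratic search via `str.find` is for illustration only.

```python
def f_factorization(s: str):
    """Return the blocks of s as a list of strings (naive search)."""
    blocks = []
    n = len(s)
    d = 0  # 0-based start of the current block
    while d < n:
        length = 0
        # Extend while s[d:d+length+1] has an occurrence starting before d.
        # The search window s[0:d+length] forces the start p to satisfy p <= d - 1.
        while d + length < n and s.find(s[d:d + length + 1], 0, d + length) != -1:
            length += 1
        size = max(length, 1)  # a character with no earlier occurrence forms its own block
        blocks.append(s[d:d + size])
        d += size
    return blocks
```

On the first ten characters of the slide's example, `f_factorization("aaabbababa")` returns `['a', 'aa', 'b', 'b', 'ab', 'aba']`, which appears consistent with the six blocks labelled in the figure.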
f-factorization (cont.)
• A square ending in B_k is centered either in B_{k-1} or in B_k.
[Figure: blocks ... B_{k-1} B_k]
Square Ending in the k-th Block
• Case 1. The square is entirely in the k-th block.
• Case 2. The square begins in the (k-1)-st block.
  • Case 2.1. The square is centered in the (k-1)-st block.
  • Case 2.2. The square is centered in the k-th block.
• Case 3. The square begins before the (k-1)-st block and is centered in the (k-1)-st or k-th block.
Our Algorithm for Online Square Detection Problem
for i = 1 to n    // n = |S|
{
    compute the f-factorization of S[1..i];
    if a square ends at S[i] then return i;
}
return NO-SQUARE;
Square Ending at S[i] in B_k
• Case 1. The square is entirely in the k-th block.
• Case 2. The square begins in the (k-1)-st block.
  • Case 2.1. The square is centered in the (k-1)-st block.
  • Case 2.2. The square is centered in the k-th block.
• Case 3. The square begins before the (k-1)-st block and is centered in the (k-1)-st or k-th block.
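• L(i1, i2, i)-square: a square in S that begins at position j, is centered at position c, and ends at position i, with i1 ≤ j < i2 and i1 ≤ c < i2.
• R(i1, i2, i)-square: a square in S that begins at position j, is centered at position c, and ends at position i, with i1 ≤ j < i2 and i2 ≤ c < i.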
Square Ending at S[i] in B_k
• Case 1. The square is entirely in the k-th block.
• Case 2. The square begins in the (k-1)-st block.
  • Case 2.1. The square is centered in the (k-1)-st block: an L(d_{k-1}, d_k, i)-square.
  • Case 2.2. The square is centered in the k-th block: an R(d_{k-1}, d_k, i)-square.
• Case 3. The square begins before the (k-1)-st block and is centered in the (k-1)-st or k-th block: an R(1, d_{k-1}, i)-square.
Our Algorithm for Online Square Detection Problem
for i = 1 to n    // n = |S|
{
    compute the f-factorization of S[1..i];
    let S[i] belong to B_k;
    if an L(d_{k-1}, d_k, i)-square is detected then return i;
    if an R(d_{k-1}, d_k, i)-square is detected then return i;
    if an R(1, d_{k-1}, i)-square is detected then return i;
}
return NO-SQUARE;
• Each iteration: amortized O(log β) time
Longest Common Extensions
• For positions i1 ≤ i2 ≤ i3 in S
• XR(i1, i2, i3): longest common right extension of positions i1 and i2 with boundary i3
• XL(i2, i3, i1): longest common left extension of positions i2 and i3 with boundary i1
• E.g. S = a b a b b a b a b a (positions 1 to 10): XR(3, 8, 10) = 2 and XL(4, 9, 2) = 3
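The two extension functions can be written down directly by character comparison. The sketch below uses 1-based positions to match the slides; the names `xr` and `xl` are hypothetical, and this naive version takes time proportional to the extension length rather than O(1).

```python
def xr(s: str, i1: int, i2: int, i3: int) -> int:
    """Longest common right extension of 1-based positions i1 <= i2, not reading past i3."""
    k = 0
    while i2 + k <= i3 and s[i1 + k - 1] == s[i2 + k - 1]:
        k += 1
    return k


def xl(s: str, i2: int, i3: int, i1: int) -> int:
    """Longest common left extension of 1-based positions i2 <= i3, not reading before i1."""
    k = 0
    while i2 - k >= i1 and s[i2 - k - 1] == s[i3 - k - 1]:
        k += 1
    return k
```

With S = "ababbababa", `xr(S, 3, 8, 10)` gives 2 and `xl(S, 4, 9, 2)` gives 3, matching the example on the slide.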
Head Extension Function: XR(1, j, i)
• If the string S is read character by character, then in the i-th iteration, for all j ≤ i, XR(1, j, i) can be computed in O(1) time after O(i) total preprocessing time.
• E.g. S = a b a b b a b a b a
    j              1   2   3   4   5   6   7   8   9  10
    S[j]           a   b   a   b   b   a   b   a   b   a
    XR(1, j, 10)  10   0   2   0   0   4   0   3   0   1
• We call XR(1, j, i) the head extension function.
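The values in the example coincide with what is commonly called the Z-function of the prefix S[1..i]. The sketch below computes all of them in O(i) time with the standard offline Z-algorithm; it recomputes from scratch for a given i, whereas the slide describes an incremental scheme, so treat it as an illustration of the values only. The name `head_extension` is hypothetical.

```python
def head_extension(s: str, i: int):
    """Return h with h[j] = XR(1, j, i) for 1 <= j <= i; h[0] is unused."""
    t = s[:i]
    n = len(t)
    z = [0] * n
    if n:
        z[0] = n
    left = right = 0  # [left, right) is the rightmost window known to match a prefix of t
    for j in range(1, n):
        if j < right:
            # Reuse the value already computed for the mirrored position.
            z[j] = min(right - j, z[j - left])
        while j + z[j] < n and t[z[j]] == t[j + z[j]]:
            z[j] += 1
        if j + z[j] > right:
            left, right = j, j + z[j]
    return [None] + z
```

For S = "ababbababa" and i = 10, `head_extension(S, 10)[1:]` is `[10, 0, 2, 0, 0, 4, 0, 3, 0, 1]`, the row shown in the example.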
L(i1, i2, i)-square
[Figure: the square Y Z Y Z in S, with positions i1, j, i2, i marked]
L(i1, i2, i)-square
• [ML84] S has an L(i1, i2, i)-square if and only if there is an index j with i1 ≤ j < i2 such that XR(j, i2, i) = |S[i2..i]| and XL(j-1, i2-1, i1) + XR(j, i2, i) ≥ |S[j..i2-1]|.
• When S[1..i-1] contains no square, the inequality holds with equality.
[Figure: the square Y Z Y Z in S, with positions i1, j, i2, i marked]
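The criterion can be transcribed almost literally. The sketch below uses naive character-by-character extension functions, so it illustrates the condition rather than the time bound; the names `xr`, `xl`, and `has_l_square` are hypothetical, and positions are 1-based as on the slides.

```python
def xr(s, i1, i2, i3):
    """Longest common right extension of 1-based positions i1 <= i2, not reading past i3."""
    k = 0
    while i2 + k <= i3 and s[i1 + k - 1] == s[i2 + k - 1]:
        k += 1
    return k


def xl(s, i2, i3, i1):
    """Longest common left extension of 1-based positions i2 <= i3, not reading before i1."""
    k = 0
    while i2 - k >= i1 and s[i2 - k - 1] == s[i3 - k - 1]:
        k += 1
    return k


def has_l_square(s, i1, i2, i):
    """Check the [ML84]-style condition for every candidate j with i1 <= j < i2."""
    for j in range(i1, i2):
        right = xr(s, j, i2, i)
        left = xl(s, j - 1, i2 - 1, i1)
        # right must cover all of S[i2..i]; left + right must cover the period |S[j..i2-1]|.
        if right == i - i2 + 1 and left + right >= i2 - j:
            return True
    return False
```

For S = "cabcab" with i1 = 1, i2 = 5, i = 6, the index j = 2 satisfies the condition (Y = "c", Z = "ab"), so the call returns True.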
Detecting L(d_{k-1}, d_k, i)-squares
• Let z(j) = |S[j..d_k - 1]| - XL(j-1, d_k - 1, d_{k-1}) for all j in B_{k-1}
• In the i-th iteration: is there an index j in B_{k-1} such that XR(j, d_k, i) = z(j)?
[Figure: Y Z Y followed by a candidate Z in S, with positions d_{k-1}, j, d_k, i and the length z(j) marked]
In the d_k-th iteration (preprocessing)
• Compute z(j) for all j in B_{k-1}
• Build the suffix tree of B_{k-1}$
• For every node u, compute min{z(j) | j corresponds to a leaf in u's subtree}
• Time: O(|B_{k-1}| log β)
[Figure: positions d_{k-1}, j, d_k, i in S, and a suffix-tree node u storing the minimum z(j) of its subtree]
In the i-th iteration
• If |S[d_k..i]| equals the value stored in u, then a square ends at position i.
[Figure: S[d_k..i] leads to the suffix-tree node u, which stores a minimum z(j)]
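The preprocessing and query steps can be imitated with a plain dictionary standing in for the suffix tree: every substring of B_{k-1} is mapped to the minimum z(j) over its starting positions j. This stand-in uses quadratic space and gives no O(log β) bound; it only shows what the node values mean. It assumes S[1..i-1] contains no square, and the names `xl`, `preprocess_block`, and `l_square_ends_at` are hypothetical.

```python
def xl(s, i2, i3, i1):
    """Longest common left extension of 1-based positions i2 <= i3, not reading before i1."""
    k = 0
    while i2 - k >= i1 and s[i2 - k - 1] == s[i3 - k - 1]:
        k += 1
    return k


def preprocess_block(s, d_prev, d_k):
    """Map each substring of B_{k-1} = S[d_prev..d_k-1] to the minimum z(j) of its start positions."""
    best = {}
    for j in range(d_prev, d_k):
        z = (d_k - j) - xl(s, j - 1, d_k - 1, d_prev)
        for end in range(j, d_k):
            key = s[j - 1:end]  # S[j..end] in 1-based notation
            best[key] = min(best.get(key, z), z)
    return best


def l_square_ends_at(s, best, d_k, i):
    """Query for iteration i: compare |S[d_k..i]| with the stored minimum, if any."""
    tail = s[d_k - 1:i]
    return best.get(tail) == len(tail)
```

With S = "cabcab", d_{k-1} = 1 and d_k = 5, the query is negative at i = 5 and positive at i = 6, where the stored value for "ab" equals 2.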
R(i1, i2, i)-square
[Figure: the square Y Z Y Z in S, with positions i1, i2, j, i marked]
R(i1, i2, i)-square
• [ML84] S has an R(i1, i2, i)-square if and only if there is an index j with i2 < j < i such that XR(i2, j+1, i) = |S[j+1..i]| and XL(i2-1, j, i1) + XR(i2, j+1, i) ≥ |S[i2..j]|.
• When S[1..i-1] contains no square, the inequality holds with equality.
[Figure: the square Y Z Y Z in S, with positions i1, i2, j, i marked]
Detecting R(d_{k-1}, d_k, i)-square
• Let z(j) = |S[d_k..j]| - XL(d_k - 1, j, d_{k-1}) for all j in B_k
• Insert the position j into the set indexed by j + z(j)
• In the i-th iteration: for each j in the set indexed by i, is XR(d_k, j+1, i) = z(j)?
• Amortized O(log β) time
[Figure: Y Z Y followed by a candidate Z in S, with positions d_{k-1}, d_k, j, i and the length z(j) marked]
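The bucket idea can be sketched as follows: when position j is read, z(j) is computed and j is filed under j + z(j), the only iteration in which j can witness a square. The sketch uses naive extension functions, treats everything from d_k onward as B_k, and assumes no square has ended before the reported position, so it shows the bookkeeping only and not the amortized bound. All function names are hypothetical.

```python
from collections import defaultdict


def xr(s, i1, i2, i3):
    """Longest common right extension of 1-based positions i1 <= i2, not reading past i3."""
    k = 0
    while i2 + k <= i3 and s[i1 + k - 1] == s[i2 + k - 1]:
        k += 1
    return k


def xl(s, i2, i3, i1):
    """Longest common left extension of 1-based positions i2 <= i3, not reading before i1."""
    k = 0
    while i2 - k >= i1 and s[i2 - k - 1] == s[i3 - k - 1]:
        k += 1
    return k


def first_r_square_end(s, d_prev, d_k):
    """Return the first i at which the bucket test succeeds, or None."""
    buckets = defaultdict(list)
    for i in range(d_k, len(s) + 1):
        # Candidates filed under i satisfy j + z(j) = i, so z(j) = |S[j+1..i]| = i - j.
        for j in buckets.pop(i, []):
            if xr(s, d_k, j + 1, i) == i - j:
                return i
        j = i
        if j > d_k:  # the criterion requires d_k < j
            z = (j - d_k + 1) - xl(s, d_k - 1, j, d_prev)
            if z >= 1:
                buckets[j + z].append(j)
    return None
```

For S = "cabcab" with d_{k-1} = 1 and d_k = 3, positions 4 and 5 are both filed under 6, and j = 5 passes the test there, so the function returns 6.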
Computing XL(d_k - 1, j, d_{k-1})
• Let g be the position with |S[g..d_k - 1]| = min(|S[d_{k-1}..d_k - 1]|, |S[d_k..j]|)
• For all v with g ≤ v < d_k, XL(v, d_k - 1, g) can be computed in O(1) time using the technique for computing the head extension function.
[Figure: Y Z Y in S, with positions g, v, d_k - 1, d_k, j, i marked]
Computing XL(d_k - 1, j, d_{k-1}) (cont.)
• Let F(j) denote the longest suffix of S[d_k..j] that is also a substring of S[g..d_k - 1]
• XL(d_k - 1, j, d_{k-1}) = |F(j)| if y = d_k - 1, and min(|F(j)|, XL(y, d_k - 1, g)) otherwise
[Figure: Y Z Y in S, with F(j) and positions g, d_{k-1}, y, d_k, j, i marked]
Time Complexity
for i = 1 to n    // n = |S|
{
    compute the f-factorization of S[1..i];
    let S[i] belong to B_k;
    if an L(d_{k-1}, d_k, i)-square is detected then return i;
    if an R(d_{k-1}, d_k, i)-square is detected then return i;
    if an R(1, d_{k-1}, i)-square is detected then return i;
}
return NO-SQUARE;
• Each iteration: amortized O(log β) time
Conclusion
• Each of those O(log β) terms comes from traversing a suffix tree of a string with O(β) distinct characters.
• Expected time: O(m)
• Is it possible to reduce the running time to worst-case O(m) time for a general alphabet?