Computing Reversed Lempel-Ziv Factorization Online. Shiho Sugimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda. Kyushu University, Japan. Outline: Reversed LZ factorization without self-references (RLZ); online RLZ algorithm by Kolpakov and Kucherov.
Computing Reversed Lempel-Ziv Factorization Online
Shiho Sugimoto, Tomohiro I, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda
Kyushu University, Japan
Outline
• Reversed LZ factorization without self-references (RLZ)
  • Online RLZ algorithm by Kolpakov and Kucherov
  • New online RLZ algorithm using O(n log σ) bits of space
• Reversed LZ factorization with self-references (RLZS)
  • New online RLZS algorithm using O(n log n) bits of space
  • New online RLZS algorithm using O(n log σ) bits of space
n: the length of the input string; σ: the alphabet size
Background
• LZ factorization was proposed in 1977 [Ziv & Lempel, 1977].
  • Used for data compression, etc.
• Reversed LZ factorization (RLZ for short) was proposed in 2009 [Kolpakov & Kucherov, 2009].
  • Used for finding gapped palindromes, etc.
LZ factorization without self-references [Ziv & Lempel, 1977]
The LZ factorization without self-references of a string w of length n is a factorization s_1, s_2, ..., s_m such that
• w = s_1 s_2 … s_m
• s_i is the longest non-empty prefix of w[|s_1…s_{i−1}|+1..n] that is also a substring of w[1..|s_1…s_{i−1}|], if such a prefix exists
• s_i = w[|s_1…s_{i−1}|+1] otherwise
Ex) w = abbaaaabbbac
    s_1 … s_9 = a | b | b | a | a | aa | bb | ba | c
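The definition above can be turned into code almost literally. The following is a minimal brute-force sketch, not an efficient algorithm and not part of the original slides; the function name `lz_factorize` is hypothetical, and Python's substring test stands in for any index structure.

```python
def lz_factorize(w):
    """Naive LZ factorization without self-references (illustrative only)."""
    factors = []
    pos = 0  # 0-based start of the next factor
    while pos < len(w):
        history = w[:pos]  # w[1..|s_1...s_{i-1}|] in the slide's notation
        length = 0
        # Extend the candidate while it still occurs inside the history.
        while pos + length < len(w) and w[pos:pos + length + 1] in history:
            length += 1
        length = max(length, 1)  # no earlier occurrence: emit one fresh character
        factors.append(w[pos:pos + length])
        pos += length
    return factors


print(lz_factorize("abbaaaabbbac"))
# ['a', 'b', 'b', 'a', 'a', 'aa', 'bb', 'ba', 'c']
```

Extending the candidate one character at a time is valid here because every prefix of a substring of the history is itself a substring of the history.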
Reversed LZ factorization without self-references (RLZ) [Kolpakov & Kucherov, 2009]
The RLZ without self-references of a string w of length n is a factorization f_1, f_2, ..., f_m such that
• w = f_1 f_2 … f_m
• f_i is the longest non-empty prefix of w[|f_1…f_{i−1}|+1..n] that is also a substring of w[1..|f_1…f_{i−1}|]^R (the reversed prefix), if such a prefix exists
• f_i = w[|f_1…f_{i−1}|+1] otherwise
Ex) w = abbaaaabbbac
    f_1 … f_7 = a | b | ba | a | aabb | ba | c
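The only change from the ordinary LZ definition is that the history is reversed before searching. A minimal brute-force sketch of this definition follows; the name `rlz_factorize` is hypothetical and the code is meant to illustrate the definition, not the algorithms discussed in the talk.

```python
def rlz_factorize(w):
    """Naive reversed LZ factorization without self-references (illustrative only)."""
    factors = []
    pos = 0  # 0-based start of the next factor
    while pos < len(w):
        reversed_history = w[:pos][::-1]  # w[1..|f_1...f_{i-1}|]^R
        length = 0
        # Extend the candidate while it still occurs in the reversed history.
        while pos + length < len(w) and w[pos:pos + length + 1] in reversed_history:
            length += 1
        length = max(length, 1)  # no occurrence: emit one fresh character
        factors.append(w[pos:pos + length])
        pos += length
    return factors


print(rlz_factorize("abbaaaabbbac"))
# ['a', 'b', 'ba', 'a', 'aabb', 'ba', 'c']
```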
KK algorithm [Kolpakov & Kucherov, 2009]
• Computes RLZ in an online manner.
• Works in O(n log n) bits of space and O(n log σ) time (on a word RAM model).
  • Constructs the suffix tree for reversed prefixes online.
  • Computes RLZ factors from the suffix tree.
  • Blumer et al.'s version of Weiner's algorithm achieves the above complexity [Blumer et al., 1985] [Weiner, 1973].
KK algorithm [Kolpakov & Kucherov, 2009]
Ex) w = abbaaaabbbac
• f_1 = a is found with Stree(ε)
• f_2 = b is found with Stree(a^R)
• f_3 = ba is found with Stree((ab)^R)
• f_4 = a is found with Stree((abba)^R)
• f_5 = aabb is found with Stree((abbaa)^R)
[Figure: the suffix tree of the reversed prefix at each step]
• This suffix tree requires O(n log n) bits of space.
• We propose a new online RLZ algorithm which uses only O(n log σ) bits of space (σ ≤ n is the alphabet size).
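The online control flow of this approach can be sketched without the suffix tree. In the sketch below, a plain substring test on the reversed prefix stands in for the suffix tree of reversed prefixes, so it illustrates only the character-by-character loop and none of the stated space or time bounds. The class name `OnlineRLZ` and its methods are hypothetical.

```python
class OnlineRLZ:
    """Character-at-a-time RLZ without self-references (illustrative only).

    Assumption: a substring test on a Python string replaces the suffix
    tree of the reversed prefix used by the KK algorithm.
    """

    def __init__(self):
        self.rev_prefix = ""  # reverse of the text covered by finished factors
        self.current = ""     # factor currently being extended
        self.factors = []

    def _emit(self, factor):
        self.factors.append(factor)
        self.rev_prefix = factor[::-1] + self.rev_prefix
        self.current = ""

    def push(self, c):
        # Case 1: the extended candidate still occurs in the reversed prefix.
        if self.current + c in self.rev_prefix:
            self.current += c
            return
        # Case 2: the candidate cannot be extended, so it is a finished factor.
        if self.current:
            self._emit(self.current)
            if c in self.rev_prefix:  # c may start the next factor
                self.current = c
                return
        # Case 3: c does not occur in the reversed prefix at all.
        self._emit(c)

    def finish(self):
        if self.current:
            self._emit(self.current)
        return self.factors


rlz = OnlineRLZ()
for ch in "abbaaaabbbac":
    rlz.push(ch)
print(rlz.finish())
# ['a', 'b', 'ba', 'a', 'aabb', 'ba', 'c']
```

A factor is only known to be complete once the next character fails to extend it, which is why `finish` has to flush the pending candidate at the end of the input.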
For O(n log σ) bits of space
• We utilize the idea of Starikovskaya's algorithm.
  • It computes the LZ factorization online in O(n log σ) bits of space and O(n log^2 n) time [Starikovskaya, 2012].
• We divide the input string into blocks of length r = O(log_σ n).
• Each block is replaced by a meta-character.
Ex) w = abbaaaabbbac…, r = 3
    blocks:          abb | aaa | abb | bac | …
    meta-characters:  B     A     B     C   …
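The block replacement step can be sketched as follows. This is an illustrative assumption about one way to assign meta-characters (integer ids in order of first appearance); the function name `to_meta_characters` is hypothetical, and the slides do not specify how meta-characters are encoded.

```python
def to_meta_characters(w, r):
    """Split w into blocks of length r and map each distinct block to an id.

    Returns (meta, tail, ids): the meta-character sequence, the leftover
    characters that do not yet fill a block, and the block-to-id table.
    """
    ids = {}
    meta = []
    full = len(w) - len(w) % r  # length covered by complete blocks
    for start in range(0, full, r):
        block = w[start:start + r]
        # A new block receives the next unused id; a repeated block reuses its id.
        meta.append(ids.setdefault(block, len(ids)))
    return meta, w[full:], ids


meta, tail, ids = to_meta_characters("abbaaaabbbac", 3)
print(meta)  # [0, 1, 0, 2]
print(ids)   # {'abb': 0, 'aaa': 1, 'bac': 2}
```

The ids 0, 1, 0, 2 repeat in the same pattern as the labels B, A, B, C in the example: equal blocks receive equal meta-characters.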
Our online RLZ algorithm
• For f_i of length shorter than r, we use the suffix trie of reversed subwords of length 2r.
  • We can find f_i in o(n) bits of space and O(|f_i| log σ) time.
• For f_i of length at least r, we use the suffix tree of reversed blocks (meta-characters).
  • We can find f_i in O(n log σ) bits of space and O(|f_i| log^2 n) time.
Theorem: We can compute RLZ without self-references online in O(n log σ) bits of space and O(n log^2 n) time.
LZ factorization with self-references [Ziv & Lempel, 1977]
The LZ factorization with self-references of a string w of length n is a factorization t_1, t_2, ..., t_m such that
• w = t_1 t_2 … t_m
• t_i is the longest non-empty prefix of w[|t_1…t_{i−1}|+1..n] that is also a substring of w[1..|t_1…t_i|−1], if such a prefix exists (the earlier occurrence may overlap t_i itself: self-reference)
• t_i = w[|t_1…t_{i−1}|+1] otherwise
Ex) w = abbaaaabbbac
    t_1 … t_8 = a | b | b | a | aaa | bb | ba | c
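Compared with the version without self-references, the search window now grows with the candidate: a candidate of length l may use everything up to one character before its own end. A minimal brute-force sketch of this definition, with the hypothetical name `lz_self_factorize`:

```python
def lz_self_factorize(w):
    """Naive LZ factorization with self-references (illustrative only)."""
    factors = []
    pos = 0  # 0-based start of the next factor
    while pos < len(w):
        length = 0
        # A candidate of length l must occur in w[:pos + l - 1], i.e. it needs
        # an occurrence that starts before pos but may overlap the candidate.
        while (pos + length < len(w)
               and w[pos:pos + length + 1] in w[:pos + length]):
            length += 1
        length = max(length, 1)  # no earlier occurrence: emit one fresh character
        factors.append(w[pos:pos + length])
        pos += length
    return factors


print(lz_self_factorize("abbaaaabbbac"))
# ['a', 'b', 'b', 'a', 'aaa', 'bb', 'ba', 'c']
```

In the example, the factor `aaa` copies from an occurrence that starts one position earlier and overlaps the factor itself.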
Reversed LZ factorization with self-references
The RLZ with self-references (RLZS) of a string w of length n is a factorization g_1, g_2, ..., g_m such that
• w = g_1 g_2 … g_m
• g_i is the longest non-empty prefix of w[|g_1…g_{i−1}|+1..n] that is also a substring of w[1..|g_1…g_i|−1]^R, if such a prefix exists (the reversed occurrence may overlap g_i itself: self-reference)
• g_i = w[|g_1…g_{i−1}|+1] otherwise
Ex) w = abbaaaabbbac
    g_1 … g_5 = a | bba | aaabb | ba | c
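A brute-force sketch of the RLZS definition is given below, with the hypothetical name `rlzs_factorize`. One point worth noting is that a longer candidate can be valid even when a shorter one is not (in the example, `b` at position 2 has no reversed occurrence, but `bba` does), so this sketch tests every length instead of stopping at the first failure.

```python
def rlzs_factorize(w):
    """Naive reversed LZ factorization with self-references (illustrative only)."""
    factors = []
    pos = 0  # 0-based start of the next factor
    n = len(w)
    while pos < n:
        best = 0
        for length in range(1, n - pos + 1):
            candidate = w[pos:pos + length]
            window = w[:pos + length - 1][::-1]  # w[1..|g_1...g_i|-1]^R
            if candidate in window:
                best = length  # keep the longest valid length seen so far
        best = max(best, 1)  # no valid candidate: emit one fresh character
        factors.append(w[pos:pos + best])
        pos += best
    return factors


print(rlzs_factorize("abbaaaabbbac"))
# ['a', 'bba', 'aaabb', 'ba', 'c']
```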
Online computation of RLZS
Ex) w = abbaaaabbbac
[Figure: the factorization of each prefix, shown step by step]
w[1..1] = a
w[1..2] = ab
w[1..3] = abb
w[1..4] = abba
w[1..5] = abbaa
w[1..6] = abbaaa
w[1..7] = abbaaaa
w[1..8] = abbaaaab
w[1..9] = abbaaaabb
w[1..10] = abbaaaabbb
w[1..11] = abbaaaabbba
w[1..12] = abbaaaabbbac
Reversed LZ factorization with self-references
• Every self-referencing factor is a suffix of a palindrome.
Ex) w = abbaaaabbbac
    g_1 … g_5 = a | bba | aaabb | ba | c
[Figure: the palindromes that end at the self-referencing factors are marked]
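This property can be checked on the running example. If the reversed occurrence of a factor starts at position p and overlaps the factor, then the text from p to the end of the factor reads the same in both directions. The sketch below searches for such an overlapping occurrence and returns the palindrome it induces; the function name `covering_palindrome` and the 0-based, half-open index convention are assumptions made for illustration.

```python
def covering_palindrome(w, start, end):
    """Return the palindrome w[p:end] induced by a reversed occurrence of the
    factor w[start:end] that overlaps the factor, or None if there is none."""
    reversed_factor = w[start:end][::-1]
    window = w[:end - 1]  # w[1..|g_1...g_i|-1] in the slide's notation
    p = window.find(reversed_factor)
    while p != -1:
        if p + len(reversed_factor) > start:  # the occurrence overlaps the factor
            candidate = w[p:end]
            assert candidate == candidate[::-1]  # the property stated on the slide
            return candidate
        p = window.find(reversed_factor, p + 1)
    return None


w = "abbaaaabbbac"
print(covering_palindrome(w, 1, 4))   # g_2 = bba   -> abba
print(covering_palindrome(w, 4, 9))   # g_3 = aaabb -> bbaaaabb
print(covering_palindrome(w, 9, 11))  # g_4 = ba    -> None (not self-referencing)
```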
Online RLZS in O(n log n) bits of space
We can compute each RLZS factor g_i by
• using the KK algorithm,
  • in a total of O(n log n) bits of space and O(n log σ) time, and
• computing the longest palindrome which ends at each position, online,
  • in a total of O(n log n) bits of space and O(n) time, by modifying Manacher's algorithm [Manacher, 1975].
Theorem: We can compute RLZS online in O(n log n) bits of space and O(n log σ) time.
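To make the second ingredient concrete, the sketch below computes the length of the longest palindrome ending at each position by direct checking. It is deliberately naive: it is not Manacher's algorithm or the modification mentioned above, and it does not achieve O(n) time. It only shows which values that step produces. The function name is hypothetical.

```python
def longest_palindrome_ending_at_each_position(w):
    """For each prefix w[:end], return the length of its longest suffix
    palindrome. Naive check, for illustration only."""
    result = []
    for end in range(1, len(w) + 1):
        for start in range(end):  # the leftmost valid start gives the longest
            candidate = w[start:end]
            if candidate == candidate[::-1]:
                result.append(end - start)
                break
    return result


print(longest_palindrome_ending_at_each_position("abbaaaabbbac"))
# [1, 1, 2, 4, 2, 3, 4, 6, 8, 3, 5, 1]
```

The values 4 at position 4 and 8 at position 9 correspond to the palindromes `abba` and `bbaaaabb` in the running example.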
Suffix palindromes
• All suffix palindromes of a string of length n can be represented by O(log n) arithmetic progressions [Apostolico, 1995].
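As a rough illustration of this representation, the sketch below lists the lengths of all suffix palindromes of a string and greedily groups consecutive lengths with a common difference into triples (first length, difference, count). The greedy grouping and the function names are assumptions made for illustration; this is not the construction from the cited result and it makes no claim about the number of groups it produces.

```python
def suffix_palindrome_lengths(w):
    """Lengths of all suffix palindromes of w, in increasing order (naive)."""
    n = len(w)
    return [n - s for s in range(n - 1, -1, -1) if w[s:] == w[s:][::-1]]


def group_into_progressions(lengths):
    """Greedily group a sorted list into (first, difference, count) triples."""
    groups = []
    i = 0
    while i < len(lengths):
        if i + 1 == len(lengths):
            groups.append((lengths[i], 0, 1))  # a single leftover element
            break
        diff = lengths[i + 1] - lengths[i]
        j = i + 1
        # Extend the run while the common difference is preserved.
        while j + 1 < len(lengths) and lengths[j + 1] - lengths[j] == diff:
            j += 1
        groups.append((lengths[i], diff, j - i + 1))
        i = j + 1
    return groups


print(group_into_progressions(suffix_palindrome_lengths("aaaa")))    # [(1, 1, 4)]
print(group_into_progressions(suffix_palindrome_lengths("abaaba")))  # [(1, 2, 2), (6, 0, 1)]
```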