1. 1 Computing Inverse ST in Linear Complexity Ge Nong, Sun Yat-Sen University, PRC
Sen Zhang, SUNY College at Oneonta, USA
Wai Hong Chan, Hong Kong Baptist University, HK
2. 2 Outline Problem introduction
Existing solutions
The linear solution
3. 3 Introduction Burrows-Wheeler Transform (BWT)
4. 4 Introduction Sort Transform (ST) - Introduced in 1996 by Schindler
A generalization of the BWT for limited-order contexts
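As a concrete illustration of the limited-order idea, here is a minimal sketch of a forward ST. It assumes the usual description of the transform: sort all rotations of S by their first k characters only, break ties by original position (a stable sort), and output the last column. The function name `sort_transform` is hypothetical, and this naive version is not the linear-time construction referred to in the slides.

```python
def sort_transform(s: str, k: int) -> str:
    """Naive k-order sort transform of s (s is assumed to end with a unique sentinel)."""
    n = len(s)

    def context(i: int) -> str:
        # First k characters of the rotation of s that starts at position i.
        return "".join(s[(i + j) % n] for j in range(k))

    # sorted() is stable, so rotations with equal k-order contexts
    # keep their original relative order.
    order = sorted(range(n), key=context)

    # The output is the last column: the character preceding each rotation.
    return "".join(s[(i - 1) % n] for i in order)
```

With k equal to the length of the string, every rotation is fully sorted and the result coincides with the BWT: `sort_transform("banana$", 7)` gives `"annb$aa"`, while `sort_transform("banana$", 2)` gives `"anbn$aa"`.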
5. 5 IBWT vs. IST
6. 6 Difficulties in Inverting ST Problems to be solved
All the k-order contexts must be restored in their correct order
Each pair of adjacent k-order contexts must be tested for equality
7. 7 Existing Results on BWT and ST Results on BWT
Suffix array construction algorithms can be used directly to compute the BWT; algorithms running in O(n) time/space are available
Computing the inverse BWT is trivial, and can be done in O(n) time/space
Results on ST
ST can be computed using the algorithm for BWT, with slight modifications
The best algorithm for computing the inverse ST requires O(n) space and O(n log k) time
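For comparison with the ST case, the inverse BWT mentioned above can be sketched with the standard last-to-first mapping. This is a textbook-style sketch, not code from the paper; it assumes the text ends with a unique sentinel, and the name `inverse_bwt` is hypothetical. Apart from sorting the alphabet, every step is a single pass over the input.

```python
from collections import Counter


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Invert a BWT whose original text ends with a unique sentinel."""
    n = len(last)
    counts = Counter(last)

    # start[c] = index of the first row whose first character is c.
    start, total = {}, 0
    for c in sorted(counts):
        start[c] = total
        total += counts[c]

    # lf[i] = row whose first character is the same text occurrence as last[i].
    lf = [0] * n
    seen = Counter()
    for i, c in enumerate(last):
        lf[i] = start[c] + seen[c]
        seen[c] += 1

    # The row that ends with the sentinel is the original text itself.
    row = last.index(sentinel)
    out = []
    for _ in range(n):
        out.append(last[row])  # collects the text from its last character backwards
        row = lf[row]
    return "".join(reversed(out))
```

For example, `inverse_bwt("annb$aa")` returns `"banana$"`. The same walk does not directly invert the ST, because rows with equal k-order contexts are not fully sorted; that is the difficulty the slides address.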
8. 8 Existing Solutions Question: Can the inverse ST be computed in linear time?
9. 9 Objective To design an algorithm for computing the inverse ST in linear, i.e., O(n), time and space
10. 10 The Linear Solution Problem formalization
S: a size-n string, ending with a sentinel $
Fk/Lk: the transposed first/last columns of Mk, respectively
Pk: mapping Lk to Fk
Qk: mapping Fk to Lk, reciprocal to Pk
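The sketch below shows one way the vectors above could be built from Lk in a single counting pass. It rests on an assumption not spelled out on the slide: that Pk is the stable-sort correspondence (the j-th occurrence of a character in Lk maps to its j-th occurrence in Fk), as in the BWT, and that Qk is its inverse permutation. The name `build_mappings` is hypothetical.

```python
from collections import Counter


def build_mappings(last: str):
    """Return (first, p, q): the sorted column plus the two mutually inverse mappings."""
    n = len(last)
    counts = Counter(last)
    alphabet = sorted(counts)

    # The first column is the last column with its characters sorted.
    first = "".join(c * counts[c] for c in alphabet)

    # next_slot[c] = next free position in `first` for character c.
    next_slot, total = {}, 0
    for c in alphabet:
        next_slot[c] = total
        total += counts[c]

    p = [0] * n  # p[i] = j : last[i] corresponds to first[j]   (assumed Pk)
    q = [0] * n  # q[j] = i : first[j] corresponds to last[i]   (assumed Qk)
    for i, c in enumerate(last):
        j = next_slot[c]
        next_slot[c] += 1
        p[i] = j
        q[j] = i
    return first, p, q
```

For `last = "anbn$aa"` this yields `first = "$aaabnn"`, `p = [1, 5, 4, 6, 0, 2, 3]` and `q = [4, 0, 5, 6, 2, 1, 3]`.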
11. 11 Restoring all the k-order contexts We introduce the concept of “cycle” to solve this problem
The characters of Lk are classified into cycles, where each cycle a(i) is defined as
12. 12 Algorithm for finding all the cycles Algorithm framework
Initially, mark all the items of Lk as unvisited.
Traverse Lk once from left to right. For each unvisited item Lk[i], retrieve the cycle a(i) using Qk in O(ni) time, where ni=|a(i)| is the length of cycle a(i)
The algorithm can find all cycles in O(n) time/space
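The framework above can be sketched as a plain cycle decomposition of a permutation. The slide's exact definition of a(i) is not reproduced here, so the sketch assumes a cycle is simply the orbit of i under repeated application of Qk; the name `find_cycles` and the list-of-lists output are illustrative choices.

```python
def find_cycles(q: list[int]) -> list[list[int]]:
    """Split the permutation q into its cycles, visiting every index exactly once."""
    n = len(q)
    visited = [False] * n  # initially, every item is unvisited
    cycles = []
    for i in range(n):  # one left-to-right traversal
        if visited[i]:
            continue
        cycle = []
        j = i
        while not visited[j]:  # follow q until the cycle closes
            visited[j] = True
            cycle.append(j)
            j = q[j]
        cycles.append(cycle)
    return cycles
```

Each index is marked visited once and never followed again, so the total work is the sum of the cycle lengths, which is n. For `q = [4, 0, 5, 6, 2, 1, 3]` the result is `[[0, 4, 2, 5, 1], [3, 6]]`.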
13. 13 Examples of Cycles
14. 14 Computing heights for cycles Height(i) = LCP (i -1,i)
Height(i) < k if and only if D(i)=1
How to compute the heights of the characters in all cycles in O(n) time/space?
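To make the two definitions concrete, here is a naive reference sketch that computes the heights directly from already-known k-order contexts and derives D from them. It assumes LCP(i-1, i) means the longest common prefix of the contexts of rows i-1 and i, capped at k, and it sets the height of the first row to 0 by convention (the slide does not say). It takes O(nk) time, so it only illustrates the definitions and is not the linear method.

```python
def heights_and_d(contexts: list[str], k: int):
    """Return (h, d) for a list of k-order contexts given in sorted row order."""
    n = len(contexts)
    h = [0] * n  # h[0] = 0 is an assumed convention for the first row
    for i in range(1, n):
        lcp = 0
        # Compare the two adjacent contexts character by character, up to k.
        while lcp < k and contexts[i - 1][lcp] == contexts[i][lcp]:
            lcp += 1
        h[i] = lcp
    # d[i] = 1 exactly when the height is below k, i.e. a new context starts at row i.
    d = [1 if x < k else 0 for x in h]
    return h, d
```

For the 2-order contexts `["$b", "a$", "an", "an", "ba", "na", "na"]`, the heights are `[0, 0, 1, 2, 0, 0, 2]` and D is `[1, 1, 1, 0, 1, 1, 0]`.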
15. 15 Computing the Heights in O(n) time Classify the cycles into two kinds, according to their lengths
For a cycle with ni ≥ k/2, we need O(ni + k) = O(ni) time to compute the heights for all the characters in the cycle, see Lemma 2.
For a cycle with ni < k/2, we need O(ni) time to compute the heights for all the characters in the cycle, see Theorem 4.
16. 16 The main results We have established a number of properties showing that
For a cycle with length ni ≥ k/2, the time for finding the heights of all the characters in this cycle is, in the worst case, k + 2ni - 2 ∈ O(ni)
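The reason the additive k term is absorbed can be written out in one line, assuming the cycle length n_i is at least k/2 (the "not shorter than k/2" class):

```latex
n_i \ge \frac{k}{2}
\;\Longrightarrow\; k \le 2 n_i
\;\Longrightarrow\; k + 2 n_i - 2 \;\le\; 4 n_i - 2 \;\in\; O(n_i)
```

Since the cycle lengths sum to n, adding this bound over all such cycles stays within O(n).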
18. 18 Theorem 5
Given Lk, we can restore the original text S in O(n) time/space, for k in [1, n].
19. 19 Making the overall solution Consisting of these linear steps:
Find all the cycles, and classify them into two classes according to whether their length is shorter than k/2 or not
Compute the heights for all the characters in the cycles not shorter than k/2, and their lower adjacent cycles shorter than k/2
Compute the heights for all the characters in the cycles shorter than k/2 and not included in the above step
From the height vector H, compute the difference vector D
Restore S from Lk, Pk and D
20. 20 Complexity
21. 21 Ongoing work Symmetric encryption solutions based on ST/IST
Efficient implementation of the proposed linear algorithm
Efficient algorithms for directly computing the ST
22. 22 Acknowledgment Thanks to all the reviewers of this paper
Thanks to CPM 2008