Synchronization of Huffman Codes. Marek Biskup, Warsaw University. PhD-Open, 2007-05-26
Huffman Codes
[Figure: Huffman code tree for the alphabet {a, b, c, d, e}; the codewords used in the examples that follow are a = 00, b = 01, c = 10, d = 110, e = 111]
• Each letter has a corresponding binary string (its code)
• The codewords form a complete binary tree
• The depth of a letter depends on its probability in the source
• The code is uniquely decodable
N = 5 (alphabet size), h = 3 (height of the tree)
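To make the example concrete, here is a minimal Python sketch of the code table read off the figure above (the exact codeword assignment is inferred from the decoding examples on the later slides), together with the Kraft equality that characterizes a complete code tree:

```python
from fractions import Fraction

# Codeword table inferred from the decoding examples in this talk.
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}

# A complete code tree is equivalent to the Kraft sum being exactly 1.
assert sum(Fraction(1, 2 ** len(w)) for w in CODE.values()) == 1

N = len(CODE)                           # 5 letters
h = max(len(w) for w in CODE.values())  # 3 = height of the tree
```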
Coding and decoding
• Source sequence: bbeaaebcec
• Encoded text: 01 01 111 00 00 111 01 10 111 10
  (b b e a a e b c e c)
• Encoding: for each input letter, print out its code
• Decoding: use the Huffman tree as a finite automaton; start in the root, and whenever a leaf is reached, print out its letter and start again from the root
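A minimal sketch of both directions, using the codeword table above; decode() walks the tree exactly as the slide describes, returning to the root each time a leaf is reached:

```python
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
BY_CODEWORD = {w: ch for ch, w in CODE.items()}

def encode(text: str) -> str:
    """Concatenate the codeword of each input letter."""
    return "".join(CODE[ch] for ch in text)

def decode(bits: str) -> str:
    """Walk the code tree: descend on each bit; on reaching a leaf,
    emit its letter and return to the root."""
    out, node = [], ""
    for b in bits:
        node += b
        if node in BY_CODEWORD:      # a leaf: a complete codeword was read
            out.append(BY_CODEWORD[node])
            node = ""                # back to the root
    return "".join(out)

bits = encode("bbeaaebcec")
print(bits)           # 01011110000111011011110
print(decode(bits))   # bbeaaebcec
```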
Parallel decoding
• Use two processors to decode a string: CPU1 starts from the beginning, CPU2 starts in the middle
• 01 01 111 00 00 111 01 10 111 10
• Where is the middle?
• 01011110000111011011110 (bbeaaebcec)
• 010111100001 11011011110 ?
• CPU2: 110 110 111 10 → d d e c. Wrong!
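A short sketch of the experiment on this slide: cutting the 23-bit stream after 12 bits, as shown, lands inside the sixth codeword, so the second decoder starts out of phase. The chunk decoder below (a variant of decode() above, returning the unfinished trailing codeword as well) is my own helper:

```python
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
BY_CODEWORD = {w: ch for ch, w in CODE.items()}

def decode_chunk(bits: str) -> tuple[str, str]:
    """Decode a chunk; also return the unfinished trailing codeword, if any."""
    out, node = [], ""
    for b in bits:
        node += b
        if node in BY_CODEWORD:
            out.append(BY_CODEWORD[node])
            node = ""
    return "".join(out), node

stream = "01011110000111011011110"
print(decode_chunk(stream[:12]))  # ('bbeaa', '1')  CPU1: one dangling bit left over
print(decode_chunk(stream[12:]))  # ('ddec', '')    CPU2: 'dd' is wrong, but 'ec'
                                  #                 is already back in sync
```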
Parallel decoding
• Correct:
  01 01 111 00 00 111 01 10 111 10 → b b e a a e b c e c
• Incorrect (the split falls inside the sixth codeword; the stray bit 1 is left over):
  01 01 111 00 00 1 110 110 111 10 → b b e a a d d e c
Synchronization! After two wrong symbols the second decoder is back on the true codeword boundaries.
Bit corruption
• Correct:
  01 01 111 00 00 111 01 10 111 10 → b b e a a e b c e c
• Bit error (the first bit flipped from 0 to 1):
  110 111 10 00 01 110 110 111 10 → d e c a b d d e c
Synchronization! The final symbols e c are again decoded correctly.
Huffman code automaton
• Huffman tree = finite automaton
• ε-transitions from the leaves back to the root
• Synchronization: the automaton is at the root exactly on codeword boundaries
  111$ 01$ 10$ 111$ 10$ → e b c e c ($ marks the automaton at the root)
• Lack of synchronization: the automaton is at the root inside a codeword, and in an inner node on a codeword boundary
  1 110$ 110$ 111$ 10$ → d d e c (a decoder that starts one bit into the first codeword)
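A sketch of this automaton view that also records where the root is visited; applied to the 12-bit suffix from this slide, it shows the difference between a synchronized and a mis-synchronized decoder:

```python
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
CODEWORDS = {w: ch for ch, w in CODE.items()}

def run(bits: str):
    """Run the code automaton from the root; the state is the path read so far.
    Return (decoded letters, 1-based positions where the root is visited)."""
    out, visits, state = [], [], ""
    for i, b in enumerate(bits, start=1):
        state += b
        if state in CODEWORDS:       # leaf reached: emit and epsilon-move to the root
            out.append(CODEWORDS[state])
            visits.append(i)
            state = ""
    return "".join(out), visits

tail = "111011011110"                # the 12-bit suffix used on this slide
print(run(tail))        # ('ebcec', [3, 5, 7, 10, 12]): root visits = codeword boundaries
print(run(tail[1:]))    # ('ddec', [3, 6, 9, 11]): one bit late; the first root visit
                        # falls inside a codeword, later visits line up again (resync)
```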
Synchronization
• A Huffman code is self-synchronizing if for any inner node there is a sequence of codewords that moves the automaton from that node to the root
• Every self-synchronizing Huffman code will eventually resynchronize (for an ε-guaranteed source)
• Almost all Huffman codes are self-synchronizing
• Definition: a synchronizing string is a sequence of bits that moves any node to the root
• Theorem: a Huffman code is self-synchronizing iff it has a synchronizing string
Synchronizing string of the example code: 0110
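A direct check of the definition for the example code; is_synchronizing() verifies that a bit string drives every node of the tree (every state of the automaton) to the root, and confirms the string 0110 given on the slide:

```python
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
CODEWORDS = set(CODE.values())
STATES = {w[:i] for w in CODEWORDS for i in range(len(w))}   # root "" and the inner nodes

def run_from(state: str, bits: str) -> str:
    """Final automaton state after reading bits, starting from `state`."""
    for b in bits:
        state += b
        if state in CODEWORDS:   # leaf: emit a letter, return to the root
            state = ""
    return state

def is_synchronizing(bits: str) -> bool:
    return all(run_from(s, bits) == "" for s in STATES)

print(is_synchronizing("0110"))   # True: the synchronizing string from the slide
print(is_synchronizing("11"))     # False: from the root the automaton ends in node 11
```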
Synchronizing codewords
[Figure: a different Huffman tree for the same five-letter alphabet]
• Can a synchronizing string be a codeword?
• Yes! In this tree the codewords 010 and 011 are both synchronizing strings
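The same check applied to this tree. The codeword set below is my reconstruction of the figure, a complete five-leaf tree of height 3 containing 010 and 011 as codewords; the letter assignment does not matter for the check:

```python
# Reconstructed codeword set for the tree in the figure (an assumption).
CODEWORDS = {"00", "010", "011", "10", "11"}
STATES = {w[:i] for w in CODEWORDS for i in range(len(w))}   # root "" and the inner nodes

def run_from(state: str, bits: str) -> str:
    for b in bits:
        state += b
        if state in CODEWORDS:
            state = ""
    return state

for word in ("010", "011"):
    # Each of these codewords drives every automaton state back to the root.
    assert all(run_from(s, word) == "" for s in STATES)
print("010 and 011 are synchronizing codewords of this code")
```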
Optimal codes
[Figure: two minimum-redundancy code trees for the same source: the tree from the previous slide, with 2 synchronizing codewords, and the original example tree, with no synchronizing codeword]
• Minimum-redundancy codes are not unique; equally optimal codes can differ in their synchronization properties
Code characteristics
• Open problems: choose the best Huffman code with respect to
  • the average number of bits to synchronization
  • the length of the synchronizing string
  • the existence and length of synchronizing codewords
• Open problem? The limit on the number of bits in a synchronizing string:
  • O(N³): known result for all automata
  • O(hN log N): my result for Huffman automata
  • O(N²): the Černý conjecture for all automata
Detecting synchronization
• Can a decoder find out that it has synchronized?
• Yes! For example, when it receives a synchronizing string
• A more general algorithm:
  • try to start decoding the text from h consecutive positions (h "decoders")
  • synchronization has taken place once all decoders reach the same codeword boundary
• This can be done without increasing the complexity of decoding (no factor of h)
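A naive sketch of the h-decoder idea; as written it is O(h·n), whereas the slide's point is that the factor h can be avoided, which this sketch does not attempt. The function name and return convention are my own:

```python
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
CODEWORDS = set(CODE.values())
h = max(len(w) for w in CODEWORDS)            # length of the longest codeword

def first_common_boundary(bits: str, start: int = 0):
    """Launch decoders at offsets start .. start+h-1 and return the number of
    bits after which all of them sit on a codeword boundary simultaneously."""
    offsets = [start + k for k in range(h)]
    state = {off: "" for off in offsets}      # automaton state of each decoder
    for i in range(start, len(bits)):
        for off in offsets:
            if i < off:
                continue                      # this decoder has not started yet
            state[off] += bits[i]
            if state[off] in CODEWORDS:       # leaf: letter emitted,
                state[off] = ""               # epsilon-move back to the root
        if i >= offsets[-1] and all(state[off] == "" for off in offsets):
            return i + 1                      # common codeword boundary reached
    return None

stream = "01011110000111011011110"
# After 18 bits all h decoders agree on a boundary; one of them started on a
# true boundary, so bit 18 is a true boundary and the others have synchronized.
print(first_common_boundary(stream, start=12))   # 18
```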
Guaranteed synchronization
• Self-synchronizing Huffman codes: there is no upper bound on the number of bits before synchronization
• My work (together with prof. Wojciech Plandowski): an extension of Huffman coding
  • no redundancy if the code would synchronize anyway
  • small redundancy if it would not: O(1/N) per bit, where N is the number of bits before guaranteed synchronization
  • linear time in the number of coded bits
• Coder:
  • analyze each possible starting position of a decoder
  • add a synchronizing string whenever some decoder has lost more bits than the threshold
• Decoder:
  • just decode
  • skip the synchronizing strings inserted by the coder
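A rough sketch of the coder side as I read this slide; treat it as my interpretation, not the authors' published construction. SYNC, THRESHOLD and the phantom-decoder bookkeeping are assumptions, and the matching decoder (which must recompute where SYNC was inserted in order to skip it) is omitted:

```python
CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}
CODEWORDS = set(CODE.values())
SYNC = "0110"        # a synchronizing string of this code
THRESHOLD = 16       # tunable: larger threshold, less redundancy, later guarantee

def encode_guaranteed(text: str) -> str:
    out, true_state = [], ""
    phantoms = {}    # automaton state -> max number of bits some decoder that
                     # started mid-codeword has already spent out of sync

    def emit(bit: str):
        nonlocal true_state, phantoms
        out.append(bit)
        step = lambda s: "" if (s + bit) in CODEWORDS else s + bit
        true_state = step(true_state)
        updated = {}
        for s, age in phantoms.items():
            t = step(s)
            if t != true_state:              # still out of sync with the coder
                updated[t] = max(updated.get(t, 0), age + 1)
        phantoms = updated                   # decoders that caught up are dropped

    for ch in text:
        for bit in CODE[ch]:
            if true_state != "":             # a decoder could start mid-codeword here
                phantoms.setdefault("", 0)
            emit(bit)
        if phantoms and max(phantoms.values()) > THRESHOLD:
            for bit in SYNC:                 # force every possible decoder state
                emit(bit)                    # back to the root
    return "".join(out)
```

The point of the bookkeeping is the trade-off the slide describes: nothing is added while every hypothetical starting position resynchronizes on its own, and the short SYNC string is inserted only when some starting position has stayed out of sync for more than THRESHOLD bits.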
Summary
• Huffman codes can be decompressed in parallel
• After some number of bits (on average), a decoder that starts in the middle will synchronize
• There is no upper bound on the number of incorrectly decoded symbols
• With a small amount of additional redundancy one may impose such a bound