
Synchronization of Huffman codes



Presentation Transcript


  1. Synchronization of Huffman codes
  Marek Biskup, Warsaw University
  PhD-Open, 2007-05-26

  2. Huffman codes
[Huffman tree: codewords a=00, b=01, c=10, d=110, e=111; N=5 letters, height h=3]
• Each letter has a corresponding binary string (its code)
• The codewords form a complete binary tree
• The depth of a letter depends on its probability in the source
• The code is uniquely decodable
Marek Biskup - Synchronization of Huffman Codes
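The two structural claims above can be checked mechanically. A minimal sketch in Python, under the codeword table a=00, b=01, c=10, d=110, e=111 read off the slides' examples (`is_prefix_free` is a helper name chosen here):

```python
# Checking the two structural claims above for the assumed codeword table
# a=00, b=01, c=10, d=110, e=111 (N=5 letters, height h=3).
CODES = {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}

def is_prefix_free(codes):
    """No codeword is a prefix of another, so the code is uniquely decodable."""
    words = sorted(codes.values())        # a prefix sorts right before its extensions
    return all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))

assert is_prefix_free(CODES)
# Kraft equality: the codeword lengths exactly fill a complete binary tree.
assert sum(2.0 ** -len(w) for w in CODES.values()) == 1.0
```

Prefix-freeness is what makes the code uniquely decodable; the Kraft sum equalling 1 is the "complete binary tree" condition.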

  3. Coding and decoding
• Source sequence: bbeaaebcec
• Encoded text:
  01 01 111 00 00 111 01 10 111 10
  b  b  e   a  a  e   b  c  e   c
• Encoding: for each input letter, print out its code
• Decoding: use the Huffman tree as a finite automaton; start at the root and, whenever a leaf is reached, print out its letter and start again
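The decoding rule above (root-to-leaf walk, restart at every leaf) can be sketched as follows; the code table is the hypothetical assignment a=00, b=01, c=10, d=110, e=111 used throughout these notes:

```python
# Encoder/decoder for the slides' running example; the codeword table
# a=00, b=01, c=10, d=110, e=111 is an assumption read off the examples.
CODES = {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}

def encode(text):
    """For each input letter, print out its code."""
    return ''.join(CODES[ch] for ch in text)

def decode(bits):
    """Walk the Huffman tree; at a leaf, emit its letter and restart at the root."""
    root = {}
    for letter, word in CODES.items():    # build the tree as nested dicts
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = letter           # a leaf is a bare letter
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):         # leaf reached: emit, return to root
            out.append(node)
            node = root
    return ''.join(out)

print(encode('bbeaaebcec'))           # 01011110000111011011110
print(decode(encode('bbeaaebcec')))   # bbeaaebcec
```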

  4. Parallel decoding
• Use two processors to decode a string: CPU1 starts from the beginning, CPU2 starts in the middle
• 01 01 111 00 00 111 01 10 111 10
• Where is the middle?
• 01011110000111011011110 (bbeaaebcec)
• 010111100001 11011011110 ?
• CPU2: 110 110 111 10
  d d e c   Wrong! (the split fell inside the sixth codeword)
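A quick way to reproduce CPU2's failure, using a simple greedy decoder over the same assumed code table:

```python
# Reproducing the slide's CPU1/CPU2 experiment with a greedy table decoder;
# the codeword table is the same assumed assignment a=00 ... e=111.
CODES = {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}

def decode(bits):
    """Greedy decoding; safe because the code is prefix-free."""
    out, i = [], 0
    while i < len(bits):
        for letter, word in CODES.items():
            if bits.startswith(word, i):
                out.append(letter)
                i += len(word)
                break
        else:
            break                         # trailing partial codeword
    return ''.join(out)

bits = '01011110000111011011110'          # the slide's 23-bit stream
print(decode(bits))        # bbeaaebcec -- CPU1, started on a codeword boundary
print(decode(bits[12:]))   # ddec -- CPU2's half starts inside a codeword
```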

  5. Parallel decoding
• Correct segmentation:
  01 01 111 00 00 111 01 10 111 10
  b  b  e   a  a  e   b  c  e   c
• Incorrect segmentation (one stray bit desynchronizes the parse):
  01 01 111 00 00 1 110 110 111 10
  b  b  e   a  a    d   d   e   c
Synchronization!

  6. Bit corruption
• Correct:
  01 01 111 00 00 111 01 10 111 10
  b  b  e   a  a  e   b  c  e   c
• First bit flipped:
  110 111 10 00 01 110 110 111 10
  d   e   c  a  b  d   d   e   c
Synchronization!
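The bit-error example can be replayed the same way: flipping the first bit of the stream yields the garbled-then-resynchronized output shown above (same assumed code table a=00, b=01, c=10, d=110, e=111):

```python
# Replaying the slide's bit-corruption example with a greedy decoder
# and the assumed code table a=00, b=01, c=10, d=110, e=111.
CODES = {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}

def decode(bits):
    out, i = [], 0
    while i < len(bits):
        for letter, word in CODES.items():
            if bits.startswith(word, i):
                out.append(letter)
                i += len(word)
                break
        else:
            break                          # trailing partial codeword
    return ''.join(out)

bits = '01011110000111011011110'   # encodes bbeaaebcec
flipped = '1' + bits[1:]           # corrupt the very first bit
print(decode(flipped))             # decabddec: garbled at first, then back in sync
```

Note that the last symbols (e, c) are decoded correctly again: the decoder has resynchronized on its own.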

  7. Huffman code automaton
• Huffman tree = finite automaton, with ε-transitions from the leaves back to the root
• Synchronization: the automaton is at the root exactly on codeword boundaries ($):
  111$ 01$ 10$ 111$ 10$
  e    b   c   e    c
• Lack of synchronization: the automaton is at the root inside a codeword, and in an inner node on a codeword boundary:
  1 1$0 1$1 0$1 1 1$1 0   parsed as 110 110 111 10
  d d e c

  8. Synchronization
• A Huffman code is self-synchronizing if, for every inner node, there is a sequence of codewords that takes the automaton back to the root
• Every self-synchronizing Huffman code will eventually resynchronize (for an ε-guaranteed source)
• Almost all Huffman codes are self-synchronizing
• Definition: a synchronizing string is a sequence of bits that moves every node to the root
• Theorem: a Huffman code is self-synchronizing iff it has a synchronizing string
• Synchronizing string for this code: 0110
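The definition suggests a direct test: feed the bits to the automaton from every state and check that all runs end at the root. A sketch under the same assumed code table (a=00, b=01, c=10, d=110, e=111), which confirms the slide's string 0110:

```python
# Verifying the slide's synchronizing string 0110 under the assumed
# codeword table a=00, b=01, c=10, d=110, e=111.
CODES = {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}
CODEWORDS = set(CODES.values())

# Automaton states are proper codeword prefixes; '' is the root.
STATES = {w[:i] for w in CODEWORDS for i in range(len(w))}

def step(state, bit):
    """Follow one tree edge; on completing a codeword (a leaf), jump to the root."""
    state += bit
    return '' if state in CODEWORDS else state

def synchronizes(bits):
    """True iff `bits` drives every state of the automaton to the root."""
    for start in STATES:
        state = start
        for b in bits:
            state = step(state, b)
        if state != '':
            return False
    return True

print(synchronizes('0110'))   # True: 0110 is a synchronizing string
print(synchronizes('1'))      # False
```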

  9. Synchronizing codewords
[Huffman tree diagram]
• Can a synchronizing string be a codeword?
• Yes! For this tree, 010 and 011 are synchronizing codewords

  10. Optimal codes
[Two minimum-redundancy Huffman trees for the same source]
• Minimum-redundancy codes are not unique: one of the trees has 2 synchronizing codewords, the other has no synchronizing codeword

  11. Code characteristics
• Open problems: choose the best Huffman code with respect to:
  • the average number of bits to synchronization
  • the length of the synchronizing string
  • the existence and length of synchronizing codewords
• Open problem? The limit on the number of bits in a synchronizing string:
  • O(N^3) – known result for all automata
  • O(hN log N) – my result for Huffman automata
  • O(N^2) – the Černý conjecture for all automata

  12. Detecting synchronization
• Can a decoder find out that it has synchronized? Yes! For example, when it receives a synchronizing string
• A more general algorithm:
  • try to start decoding the text from h consecutive positions (h "decoders")
  • synchronization has taken place once all decoders reach the same word boundary
• This can be done without increasing the complexity of decoding (no dependence on h)
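The h-decoder idea can be sketched as follows (assumed code table as before; `first_merge` is a name chosen here). Decoders started at h consecutive offsets are tracked bit by bit; once they all share the same automaton state they have merged and decode identically from then on:

```python
# Sketch of the slide's detection idea: run h decoders from h consecutive
# start offsets and report when their automaton states first coincide.
# Code table assumed: a=00, b=01, c=10, d=110, e=111 (height h = 3).
CODES = {'a': '00', 'b': '01', 'c': '10', 'd': '110', 'e': '111'}
CODEWORDS = set(CODES.values())

def states_after(bits, start):
    """Automaton state (current codeword prefix) after each bit, from `start`."""
    state, states = '', []
    for b in bits[start:]:
        state += b
        if state in CODEWORDS:
            state = ''                    # leaf reached: back to the root
        states.append(state)
    return states

def first_merge(bits, h=3):
    """First bit position after which all h decoders share one state."""
    runs = [states_after(bits, s) for s in range(h)]
    for p in range(h - 1, len(bits)):
        if len({runs[s][p - s] for s in range(h)}) == 1:
            return p
    return None

print(first_merge('01011110000111011011110'))   # 17 for the slides' stream
```

This sketch stores all states explicitly for clarity; the slide's claim is that the bookkeeping can be organized so decoding cost does not grow with h.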

  13. Guaranteed synchronization
• Self-synchronizing Huffman codes: there is no upper bound on the number of bits before synchronization
• My work (together with Prof. Wojciech Plandowski): an extension of Huffman coding
  • no redundancy where the code synchronizes on its own
  • small redundancy, O(1/N) per bit, where it would not
  • N – the number of bits before guaranteed synchronization
  • linear time in the number of coded bits
• Coder:
  • analyze each possible starting position of a decoder
  • insert a synchronizing string whenever some decoder's number of lost bits exceeds the threshold
• Decoder:
  • just decode, skipping the synchronizing strings inserted by the coder

  14. Summary
• Huffman codes can be decompressed in parallel
• After some number of bits (on average), a decoder that starts in the middle will synchronize
• There is no upper bound on the number of incorrectly decoded symbols
• With a small amount of additional redundancy one may impose such a bound
