Huffman Codes

zelig

Presentation Transcript


  1. Huffman Codes
     • Information coding:
     • Most information-transmission machines (computer terminals, the Voyager spacecraft) use a binary code.
     • Why? An electric signal is either present or absent at any specific time.
     • Suppose Voyager's on-board camera is sensitive to four shades of gray:
        • White
        • Light gray
        • Dark gray
        • Black
     • The camera picture is digitized into 24000 “dots” (the slide gives the grid as 400*600, which would be 240000; the calculations on the following slides use 24000), then transmitted by radio to Earth in a single stream of signals, to be reconstructed and printed.

  2. Huffman Codes
     • In designing a binary code, we want to decide how to encode the “color” of each dot in binary, so that:
        1) No signals are wasted (efficiency)
        2) The code is recognizable (later)
     • Example: encode
        • White – 0001
        • Light gray – 0010
        • Dark gray – 0100
        • Black – 1000
     • WASTEFUL!! At 4 “digits” per symbol (dot), one picture would cost 4*24000 = 96000 signals, almost 100 000.
     • How many digits do you need?
        • 1: not enough, only 2 values
        • 2: OK, 4 values
        • 3: too much
        • …
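The "how many digits do you need?" count can be sketched in a few lines of Python. The function name `digits_needed` is made up for this illustration and does not come from the slides.

```python
def digits_needed(num_symbols: int) -> int:
    """Smallest fixed code length (in binary digits) that can
    distinguish num_symbols different symbols."""
    if num_symbols < 1:
        raise ValueError("need at least one symbol")
    # k digits give 2**k distinct values, so we need the smallest k
    # with 2**k >= num_symbols; (n - 1).bit_length() computes exactly that.
    return max(1, (num_symbols - 1).bit_length())


shades = ["W", "LG", "DG", "B"]
per_dot = digits_needed(len(shades))   # digits per dot for four shades
picture_cost = per_dot * 24000         # signals for one 24000-dot picture
print(per_dot, picture_cost)
```

With four shades this gives 2 digits per dot, half the cost of the 4-digit code above.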

  3. Huffman Codes
     • Try 2: a fixed-length code of length 2 (2 yes/no questions suffice to identify the color)
        • W – 00
        • LG – 01
        • DG – 10
        • B – 11
     • No problem on the receiving end: every two digits define a dot.
     • Encoding mechanism: decision tree. Start at the root and follow the branches until a leaf is reached.
     • [Figure: decision tree of depth 2 with branches labeled 0 and 1 and leaves B, DG, LG, W]
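A minimal sketch of the fixed-length scheme, assuming the stream is held as a Python string of "0"/"1" characters. The names `FIXED_CODE`, `encode_fixed` and `decode_fixed` are illustrative only.

```python
# Fixed-length code from the slide: two digits per dot.
FIXED_CODE = {"W": "00", "LG": "01", "DG": "10", "B": "11"}


def encode_fixed(dots):
    """Concatenate the 2-digit codeword of every dot into one stream."""
    return "".join(FIXED_CODE[dot] for dot in dots)


def decode_fixed(bits, width=2):
    """Cut the stream into chunks of `width` digits and look each one up."""
    if len(bits) % width != 0:
        raise ValueError("stream length is not a multiple of the code width")
    lookup = {code: symbol for symbol, code in FIXED_CODE.items()}
    return [lookup[bits[i:i + width]] for i in range(0, len(bits), width)]


stream = encode_fixed(["W", "B", "LG"])
print(stream, decode_fixed(stream))
```

Because every codeword has the same length, the receiver never has to guess where one dot ends and the next begins.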

  4. Huffman Codes
     • There are other tree shapes with four leaf nodes. Which one is better?
     • The criterion is the weighted average length.
     • Suppose we have these probabilities (symbol -- probability -- code in the second tree):
        W  -- .40 -- 1
        LG -- .30 -- 00
        DG -- .18 -- 011
        B  -- .12 -- 010
     • [Figure: decision trees with branches labeled 0 and 1 and leaves W, LG, DG, B]

  5. Huffman Codes
     • VARIABLE-LENGTH CODE
     • Weighted average for tree 1 = .40*2 + .30*2 + .18*2 + .12*2 = 2
     • Weighted average for tree 2 = .40*1 + .30*2 + .18*3 + .12*3 = 1.9
     • On average, tree 2 is better: a picture costs only 1.9*24000 = 45600 signals, less than half of the first try (4*24000 = 96000).
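The two weighted averages can be checked with a short Python sketch. The helper `weighted_average_length` and the dictionary names are assumptions made for this example; the probabilities and codes are the ones given on the slides.

```python
def weighted_average_length(probabilities, codes):
    """Sum of probability * codeword length over all symbols."""
    return sum(p * len(codes[symbol]) for symbol, p in probabilities.items())


PROBS = {"W": 0.40, "LG": 0.30, "DG": 0.18, "B": 0.12}
TREE_1 = {"W": "00", "LG": "01", "DG": "10", "B": "11"}    # fixed length
TREE_2 = {"W": "1", "LG": "00", "DG": "011", "B": "010"}   # variable length

avg_1 = weighted_average_length(PROBS, TREE_1)
avg_2 = weighted_average_length(PROBS, TREE_2)

# Expected cost of one 24000-dot picture, rounded to whole signals
# because floating-point sums are only approximately exact.
print(round(avg_1 * 24000), round(avg_2 * 24000))
```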

  6. Huffman Codes
     • General problem:
        • Given n symbols with their respective probabilities, which is the best tree (code)?
        • That is, determine the fewest digits (yes/no questions) necessary, on average, to identify a symbol.
     • Construct the tree from the leaves to the root:
        1) Label each leaf with its probability.
        2) Determine the two fatherless nodes with the smallest probabilities. In case of a tie, choose arbitrarily.
        3) Create a father for these two nodes; label the father with the sum of the two probabilities.
        4) Repeat 2) and 3) until there is only 1 fatherless node (the root).
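The four construction steps can be sketched in Python with a priority queue holding the fatherless nodes. This is one possible implementation under stated assumptions, not the slides' own code: the function name `huffman_codes` is invented here, symbols are assumed to be strings, and ties are broken by insertion order, which is one arbitrary choice among several.

```python
import heapq
from itertools import count


def huffman_codes(probabilities):
    """Build a code by repeatedly joining the two fatherless nodes
    with the smallest probabilities. Symbols must not be tuples."""
    if not probabilities:
        return {}
    if len(probabilities) == 1:
        return {symbol: "0" for symbol in probabilities}

    tie_breaker = count()  # keeps heap entries comparable when probabilities tie
    # Step 1: every leaf is a fatherless node labeled with its probability.
    heap = [(p, next(tie_breaker), symbol) for symbol, p in probabilities.items()]
    heapq.heapify(heap)

    # Steps 2-4: join the two smallest nodes until only the root is left.
    while len(heap) > 1:
        p_left, _, left = heapq.heappop(heap)
        p_right, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p_left + p_right, next(tie_breaker), (left, right)))

    _, _, root = heap[0]
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):        # internal node: (left, right)
            walk(node[0], prefix + "0")    # left branch is 0
            walk(node[1], prefix + "1")    # right branch is 1
        else:                              # leaf: a symbol
            codes[node] = prefix

    walk(root, "")
    return codes


gray = huffman_codes({"W": 0.40, "LG": 0.30, "DG": 0.18, "B": 0.12})
print({symbol: len(code) for symbol, code in gray.items()})
```

The exact digits depend on how ties and left/right placement are resolved, so they may differ from a hand-drawn tree; the codeword lengths, and therefore the weighted average length, are what the construction determines here.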

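  7. In our case:
     • B (.12) and DG (.18) are joined under a node labeled .30
     • That node and LG (.30) are joined under a node labeled .60
     • That node and W (.40) are joined under the root, labeled 1.0
     • By convention, left is 0, right is 1. So, we have:
        W  -- .40 -- 1
        LG -- .30 -- 01
        DG -- .18 -- 001
        B  -- .12 -- 000
     • Using this method, the code obtained is the minimum-redundancy, or Huffman, code.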
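On the receiving end, a variable-length code like this one can be decoded by reading digits until they spell a complete codeword, which mirrors walking the tree from the root to a leaf. The sketch below assumes the stream is a string of "0"/"1" characters; `decode_prefix` and `GRAY_CODE` are names invented for the example.

```python
# Code for the four shades as listed on the slide.
GRAY_CODE = {"W": "1", "LG": "01", "DG": "001", "B": "000"}


def decode_prefix(bits, codes):
    """Read digits one at a time; emit a symbol as soon as the digits
    read so far form a codeword (that is, a leaf has been reached)."""
    lookup = {code: symbol for symbol, code in codes.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            symbols.append(lookup[current])
            current = ""   # back to the root for the next symbol
    if current:
        raise ValueError("stream ends in the middle of a codeword")
    return symbols


print(decode_prefix("101001000", GRAY_CODE))
```

This greedy reading is only safe because no codeword is the beginning of another one.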

  8. Sample Huffman code: minimize the average number of yes/no questions necessary to distinguish 1 of 5 symbols that occur with known probabilities.
     • Probabilities: a – 0.28, b – 0.25, c – 0.21, d – 0.15, e – 0.11
     • Internal nodes of the tree: 0.26 (e + d), 0.46 (c + b), 0.54 (0.26 + a), 1.00 (root)
     • Codes:
        a – 01
        b – 11
        c – 10
        d – 001
        e – 000

  9. The Huffman code is always a prefix code.
     • A prefix code is a code that satisfies the prefix condition: no codeword is a prefix of another codeword.
     • Weighted average length of the sample code = 2*(.28+.25+.21) + 3*(.15+.11) = 2*.74 + 3*.26 = 2.26
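The prefix condition is easy to test mechanically. A minimal sketch follows, with `is_prefix_code` as a hypothetical helper name; it relies on the fact that after sorting, any codeword that is a prefix of another is immediately followed by a codeword that starts with it.

```python
def is_prefix_code(codewords):
    """True if no codeword is a prefix of another codeword."""
    ordered = sorted(codewords)
    # In lexicographic order, a prefix sits directly before a word it begins.
    return all(not later.startswith(earlier)
               for earlier, later in zip(ordered, ordered[1:]))


print(is_prefix_code(["01", "11", "10", "001", "000"]))  # sample code for a..e
print(is_prefix_code(["0", "1", "00", "01"]))            # "0" begins "00" and "01"
```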

  10. Example.
     • A prefix code: 1, 01, 001, 000. At any point, it is possible to delimit the symbol.
     • Not a prefix code: a:0, b:1, c:00, d:01. If met with 00, it is ambiguous: we can't figure out whether it is aa or c.
     • Not a prefix code: a:0, b:01, c:10 (a is a prefix of b). The slide marks its example here “not ambiguous”; this does not hold for every string, since 010 can be read as ac or as ba.
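Ambiguity can be made concrete by counting the ways a digit string splits into codewords. The sketch below is an illustration only; `count_parses` is a name chosen for it, and it uses a simple dynamic program over string positions.

```python
def count_parses(bits, codewords):
    """Number of different ways to cut `bits` into a sequence of codewords."""
    ways = [0] * (len(bits) + 1)
    ways[0] = 1  # the empty string has exactly one (empty) parse
    for end in range(1, len(bits) + 1):
        for word in codewords:
            start = end - len(word)
            # If a codeword ends at `end`, extend every parse that ends at `start`.
            if start >= 0 and bits[start:end] == word:
                ways[end] += ways[start]
    return ways[len(bits)]


print(count_parses("00", ["0", "1", "00", "01"]))          # a:0 b:1 c:00 d:01
print(count_parses("010", ["0", "01", "10"]))              # a:0 b:01 c:10
print(count_parses("101001", ["1", "01", "001", "000"]))   # the prefix code
```

A count above 1 means the receiver cannot tell which message was sent; for a prefix code the count is never more than 1.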