Huffman Codes
• Information coding:
• Most information-transmission machines (computer terminal, Voyager spacecraft) use a binary code.
• Why? These electric signals are either present or absent at any specific time.
• Suppose Voyager's on-board camera is sensitive to four shades of gray: white, light gray, dark gray, black.
• The camera picture is digitized into 24000 (400*600) “dots”, then transmitted by radio to Earth in a single stream of signals, to be reconstructed and printed.
Huffman Codes
• In designing a binary code, we want to decide how to encode the “color” of each dot in binary, so that:
• 1) No signals are wasted (efficiency)
• 2) The code is recognizable later, on the receiving end
• Example: encode
• White – 0001
• Light gray – 0010
• Dark gray – 0100
• Black – 1000
• WASTEFUL!! At 4 “digits” per symbol (dot), one picture would cost 4*24000 = 96000 signals, almost 100 000.
• How many digits do you need?
• 1 is not enough: only 2 values
• 2 is OK: 4 values
• 3 is too much
• …
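The digit count asked about above can be checked with a short sketch. The function name `fixed_length_bits` is an illustrative assumption, not something from the slides; it returns the smallest number of binary digits whose combinations cover a given number of symbols.

```python
def fixed_length_bits(num_symbols):
    """Smallest number of binary digits that can tell num_symbols values apart."""
    if num_symbols < 1:
        raise ValueError("need at least one symbol")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, with no floating point.
    return (num_symbols - 1).bit_length()


bits = fixed_length_bits(4)   # four shades of gray
print(bits, bits * 24000)     # digits per dot, signals per picture: 2 48000
```

With four shades, two digits per dot are enough, which halves the cost of the 4-digit encoding.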
Huffman Codes
• Try 2:
• W – 00
• LG – 01
• DG – 10
• B – 11
• Fixed-length code of length 2 (2 yes/no questions suffice to identify the color).
• No problem on the receiving end: every two digits define a dot.
• Encoding mechanism: decision tree. Start at the root and follow the branches until a leaf is reached.
• [Figure: decision tree with leaves B, DG, LG, W; each branch labeled 0 or 1]
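A minimal sketch of how the receiving end might handle this fixed-length code: cut the stream into two-digit groups and look each one up. The names `FIXED_CODE` and `decode_fixed` are hypothetical, chosen only for illustration.

```python
# Codeword table from "Try 2", keyed by codeword for decoding.
FIXED_CODE = {"00": "W", "01": "LG", "10": "DG", "11": "B"}


def decode_fixed(bits, width=2):
    """Split a bit string into groups of `width` digits and map each to a color."""
    if len(bits) % width != 0:
        raise ValueError("stream length is not a multiple of the code width")
    return [FIXED_CODE[bits[i:i + width]] for i in range(0, len(bits), width)]


print(decode_fixed("00011011"))  # ['W', 'LG', 'DG', 'B']
```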
Huffman Codes
• There are other tree shapes with four leaf nodes. Which one is better?
• The criterion is weighted average length.
• Suppose we have these probabilities (shown with the codes from a second tree):
• W -- .40 -- 1
• LG -- .30 -- 00
• DG -- .18 -- 011
• B -- .12 -- 010
• [Figure: alternative decision trees with four leaves (W, LG, DG, B); branches labeled 0 and 1]
Huffman Codes
• VARIABLE-LENGTH CODE
• Weighted average for tree 1 (the fixed-length code) = .40*2 + .30*2 + .18*2 + .12*2 = 2
• Weighted average for tree 2 = .40*1 + .30*2 + .18*3 + .12*3 = 1.9
• On average, tree 2 is better: it costs only 1.9*24000 = 45600 signals, less than half of the first try (4*24000 = 96000).
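The two weighted averages can be reproduced with a few lines of Python. This is only a sketch: the helper name `weighted_average_length` and the dictionary layout are assumptions made for illustration.

```python
PROBABILITIES = {"W": 0.40, "LG": 0.30, "DG": 0.18, "B": 0.12}
TREE_1 = {"W": "00", "LG": "01", "DG": "10", "B": "11"}    # fixed-length code
TREE_2 = {"W": "1", "LG": "00", "DG": "011", "B": "010"}   # variable-length code


def weighted_average_length(probabilities, code):
    """Sum of probability * codeword length over all symbols."""
    return sum(p * len(code[symbol]) for symbol, p in probabilities.items())


for name, code in (("tree 1", TREE_1), ("tree 2", TREE_2)):
    average = weighted_average_length(PROBABILITIES, code)
    # Round for display only; the sums are floating-point approximations.
    print(name, round(average, 2), round(average * 24000))
```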
Huffman Codes
• General problem:
• Given n symbols with their respective probabilities, which is the best tree (code)?
• That is, determine the fewest digits (yes/no questions) necessary, on average, to identify a symbol.
• Construct the tree from the leaves to the root:
• 1) Label each leaf with its probability.
• 2) Determine the two fatherless nodes with the smallest probabilities. In case of a tie, choose arbitrarily.
• 3) Create a father for these two nodes; label the father with the sum of the two probabilities.
• 4) Repeat 2) and 3) until there is only 1 fatherless node (the root).
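Steps 1) to 4) can be sketched in Python with a priority queue holding the fatherless nodes. This is one possible implementation, not the slides' own. The name `huffman_code`, the nested-tuple tree representation, and the assumption that symbols are strings are all choices made for this illustration. Ties and the left/right order are arbitrary, so the exact bits may differ from a hand-built tree, but the codeword lengths for the four-shade example do not depend on those choices.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Return {symbol: bit string}, building the tree from the leaves to the root."""
    tie = count()  # arbitrary but deterministic tie-breaker for equal probabilities
    # Step 1: every symbol starts as a fatherless leaf labeled with its probability.
    heap = [(p, next(tie), symbol) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a lone symbol still needs one digit
    while len(heap) > 1:  # Step 4: repeat until only the root is fatherless
        # Step 2: take the two fatherless nodes with the smallest probabilities.
        p_left, _, left = heapq.heappop(heap)
        p_right, _, right = heapq.heappop(heap)
        # Step 3: create their father, labeled with the sum.
        heapq.heappush(heap, (p_left + p_right, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: (left child, right child)
            walk(node[0], prefix + "0")  # left branch is 0
            walk(node[1], prefix + "1")  # right branch is 1
        else:
            codes[node] = prefix         # leaf: the path from the root is the codeword

    walk(heap[0][2], "")
    return codes


shades = huffman_code({"W": 0.40, "LG": 0.30, "DG": 0.18, "B": 0.12})
print({symbol: len(word) for symbol, word in shades.items()})
```

For the four shades this gives lengths 1, 2, 3, 3 for W, LG, DG, B.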
• In our case:
• .12 (B) + .18 (DG) = .30
• .30 + .30 (LG) = .60
• .60 + .40 (W) = 1.0 (the root)
• By convention, left is 0, right is 1.
• So, we have:
• W -- .40 -- 1
• LG -- .30 -- 01
• DG -- .18 -- 001
• B -- .12 -- 000
• Using this method, the code obtained is a minimum-redundancy code, or Huffman code.
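The "left is 0, right is 1" convention can be illustrated by reading the codes off the finished tree. In this sketch the tree is written as nested `(left, right)` pairs; that representation and the name `codes_from_tree` are assumptions made for the example.

```python
# The tree for the four shades: root -> (.60 subtree, W); .60 -> (.30 subtree, LG);
# .30 -> (B, DG). Each pair is (left child, right child).
SHADE_TREE = ((("B", "DG"), "LG"), "W")


def codes_from_tree(node, prefix=""):
    """Walk from the root; append 0 for a left branch and 1 for a right branch."""
    if isinstance(node, tuple):
        left, right = node
        codes = codes_from_tree(left, prefix + "0")
        codes.update(codes_from_tree(right, prefix + "1"))
        return codes
    return {node: prefix}  # leaf: the accumulated path is its codeword


print(codes_from_tree(SHADE_TREE))
# {'B': '000', 'DG': '001', 'LG': '01', 'W': '1'}
```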
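• Sample Huffman code: minimize the average number of yes/no questions necessary to distinguish 1 of 5 symbols that occur with known probabilities.
• Probabilities: a – 0.28, b – 0.25, c – 0.21, d – 0.15, e – 0.11
• Internal nodes of the tree: 0.11 + 0.15 = 0.26; 0.21 + 0.25 = 0.46; 0.26 + 0.28 = 0.54; 0.46 + 0.54 = 1.00 (the root)
• Codes:
• a – 01
• b – 11
• c – 10
• d – 001
• e – 000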
• Weighted average length = 2*(.28+.25+.21) + 3*(.15+.11) = 2*.74 + 3*.26 = 2.26
• The Huffman code is always a prefix code.
• A prefix code satisfies the prefix condition: no codeword is a prefix of another codeword.
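The prefix condition is easy to test mechanically. The sketch below uses a plain pairwise comparison; the name `is_prefix_code` is a hypothetical helper, not something from the slides.

```python
from itertools import permutations


def is_prefix_code(codewords):
    """True if no codeword is a prefix of a different codeword in the list."""
    return not any(b.startswith(a) for a, b in permutations(codewords, 2))


print(is_prefix_code(["01", "11", "10", "001", "000"]))  # True
print(is_prefix_code(["0", "1", "00", "01"]))            # False: 0 is a prefix of 00
```

The pairwise check is quadratic in the number of codewords, which is fine for small alphabets like these.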
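Example.
• A prefix code: 1, 01, 001, 000. At any point, it’s possible to delimit the symbol, so decoding is not ambiguous.
• Not a prefix code: a:0, b:1, c:00, d:01. If met with 00, it is ambiguous: we can’t figure out whether it is aa or c.
• Not a prefix code: a:0, b:01, c:10. Here a is a prefix of b; for example, 010 can be read as ac or as ba.
• [Figure: the decision tree for each code; branches labeled 0 and 1]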
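To make the difference concrete, here is a hedged sketch with two hypothetical helpers. `decode_prefix` reads a stream left to right and emits a symbol as soon as a codeword is complete, which is only safe for a prefix code. `parsings` lists, by brute force, every way a bit string can be split into codewords, which exposes the ambiguity of a non-prefix code. Both assume non-empty codewords.

```python
def decode_prefix(bits, code):
    """Decode a bit string with a prefix code given as {symbol: codeword}."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:       # a complete codeword delimits the symbol
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("stream ends in the middle of a codeword")
    return symbols


def parsings(bits, code):
    """Every way to split `bits` into codewords of `code` (exhaustive search)."""
    if not bits:
        return [[]]
    found = []
    for symbol, word in code.items():
        if bits.startswith(word):
            found += [[symbol] + rest for rest in parsings(bits[len(word):], code)]
    return found


PREFIX = {"W": "1", "LG": "01", "DG": "001", "B": "000"}
NOT_PREFIX = {"a": "0", "b": "1", "c": "00", "d": "01"}

print(decode_prefix("101001000", PREFIX))  # ['W', 'LG', 'DG', 'B']
print(parsings("00", NOT_PREFIX))          # [['a', 'a'], ['c']]
```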