Huffman Coding
A simple example
• Suppose we have a message of 10 symbols drawn from 5 distinct symbols, e.g. [►♣♣♠☻►♣☼►☻]
• How can we code this message using 0/1 so that the coded message has minimum length (for transmission or storage)?
• 5 distinct symbols need at least 3 bits each in a fixed-length code
• For this simple encoding, the length of the coded message is 10*3 = 30 bits
A simple example – cont.
• Intuition: symbols that are more frequent should get shorter codes; since the codes then differ in length, there must be a way to tell where each code ends
• For the Huffman code, the length of the encoded message ►♣♣♠☻►♣☼►☻ is 3*2 + 3*2 + 2*2 + 1*3 + 1*3 = 22 bits
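The arithmetic for this message can be checked with a short sketch. The code lengths used here (2 bits for ►, ♣ and ☻, 3 bits for ♠ and ☼) are read off the sum on the slide, and the variable names are only illustrative.

```python
from collections import Counter
from math import ceil, log2

message = "►♣♣♠☻►♣☼►☻"
counts = Counter(message)  # occurrences of each distinct symbol

# Fixed-length code: every symbol gets ceil(log2(number of distinct symbols)) bits.
fixed_width = ceil(log2(len(counts)))
fixed_bits = fixed_width * len(message)

# Variable-length code: lengths assumed from the slide's sum, not computed here.
lengths = {"►": 2, "♣": 2, "☻": 2, "♠": 3, "☼": 3}
variable_bits = sum(counts[s] * lengths[s] for s in counts)

print(fixed_width, fixed_bits, variable_bits)  # 3 30 22
```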
Another Example
• A = 0, B = 100, C = 1010, D = 1011, R = 11
• ABRACADABRA = 01001101010010110100110
• This is eleven letters in 23 bits
• A fixed-width encoding would require 3 bits for five different letters, or 33 bits for 11 letters
• Notice that the encoded bit string can be decoded!
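A minimal sketch of encoding and decoding with this code table follows. The function names `encode` and `decode` are hypothetical; decoding simply collects bits until they match a codeword, which works because no codeword is a prefix of another.

```python
CODE = {"A": "0", "B": "100", "C": "1010", "D": "1011", "R": "11"}

def encode(text, code=CODE):
    """Concatenate the codeword of every character."""
    return "".join(code[ch] for ch in text)

def decode(bits, code=CODE):
    """Read bits left to right, emitting a symbol whenever a codeword is complete."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

bits = encode("ABRACADABRA")
print(bits, len(bits))  # 01001101010010110100110 23
print(decode(bits))     # ABRACADABRA
```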
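Huffman codes
• Binary character code: each character is represented by a unique binary string.
• A data file of 100 characters can be coded in two ways:

  Character                 a    b    c    d    e     f
  Frequency                 45   13   12   16   9     5
  Fixed-length codeword     000  001  010  011  100   101
  Variable-length codeword  0    101  100  111  1101  1100

• The first way needs 100*3 = 300 bits.
• The second way needs 45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4 = 224 bits.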
Variable-length code
• A variable-length code must be read with care.
• Example: 001011101 (codewords: a=0, b=00, c=01, d=11)
• Where to cut? 00 can be read as either aa or b.
• Prefixes of 0011: 0, 00, 001, and 0011.
• Prefix codes: no codeword is a prefix of any other codeword (prefix-free).
• Prefix codes are simple to encode and decode.
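The two ideas on this slide can be sketched in a few lines: a check for the prefix-free property, and an enumeration of every way a bit string can be cut into codewords. The helper names `is_prefix_free` and `parses` are hypothetical, and the second code table is the variable-length code for a to f used elsewhere in these slides.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(
        x != y and y.startswith(x) for x in words for y in words
    )

def parses(bits, code):
    """Return every way of splitting bits into a sequence of codewords."""
    if not bits:
        return [""]
    out = []
    for sym, word in code.items():
        if bits.startswith(word):
            out.extend(sym + rest for rest in parses(bits[len(word):], code))
    return out

ambiguous = {"a": "0", "b": "00", "c": "01", "d": "11"}
prefix_code = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}

print(is_prefix_free(ambiguous))    # False
print(is_prefix_free(prefix_code))  # True
print(parses("00", ambiguous))      # ['aa', 'b']
```

With a prefix-free code, `parses` can never return more than one result, which is why a decoder can cut the string greedily from left to right.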
Using the codewords in the table to encode and decode
• Encode: abc = 0.101.100 = 0101100 (just concatenate the codewords)
• Decode: 001011101 = 0.0.101.1101 = aabe
[Figure: two binary trees over the leaves a:45, b:13, c:12, d:16, e:9, f:5, with edges labeled 0 and 1 and root frequency 100. Left: tree for the fixed-length codewords. Right: tree for the variable-length codewords; decoding uses the right tree.]
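Binary tree
• In the tree of an optimal code, every nonleaf node has two children.
• Why? A nonleaf node with only one child could be replaced by that child, which shortens every codeword below it.
• For this reason the fixed-length code in our example is not optimal.
• The total number of bits required to encode a file is $B(T) = \sum_{c \in C} f(c)\, d_T(c)$
• f(c): the frequency (number of occurrences) of c in the file
• d_T(c): the depth of c's leaf in the tree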
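The cost of a code tree can be evaluated directly from the depth of each leaf. The sketch below assumes a nested-tuple representation (a leaf is a character, an internal node is a tuple of its children), which is only an illustration and not the notation of the slides; the two trees are the fixed-length and variable-length trees for the frequencies a:45, b:13, c:12, d:16, e:9, f:5.

```python
freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}

# Fixed-length tree: the node above e and f has only one child.
fixed_tree = ((("a", "b"), ("c", "d")), (("e", "f"),))
# Variable-length tree: every internal node has two children.
variable_tree = ("a", (("c", "b"), (("f", "e"), "d")))

def cost(tree, depth=0):
    """Sum of f(c) * depth(c) over all leaves c of the tree."""
    if isinstance(tree, str):
        return freq[tree] * depth
    return sum(cost(child, depth + 1) for child in tree)

print(cost(fixed_tree))     # 300
print(cost(variable_tree))  # 224
```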
Constructing an optimal coding scheme
• Formal definition of the problem:
• Input: a set of characters C = {c1, c2, …, cn}; each c ∈ C has frequency f[c].
• Output: a binary tree representing codewords so that the total number of bits required for the file is minimized.
• Huffman proposed a greedy algorithm to solve the problem.
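[Figure: the steps of Huffman's algorithm on the frequencies a:45, b:13, c:12, d:16, e:9, f:5; tree edges are labeled 0 and 1.]
(a) Initial queue: f:5, e:9, c:12, b:13, d:16, a:45
(b) f:5 and e:9 are merged into a node of frequency 14
(c) c:12 and b:13 are merged into a node of frequency 25
(d) 14 and d:16 are merged into a node of frequency 30
(e) 25 and 30 are merged into a node of frequency 55
(f) a:45 and 55 are merged into the root, of frequency 100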
HUFFMAN(C)
1  n := |C|
2  Q := C
3  for i := 1 to n-1 do
4      z := ALLOCATE_NODE()
5      x := left[z] := EXTRACT_MIN(Q)
6      y := right[z] := EXTRACT_MIN(Q)
7      f[z] := f[x] + f[y]
8      INSERT(Q, z)
9  return EXTRACT_MIN(Q)
The Huffman Algorithm
• The algorithm builds the tree T corresponding to the optimal code in a bottom-up manner.
• C is a set of n characters, and each character c in C has a defined frequency f[c].
• Q is a priority queue, keyed on f, used to identify the two least-frequent objects to merge together.
• The result of the merger is a new object (internal node) whose frequency is the sum of the frequencies of the two merged objects.
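The procedure can be sketched in Python with the standard `heapq` module standing in for the priority queue Q. This is one possible rendering under stated assumptions, not the course's reference implementation: internal nodes are represented as `(left, right)` tuples, symbols are assumed to be strings, the input is assumed to be non-empty, and the counter exists only so that entries with equal frequency remain comparable.

```python
import heapq
from itertools import count

def huffman(freq):
    """Return a dict mapping each symbol in freq to its codeword."""
    tiebreak = count()  # unique sequence number per heap entry
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)

    # n-1 merges: take the two least-frequent nodes, join them, reinsert.
    for _step in range(len(freq) - 1):
        fx, _, x = heapq.heappop(heap)  # becomes the left child
        fy, _, y = heapq.heappop(heap)  # becomes the right child
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))

    _, _, root = heap[0]
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: 0 left, 1 right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf; a lone symbol gets "0"
            codes[node] = prefix or "0"

    walk(root, "")
    return codes

print(huffman({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))
# {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
```

When several nodes share the same frequency, a different tie-breaking rule can produce different codewords with the same total length.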
Time complexity
• Lines 4-8 are executed n-1 times.
• Each heap operation in Lines 4-8 takes O(lg n) time.
• Total time required is O(n lg n).
Note: The details of heap operations will not be tested. The time complexity O(n lg n) should be remembered.
A Complete Example
Scan the original text: Eerie eyes seen near lake.
• What characters are present? E e r i space y s n a l k .
Building a Tree
Scan the original text: Eerie eyes seen near lake.
• What is the frequency of each character in the text?

  Char   Freq.    Char  Freq.    Char  Freq.
  E      1        y     1        k     1
  e      8        s     2        .     1
  r      2        n     2
  i      1        a     2
  space  4        l     1
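The frequency scan can be reproduced with `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

text = "Eerie eyes seen near lake."
freq = Counter(text)  # maps each character to its number of occurrences

for ch, n in sorted(freq.items(), key=lambda item: item[1]):
    label = "space" if ch == " " else ch
    print(label, n)
```

The counts are case-sensitive, so `E` and `e` are separate characters, and the 12 distinct characters account for all 26 characters of the text.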
Building a Tree
• The array after inserting all nodes, ordered by frequency: E:1 i:1 y:1 l:1 k:1 .:1 r:2 s:2 n:2 a:2 sp:4 e:8
• At each step, the two lowest-frequency nodes are dequeued and made the children of a new node whose frequency is their sum, and the new node is enqueued:
• E:1 + i:1 → 2
• y:1 + l:1 → 2
• k:1 + .:1 → 2
• r:2 + s:2 → 4
• n:2 + a:2 → 4
• 2 (E, i) + 2 (y, l) → 4
• 2 (k, .) + sp:4 → 6
• What is happening to the characters with a low number of occurrences?
• 4 (r, s) + 4 (n, a) → 8
• 4 (E, i, y, l) + 6 (k, ., sp) → 10
• e:8 + 8 (r, s, n, a) → 16
• 10 + 16 → 26
• After enqueueing this node (frequency 26) there is only one node left in the priority queue.
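The length of the encoded text follows from the merges alone: each merge pushes every leaf beneath the new node one level deeper, so the sum of the frequencies of all merged nodes equals the total number of bits. The sketch below uses that observation; `huffman_cost` is a hypothetical helper name.

```python
import heapq
from collections import Counter

def huffman_cost(freqs):
    """Total encoded length in bits, as the sum of all merged node frequencies."""
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

text = "Eerie eyes seen near lake."
bits = huffman_cost(Counter(text).values())
print(bits)           # 84
print(4 * len(text))  # 104 bits with a fixed-length code (12 characters need 4 bits each)
```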
Using heap:
[Figures: the priority queue for a:45, b:13, c:12, d:16, e:9, f:5 stored as a heap, where each node carries P, L and R links (presumably parent, left child and right child). The slides step through a sequence of heap rearrangements; the last figure adds a node g with frequency 14 (= f:5 + e:9).]
CS3335 Design and Analysis of Algorithms/WANG Lusheng
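One merge step on a heap can be tried out with Python's `heapq`, which stores a binary min-heap in a plain list rather than with explicit links. This is a sketch under that assumption; in the list layout the parent of index i is `(i - 1) // 2` and its children are `2*i + 1` and `2*i + 2`. The label g for the merged node follows the figure.

```python
import heapq

heap = [(45, "a"), (13, "b"), (12, "c"), (16, "d"), (9, "e"), (5, "f")]
heapq.heapify(heap)  # rearrange the list into a min-heap in O(n)

def is_min_heap(items):
    """Every element is no smaller than its parent at index (i - 1) // 2."""
    return all(items[(i - 1) // 2] <= items[i] for i in range(1, len(items)))

x = heapq.heappop(heap)  # least frequent node
y = heapq.heappop(heap)  # second least frequent node
heapq.heappush(heap, (x[0] + y[0], "g"))  # merged node g

print(x, y)               # (5, 'f') (9, 'e')
print(heap[0])            # (12, 'c')
print(is_min_heap(heap))  # True
```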