CSE 326: Huffman coding. Richard Anderson
Coding theory • Code examples: (1) 000, 001, 010, 011, 100, 101; (2) 1, 01, 001, 0001, 00001, 000001; (3) 00, 010, 011, 100, 11, 101 • Conversion, Encryption, Compression • Binary coding • Variable length coding
Decode the following • 11010010010101011 • 100100101010 • Prefix code • Ambiguous
Prefix code • No prefix of a codeword is a codeword • Uniquely decodable
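The prefix property and the decoding exercise can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function names `is_prefix_code` and `decode` are hypothetical, the three code lists are the "code examples" from the coding theory slide, and the `ambiguous` code is an invented illustration of a code that is not prefix-free.

```python
def is_prefix_code(codewords):
    """Return True if no codeword is a prefix of a different codeword."""
    return not any(
        a != b and b.startswith(a) for a in codewords for b in codewords
    )


def decode(bits, codewords):
    """Split a bit string into codewords, reading left to right.

    This greedy scan is only guaranteed to be correct for prefix codes.
    """
    words = set(codewords)
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in words:
            out.append(current)
            current = ""
    if current:
        raise ValueError("leftover bits do not form a codeword")
    return out


code1 = ["000", "001", "010", "011", "100", "101"]
code2 = ["1", "01", "001", "0001", "00001", "000001"]
code3 = ["00", "010", "011", "100", "11", "101"]
ambiguous = ["0", "01", "10"]  # hypothetical: "010" is both 0|10 and 01|0

print(is_prefix_code(code2))      # True
print(is_prefix_code(ambiguous))  # False
print(decode("11010010010101011", code2))
# ['1', '1', '01', '001', '001', '01', '01', '01', '1']
```

Because no codeword is a prefix of another, the scan can emit a codeword the moment it matches, which is why a prefix code is uniquely decodable.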
Prefix codes and binary trees • Tree representation of prefix codes
Minimum length code • Average cost • Average leaf depth • Huffman tree – tree with minimum weighted path length • C(T) – weighted path length
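The weighted path length can be written out explicitly. This is a sketch under assumed notation that does not appear in the slides: w_i is the weight of item i, and d_T(i) is the depth of its leaf in T, which equals the length of its codeword.

```latex
C(T) = \sum_{i=1}^{n} w_i \, d_T(i),
\qquad
\text{average cost} = \frac{C(T)}{\sum_{i=1}^{n} w_i}
```

If the weights are probabilities that sum to 1, the average cost is C(T) itself, that is, the expected number of bits per item.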
Huffman code algorithm • Derivation • Two rarest items will have the longest codewords • Codewords for rarest items differ only in the last bit • Idea: suppose the weights are w1, w2, ..., wn with w1 and w2 the smallest weights • Start with an optimal code for w1 + w2, w3, ..., wn • Extend the codeword for w1 + w2 to get codewords for w1 and w2
Huffman code
H = new Heap()
for each wi
    T = new Tree(wi)
    H.Insert(T)
while H.Size() > 1
    T1 = H.DeleteMin()
    T2 = H.DeleteMin()
    T3 = Merge(T1, T2)
    H.Insert(T3)
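The pseudocode above might be realized in Python roughly as follows. This is a sketch under assumptions rather than the course's reference implementation: the standard-library `heapq` module stands in for `Heap`, a nested tuple `(left, right)` stands in for `Merge`, and the name `huffman_codes` is hypothetical.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Return (codes, cost), where codes[i] is the codeword for weights[i]."""
    if not weights:
        raise ValueError("need at least one weight")
    tiebreak = count()  # unique counter so trees themselves are never compared
    # Heap entries are (weight, tiebreak, tree); a tree is either a leaf
    # (the index of a weight) or a pair (left_subtree, right_subtree).
    heap = [(w, next(tiebreak), i) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # the two DeleteMin calls
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))  # Merge

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            # A lone item would get the empty string; give it "0" instead.
            codes[tree] = prefix or "0"

    walk(heap[0][2], "")
    cost = sum(weights[i] * len(code) for i, code in codes.items())
    return codes, cost


codes, cost = huffman_codes([4, 5, 6, 7, 11, 14, 21])
print(cost)  # 178
```

With a binary heap each `DeleteMin` and `Insert` takes O(log n) time, so building the tree for n weights takes O(n log n).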
Example: Weights 4, 5, 6, 7, 11, 14, 21 (figure: leaves 21, 14, 11, 6, 7, 4, 5)
Draw a Huffman tree for the following data values and show internal weights: 3, 5, 9, 14, 16, 35
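A hand-drawn tree for either weight set can be checked by listing the internal node weights in the order the algorithm creates them. The helper below is a hypothetical sketch (the name `internal_weights` is not from the slides). It relies on the fact that the sum of the internal weights equals the weighted path length, since each leaf weight is counted once for every internal node above it.

```python
import heapq


def internal_weights(weights):
    """Return the internal node weights in the order Huffman creates them."""
    heap = list(weights)
    heapq.heapify(heap)
    merged = []
    while len(heap) > 1:
        w = heapq.heappop(heap) + heapq.heappop(heap)  # merge two smallest
        merged.append(w)
        heapq.heappush(heap, w)
    return merged


example = internal_weights([4, 5, 6, 7, 11, 14, 21])
print(example)       # [9, 13, 20, 27, 41, 68]
print(sum(example))  # 178, the weighted path length of the example tree

# The same call works for the exercise weights.
print(internal_weights([3, 5, 9, 14, 16, 35]))
```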
Correctness proof • The most amazing induction proof • Induction on the number of code words • The Huffman algorithm finds an optimal code for n = 1 • Suppose that the Huffman algorithm finds an optimal code for codes of size n; now consider a code of size n + 1 . . .
Key lemma • Given a tree T, we can find a tree T’, with the two minimum cost leaves as siblings, and C(T’) <= C(T)
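The lemma is usually shown with an exchange argument. The sketch below uses assumed notation that is not in the slides: x is a minimum-weight leaf at depth d_x, and a is a leaf at maximum depth d_a with weight w_a, so x ≤ w_a and d_x ≤ d_a. Swapping the two leaves gives a tree T' with

```latex
C(T') - C(T)
  = (x\,d_a + w_a\,d_x) - (x\,d_x + w_a\,d_a)
  = (x - w_a)(d_a - d_x)
  \le 0
```

Assuming every internal node has two children, the sibling of a deepest leaf is also a leaf, so the same swap applied to the second-smallest weight y and that sibling makes x and y siblings without increasing the cost.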
Modify the following tree to reduce the WPL (weighted path length) • Figure: tree with node weights 29, 10, 19, 6, 4, 13, 6, 10, 3, 5, 5
Finish the induction proof • T – Tree constructed by Huffman • X – Any code tree • Show C(T) <= C(X) • T’ and X’ – Trees from the lemma • C(T’) = C(T) • C(X’) <= C(X) • T’’ and X’’ – Trees with minimum cost leaves x and y removed
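X: any tree • X’: X modified as in the lemma • X’’: X’ with the two smallest leaves x and y removed, so their parent becomes a leaf of weight x + y • C(X’’) = C(X’) – x – y • C(T’’) = C(T’) – x – y • C(T’’) <= C(X’’) by the induction hypothesis, since T’’ is the Huffman tree for the reduced weights • C(T) = C(T’) = C(T’’) + x + y <= C(X’’) + x + y = C(X’) <= C(X)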
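The conclusion C(T) <= C(X) can be sanity-checked on small inputs by comparing the Huffman cost with an exhaustive search over all code trees. This is only an empirical check, not a substitute for the proof. The names `huffman_cost` and `brute_force_cost` are hypothetical, and the search is restricted to trees in which every internal node has two children (a node with one child can always be removed without increasing the cost).

```python
import heapq
from functools import lru_cache
from itertools import combinations


def huffman_cost(weights):
    """Weighted path length of the tree built by the Huffman algorithm."""
    heap = list(weights)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # each merge adds one internal node weight
        heapq.heappush(heap, merged)
    return total


def brute_force_cost(weights):
    """Minimum weighted path length over all full binary trees (small n only)."""
    weights = tuple(weights)

    @lru_cache(maxsize=None)
    def best(group):
        # group is a sorted tuple of leaf indices sharing one subtree
        if len(group) == 1:
            return 0
        first, rest = group[0], group[1:]
        subtotal = sum(weights[i] for i in group)  # weight of this subtree root
        candidates = []
        # Keep `first` on the left to skip mirror images; the right side
        # must stay non-empty, so take at most len(rest) - 1 extra leaves.
        for r in range(len(rest)):
            for chosen in combinations(rest, r):
                left = (first,) + chosen
                right = tuple(i for i in rest if i not in chosen)
                candidates.append(best(left) + best(right))
        return subtotal + min(candidates)

    return best(tuple(range(len(weights))))


for ws in ([4, 5, 6, 7, 11, 14, 21], [3, 5, 9, 14, 16, 35], [1, 1, 1, 1]):
    print(ws, huffman_cost(ws), brute_force_cost(ws))
```

The exhaustive search is exponential in the number of weights, so it is only practical for a handful of items.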