Huffman Codes
• Message consisting of five characters: a, b, c, d, e
• Probabilities: .12, .40, .15, .08, .25
• Encode each character as a sequence of 0's and 1's so that no character's code is a prefix of any other character's code
• Prefix property
• A string of 0's and 1's can then be decoded by repeatedly deleting prefixes of the string that are codes for characters
Example
• Both codes have the prefix property
• Decoding Code 1: "grab" 3 bits at a time and translate each group into a character
• Ex.: 001010011 → bcd

  Symbol   Probability   Code 1   Code 2
  a        .12           000      000
  b        .40           001      11
  c        .15           010      01
  d        .08           011      001
  e        .25           100      10
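Decoding a fixed-length code like Code 1 is just a matter of slicing. A minimal Python sketch (the code table comes from the slide; the function name is our own):

```python
# Code 1 from the table above: every codeword is exactly 3 bits.
CODE1 = {"000": "a", "001": "b", "010": "c", "011": "d", "100": "e"}

def decode_fixed(bits):
    """Slice the input into 3-bit groups and translate each group."""
    return "".join(CODE1[bits[i:i + 3]] for i in range(0, len(bits), 3))

print(decode_fixed("001010011"))  # -> "bcd", as in the slide
```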
Example Cont'd
• Decoding Code 2: repeatedly "grab" prefixes that are codes for characters and remove them from the input
• The only difference is that the input cannot be "sliced" up all at once
  • How many bits to grab depends on which character was encoded
• Ex.: 1101001 → bcd (using the table above)
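For the variable-length Code 2, decoding greedily strips whichever codeword is a prefix of the remaining input; the prefix property guarantees that at most one codeword can match, so the result is unambiguous. A sketch under the same assumptions as above:

```python
# Code 2 from the table: codewords are 2 or 3 bits long.
CODE2 = {"000": "a", "11": "b", "01": "c", "001": "d", "10": "e"}

def decode_prefix(bits):
    """Repeatedly remove the unique codeword that starts the input."""
    out = []
    while bits:
        for code, ch in CODE2.items():
            if bits.startswith(code):
                out.append(ch)
                bits = bits[len(code):]
                break
        else:  # no codeword matched the front of the input
            raise ValueError("invalid encoding")
    return "".join(out)

print(decode_prefix("1101001"))  # -> "bcd", as in the slide
```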
Big Deal?
• Huffman coding yields a shorter average length for the compressed (encoded) message
• Code 1 has average length 3
  • multiply the length of each symbol's code by that symbol's probability of occurrence, and sum
• Code 2 has average length 2.2
  • (3*.12) + (2*.40) + (2*.15) + (3*.08) + (2*.25) = 2.2
• Can we do better?
• Problem: given a set of characters and their probabilities, find a code with the prefix property such that the average length of a character's code is minimized
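The quoted averages are easy to check mechanically. A small sketch (the tables are from the slides; the helper name is ours):

```python
probs = {"a": .12, "b": .40, "c": .15, "d": .08, "e": .25}
code1 = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100"}
code2 = {"a": "000", "b": "11", "c": "01", "d": "001", "e": "10"}

def avg_len(code):
    # Expected bits per character: sum of (codeword length * probability).
    return sum(len(code[ch]) * p for ch, p in probs.items())

print(f"{avg_len(code1):.1f}")  # 3.0
print(f"{avg_len(code2):.1f}")  # 2.2
```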
Representation
• Think of prefix codes as paths in binary trees: following a path from a node to its left child appends a 0 to the code, and proceeding from a node to its right child appends a 1
• Label the leaves of the tree with the characters they represent
• Any prefix code can be represented as a binary tree
• The prefix property guarantees that no character's code can correspond to an interior node
• Conversely, labeling the leaves of any binary tree with characters gives a code with the prefix property
Sample Binary Trees
• [Figure: the binary trees for Code 1 and Code 2. Code 1's tree has all five leaves a, b, c, d, e at depth 3; Code 2's tree is unbalanced, with leaves a and d at depth 3 and c, e, b at depth 2.]
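The tree view translates directly into code: walk the tree, appending '0' on left edges and '1' on right edges, and each leaf's root-to-leaf path is its codeword. A sketch where internal nodes are (left, right) tuples and leaves are characters (the tuple representation is our own choice):

```python
def codes_from_tree(node, prefix=""):
    """Read a prefix code off a binary tree whose leaves are characters."""
    if isinstance(node, str):
        return {node: prefix}  # the path taken so far is this leaf's code
    left, right = node
    table = codes_from_tree(left, prefix + "0")
    table.update(codes_from_tree(right, prefix + "1"))
    return table

# The Code 2 tree: a and d at depth 3, c, e, b at depth 2.
tree2 = ((("a", "d"), "c"), ("e", "b"))
print(codes_from_tree(tree2))
# {'a': '000', 'd': '001', 'c': '01', 'e': '10', 'b': '11'}
```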
Huffman's Algorithm
• Select the two characters a and b having the lowest probabilities and replace them with a single (imaginary) character, say x
• x's probability of occurrence is the sum of the probabilities of a and b
• Now find an optimal prefix code for this smaller set of characters, applying the same procedure recursively
• The code for the original character set is obtained by taking the code for x and appending a 0 for a and a 1 for b
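A compact Python sketch of the whole algorithm using a min-heap (a standard implementation strategy, not the textbook's code; all names are ours):

```python
import heapq
from itertools import count

def huffman(probs):
    """Build a Huffman code by repeatedly merging the two lightest nodes."""
    ticket = count()  # tie-breaker so the heap never has to compare trees
    heap = [(p, next(ticket), ch) for ch, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)   # lowest probability
        p2, _, b = heapq.heappop(heap)   # second lowest
        heapq.heappush(heap, (p1 + p2, next(ticket), (a, b)))
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):
            codes[node] = prefix or "0"  # lone-symbol edge case
        else:
            walk(node[0], prefix + "0")  # append 0 for the first merged node
            walk(node[1], prefix + "1")  # append 1 for the second
    walk(root, "")
    return codes

print(huffman({"a": .12, "b": .40, "c": .15, "d": .08, "e": .25}))
# {'b': '0', 'e': '10', 'c': '110', 'd': '1110', 'a': '1111'}
```

Run on the slide's distribution, this reproduces the codes of the Final Tree slide below (b = 0, e = 10, c = 110, d = 1110, a = 1111).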
Steps in the Construction of a Huffman Tree
• Sort the input characters by frequency: d (.08), a (.12), c (.15), e (.25), b (.40)
Merge a and d
• Replace the two lowest-frequency characters, d (.08) and a (.12), with a single node {a, d} of weight .20
• Remaining weights: c (.15), {a, d} (.20), e (.25), b (.40)
Merge a, d with c
• The two lightest nodes are now c (.15) and {a, d} (.20); merge them into {a, c, d} of weight .35
• Remaining weights: e (.25), {a, c, d} (.35), b (.40)
Merge a, c, d with e
• Merge e (.25) with {a, c, d} (.35) into {a, c, d, e} of weight .60
• Remaining weights: b (.40), {a, c, d, e} (.60)
Final Tree
• Merging b (.40) with {a, c, d, e} (.60) gives the root, of weight 1.00
• Codes: a = 1111, b = 0, c = 110, d = 1110, e = 10
• Average code length: (4*.12) + (1*.40) + (3*.15) + (4*.08) + (2*.25) = 2.15
Huffman Algorithm
• An example of a greedy algorithm
• Combines nodes whenever possible without considering the potential drawbacks of such a move
• I.e., at each stage it selects the option that is "locally optimal"
• Recall the vertex coloring problem: a greedy strategy does not always yield an optimal solution
• Huffman coding, however, is optimal; see the textbook for the proof
Finishing Remarks
• Works well in theory, but under several restrictive assumptions
• (1) The frequency of a letter is independent of that letter's context in the message
  • Not true in the English language
• (2) Huffman coding works better when there is large variation in the frequencies of letters
  • Actual frequencies must match the expected ones: rare letters get long codewords, so a word made of rare letters can encode worse than a fixed-length code
• Examples: DEED takes 8 bits (12 bits ASCII); FUZZ takes 20 bits (12 bits ASCII)