
Huffman Codes



Presentation Transcript


1. Huffman Codes
• Message consisting of five characters: a, b, c, d, e
• Probabilities: .12, .40, .15, .08, .25
• Encode each character into a sequence of 0’s and 1’s so that no code for a character is a prefix of the code for any other character
• This is the prefix property
• A string of 0’s and 1’s can then be decoded by repeatedly deleting prefixes of the string that are codes for characters

2. Example
• Both codes below have the prefix property
• Decoding Code 1: “grab” 3 bits at a time and translate each group into a character
• Ex.: 001010011 → bcd (see the sketch below)

Symbol   Probability   Code 1   Code 2
a        .12           000      000
b        .40           001      11
c        .15           010      01
d        .08           011      001
e        .25           100      10
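A minimal sketch of the fixed-length decoding in Python (the names CODE1 and decode_fixed are illustrative, not from the slides):

```python
# Fixed-length decoding for Code 1: every codeword is exactly 3 bits,
# so the input can be sliced into groups up front.
CODE1 = {"000": "a", "001": "b", "010": "c", "011": "d", "100": "e"}

def decode_fixed(bits: str) -> str:
    # Grab 3 bits at a time and translate each group into a character.
    return "".join(CODE1[bits[i:i + 3]] for i in range(0, len(bits), 3))

print(decode_fixed("001010011"))  # -> bcd, as on the slide
```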

3. Example Cont’d
• Decoding Code 2: repeatedly “grab” prefixes that are codes for characters and remove them from the input
• The only difference is that the input cannot be “sliced” up all at once
• How many bits to grab depends on the character being decoded
• Ex.: 1101001 → bcd
• (Same code table as on the previous slide)
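The variable-length case can be sketched the same way; decode_prefix and CODE2 are again illustrative names. Because no codeword is a prefix of another, the shortest matching prefix at each position is the only possible match:

```python
# Prefix decoding for Code 2: repeatedly grab the shortest prefix of the
# remaining input that is a valid codeword, then continue after it.
CODE2 = {"000": "a", "11": "b", "01": "c", "001": "d", "10": "e"}

def decode_prefix(bits: str) -> str:
    out, start = [], 0
    while start < len(bits):
        for end in range(start + 1, len(bits) + 1):
            if bits[start:end] in CODE2:  # prefix property: unique match
                out.append(CODE2[bits[start:end]])
                start = end
                break
        else:
            raise ValueError("input is not a valid encoding")
    return "".join(out)

print(decode_prefix("1101001"))  # -> bcd, as on the slide
```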

4. Big Deal?
• Huffman coding results in a shorter average length for the compressed (encoded) message
• Code 1 has an average length of 3
• Multiply the length of the code for each symbol by the probability of occurrence of that symbol
• Code 2 has an average length of 2.2
• (3*.12) + (2*.40) + (2*.15) + (3*.08) + (2*.25) = 2.2
• Can we do better?
• Problem: given a set of characters and their probabilities, find a code with the prefix property such that the average length of a code for a character is minimized
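The averages quoted above are easy to check directly (a small sketch; variable names are illustrative):

```python
# Average code length = sum over symbols of codeword length * probability.
probs = {"a": .12, "b": .40, "c": .15, "d": .08, "e": .25}
code2 = {"a": "000", "b": "11", "c": "01", "d": "001", "e": "10"}

avg = sum(len(code2[s]) * p for s, p in probs.items())
print(round(avg, 2))  # -> 2.2, versus 3.0 for the fixed-length Code 1
```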

5. Representation
• Think of prefix codes as paths in binary trees, with the leaves labeled by the characters they represent
• Following a path from a node to its left child appends a 0 to the code; proceeding from a node to its right child appends a 1
• Any prefix code can be represented as a binary tree
• The prefix property guarantees that no character’s code ends at an interior node
• Conversely, labeling the leaves of any binary tree with characters gives a code with the prefix property
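One way to realize this tree view in code, assuming a nested-dict node layout chosen purely for illustration:

```python
# Build the binary-tree (trie) form of a prefix code: each codeword is a
# path from the root (0 = left child, 1 = right child) ending at a leaf.
def code_to_tree(code: dict) -> dict:
    root = {}
    for char, word in code.items():
        node = root
        for bit in word:
            node = node.setdefault(bit, {})
        node["char"] = char  # prefix property: this node is a leaf
    return root

tree = code_to_tree({"a": "000", "b": "11", "c": "01", "d": "001", "e": "10"})
print(tree["1"]["1"]["char"])  # following path 1,1 reaches the leaf b
```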

6. Sample Binary Trees
[Figure: the two code trees. Code 1 places all five leaves a–e at depth 3; Code 2 places c, e, b at depth 2 and a, d at depth 3, matching the codeword lengths.]

7. Huffman’s Algorithm
• Select the two characters a and b having the lowest probabilities and replace them with a single (imaginary) character, say x
• x’s probability of occurrence is the sum of the probabilities of a and b
• Now find an optimal prefix code for this smaller set of characters, using the above procedure recursively
• The code for the original character set is obtained by taking the code for x and appending a 0 for a and a 1 for b (a sketch in code follows below)
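The recursive merge described above is usually implemented iteratively with a priority queue; here is a sketch using Python’s standard heapq (the function name and the tie-breaking counter are illustrative choices):

```python
import heapq
from itertools import count

def huffman(probs: dict) -> dict:
    # Heap entries are (probability, tie-breaker, subtree); the counter
    # keeps tuple comparisons away from the subtree payloads.
    tiebreak = count()
    heap = [(p, next(tiebreak), char) for char, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)   # lowest probability ("a")
        p1, _, right = heapq.heappop(heap)  # second lowest ("b")
        # Replace them with one imaginary character "x" of weight p0 + p1.
        heapq.heappush(heap, (p0 + p1, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):      # interior node from a merge
            walk(node[0], prefix + "0")  # append 0 for the left child
            walk(node[1], prefix + "1")  # append 1 for the right child
        else:
            codes[node] = prefix         # leaf: record the codeword
    walk(heap[0][2], "")
    return codes

print(huffman({"a": .12, "b": .40, "c": .15, "d": .08, "e": .25}))
```

On the probabilities from slide 1 this reproduces the codes shown on the final-tree slide: b → 0, e → 10, c → 110, d → 1110, a → 1111.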

8. Steps in the Construction of a Huffman Tree
• Sort the input characters by frequency: d (.08), a (.12), c (.15), e (.25), b (.40)

9. Merge a and d
• The two lowest-frequency characters, d (.08) and a (.12), are merged into a node of weight .20
• Remaining weights: .15 (c), .20 ({a, d}), .25 (e), .40 (b)

10. Merge a, d with c
• The {a, d} node (.20) and c (.15) are merged into a node of weight .35
• Remaining weights: .25 (e), .35 ({a, c, d}), .40 (b)

11. Merge a, c, d with e
• The {a, c, d} node (.35) and e (.25) are merged into a node of weight .60
• Remaining weights: .40 (b), .60 ({a, c, d, e})

12. Final Tree
• Merging the last two nodes, b (.40) and {a, c, d, e} (.60), gives the root with weight 1.00
• Codes: a → 1111, b → 0, c → 110, d → 1110, e → 10
• Average code length: (4*.12) + (1*.40) + (3*.15) + (4*.08) + (2*.25) = 2.15
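The quoted average can be verified directly against the codes and probabilities:

```python
# Average length of the Huffman code built on the previous slides.
codes = {"a": "1111", "b": "0", "c": "110", "d": "1110", "e": "10"}
probs = {"a": .12, "b": .40, "c": .15, "d": .08, "e": .25}
print(round(sum(len(codes[s]) * p for s, p in probs.items()), 2))  # -> 2.15
```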

13. Huffman Algorithm
• An example of a greedy algorithm
• Nodes are combined whenever possible, without considering potential drawbacks inherent in making such a move
• I.e., at each individual stage it selects the option that is “locally optimal”
• Recall the vertex coloring problem: a greedy strategy does not always yield an optimal solution
• Huffman coding, however, is optimal; see the textbook for a proof

14. Finishing Remarks
• Works well in theory, but makes several restrictive assumptions
• (1) The frequency of a letter is assumed to be independent of its context in the message; this is not true in the English language
• (2) Huffman coding works better when there is large variation in the frequencies of letters
• Actual frequencies must match the expected ones
• Examples: DEED → 8 bits (12 bits ASCII); FUZZ → 20 bits (12 bits ASCII)
