Huffman Codes
• Message consisting of five characters: a, b, c, d, e
• Probabilities: .12, .40, .15, .08, .25
• Encode each character as a sequence of 0's and 1's so that no character's code is a prefix of any other character's code
• Prefix property
• A string of 0's and 1's can then be decoded by repeatedly deleting prefixes of the string that are codes for characters
Example
• Both codes have the prefix property
• Decoding Code 1: "grab" 3 bits at a time and translate each group into a character
• Ex.: 001010011 → bcd

  Symbol   Probability   Code 1   Code 2
  a        .12           000      000
  b        .40           001      11
  c        .15           010      01
  d        .08           011      001
  e        .25           100      10
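Decoding a fixed-length code like Code 1 is just a matter of slicing. A minimal Python sketch (the code table comes from the slide; the function name is our own):

```python
# Code 1 from the table above: every codeword is exactly 3 bits.
CODE1 = {"000": "a", "001": "b", "010": "c", "011": "d", "100": "e"}

def decode_fixed(bits):
    """Slice the input into 3-bit groups and translate each group."""
    return "".join(CODE1[bits[i:i + 3]] for i in range(0, len(bits), 3))

print(decode_fixed("001010011"))  # -> "bcd", as in the slide
```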
Example Cont'd
• Decoding Code 2: repeatedly "grab" prefixes that are codes for characters and remove them from the input
• The only difference is that the input cannot be "sliced" up all at once
  • How many bits to grab depends on which character was encoded
• Ex.: 1101001 → bcd (using the table above)
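For the variable-length Code 2, decoding greedily strips whichever codeword is a prefix of the remaining input; the prefix property guarantees that at most one codeword can match, so the result is unambiguous. A sketch under the same assumptions as above:

```python
# Code 2 from the table: codewords are 2 or 3 bits long.
CODE2 = {"000": "a", "11": "b", "01": "c", "001": "d", "10": "e"}

def decode_prefix(bits):
    """Repeatedly remove the unique codeword that starts the input."""
    out = []
    while bits:
        for code, ch in CODE2.items():
            if bits.startswith(code):
                out.append(ch)
                bits = bits[len(code):]
                break
        else:  # no codeword matched the front of the input
            raise ValueError("invalid encoding")
    return "".join(out)

print(decode_prefix("1101001"))  # -> "bcd", as in the slide
```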
Big Deal?
• Huffman coding yields a shorter average length for the compressed (encoded) message
• Code 1 has average length 3
  • multiply the length of each symbol's code by that symbol's probability of occurrence, and sum
• Code 2 has average length 2.2
  • (3*.12) + (2*.40) + (2*.15) + (3*.08) + (2*.25) = 2.2
• Can we do better?
• Problem: given a set of characters and their probabilities, find a code with the prefix property such that the average length of a character's code is minimized
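The quoted averages are easy to check mechanically. A small sketch (the tables are from the slides; the helper name is ours):

```python
probs = {"a": .12, "b": .40, "c": .15, "d": .08, "e": .25}
code1 = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100"}
code2 = {"a": "000", "b": "11", "c": "01", "d": "001", "e": "10"}

def avg_len(code):
    # Expected bits per character: sum of (codeword length * probability).
    return sum(len(code[ch]) * p for ch, p in probs.items())

print(f"{avg_len(code1):.1f}")  # 3.0
print(f"{avg_len(code2):.1f}")  # 2.2
```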
Representation
• Think of prefix codes as paths in binary trees: following a path from a node to its left child appends a 0 to the code, and proceeding from a node to its right child appends a 1
• Label the leaves of the tree with the characters they represent
• Any prefix code can be represented as a binary tree
• The prefix property guarantees that no character's code can correspond to an interior node
• Conversely, labeling the leaves of any binary tree with characters gives a code with the prefix property
Sample Binary Trees
• [Figure: the binary trees for Code 1 and Code 2. Code 1's tree has all five leaves a, b, c, d, e at depth 3; Code 2's tree is unbalanced, with leaves a and d at depth 3 and c, e, b at depth 2.]
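The tree view translates directly into code: walk the tree, appending '0' on left edges and '1' on right edges, and each leaf's root-to-leaf path is its codeword. A sketch where internal nodes are (left, right) tuples and leaves are characters (the tuple representation is our own choice):

```python
def codes_from_tree(node, prefix=""):
    """Read a prefix code off a binary tree whose leaves are characters."""
    if isinstance(node, str):
        return {node: prefix}  # the path taken so far is this leaf's code
    left, right = node
    table = codes_from_tree(left, prefix + "0")
    table.update(codes_from_tree(right, prefix + "1"))
    return table

# The Code 2 tree: a and d at depth 3, c, e, b at depth 2.
tree2 = ((("a", "d"), "c"), ("e", "b"))
print(codes_from_tree(tree2))
# {'a': '000', 'd': '001', 'c': '01', 'e': '10', 'b': '11'}
```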
Huffman's Algorithm
• Select the two characters a and b having the lowest probabilities and replace them with a single (imaginary) character, say x
• x's probability of occurrence is the sum of the probabilities of a and b
• Now find an optimal prefix code for this smaller set of characters, applying the same procedure recursively
• The code for the original character set is obtained by taking the code for x and appending a 0 for a and a 1 for b
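A compact Python sketch of the whole algorithm using a min-heap (a standard implementation strategy, not the textbook's code; all names are ours):

```python
import heapq
from itertools import count

def huffman(probs):
    """Build a Huffman code by repeatedly merging the two lightest nodes."""
    ticket = count()  # tie-breaker so the heap never has to compare trees
    heap = [(p, next(ticket), ch) for ch, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)   # lowest probability
        p2, _, b = heapq.heappop(heap)   # second lowest
        heapq.heappush(heap, (p1 + p2, next(ticket), (a, b)))
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):
            codes[node] = prefix or "0"  # lone-symbol edge case
        else:
            walk(node[0], prefix + "0")  # append 0 for the first merged node
            walk(node[1], prefix + "1")  # append 1 for the second
    walk(root, "")
    return codes

print(huffman({"a": .12, "b": .40, "c": .15, "d": .08, "e": .25}))
# {'b': '0', 'e': '10', 'c': '110', 'd': '1110', 'a': '1111'}
```

Run on the slide's distribution, this reproduces the codes of the Final Tree slide below (b = 0, e = 10, c = 110, d = 1110, a = 1111).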
Steps in the Construction of a Huffman Tree
• Sort the input characters by frequency: d (.08), a (.12), c (.15), e (.25), b (.40)
Merge a and d
• Replace the two lowest-frequency characters, d (.08) and a (.12), with a single node {a, d} of weight .20
• Remaining weights: c (.15), {a, d} (.20), e (.25), b (.40)
Merge a, d with c
• The two lightest nodes are now c (.15) and {a, d} (.20); merge them into {a, c, d} of weight .35
• Remaining weights: e (.25), {a, c, d} (.35), b (.40)
Merge a, c, d with e
• Merge e (.25) with {a, c, d} (.35) into {a, c, d, e} of weight .60
• Remaining weights: b (.40), {a, c, d, e} (.60)
Final Tree
• Merging b (.40) with {a, c, d, e} (.60) gives the root, of weight 1.00
• Codes: a = 1111, b = 0, c = 110, d = 1110, e = 10
• Average code length: (4*.12) + (1*.40) + (3*.15) + (4*.08) + (2*.25) = 2.15
Huffman Algorithm
• An example of a greedy algorithm
• Combines nodes whenever possible without considering the potential drawbacks of such a move
• I.e., at each stage it selects the option that is "locally optimal"
• Recall the vertex coloring problem: a greedy strategy does not always yield an optimal solution
• Huffman coding, however, is optimal; see the textbook for the proof
Finishing Remarks
• Works well in theory, but under several restrictive assumptions
• (1) The frequency of a letter is independent of that letter's context in the message
  • Not true in the English language
• (2) Huffman coding works better when there is large variation in the frequencies of letters
  • Actual frequencies must match the expected ones: rare letters get long codewords, so a word made of rare letters can encode worse than a fixed-length code
• Examples: DEED takes 8 bits (12 bits ASCII); FUZZ takes 20 bits (12 bits ASCII)