Huffman Coding Yancy Vance Paredes
Outline • Background • Motivation • Huffman Algorithm • Sample Implementation • Running Time Analysis • Proof of Correctness • Application
Background • Lossless compression that typically saves around 20% to 90% of space, depending on the data • Developed by David A. Huffman • Published in 1952
Motivation /1 • Let's say we want to store the string "go go gophers" (13 characters, counting the two spaces) • How do we usually do it? • ASCII: a 7-bit code, stored with 1 extra bit as an 8-bit byte • 13 * 8 bits = 104 bits • Can we reduce it? • Only 8 unique characters: g, o, p, h, e, r, s, space • Since 8 = 2^3, a fixed-length code needs only 3 bits per character instead of 8 • 13 * 3 bits = 39 bits • We saved 65 bits!
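The arithmetic on this slide can be checked with a short Python sketch. The variable names are illustrative only and do not come from the slides.

```python
# Fixed-length encoding cost for the example string.
text = "go go gophers"

# Distinct symbols in the string (the space counts as a symbol).
alphabet = sorted(set(text))

# Storing each character in an 8-bit byte.
ascii_bits = len(text) * 8

# Smallest fixed width that can distinguish every symbol:
# the bit length of the largest index (len - 1).
bits_per_char = max(1, (len(alphabet) - 1).bit_length())
fixed_bits = len(text) * bits_per_char

print(len(text), len(alphabet), bits_per_char)   # 13 8 3
print(ascii_bits, fixed_bits, ascii_bits - fixed_bits)  # 104 39 65
```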
Motivation /2 • What if we use fewer bits for the more frequent characters?
Motivation /3 • With a variable-length code, the total number of bits used is lowered to 37 • Prefix code: no codeword is a prefix of another codeword • Easy to encode and decode • Encoded string: 0001111000111100011001010110010111101
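Motivation /4 • How do we decode? • Start at the root of the code tree and read one bit at a time • 0 means go LEFT • 1 means go RIGHT • On reaching a leaf, output its character and go back to the root • How to decode the following? • 0001111000111100011001010110010111101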
How to Decode? 0001111000111100011001010110010111101
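The tree walk can be sketched in Python as follows. The code table from the slide image is not preserved in this transcript, so `CODE` below is an assumption: a prefix code inferred so that it reproduces the 37-bit string above. The helper names `build_tree` and `decode` are hypothetical.

```python
# ASSUMPTION: this table is inferred from the bit string, not copied from the slide.
CODE = {
    "g": "00", "o": "01", "p": "100", "h": "1010",
    "r": "1011", "e": "1100", "s": "1101", " ": "111",
}
BITS = "0001111000111100011001010110010111101"


def build_tree(code):
    """Build a binary tree of nested dicts; leaves are single characters."""
    root = {}
    for ch, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = ch
    return root


def decode(bits, root):
    """Walk the tree: '0' goes left, '1' goes right, a leaf emits a character."""
    out = []
    node = root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):   # reached a leaf
            out.append(node)
            node = root             # restart from the root
    if node is not root:
        raise ValueError("input ends in the middle of a codeword")
    return "".join(out)


print(decode(BITS, build_tree(CODE)))  # go go gophers
```

Because no codeword is a prefix of another, the walk never has to backtrack: the first leaf reached is always the right character.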
Huffman Algorithm • A greedy algorithm • Constructs an optimal prefix code • Huffman code

HUFFMAN(C)
    n = |C|
    Q = C
    for i = 1 to n-1
        allocate a new node z
        z.left = x = EXTRACT_MIN(Q)
        z.right = y = EXTRACT_MIN(Q)
        z.freq = x.freq + y.freq
        INSERT(Q, z)
    return EXTRACT_MIN(Q)
Sample Implementation • See program demo
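The demo program itself is not part of this transcript. As a stand-in, here is a minimal Python sketch of the HUFFMAN(C) procedure, assuming a non-empty frequency table and using the standard `heapq` module as the min-priority queue Q. The function names `huffman_tree` and `code_table`, and the tuple representation of nodes, are illustrative choices rather than the author's implementation.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_tree(freq):
    """Return a Huffman tree: a leaf is a character, an internal node is (left, right)."""
    tiebreak = count()  # unique counter so that tree payloads are never compared
    heap = [(f, next(tiebreak), ch) for ch, f in freq.items()]
    heapq.heapify(heap)                      # Q = C
    for _step in range(len(freq) - 1):       # for i = 1 to n-1
        fx, _, x = heapq.heappop(heap)       # x = EXTRACT_MIN(Q)
        fy, _, y = heapq.heappop(heap)       # y = EXTRACT_MIN(Q)
        heapq.heappush(heap, (fx + fy, next(tiebreak), (x, y)))  # INSERT(Q, z)
    return heapq.heappop(heap)[2]            # return EXTRACT_MIN(Q)


def code_table(tree, prefix=""):
    """Assign '0' to left edges and '1' to right edges, collecting codewords at leaves."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}         # a one-symbol alphabet still needs 1 bit
    left, right = tree
    table = code_table(left, prefix + "0")
    table.update(code_table(right, prefix + "1"))
    return table


freq = Counter("go go gophers")
table = code_table(huffman_tree(freq))
total_bits = sum(freq[ch] * len(table[ch]) for ch in freq)
print(total_bits)  # 37
```

The individual codewords depend on how ties between equal frequencies are broken, so they may differ from the ones on the slides, but the total length of 37 bits does not.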
Running Time Analysis • Assume that Q is implemented as a binary min-heap (priority queue) • Building Q from the n characters takes O(n) • The for loop executes n-1 times • Each heap operation (EXTRACT_MIN or INSERT) takes O(lg n) • Thus, the loop contributes O(n lg n) • Total running time is O(n lg n)
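Written as one expression, assuming each loop iteration performs two EXTRACT_MIN calls and one INSERT as in the pseudocode, the bound works out as:

```latex
T(n) \;=\; \underbrace{O(n)}_{\text{build } Q}
      \;+\; (n-1)\,\bigl(\underbrace{2\,O(\lg n)}_{\text{two EXTRACT\_MIN}}
      + \underbrace{O(\lg n)}_{\text{one INSERT}}\bigr)
      \;=\; O(n \lg n)
```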
Proof of Correctness /1 • Show that the problem of determining an optimal prefix code exhibits the following properties: • Greedy choice • Optimal substructure
Proof of Correctness /2 • To compute the cost of a tree:
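The formula from this slide is not preserved in the transcript. The standard textbook definition, which matches the notation of the HUFFMAN(C) pseudocode, is given below as an assumption about what the slide showed. Here $d_T(c)$ is the depth of the leaf for character $c$ in tree $T$, which equals the length of its codeword.

```latex
B(T) \;=\; \sum_{c \in C} c.\mathit{freq} \cdot d_T(c)
```

For "go go gophers", a tree with g and o at depth 2, the space and one rare letter at depth 3, and the other four rare letters at depth 4 (an assumed shape, consistent with the 37-bit total) gives:

```latex
B(T) \;=\; 3\cdot 2 + 3\cdot 2 + 2\cdot 3 + 1\cdot 3 + 4\cdot(1\cdot 4) \;=\; 37
```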
Proof of Correctness /3 • Greedy choice
Proof of Correctness /4 • Optimal substructure
Application • Commonly used as the entropy-coding back end of some multimedia codecs • Examples: JPEG, MP3
Summary • Background • Motivation • Huffman Algorithm • Sample Implementation • Running Time Analysis • Proof of Correctness • Application