Huffman Encoding Veronica Morales
Background • Introduced by David Huffman in 1952 • A method for lossless data compression • Typically saves between 20% and 90% of space • Variable-length encoding scheme • Used in digital imaging and video
Fixed-length vs. Variable-length Encoding • Fixed-length • Every character code uses the same, “fixed” number of bits; e.g., ASCII is a fixed-length code, with the ASCII standard using 7 bits per character • Variable-length • Character code lengths vary • Huffman encoding uses shorter bit patterns for more common characters and longer bit patterns for less common characters
How does it work? The “greedy” approach • Relies on the frequency of occurrence (probability) of each character to build up an optimal encoding. • Each character and its frequency become a leaf of a full tree. The two nodes with the smallest frequencies are merged, and their sum becomes the frequency of the new parent node. This repeats until a single root remains whose frequency is the sum of all the leaves.
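The greedy merge loop described above can be sketched in Python with the standard library's `heapq`. The function name and the tree representation (a leaf is a one-character string, an internal node a `(left, right)` pair) are illustrative choices, not from the slides:

```python
import heapq

def build_huffman_tree(freqs):
    """Greedily merge the two least-frequent nodes until one root remains.

    freqs: dict mapping character -> count.
    Returns the root: a leaf is a 1-char string; an internal node is a
    (left, right) pair.
    """
    # A unique id breaks frequency ties so heapq never compares nodes directly.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    uid = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)   # least frequent node
        f2, _, b = heapq.heappop(heap)   # next least frequent node
        uid += 1
        heapq.heappush(heap, (f1 + f2, uid, (a, b)))  # merged parent
    return heap[0][2]
```

Each iteration removes two nodes and inserts one, so the heap shrinks by one node per merge until only the root is left.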
Encode: “Ileana Streinu” • Create a table with all characters and their frequencies (probabilities)
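Building that table is a one-liner with `collections.Counter`; this small sketch tabulates the example string (note that the count is case-sensitive, so `I` and `i` are distinct symbols, and the space counts too):

```python
from collections import Counter

text = "Ileana Streinu"
n = len(text)               # 14 characters, including the space
freqs = Counter(text)

# 'e', 'a', and 'n' each appear twice; the other 8 characters appear once.
for ch, count in freqs.most_common():
    print(repr(ch), count, f"{count / n:.3f}")
```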
…until all nodes are accounted for and we have a single root. The tree is full because every parent has exactly two children. To encode, start from the root and, as you head down to the target letter, record a 0 for each left turn and a 1 for each right turn.
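The 0-left / 1-right walk can be sketched as a short recursive traversal. It assumes the same illustrative tree shape as above: a leaf is a one-character string, an internal node a `(left, right)` pair:

```python
def code_map(node, prefix=""):
    """Walk the tree, appending '0' for a left branch and '1' for right.

    Returns a dict mapping each leaf character to its bit pattern.
    """
    if isinstance(node, str):            # reached a leaf
        return {node: prefix or "0"}     # lone-symbol edge case: give it one bit
    left, right = node
    codes = code_map(left, prefix + "0")
    codes.update(code_map(right, prefix + "1"))
    return codes
```

Because every character sits at a leaf, no code is a prefix of another, which is what makes the bit stream decodable without separators.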
Final tree representation of coding map for “Ileana Streinu”
EXAMPLE 1101110110111001000111 ?
1101110110111001000111 • 110 – E • 111 – A • 011 – T • 011 – T • 1001 – U • 000 – N • 111 – A
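The step-by-step decoding above can be sketched as a left-to-right scan: accumulate bits until the prefix matches a code, emit that character, and reset. The table below contains only the codes the example shows:

```python
# Code table taken from the slide's worked example.
CODES = {'E': '110', 'A': '111', 'T': '011', 'U': '1001', 'N': '000'}
DECODE = {bits: ch for ch, bits in CODES.items()}

def decode(bitstring):
    """Read bits left to right; emit a character as soon as the accumulated
    prefix matches a code. Prefix-freeness guarantees this is unambiguous."""
    out, prefix = [], ""
    for bit in bitstring:
        prefix += bit
        if prefix in DECODE:
            out.append(DECODE[prefix])
            prefix = ""
    return "".join(out)

print(decode("1101110110111001000111"))  # → EATTUNA
```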
What’s the benefit? • Huffman encoding uses 22 bits: 1101110110111001000111 • ASCII coding uses 49 bits (7 characters × 7 bits): 1000101 1000001 1010100 1010100 1010101 1001110 1000001 • That saves 27 of 49 bits — roughly 55% savings in space
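The savings figure is just a ratio of the two bit counts:

```python
huffman_bits = 22    # bits in the Huffman-encoded message
ascii_bits = 7 * 7   # 7 characters at 7 bits each under standard ASCII
savings = 1 - huffman_bits / ascii_bits
print(f"{savings:.0%}")  # → 55%
```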
Complexity • Assume n distinct characters • Building a priority queue over them (using the Build-Heap procedure) to identify the two least-frequent objects takes • O (n)
Build the Huffman Tree • Since we have n leaves, we perform a merging operation of two nodes n − 1 times, and since each heap operation (extracting the two minimum nodes, then inserting the merged node) is O (log n), Huffman’s algorithm runs in O (n log n)
Encoding using Huffman Tree • Traversing the tree from root to leaf takes time proportional to the code length • O (log n) for a balanced tree (a heavily skewed frequency distribution can make the tree deeper)
Real-Life Application of Huffman Codes • GNU gzip Data Compression • Internet standard for data compression • Consists of • a short header • a number of compressed “blocks” • an 8-byte trailer
Compressed “Blocks” • Three block types: stored, static, and dynamic • Static and dynamic blocks use an alphabet that is encoded with Huffman coding • http://www.daylight.com/meetings/mug2000/Sayle/gzip.html