Huffman compression is a data encoding process that reduces the number of bits needed, saving valuable resources like bandwidth and disk space. The algorithm assigns shorter codes to frequently occurring characters, improving the compression ratio.
A Data Compression Algorithm: Huffman Compression Gordon College
Compression • Definition: the process of encoding data using fewer bits • Reason: to save valuable resources such as communication bandwidth or hard disk space [Figure: a repetitive block of text passes through a Compress step and then an Uncompress step, recovering the original]
Compression Types • Lossy • Loses some information during compression, which means the exact original cannot be recovered (e.g. JPEG) • Normally provides better compression • Used when loss is acceptable: image, sound, and video files
Compression Types • Lossless • The exact original can be recovered • Usually exploits statistical redundancy • Used when loss is not acceptable: data Basic Term: Compression Ratio - the ratio of the number of bits in the original data to the number of bits in the compressed data. For example, a ratio of 3:1 means the original file was 3000 bytes and the compressed file is now only 1000 bytes.
Variable-Length Codes • Recall that ASCII, EBCDIC, and fixed-width Unicode encodings use the same number of bits for every character • Contrast Morse code, which uses variable-length sequences • Huffman compression is likewise a variable-length encoding scheme
Variable-Length Codes • Each character in such a code • Has a weight (probability) and a length • The expected length is the sum of the products of the weights and lengths over all the characters: 0.2 × 2 + 0.1 × 4 + 0.1 × 4 + 0.15 × 3 + 0.45 × 1 = 2.1 • Goal: minimize the expected length
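The arithmetic above can be checked with a short sketch:

```python
# Weights (probabilities) and code lengths from the example above
weights = [0.2, 0.1, 0.1, 0.15, 0.45]
lengths = [2, 4, 4, 3, 1]

# Expected length = sum of weight x length over all characters
expected_length = sum(w * n for w, n in zip(weights, lengths))
print(round(expected_length, 2))  # 2.1
```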
Huffman Compression • Uses prefix codes (a set of optimal binary codes) • Uses a greedy algorithm: each decision is made using only the data at hand.
Huffman Compression Basic algorithm • Generate a table containing the frequency of each character in the text. • Using the frequency table, assign each character a “bit code” (a sequence of bits to represent the character). • Write the bit code to the file instead of the character.
Immediate Decodability • Definition: no sequence of bits that represents a character is a prefix of a longer sequence for another character • Purpose: each character can be decoded as soon as its last bit arrives, without waiting for the remaining bits • For example, the scheme {0, 01} is not immediately decodable (0 is a prefix of 01), but {0, 10, 11} is
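The prefix condition in the definition above is easy to test mechanically; a small sketch (the code sets are illustrative, not taken from the slides):

```python
def immediately_decodable(codes):
    """True when no code in the set is a prefix of another code."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

# {"0", "01"}: not immediately decodable, since "0" is a prefix of "01"
print(immediately_decodable(["0", "01"]))        # False
# {"0", "10", "11"}: prefix-free, so it is immediately decodable
print(immediately_decodable(["0", "10", "11"]))  # True
```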
Huffman Compression • Huffman (1952) • Uses the frequencies of symbols in a string to build a variable-length prefix code. • Each symbol is mapped to a binary string. • More frequent symbols have shorter codes. • No code is a prefix of another.
Huffman Codes • We seek codes that are • Immediately decodable • Of minimal expected code length • For a set of n characters { C1 .. Cn } with weights { w1 .. wn }, we need an algorithm that generates the n bit strings representing the codes
Cost of a Huffman Tree • Let p1, p2, ... , pm be the probabilities for the symbols a1, a2, ... , am, respectively. • Define the cost of the Huffman tree T to be HC(T) = p1·r1 + p2·r2 + ... + pm·rm, where ri is the length of the path from the root to ai. • HC(T) is the expected length of the code of a symbol coded by the tree T; that is, HC(T) is the bit rate of the code.
Example of Cost • Example: a 1/2, b 1/8, c 1/8, d 1/4 • HC(T) = 1 × 1/2 + 3 × 1/8 + 3 × 1/8 + 2 × 1/4 = 1.75 [Figure: the corresponding tree with leaves a, b, c, d]
Huffman Tree • Input: probabilities p1, p2, ... , pm for symbols a1, a2, ... , am, respectively. • Output: a tree that minimizes the average number of bits (the bit rate) needed to code a symbol; that is, minimizes HC(T) = p1·r1 + p2·r2 + ... + pm·rm, where ri is the length of the path from the root to ai. Such a tree is called a Huffman tree, and the resulting code a Huffman code.
Recursive Algorithm - Huffman Codes • Initialize a list of one-node binary trees, one per character, each carrying that character's weight • Repeat the following n – 1 times: a. Find the two trees T' and T" in the list with minimal weights w' and w" b. Replace these two trees with a binary tree whose root weight is w' + w" and whose subtrees are T' and T", labeling the links to these subtrees 0 and 1
Huffman's Algorithm • The code for character Ci is the bit string labeling the path in the final binary tree from the root to Ci [Figure: a set of characters with weights and the resulting codes]
Huffman Decoding Algorithm • Initialize pointer p to the root of the Huffman tree • While the end of the message string has not been reached, repeat the following: a. Let x be the next bit in the string b. If x = 0, set p to the left child pointer; else set p to the right child pointer c. If p points to a leaf: i. Output the character at that leaf ii. Reset p to the root of the Huffman tree
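The decoding loop above can be sketched in a few lines, assuming the tree is stored as nested tuples (a leaf is a character string, an internal node is a (left, right) pair); the example tree is hypothetical, not the one from the slides:

```python
def huffman_decode(bits, root):
    """Walk the tree: 0 goes left, 1 goes right; on reaching a leaf,
    emit its character and reset the pointer to the root."""
    out, node = [], root
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):  # leaf reached
            out.append(node)
            node = root            # reset to the root
    return "".join(out)

# Hypothetical tree (an assumption for illustration): a=0, b=10, c=11
tree = ("a", ("b", "c"))
print(huffman_decode("010110", tree))  # abca
```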
Huffman Decoding Algorithm • For the message string 0101011010 • Decode using the Huffman tree and the decoding algorithm above
Iterative Huffman Tree Algorithm • Form a node for each symbol ai with weight pi; • Insert the nodes in a min priority queue ordered by weight; • While the priority queue has more than one element do • min1 := delete-min; • min2 := delete-min; • create a new node n; • n.weight := min1.weight + min2.weight; • n.left := min1; also associate this link with bit 0 • n.right := min2; also associate this link with bit 1 • insert(n) • Return the remaining node in the priority queue.
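The steps above can be sketched in Python using the standard library's heapq as the min priority queue (the probabilities below are the ones from the next slide's example; the function name and the tie-breaking counter are implementation choices, not part of the algorithm):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code with a min priority queue, following the
    iterative algorithm above. freqs maps symbol -> weight."""
    tick = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(w, next(tick), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # min1 := delete-min
        w2, _, t2 = heapq.heappop(heap)  # min2 := delete-min
        heapq.heappush(heap, (w1 + w2, next(tick), (t1, t2)))  # insert(n)
    codes = {}

    def walk(node, code):
        if isinstance(node, tuple):    # internal node: (left, right)
            walk(node[0], code + "0")  # left link carries bit 0
            walk(node[1], code + "1")  # right link carries bit 1
        else:                          # leaf: a symbol
            codes[node] = code or "0"  # degenerate one-symbol case
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"a": .4, "b": .1, "c": .3, "d": .1, "e": .1})
```

By construction no code is a prefix of another, since symbols sit only at the leaves of the tree.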
Example of Huffman Tree Algorithm (1) • P(a) = .4, P(b) = .1, P(c) = .3, P(d) = .1, P(e) = .1
In class example • I will praise you and I will love you Lord

Index  Sym    Freq
0      space  9
1      I      2
2      L      1
3      a      2
4      d      2
5      e      2
6      i      3
7      l      5
8      n      1
9      o      4
10     p      1
11     r      2
12     s      1
13     u      2
14     v      1
15     w      2
16     y      2
In class example • I will praise you and I will love you Lord

Index  Sym    Freq  Parent  Left  Right  Nbits  Bits
0      space  9     30      -1    -1     2      01
1      I      2     23      -1    -1     5      11010
2      L      1     17      -1    -1     5      00010
3      a      2     20      -1    -1     5      11110
4      d      2     22      -1    -1     5      11101
5      e      2     21      -1    -1     4      0000
6      i      3     25      -1    -1     4      1100
7      l      5     28      -1    -1     3      101
8      n      1     17      -1    -1     5      00011
9      o      4     26      -1    -1     3      001
10     p      1     18      -1    -1     6      100110
11     r      2     23      -1    -1     5      11011
12     s      1     18      -1    -1     6      100111
13     u      2     24      -1    -1     4      1000
14     v      1     19      -1    -1     5      10010
15     w      2     20      -1    -1     5      11111
16     y      2     22      -1    -1     5      11100
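As a cross-check of the table, the total encoded length (the sum of Freq × Nbits, which comes to 160 bits here) can be recomputed independently: every Huffman tree for a given frequency set has the same total cost, and that cost equals the sum of the weights of all internal nodes created by the merges. A sketch:

```python
import heapq

# Rebuild the frequency counts from the sentence, then recompute the
# optimal total cost by repeatedly merging the two smallest weights.
text = "I will praise you and I will love you Lord"
freqs = {}
for ch in text:
    freqs[ch] = freqs.get(ch, 0) + 1

heap = list(freqs.values())
heapq.heapify(heap)
total_bits = 0
while len(heap) > 1:
    a, b = heapq.heappop(heap), heapq.heappop(heap)
    total_bits += a + b  # each merge adds one bit to every symbol beneath it
    heapq.heappush(heap, a + b)

print(total_bits)  # 160, matching the sum of Freq x Nbits in the table
```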