Analysis of Algorithms Chapter - 08 Data Compression
This chapter contains the following topics:
• Why Data Compression?
• Lossless and Lossy Compression
• Fixed-Length Coding
• Variable-Length Coding
• Huffman Coding
Why Data Compression?
• What is data compression?
  • Transformation of data into a more compact form.
  • Compressed data also transfers faster over a network than the uncompressed original.
• Why compress data?
  • Saves storage space.
  • Saves transmission time over a network.
• Example (a minimal sketch follows this slide):
  • Suppose the ASCII code of a character is 1 byte.
  • Suppose we have a text file containing one hundred instances of 'a'.
  • The file size would then be about 100 bytes.
  • Store this as "100a" in a new file to convey the same information.
  • The new file size would be 4 bytes.
  • Saving: (100 - 4)/100 = 96%.
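A minimal run-length sketch of this idea in Python (the function name and output format are illustrative, not from the slides):

    def rle_compress(text):
        # Each run of a repeated character becomes "<count><char>".
        out = []
        i = 0
        while i < len(text):
            j = i
            while j < len(text) and text[j] == text[i]:
                j += 1
            out.append(f"{j - i}{text[i]}")
            i = j
        return "".join(out)

    # "a" * 100 (100 bytes) compresses to "100a" (4 bytes): a 96% saving.
    assert rle_compress("a" * 100) == "100a"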
Lossless and Lossy Data Compression
• The last example shows "lossless" compression.
  • The original data can be retrieved exactly by decompression (see the round-trip sketch below).
  • Lossless compression is used when data integrity is important.
  • Example software: WinZip, gzip, compress, etc.
• "Lossy" means the original is not retrievable.
  • Reduces size by permanently eliminating certain information.
  • When uncompressed, only part of the original information is there (but the user may not notice it).
  • When can we use lossy compression?
    • For audio, images, and video.
    • JPEG, MPEG, etc. are examples.
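A quick way to see the lossless guarantee, using Python's standard zlib module (the same DEFLATE algorithm used by gzip and zip; the variable names are illustrative):

    import zlib

    original = b"a" * 100
    compressed = zlib.compress(original)      # lossless compression
    restored = zlib.decompress(compressed)    # exact round trip
    assert restored == original               # no information is lost
    print(len(original), "->", len(compressed), "bytes")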
Fixed-Length Coding
• Coding: a way to represent information.
• Two ways: fixed-length and variable-length coding.
• The code for a character is a "codeword".
• We consider binary codes: each character is represented by a unique binary codeword.
• Fixed-length coding:
  • The codeword of every character has the same length.
  • E.g., ASCII, Unicode, etc.
• Suppose there are n characters. What is the minimum number of bits needed for fixed-length coding?
  • ⌈log2 n⌉
• Example (see the sketch below):
  • {a, b, c, d, e}; 5 characters.
  • ⌈log2 5⌉ = ⌈2.32…⌉ = 3 bits per character.
  • We can use the codewords: a=000, b=001, c=010, d=011, e=100.
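A small Python sketch of fixed-length coding under the assumptions above (the helper name is illustrative):

    from math import ceil, log2

    def fixed_length_code(alphabet):
        # Each of the n characters gets a ceil(log2 n)-bit codeword.
        bits = ceil(log2(len(alphabet)))
        return {ch: format(i, f"0{bits}b") for i, ch in enumerate(alphabet)}

    print(fixed_length_code("abcde"))
    # {'a': '000', 'b': '001', 'c': '010', 'd': '011', 'e': '100'}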
Variable-Length Coding
• The lengths of codewords may differ from character to character.
  • Frequent characters get short codewords.
  • Infrequent ones get long codewords.
• Example: a table of variable-length codewords (the original slide's table is not reproduced here; see the assumed table in the sketch below).
• Make sure that no codeword occurs as the prefix of another codeword.
  • What we need is a "prefix-free code".
  • The last example is a prefix-free code.
• Prefix-free codes give unique decoding.
  • E.g., "001011101" is decoded as "aabe" based on the table in the last example.
• The Huffman coding algorithm shows how to obtain an optimal prefix-free code.
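The slide's code table is not preserved here. One prefix-free code consistent with decoding "001011101" as "aabe" is the classic textbook table assumed in this sketch (a=0, b=101, c=100, d=111, e=1101, f=1100); the decoder reads bits into a buffer until it matches a codeword, which is unambiguous precisely because no codeword is a prefix of another:

    # Assumed code table (not from the original slide).
    CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}
    DECODE = {cw: ch for ch, cw in CODE.items()}

    def decode(bits):
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in DECODE:        # first match is the only match
                out.append(DECODE[buf])
                buf = ""
        if buf:
            raise ValueError("leftover bits: not a valid codeword sequence")
        return "".join(out)

    assert decode("001011101") == "aabe"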
Huffman Coding Algorithm
• Huffman invented a greedy method to construct an optimal prefix-free variable-length code.
  • The code is based on the frequency of occurrence of each character.
• The optimal code is given by a full binary tree.
  • Every internal node has 2 children.
  • If |C| is the size of the alphabet, there are |C| leaves and |C|-1 internal nodes.
• We build the tree bottom-up.
  • Begin with |C| leaves.
  • Perform |C|-1 "merging" operations.
• Let f [c] denote the frequency of character c.
• We use a priority queue Q in which high priority means low frequency.
  • GetMin(Q) removes the element with the lowest frequency and returns it.
An Algorithm
Input: Alphabet C and frequencies f [ ]
Result: Optimal coding tree for C

    Algorithm Huffman(C, f)
    {
        n := |C|;
        Q := C;
        for i := 1 to n-1 do
        {
            z := NewNode( );
            x := z.left := GetMin(Q);
            y := z.right := GetMin(Q);
            f [z] := f [x] + f [y];
            Insert(Q, z);
        }
        return GetMin(Q);
    }

• Running time is O(n lg n).
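A runnable Python version of the pseudocode above, using heapq as the priority queue (low frequency = high priority); the representation of tree nodes is an implementation choice, not from the slides:

    import heapq
    import itertools

    def huffman_codes(freq):
        # freq: {symbol: frequency}. Returns {symbol: binary codeword}.
        tie = itertools.count()  # breaks ties so trees are never compared
        # A leaf is a symbol; an internal node is a (left, right) pair.
        heap = [(f, next(tie), sym) for sym, f in freq.items()]
        heapq.heapify(heap)
        for _ in range(len(freq) - 1):           # |C| - 1 merges
            f1, _, left = heapq.heappop(heap)    # two lowest frequencies
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
        _, _, root = heap[0]

        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):          # internal node
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                                # leaf
                codes[node] = prefix or "0"      # one-symbol edge case
        walk(root, "")
        return codes

Each symbol's codeword is its root-to-leaf path, so the result is automatically prefix-free: only leaves carry symbols, and no leaf lies on the path to another.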
Example
• Obtain the optimal coding for the following frequency table using the Huffman algorithm. (The frequency table from the original slide is not reproduced here; a worked illustration with assumed frequencies follows.)
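Purely as an illustration, assume the classic frequencies (per 100 characters) a:45, b:13, c:12, d:16, e:9, f:5. The algorithm repeatedly merges the two lowest-frequency trees:
• f(5) + e(9) → 14
• c(12) + b(13) → 25
• 14 + d(16) → 30
• 25 + 30 → 55
• a(45) + 55 → 100 (the root)
Reading off root-to-leaf paths gives the prefix-free code a=0, c=100, b=101, d=111, f=1100, e=1101: an average of (45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4)/100 = 2.24 bits per character, versus 3 bits per character with fixed-length coding.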