
Analysis of Algorithms Chapter - 08 Data Compression


Presentation Transcript


  1. Analysis of Algorithms Chapter - 08 Data Compression

  2. This Chapter Contains the following Topics:
     • Why Data Compression?
     • Lossless and Lossy Compression
     • Fixed-Length Coding
     • Variable-Length Coding
     • Huffman Coding

  3. Why Data Compression?
     • What is data compression?
       • Transformation of data into a more compact form.
       • Compressed data can be transferred faster than uncompressed data.
     • Why compress data?
       • Saves storage space.
       • Saves transmission time over a network.
     • Example:
       • Suppose the ASCII code of a character is 1 byte.
       • Suppose we have a text file containing one hundred instances of ‘a’.
       • The file size would then be about 100 bytes.
       • Let us store this as “100a” in a new file to convey the same information.
       • The new file size would be 4 bytes.
       • 4/100 = 4% of the original size, i.e., a 96% saving (a small code sketch of this idea follows below).
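The “100a” trick above is a simple form of run-length encoding. Below is a minimal, illustrative Python sketch of that idea; the names rle_encode and rle_decode are not from the slides, and the decoder assumes the original text contains no digit characters.

    def rle_encode(text):
        """Replace each run of a repeated character by "<count><char>",
        so that 100 copies of 'a' become "100a"."""
        out = []
        i = 0
        while i < len(text):
            j = i
            while j < len(text) and text[j] == text[i]:
                j += 1                      # extend the run of text[i]
            out.append(f"{j - i}{text[i]}")
            i = j
        return "".join(out)

    def rle_decode(encoded):
        """Invert rle_encode; assumes the original text had no digits."""
        out, count = [], ""
        for ch in encoded:
            if ch.isdigit():
                count += ch                 # accumulate the run length
            else:
                out.append(ch * int(count)) # expand the run
                count = ""
        return "".join(out)

    print(rle_encode("a" * 100))            # "100a": 4 bytes instead of 100
    print(rle_decode("100a") == "a" * 100)  # True: the compression is lossless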

  4. Lossless and Lossy Data Compression
     • The last example shows “lossless” compression.
       • The original data can be retrieved by decompression.
       • Lossless compression is used when data integrity is important.
       • Example software: winzip, gzip, compress, etc.
     • “Lossy” means the original is not retrievable.
       • Reduces size by permanently eliminating certain information.
       • When uncompressed, only a part of the original information is there (but the user may not notice it).
     • When can we use lossy compression?
       • For audio, images, and video.
       • JPEG, MPEG, etc. are example formats.

  5. Fixed-Length Coding
     • Coding: a way to represent information.
     • Two ways: Fixed-Length and Variable-Length Coding.
     • The code for a character is a “codeword”.
     • We consider binary codes: each character is represented by a unique binary codeword.
     • Fixed-length coding:
       • The codeword of every character has the same length.
       • E.g., ASCII, Unicode, etc.
     • Suppose there are n characters. What is the minimum number of bits needed for fixed-length coding?
       • ⌈log2 n⌉
     • Example:
       • {a, b, c, d, e}; 5 characters.
       • ⌈log2 5⌉ = ⌈2.32…⌉ = 3 bits per character.
       • We can have the codewords: a=000, b=001, c=010, d=011, e=100 (a short code sketch follows below).
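A minimal Python sketch of fixed-length coding, assuming the illustrative helper name fixed_length_code (not from the slides): it computes ⌈log2 n⌉ and numbers the characters in order, reproducing the codewords above for {a, b, c, d, e}.

    from math import ceil, log2

    def fixed_length_code(alphabet):
        """Assign every character a binary codeword of the same length,
        using the minimum of ceil(log2 n) bits for n characters."""
        n = len(alphabet)
        bits = max(1, ceil(log2(n)))   # at least 1 bit even for a one-symbol alphabet
        return {ch: format(i, f"0{bits}b") for i, ch in enumerate(alphabet)}

    print(fixed_length_code("abcde"))
    # {'a': '000', 'b': '001', 'c': '010', 'd': '011', 'e': '100'}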

  6. Variable-Length Coding
     • The lengths of codewords may differ from character to character.
       • Frequent characters get short codewords.
       • Infrequent ones get long codewords.
     • Make sure that a codeword does not occur as the prefix of another codeword.
       • What we need is a “prefix-free code”.
       • The code in the example is prefix-free.
     • Prefix-free codes give unique decoding.
       • E.g., “001011101” is decoded as “aabe” based on the table in the example (a decoding sketch follows below).
     • The Huffman coding algorithm shows how to obtain prefix-free codes.
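The slide’s code table is not reproduced in this transcript, so the table below is a hypothetical prefix-free code chosen only so that “001011101” decodes to “aabe” as stated above; the decoder itself works for any prefix-free code.

    # Hypothetical prefix-free code table (the slide's own table is not in the
    # transcript); chosen so that "001011101" decodes to "aabe".
    CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101"}

    def decode(bits, code):
        """Decode a bit string with a prefix-free code: since no codeword is a
        prefix of another, a character can be emitted as soon as a codeword matches."""
        reverse = {cw: ch for ch, cw in code.items()}
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in reverse:
                out.append(reverse[buf])
                buf = ""
        if buf:
            raise ValueError("leftover bits do not form a complete codeword")
        return "".join(out)

    print(decode("001011101", CODE))   # aabe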

  7. Huffman Coding Algorithm
     • Huffman invented a greedy method to construct an optimal prefix-free variable-length code.
     • The code is based on the frequency of occurrence of each character.
     • The optimal code is given by a full binary tree.
       • Every internal node has 2 children.
       • If |C| is the size of the alphabet, there are |C| leaves and |C|-1 internal nodes.
     • We build the tree bottom-up:
       • Begin with |C| leaves.
       • Perform |C|-1 “merging” operations.
     • Let f[c] denote the frequency of character c.
     • We use a priority queue Q in which high priority means low frequency.
       • GetMin(Q) removes the element with the lowest frequency and returns it.

  8. An Algorithm
     Input: Alphabet C and frequencies f[ ]
     Result: Optimal coding tree for C

     Algorithm Huffman(C, f) {
         n := |C|;
         Q := C;
         for i := 1 to n-1 do {
             z := NewNode();
             x := z.left  := GetMin(Q);
             y := z.right := GetMin(Q);
             f[z] := f[x] + f[y];
             Insert(Q, z);
         }
         return GetMin(Q);
     }

     • Running time is O(n lg n) (a runnable sketch follows below).
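A minimal runnable Python sketch of the same bottom-up construction, assuming the input is a dict mapping characters to frequencies; heapq plays the role of the priority queue Q, and the frequencies in the last line are made up for illustration.

    import heapq
    import itertools

    def huffman_codes(freq):
        """Build a prefix-free code by repeatedly merging the two
        least-frequent subtrees, as in the Huffman algorithm above."""
        # Heap entries are (frequency, tie_breaker, tree); a tree is either a
        # single character (leaf) or a (left, right) pair of subtrees.
        counter = itertools.count()          # tie-breaker so trees are never compared
        heap = [(f, next(counter), ch) for ch, f in freq.items()]
        heapq.heapify(heap)

        while len(heap) > 1:                 # |C|-1 merging operations
            fx, _, x = heapq.heappop(heap)   # GetMin(Q): lowest frequency
            fy, _, y = heapq.heappop(heap)   # GetMin(Q): next lowest
            heapq.heappush(heap, (fx + fy, next(counter), (x, y)))  # Insert(Q, z)

        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):      # internal node: 0 for left, 1 for right
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:                            # leaf: record the codeword
                codes[tree] = prefix or "0"
        walk(heap[0][2], "")
        return codes

    # Made-up frequencies, for illustration only:
    print(huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))

Each heap push and pop costs O(lg n), and there are O(n) of them, which matches the O(n lg n) bound stated on the slide.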

  9. Example • Obtain the optimal coding for the following using the Huffman Algorithm

  10. Example (Contd.)

  11. End of Chapter - 08
