
Data Compression Basics & Huffman Coding



Presentation Transcript


  1. Data Compression Basics & Huffman Coding Motivation of Data Compression. Lossless and Lossy Compression Techniques. Static Lossless Compression: Huffman Coding. Correctness of Huffman Coding: the prefix property.

  2. Why Data Compression? • Data storage and transmission cost money, and the cost grows with the amount of data. • This cost can be reduced by processing the data so that it takes less memory and less transmission time. • Data transmission can be made faster either by using better transmission media or by compressing the data. • Data compression algorithms reduce the size of data without affecting its content. Examples: • Huffman coding • Run-length coding • Lempel-Ziv coding
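Of the three algorithms listed, run-length coding is the simplest and can be sketched in a few lines. The following is an illustrative Python sketch, not taken from the slides: each run of a repeated character is replaced by a (character, count) pair.

```python
# Minimal run-length encoding sketch: replace each run of a repeated
# character with a (character, run length) pair.
from itertools import groupby

def rle_encode(text):
    """Encode a string as a list of (char, count) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(text)]

def rle_decode(pairs):
    """Reverse the encoding exactly (run-length coding is lossless)."""
    return "".join(ch * count for ch, count in pairs)

encoded = rle_encode("aaaabbbcca")
print(encoded)               # [('a', 4), ('b', 3), ('c', 2), ('a', 1)]
print(rle_decode(encoded))   # aaaabbbcca
```

Note that run-length coding only pays off when the input actually contains long runs; on text without repetition it can enlarge the data.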

  3. Lossless and Lossy Compression Techniques • Data compression techniques are broadly classified into lossless and lossy. • Lossless techniques enable exact reconstruction of the original document from the compressed information, while lossy techniques do not. • Run-length, Huffman, and Lempel-Ziv are lossless techniques, while JPEG and MPEG are lossy. • Lossy techniques usually achieve higher compression ratios than lossless ones, but only lossless techniques reproduce the original data exactly.

  4. Lossless and Lossy Compression Techniques (cont'd) • Lempel-Ziv reads variable-sized input and outputs fixed-length codewords, while Huffman coding is the exact opposite: it reads fixed-size input symbols and outputs variable-length codewords. • Lossless techniques are classified into static and adaptive. • In a static scheme, like Huffman coding, the data is first scanned to obtain statistical information before compression begins. • Adaptive schemes like Lempel-Ziv begin with an initial statistical distribution of the text symbols but modify this distribution as each character or word is encoded. • Adaptive schemes fit the text more closely, but static schemes involve less computation and are faster.

  5. Introduction to Huffman Coding • What is the likelihood that all symbols in a message to be transmitted occur the same number of times? • Huffman coding assigns codewords of different lengths to characters based on their frequency of occurrence in the given message. • The string to be transmitted is first analysed to find the relative frequencies of its constituent characters. • The coding process generates a binary tree, the Huffman code tree, with branches labeled with bits (0 and 1). • The Huffman tree must be sent with the compressed information to enable the receiver to decode the message.
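The frequency-analysis step described above can be sketched in Python; the sample message below is a made-up placeholder, and `collections.Counter` does the bookkeeping.

```python
# First Huffman step: scan the message and count how often each
# character occurs.
from collections import Counter

message = "this is an example message"   # hypothetical input
freq = Counter(message)

# List characters from most to least frequent.
for ch, count in freq.most_common():
    print(repr(ch), count)
```

The resulting counts are exactly the priorities fed into the tree-building phase that follows.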

  6. Example 1: Huffman Coding Information to be transmitted over the internet contains the following characters with their associated frequencies, as shown in the following table:

Character:  a   e   l   n   o   s   t
Frequency: 45  65  13  45  18  22  53

Use the Huffman technique to answer the following questions: • Build the Huffman code tree for the message. • Use the Huffman tree to find the codeword for each character. • If the data consists of only these characters, what is the total number of bits to be transmitted? What is the percentage saving compared with sending the data uncompressed as 8-bit ASCII values? • Verify that your computed Huffman codewords are valid.

  7. Example 1: Huffman Coding (Solution) • Solution: The Huffman coding process uses a priority queue of binary trees, ordered by frequency. • We begin by filling the priority queue with one-node binary trees, each containing a frequency count and the symbol with that frequency. • The initial priority queue is built by arranging the one-node binary trees in increasing order of frequency. • The tree with the lowest frequency is designated as the front of the queue. • At each step, the priority queue is manipulated as outlined next:

  8. Example 1: Huffman Coding (Solution) • The priority queue is manipulated as follows: • 1. Dequeue the two trees at the front of the queue. • 2. Construct a new binary tree using the two dequeued trees as its left and right subtrees. • 3. Enqueue the new tree, using as its priority the sum of the priorities of the two trees used to construct it. • 4. Repeat this process until only one tree remains in the priority queue.
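The steps above can be sketched with Python's `heapq` module as the priority queue, using the frequencies from this example. The tie-breaking counter is an implementation detail added here so that tuple comparison stays well defined when two trees have equal weight.

```python
# Build a Huffman tree with heapq as the priority queue.
import heapq
from itertools import count

freqs = {'l': 13, 'o': 18, 's': 22, 'n': 45, 'a': 45, 't': 53, 'e': 65}

tie = count()
# One-node trees: (weight, tie-breaker, tree); a leaf is just its symbol.
heap = [(w, next(tie), sym) for sym, w in freqs.items()]
heapq.heapify(heap)

while len(heap) > 1:                      # step 4: repeat until one tree
    w1, _, left = heapq.heappop(heap)     # step 1: dequeue two trees
    w2, _, right = heapq.heappop(heap)
    merged = (left, right)                # step 2: join them as subtrees
    heapq.heappush(heap, (w1 + w2, next(tie), merged))  # step 3: enqueue

root_weight, _, tree = heap[0]
print(root_weight)   # 261, the total character count
```

Each `heappop` returns the lowest-weight tree, matching the slides' convention that the front of the queue holds the lowest priority.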

  9. Example 1: Huffman Coding • Step 1: the initial priority queue, lowest frequency at the front: l:13, o:18, s:22, n:45, a:45, t:53, e:65

  10. Example 1 Solution (cont'd) • Step 2: dequeue l (13) and o (18) and merge them into a tree of weight 31. Queue: s:22, (l,o):31, n:45, a:45, t:53, e:65

  11. Example 1 Solution (cont'd) • Step 3: dequeue s (22) and (l,o) (31) and merge them into a tree of weight 53. Queue: n:45, a:45, t:53, (s,(l,o)):53, e:65

  12. Example 1 Solution (cont'd) • Step 4: dequeue n (45) and a (45) and merge them into a tree of weight 90. Queue: t:53, (s,(l,o)):53, e:65, (n,a):90

  13. Example 1 Solution (cont'd) • Step 5: dequeue t (53) and (s,(l,o)) (53) and merge them into a tree of weight 106. Queue: e:65, (n,a):90, (t,(s,(l,o))):106

  14. Example 1 Solution (cont'd) • Step 6: dequeue e (65) and (n,a) (90) and merge them into a tree of weight 155. Queue: (t,(s,(l,o))):106, (e,(n,a)):155

  15. Example 1 Solution (cont'd) • Step 7: dequeue the last two trees (106 and 155) and merge them into the final Huffman tree, whose weight, 261, is the total character count. Drawing the tree dequeued first in each merge as the left subtree:

261
├── 106
│   ├── t (53)
│   └── 53
│       ├── s (22)
│       └── 31
│           ├── l (13)
│           └── o (18)
└── 155
    ├── e (65)
    └── 90
        ├── n (45)
        └── a (45)

  16-17. Example 1 Solution (cont'd) • Each branch of the finished tree is now labeled with a bit, for example 0 on every left branch and 1 on every right branch; any consistent labeling produces a valid Huffman code.

  18. Example 1 Solution (cont'd) • The sequence of zeros and ones on the arcs along the path from the root to each terminal node is that character's codeword. The code lengths read off the tree are:

Character:   a  e  l  n  o  s  t
Code length: 3  2  4  3  4  3  2

(With left branches labeled 0 and right branches 1, one consistent assignment is a = 111, e = 10, l = 0110, n = 110, o = 0111, s = 010, t = 00.) • If we assume the message consists of only the characters a, e, l, n, o, s and t, then the number of bits transmitted will be: 2*65 + 2*53 + 3*45 + 3*45 + 3*22 + 4*18 + 4*13 = 696 bits. • If the message is sent uncompressed with an 8-bit ASCII representation for the characters, we have 261*8 = 2088 bits, i.e. compression saves about 67% of the transmission time.
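The slide's arithmetic can be checked mechanically. This sketch hard-codes the code lengths read off the final tree (e and t at depth 2; a, n, s at depth 3; l and o at depth 4).

```python
# Verify the compressed size, the uncompressed size, and the saving.
lengths = {'e': 2, 't': 2, 'n': 3, 'a': 3, 's': 3, 'o': 4, 'l': 4}
freqs   = {'l': 13, 'o': 18, 's': 22, 'n': 45, 'a': 45, 't': 53, 'e': 65}

compressed = sum(freqs[c] * lengths[c] for c in freqs)
uncompressed = 8 * sum(freqs.values())        # 8-bit ASCII
saving = round(100 * (1 - compressed / uncompressed))

print(compressed)    # 696
print(uncompressed)  # 2088
print(saving)        # 67 (percent saved)
```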

  19. Example 1 Solution: The Prefix Property • Data encoded using Huffman coding is uniquely decodable. This is because Huffman codes satisfy an important property called the prefix property. • This property guarantees that no codeword is a prefix of another Huffman codeword. • For example, 10 and 101 cannot simultaneously be valid Huffman codewords because the first is a prefix of the second. • Thus, any valid encoded bitstream can be decoded unambiguously, with no separators needed between codewords. • We can verify by inspection that the codewords generated in the preceding slide are valid Huffman codewords.
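The inspection suggested above can be automated. The codewords below are one consistent assignment for this example (assuming left branches are labeled 0 and right branches 1; the original slides' exact labels may differ).

```python
# Check the prefix property: no codeword may be a prefix of another.
def has_prefix_property(codes):
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

codes = {'a': '111', 'e': '10', 'l': '0110', 'n': '110',
         'o': '0111', 's': '010', 't': '00'}

print(has_prefix_property(codes))                    # True
print(has_prefix_property({'x': '10', 'y': '101'}))  # False: 10 prefixes 101
```

The check is quadratic in the number of codewords, which is fine for small alphabets like this one.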

  20. Exercises • Using the Huffman tree constructed in this session, decode the following sequence of bits, if possible. Otherwise, where does the decoding fail? 10100010111010001000010011 • Using the Huffman tree constructed in this session, write the bit sequences that encode the messages: test, state, telnet, notes • Mention one disadvantage of a lossless compression scheme and one disadvantage of a lossy compression scheme. • Write a Java program that implements the Huffman coding algorithm.
