
Compression


Presentation Transcript


  1. Compression. Figure: JPG compression example with three panels: Original, 10:1 Compression, 45:1 Compression. Source: http://www.dspguide.com/datacomp.htm

  2. Content • Introduction • Techniques for compression • Run-length • Lempel-Ziv • Huffman • Mpeg-4 • Conclusion

  3. In nature, science, and human affairs, where do we see compression and decompression?

  4. Motivation for Compression. Compression is especially important in video, voice, and fax applications, where very large amounts of data are transmitted. Data compression can increase throughput considerably. Example: at 40,000 picture elements (pixels) per square inch and 1 bit per pixel, an 8.5" x 11" page holds 3,740,000 bits. Over a 56 Kbps line, sending it takes about 67 seconds. If the data is compressed by a factor of 10, the transmission time drops to about 6.7 seconds per page. These days, data compression is commonly used by modems, fax machines, video conferencing equipment, your TiVo, etc.
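
The arithmetic in the example can be checked with a few lines of Python. This is only a sketch: the constant names are made up for illustration, and 1 bit per pixel is an assumption implied by the slide's figure of 3,740,000 bits.

```python
# Fax-page example from the slide; all names here are illustrative.
PIXELS_PER_SQ_INCH = 40_000
PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 8.5, 11
BITS_PER_PIXEL = 1          # assumption: black-and-white page, 1 bit per pixel
LINE_RATE_BPS = 56_000      # 56 Kbps line
COMPRESSION_FACTOR = 10

page_bits = PIXELS_PER_SQ_INCH * PAGE_WIDTH_IN * PAGE_HEIGHT_IN * BITS_PER_PIXEL
uncompressed_s = page_bits / LINE_RATE_BPS
compressed_s = uncompressed_s / COMPRESSION_FACTOR

print(f"bits per page:        {page_bits:,.0f}")
print(f"uncompressed seconds: {uncompressed_s:.1f}")
print(f"compressed seconds:   {compressed_s:.1f}")
```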

  5. Practical applications of data compression (diagram: Device 1, Bottleneck, Device 2) • Realize cost savings in the design of a system. • Examples: • Modems, analog fax, compressed voice for cellular radio. • Digital voice • Compressed video, CD music, iPod • Without compression, these applications would not be feasible.

  6. Principles behind Compression • Types of techniques: • 1. Redundancy reduction: • Remove redundancy from the message. • Usually lossless. • 2. Reduce information content: • Reduce the total amount of information in the message. • Leads to sacrifice of quality. • Usually lossy.

  7. Categories of compression 1. Data compression: used for data files and program files. Lossless. E.g., WinZip, gzip, compress. 2. Audio compression: compresses digitized voice (e.g., cellular) and music. Lossy for voice, lossless for hi-fi music. E.g., Real Audio. 3. Image compression: removes redundancy within the frame. Different formats: BMP (bitmap file) is lossless but creates large files; GIF is lossless within its 256-color palette; JPEG is lossy. 4. Video compression: removes intra- and inter-frame redundancy. Lossy. Examples: MPEG, QuickTime, Real Video.

  8. Compressibility of different data patterns. 0 = CLOUDY DAY, 1 = SUNNY DAY
     SET 1: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
     SET 2: 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
     SET 3: 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0
     SET 4: 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0
     SET 5: 0 1 0 1 1 1 0 0 0 1 1 0 0 1 0 1
     In which set is the information content the highest? How will you store these patterns of information in the most economical way?
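
One rough way to compare the five sets is the Shannon entropy of each set's 0/1 frequencies. This measure is an assumption on our part, not something the slide names. It also ignores the order of the symbols, so it is only a first approximation of "information content".

```python
from collections import Counter
from math import log2

# The five weather patterns from the slide, written without spaces.
SETS = {
    "SET 1": "0000010000000000",
    "SET 2": "0000010001000000",
    "SET 3": "0100010001000000",
    "SET 4": "0100010001000100",
    "SET 5": "0101110001100101",
}

def entropy_bits_per_symbol(bits: str) -> float:
    """Shannon entropy of the empirical symbol frequencies, in bits per symbol."""
    n = len(bits)
    return -sum((c / n) * log2(c / n) for c in Counter(bits).values())

entropies = {name: entropy_bits_per_symbol(bits) for name, bits in SETS.items()}
for name, h in entropies.items():
    print(f"{name}: {h:.3f} bits per day")
```

Under this measure the sets become harder to compress from SET 1 to SET 5, and SET 5, with equal numbers of sunny and cloudy days, reaches a full 1 bit per day.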

  9. Compression Techniques • Common compression techniques • “Seinfeld” method: yada, yada, yada... • Run-length encoding • Lempel-Ziv method • Huffman coding. Example of the “Seinfeld” method, Marcy: “Speaking of ex's, my old boyfriend came over late last night, and, yada yada yada, anyway. I'm really tired today.”

  10. Spot the difference… That’s it. Image Compressed 48 times while you watched

  11. RUN-LENGTH ENCODING Source: NY Times, June 18, 1998.

  12. RUN-LENGTH ENCODING • Look for sequences of repeating characters. • Replace each sequence of repeating characters with a 3-character code: • a special character that indicates suppression • the character to be suppressed • the frequency (count of repeated characters) • Example (S is the special character): • $******55.72 becomes $S*655.72 • GunsbbbbbbbbbButter becomes GunsSb9Butter • What does the efficiency of this method depend on?
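
The 3-character scheme above can be sketched in Python as follows. Several details are assumptions the slide does not spell out: the marker is the letter S (as in the examples), a run is replaced only when it has at least 4 characters (shorter runs would not save space), the count is a single digit so runs are capped at 9, and the input must not itself contain the marker character.

```python
MARKER = "S"   # suppression marker used in the slide's examples
MIN_RUN = 4    # assumption: a 3-char code only pays off for runs of 4 or more
MAX_RUN = 9    # assumption: the count is stored as a single decimal digit

def rle_encode(text: str) -> str:
    """Replace long runs with MARKER + character + count."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        run = 1
        while i + run < len(text) and text[i + run] == ch and run < MAX_RUN:
            run += 1
        out.append(f"{MARKER}{ch}{run}" if run >= MIN_RUN else ch * run)
        i += run
    return "".join(out)

def rle_decode(code: str) -> str:
    """Expand every MARKER + character + count triple back into a run."""
    out = []
    i = 0
    while i < len(code):
        if code[i] == MARKER:
            out.append(code[i + 1] * int(code[i + 2]))
            i += 3
        else:
            out.append(code[i])
            i += 1
    return "".join(out)

print(rle_encode("$******55.72"))
print(rle_encode("Guns" + "b" * 9 + "Butter"))
```

The efficiency question on the slide shows up directly here: text without runs of 4 or more characters passes through unchanged.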

  13. Lempel-Ziv

  14. Lempel-Ziv Algorithm This algorithm looks for repetitive sequences of patterns in a message and replaces them with a token which points back to the most recent occurrence. The rain in spain falls mainly on the plain. The rain [3,3]spain falls mainly on the plain. Token [a,b] means: go back a characters, then copy b characters from there.

  15. Lempel-Ziv Algorithm This algorithm looks for repetitive sequences of patterns in a message and replaces them with a token which points back to the most recent occurrence. The rain in spain falls mainly on the plain. The rain [3,3]sp[9,4]falls mainly on the plain. Token [a,b] means: go back a characters, then copy b characters from there.

  16. Lempel-Ziv Algorithm This algorithm looks for repetitive sequences of patterns in a message and replaces them with a token which points back to the most recent occurrence. The rain in spain falls mainly on the plain. The rain [3,3]sp[9,4]falls m[11,3]ly on the plain. Token [a,b] means: go back a characters, then copy b characters from there.

  17. Lempel-Ziv Algorithm This algorithm looks for repetitive sequences of patterns in a message and replaces them with a token which points back to the most recent occurrence. The rain in spain falls mainly on the plain. The rain [3,3]sp[9,4]falls m[11,3]ly on [34,4]plain. Token [a,b] means: go back a characters, then copy b characters from there.

  18. Lempel-Ziv Algorithm This algorithm looks for repetitive sequences of patterns in a message and replaces them with a token which points back to the most recent occurrence. The rain in spain falls mainly on the plain. The rain [3,3]sp[9,4]falls m[11,3]ly on [34,4]pl[15,3]. Token [a,b] means: go back a characters, then copy b characters from there. This message contains 27 characters and 5 tokens. Each token needs 2 bytes. Thus, the space required is 37 bytes vs. the original 44 bytes. (Note: Since each token takes two bytes, this replacement is done only if the repeating pattern is more than two bytes long. Token [34,4] copies "The " from the start of the message, so the example treats "T" and "t" as the same character.)
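
To see how the [a,b] tokens expand, here is a minimal decoder sketch. The stream representation (plain strings for literal text, tuples for tokens) is our own choice for illustration, not a real file format. Note that token [34,4] reaches back to the capital "The " at the start, so a case-sensitive decoder reproduces the sentence with "The" in place of "the".

```python
def lz_decode(stream):
    """Expand a list of literal strings and (back, length) tokens."""
    out = []
    for item in stream:
        if isinstance(item, str):
            out.extend(item)                  # literal characters
        else:
            back, length = item
            start = len(out) - back           # go back `back` characters
            for k in range(length):           # copy `length` characters
                out.append(out[start + k])
    return "".join(out)

# The final token stream from the slide.
STREAM = ["The rain ", (3, 3), "sp", (9, 4), "falls m", (11, 3),
          "ly on ", (34, 4), "pl", (15, 3), "."]

decoded = lz_decode(STREAM)
literal_chars = sum(len(x) for x in STREAM if isinstance(x, str))
token_count = sum(1 for x in STREAM if isinstance(x, tuple))
encoded_bytes = literal_chars + 2 * token_count   # 2 bytes per token, as on the slide

print(decoded)
print(literal_chars, token_count, encoded_bytes, len(decoded))
```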

  19. Huffman coding Consider a language with only 4 characters, T, E, L, K. Here is a pattern in this language: T E E E L E E E K E Probability of T = 0.1 Probability of E = 0.7 Probability of L = 0.1 Probability of K = 0.1 If we use 2-bit codes for each character, say, 00 - T; 01- E; 10- L; 11- K, then we need 20 bits to store this pattern. Question: Can we do better? i.e., store the pattern in fewer bits.

  20. HUFFMAN CODING Algorithm

  21. HUFFMAN CODING EXAMPLE • Treat each character or symbol as a leaf node in a tree (ordered by probability and occurrence). • Merge the two lowest-probability nodes into a node whose probability is the sum of the two merged nodes. • Repeat this process until no unmerged nodes remain. The final node is the root of the tree. • Label each pair of branches, starting from the root, with 0 and 1. • The code word for a symbol is the string of labels from the root node to the original symbol. Tree: T (0.1) and L (0.1) merge into a 0.2 node (branches 0 and 1); the 0.2 node and K (0.1) merge into a 0.3 node (branches 0 and 1); the 0.3 node and E (0.7) merge into the 1.0 root (branches 0 and 1). Codes: T: 000 L: 001 K: 01 E: 1
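
The merge steps above can be sketched with a priority queue. This is one possible implementation, not the one behind the slide. It uses character counts from the pattern T E E E L E E E K E instead of probabilities, and breaks ties by insertion order, so the exact 0/1 labels may differ from the slide while the code lengths should agree.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(message: str) -> dict:
    """Build a Huffman code table from the symbol counts in `message`."""
    order = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(n, next(order), sym) for sym, n in Counter(message).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lowest-weight nodes into one parent node.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(order), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: label branches 0 and 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: the path from the root is the code
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes

MESSAGE = "TEEELEEEKE"
codes = huffman_codes(MESSAGE)
total_bits = sum(len(codes[ch]) for ch in MESSAGE)
print(codes, total_bits)
```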

  22. Decoding a Message (start from left). Codes: T: 000 L: 001 K: 01 E: 1. Bits: 0 1 1 1 1 0 1 0 0 1 0 0 0 1 1 1 0 1. Decoded message: K E E E K L T E E E K
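
Because no code word is a prefix of another, the bits can be decoded left to right without separators. A minimal sketch using the slide's code table (the function name is ours):

```python
CODES = {"T": "000", "L": "001", "K": "01", "E": "1"}

def huffman_decode(bits: str, codes: dict) -> str:
    """Read bits left to right, emitting a symbol whenever a code word matches."""
    lookup = {code: sym for sym, code in codes.items()}
    out = []
    buffer = ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:
            out.append(lookup[buffer])
            buffer = ""
    if buffer:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(out)

print(huffman_decode("011110100100011101", CODES))
```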

  23. SAVINGS FROM HUFFMAN CODING Original string had 10 characters, each 2 bits long. Total length = 20 bits. Modified string: T once -----> 1 x 3 = 3 bits; L once -----> 1 x 3 = 3 bits; K once -----> 1 x 2 = 2 bits; E 7 times -----> 7 x 1 = 7 bits. Total = 15 bits. Savings = (20 - 15) / 20 = 25%
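
The same totals fall out of a short calculation, using the code table from the Huffman example and the pattern T E E E L E E E K E (variable names are illustrative):

```python
CODES = {"T": "000", "L": "001", "K": "01", "E": "1"}
MESSAGE = "TEEELEEEKE"

fixed_bits = 2 * len(MESSAGE)                        # 2-bit code per character
huffman_bits = sum(len(CODES[ch]) for ch in MESSAGE)
savings = (fixed_bits - huffman_bits) / fixed_bits

print(fixed_bits, huffman_bits, f"{savings:.0%}")
```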

  24. Applications and Standards. MNP Class 5 is a modem standard that uses run-length encoding. V.42 bis is a newer standard for high-speed modems; these modems use the Lempel-Ziv compression method and can compress by a factor of 3.5 to 4. Image and video standards: H.261, JPEG, MPEG-1 (for rates up to 1.5 Mbps), MPEG-2 (for rates up to 40 Mbps). Audio compression standards: ADPCM, LPC (Linear Predictive Coding), MPEG Audio (e.g., MP3). In general, the compression ratio depends on the nature of the data.
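
The closing point, that the compression ratio depends on the nature of the data, is easy to try with Python's standard `zlib` module, which implements the DEFLATE method (Lempel-Ziv plus Huffman coding) also used by gzip. The two inputs below are made-up test data, not anything from the slides.

```python
import random
import zlib

repetitive = b"0" * 4096                     # long run of a single byte
rng = random.Random(0)                       # fixed seed so runs are repeatable
noisy = bytes(rng.randrange(256) for _ in range(4096))  # pseudo-random bytes

def ratio(data: bytes) -> float:
    """Original size divided by compressed size."""
    return len(data) / len(zlib.compress(data))

# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(zlib.compress(noisy)) == noisy

print(f"repetitive data ratio: {ratio(repetitive):.1f}")
print(f"noisy data ratio:      {ratio(noisy):.2f}")
```

The repetitive input shrinks dramatically, while the pseudo-random input barely changes size or even grows slightly.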

  25. MPEG-4 • The “bane” of DVD? • A standard for transmitting video and sound • Meshes existing MPEG-2 inter- and intra-frame advancements with VRML • What about MPEG-7?

  26. MPEG-4

  27. Conclusion Anything can be compressed more… …but can the original form be recreated? Big Bang: The ultimate decompression! Image source: http://www.esa.int/esaKIDSen/SEMSZ5WJD1E_OurUniverse_0.html
