Compression

katen

Presentation Transcript


  1. Compression

  2. Data Compression • The amount of data we deal with keeps growing • Larger files not only require more disk space, they also take longer to transmit • Files are therefore often compressed to save space or to speed up transmission

  3. Run-Length Encoding • The simplest type of redundancy in a file is a long run of one repeated character • AAAABBBAABBBBBCCCCCCCC • This string can be represented more compactly by replacing each run with a count followed by a single occurrence of the character • 4A3B2A5B8C • For binary files, a refined version of this method can yield dramatic savings
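A minimal sketch of the count-then-character scheme from the slide. The function names `rle_encode` and `rle_decode` are hypothetical, and the decoder assumes the original text contains no digit characters (otherwise counts and data could not be told apart).

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with the run length and the character."""
    # groupby yields (char, iterator) for every maximal run of equal characters
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Invert rle_encode, assuming the original text contained no digits."""
    result = []
    digits = ""
    for ch in encoded:
        if ch.isdigit():
            digits += ch  # a count may span several digits, e.g. "12B"
        else:
            result.append(ch * int(digits))
            digits = ""
    return "".join(result)


print(rle_encode("AAAABBBAABBBBBCCCCCCCC"))  # 4A3B2A5B8C
```

Note that RLE only pays off when runs are long: a string with no repeats, such as "ABC", grows to "1A1B1C".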

  4. Variable Length Encoding • Suppose we wish to encode • ABRACADABRA • Instead of using the standard 8 (or 16) bits to represent these letters, why not use 3? • A = 000 • B = 001 • C = 010 • D = 011 • R = 100 • ABRACADABRA then becomes 000 001 100 000 010 000 011 000 001 100 000 (33 bits instead of 88 at 8 bits per letter)
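A short sketch of the 3-bit code from this slide. The names `FIXED_CODE`, `encode` and `decode_fixed` are hypothetical; the point is that with a fixed width, decoding is just cutting the bitstring into 3-bit chunks.

```python
# 3-bit fixed-length code from the slide: 5 distinct letters fit in 3 bits (2**3 = 8 >= 5)
FIXED_CODE = {"A": "000", "B": "001", "C": "010", "D": "011", "R": "100"}


def encode(message: str, code: dict[str, str]) -> str:
    """Concatenate the codeword of every character in the message."""
    return "".join(code[ch] for ch in message)


def decode_fixed(bits: str, code: dict[str, str], width: int = 3) -> str:
    """Split the bitstring into width-sized chunks and look each one up."""
    reverse = {word: letter for letter, word in code.items()}
    return "".join(reverse[bits[i:i + width]] for i in range(0, len(bits), width))


bits = encode("ABRACADABRA", FIXED_CODE)
print(bits)       # 000001100000010000011000001100000
print(len(bits))  # 33
```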

  5. We Can Do Better • Why use the same number of bits for each letter? • A = 0 • B = 1 • C = 01 • D = 10 • R = 11 • ABRACADABRA becomes 0 1 11 0 01 0 10 0 1 11 0 • This is not really a code because it depends on the blanks: without them the message is 011100101001110, and a piece such as 01 could be read either as AB or as C
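To make the ambiguity concrete, here is a small sketch (hypothetical names `VARIABLE_CODE` and `all_decodings`) that lists every letter sequence whose codewords concatenate to a given bitstring under this slide's code. Because every codeword is 1 or 2 bits long, it simply tries both lengths at each position.

```python
VARIABLE_CODE = {"A": "0", "B": "1", "C": "01", "D": "10", "R": "11"}
REVERSE = {word: letter for letter, word in VARIABLE_CODE.items()}


def all_decodings(bits: str) -> list[str]:
    """Return every letter sequence whose codewords concatenate to bits."""
    if not bits:
        return [""]
    results = []
    for length in (1, 2):  # codewords in this code are 1 or 2 bits long
        prefix = bits[:length]
        if len(prefix) == length and prefix in REVERSE:
            for rest in all_decodings(bits[length:]):
                results.append(REVERSE[prefix] + rest)
    return results


bits = "".join(VARIABLE_CODE[ch] for ch in "ABRACADABRA")
print(bits)                         # 011100101001110
print(sorted(all_decodings("01")))  # ['AB', 'C']
```

ABRACADABRA is one of many valid readings of the 15-bit string, so a receiver has no way to know which one was sent.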

  6. Let's Use a Different Code • A slightly different code • A = 1 • B = 010 • C = 000 • D = 001 • R = 011 • Can you decode this without the blanks? • 0001010
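One way to decode this code, sketched under the assumption that we read bits left to right and emit a letter as soon as the bits collected so far match a codeword (`PREFIX_CODE` and `decode_prefix` are hypothetical names). This greedy scan only works because no codeword is the beginning of another one.

```python
PREFIX_CODE = {"A": "1", "B": "010", "C": "000", "D": "001", "R": "011"}


def decode_prefix(bits: str, code: dict[str, str]) -> str:
    """Emit a letter as soon as the buffered bits form a complete codeword."""
    reverse = {word: letter for letter, word in code.items()}
    letters = []
    buffer = ""
    for bit in bits:
        buffer += bit
        if buffer in reverse:
            letters.append(reverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError(f"leftover bits {buffer!r} do not form a codeword")
    return "".join(letters)


print(decode_prefix("0001010", PREFIX_CODE))  # CAB
```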

  7. Let's Re-order • The same code, listed in a different order • A = 1 • C = 000 • D = 001 • B = 010 • R = 011 • Why can you decode without the blanks?
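The property that makes the blanks unnecessary is that no codeword is a prefix of another (a "prefix code"). A minimal check, using the hypothetical name `is_prefix_free`, compares every ordered pair of codewords; this is quadratic, which is fine for tables as small as these.

```python
from itertools import permutations


def is_prefix_free(codewords: list[str]) -> bool:
    """True if no codeword is the start of a different codeword."""
    return not any(b.startswith(a) for a, b in permutations(codewords, 2))


print(is_prefix_free(["1", "000", "001", "010", "011"]))  # True  (this slide's code)
print(is_prefix_free(["0", "1", "01", "10", "11"]))       # False (the "We Can Do Better" code)
```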

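  8. Combining Bits • A (5) = 1 • C (1) = 000 • D (1) = 001 • B (2) = 010 • R (2) = 011 • The number in parentheses is how often each letter occurs in ABRACADABRA • What do you notice about the number of bits used to represent each character? • [Figure: code tree with branches labeled 0 and 1 and leaves A, C, D, B and R]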
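The frequencies in parentheses let us total the cost of the message: each letter costs (occurrences × codeword length) bits. A quick sketch of that arithmetic:

```python
from collections import Counter

PREFIX_CODE = {"A": "1", "C": "000", "D": "001", "B": "010", "R": "011"}
freq = Counter("ABRACADABRA")  # A: 5, B: 2, R: 2, C: 1, D: 1

# 5*1 + 1*3 + 1*3 + 2*3 + 2*3
total_bits = sum(freq[letter] * len(word) for letter, word in PREFIX_CODE.items())
print(total_bits)  # 23
```

That is 23 bits, compared with 33 for the 3-bit fixed-length code: giving the most frequent letter the shortest codeword is what saves the space.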

  9. Huffman Coding • The general method for finding such a code was developed by David A. Huffman in 1952 • Huffman coding uses a specific method for choosing the representation of each symbol, and the result is always a prefix code • The most common source symbols get shorter bit strings than less common ones • Used in many compression programs, for example in the DEFLATE format behind ZIP and gzip

  10. How Does It Work? • Start with your text • GO GO TIGERS • Build a frequency table that counts how often each character appears
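The slide's table itself is not in the transcript, so this sketch builds one with `collections.Counter`, assuming the space is counted as a symbol like any other character.

```python
from collections import Counter

freq = Counter("GO GO TIGERS")

# most frequent first; ties broken alphabetically (the space sorts before letters)
for symbol, count in sorted(freq.items(), key=lambda item: (-item[1], item[0])):
    print(repr(symbol), count)
```

Under that assumption, G appears 3 times, O and the space twice each, and T, I, E, R and S once each, for 12 characters in total.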

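  11. Build a Tree • Take the two entries that appear least often and make them the children of a new node whose count is the sum of theirs • Replace the two entries in the table with the merged node • Repeat until everything is merged into a single tree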
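A minimal sketch of the merge loop, assuming a min-heap (`heapq`) to find the two least frequent entries and a running counter to break ties between equal counts (`huffman_codes` is a hypothetical name). Reading a path from the root, 0 for a left branch and 1 for a right branch, gives each symbol's codeword.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman tree for text and return each symbol's codeword."""
    freq = Counter(text)
    if not freq:
        return {}
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    # heap entries: (count, tiebreak, tree); a tree is a symbol or a (left, right) tuple
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a lone symbol still needs one bit
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent
        f2, _, right = heapq.heappop(heap)  # second least frequent
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):  # internal node: recurse into both branches
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                        # leaf: the path so far is the codeword
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


text = "GO GO TIGERS"
codes = huffman_codes(text)
print(sum(len(codes[ch]) for ch in text))  # 35
```

Different tie-breaking rules can produce different codewords, but every Huffman tree gives the same total length: 35 bits here versus 96 bits at 8 bits per character, and 23 bits for ABRACADABRA, matching the hand-built code on the "Combining Bits" slide.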
