Compression
Data Compression • The amount of data we deal with is getting larger • Not only do larger files require more disk space, they take longer to transmit • Many times files are compressed to save space or for faster transmission
Run Length Encoding • The simplest type of redundancy in a file is long runs of repeated characters • AAAABBBAABBBBBCCCCCCCC • This string can be represented more compactly by replacing each run with a count followed by a single occurrence of the character • 4A3B2A5B8C • For binary files a refined version of this method can yield dramatic savings
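The run-length idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a production codec: the function names `rle_encode` and `rle_decode` are made up for this example, and the count-then-character format assumes the input text itself contains no digits.

```python
from itertools import groupby


def rle_encode(text):
    """Replace each run with its length followed by the character."""
    # groupby yields one group per run of identical adjacent characters.
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(text))


def rle_decode(encoded):
    """Invert rle_encode; assumes the original text contained no digits."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # counts may span several digits, e.g. "12A"
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


print(rle_encode("AAAABBBAABBBBBCCCCCCCC"))  # 4A3B2A5B8C
```

Note that text without long runs gets longer under this scheme (for example "ABC" becomes "1A1B1C"), which is why run-length encoding pays off mainly on data with heavy repetition.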
Variable Length Encoding • Suppose we wish to encode • ABRACADABRA • Instead of using the standard 8 (or 16) bits to represent these letters, why not use 3? • A = 000 • B = 001 • C = 010 • D = 011 • R = 100 • 000 001 100 000 010 000 011 000 001 100 000
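As a quick check of the 3-bit scheme, the sketch below encodes ABRACADABRA with the table from this slide and counts the bits. The names `FIXED_CODE` and `encode_fixed` are illustrative only.

```python
# 3-bit code table from the slide.
FIXED_CODE = {"A": "000", "B": "001", "C": "010", "D": "011", "R": "100"}


def encode_fixed(text, code=FIXED_CODE):
    """Encode text with a fixed-width code, separating codewords with blanks."""
    return " ".join(code[ch] for ch in text)


encoded = encode_fixed("ABRACADABRA")
bits_used = len(encoded.replace(" ", ""))

print(encoded)    # 000 001 100 000 010 000 011 000 001 100 000
print(bits_used)  # 33, versus 88 bits at 8 bits per letter
```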
We Can Do Better • Why use the same number of bits for each letter? • A = 0 • B = 1 • C = 01 • D = 10 • R = 11 • 0 1 11 0 01 0 10 0 1 11 0 • This is not really a code because it depends on the blanks • 011100101001110
Let's Use a Different Code • A slightly different code • A = 1 • B = 010 • C = 000 • D = 001 • R = 011 • Can you decode this without the blanks? • 0001010
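One way to work through the question on this slide is to read the bits left to right and emit a letter as soon as the bits collected so far match a codeword. The sketch below does exactly that; `decode_prefix` is a hypothetical helper name, and the approach relies on no codeword being the start of another.

```python
# Code table from the slide.
PREFIX_CODE = {"A": "1", "B": "010", "C": "000", "D": "001", "R": "011"}


def decode_prefix(bits, code=PREFIX_CODE):
    """Decode a bit string with no separators, one codeword at a time."""
    by_codeword = {word: letter for letter, word in code.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in by_codeword:  # a full codeword has been read
            out.append(by_codeword[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


print(decode_prefix("0001010"))  # CAB
```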
Let's Re-order • A slightly different code • A = 1 • C = 000 • D = 001 • B = 010 • R = 011 • Why can you decode without having the blanks?
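The property behind this question is that no codeword is the beginning of another codeword (a prefix code). A small check, using a made-up helper name `is_prefix_free`, compares this code with the earlier one that needed blanks:

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of a different codeword."""
    return not any(
        a != b and b.startswith(a) for a in codewords for b in codewords
    )


# Code from this slide: decodable without blanks.
print(is_prefix_free(["1", "000", "001", "010", "011"]))  # True

# Earlier code (A = 0, B = 1, C = 01, D = 10, R = 11): "0" starts "01".
print(is_prefix_free(["0", "1", "01", "10", "11"]))  # False
```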
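Combining Bits • A (5) = 1 • C (1) = 000 • D (1) = 001 • B (2) = 010 • R (2) = 011 • What do you notice about the number of bits used to represent each character?
[Figure: code tree. The root's 1 branch leads to the leaf A; its 0 branch leads to a subtree whose leaves are C (000), D (001), B (010), and R (011).]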
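To see what the counts in parentheses buy us, the sketch below multiplies each letter's frequency in ABRACADABRA by the length of its codeword. The variable names are illustrative.

```python
from collections import Counter

# Code table from the slide.
CODE = {"A": "1", "C": "000", "D": "001", "B": "010", "R": "011"}

text = "ABRACADABRA"
counts = Counter(text)  # A: 5, B: 2, R: 2, C: 1, D: 1

# Each letter costs (number of occurrences) x (codeword length) bits.
total_bits = sum(counts[ch] * len(CODE[ch]) for ch in counts)

print(total_bits)  # 23, versus 33 bits with the fixed 3-bit code
```

The most frequent letter, A, gets the shortest codeword, so the five A's cost only 5 bits in total.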
Huffman Coding • The general method for finding this code was developed by D. Huffman in 1952 • Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code • The most common source symbols use shorter strings of bits than less common source symbols • Used in many compression programs
How Does It Work? • Start with your text • GO GO TIGERS • Build a frequency table
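The frequency table for this text can be built with Python's standard `collections.Counter`. This is only a sketch of the counting step; the helper name `frequency_table` is made up, and the blank is counted as a character like any other.

```python
from collections import Counter


def frequency_table(text):
    """Count how often each character (including the blank) appears."""
    return Counter(text)


table = frequency_table("GO GO TIGERS")
for symbol, count in table.most_common():
    print(repr(symbol), count)
# 'G' appears 3 times, 'O' and ' ' twice, and T, I, E, R, S once each.
```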
Build a Tree • Create a tree using the two characters that appear least often • Merge them into a single table entry whose count is the sum of the two • Repeat until everything is merged
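The merge loop described on this slide can be sketched with a priority queue from Python's standard `heapq` module. This is one possible implementation, not the one used in the original lecture: the function name `huffman_code`, the tie-breaking counter, and the choice of 0 for left branches and 1 for right branches are all assumptions of this example. Different tie-breaking can produce different codewords, but the total encoded length stays the same.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Build a prefix code by repeatedly merging the two rarest entries."""
    freq = Counter(text)
    if not freq:
        return {}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(weight, next(tiebreak), sym) for sym, weight in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs one bit
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # least frequent entry
        w2, _, right = heapq.heappop(heap)  # next least frequent entry
        # The merged entry is a (left, right) pair with the combined weight.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: descend both branches
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: a single character
            code[node] = prefix

    walk(heap[0][2], "")
    return code


text = "GO GO TIGERS"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(code)
print(len(encoded), "bits")
```

Applied to ABRACADABRA, the same function yields a code with the same total length as the hand-built code from the earlier slides, although the individual codewords may differ.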