Learn about Run Length Encoding, Variable Length Encoding, Huffman Coding, and more ways to compress data efficiently and save disk space. Understand the principles and benefits of different compression methods for optimizing storage usage.
File Compression
• Even though disks have gotten bigger, we are still running short on disk space
• A common technique is to compress files so that they take up less space on the disk
• We can save space by taking advantage of the fact that most files have a relatively low “information content”
Compression
Run Length Encoding
• The simplest type of redundancy in a file is long runs of repeated characters
• AAAABBBAABBBBBCCCCCCCC
• This string can be represented more compactly by replacing each run of a repeated character with a count followed by a single occurrence of the character
• 4A3B2A5B8C
• For binary files a refined version of this method can yield dramatic savings
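The run-length idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production format: the function names `rle_encode` and `rle_decode` are made up for this example, and the decoder assumes the original text contains no digits.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with '<count><char>'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Invert rle_encode (assumes the original text had no digit characters)."""
    out = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch          # counts may have more than one digit
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)


print(rle_encode("AAAABBBAABBBBBCCCCCCCC"))  # 4A3B2A5B8C
```

Note that text without long runs gets longer under this scheme (for example, "ABC" becomes "1A1B1C"), which is why the method only pays off when runs are common.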
Variable Length Encoding
• Suppose we wish to encode
• ABRACADABRA
• Instead of using the standard 8 (or 16) bits to represent these letters, why not use 3?
• A = 000
• B = 001
• C = 010
• D = 011
• R = 100
• ABRACADABRA becomes 000001100000010000011000001100000 (33 bits)
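As a quick check of the 3-bit table, the sketch below encodes and decodes ABRACADABRA with it. The names `CODE`, `encode_fixed`, and `decode_fixed` are hypothetical helpers chosen for this example.

```python
# 3-bit code table from the slide
CODE = {"A": "000", "B": "001", "C": "010", "D": "011", "R": "100"}


def encode_fixed(text: str) -> str:
    """Concatenate the 3-bit code word of every character."""
    return "".join(CODE[ch] for ch in text)


def decode_fixed(bits: str, width: int = 3) -> str:
    """Cut the bit string into fixed-width chunks and look each one up."""
    reverse = {word: ch for ch, word in CODE.items()}
    return "".join(reverse[bits[i:i + width]] for i in range(0, len(bits), width))


bits = encode_fixed("ABRACADABRA")
print(bits, len(bits))  # 33 bits, versus 88 bits at 8 bits per letter
```

Because every code word has the same width, decoding needs no separators: the decoder simply reads three bits at a time.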
We can do better!!
• Why use the same number of bits for each letter?
• A = 0
• B = 1
• C = 01
• D = 10
• R = 11
• With blanks between the code words: 0 1 11 0 01 0 10 0 1 11 0
• This is not really a code because it depends on the blanks
• Without the blanks: 011100101001110
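To see why the blanks matter, the sketch below counts how many different letter sequences produce the same bit string once the blanks are removed. The helper names `encode` and `count_decodings` are assumptions made for this illustration.

```python
# Variable-length table from the slide (not prefix-free: "0" is a prefix of "01")
CODE = {"A": "0", "B": "1", "C": "01", "D": "10", "R": "11"}


def encode(text: str) -> str:
    """Concatenate code words with no separators."""
    return "".join(CODE[ch] for ch in text)


def count_decodings(bits: str) -> int:
    """Count the letter sequences that encode to exactly this bit string."""
    ways = [1] + [0] * len(bits)  # ways[i] = number of decodings of bits[:i]
    for i in range(1, len(bits) + 1):
        for word in CODE.values():
            if i >= len(word) and bits[i - len(word):i] == word:
                ways[i] += ways[i - len(word)]
    return ways[-1]


print(encode("AB") == encode("C"))                # True: both give "01"
print(count_decodings(encode("ABRACADABRA")) > 1)  # True: decoding is ambiguous
```

A usable variable-length code avoids this by making sure no code word is a prefix of another.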
Consider this Tree
[Figure: tree diagram with leaves labeled B, D, A, C, R]
More Formally
• Start with a frequency table
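For the running example ABRACADABRA, the frequency table can be computed as follows. This is a small sketch using Python's standard `collections.Counter`; the variable name `freq` is chosen for this example.

```python
from collections import Counter

# Count how often each letter occurs in the message
freq = Counter("ABRACADABRA")

for symbol, count in freq.most_common():
    print(symbol, count)
```

The counts are A: 5, B: 2, R: 2, C: 1, D: 1, for a total of 11 letters.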
More Formally
• Create a binary tree out of the two elements with the lowest frequencies
• New frequency is the sum of the frequencies
• Add new node to the frequency table
[Figure: new node with frequency 2 whose children are C, 1 and D, 1]
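More Formally
• Repeat until only one element is left in the table
[Figure: final tree. The root, 11, has children 6 and A, 5; node 6 has children 2 and 4; node 2 has children C, 1 and D, 1; node 4 has children B, 2 and R, 2]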
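The merge-until-one-element procedure can be sketched with a priority queue. This is one possible implementation under stated assumptions, not the slides' own code: the function name `huffman_codes` is hypothetical, the input is assumed to be non-empty, and the exact bit patterns depend on how ties between equal frequencies are broken (here, by insertion order).

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text: str) -> dict:
    """Build a code table by repeatedly merging the two lowest frequencies."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    # Each entry is (frequency, tiebreak, tree); a tree is a symbol or a (left, right) pair
    heap = [(f, next(tiebreak), sym) for sym, f in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        # The new node's frequency is the sum of its children's frequencies
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (t1, t2)))
    _, _, root = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: 0 = left, 1 = right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the path as the code word
            codes[node] = prefix or "0"  # single-symbol input still gets one bit

    walk(root, "")
    return codes


codes = huffman_codes("ABRACADABRA")
encoded = "".join(codes[ch] for ch in "ABRACADABRA")
print(codes)
print(len(encoded))  # 23 bits, versus 33 with the fixed 3-bit code
```

Because every symbol sits at a leaf, no code word is a prefix of another, so the bit string can be decoded without blanks.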
Huffman Coding
• The general method for finding this code was developed by D. Huffman in 1952