1. Computer Science 335 Data Compression
2. Compression … Magic or Science? Only works when a MORE EFFICIENT means of encoding can be found
Special assumptions must be made about the data in many cases in order to gain compression benefits
“Compression” can lead to larger files if the data does not conform to assumptions
3. Why compress? In files on a disk
save disk space
In internet access
reduce wait time
In a general queueing system
paybacks can be more than linear if the operation is nearing or in saturation
4. A typical queueing graph
5. Example: ASCII characters require 7 bits
Data may not use ALL characters in the ASCII set
consider just digits 0..9
Only 10 values -> really only requires 4 bits
There is actually a widely used code for this which also allows for a sign (+/-) -> BCD
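Below is a minimal Python sketch of the packed-BCD idea: each decimal digit fits in 4 bits (a nibble), so two digits pack into one byte. The function names are illustrative, and the sign nibble that real BCD variants reserve for +/- is omitted.

```python
def bcd_encode(digits: str) -> bytes:
    """Pack a string of decimal digits into 4-bit nibbles (packed BCD).
    An odd-length string is padded with a leading 0."""
    if len(digits) % 2:
        digits = "0" + digits
    out = bytearray()
    for i in range(0, len(digits), 2):
        hi, lo = int(digits[i]), int(digits[i + 1])
        out.append((hi << 4) | lo)            # two digits per byte
    return bytes(out)

def bcd_decode(data: bytes) -> str:
    """Unpack the nibbles back into a digit string."""
    return "".join(f"{b >> 4}{b & 0x0F}" for b in data)

print(bcd_encode("1995").hex())           # '1995' -> bytes 0x19 0x95
print(bcd_decode(bcd_encode("1995")))     # '1995'
```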
6. Other Approaches
7. Run-length encoding: preface each run with an 8-bit length byte (a sketch follows below)
aaabbabbbccdddaaaa -> 18 bytes
3a2b1a3b2c3d4a -> 14 bytes
benefit from runs of 3 or more
aaa versus 3a
No gain or loss
aa versus 2a
lose in single characters
a versus 1a
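A minimal Python sketch of the scheme above, using the textual count-then-character form from the slide's example (a real implementation would use the 8-bit length byte, capping runs at 255); function names are illustrative:

```python
from itertools import groupby

def rle_encode(s: str) -> str:
    """Replace each run with <count><char>, as in the slide's example.
    Assumes the input itself contains no digit characters."""
    return "".join(f"{len(list(g))}{ch}" for ch, g in groupby(s))

def rle_decode(s: str) -> str:
    """Inverse: read a (possibly multi-digit) count, then the character."""
    out, i = [], 0
    while i < len(s):
        j = i
        while s[j].isdigit():
            j += 1
        out.append(s[j] * int(s[i:j]))
        i = j + 1
    return "".join(out)

print(rle_encode("aaabbabbbccdddaaaa"))   # 3a2b1a3b2c3d4a (18 -> 14 characters)
print(rle_decode("3a2b1a3b2c3d4a"))       # aaabbabbbccdddaaaa
```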
8. Facsimile Compression (an example application of run-length encoding)
Decomposed into black/white pixels
Lots of long runs of black and white pixels
Don’t encode each pixel but runs of pixels
9. Differential encoding
Suppose all values lie between 1000 and 1050
Storing 1050 directly requires 11 bits
Storing the difference from the previous value plus a sign requires only 7 bits
6 bits -> 64 possible values (enough for a difference of at most 50)
1 additional bit for direction (+/-)
Differential encoding can lead to problems, since each value is relative to the previous one.
Like driving directions: one wrong turn and everything that follows is wrong (see the sketch below).
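A minimal Python sketch of differential encoding for the 1000..1050 example: the first value is stored in full and every later value as a signed difference from its predecessor. The names are illustrative; note how decoding is a running sum, which is why one corrupted difference throws off every later value.

```python
def delta_encode(values):
    """First value in full, then each value as a signed difference
    from the previous one (small enough for ~7 bits in the 1000..1050 case)."""
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    """Rebuild by running sum -- a corrupted delta shifts every later value."""
    values = [deltas[0]]
    for d in deltas[1:]:
        values.append(values[-1] + d)
    return values

readings = [1000, 1003, 1050, 1047, 1012]
print(delta_encode(readings))                              # [1000, 3, 47, -3, -35]
print(delta_decode(delta_encode(readings)) == readings)    # True
```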
10. Frequency-Based Encoding: Huffman
Encoding is not the same length for all values
Short codes for frequently occurring symbols
Longer codes for infrequently occurring symbols
Arithmetic coding (you are not responsible for this)
Interpret a string as a real number
Infinite number of values between 0 and 1
Divide the region up based on symbol frequency
If A occurs 12% of the time and B 5%, A gets 0 to 0.12 and B gets 0.12 to 0.17
The limit comes from the fact that a computer has only finite precision (a toy sketch follows)
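A toy Python sketch of the interval idea, using only the A and B ranges given on the slide (a real arithmetic coder covers the whole alphabet so the ranges sum to 1, and works around floating-point precision limits):

```python
# Cumulative ranges from the slide: A covers [0, 0.12), B covers [0.12, 0.17).
RANGES = {"A": (0.00, 0.12), "B": (0.12, 0.17)}

def arith_interval(message: str):
    """Narrow [low, high) once per symbol; any number inside the final
    interval identifies the whole message."""
    low, high = 0.0, 1.0
    for sym in message:
        lo, hi = RANGES[sym]
        width = high - low
        low, high = low + width * lo, low + width * hi
    return low, high

print(arith_interval("AB"))   # about (0.0144, 0.0204)
```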
11. Huffman (more details)
12. Huffman encoding: must know the distribution of symbols
Codes typically have DIFFERENT lengths, unlike most schemes you have seen (ASCII, etc.)
Characters occurring most often have the shortest codes
Characters occurring least often have the longest
The solution is minimal (in expected length) but not unique
13. Assume the following data
14. Let's peek at the answer
15. Build the solution tree: choose the smallest two at a time and group them
16. And the binary encoding..
17. Compute expected length
18. Is it hard to interpret a message?
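The frequency table and tree from slides 13-18 are not reproduced above, so the Python sketch below uses a made-up distribution just to show the mechanics: repeatedly merge the two smallest weights, read the codes off the grouping, compute the expected length as the sum of probability times code length, and decode a message left to right (easy, because the code is prefix-free). All names and numbers here are illustrative, not the slides' example.

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman code: repeatedly merge the two lowest-weight groups,
    prefixing '0' to one group's codes and '1' to the other's."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tie = len(heap)                       # tie-breaker so tuples never compare dicts
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)    # smallest weight
        p2, _, group2 = heapq.heappop(heap)    # second smallest
        merged = {s: "0" + c for s, c in group1.items()}
        merged.update({s: "1" + c for s, c in group2.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

def huffman_decode(bits, codes):
    """Prefix-free codes decode left to right with no ambiguity."""
    inverse = {c: s for s, c in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

freqs = {"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.10, "e": 0.05}   # made-up distribution
codes = huffman_codes(freqs)
expected = sum(p * len(codes[s]) for s, p in freqs.items())
print(codes)                                        # one valid solution; it is not unique
print(f"expected length = {expected:.2f} bits/symbol")   # 2.00 here, vs. 3 for fixed-length
message = "".join(codes[s] for s in "badcab")
print(huffman_decode(message, codes))               # 'badcab'
```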
19. Observations of Huffman: the method creates a shorter code
Assumes knowledge of symbol distribution
Different symbols … different code lengths
Knowing the distribution ahead of time is not always possible!
Another version of Huffman coding can solve that problem
20. Revisiting Facsimiles: Huffman says one can minimize length by assigning different-length codes to symbols
Fax transmissions can use this principle to give short codes to long runs of white/black pixels
Run-length combined with Huffman
See Table 5.7 in the text
21. Table 5.7
22. Multimedia compression
23. Image compression: images are represented as RGB
8 bits typical for each color
Or as Luminance (brightness 8 bits) and Chrominance (color 16 bits)
Human perception responds strongly to brightness (luminance) in addition to color
These are really two ways to represent the same thing (see the sketch below)
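A minimal Python sketch of the two equivalent representations; the slides do not name a specific transform, so the common BT.601 YCbCr conversion used by JPEG/JFIF is assumed here:

```python
def rgb_to_ycbcr(r, g, b):
    """BT.601 full-range conversion: Y carries brightness,
    Cb/Cr carry the color difference, all on 0..255 scales."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse transform -- the same pixel, just in a different basis."""
    r = y + 1.402    * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772    * (cb - 128)
    return r, g, b

print(rgb_to_ycbcr(255, 0, 0))   # pure red: modest Y (~76), high Cr
```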
24. JPEG
25. JPEG algorithm
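The slide's detail is not reproduced above. As a rough Python sketch of the core JPEG steps, the block below applies a 2-D DCT to an 8x8 block and quantizes the coefficients (where the lossy saving happens); the flat quantization table is a placeholder, since real JPEG uses per-coefficient tables followed by zig-zag ordering, run-length, and Huffman coding:

```python
import numpy as np

def dct2(block):
    """Naive orthonormal 2-D DCT-II (the transform JPEG applies to 8x8 blocks)."""
    n = block.shape[0]
    coeffs = np.zeros((n, n))
    xs = np.arange(n)
    for u in range(n):
        for v in range(n):
            cu = np.sqrt(1 / n) if u == 0 else np.sqrt(2 / n)
            cv = np.sqrt(1 / n) if v == 0 else np.sqrt(2 / n)
            basis = (np.cos((2 * xs[:, None] + 1) * u * np.pi / (2 * n)) *
                     np.cos((2 * xs[None, :] + 1) * v * np.pi / (2 * n)))
            coeffs[u, v] = cu * cv * np.sum(block * basis)
    return coeffs

# A smooth 8x8 luminance patch (a gradient), level-shifted by 128 as JPEG does.
x, y = np.meshgrid(np.arange(8), np.arange(8))
block = (100 + 5 * x + 3 * y).astype(float) - 128

Q = np.full((8, 8), 16.0)               # placeholder; real JPEG uses per-coefficient tables
quantized = np.round(dct2(block) / Q)   # quantization is where the (lossy) saving happens
print(np.count_nonzero(quantized), "of 64 coefficients are nonzero after quantization")
```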
26. MPEG: uses differential encoding to compare successive frames of a motion picture.
Three kinds of frames:
I -> JPEG complete image
P -> incremental change to I (where blocks move)
½ the size of an I frame
B -> uses a different (interpolation) technique
¼ the size of an I frame
Typical sequence -> I B B P B B I ….
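A toy Python sketch of the frame-differencing idea behind P frames; real MPEG uses motion-compensated block matching plus DCT coding rather than a raw pixel subtraction:

```python
import numpy as np

def p_frame(reference, current):
    """Encode only what changed since the reference (I) frame."""
    return current - reference

def reconstruct(reference, diff):
    """Decoder adds the difference back onto the reference frame."""
    return reference + diff

i_frame = np.full((4, 4), 100, dtype=np.int16)       # tiny stand-in 'image'
next_frame = i_frame.copy()
next_frame[1, 2] += 20                                # only one region changed

diff = p_frame(i_frame, next_frame)                   # mostly zeros -> compresses well
print(np.count_nonzero(diff), "of", diff.size, "pixels changed")
print(np.array_equal(reconstruct(i_frame, diff), next_frame))   # True
```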
27. MP3: music/audio compression
Uses psychoacoustic principles
Some sounds can't be heard because they are drowned out by other, louder sounds at nearby frequencies (masking; a toy sketch follows this slide)
Divide the sound into smaller subbands
Eliminate sounds you can’t hear anyway because others are too loud.
3 layers with varying compression:
Layer 1 -> 4:1 (192 kbit/s)
Layer 2 -> 8:1 (128 kbit/s)
Layer 3 -> 12:1 (64 kbit/s)
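A toy Python illustration of the masking idea only (not MP3's actual filterbank or psychoacoustic model): split the spectrum into crude subbands and drop any band whose energy is negligible next to the loudest one. The band count and threshold are arbitrary choices for the illustration.

```python
import numpy as np

rate = 8000
t = np.arange(rate) / rate
# A loud 440 Hz tone plus a much quieter 3 kHz tone that tends to be masked.
signal = np.sin(2 * np.pi * 440 * t) + 0.001 * np.sin(2 * np.pi * 3000 * t)

spectrum = np.fft.rfft(signal)
bands = np.array_split(np.arange(spectrum.size), 32)      # 32 crude "subbands"
energies = np.array([np.sum(np.abs(spectrum[b]) ** 2) for b in bands])

threshold = energies.max() * 1e-4      # arbitrary toy threshold, not a real model
for b, e in zip(bands, energies):
    if e < threshold:
        spectrum[b] = 0                # throw away what you "can't hear anyway"

kept = sum(e >= threshold for e in energies)
print(f"kept {kept} of {len(bands)} subbands")
reconstructed = np.fft.irfft(spectrum, n=signal.size)
```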