Data Compression
Terminology • Physical versus logical • Physical • Performed on data regardless of what information it contains • Translates a series of bits to another series of bits • Logical • Knowledge-based • Change United Kingdom to UK
Terminology • Symmetric • Compression and decompression roughly use the same techniques and take just as long • Data transmission which requires compression and decompression on-the-fly will require these types of algorithms
Terminology • Asymmetric • Most common is where compression takes a lot more time than decompression • In an image database, each image will be compressed once and decompressed many times • Less common is where decompression takes a lot more time than compression • Creating many backup files which will hardly ever be read
Terminology • Non-adaptive • Contain a static dictionary of predefined substrings to encode which are known to occur with high frequency • Adaptive • Dictionary is built from scratch
Terminology • Semi-adaptive • In pass 1, an optimal dictionary is constructed • In pass 2, the actual compression occurs
Terminology • Lossless • decompress(compress(data)) = data • Lossy • decompress(compress(data)) ≠ data • A small change in pixel values may be invisible, however
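The two definitions above can be checked directly. The sketch below is only an illustration: it uses Python's standard `zlib` module as a stand-in lossless codec, and the names `lossy_compress` and `lossy_decompress` are hypothetical helpers that simply drop the two low-order bits of each 8-bit pixel value.

```python
import zlib


def is_lossless_round_trip(data: bytes) -> bool:
    """True if decompress(compress(data)) reproduces data exactly."""
    return zlib.decompress(zlib.compress(data)) == data


def lossy_compress(pixels):
    """Hypothetical lossy step: keep only the top 6 bits of each 8-bit value."""
    return [p >> 2 for p in pixels]


def lossy_decompress(codes):
    """Restore the scale; the discarded low bits cannot be recovered."""
    return [c << 2 for c in codes]


sample = b"United Kingdom " * 100
pixels = [200, 201, 202, 203]
restored = lossy_decompress(lossy_compress(pixels))
# Largest per-pixel error introduced by the lossy round trip.
max_error = max(abs(a - b) for a, b in zip(pixels, restored))

print("lossless round trip:", is_lossless_round_trip(sample))
print("lossy round trip:", pixels, "->", restored, "max error", max_error)
```

The lossy round trip changes each value by at most 3 out of 255, which is the kind of small pixel change the slide describes as possibly invisible.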
Pixel Packing
Run-Length Encoding • Repeating string of characters, called a run, is coded into two bytes • First byte contains the run count, one less than the number of repetitions • Second byte contains the run value, the character being repeated
Run-Length Encoding • ‘77777zzzyyyyyyV’ becomes ‘472z5y0V’ • 15-byte string becomes 8 bytes long • Compression ratio of almost 2 to 1 • Some strings become twice as long, because a run of length 1 still takes two bytes • ‘7fu5JLY9jhYIujG’ becomes 30 bytes long
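The scheme on the two slides above can be sketched as follows. This is a minimal illustration, not a production codec: the function names are made up for the example, runs are capped at 256 repetitions so the count fits in one byte, and `show` prints the count as a decimal digit only to match the slide's notation.

```python
def rle_encode(text):
    """Encode text as (run count, run value) pairs; count = repetitions - 1."""
    pairs = []
    i = 0
    while i < len(text):
        j = i
        # Extend the run while the character repeats (at most 256 repetitions).
        while j + 1 < len(text) and text[j + 1] == text[i] and j + 1 - i < 256:
            j += 1
        pairs.append((j - i, text[i]))
        i = j + 1
    return pairs


def rle_decode(pairs):
    """Expand each (count, value) pair back into count + 1 copies of value."""
    return "".join(value * (count + 1) for count, value in pairs)


def show(pairs):
    """Render pairs in the slide's notation, e.g. (4, '7') -> '47'."""
    return "".join(f"{count}{value}" for count, value in pairs)


good = rle_encode("77777zzzyyyyyyV")
bad = rle_encode("7fu5JLY9jhYIujG")
print(show(good), "uses", 2 * len(good), "bytes")
print("worst case uses", 2 * len(bad), "bytes")
```

The second string has no repeated neighbours, so every character becomes its own two-byte pair.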
Lempel-Ziv-Welch (LZW) • Lossless • Used in GIF, TIFF, the V.42bis modem compression standard, and PostScript Level 2 • Substitutional or dictionary-based • Algorithm builds a data dictionary as it reads the input • If a pattern is already in the dictionary, its code is emitted; if not, the pattern is added to the dictionary • The dictionary does not have to be stored or transmitted for decompression, because the decoder rebuilds it from the codes
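A minimal sketch of the dictionary mechanism described above, assuming byte input, a 256-entry initial dictionary, and unbounded integer codes. Real formats such as GIF add variable code widths and clear codes, which are left out here. Note that `lzw_decode` receives only the codes and rebuilds the dictionary on its own.

```python
def lzw_encode(data: bytes):
    """Return a list of integer codes for data."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate  # keep extending the known pattern
        else:
            codes.append(table[current])  # emit code for the longest known pattern
            table[candidate] = len(table)  # add the new pattern to the dictionary
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes):
    """Rebuild the original bytes from codes alone."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


message = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(message)
print(len(message), "bytes ->", len(codes), "codes")
```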
Lempel-Ziv-Welch (LZW) • History • 1977 • Abraham Lempel and Jacob Ziv published a paper on a universal data compression algorithm • Called LZ77 • 1978 • Lempel and Ziv formulated an improved, dictionary-based data compression algorithm • Called LZ78
Lempel-Ziv-Welch (LZW) • History • 1981 • While working for Sperry, Lempel and Ziv, with some other researchers, filed for a patent for LZ78 • Granted in 1984 • 1984 • While working for Sperry, Terry Welch modified LZ78 • Result was LZW algorithm • Published in IEEE Computer
Lempel-Ziv-Welch (LZW) • History • 1985 • Sperry was granted a patent for Welch’s modification and for implementation of LZW • 1986 • Sperry and Burroughs merged to form Unisys • Ownership of Sperry patent transferred to Unisys
Lempel-Ziv-Welch (LZW) • History • 1987 • CompuServe created GIF file format • Required use of LZW algorithm • Didn’t check patents for LZW • Unisys also didn’t realize GIF used LZW • 1988 • Aldus released Revision 5.0 of TIFF file format • Used LZW algorithm • 1990 • Unisys licensed Adobe for use of LZW patent for PostScript
Lempel-Ziv-Welch (LZW) • History • 1991 • Unisys licensed Aldus for use of LZW patent in TIFF • 1993 • Unisys became aware the GIF file format used LZW • Negotiations began with CompuServe
Lempel-Ziv-Welch (LZW) • History • 1994 • Unisys and CompuServe came to an understanding: CompuServe would license the LZW algorithm for use of the GIF file format in software used primarily to access the CompuServe Information Service • 1995 • America Online and Prodigy also entered into license agreements with Unisys for LZW
Lempel-Ziv-Welch (LZW) • GIF is not in the public domain • Some people were suspicious regarding the announcement of CompuServe that it was getting a license from Unisys • In the programming community it had been known for many years prior to this that GIF used LZW and that LZW was patented by Unisys
Lempel-Ziv-Welch (LZW) • Unisys claimed that CompuServe only found out rather late that this was the case • GIF was becoming an integral part of WWW for exchanging low-resolution graphics
Lempel-Ziv-Welch (LZW) • Eventually, Unisys’ LZW patent and licensing agreements held • Unisys reduced license fees after 1995 • Unisys wouldn’t charge anything for inadvertent infringement by GIF software products delivered prior to 1995 • License fees still required for updates delivered after 1995
Lempel-Ziv-Welch (LZW) • Not illegal to own, transmit, or receive GIF files, just to compress or decompress them without a license
Lempel-Ziv-Welch (LZW) • Sequence 3 1 2 5 1 3 1 4 1 2 5 1 5 5 1 5 5 1 4, divided into a search buffer and a lookahead buffer • offset = 0, length = 0 • Output is (0, 0, code(4))
Lempel-Ziv-Welch (LZW) • Sequence 3 1 2 5 1 3 1 4 1 2 5 1 5 5 1 5 5 1 4, divided into a search buffer and a lookahead buffer • offset = 7, length = 4 • Output is (7, 4, code(5))
Lempel-Ziv-Welch (LZW) • Sequence 3 1 2 5 1 3 1 4 1 2 5 1 5 5 1 5 5 1 4, divided into a search buffer and a lookahead buffer • offset = 3, length = 5 • Output is (3, 5, code(4))
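The (offset, length, code) triples on these three slides have the form produced by a sliding-window coder in the style of LZ77, so the sketch below uses that technique rather than the dictionary-based LZW described earlier. The buffer positions are not recoverable from the slides; the sketch assumes a 7-symbol search buffer, a 6-symbol lookahead buffer, and that the first 7 symbols are already in the search buffer. Under those assumptions it reproduces the three outputs shown.

```python
def lz77_encode(symbols, start, search_size=7, lookahead_size=6):
    """Encode symbols[start:] as (offset, length, next_symbol) triples.

    symbols[:start] is assumed to be in the search buffer already.
    """
    triples = []
    pos = start
    while pos < len(symbols):
        # Reserve one symbol after the match for the literal in the triple.
        max_len = min(lookahead_size - 1, len(symbols) - pos - 1)
        best_offset, best_len = 0, 0
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            # A match may run past pos into the lookahead buffer itself.
            while (length < max_len
                   and symbols[pos - offset + length] == symbols[pos + length]):
                length += 1
            if length > best_len:
                best_offset, best_len = offset, length
        triples.append((best_offset, best_len, symbols[pos + best_len]))
        pos += best_len + 1
    return triples


def lz77_decode(prefix, triples):
    """Rebuild the sequence from the known prefix and the triples."""
    out = list(prefix)
    for offset, length, next_symbol in triples:
        for _ in range(length):
            out.append(out[-offset])  # copy one symbol at a time so overlaps work
        out.append(next_symbol)
    return out


data = [3, 1, 2, 5, 1, 3, 1, 4, 1, 2, 5, 1, 5, 5, 1, 5, 5, 1, 4]
triples = lz77_encode(data, start=7)
print(triples)
```

In the last triple the match of length 5 starts only 3 symbols back, so it overlaps the lookahead buffer; copying one symbol at a time in the decoder handles that case.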
JPEG • Joint Photographic Experts Group • 1982 • ISO (International Organization for Standardization) formed Photographic Experts Group (PEG) • Develop methods of transmitting video, images and text over ISDN (Integrated Services Digital Network) lines
JPEG • 1986 • Subgroup of CCITT (International Telegraph and Telephone Consultative Committee) began to look at methods of compressing color and gray-scale data for fax transmission • Methods for this were similar to those being considered by PEG
JPEG • 1987 • Two groups combined into JPEG • Most previous compression methods did a poor job of compressing continuous-tone image data
JPEG • Very few file formats can support 24-bit raster images • GIF only works for 256 colors • LZW doesn’t work well on scanned image data • TIFF and BMP didn’t compress this type of image data very well
JPEG • JPEG compresses continuous tone image data with a pixel depth of 6-24 bits with good efficiency • JPEG itself doesn’t define standard file format
JPEG • Toolkit of methods with quality-compression trade-off • Lossy • Discards information that human eye cannot easily see • Slight changes in color not perceived well • Slight changes in intensity are well perceived
JPEG • Works well with color or gray-scale continuous tone images: photographs, video stills, complex graphics which resemble natural objects • Doesn’t work well for animations, ray tracing, line art, black-and-white documents, and typical vector graphics
JPEG • End-user can tune quality of JPEG encoder through use of Q-factor, which ranges from 1-100 • Q-factor = 1 produces smallest, worst quality images • Q-factor = 100 produces largest, best quality images • Optimal value of Q-factor is image dependent
JPEG • JPEG introduces artifacts in images containing large areas of a single color • JPEG is slow if implemented in software • Baseline JPEG • Minimal subset of JPEG which all JPEG-aware applications are required to support
JPEG
JPEG • Color transform • Encodes each component in a color model separately • Is independent of any color space model
JPEG • Color transform • Best compression ratios result if a luminance (gray scale)/chrominance (color) color space, such as YUV, is used • Human eyes more sensitive to luminance information (Y) than to chrominance information (U, V) • The other models spread human sensitive information across each of their 3 components
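To make the luminance/chrominance split concrete, here is a small sketch of one such transform. The slide names YUV; the example assumes the closely related YCbCr conversion used by JFIF files (full-range 8-bit values), which is a choice made for this illustration rather than something the slides specify.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to YCbCr using the JFIF constants."""
    y = 0.299 * r + 0.587 * g + 0.114 * b            # luminance
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference chrominance
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference chrominance
    return y, cb, cr


gray = rgb_to_ycbcr(128, 128, 128)
white = rgb_to_ycbcr(255, 255, 255)
print("gray ->", gray)
print("white ->", white)
```

For any gray pixel both chrominance values sit at the neutral value 128, so all of the information is carried by Y. That is the concentration of perceptually important data in one component that the slide contrasts with other color models.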
JPEG • Down-sampling • Average groups of pixels together • To exploit human’s lesser sensitivity to chrominance information, we use fewer pixels for the chrominance channels • In an image of 1000 × 1000 pixels, we might use 1000 × 1000 luminance pixels, but only 500 × 500 chrominance pixels • Each chrominance pixel covers the same area as a 2 × 2 block of luminance pixels
JPEG • Down-sampling • For each 2 × 2 block, we can store 6 pixel values (4 luminance values and 2 chrominance values, 1 for each of 2 channels) instead of 12 (4 pixel values for each of 3 channels) • This 50% reduction in data has almost no perceivable effect
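The arithmetic on the two down-sampling slides can be checked with a short sketch. Both function names are hypothetical, and the plane is assumed to be a list of equal-length rows with even width and height.

```python
def downsample_2x2(plane):
    """Average each 2 x 2 block of a chrominance plane."""
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, len(plane[0]), 2)
        ]
        for r in range(0, len(plane), 2)
    ]


def stored_values(width, height, subsample_chroma):
    """Count stored samples: one luminance plane plus two chrominance planes."""
    luma = width * height
    if subsample_chroma:
        chroma = (width // 2) * (height // 2)
    else:
        chroma = width * height
    return luma + 2 * chroma


print(downsample_2x2([[1, 3], [5, 7]]))
print(stored_values(1000, 1000, False), "->", stored_values(1000, 1000, True))
```

For the 1000 × 1000 image this gives 3,000,000 values without down-sampling and 1,500,000 with it, the 50% reduction quoted on the slide.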
JPEG • Discrete cosine transform • For each color channel, the image data is divided into 8 × 8 blocks • DCT applied to each block • Low-order, or DC, term represents average value in the block • Successive higher-order, or AC, terms represent the strength of more rapid changes across the block
JPEG • Discrete cosine transform • Can discard high-frequency data • DCT is lossless except for roundoff errors • DCT is the most costly step in JPEG
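A direct, unoptimized sketch of the 8 × 8 transform follows, assuming the DCT-II with the scaling commonly used for JPEG. With that scaling the DC term equals 8 times the block average, so it represents the average up to a constant factor. Real encoders use fast factorizations; this version only illustrates the definition and the round trip.

```python
import math

N = 8


def _c(k):
    """Normalization factor: 1/sqrt(2) for the zero-frequency term, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct_2d(block):
    """Forward 8 x 8 DCT; block[x][y] -> coeffs[u][v]."""
    return [
        [
            0.25 * _c(u) * _c(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(N) for y in range(N)
            )
            for v in range(N)
        ]
        for u in range(N)
    ]


def idct_2d(coeffs):
    """Inverse 8 x 8 DCT; coeffs[u][v] -> block[x][y]."""
    return [
        [
            0.25 * sum(
                _c(u) * _c(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for u in range(N) for v in range(N)
            )
            for y in range(N)
        ]
        for x in range(N)
    ]


flat = [[100] * N for _ in range(N)]
flat_coeffs = dct_2d(flat)
ramp = [[8 * x + y for y in range(N)] for x in range(N)]
restored = idct_2d(dct_2d(ramp))
# Largest difference after a forward and inverse transform (roundoff only).
round_trip_error = max(
    abs(ramp[x][y] - restored[x][y]) for x in range(N) for y in range(N)
)
print("DC of flat block:", round(flat_coeffs[0][0], 6))
print("round-trip error:", round_trip_error)
```

A flat block produces a single non-zero DC term and zero AC terms, and the round trip differs from the input only by floating-point roundoff.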
JPEG • Scan-order of each 8 × 8 block of pixels for DCT
JPEG • An 8 × 8 block from an 8 bit image
JPEG • The DCT coefficients corresponding to the previous 8 × 8 block • Figure labels: DC coefficient, AC coefficients
JPEG • Quantization • Divide DCT output by a quantization coefficient and round result to integer • The larger the coefficient, the more data is lost • Each of the 64 positions of the DCT output block has its own coefficient • Higher order terms have a larger coefficient • Different coefficients for luminance and chrominance channels
JPEG • Quantization • This is the step controlled by the quality-factor • Selecting quantization coefficients is an art
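The slides do not say how the quality factor maps to quantization coefficients. One widely used convention is the scaling rule in the Independent JPEG Group's libjpeg, sketched below under that assumption. The function name `scale_quant_table` and the three base values are hypothetical and used only for illustration.

```python
def scale_quant_table(base_table, quality):
    """Scale base quantization coefficients for a quality setting of 1-100.

    Follows the IJG libjpeg convention (an assumption; the slides do not
    specify a mapping). Results are clamped to the baseline range 1..255.
    """
    quality = max(1, min(100, quality))
    if quality < 50:
        scale = 5000 // quality
    else:
        scale = 200 - 2 * quality
    return [max(1, min(255, (q * scale + 50) // 100)) for q in base_table]


base = [16, 11, 10]  # hypothetical base coefficients
for q in (1, 25, 50, 75, 100):
    print(q, scale_quant_table(base, q))
```

Lower quality settings enlarge the coefficients, which discards more data. Quality 50 leaves the base values unchanged, and quality 100 reduces every coefficient to 1.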
JPEG • Sample quantization table • Coefficients based on human perception
JPEG • Labels • Label $\mathrm{lab}_{ij}$ corresponding to the quantized value of the transform coefficient $c_{ij}$ is $\mathrm{lab}_{ij} = \lfloor c_{ij} / Q_{ij} + 0.5 \rfloor$, where $Q_{ij}$ is the $(i,j)$th element of the quantization table
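The divide-and-round step can be sketched in a few lines. The 2 × 2 table and coefficient values below are hypothetical and chosen only to keep the example small. A real JPEG table has 64 entries, and the sample table from the slides is not reproduced here.

```python
import math


def quantize(coeffs, table):
    """Label each coefficient: divide by its table entry, round to nearest."""
    return [
        [math.floor(c / q + 0.5) for c, q in zip(c_row, q_row)]
        for c_row, q_row in zip(coeffs, table)
    ]


def dequantize(labels, table):
    """Decoder side: multiply each label by its table entry."""
    return [
        [lab * q for lab, q in zip(l_row, q_row)]
        for l_row, q_row in zip(labels, table)
    ]


table = [[16, 11], [12, 12]]            # hypothetical quantization coefficients
coeffs = [[800.0, -30.0], [5.0, 7.0]]   # hypothetical DCT output
labels = quantize(coeffs, table)
print("labels:", labels)
print("reconstructed:", dequantize(labels, table))
```

The reconstructed values differ from the original coefficients; this rounding is where JPEG actually loses data, and small coefficients collapse to the label 0.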
JPEG • Quantizer labels corresponding to the previous 8 × 8 block
Encoding • Huffman-code the resulting quantized coefficients • Arithmetic coding can be used as well
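As an illustration of the Huffman step, the sketch below builds a prefix code from symbol frequencies and round-trips a short list of labels. It is a simplification: an actual JPEG entropy coder Huffman-codes run-length and size symbols derived from the labels rather than the raw label values, and the function names here are made up for the example.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return {symbol: bit string}; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_encode(symbols, code):
    return "".join(code[s] for s in symbols)


def huffman_decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first hit is the symbol
            out.append(inverse[current])
            current = ""
    return out


labels = [0, 0, 0, 0, 0, 1, 1, -3, 50]
code = huffman_code(labels)
bits = huffman_encode(labels, code)
print(code)
print(len(bits), "bits for", len(labels), "labels")
```

Because quantization turns most high-order coefficients into 0, the zero label dominates and receives the shortest code.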