Digital Image Processing Lecture 20: Image Compression May 16, 2005

Digital Image Processing Lecture 20: Image Compression • May 16, 2005 • Prof. Charlene Tsai

Before Lecture … • Please return your mid-term exam.

Starting with Information Theory • Data compression: the process of reducing the amount of data required to represent a given quantity of information. • Data ≠ information • Data convey the information; various amounts of data can be used to represent the same amount of information. • E.g. storytelling (Gonzalez pg 411) • Data redundancy • Our focus will be coding redundancy

Coding Redundancy • Again, we’re back to the gray-level histogram for data (code) reduction • Let $r_k$ be a gray level with occurrence probability $p_r(r_k)$. • If $l(r_k)$ is the # of bits used to represent $r_k$, the average # of bits for each pixel is $L_{avg} = \sum_{k=0}^{L-1} l(r_k)\, p_r(r_k)$, where L is the number of gray levels.
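The weighted sum above can be sketched in a few lines of Python. The 4-level histogram and both code-length tables are hypothetical values chosen for illustration; they are not the table used in the lecture.

```python
def average_code_length(probs, lengths):
    """Return sum_k l(r_k) * p_r(r_k): the mean number of bits per pixel."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * l for p, l in zip(probs, lengths))

# Hypothetical 4-level histogram (illustration only).
probs = [0.5, 0.25, 0.125, 0.125]
fixed_lengths = [2, 2, 2, 2]       # natural 2-bit binary code
variable_lengths = [1, 2, 3, 3]    # e.g. codewords 0, 10, 110, 111

fixed_avg = average_code_length(probs, fixed_lengths)
variable_avg = average_code_length(probs, variable_lengths)
print(fixed_avg, variable_avg)
```

Giving the shortest codeword to the most probable gray level is what pulls the average below the fixed-length figure.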

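Example on Variable-Length Coding • Average for code 1 is 3, and for code 2 is 2.7 • Compression ratio is $C_R = 3/2.7 \approx 1.11$, and the level of reduction (relative data redundancy) is $R_D = 1 - 1/C_R = 1 - 2.7/3 \approx 0.1$, i.e. about 10% of the data in code 1 is redundant.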

Information Theory • Information theory provides the mathematical framework for data compression • Generation of information is modeled as a probabilistic process • A random event E that occurs with probability p(E) contains $I(E) = \log \frac{1}{p(E)} = -\log p(E)$ units of information (self-information)

Some Intuition • I(E) is inversely related to p(E) • If p(E) is 1 => I(E)=0 • No uncertainty is associated with the event, so no information is transferred by E. • Take the letters “a” and “q” as an example. p(“a”) is high, so I(“a”) is low; p(“q”) is low, so I(“q”) is high. • The base of the logarithm determines the unit used to measure the information. • Base 2 gives information in bits
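A minimal sketch of self-information, I(E) = log(1/p(E)), follows. The two letter probabilities are rough illustrative values assumed for this example, not figures from the lecture.

```python
import math

def self_information(p, base=2):
    """Return log_base(1 / p): the information carried by an event of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.log(1.0 / p, base)

certain = self_information(1.0)      # a certain event carries no information
coin_flip = self_information(0.5)    # one fair coin flip carries one bit

# Assumed, illustrative letter probabilities: "a" is common, "q" is rare.
p_a, p_q = 0.08, 0.001
info_a = self_information(p_a)
info_q = self_information(p_q)
print(certain, coin_flip, info_a, info_q)
```

Passing a different `base` changes only the unit: base 2 reports bits, base e reports nats.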

Entropy • Measure of the amount of information • Formal definition: the entropy H of an image is the theoretical minimum # of bits/pixel required to encode the image without loss of information: $H = -\sum_{i=0}^{L-1} p_i \log_2 p_i$, where i ranges over the L gray levels of the image, and $p_i$ is the probability of gray level i occurring in the image (terms with $p_i = 0$ contribute 0). • No matter what scheme is used to code the gray levels one pixel at a time, it will never average fewer than H bits per pixel
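As a sketch, the entropy of an image can be estimated from its gray-level histogram. The function name is an assumption of this example; the test image is the 4x4 vertical edge used in the LZW example of this lecture, flattened row by row.

```python
import math
from collections import Counter

def entropy_bits_per_pixel(pixels):
    """Estimate H = sum_i p_i * log2(1 / p_i) from the gray-level histogram."""
    counts = Counter(pixels)
    total = len(pixels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# 4x4 vertical edge: half the pixels are 39, half are 126.
edge_image = [39, 39, 126, 126] * 4
edge_entropy = entropy_bits_per_pixel(edge_image)

# A constant image has no uncertainty at all.
flat_entropy = entropy_bits_per_pixel([200] * 16)
print(edge_entropy, flat_entropy)
```

Two equally likely gray levels give 1 bit per pixel, far below the 8 bits per pixel of the raw image.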

Variable-Length Coding • Lossless compression • Instead of fixed length code, we use variable-length code: • Smaller-length code for more probable gray values • Two methods: • Huffman coding • Arithmetic coding • We’ll go through the first method

Huffman Coding • The most popular technique for removing coding redundancy • Steps: • Determine the probability of each gray value in the image • Form a binary tree by adding probabilities two at a time, always taking the 2 lowest available values • Now assign 0 and 1 arbitrarily to each branch of the tree from the apex • Read the codes from the top down
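The steps above can be sketched with a priority queue. This is one possible implementation, not the lecture's; the four probabilities are hypothetical, and the 0/1 branch labels are arbitrary, as the slide notes.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> bit string."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Always merge the two lowest available probabilities.
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), [left, right]))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, list):      # internal node: [left, right]
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                           # leaf: a gray value
            codes[node] = prefix

    walk(heap[0][2], "")                # read the codes from the apex down
    return codes

def huffman_decode(bits, codes):
    """Decode a bit string; works because no codeword is a prefix of another."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical gray-value probabilities (illustration only).
probs = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
codes = huffman_code(probs)
message = [0, 1, 0, 3, 2, 0]
encoded = "".join(codes[s] for s in message)
decoded = huffman_decode(encoded, codes)
print(codes, encoded, decoded)
```

Because the code is prefix-free, the decoder needs no separators between codewords.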

Example in pg 397 • The average number of bits per pixel is 2.7 • Much better than 3, originally • Theoretical minimum (entropy) is about 2.7 • How to decode the string 11011101111100111110? • Huffman codes are uniquely decodable.

LZW (Lempel-Ziv-Welch) Coding • Lossless compression • Compression scheme for GIF, TIFF, and PDF • For 8-bit grayscale images, the first 256 words are assigned to grayscales 0, 1, …, 255 • As the encoder scans the image, the grayscale sequences not in the dictionary are placed in the next available location. • The encoded output consists of the dictionary addresses of the recognized sequences.

Example • Consider the 4x4, 8-bit image of a vertical edge:
39 39 126 126
39 39 126 126
39 39 126 126
39 39 126 126
• A 512-word dictionary starts with the content … … … …

To decode, read the 3rd column from top to bottom
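The following sketch encodes the 4x4 vertical-edge image with a 512-word dictionary and decodes it again. The function names and the tuple-based dictionary are choices made for this example; the decoder rebuilds the dictionary on the fly and does not need it transmitted.

```python
def lzw_encode(pixels, max_size=512):
    """Return the list of dictionary addresses for a sequence of 8-bit pixels."""
    dictionary = {(g,): g for g in range(256)}   # words 0..255 are the gray levels
    next_code = 256
    current, out = (), []
    for p in pixels:
        candidate = current + (p,)
        if candidate in dictionary:
            current = candidate                  # keep growing the recognized sequence
        else:
            out.append(dictionary[current])
            if next_code < max_size:             # store the new sequence if room remains
                dictionary[candidate] = next_code
                next_code += 1
            current = (p,)
    if current:
        out.append(dictionary[current])
    return out

def lzw_decode(codes, max_size=512):
    """Invert lzw_encode, rebuilding the same dictionary while reading codes."""
    if not codes:
        return []
    dictionary = {g: (g,) for g in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = list(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:                  # sequence defined by the current step
            entry = previous + (previous[0],)
        else:
            raise ValueError("invalid LZW code")
        out.extend(entry)
        if next_code < max_size:
            dictionary[next_code] = previous + (entry[0],)
            next_code += 1
        previous = entry
    return out

image = [39, 39, 126, 126] * 4                   # rows scanned left to right, top to bottom
encoded = lzw_encode(image)
decoded = lzw_decode(encoded)
print(encoded)  # → [39, 39, 126, 126, 256, 258, 260, 259, 257, 126]
```

The 16 input pixels become 10 output codes; with a 512-word dictionary each code needs 9 bits, so 128 bits shrink to 90.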

Run-Length Encoding (1D) • Lossless compression • Encodes strings of 0s and 1s by the number of repetitions in each string. • A standard in fax transmission • There are many versions of RLE

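Run-Length Encoding (cont’d) • Consider the binary image below • Method 1 (per row, the lengths of alternating runs of 0s and 1s, starting with 0s): (123)(231)(0321)(141)(33)(0132) • Method 2 (per row, the starting position and length of each run of 1s): (22)(33)(1361)(24)(43)(1152) • For a grayscale image, first break the image up into its bit planes.
0 1 1 0 0 0
0 0 1 1 1 0
1 1 1 0 0 1
0 1 1 1 1 0
0 0 0 1 1 1
1 0 0 0 1 1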
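Both methods can be reproduced on the 6x6 binary image with a short sketch. The function names are assumptions of this example. Method 1 is read here as per-row lengths of alternating runs beginning with 0s (so a row starting with 1 opens with a run of length 0), and Method 2 as per-row (start, length) pairs for runs of 1s with 1-based positions; both readings match the codes on the slide.

```python
image = [
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 1],
]

def rle_alternating(row):
    """Method 1: lengths of alternating runs of 0s and 1s, starting with 0s."""
    runs, expected, length = [], 0, 0
    for bit in row:
        if bit == expected:
            length += 1
        else:
            runs.append(length)     # close the current run (may be 0 at row start)
            expected, length = bit, 1
    runs.append(length)
    return runs

def rle_positions(row):
    """Method 2: (start, length) of each run of 1s, with 1-based start positions."""
    runs, start = [], None
    for i, bit in enumerate(row, start=1):
        if bit == 1 and start is None:
            start = i
        elif bit == 0 and start is not None:
            runs.extend([start, i - start])
            start = None
    if start is not None:           # run of 1s reaches the end of the row
        runs.extend([start, len(row) - start + 1])
    return runs

method1 = [rle_alternating(row) for row in image]
method2 = [rle_positions(row) for row in image]
print(method1)
print(method2)
```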

Problem with grayscale RLE • Long runs of very similar gray values should result in a very good compression rate for the code. • This is not the case for a 4-bit image consisting of randomly distributed 7s and 8s, because the binary representations of 7 and 8 differ in every bit. • One solution is to use gray codes. • See pages 400-401 for an example

Example in pg 400 • For a 4-bit image, • Binary encoding: 8 is 1000, and 7 is 0111 • Gray code encoding: 8 is 1100 and 7 is 0100 • Bit planes: • Uncorrelated: 0th, 1st, and 2nd binary bit planes; 3rd binary bit plane; 3rd gray code bit plane • Highly correlated: 0th and 1st gray code bit planes (replace 0 by 1 for the 2nd plane)
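A small sketch makes the bit-plane comparison concrete. It uses the standard binary-reflected Gray code, g = n XOR (n >> 1), which reproduces the two codewords above; the sequence of 7s and 8s is a hypothetical stand-in for the random image.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def bit_plane(pixels, k):
    """Return bit k (0 = least significant) of every pixel."""
    return [(p >> k) & 1 for p in pixels]

# Hypothetical row of "randomly" mixed 7s and 8s (illustration only).
pixels = [7, 8, 8, 7, 8, 7, 7, 8]
gray_pixels = [to_gray(p) for p in pixels]

binary_planes = [bit_plane(pixels, k) for k in range(4)]
gray_planes = [bit_plane(gray_pixels, k) for k in range(4)]

for k in range(4):
    print(k, binary_planes[k], gray_planes[k])
```

In binary every plane flips wherever the gray value switches between 7 and 8, while in Gray code only plane 3 does; planes 0 to 2 are constant and run-length encode to a single run each.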

Summary • Information theory • Measure of entropy, which is the theoretical minimum # of bits per pixel • Lossless compression schemes • Huffman coding • LZW • Run-Length encoding
