ECEC 453 Image Processing Architecture
Lecture 5, 1/22/2004
Rate-Distortion Theory, Quantizers and DCT
Oleh Tretiak, Drexel University
Quality - Rate Tradeoff
• Given: 512x512 picture, 8 bits per pixel
• Bit reduction
  • Fewer bits per pixel
  • Fewer pixels
  • Both
• Issues:
  • How do we measure compression?
    • Bits/pixel: does not work when we change the number of pixels
    • Total bits: valid, but hard to interpret
  • How do we measure quality?
    • RMS noise
    • Peak signal-to-noise ratio (PSNR) in dB
    • Subjective quality
Quantizer Performance
• Questions:
  • How much error does the quantizer introduce (distortion)?
  • How many bits are required for the quantized values (rate)?
• Rate:
  • 1. No compression. If there are N possible quantizer output values, it takes ceiling(log2 N) bits per sample.
  • 2(a). Compression. Compute the histogram of the quantizer output and design a Huffman code for it. Find the average code length.
  • 2(b). Find the entropy of the quantizer output distribution.
  • 2(c). Preprocess the quantizer output, ....
• Distortion: Let x be the input to the quantizer and x* the de-quantized value. The quantization noise is n = x* - x, and the quantization noise power is D = Average(n²).
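A minimal sketch of rate estimate 2(b) above: the entropy of the quantizer output histogram, which lower-bounds the average Huffman code length of 2(a) to within one bit per sample. The Gaussian test input and the step size S = 8 are assumptions for illustration:

```python
import numpy as np

def quantize(x, S):
    # Uniform quantizer: q = round(x/S)
    return np.round(x / S).astype(int)

def entropy_bits_per_sample(q):
    # Histogram of quantizer indices -> empirical probabilities -> entropy
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 32.0, size=100_000)   # synthetic Gaussian source
q = quantize(x, S=8)
n_levels = q.max() - q.min() + 1
print("no-compression rate:", np.ceil(np.log2(n_levels)), "bits/sample")
print("entropy rate       :", entropy_bits_per_sample(q), "bits/sample")
```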
Quantizer: Practical Lossy Encoding
• Quantizer
  • Symbols: x is the input to the quantizer, q the output of the quantizer, S the quantizer step
  • Quantizer: q = round(x/S)
  • Dequantizer characteristic: x* = Sq
• Typical noise power added by the quantizer-dequantizer combination: D = S²/12; noise standard deviation σ = sqrt(D) = 0.287S
• Example: S = 8, D = 8²/12 = 5.3, rms quantization noise = sqrt(D) = 2.3. If the input is 8 bits, the maximum input is 255, so there are 255/8 ≈ 32 quantizer output values. PSNR = 20 log10(255/2.3) = 40.8 dB
[Figures: quantizer characteristic and dequantizer characteristic.]
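A quick numeric check of this slide. The uniform test input over [0, 255] is an assumption; any input that varies over many steps S gives essentially the same result:

```python
import numpy as np

# Quantizer q = round(x/S), dequantizer x* = S*q; verify D ≈ S²/12
# and the PSNR figure from the slide (S = 8, 8-bit input, peak 255).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 255.0, size=1_000_000)
S = 8
x_star = S * np.round(x / S)
noise = x_star - x
D = np.mean(noise**2)
print("D         =", D, " (theory S^2/12 =", S**2 / 12, ")")
print("rms noise =", np.sqrt(D), " (theory 0.287*S =", 0.287 * S, ")")
print("PSNR      =", 20 * np.log10(255 / np.sqrt(D)), "dB  (slide: 40.8 dB)")
```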
Rate-Distortion Theorem
• When long sequences (blocks) are encoded, it is possible to construct a coder-decoder pair that achieves the specified distortion whenever the number of bits per sample is R(D) + ε.
• Formula: X ~ Gaussian random variable, Q = E[X²] ~ signal power, D = E[(X - Y)²] ~ noise power. Then
  R(D) = max[0.5 log2(Q/D), 0]
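A numeric reading of the formula (the value of Q below is arbitrary, chosen only for illustration):

```python
import numpy as np

def rate_gaussian(Q, D):
    # R(D) = max(0.5*log2(Q/D), 0): bits/sample needed to reach distortion D
    return max(0.5 * np.log2(Q / D), 0.0)

Q = 4.0  # signal power of the Gaussian source
for D in (Q, Q / 4, Q / 16):
    print(f"D = {D:.3f} -> R(D) = {rate_gaussian(Q, D):.2f} bits/sample")
# Cutting the distortion by a factor of 4 costs exactly 1 bit/sample.
```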
This Lecture
• Decorrelation and Bit Allocation
• Discrete Cosine Transform
• Video Coding
Coding Correlated Samples
• How to code correlated samples
  • Decorrelate
  • Code
• Methods for decorrelation
  • Prediction
  • Transformation
    • Block transform
    • Wavelet transform
Prediction Rules
• Simplest: previous value
General Predictive Coding
• General system
• Example of a linear predictive image coder
Rate-Distortion Theory: Correlated Samples
• Given: x = (x1, x2, ... xn), a sequence of correlated Gaussian samples
• Preprocess: convert to y = (y1, y2, ... yn), y = Ax, A ~ an orthogonal matrix (A⁻¹ = Aᵀ) that decorrelates the samples. This is called a Karhunen-Loeve transformation.
• Perform lossy encoding of (y1, y2, ... yn); after decoding, get y* = (y1*, y2*, ... yn*)
• Reconstruct: x* = A⁻¹y*
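A sketch of the decorrelation step of this pipeline. The AR(1)-style covariance model (ρ = 0.95) for an 8-sample block is a hypothetical stand-in for real image statistics; the K-L matrix A is built from the eigenvectors of that covariance:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical AR(1) covariance for an 8-sample block, rho = 0.95
n, rho = 8, 0.95
C = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

eigvals, eigvecs = np.linalg.eigh(C)   # C = V diag(eigvals) V^T
A = eigvecs.T                          # rows = eigenvectors; A^-1 = A^T

x = rng.multivariate_normal(np.zeros(n), C, size=50_000)
y = x @ A.T                            # forward transform y = A x (per row)
cov_y = np.cov(y, rowvar=False)        # ≈ diagonal: samples decorrelated
print("off-diagonal energy:", np.sum(cov_y**2) - np.sum(np.diag(cov_y)**2))
x_rec = y @ A                          # reconstruct: x = A^T y
print("max reconstruction error:", np.abs(x_rec - x).max())
```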
Block-Based Coding
• The Discrete Cosine Transform (DCT) is used instead of the K-L transform
• Full-image DCT: one set of decorrelated coefficients for the whole image
• Block-based coding:
  • Image divided into 'small' blocks
  • Each block is decorrelated separately
  • Block decorrelation performs almost as well as (sometimes better than?) full-image decorrelation
• Current standards (JPEG, MPEG) use 8x8 DCT blocks
Rate-Distortion Theory: Non-Uniform Random Variables
• Given (x1, x2, ... xn), use an orthogonal transform to obtain (y1, y2, ... yn).
• The result is a sequence of independent Gaussian variables (y1, y2, ... yn), Var[yi] = Qi.
• Distortion allocation: allocate distortion Di to yi
• Rate (bits) for the i-th variable: Ri = max[0.5 log2(Qi/Di), 0]
• Total distortion: D = D1 + D2 + ... + Dn
• Total rate (bits): R = R1 + R2 + ... + Rn
• We specify R. What values of Di give the minimum total distortion D?
Bit Allocation Solution
• Implicit solution (water-filling construction)
• Choose Q (parameter)
• Di = min(Qi, Q)
  • If Qi > Q then Di = Q, else Di = Qi
• Ri = max[0.5 log2(Qi/Di), 0]
  • If Qi > Q then Ri = 0.5 log2(Qi/Q), else Ri = 0
• Find the value of Q that gives the specified R
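A sketch of the water-filling construction, solving for Q by bisection; this works because the total rate decreases monotonically as Q grows. The variances Qi below are made up for illustration:

```python
import numpy as np

def waterfill(Qi, R_target, iters=60):
    # Find water level Q such that sum(Ri) = R_target, where
    # Di = min(Qi, Q) and Ri = max(0.5*log2(Qi/Di), 0).
    Qi = np.asarray(Qi, dtype=float)
    lo, hi = 1e-12, Qi.max()
    for _ in range(iters):
        Q = 0.5 * (lo + hi)
        R = np.sum(np.maximum(0.5 * np.log2(Qi / Q), 0.0))
        if R > R_target:
            lo = Q          # rate too high: raise the water level
        else:
            hi = Q          # rate too low: lower the water level
    Di = np.minimum(Qi, Q)
    Ri = np.maximum(0.5 * np.log2(Qi / Di), 0.0)
    return Q, Di, Ri

Qi = [16.0, 4.0, 1.0, 0.25]           # hypothetical coefficient variances
Q, Di, Ri = waterfill(Qi, R_target=4.0)
print("water level Q:", Q)
print("Di:", Di, " total D:", Di.sum())
print("Ri:", Ri, " total R:", Ri.sum())
```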
Wavelet Transform
• Filterbank and wavelets
• 2-D wavelets
• Wavelet pyramid
Filterbank and Wavelets
• Put the signal (sequence) through two filters
  • Low frequencies
  • High frequencies
• Downsample both by a factor of 2
• Do it in such a way that the original signal can be reconstructed!
[Figure: 100 input samples split into two 50-sample subbands.]
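A minimal sketch using the Haar filters, the simplest filter pair that satisfies the reconstruction requirement (100 input samples become two 50-sample subbands, as in the figure):

```python
import numpy as np

def haar_analysis(x):
    # Lowpass (scaled averages) and highpass (scaled differences),
    # each downsampled by 2; x is assumed to have even length.
    x = np.asarray(x, dtype=float)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return lo, hi

def haar_synthesis(lo, hi):
    # Invert the analysis step exactly: perfect reconstruction
    x = np.empty(2 * len(lo))
    x[0::2] = (lo + hi) / np.sqrt(2.0)
    x[1::2] = (lo - hi) / np.sqrt(2.0)
    return x

x = np.random.default_rng(3).normal(size=100)
lo, hi = haar_analysis(x)
print(len(lo), len(hi))  # 50 50
print("perfect reconstruction:", np.allclose(haar_synthesis(lo, hi), x))
```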
Filterbank Pyramid
[Figure: three-level filterbank pyramid; 1000 input samples split into subbands of 500, 250, 125, and 125 samples.]
2-D Wavelets
• Apply wavelet processing along the rows of the picture
• Apply wavelet processing along the columns of the picture
• Pyramid processing
Lena: Top Level, Next Level
[Figure: wavelet pyramid of Lena; subband values 48.81, 15.45, 9.23, 6.48, 2.52, 1.01, 0.37.]
Decorrelation of Images
• x = (x1, x2, ... xn), a sequence of image gray values
• Preprocess: convert to y = (y1, y2, ... yn), y = Ax, A ~ an orthogonal matrix (A⁻¹ = Aᵀ)
• Theoretical best (for a Gaussian process): A is the Karhunen-Loeve transformation matrix
  • Images are not Gaussian processes
  • The Karhunen-Loeve matrix is image-dependent and computationally expensive to find
  • Evaluating y = Ax with the K-L transformation is computationally expensive
• In practice, we use the DCT (discrete cosine transform) for decorrelation
  • Computationally efficient
  • Almost as good as the K-L transformation
DPCM
• Simple to implement (low complexity)
  • Prediction: 3 multiplications and 2 additions
  • Estimation: 1 addition
  • Encoding: 1 addition + quantization
• Performance for 2-D coding not as good as block quantization
• In theory, with a long past history the performance (rate-distortion) should be as good as other linear methods, but then there is no computational advantage
• Bottom line: useful when complexity is limited
• Important idea: lossy predictive encoding (see the sketch below)
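A minimal DPCM sketch with the simplest "previous value" predictor (one tap, rather than the three-tap predictor counted above) and a uniform quantizer; the test signal and step size are made up. The key point: the encoder predicts from *reconstructed* samples, so encoder and decoder stay in sync despite quantization error:

```python
import numpy as np

def dpcm_encode(x, S):
    codes, prev = [], 0.0
    for sample in x:
        e = sample - prev              # prediction error
        q = int(round(e / S))          # quantize the error
        codes.append(q)
        prev = prev + S * q            # reconstructed sample = next prediction
    return codes

def dpcm_decode(codes, S):
    out, prev = [], 0.0
    for q in codes:
        prev = prev + S * q            # mirror the encoder's reconstruction
        out.append(prev)
    return np.array(out)

t = np.linspace(0, 4 * np.pi, 200)
x = 100 * np.sin(t)                    # smooth, highly correlated signal
codes = dpcm_encode(x, S=4)
x_star = dpcm_decode(codes, S=4)
print("rms error:", np.sqrt(np.mean((x_star - x) ** 2)))  # ~ S/sqrt(12)
```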
Review: Image Decorrelation
• x = (x1, x2, ... xn), a sequence of image gray values
• Preprocess: convert to y = (y1, y2, ... yn), y = Ax, A ~ an orthogonal matrix (A⁻¹ = Aᵀ)
• Theoretical best (for a Gaussian process): A is the Karhunen-Loeve transformation matrix
  • Images are not Gaussian processes
  • The Karhunen-Loeve matrix is image-dependent and computationally expensive to find
  • Evaluating y = Ax with the K-L transformation is computationally expensive
• In practice, we use the DCT (discrete cosine transform) for decorrelation
  • Computationally efficient
  • Almost as good as the K-L transformation
Rate-Distortion: 1-D vs. 2-D Coding
• Theory on the tradeoff between distortion and the minimum number of bits
• The tradeoff is interesting only if the samples are correlated
• "Water-filling" construction to compute R(D)
Review: Block-Based Coding
• Full-image DCT: one set of decorrelated coefficients for the whole image
• Block-based coding:
  • Image divided into 'small' blocks
  • Each block is decorrelated separately
  • Block decorrelation performs almost as well as (sometimes better than?) full-image decorrelation
• Current standards (JPEG, MPEG) use 8x8 DCT blocks
What is the DCT?
• One-dimensional 8-point DCT. Input x0, ... x7; output y0, ... y7:
  y_k = (c_k/2) Σ_{n=0..7} x_n cos[(2n+1)kπ/16], where c_0 = 1/√2 and c_k = 1 for k > 0
• One-dimensional inverse DCT. Input y0, ... y7; output x0, ... x7:
  x_n = Σ_{k=0..7} (c_k/2) y_k cos[(2n+1)kπ/16]
• Matrix form: with x, y as one-column matrices, y = Cx and x = Cᵀy, where C is the orthogonal 8x8 matrix with entries C_kn = (c_k/2) cos[(2n+1)kπ/16]
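A sketch of the 8-point DCT as a matrix, checked against SciPy's orthonormal DCT-II, which coincides with this definition when N = 8:

```python
import numpy as np
from scipy.fft import dct

# Build C[k, n] = (c_k/2) * cos((2n+1)*k*pi/16), c_0 = 1/sqrt(2), else 1
N = 8
k = np.arange(N).reshape(-1, 1)
n = np.arange(N).reshape(1, -1)
c = np.where(k == 0, 1 / np.sqrt(2), 1.0)
C = (c / 2) * np.cos((2 * n + 1) * k * np.pi / (2 * N))

print("orthogonal   :", np.allclose(C @ C.T, np.eye(N)))
x = np.random.default_rng(4).normal(size=N)
print("matches scipy:", np.allclose(C @ x, dct(x, type=2, norm="ortho")))
print("inverse ok   :", np.allclose(C.T @ (C @ x), x))
```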
Two-Dimensional DCT
• Forward 2D DCT. Input x_ij, i = 0, ... 7, j = 0, ... 7; output y_kl, k = 0, ... 7, l = 0, ... 7:
  y_kl = (c_k/2)(c_l/2) Σ_{i=0..7} Σ_{j=0..7} x_ij cos[(2i+1)kπ/16] cos[(2j+1)lπ/16]
• Matrix form, X, Y ~ 8x8 matrices with coefficients x_ij, y_kl: Y = CXCᵀ, X = CᵀYC
• The 2D DCT is separable: apply the 1-D DCT to each row, then to each column
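A separability check, using SciPy's dct/dctn as reference implementations:

```python
import numpy as np
from scipy.fft import dct, dctn

# A 2-D DCT of an 8x8 block equals a 1-D DCT applied along the rows
# followed by a 1-D DCT along the columns (equivalently Y = C X C^T
# with the matrix C from the previous sketch).
X = np.random.default_rng(5).normal(size=(8, 8))

rows = dct(X, type=2, norm="ortho", axis=1)       # 1-D DCT of each row
Y_sep = dct(rows, type=2, norm="ortho", axis=0)   # then of each column
Y_2d = dctn(X, type=2, norm="ortho")              # direct 2-D DCT
print("separable == 2-D:", np.allclose(Y_sep, Y_2d))
```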
General DCT
• One dimension (N points): y_k = c_k sqrt(2/N) Σ_{n=0..N-1} x_n cos[(2n+1)kπ/(2N)], with c_0 = 1/√2 and c_k = 1 for k > 0
• Two dimensions (NxN): y_kl = c_k c_l (2/N) Σ_{i=0..N-1} Σ_{j=0..N-1} x_ij cos[(2i+1)kπ/(2N)] cos[(2j+1)lπ/(2N)]