Transform Coding. Heejune AHN, Embedded Communications Laboratory, Seoul National Univ. of Technology, Fall 2013. Last updated 2013. 9. 30.
Agenda • Transform Coding Concept • Transform Theory Review • DCT (Discrete Cosine Transform) • DCT in Video coding • DCT Implementation & Fast Algorithms • Appendix: KL Transform
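1. Transform Coding • X1 = lum(2n), X2 = lum(2n+1): luminance values of two neighboring pixels • X1 ~ U(0, 255), X2 ~ U(0, 255): taken one at a time, both look uniform • So quantizing X1 and X2 separately needs the same quantizer (and the same number of bits) for each • But X1 and X2 are strongly cross-correlated: neighboring pixels usually have similar values • Transform (X1, X2) into (Y1, Y2) by a 45-degree rotation (up to a scale factor) • Y1 = (X1 + X2) / 2 • Average, or DC value • Y2 = (X2 - X1) / 2 • Difference, or AC value • Y1 takes values in [0, 255]; Y2 takes values in [-127.5, 127.5] but is concentrated around 0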
Which ones are easier to encode (quantize)? [Figure: histograms f(X1), f(X2) of the pixel values and f(Y1), f(Y2) of the transformed values]
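To make the question above concrete, here is a small Python sketch (an illustration, not part of the original slides) that applies the 2-point average/difference transform to pixel pairs. The source is hypothetical: X1 is uniform on 0..255 and X2 is X1 plus small noise in [-8, 8], clipped to [0, 255]. Real images differ, but the effect on the variance of Y2 is the point.

```python
import random

def pair_transform(x1, x2):
    """Forward 2-point transform: average (DC) and half-difference (AC)."""
    return (x1 + x2) / 2, (x2 - x1) / 2

def inverse_pair_transform(y1, y2):
    """Exact inverse: x1 = y1 - y2, x2 = y1 + y2."""
    return y1 - y2, y1 + y2

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# Hypothetical correlated source: X2 stays close to its neighbour X1.
rng = random.Random(0)
pairs = []
for _ in range(10000):
    x1 = rng.randint(0, 255)
    x2 = min(255, max(0, x1 + rng.randint(-8, 8)))
    pairs.append((x1, x2))

ys = [pair_transform(x1, x2) for x1, x2 in pairs]
for name, values in [("X1", [p[0] for p in pairs]), ("X2", [p[1] for p in pairs]),
                     ("Y1", [y[0] for y in ys]), ("Y2", [y[1] for y in ys])]:
    print(name, round(variance(values), 1))
```

X1, X2 and Y1 all have variances in the thousands, while Y2 stays in the single digits, so Y2 can be quantized with far fewer levels for the same error. The transform itself loses nothing: the inverse recovers every pair exactly.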
Origins of Transform Coding Benefits • Signal Theory • Makes the representation easier to manipulate • Energy concentration: most of the signal energy is packed into a few coefficients • Image and HVS Properties • The HVS is more sensitive to low frequencies • Use a finer (denser) quantizer for low frequencies • [Image: Vilfredo Pareto, economist, 1848-1923]
2. Transform Theory Review • Definition of Transform • M-to-N mapping: $[Y_1, Y_2, \ldots, Y_N] = F[X_1, X_2, \ldots, X_M]$ • Linear Transform (cf. Non-Linear Transform) • If $[Y_{11}, Y_{12}] = F[X_{11}, X_{12}]$ and $[Y_{21}, Y_{22}] = F[X_{21}, X_{22}]$, then $[Y_{11} + Y_{21}, Y_{12} + Y_{22}] = F[X_{11} + X_{21}, X_{12} + X_{22}]$ (and $F[aX] = aF[X]$ for any scalar $a$) • Matrix representation of a linear transform • Forward: $y = T x$, where $y$ is the vector of N transform coefficients, $T$ is the NxN transform matrix, and $x$ is the input signal block of size N arranged as a vector • Inverse: $x = T^{-1} y$
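A minimal Python sketch of the matrix form, reusing the 2-point average/difference transform (Y1, Y2) from section 1 as $T$. Exact rational arithmetic (`fractions.Fraction`) keeps the checks free of rounding; the names `matvec`, `T` and `T_inv` are illustrative only.

```python
from fractions import Fraction

def matvec(T, x):
    """Forward linear transform y = T x for a matrix stored as a list of rows."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

# Forward matrix of the 2-point average/difference transform from section 1.
half = Fraction(1, 2)
T = [[half, half],
     [-half, half]]
# Its inverse, worked out by hand: x1 = y1 - y2, x2 = y1 + y2.
T_inv = [[1, -1],
         [1, 1]]

xa, xb = [100, 104], [30, 20]
ya, yb = matvec(T, xa), matvec(T, xb)

# Superposition: T (xa + xb) equals T xa + T xb.
lhs = matvec(T, [a + b for a, b in zip(xa, xb)])
rhs = [a + b for a, b in zip(ya, yb)]
print(lhs == rhs, [str(v) for v in ya], matvec(T_inv, ya) == xa)
```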
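Basis Vectors • Orthogonal • $v_l \cdot v_m = 0$ for $l \neq m$, for basis vectors $v_1, v_2, \ldots, v_N$ • The basis vectors are mutually perpendicular: none has a component along another • Orthonormal • Orthogonal, and $\|v_l\| = 1$ for every basis vector • With the basis vectors as the rows of $T$: $T^{-1} = T^T = [v_1\ v_2\ \cdots\ v_N]$, so $x = T^{-1} y = T^T y$ • Parseval's Theorem • Signal power/energy is conserved between the signal and transform domains: $\|y\|^2 = y^T y = x^T T^T T x = x^T x = \|x\|^2$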
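The orthonormality and Parseval properties can be checked numerically. This sketch assumes the orthonormal DCT-II matrix as a concrete example of $T$ (the slide states the properties for any orthonormal transform); the helper names are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix T; row k is the basis vector v_k."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]

def matvec(T, x):
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def transpose(T):
    return [list(col) for col in zip(*T)]

T = dct_matrix(8)
# Orthonormality: v_l . v_m is 1 for l == m and 0 otherwise.
max_err = max(abs(sum(a * b for a, b in zip(T[l], T[m])) - (1.0 if l == m else 0.0))
              for l in range(8) for m in range(8))

x = [52, 55, 61, 66, 70, 61, 64, 73]       # an arbitrary row of pixel values
y = matvec(T, x)                           # forward: y = T x
x_back = matvec(transpose(T), y)           # inverse: x = T^T y (no matrix inversion needed)
print(max_err, sum(v * v for v in x), round(sum(v * v for v in y), 6))
```

Because $T^{-1} = T^T$, the inverse is just a transposed matrix product, and the energy printed for $x$ and $y$ agrees up to floating-point rounding.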
2D Transform • Data • 2D pixel value matrix, 2D transform coefficient matrix • Each 2D matrix is arranged as a 1D vector • Forward transform: $y = T x$, where $y$ holds the NxN transform coefficients arranged as a vector (length $N^2$), $T$ is the transform matrix of size $N^2 \times N^2$, and $x$ is the NxN input signal block arranged as a vector • Inverse transform: $x = T^{-1} y$
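A sketch showing that the $N^2 \times N^2$ matrix view and the usual matrix form agree: for a separable transform with 1-D matrix $C$ and row-major vectorization, the big matrix is the Kronecker product $C \otimes C$, and $y = (C \otimes C)\,x$ equals the row-major vector of $C X C^T$. The DCT as $C$, the 4x4 size and the block values are assumptions for illustration.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is basis vector k."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]

def kron(A, B):
    """Kronecker product A (x) B of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

N = 4
C = dct_matrix(N)
T2 = kron(C, C)                                                    # N^2 x N^2 transform matrix
X = [[(3 * r + 5 * s) % 17 for s in range(N)] for r in range(N)]   # arbitrary pixel block
x_vec = [v for row in X for v in row]                              # row-major vector, length N^2
y_vec = [sum(t * v for t, v in zip(row, x_vec)) for row in T2]     # y = T x

Y = matmul(matmul(C, X), [list(col) for col in zip(*C)])           # C X C^T
print(len(T2), max(abs(a - b) for a, b in zip(y_vec, [v for row in Y for v in row])))
```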
3. Transforms • Various transforms used in image compression • DFT (Discrete Fourier Transform) • DCT (Discrete Cosine Transform) • DST (Discrete Sine Transform) • Hadamard Transform • Discrete Wavelet Transform • and more (Haar, etc.)
Hadamard transform • Core matrix • 1-D • N-dimensional • 2-D • Transform
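The slide's matrices did not survive extraction, so this sketch assumes the common Sylvester construction: core matrix $H_2 = [[1, 1], [1, -1]]$, $H_{2N} = [[H_N, H_N], [H_N, -H_N]]$, orthonormal scaling $1/\sqrt{N}$, and the 2-D transform $Y = H X H^T / N$. Function names are hypothetical.

```python
def hadamard(n):
    """Unnormalized Hadamard matrix (entries +1/-1) by Sylvester's recursion; n a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def wht_2d(block):
    """2-D Walsh-Hadamard transform Y = H X H^T / N (orthonormal scaling)."""
    n = len(block)
    H = hadamard(n)
    Y = matmul(matmul(H, block), transpose(H))   # additions and subtractions only
    return [[v / n for v in row] for row in Y]

block = [[10, 12, 11, 13], [10, 12, 11, 13], [9, 11, 10, 12], [9, 11, 10, 12]]
print(wht_2d(block))
```

Since every entry of $H$ is +1 or -1, the transform needs no real multiplications apart from the final scaling, which is why the Hadamard transform is so cheap.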
DCT Transform • 1D Forward DCT (pixel domain to frequency domain) • 1D Inverse DCT (frequency domain to pixel domain)
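The slide's equations are missing. This sketch assumes the common orthonormal DCT-II: $X(k) = c(k) \sum_{n=0}^{N-1} x(n) \cos\frac{(2n+1)k\pi}{2N}$ and $x(n) = \sum_{k=0}^{N-1} c(k) X(k) \cos\frac{(2n+1)k\pi}{2N}$, with $c(0) = \sqrt{1/N}$ and $c(k) = \sqrt{2/N}$ for $k > 0$ (for N = 8 this matches the JPEG/MPEG scaling $C(u)/2$ with $C(0) = 1/\sqrt{2}$).

```python
import math

def dct_1d(x):
    """Forward 1-D DCT-II (pixel domain -> frequency domain), orthonormal scaling."""
    n = len(x)
    out = []
    for k in range(n):
        ck = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(ck * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                            for i in range(n)))
    return out

def idct_1d(X):
    """Inverse 1-D DCT (frequency domain -> pixel domain)."""
    n = len(X)
    return [sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)) * X[k]
                * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(n))
            for i in range(n)]

row = [52, 55, 61, 66, 70, 61, 64, 73]
coefs = dct_1d(row)
print([round(c, 2) for c in coefs])
print([round(v, 6) for v in idct_1d(coefs)])
```

The DC coefficient is the row sum scaled by $1/\sqrt{N}$, and a constant row produces only a DC coefficient.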
2D DCT • 2D DCT basis Functions • Coef. Distribution • DC ~ Uniform dist., AC ~ Laplacian dist.
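Each 2-D DCT basis function is the outer product of a vertical and a horizontal 1-D cosine. This sketch (assuming the orthonormal scaling) builds basis image (u, v) and computes the direct 2-D DCT as the inner product of a block with each basis image. A block equal to one basis image therefore has exactly one non-zero coefficient.

```python
import math

def c(k, n):
    """DCT normalisation factor: sqrt(1/n) for k = 0, sqrt(2/n) otherwise."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def basis_image(u, v, n=8):
    """2-D DCT basis function (u: vertical frequency, v: horizontal frequency) as an n x n array."""
    return [[c(u, n) * c(v, n)
             * math.cos(math.pi * (2 * r + 1) * u / (2 * n))
             * math.cos(math.pi * (2 * s + 1) * v / (2 * n))
             for s in range(n)] for r in range(n)]

def dct_2d(block):
    """Direct 2-D DCT: F(u, v) is the inner product of the block with basis image (u, v)."""
    n = len(block)
    F = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            b = basis_image(u, v, n)
            F[u][v] = sum(block[r][s] * b[r][s] for r in range(n) for s in range(n))
    return F

F = dct_2d(basis_image(2, 3))
peak = max(((u, v) for u in range(8) for v in range(8)), key=lambda uv: abs(F[uv[0]][uv[1]]))
print(peak, round(F[2][3], 6))
```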
Properties • Orthonormal transform • Separable transform • Real-valued coefficients • DCT performance • Very close to the KLT for image input • Image input model (first-order Markov chain) • $x_{n+1} = \rho x_n + e_n$ • DCT complexity • 2D DCT = 1D DCT in the vertical direction followed by 1D DCT in the horizontal direction • Not used for 3D (e.g. across frames) because of delay and memory size • DCT size (4x4, 8x8, 16x16, 32x32, ...) • Larger: better performance, but blocking artifacts (?) and higher HW complexity
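The first-order Markov model explains why the DCT works well. A unit-variance AR(1) source has covariance $R[i][j] = \rho^{|i-j|}$, and the coefficient covariance is $R_Y = T R T^T$. This sketch assumes $\rho = 0.95$ and N = 8 (hypothetical but typical values for images) and shows how much of the energy the DC coefficient captures.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is basis vector k."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]

def ar1_covariance(n, rho):
    """Covariance of a unit-variance first-order Markov (AR(1)) source: R[i][j] = rho^|i-j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def transform_covariance(T, R):
    """R_Y = T R T^T: covariance of the transform coefficients."""
    m = len(R)
    TR = [[sum(T[i][k] * R[k][j] for k in range(m)) for j in range(m)] for i in range(len(T))]
    return [[sum(TR[i][k] * T[j][k] for k in range(m)) for j in range(len(T))]
            for i in range(len(T))]

N, RHO = 8, 0.95
R_x = ar1_covariance(N, RHO)
R_y = transform_covariance(dct_matrix(N), R_x)
variances = [R_y[k][k] for k in range(N)]
print([round(v, 3) for v in variances])
print("DC share of energy:", round(variances[0] / sum(variances), 3))
```

The total variance is unchanged (the trace is preserved by an orthonormal $T$), but it is redistributed so that low-frequency coefficients carry almost all of it. Even and odd coefficients come out exactly uncorrelated, because the even basis vectors are symmetric and the odd ones antisymmetric.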
Coding Performance of DCT • [Figure: comparison of 1-D basis functions for block size N=8: Karhunen-Loève transform [1948/1960], Haar transform [1910], Walsh-Hadamard transform [1923], Slant transform [Enomoto, Shibata, 1971], Discrete Cosine Transform (DCT) [Ahmed, Natarajan, Rao, 1974]]
Energy concentration Performance • measured for typical natural images, block size 1x32 • KLT is optimum • DCT performs only slightly worse than KLT
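The slide's measurement uses natural images; as a rough model-based stand-in, this sketch compares transforms by coding gain (arithmetic over geometric mean of the coefficient variances) for an assumed AR(1) source with $\rho = 0.95$ and N = 8. The identity (no transform) gives 0 dB; the KLT is left out here because it needs an eigen-decomposition.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is basis vector k."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]

def hadamard_matrix(n):
    """Orthonormal Walsh-Hadamard matrix (Sylvester order), n a power of two."""
    H = [[1.0]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return [[v / math.sqrt(n) for v in row] for row in H]

def coefficient_variances(T, rho):
    """Diagonal of T R T^T for an AR(1) covariance R[i][j] = rho^|i-j|."""
    n = len(T)
    return [sum(T[k][i] * T[k][j] * rho ** abs(i - j) for i in range(n) for j in range(n))
            for k in range(n)]

def coding_gain_db(variances):
    """Transform coding gain: arithmetic mean over geometric mean of the variances, in dB."""
    n = len(variances)
    arith = sum(variances) / n
    geo = math.exp(sum(math.log(v) for v in variances) / n)
    return 10 * math.log10(arith / geo)

N, RHO = 8, 0.95
identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
gains = {name: coding_gain_db(coefficient_variances(T, RHO))
         for name, T in [("identity", identity), ("WHT", hadamard_matrix(N)), ("DCT", dct_matrix(N))]}
for name, g in gains.items():
    print(f"{name:8s} {g:5.2f} dB")
```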
Complexity Performance of DCT • Separation of the 2D DCT • Cascade of 1-D DCTs • Reduces the complexity (multiplications) from $O(N^4)$ to $O(N^3)$ • 8x8 DCT • Direct form: 64 multiplications for each of the 64 coefficients (4096 in total) • Separable form: 2 passes x 64 coefficients x 8 multiplications (1024 in total) • Can you derive this? • [Figure: NxN block of pixels, row-wise N-point transforms, column-wise N-point transforms, NxN block of transform coefficients]
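One way to derive the counts is to instrument both forms. In this sketch (assuming the orthonormal DCT and precomputed basis weights, so table lookups are not counted), the direct 2-D form performs one multiplication per (coefficient, pixel) pair, while the separable form runs 8 row DCTs and 8 column DCTs of 8 x 8 multiplications each.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is basis vector k."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
            for k in range(n)]

def dct_2d_direct(X, C, count):
    """Direct form: F(u, v) = sum over r, s of W[u][v][r][s] * X[r][s], one multiply per term."""
    n = len(X)
    # Precomputed 2-D basis weights (a lookup table, not counted as multiplications).
    W = [[[[C[u][r] * C[v][s] for s in range(n)] for r in range(n)] for v in range(n)]
         for u in range(n)]
    F = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            for r in range(n):
                for s in range(n):
                    F[u][v] += W[u][v][r][s] * X[r][s]
                    count["mul"] += 1
    return F

def dct_1d(vec, C, count):
    """One N-point DCT: N multiplications per output coefficient."""
    out = []
    for row in C:
        acc = 0.0
        for c, x in zip(row, vec):
            acc += c * x
            count["mul"] += 1
        out.append(acc)
    return out

def dct_2d_separable(X, C, count):
    """Row-wise 1-D DCTs, then column-wise 1-D DCTs: F = C X C^T."""
    rows = [dct_1d(r, C, count) for r in X]                      # X C^T
    cols = [dct_1d(list(col), C, count) for col in zip(*rows)]   # columns of C X C^T
    return [list(r) for r in zip(*cols)]

N = 8
C = dct_matrix(N)
X = [[(r * 37 + s * 11) % 256 for s in range(N)] for r in range(N)]   # arbitrary 8x8 block
direct_count, separable_count = {"mul": 0}, {"mul": 0}
F_direct = dct_2d_direct(X, C, direct_count)
F_sep = dct_2d_separable(X, C, separable_count)
print(direct_count["mul"], separable_count["mul"])
```

The direct form costs $N^4 = 4096$ multiplications and the separable form $2N^3 = 1024$, with identical results.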
4. Transform in Image Coding • Transform coding procedure • Transform T(x): usually invertible • Quantization: not invertible, introduces distortion • Entropy encoder and decoder: lossless
DCT in Image Coding • [Figure: encoder: original 8x8 block -> DCT -> transformed 8x8 block -> Q -> zig-zag scan -> run-level coding -> transmission; decoder: inverse zig-zag scan -> scaling and inverse DCT -> reconstructed 8x8 block]
DCT in Image Coding • Uniform deadzone quantizer • Transform coefficients that fall below a threshold are set to zero (discarded) • Entropy coding • The positions of the non-zero transform coefficients are transmitted in addition to their amplitude values • Efficient encoding of the positions of non-zero transform coefficients: zig-zag scan + run-level coding
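The quantize, scan and run-level steps can be sketched as follows. The quantizer rule (truncation toward zero, which gives a dead zone of width 2 x step around zero), the 4x4 block size and the coefficient values are all hypothetical; real codecs use per-coefficient step sizes, rounding offsets and VLC tables for the (run, level) pairs.

```python
def deadzone_quantize(block, step):
    """Uniform quantizer with a dead zone: truncation toward zero, so |c| < step maps to 0."""
    return [[int(c / step) for c in row] for row in block]

def zigzag_order(n):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col indexes an anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            cells.reverse()                    # even diagonals run bottom-left to top-right
        order.extend(cells)
    return order

def run_level(values):
    """(zero-run, level) pairs for the non-zero values, closed by an end-of-block marker."""
    pairs, run = [], 0
    for level in values:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")                        # trailing zeros are implied by EOB
    return pairs

# Hypothetical 4x4 block of DCT coefficients (8x8 works the same way).
coefs = [[80, -31, 24, 0],
         [12, 9, 0, 0],
         [-6, 0, 2, 0],
         [0, 0, 0, 1]]
q = deadzone_quantize(coefs, step=10)
scanned = [q[r][c] for r, c in zigzag_order(4)]
dc, ac = scanned[0], scanned[1:]               # DC is usually coded separately
print(dc, run_level(ac))
```

The zig-zag scan visits low frequencies first, so the surviving coefficients cluster at the start and the long tail of zeros collapses into a single EOB.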
DCT Examples • Note that only a few coefficients have sizable values.
[Figure: DCT coding with increasingly coarse quantization, block size 8x8; quantizer step size for AC coefficients: 25, 100, 200]
5. Implementation • Implementation issues • HW or SW • Computational cost, speed, implementation size • Performance vs. cost • Implementation complexity • SW implementation decision factors • Computational cost of multiplication • Fixed- or floating-point operation (esp. multiplication) • Special coprocessors and instruction sets (e.g. MMX)
Fast DCT Algorithm • Original DCT/IDCT • Computation load • 64 additions + 64 multiplications per coefficient (2D, direct form) • 8 (7) additions + 8 multiplications per coefficient (1D 8-point, from the equation) • Scaling • Input range [0, 255] => output needs about 12 bits (the DC coefficient of the orthonormal 8x8 DCT alone reaches 8 x 255 = 2040) • Fast DCT • Similar to the fast DFT (FFT) • Shares common computation between nodes • O(N x N) => O(N log2 N) • N: width (number of coefficients) • log2 N: number of algorithm stages • Several versions: Chen, Lee, Arai, etc.
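As a sketch of the "similar to the fast DFT" idea, this computes the orthonormal DCT-II in O(N log2 N) through one N-point radix-2 FFT, using Makhoul's input reordering. This is a different fast route from the Chen, Lee and Arai flowgraphs named on the slide, and it is checked against the O(N^2) direct formula.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT (length must be a power of two): O(N log2 N)."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def scale(k, n):
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def fast_dct(x):
    """Orthonormal DCT-II from one N-point FFT (Makhoul's reordering)."""
    n = len(x)
    v = list(x[0::2]) + list(x[1::2])[::-1]    # even samples, then odd samples reversed
    V = fft([complex(s) for s in v])
    return [scale(k, n) * (cmath.exp(-1j * math.pi * k / (2 * n)) * V[k]).real
            for k in range(n)]

def direct_dct(x):
    """O(N^2) reference straight from the DCT equation."""
    n = len(x)
    return [scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                              for i in range(n)) for k in range(n)]

row = [52, 55, 61, 66, 70, 61, 64, 73]
print(max(abs(a - b) for a, b in zip(fast_dct(row), direct_dct(row))))
```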
Chen’s FDCT • See code at http://www.cmlab.csie.ntu.edu.tw/~chenhsiu/tech/fastdct.cpp
How does the fast algorithm work? • Exploiting the symmetry of the cosine function • STEP 1 • STEP 2
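The slide's two steps are not recoverable in detail. One common way to exploit the cosine symmetry (assumed here, not necessarily the slide's exact flowgraph) is this: mirrored samples $x(n)$ and $x(N-1-n)$ meet the same cosine magnitude, with equal sign for even $k$ and opposite sign for odd $k$. So a first butterfly stage of sums and differences halves the multiplications. The sketch uses the unnormalized DCT-II (no $c(k)$ factors).

```python
import math

def dct_unnormalized(x):
    """Direct N-point DCT-II without the c(k) scale factors: N*N multiplications."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
            for k in range(n)]

def dct_even_odd(x):
    """Same result using the cosine symmetry; returns (coefficients, multiplications used)."""
    n = len(x)
    half = n // 2
    # STEP 1 (butterflies): sums and differences of mirrored samples, additions only.
    s = [x[i] + x[n - 1 - i] for i in range(half)]
    d = [x[i] - x[n - 1 - i] for i in range(half)]
    out = [0.0] * n
    mults = 0
    # STEP 2: even coefficients need only s, odd coefficients need only d.
    for k in range(half):
        out[2 * k] = sum(s[i] * math.cos(math.pi * (2 * i + 1) * (2 * k) / (2 * n))
                         for i in range(half))
        out[2 * k + 1] = sum(d[i] * math.cos(math.pi * (2 * i + 1) * (2 * k + 1) / (2 * n))
                             for i in range(half))
        mults += 2 * half
    return out, mults

row = [52, 55, 61, 66, 70, 61, 64, 73]
fast, mults = dct_even_odd(row)
print(mults, max(abs(a - b) for a, b in zip(fast, dct_unnormalized(row))))
```

For N = 8 this uses 32 multiplications instead of 64. The even half is itself an N/2-point DCT of `s`, so applying the same trick recursively gives the log2 N stages of the fast algorithms.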
HW Implementation • 2D DCT using a 1D DCT function block • [Figure: block diagram with input samples, MUX, 1-D DCT, 8x8 RAM (row-order input, column-order output) and output coefficients]
Distributed Arithmetic DCT • Multiplier-less architecture • Lookups, shifts and accumulators only • [Figure: 4 bits from u input -> LUT (ROM) -> add or subtract -> accumulator with Shift (2^-1) -> output coef Fx]
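A behavioral sketch of distributed arithmetic for one 4-input inner product $\sum_i A_i x_i$, as used inside a DA-based DCT. The ROM stores all 16 partial sums of the fixed coefficients. Each cycle, one bit from each of the 4 inputs forms the ROM address, and the looked-up value is shifted by the bit weight and accumulated; the two's-complement sign bit is subtracted. The integer weights below are hypothetical scaled cosines, and the sketch shifts each partial sum left (LSB first) rather than shifting the accumulator right as in the figure, which is arithmetically equivalent.

```python
def build_rom(coeffs):
    """LUT/ROM contents: entry `addr` is the sum of coeffs[i] for every bit i set in addr."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]

def da_inner_product(coeffs, xs, bits):
    """sum(coeffs[i] * xs[i]) for `bits`-wide two's-complement xs using lookups, shifts and adds."""
    rom = build_rom(coeffs)
    acc = 0
    for j in range(bits):                          # one input bit-plane per cycle, LSB first
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> j) & 1) << i            # bit j of input i becomes address bit i
        partial = rom[addr] << j                   # weight the ROM output by 2^j
        # The sign bit of a two's-complement number has weight -2^(bits-1): subtract it.
        acc = acc - partial if j == bits - 1 else acc + partial
    return acc

# Hypothetical integer-scaled cosine weights for one 4-input DCT output.
weights = [64, 89, 83, 75]
samples = [12, -7, 100, -128]                      # 8-bit two's-complement inputs
print(da_inner_product(weights, samples, bits=8), sum(w * x for w, x in zip(weights, samples)))
```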
IDCT Mismatch • Is DCT x IDCT = I ? • The DCT is defined in floating point and in direct form • Integer implementations introduce error after the inverse DCT • Different IDCT implementations produce different errors • DCT mismatch in MC-DCT (motion-compensated DCT) coding • The encoder and decoder end up with different reference images • Each error is very small, but it accumulates • [Figure: encoder (orgE -> DCT -> Q -> VLC, with IQ -> IDCT_E giving recE) and decoder (VLD -> IQ -> IDCT_D giving recD); recE and recD should be equal but mismatch]
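A toy sketch of why a tiny mismatch matters in a motion-compensated loop. The encoder and decoder use hypothetical IDCTs that differ only in how they round an exact .5 result; each frame adds the same decoded DC-only residual to the reference. An orthonormal 8x8 IDCT of a DC-only block gives every pixel the value DC / 8, so DC = 4 lands exactly on 0.5.

```python
import math

def idct_dc_only(dc):
    """Orthonormal 8x8 IDCT of a block whose only non-zero coefficient is DC: each pixel is dc / 8."""
    return dc / 8

def encoder_round(v):
    """Hypothetical encoder IDCT: rounds halves up."""
    return math.floor(v + 0.5)

def decoder_round(v):
    """Hypothetical decoder IDCT: rounds halves to even (Python's built-in round)."""
    return round(v)

rec_enc = rec_dec = 100          # both start from the same reference pixel value
for frame in range(1, 6):
    residual_dc = 4              # same coded prediction-error DC every frame; 4 / 8 = 0.5
    rec_enc += encoder_round(idct_dc_only(residual_dc))
    rec_dec += decoder_round(idct_dc_only(residual_dc))
    print(f"frame {frame}: recE={rec_enc} recD={rec_dec} drift={rec_enc - rec_dec}")
```

The per-frame difference is a single gray level, but because each reconstruction becomes the next reference, the drift grows by one every predicted frame until an intra frame resets it.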
IDCT Mismatch Control • The minimum accuracy of the IDCT algorithm is defined in the spec • H.261/H.263, MPEG-1/2: restrict the sum of the coefficient values • Oddification rule on the sum of all DCT coefficients • Adjust the LSB of F[63], the last coefficient • The decoder checks and corrects the values • H.264 • A (modified) integer DCT is used whose inverse is specified exactly, so encoder and decoder reconstructions match
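A sketch of the sum-oddification rule described above, modelled on the MPEG-2 style mismatch control: if the sum of all 64 dequantized coefficients is even, toggle the LSB of the last coefficient F[7][7] (F[63]) so the sum becomes odd. The usual rationale is to avoid coefficient blocks whose ideal IDCT output lands exactly halfway between two integers, where different IDCTs are most likely to round differently. Take the exact rule for a given codec (including MPEG-1's per-coefficient oddification) from its specification.

```python
def mismatch_control(block):
    """Return a copy of an 8x8 coefficient block whose coefficient sum is odd."""
    F = [row[:] for row in block]
    total = sum(sum(row) for row in F)
    if total % 2 == 0:
        # Toggle the LSB of F[7][7]: odd values move down by 1, even values move up by 1.
        if F[7][7] % 2 != 0:
            F[7][7] -= 1
        else:
            F[7][7] += 1
    return F

block = [[0] * 8 for _ in range(8)]
block[0][0] = 4                      # DC-only block: coefficient sum 4 (even), IDCT gives 0.5 everywhere
fixed = mismatch_control(block)
print(sum(map(sum, fixed)), fixed[7][7])
```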
Appendix KL Transform, The Optimal Transform
Optimal Transform • Optimality • (No) redundancy in the input signal => (no) redundant quantization results • No cross-correlation between different components (coefficients) • K-L (Karhunen-Loève) transform • Assumption • The input covariance $R_{X,X}$ is given • Problem definition • Find a transform $Y = T X$ such that $R_{Y,Y} = T R_{X,X} T^T$ is a diagonal matrix (i.e., the components of $Y$ are completely uncorrelated)
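Optimal Transform • Solution • Build $T$ with the eigenvectors $t_1, \ldots, t_N$ of $R_{X,X}$ as its basis vectors (rows): $T = [t_1\ t_2\ \cdots\ t_N]^T$ • Then, by the definition of eigenvectors and eigenvalues of $R_{X,X}$: $R_{X,X} t_k = \lambda_k t_k$, with $t_k^T t_l = 0$ for $k \neq l$ and $t_k^T t_k = 1$ ($R_{X,X}$ is symmetric, so orthonormal eigenvectors exist) • So $R_{Y,Y} = T R_{X,X} T^T = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$ • Issues with the KLT • $R_{X,X}$ varies from image to image: a new $T$ must be calculated and transmitted to the decoder • Not separable (vertical, horizontal) • But it is a good benchmark for the performance of other transforms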
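The KLT recipe can be sketched in pure Python. The covariance is an assumption: the first-order Markov model from the DCT properties slide with $\rho = 0.95$ and N = 8. The eigenvectors come from a cyclic Jacobi eigenvalue iteration, chosen only to keep the example dependency-free; with NumPy, `numpy.linalg.eigh` would be the usual choice. Placing the eigenvectors as rows of $T$ makes $T R T^T$ diagonal.

```python
import math

def jacobi_eigen(A, sweeps=50, tol=1e-20):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V) where column i of V is the eigenvector for eigenvalue i."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the rotated A[p][q] becomes zero.
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):                       # A <- A J (columns p, q)
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                for k in range(n):                       # A <- J^T A (rows p, q)
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
                for k in range(n):                       # accumulate eigenvectors: V <- V J
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
    return [A[i][i] for i in range(n)], V

N, RHO = 8, 0.95
R = [[RHO ** abs(i - j) for j in range(N)] for i in range(N)]   # assumed R_XX (AR(1) model)
eigenvalues, V = jacobi_eigen(R)
T = [list(col) for col in zip(*V)]                              # KLT basis vectors as rows
R_y = [[sum(T[a][i] * R[i][j] * T[b][j] for i in range(N) for j in range(N))
        for b in range(N)] for a in range(N)]
off_diag = max(abs(R_y[a][b]) for a in range(N) for b in range(N) if a != b)
print(sorted((round(v, 4) for v in eigenvalues), reverse=True), off_diag)
```

The off-diagonal entries of $T R T^T$ vanish up to rounding, which is exactly the KLT's defining property. The price is that $T$ depends on $R_{X,X}$, so it would have to be recomputed and sent for every new image.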