A Comparative Study of Depth Map Coding Schemes for 3D Video

Harsh Nayyar, Nirabh Regmi, Audrey Wei • March 10th, 2011 • EE 398A: Image and Video Compression • Professor Girod

Overview • Background & Motivation • Research Methodology • Results & Performance Comparisons • Block Transforms (DCT, KLT) • Block Truncation Coding (BTC) • Conclusion • Questions

Background & Motivation • 3D Compression • Issue: Bit rate scales linearly with number of views • Proposed solution: Code 2-3 views along with depth maps to synthesize intermediate views [Wiegand et al.] • Requires good depth maps • Depth Maps • Desirable to preserve edges • Not typical images

Research Methodology • Block Transform Coding • DCT and KLT • Block Truncation Coding • Constant and adaptive block sizes • Distortion measured against the view synthesized from uncompressed depth maps
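
The distortion measure above can be sketched as a PSNR between two synthesized views: the reference view rendered from uncompressed depth maps and the view rendered from coded depth maps. This is a minimal sketch, not the authors' code; the function name `psnr` and the 8-bit peak value of 255 are assumptions, and the view synthesis step itself is not shown.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized images.

    `reference` stands for the view synthesized from uncompressed depth maps,
    `test` for the view synthesized from coded depth maps (both hypothetical inputs).
    The peak of 255 assumes 8-bit samples.
    """
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Measuring against the synthesized reference rather than the original depth map ties the score to what a viewer would actually see in the intermediate view.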

System Overview • Inputs: Left Image (Compressed), Left Depth Map, Right Image (Compressed), Right Depth Map • View Synthesis • Output: Intermediate Image

Evaluation Methodology • Test Sequences: Balloons & Kendo • Depth Maps: Cameras 1 & 3 • Synthesized Views: Camera 2 Acknowledgement: Tanimoto Lab, Nagoya University

Discrete Cosine Transform (DCT) • Block Matrix Sizes: M = 8, 16 • Uniform Quantizer • Step Sizes: $2^1$ to $2^8$ • Entropy Coding • Type used: DCT-II
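
The DCT pipeline on this slide (block DCT-II followed by a uniform quantizer) could look roughly like the sketch below. It assumes an orthonormal, separable DCT-II and a mid-tread quantizer (rounding to the nearest multiple of the step size); the function names `dct_matrix` and `code_block` are hypothetical, and the entropy coder is left out.

```python
import numpy as np

def dct_matrix(M):
    """Orthonormal DCT-II matrix of size M x M (rows are basis vectors)."""
    k = np.arange(M).reshape(-1, 1)  # frequency index
    n = np.arange(M).reshape(1, -1)  # sample index
    C = np.sqrt(2.0 / M) * np.cos(np.pi * (2 * n + 1) * k / (2 * M))
    C[0, :] = np.sqrt(1.0 / M)       # DC row has its own scale factor
    return C

def code_block(block, step):
    """Transform, uniformly quantize, and reconstruct one M x M block."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T                       # separable 2-D DCT-II
    indices = np.round(coeffs / step).astype(int)  # symbols for the entropy coder
    recon = C.T @ (indices * step) @ C             # dequantize and invert
    return indices, recon
```

For example, `code_block(np.full((8, 8), 128.0), step=2.0)` yields a single nonzero index at the DC position, which is why flat depth regions cost very few bits under this scheme.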

Discrete Cosine Transform (cont.) Quantizer step size = $2^1$ Quantizer step size = $2^8$

Discrete Cosine Transform (cont.) balloons error, M = 8, Q = 128

Karhunen-Loeve Transform (KLT) • Block Matrix Sizes: M = 8, 16 • Uniform Quantizer • Step Sizes: $2^1$ to $2^8$ • Entropy Coding • Training Set: composed from both views
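
One way to obtain a KLT basis from a training set drawn from both views is sketched below. It assumes a non-separable KLT on vectorized M x M blocks, with the basis taken as the eigenvectors of the training covariance; the slides do not say whether the authors used this or a separable variant. The helper names `extract_blocks` and `klt_basis` are hypothetical.

```python
import numpy as np

def extract_blocks(image, M):
    """Split an image into non-overlapping M x M blocks, one flattened block per row."""
    H, W = image.shape
    H, W = H - H % M, W - W % M  # drop partial blocks at the borders
    blocks = image[:H, :W].reshape(H // M, M, W // M, M).swapaxes(1, 2)
    return blocks.reshape(-1, M * M).astype(np.float64)

def klt_basis(training_blocks):
    """Return (mean, basis); basis columns are covariance eigenvectors,
    sorted by decreasing eigenvalue."""
    mean = training_blocks.mean(axis=0)
    cov = np.cov(training_blocks - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]
```

With `train = np.vstack([extract_blocks(left, M), extract_blocks(right, M)])`, the coefficients are `(train - mean) @ basis` and the inverse is `coeffs @ basis.T + mean`; quantization with the step sizes listed above would sit between those two steps.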

Karhunen-Loeve Transform (cont.) Quantizer step size = $2^1$ Quantizer step size = $2^8$

Karhunen-Loeve Transform (cont.) balloons error, M = 8, Q = 128

Block Truncation Coding (BTC) • Good at preserving edges • Quantized values per block: a & b • Block Matrix Sizes: M = 2, 4, 8, 16, 32, 64 • Entropy Coding • Quantization rule: if $X_i \le \bar{X}$, output = a; if $X_i > \bar{X}$, output = b, for $i = 1, 2, \ldots, M^2$, where $\bar{X}$ is the block mean and q = # of $X_i$'s > $\bar{X}$
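
A two-level BTC quantizer for a single block might be sketched as follows. It assumes the moment-preserving formulation of Delp and Mitchell (cited in the references), with the block mean as the threshold and levels a and b chosen so that the block mean and variance are preserved. The function names are hypothetical, and a and b are left as floats rather than quantized and entropy coded.

```python
import numpy as np

def btc_encode(block):
    """Return (a, b, bitmap) for one block; bitmap is True where the pixel maps to b."""
    x = np.asarray(block, dtype=np.float64)
    m = x.size
    mean, std = x.mean(), x.std()  # population standard deviation
    bitmap = x > mean              # assumed threshold: the block mean
    q = int(bitmap.sum())          # number of pixels above the threshold
    if q == 0 or q == m:           # flat block: a single level suffices
        return mean, mean, bitmap
    a = mean - std * np.sqrt(q / (m - q))
    b = mean + std * np.sqrt((m - q) / q)
    return a, b, bitmap

def btc_decode(a, b, bitmap):
    """Rebuild the block from its two levels and the bitmap."""
    return np.where(bitmap, b, a)
```

A block that already contains only two values, such as a sharp depth edge, is reconstructed exactly, which is consistent with the edge-preservation claim on this slide.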

Block Truncation Coding (cont.) [Figure annotations: M = 8, M = 4, ~1.1 dB]

Block Truncation Coding (cont.) balloons error, M = 64

Adaptive BTC • Spend bits where necessary • Large blocks handle background (low rate) • Small blocks handle edges (high rate) • Make block size selection based on Lagrangian cost function

Adaptive BTC (cont.) • Lagrangian cost function, $J = D + \lambda R$ • Joint cost of both depth maps • Distortion (D) computed from synthesized view • $\lambda = 2^0$ to $2^8$ • Bit rate (R) calculation • 6 Block sizes (M=2-64): 3 bits • Quantized values, a & b: Entropy coding • Positions of a & b in the block: Run Length Coding & Entropy coding
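
The block-size decision can be illustrated with a small quadtree sketch that keeps a block whole or splits it into four, whichever gives the lower Lagrangian cost (distortion plus a multiplier times rate). Several parts are stand-ins and not the authors' method: distortion here is squared error on the depth block itself rather than on the synthesized view, the rate is a crude fixed-length count (3 bits for block size, 8 bits each for a and b, 1 bit per pixel for the bitmap) instead of run-length and entropy coding, and only one depth map is considered rather than the joint cost of both. All function names are hypothetical.

```python
import numpy as np

def btc_reconstruct(block):
    """Two-level, moment-preserving BTC reconstruction of one block."""
    x = np.asarray(block, dtype=np.float64)
    mean, std = x.mean(), x.std()
    bitmap = x > mean
    q, m = int(bitmap.sum()), x.size
    if q == 0 or q == m:  # flat block
        return np.full(x.shape, mean)
    a = mean - std * np.sqrt(q / (m - q))
    b = mean + std * np.sqrt((m - q) / q)
    return np.where(bitmap, b, a)

def code_region(region, lam, min_size=2):
    """Return (recon, distortion, rate_bits, leaf_count) for a square region,
    choosing between one block and four sub-blocks by Lagrangian cost."""
    M = region.shape[0]
    whole = btc_reconstruct(region)
    d_whole = float(np.sum((region - whole) ** 2))  # stand-in distortion
    r_whole = 3 + 16 + M * M                        # stand-in rate in bits
    if M <= min_size:
        return whole, d_whole, r_whole, 1
    h = M // 2
    recon = np.empty(region.shape)
    d_split, r_split, leaves = 0.0, 0, 0
    for i in (0, h):
        for j in (0, h):
            sub, d, r, n = code_region(region[i:i + h, j:j + h], lam, min_size)
            recon[i:i + h, j:j + h] = sub
            d_split, r_split, leaves = d_split + d, r_split + r, leaves + n
    # Children are evaluated first, then merged if the single block is cheaper.
    if d_whole + lam * r_whole <= d_split + lam * r_split:
        return whole, d_whole, r_whole, 1
    return recon, d_split, r_split, leaves
```

A small multiplier favors splitting wherever it reduces distortion, while a large one pushes the coder toward a few large blocks, matching the idea of spending bits on edges and coding background cheaply.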

Adaptive BTC (cont.) as Mmax increases

Final Results

Final Results (cont.) Balloons error (frame 1) Scheme: DCT (M = 8, Q = 64) PSNR = 37.65 dB Rate = 0.07465 bpp

Final Results (cont.) Balloons error (frame 1) Scheme: Fixed BTC (M=32) PSNR = 38.6070 dB Rate = 0.0703 bpp

Final Results (cont.) Balloons error (frame 1) Scheme: A-BTC (Mmax=64,Q=32) PSNR = 41.4849 dB Rate = 0.0622 bpp

Final Results (cont.)

Conclusion • Depth Maps • Not ordinary images • Important to preserve edges • Adaptive BTC technique can optimally trade off rate and synthesized distortion • Fixed BTC outperforms DCT, KLT without side information about synthesized distortion • Adaptive BTC outperforms DCT, KLT, Fixed BTC

Future Work • Adaptive BTC • Joint Lagrangian cost based on all possible ways of breaking down blocks in pair of views • Our implementation is sub-optimal • Investigate heuristics to perform block sub-division top-down rather than bottom-up • Preserve higher moments in BTC • Only preserved 2nd moment • Larger block sizes • Only used up to Mmax = 64

References • N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Trans. Comput., vol. C-23, pp. 90-93, 1974. • Balloons & Kendo Sequences, Nagoya University Tanimoto Laboratory, http://www.tanimoto.nuee.nagoya-u.ac.jp/. • E. Delp and O. Mitchell, “Image compression using block truncation coding,” IEEE Trans. Commun., vol. 27, no. 9, pp. 1335-1342, Sep. 1979. • Z. Li and M. Drew, “Karhunen-Loeve Transform,” in Fundamentals of Multimedia. Upper Saddle River: Pearson Education, 2004, ch. 8, sec. 5.2, pp. 220-222. • P. Merkle, Y. Morvan, A. Smolic, D. Farin, K. Müller, P. H. N. de With, and T. Wiegand, “The effects of multiview depth video compression on multiview rendering,” Signal Process., Image Commun., vol. 24, no. 1+2, pp. 73-88, Jan. 2009. • K. Müller, P. Merkle, and T. Wiegand, “3-D video representation using depth maps,” Proceedings of the IEEE, vol. PP, no. 99, pp. 1-14, 2010.
