A Comparative Study of Depth Map Coding Schemes for 3D Video Harsh Nayyar, Nirabh Regmi, Audrey Wei March 10th, 2011 EE 398A: Image and Video Compression Professor Girod
Overview • Background & Motivation • Research Methodology • Results & Performance Comparisons • Block Transforms (DCT, KLT) • Block Truncation Coding (BTC) • Conclusion • Questions
Background & Motivation • 3D Compression • Issue: Bit rate scales linearly with number of views • Proposed solution: Code 2-3 views along with depth maps to synthesize intermediate views [Wiegand et al.] • Requires good depth maps • Depth Maps • Desirable to preserve edges • Not typical images
Research Methodology • Block Transform Coding • DCT and KLT • Block Truncation Coding • Constant and adaptive block sizes • Distortion measured on the synthesized view, relative to the view synthesized from uncompressed depth maps
System Overview [Diagram: Left Image (Compressed), Left Depth Map, Right Image (Compressed), and Right Depth Map feed View Synthesis, which outputs the Intermediate Image]
Evaluation Methodology • Test Sequences: Balloons & Kendo • Depth Maps: Cameras 1 & 3 • Synthesized Views: Camera 2 Acknowledgement: Tanimoto Lab, Nagoya University
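The distortion measure described above can be sketched as a PSNR computed between two synthesized views: one rendered from uncompressed depth maps (the reference) and one rendered from coded depth maps. This is a minimal sketch, not the authors' evaluation code; the function names `mse` and `psnr`, the list-of-rows image layout, and the 8-bit peak value of 255 are assumptions.

```python
import math

def mse(reference, test):
    """Mean squared error between two equally sized images (lists of rows)."""
    total = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    return total / count

def psnr(reference, test, peak=255.0):
    """PSNR in dB; `peak` assumes 8-bit samples (an assumption of this sketch)."""
    err = mse(reference, test)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / err)
```

Here `reference` would be the Camera 2 view synthesized from the uncompressed Camera 1 and 3 depth maps, and `test` the same view synthesized from the decoded depth maps. The view synthesis step itself is outside this sketch.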
Discrete Cosine Transform (DCT) • Block Matrix Sizes: M = 8, 16 • Uniform Quantizer • Step Sizes: $2^1$ to $2^8$ • Entropy Coding • Type used: DCT-II
Discrete Cosine Transform (cont.) [Figure panels: quantizer step size = $2^1$; quantizer step size = $2^8$]
Discrete Cosine Transform (cont.) balloons error, M = 8, Q = 128
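The block DCT pipeline on the preceding slides (DCT-II, uniform quantizer) can be illustrated with a short pure-Python sketch. It is an assumption-laden illustration rather than the authors' implementation: the helper names are hypothetical, the transform is the orthonormal DCT-II applied separably, Python's `round` (ties to even) stands in for the quantizer's rounding rule, and entropy coding is omitted.

```python
import math

def dct_matrix(m):
    """Orthonormal DCT-II basis as an m x m matrix (rows are basis vectors)."""
    rows = []
    for k in range(m):
        scale = math.sqrt(1.0 / m) if k == 0 else math.sqrt(2.0 / m)
        rows.append([scale * math.cos(math.pi * (2 * n + 1) * k / (2 * m))
                     for n in range(m)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def code_block(block, step):
    """Transform, quantize, and reconstruct one M x M block.

    Returns (quantizer indices, reconstructed block).
    """
    c = dct_matrix(len(block))
    ct = transpose(c)
    coeffs = matmul(matmul(c, block), ct)            # forward 2-D DCT: C X C^T
    indices = [[round(v / step) for v in row] for row in coeffs]
    dequant = [[q * step for q in row] for row in indices]
    recon = matmul(matmul(ct, dequant), c)           # inverse 2-D DCT: C^T Y C
    return indices, recon
```

Because the transform is orthonormal, each coefficient error of at most `step / 2` carries over to the pixel domain, so the per-block mean squared error is bounded by `step ** 2 / 4`.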
Karhunen-Loeve Transform (KLT) • Block Matrix Sizes: M = 8, 16 • Uniform Quantizer • Step Sizes: $2^1$ to $2^8$ • Entropy Coding • Training Set: composed from both views [Diagram labels: m x n x p; M x M]
Karhunen-Loeve Transform (cont.) [Figure panels: quantizer step size = $2^1$; quantizer step size = $2^8$]
Karhunen-Loeve Transform (cont.) balloons error, M = 8, Q = 128
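One way to realize the KLT slides is to train a basis from the covariance of blocks pooled from both depth maps, then quantize the resulting coefficients uniformly. The sketch below assumes NumPy and a non-separable KLT on vectorized M x M blocks; the slides do not say whether the transform was separable, and the function names `extract_blocks`, `train_klt`, and `klt_code` are hypothetical.

```python
import numpy as np

def extract_blocks(image, m):
    """Split a 2-D array into non-overlapping m x m blocks, one flattened block per row."""
    h, w = image.shape
    h, w = h - h % m, w - w % m                      # drop partial border blocks
    blocks = image[:h, :w].reshape(h // m, m, w // m, m).swapaxes(1, 2)
    return blocks.reshape(-1, m * m).astype(np.float64)

def train_klt(training_blocks):
    """Return (mean vector, basis) where basis columns are covariance eigenvectors."""
    mean = training_blocks.mean(axis=0)
    centered = training_blocks - mean
    cov = centered.T @ centered / len(training_blocks)
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                # reorder to descending variance
    return mean, eigvecs[:, order]

def klt_code(blocks, mean, basis, step):
    """Project blocks onto the KLT basis, quantize uniformly, and reconstruct."""
    coeffs = (blocks - mean) @ basis
    indices = np.round(coeffs / step).astype(np.int64)
    recon = (indices * step) @ basis.T + mean
    return indices, recon
```

A training set "composed from both views" would then be `np.vstack([extract_blocks(left_depth, M), extract_blocks(right_depth, M)])`, where `left_depth` and `right_depth` are placeholder names for the two depth maps. In a real codec the mean and basis would also have to reach the decoder, which this sketch ignores.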
Block Truncation Coding (BTC) • Good at preserving edges • Quantized values per block: a & b • Block Matrix Sizes: M = 2, 4, 8, 16, 32, 64 • Entropy Coding • If $X_i \le \bar{X}$, output = a; if $X_i > \bar{X}$, output = b, for $i = 1, 2, \ldots, M^2$, where $\bar{X}$ is the block mean and q = # of $X_i$'s > $\bar{X}$
Block Truncation Coding (cont.) [Plot annotations: M = 8; M = 4; ~1.1 dB]
Block Truncation Coding (cont.) balloons error, M = 64
Block Truncation Coding (cont.) balloons error, M = 16
Block Truncation Coding (cont.) balloons error, M = 2
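The two-level rule on the BTC slide can be sketched as follows. The formulas for `a` and `b` are the standard moment-preserving choices from Delp and Mitchell (cited in the references) and are assumed here, since the slide text does not spell them out; the function names are hypothetical, and rounding and entropy coding of `a`, `b`, and the bitmap are left out.

```python
import math

def btc_encode(block):
    """Encode a flat list of M*M pixels as (a, b, bitmap).

    Pixels above the block mean map to b, the rest to a. The levels are
    chosen so the block mean and variance are preserved (assumed formulas).
    """
    m = len(block)
    mean = sum(block) / m
    sigma = math.sqrt(sum((x - mean) ** 2 for x in block) / m)
    bitmap = [1 if x > mean else 0 for x in block]
    q = sum(bitmap)                      # number of pixels above the mean
    if q == 0:                           # flat block: every pixel equals the mean
        return mean, mean, bitmap
    a = mean - sigma * math.sqrt(q / (m - q))
    b = mean + sigma * math.sqrt((m - q) / q)
    return a, b, bitmap

def btc_decode(a, b, bitmap):
    """Rebuild the block from its two levels and the per-pixel bitmap."""
    return [b if bit else a for bit in bitmap]
```

A block that straddles a sharp depth edge already has only two distinct values, so it is reconstructed exactly, which is one way to see why BTC suits depth maps.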
Adaptive BTC • Spend bits where necessary • Large blocks handle background (low rate) • Small blocks handle edges (high rate) • Make block size selection based on Lagrangian cost function
Adaptive BTC (cont.) • Lagrangian cost function, $J = D + \lambda R$ • Joint cost of both depth maps • Distortion (D) computed from the synthesized view • Lagrange multiplier, $\lambda = 2^0$ to $2^8$ • Bit rate (R) calculation • 6 Block sizes (M = 2-64): 3 bits • Quantized values, a & b: Entropy coding • Positions of a & b in the block: Run Length Coding & Entropy coding
Adaptive BTC (cont.) as Mmax increases
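The block-size decision can be illustrated with a quadtree that keeps a block whole unless splitting it into four children lowers the Lagrangian cost. This sketch departs from the slides in ways that should be read as assumptions: it handles a single depth map rather than the joint cost of both, it uses the squared error on the depth map itself as a stand-in for synthesized-view distortion, and it uses a crude fixed bit count (3 bits for the block size, 8 bits each for a and b, one raw bit per pixel) instead of run-length and entropy coding. All function names are hypothetical.

```python
import math

def btc_block(pixels):
    """Two-level, moment-preserving BTC reconstruction of a flat pixel list."""
    m = len(pixels)
    mean = sum(pixels) / m
    q = sum(1 for x in pixels if x > mean)
    if q == 0:                                   # flat block
        return [mean] * m
    sigma = math.sqrt(sum((x - mean) ** 2 for x in pixels) / m)
    a = mean - sigma * math.sqrt(q / (m - q))
    b = mean + sigma * math.sqrt((m - q) / q)
    return [b if x > mean else a for x in pixels]

def block_cost(image, r, c, size, lam):
    """Lagrangian cost D + lam * R of coding one size x size block with BTC."""
    pixels = [image[r + i][c + j] for i in range(size) for j in range(size)]
    recon = btc_block(pixels)
    dist = sum((x - y) ** 2 for x, y in zip(pixels, recon))  # placeholder D
    rate = 3 + 16 + size * size                               # placeholder R in bits
    return dist + lam * rate

def choose_partition(image, r, c, size, lam, min_size=2):
    """Return (cost, leaves); each leaf is (row, col, size)."""
    whole = block_cost(image, r, c, size, lam)
    if size <= min_size:
        return whole, [(r, c, size)]
    half = size // 2
    split_cost, split_leaves = 0.0, []
    for dr in (0, half):
        for dc in (0, half):
            cost, leaves = choose_partition(image, r + dr, c + dc, half, lam, min_size)
            split_cost += cost
            split_leaves += leaves
    if split_cost < whole:                       # split only if it lowers the cost
        return split_cost, split_leaves
    return whole, [(r, c, size)]
```

Calling `choose_partition(depth, 0, 0, 64, lam)` on a 64 x 64 region (with `depth` a placeholder list of rows) returns the chosen leaves. A small `lam` favors small blocks around detail, while a large `lam` pushes the partition toward a single large block.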
Final Results (cont.) Balloons error (frame 1) Scheme: DCT (M = 8, Q = 64) PSNR = 37.65 dB Rate = 0.07465 bpp
Final Results (cont.) Balloons error (frame 1) Scheme: Fixed BTC (M=32) PSNR = 38.6070 dB Rate = 0.0703 bpp
Final Results (cont.) Balloons error (frame 1) Scheme: A-BTC (Mmax=64,Q=32) PSNR = 41.4849 dB Rate = 0.0622 bpp
Conclusion • Depth Maps • Not ordinary images • Important to preserve edges • Adaptive BTC technique can optimally trade off rate and synthesized distortion • Fixed BTC outperforms DCT, KLT without side information about synthesized distortion • Adaptive BTC outperforms DCT, KLT, Fixed BTC
Future Work • Adaptive BTC • Joint Lagrangian cost based on all possible ways of breaking down blocks in pair of views • Our implementation is sub-optimal • Investigate heuristics to perform block sub-division top-down rather than bottom-up • Preserve higher moments in BTC • Only preserved 2nd moment • Larger block sizes • Only used up to Mmax = 64
References • N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Trans. Comput., vol. C-23, pp. 90-93, 1974. • Balloons & Kendo Sequences, Nagoya University Tanimoto Laboratory, http://www.tanimoto.nuee.nagoya-u.ac.jp/. • E. Delp and O. Mitchell, “Image compression using block truncation coding,” IEEE Trans. Commun., vol. 27, no. 9, pp. 1335-1342, Sep. 1979. • Z. Li and M. Drew, “Karhunen-Loeve Transform,” in Fundamentals of Multimedia. Upper Saddle River, NJ: Pearson Education, 2004, ch. 8, sec. 5.2, pp. 220-222. • P. Merkle, Y. Morvan, A. Smolic, D. Farin, K. Müller, P. H. N. de With, and T. Wiegand, “The effects of multiview depth video compression on multiview rendering,” Signal Process.: Image Commun., vol. 24, no. 1-2, pp. 73-88, Jan. 2009. • K. Müller, P. Merkle, and T. Wiegand, “3-D video representation using depth maps,” Proceedings of the IEEE, vol. PP, no. 99, pp. 1-14, 2010.