Wavelet-based Region-of-Interest (RoI) Video Coding. Derek Pang, Huizhong Chen, Sherif Halawa EE398A Course Project Winter 2010 Stanford University. Outline. Existing work in IRoI video application Overview of our proposed solution Two coding schemes : 2D+T and T+2D
Wavelet-based Region-of-Interest (RoI) Video Coding Derek Pang, Huizhong Chen, Sherif Halawa EE398A Course Project Winter 2010 Stanford University
Outline • Existing work in IRoI video applications • Overview of our proposed solution • Two coding schemes: 2D+T and T+2D • RoI cropping and DWT filter selection • Modified entropy coders • Modified Zero-tree • Modified EBCOT • Experimental results • Future work / conclusion
Related Work • “Optimal Slice Size for streaming regions of high resolution video with virtual pan/tilt/zoom functionality,” [Mavlankar et al., 2007] • High bitrate • “Compression-Aware Digital Pan/Tilt/Zoom,” [Makar et al., 2008] • High complexity • Low user scalability • ClassX (formerly viewXtreme) Project [Agrawal et al., 2008] • High complexity
Proposed Solution Wavelet-based RoI video coding • Spatial resolution scalability • Fast and direct RoI cropping support in compressed domain • Low to moderate complexity • Scalable to large number of users • R-D performance advantage over current ClassX system on RoI video transmission
Wavelet-Based RoI Video Coding Two coding schemes • T+2D • 2D+T [Block diagrams. T+2D: RoI Video, Temporal Prediction, Residual, DWT, Coder. 2D+T: RoI Video, DWT, In-Band Prediction, Residual, Coder.]
T+2D GOP Structure Background Wavelet-based residual coding A A P P P … H.264 Thumbnail T T T T T Prediction with motion vectors Prediction without motion vectors
T+2D System Overview Input Video Stored Bit Stream Transform/ Quantization/ Entropy Coder Thumbnail Store Predictor Residual Anchor Video Storage Background Frame Buffer Prediction Entropy Decoder/ Inverse Quantization/ Inverse Transform/ RoI Request Entropy Decoder/ Inverse Quantization/ Inverse Transform/ RoI Bit Stream Assembler Decoded RoI Video RoI Bit Stream
2D+T GOP Structure Background Wavelet-based residual coding A A P P P … H.264 LL Band T T T T T In-Band Prediction with motion vectors In-Band Prediction without motion vectors
RoI Extraction in Wavelet-domain • RoI Mask is generated for RoI extraction • Mask size depends on the length of the synthesis filters and the number of DWT levels. Mask Generation RoI Spatial Domain Wavelet Domain
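The dependence of mask size on filter length and DWT depth can be sketched in one dimension. This is a minimal illustration, not the project's mask generator: the function name `roi_coefficient_ranges`, its parameters, and the symmetric margin of `(filter_len - 1) // 2` coefficients per level are assumptions chosen to be conservative for a filter of the given length.

```python
def roi_coefficient_ranges(x0, x1, levels, filter_len=2, length=None):
    """Per-level [start, stop) coefficient ranges needed to rebuild samples x0..x1-1.

    Hypothetical helper. Assumes a dyadic 1-D DWT and a symmetric safety
    margin of (filter_len - 1) // 2 coefficients per level on each side.
    """
    margin = (filter_len - 1) // 2
    ranges = []
    lo, hi, n = x0, x1, length
    for level in range(1, levels + 1):
        lo = max(lo // 2 - margin, 0)      # floor division, clamped at 0
        hi = -(-hi // 2) + margin          # ceiling division plus margin
        if n is not None:
            n = -(-n // 2)                 # subband length at this level
            hi = min(hi, n)
        ranges.append((level, lo, hi))
    return ranges


haar_mask = roi_coefficient_ranges(128, 384, levels=3, filter_len=2, length=512)
long_mask = roi_coefficient_ranges(128, 384, levels=3, filter_len=4, length=512)
```

Under this model the margin accumulates from level to level, so a longer synthesis filter or a deeper decomposition enlarges the mask, while a 2-tap filter needs no margin at all.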
DWT Filter Selection • Mathematical analysis (refer to report) • Experimental analysis Filter Selected: Haar
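For reference, a one-level orthonormal Haar transform in 1-D can be written in a few lines. This is a generic textbook sketch, not code from the project; the function names are hypothetical. It shows the locality property that matters for RoI cropping: samples `2k` and `2k+1` depend only on the coefficient pair at index `k`.

```python
import math

def haar_forward(x):
    """One-level orthonormal Haar analysis of an even-length sequence."""
    s = 1 / math.sqrt(2)
    low = [(x[2 * i] + x[2 * i + 1]) * s for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) * s for i in range(len(x) // 2)]
    return low, high

def haar_inverse(low, high):
    """One-level Haar synthesis; each (low, high) pair yields two samples."""
    s = 1 / math.sqrt(2)
    x = []
    for l, h in zip(low, high):
        x.extend([(l + h) * s, (l - h) * s])
    return x


signal = [4, 6, 10, 12, 8, 8, 0, 2]
low, high = haar_forward(signal)
full = haar_inverse(low, high)
# Rebuild only samples 2 and 3 from the single coefficient pair at index 1.
crop = haar_inverse(low[1:2], high[1:2])
```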
Entropy Coder - Modified EBCOT • Uniform quantizer • MQ-Coder • Bitplane coder with normal 3 coding passes (Signif., MR, clean-up) Four Major Modifications: • A post-compression RoI extraction • Variable block-size coding • On-demand block size selection • Fast zero-block coding
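The slide does not say how fast zero-block coding is implemented. One plausible reading, sketched below purely as an assumption, is to quantize uniformly and then flag every block whose quantized coefficients are all zero, so that such a block can be signalled cheaply instead of going through the bitplane passes. The names `uniform_quantize` and `zero_block_flags` are hypothetical.

```python
import math

def uniform_quantize(coeffs, step):
    """Uniform quantizer: round each coefficient to the nearest multiple of step."""
    return [
        [int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in row]
        for row in coeffs
    ]

def zero_block_flags(q, block):
    """Map each block origin (row, col) to True if the whole block is zero."""
    h, w = len(q), len(q[0])
    flags = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            flags[(by, bx)] = all(
                q[y][x] == 0
                for y in range(by, min(by + block, h))
                for x in range(bx, min(bx + block, w))
            )
    return flags


residual = [
    [0.2, -0.3, 5.1, 0.0],
    [0.1, 0.4, -7.9, 2.2],
    [0.0, 0.0, 0.3, 0.1],
    [0.4, -0.2, 0.0, 0.2],
]
q = uniform_quantize(residual, step=2)
flags = zero_block_flags(q, block=2)
```

In this toy residual three of the four 2x2 blocks quantize to all zeros, which is the situation where a smaller block size and a cheap zero-block signal would pay off.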
Modified Zero-tree • Exploits redundancy between decomposition levels by clustering quantized coefficients into “trees”. • Zero-trees (ZTs): tree of zeros. • ZTs are abundant in wavelet-decomposed residuals. • ZTs encode a large number of zeros using very few bits. Pyramid view of Quantized coefficients Tree of base-length 4 (mode-4) Division into 4 mode-2 trees
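The tree structure can be illustrated with the usual parent-child rule for dyadic wavelet pyramids, where the coefficient at `(i, j)` in one level has four children at `(2i, 2j)` through `(2i+1, 2j+1)` in the next finer level of the same orientation. The sketch below is a generic zero-tree test under that assumption; it does not reproduce the modified coder's modes, and `is_zerotree` is a hypothetical name.

```python
def is_zerotree(bands, level, i, j):
    """True if the coefficient at (i, j) and all its descendants are zero.

    bands is a list of square 2-D lists for one orientation, ordered from
    coarsest to finest, each level twice the width of the previous one.
    """
    if bands[level][i][j] != 0:
        return False
    if level + 1 == len(bands):
        return True  # finest level: no children left to check
    return all(
        is_zerotree(bands, level + 1, 2 * i + di, 2 * j + dj)
        for di in (0, 1)
        for dj in (0, 1)
    )


bands = [
    [[0]],
    [[0, 0], [0, 0]],
    [[0, 0, 0, 2], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
]
root_is_zero = is_zerotree(bands, 0, 0, 0)
left_child_is_zero = is_zerotree(bands, 1, 0, 0)
right_child_is_zero = is_zerotree(bands, 1, 0, 1)
```

Here a single nonzero coefficient at the finest level breaks the root tree, but three of the four subtrees one level down are still zero-trees, which is the motivation for dividing a large tree into smaller ones.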
Joint Entropy-Quantizer RD Optimization Input coefficient tree Quantizer 1 Huffman codebook sel. Lagrangian Cost Quantizer 2 Huffman codebook sel. Lagrangian Cost Decision Quantizer N Huffman codebook sel. Lagrangian Cost
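The decision stage in this diagram amounts to picking the candidate with the lowest Lagrangian cost J = D + λR. The sketch below illustrates that selection under stated assumptions: distortion is squared error, and the rate is approximated by the empirical entropy of the quantized symbols as a stand-in for the Huffman codebook selection on the slide. The names `select_quantizer` and `entropy_bits` are hypothetical.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Total bits for the symbols under their own empirical entropy (rate proxy)."""
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in Counter(symbols).values())

def select_quantizer(values, steps, lam):
    """Return the step size minimizing J = D + lam * R over the candidates."""
    best_cost, best_step = None, None
    for step in steps:
        q = [int(math.copysign(math.floor(abs(v) / step + 0.5), v)) for v in values]
        dist = sum((v - k * step) ** 2 for v, k in zip(values, q))
        cost = dist + lam * entropy_bits(q)
        if best_cost is None or cost < best_cost:
            best_cost, best_step = cost, step
    return best_step


tree = [0.4, -1.2, 3.7, 0.1, -0.6, 2.2, 0.0, -0.3]
fine = select_quantizer(tree, steps=[0.5, 1, 2, 8], lam=0)
coarse = select_quantizer(tree, steps=[0.5, 1, 2, 8], lam=1000)
```

With λ = 0 only distortion counts and the finest step wins; with a very large λ the rate term dominates and the step that quantizes every coefficient to zero wins.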
M-ZT – RD Optimized Mode Selection Input coeff. tree (mode-32) R-D opt. Q/CB sel. Mode selection Division to mode-16 R-D opt. Q/CB sel. Mode selection Division to mode-8 R-D opt. Q/CB sel. R-D opt. Q/CB sel. Mode selection Division to mode-8 R-D opt. Q/CB sel. R-D opt. Q/CB sel. Mode selection Division to mode-8 R-D opt. Q/CB sel. R-D opt. Q/CB sel. Mode selection Division to mode-8 R-D opt. Q/CB sel.
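The cascade from mode-32 down to mode-8 can be read as a recursive split decision: code a block whole, or divide it into four and code each part with its own best choice, whichever costs less. The sketch below shows that recursion only. The cost function `toy_cost` is an invented placeholder for the R-D optimized quantizer/codebook cost, `best_partition` is a hypothetical name, and the cost of signalling the split itself is left out.

```python
def best_partition(block, min_size, cost_fn):
    """Return (cost, layout): layout is the block size, or a list of 4 sub-layouts."""
    n = len(block)
    whole = cost_fn(block)
    if n <= min_size:
        return whole, n
    half = n // 2
    quads = [
        [row[c:c + half] for row in block[r:r + half]]
        for r in (0, half)
        for c in (0, half)
    ]
    results = [best_partition(q, min_size, cost_fn) for q in quads]
    split = sum(cost for cost, _ in results)
    if split < whole:
        return split, [layout for _, layout in results]
    return whole, n

def toy_cost(block):
    """Placeholder cost: 1 for an all-zero block, else 2 plus 4 per coefficient."""
    if all(v == 0 for row in block for v in row):
        return 1
    return 2 + 4 * len(block) * len(block)


sparse = [[5, 1, 0, 0], [2, 3, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
zeros = [[0] * 4 for _ in range(4)]
dense = [[1] * 4 for _ in range(4)]
sparse_result = best_partition(sparse, 2, toy_cost)
zeros_result = best_partition(zeros, 2, toy_cost)
dense_result = best_partition(dense, 2, toy_cost)
```

Only the block with localized energy is split; the all-zero and fully dense blocks are cheaper to code whole under this placeholder cost.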
Experiment Setup • 512x512 Osgood Sequence • 90 frames • Reference benchmark • ClassX system • One resolution layer • Four 256 x 256 slices • 256 x 256 RoI • Our system • 3 decomposition levels • M-EBCOT block sizes: 4 to 64 • M-ZT block sizes : 8 to 32 Osgood Sequence
Visual Quality Comparison • Low Quality (T+2D EBCOT): 262 kbps, 33.27 dB • High Quality (T+2D EBCOT): 583 kbps, 40.02 dB
Conclusion • Proposed a novel RoI video coding scheme: • based on wavelet coding • fast RoI cropping in compressed domain • improved entropy coders for RoI applications • achieved moderate performance gain • Possible future work • Detailed analysis and testing • Modified zero-tree bit plane coding • Complexity evaluation
Acknowledgement • Aditya Mavlankar • Mina Makar • Piyush Agrawal • Ngai-Man Cheung • Professor Bernd Girod
Questions? Thank you.
2D+T System Overview Highpass Bands Residual Entropy Encoder Transform Quantization Input Video Lowpass Band Anchor Background Frame Buffer H.264 Entropy Decoder Stored Bit Stream Video Storage Entropy Decoder/ Inverse Quantization RoI Bit Stream Assembler RoI Bit Stream Inverse Transform RoI Request Decoded RoI Video