Xin Li LDCSEE, WVU Email: xin.li@ieee

New Direction in Wyner-Ziv Video Coding: On the Importance of Modeling Virtual Correlation Channel (VCC)
Xin Li, LDCSEE, WVU
Email: xin.li@ieee.org
“If you can’t solve a problem, then there is an easier problem you can solve: find it.” - George Pólya

Formulation of a Simpler Problem
[Figure: frames x_{2t-2}, x_{2t-1}, x_{2t}. Conventional video coding (source coding): I frames and B frames. Wyner-Ziv video coding (joint source-channel coding): key frames and WZ frames.]
Assuming I frames and key frames are coded by the same intra-frame encoder, can we achieve coding efficiency on WZ frames comparable to that of H.264 (the state-of-the-art technique for coding B frames)?

Outline of Our Attack
• Motivating observations
• Characterizing the nonstationary virtual correlation channel (VCC) by a mixture model
• Theoretical derivation
  • Classification gain (dual to that in conventional source coding)
• Classification-based DVC algorithm
  • Approximate solution to the simplified problem
• Experimental results
  • R-D performance comparable to H.264 JM11.0 (for certain types of video sequences: slow motion)
• Discussions and perspectives
  • Dualities between conventional and distributed video coding
  • DVC = video modeling + DSC (rate) + estimation (distortion)

Motivations
• Learn from conventional wisdom: what is the major factor behind the success of existing image/video coding standards such as JPEG2000 and H.264?
• It is the source classification principle and its subtle implications, rooted in earlier pioneering works such as EZW/SPIHT and multi-hypothesis motion-compensated prediction (MCP)
• Therefore, following the duality, it is natural to consider classifying the virtual correlation channel in distributed source coding
• Unlike conventional video coding, motion estimation (ME) is done at the decoder rather than at the encoder in WZ video coding (we have addressed this issue separately in a different context¹)
¹X. Li, “Video processing via implicit and mixture motion model,” IEEE Trans. on Cir. Sys. for Video Tech., vol. 17, no. 8, pp. 953-963, Aug. 2007.

Modeling Non-stationary VCC
Why is the virtual correlation channel non-stationary? Misaligned edges, deformable motion, and illumination variations are all spatio-temporally varying phenomena.
Mixture modeling of the virtual correlation channel (e.g., significant vs. insignificant temporal interpolation errors)
[Figure: the VCC as an additive channel y = x + z: WT of original WZ frames x (e.g., significant vs. insignificant wavelet coefficients), plus additive errors z, gives WT of interpolated WZ frames y (side information)]
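A minimal sketch of why a two-class mixture captures a non-stationary VCC better than a single Gaussian. The model here is an assumption for illustration: each interpolation error z is "significant" with probability p_sig and then drawn from N(0, var_sig), otherwise from N(0, var_insig). The function names and parameter values are hypothetical.

```python
import random


def mixture_kurtosis(p_sig, var_sig, var_insig):
    """Kurtosis E[z^4] / (E[z^2])^2 of a zero-mean two-component Gaussian mixture."""
    m2 = p_sig * var_sig + (1.0 - p_sig) * var_insig
    # Each Gaussian component has fourth moment 3 * variance^2.
    m4 = 3.0 * (p_sig * var_sig ** 2 + (1.0 - p_sig) * var_insig ** 2)
    return m4 / m2 ** 2


def sample_vcc_errors(n, p_sig, var_sig, var_insig, seed=0):
    """Draw n errors z; each is 'significant' (label True) with probability p_sig."""
    rng = random.Random(seed)
    labels, z = [], []
    for _ in range(n):
        significant = rng.random() < p_sig
        std = (var_sig if significant else var_insig) ** 0.5
        labels.append(significant)
        z.append(rng.gauss(0.0, std))
    return labels, z


# Hypothetical statistics: 20% of errors are large (variance 100), the rest small (variance 1).
print(mixture_kurtosis(0.2, 100.0, 1.0))  # about 13.87 (a single Gaussian has 3)
labels, z = sample_vcc_errors(20000, 0.2, 100.0, 1.0)
```

A kurtosis far above 3 means a single Gaussian fitted to all errors overestimates small errors and underestimates large ones; conditioning on the class label restores a Gaussian model within each class.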

Summary of Theoretical Results
[Equations: side-by-side comparison of conventional source coding and distributed source coding: rate-distortion optimization problem formulation (s.t. a rate constraint), R-D function, rate allocation, classification gain]
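As a sketch of the standard results this slide summarizes, assuming zero-mean Gaussian classes and high-resolution quantization (the exact formulation used in the talk may differ), class i occurs with probability p_i and the total average rate is R:

```latex
% Conventional source coding: class i has variance \sigma_i^2
\min_{\{R_i\}} \sum_i p_i D_i(R_i) \quad \text{s.t.} \quad \sum_i p_i R_i = R,
\qquad D_i(R_i) = \sigma_i^2 \, 2^{-2R_i}

% Lagrangian stationarity (equal R-D slope) forces D_i = \theta for all i, hence
R_i^* = R + \frac{1}{2} \log_2 \frac{\sigma_i^2}{\prod_j (\sigma_j^2)^{p_j}},
\qquad D^* = \Big( \prod_j (\sigma_j^2)^{p_j} \Big) \, 2^{-2R}

% Classification gain relative to spending the same rate R on every class
G = \frac{\sum_i p_i \sigma_i^2}{\prod_i (\sigma_i^2)^{p_i}} \ \ge 1

% Distributed (Wyner-Ziv) coding with y = x + z, x and z independent Gaussians:
% the same results hold with \sigma_i^2 replaced by the conditional variance
\sigma_{x|y,i}^2 = \frac{\sigma_{x,i}^2 \, \sigma_{z,i}^2}{\sigma_{x,i}^2 + \sigma_{z,i}^2},
\qquad D_i(R_i) = \sigma_{x|y,i}^2 \, 2^{-2R_i}
```

When some R_i^* come out negative (low total rate, small class variance), those classes receive zero rate and the remaining rate is redistributed (reverse water-filling).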

Implications for WZ Video Coding
In conventional source coding, the classification gain implies that a subsource with larger variance should be given higher priority in rate allocation.
In distributed source coding, a similar conclusion holds, except that the variance of a “subsource” is now determined by the virtual correlation channel.

Conclusion: the class of significant coefficients that are poorly motion compensated has the largest R-D slope (these coefficients should be coded first: where are they, and what are they?)
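To make the conclusion concrete, this sketch ranks four hypothetical classes (significant or insignificant x, crossed with good or poor motion compensation) by the Gaussian Wyner-Ziv conditional variance σ_x²σ_z²/(σ_x²+σ_z²), then allocates a small average rate by reverse water-filling. The class statistics are invented for illustration, and reverse water-filling is the textbook Gaussian allocation, assumed here rather than taken from the talk.

```python
import math


def wz_conditional_variance(var_x, var_z):
    """Var(x | y) for y = x + z with independent zero-mean Gaussians x and z."""
    return var_x * var_z / (var_x + var_z)


def reverse_water_filling(probs, variances, rate, iters=200):
    """Gaussian rate allocation: D_i = min(theta, var_i), average rate = `rate` bits."""
    def avg_rate(theta):
        return sum(p * max(0.0, 0.5 * math.log2(v / theta)) for p, v in zip(probs, variances))

    lo, hi = 1e-12, max(variances)
    for _ in range(iters):  # bisection on the water level theta
        theta = 0.5 * (lo + hi)
        if avg_rate(theta) > rate:
            lo = theta  # spending too many bits: raise the water level
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    return [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]


# Hypothetical class statistics: (label, probability, var_x, var_z)
CLASSES = [
    ("significant x, poor MC", 0.1, 400.0, 100.0),
    ("significant x, good MC", 0.2, 400.0, 1.0),
    ("insignificant x, poor MC", 0.2, 4.0, 100.0),
    ("insignificant x, good MC", 0.5, 4.0, 1.0),
]

cond_vars = [wz_conditional_variance(vx, vz) for _, _, vx, vz in CLASSES]
ranking = sorted(zip(cond_vars, (c[0] for c in CLASSES)), reverse=True)
rates = reverse_water_filling([c[1] for c in CLASSES], cond_vars, rate=0.1)
print(ranking[0][1], rates)
```

With these numbers, at 0.1 bit per coefficient on average only the "significant x, poor MC" class receives bits (about 1 bit per coefficient); a class with a large σ_x² but a small σ_z² is already well predicted by the side information.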

Rate Control Dilemma
• How can we estimate the second-order statistics of the VCC, σ_z² (the accuracy of the side information y_t generated by temporal interpolation)?
• At the encoder, we have access to x_t (original WZ frames) but not to y_t (side information)¹
• At the decoder, we have access to y_t (side information) but not to x_t (original WZ frames)²
• We have adopted a decoder-based approach built on a feedback channel and a scale-invariance assumption about z_t (an approximate but tractable solution)
¹Berkeley’s PRISM scheme allows simple temporal dependency estimation at the encoder.
²Stanford researchers suggested the use of a feedback channel for rate control.
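A generic sketch of decoder-driven rate control over a feedback channel: the decoder keeps requesting Slepian-Wolf parity/syndrome chunks until decoding succeeds. The callback `try_decode`, the chunk size, and the optional starting estimate are hypothetical placeholders, not the interface of any particular codec.

```python
import math
from typing import Callable, Tuple


def feedback_rate_control(try_decode: Callable[[int], bool], chunk_bits: int,
                          max_chunks: int, estimated_bits: float = 0.0) -> Tuple[int, int]:
    """Request chunks until the decoder succeeds.

    Returns (bits_used, feedback_requests); bits_used is -1 if decoding never succeeds.
    """
    # Start from a decoder-side estimate (e.g. derived from an estimate of sigma_z^2).
    n = max(1, math.ceil(estimated_bits / chunk_bits))
    requests = 0
    while n <= max_chunks:
        requests += 1  # one request over the feedback channel
        if try_decode(n):  # hypothetical: attempt SW decoding with the first n chunks
            return n * chunk_bits, requests
        n += 1
    return -1, requests


# Toy decoder that succeeds once it has at least 5 chunks of 100 bits.
print(feedback_rate_control(lambda n: n >= 5, chunk_bits=100, max_chunks=10))
print(feedback_rate_control(lambda n: n >= 5, chunk_bits=100, max_chunks=10, estimated_bits=420))
```

A good estimate of σ_z² cuts the number of round trips; an overestimate wastes bits, since the loop never backs off below its starting point.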

Feedback via Scale Invariance of Interpolation Errors
[Figure: block-based significance map of z_t for hall and foreman, oracle vs. actual. Frames: x_{2t-2}, x_{2t-1}, x_{2t}, x_{2t+2} (key frames and WZ frames). Fine resolution: oracle S.I.; coarse resolution: key frame vs. interpolated key frame.]
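One way to read the scale-invariance idea, as a sketch under stated assumptions: at the coarser temporal scale the decoder can interpolate a decoded key frame from its two key-frame neighbours and compare the result with the actual key frame; the block map of that error is then assumed to predict where z_t is significant for the WZ frame. The pixel-wise average stands in for the actual temporal interpolator, and the block size and threshold thz defaults are hypothetical.

```python
def interpolate(prev, nxt):
    """Hypothetical stand-in for temporal interpolation: pixel-wise average (no MC)."""
    return [[0.5 * (a + b) for a, b in zip(r0, r1)] for r0, r1 in zip(prev, nxt)]


def block_significance(err, block, thz):
    """True for each block whose mean squared error exceeds thz."""
    h, w = len(err), len(err[0])
    sig = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            sq = [err[y][x] ** 2 for y in range(by, min(by + block, h))
                  for x in range(bx, min(bx + block, w))]
            row.append(sum(sq) / len(sq) > thz)
        sig.append(row)
    return sig


def estimate_z_significance(key_prev, key_mid, key_next, block=16, thz=100.0):
    """Coarse-scale proxy for the block significance map of z_t.

    Interpolate key_mid from key_prev and key_next, measure the error against the
    decoded key_mid, and (scale-invariance assumption) reuse the block map for the
    WZ frame at the finer temporal scale.
    """
    pred = interpolate(key_prev, key_next)
    err = [[a - b for a, b in zip(r0, r1)] for r0, r1 in zip(key_mid, pred)]
    return block_significance(err, block, thz)
```

Everything here uses only frames the decoder already has, which is what makes the estimate usable for feedback without touching the encoder.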

Classification-based WZ Video Coding System
[Figure: system diagram. Encoder: WZ frames, wavelet transform (WT), block-based classification, SW lossless coding of the significance map, SW lossless coding of significant coefficients. Decoder: decoded I frames, advanced temporal interpolation, WT, SI, joint exploitation of SI and CI, decoded WZ frames. A feedback channel connects decoder and encoder.]
In a nutshell, we allocate bits only to the class of poorly motion-compensated significant coefficients, i.e., those for which both σ_x² and σ_z² are large.
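A minimal sketch of the block-based classification, assuming that a block of wavelet coefficients is coded only when its mean-squared energy exceeds the threshold thx (x significant) and a z-significance map, assumed here to be supplied through the feedback channel, marks the block as poorly motion compensated. The function names are hypothetical.

```python
def block_mean_square(coeffs, block):
    """Mean squared value of each block x block tile of a 2-D coefficient array."""
    h, w = len(coeffs), len(coeffs[0])
    out = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            tile = [coeffs[y][x] for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            row.append(sum(c * c for c in tile) / len(tile))
        out.append(row)
    return out


def classify_blocks(coeffs, z_significant, block, thx):
    """Code a block only if x is significant (energy > thx) AND its SI is poor (z map)."""
    energy = block_mean_square(coeffs, block)
    coded = [[e > thx and zs for e, zs in zip(er, zr)]
             for er, zr in zip(energy, z_significant)]
    n_coded = sum(sum(1 for c in row if c) for row in coded)
    return coded, n_coded / (len(coded) * len(coded[0]))


# Toy 32x32 array: the top two blocks have energy, the bottom two do not.
coeffs = [[10.0 if y < 16 else 0.0 for x in range(32)] for y in range(32)]
z_map = [[False, True], [True, False]]  # hypothetical feedback from the decoder
coded, fraction = classify_blocks(coeffs, z_map, block=16, thx=1.0)
```

Only the top-right block satisfies both conditions, so a quarter of the blocks are coded; the significance map itself must also be sent, which is why the system codes it losslessly.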

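Joint Exploitation of Side and Coded Information at the Decoder
Target of estimation: E[x | y, Q(x)], where y is the side information (SI) and CI = Q(x) is the coded information
Latent variable: z ~ N(0, σ_z²) (σ_z² is unknown)
[Figure: iterative loop. Initial guess of x from the SI; update the estimate of σ_z²; update the estimate of x using CI and SI; repeat.]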
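A minimal sketch of the joint SI/CI estimation, written as an EM iteration. The assumptions are ours, not necessarily the talk's: a flat prior on x, y = x + z with Gaussian z of unknown variance, and CI given as the quantization bin [a, b) containing x. Under these assumptions, x given (y, CI) is a Gaussian centred at y and truncated to the bin. The function names and the synthetic data are hypothetical.

```python
import math
import random


def _pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def truncated_moments(mu, sigma, a, b):
    """Mean and variance of N(mu, sigma^2) restricted to the interval [a, b]."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    mass = _cdf(beta) - _cdf(alpha)
    if mass < 1e-12:
        # Negligible probability inside the bin: fall back to the edge nearest mu.
        return min(max(mu, a), b), 0.0
    r = (_pdf(alpha) - _pdf(beta)) / mass
    mean = min(max(mu + sigma * r, a), b)
    var = sigma ** 2 * (1.0 + (alpha * _pdf(alpha) - beta * _pdf(beta)) / mass - r * r)
    return mean, max(var, 0.0)


def joint_sici_estimate(y, bins, iters=20):
    """EM-style estimate of x from SI y = x + z and CI bins [a, b), z ~ N(0, sigma_z^2)."""
    # Initial guess: the SI projected into its coded bin.
    x_hat = [min(max(yi, a), b) for yi, (a, b) in zip(y, bins)]
    sigma2 = max(sum((yi - xi) ** 2 for yi, xi in zip(y, x_hat)) / len(y), 1e-6)
    for _ in range(iters):
        sigma = math.sqrt(sigma2)
        # E-step: posterior of x is N(y, sigma^2) truncated to the bin.
        stats = [truncated_moments(yi, sigma, a, b) for yi, (a, b) in zip(y, bins)]
        x_hat = [m for m, _ in stats]
        # M-step: sigma_z^2 = mean of E[(y - x)^2] = (y - E[x])^2 + Var[x].
        sigma2 = max(sum((yi - m) ** 2 + v for yi, (m, v) in zip(y, stats)) / len(y), 1e-6)
    return x_hat, sigma2


DELTA = 8.0  # uniform quantizer step, as in the experiments
rng = random.Random(0)
x = [rng.gauss(0.0, 20.0) for _ in range(2000)]  # hypothetical WZ coefficients
y = [xi + rng.gauss(0.0, 6.0) for xi in x]       # synthetic SI with sigma_z = 6
bins = [(math.floor(xi / DELTA) * DELTA, (math.floor(xi / DELTA) + 1) * DELTA) for xi in x]
x_hat, sigma2_hat = joint_sici_estimate(y, bins)
```

The estimate never leaves the coded bin, so CI bounds the error, while SI decides where inside the bin the estimate lands; σ_z² is learned jointly instead of being assumed known.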

Justification of Distortion Reduction
[Figure: SI alone vs. SI+CI; foreman-qcif, block size 16×16, 18.3% of blocks coded]

Coding Experiments Setup
• Parameter setting
  • Block size: 16×16; WT: Daubechies’ 9-7; Slepian-Wolf lossless encoder: LDPC-based¹; uniform quantizer (Δ=8)
  • Rate control: th_x, th_z, the significance thresholds for x and z respectively
  • SI generation: implicit MC vs. explicit MC
• Benchmark: H.264 JM11.0 implementation (QP of I frames is small and fixed)
¹A. D. Liveris, Z. Xiong, and C. N. Georghiades, “Compression of binary sources with side information at the decoder using LDPC codes,” IEEE Communications Letters, vol. 6, no. 10, pp. 440-442, Oct. 2002.
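A sketch of the transform and quantizer named in the setup: one level of the 1-D 9/7 biorthogonal wavelet in lifting form (coefficients from the Daubechies-Sweldens factorization, with one common normalization and whole-sample symmetric extension), plus the uniform quantizer index for Δ = 8. A 2-D transform would apply it along rows and then columns; scaling and boundary conventions vary between implementations, so this is an illustration, not the codec used in the experiments.

```python
import math

# CDF 9/7 lifting coefficients (Daubechies-Sweldens factorization)
LIFT_A, LIFT_B = -1.586134342, -0.05298011854
LIFT_C, LIFT_D = 0.8829110762, 0.4435068522
SCALE_K = 1.149604398


def _predict(even, odd, c):
    # odd[i] += c * (even[i] + even[i+1]); symmetric extension at the right edge
    n = len(even)
    return [o + c * (even[i] + even[min(i + 1, n - 1)]) for i, o in enumerate(odd)]


def _update(even, odd, c):
    # even[i] += c * (odd[i-1] + odd[i]); symmetric extension at the left edge
    return [e + c * (odd[max(i - 1, 0)] + odd[i]) for i, e in enumerate(even)]


def dwt97(signal):
    """One level of the 1-D 9/7 lifting transform; returns (lowpass, highpass)."""
    if len(signal) % 2:
        raise ValueError("even-length input expected")
    even, odd = list(signal[0::2]), list(signal[1::2])
    odd = _predict(even, odd, LIFT_A)
    even = _update(even, odd, LIFT_B)
    odd = _predict(even, odd, LIFT_C)
    even = _update(even, odd, LIFT_D)
    return [e * SCALE_K for e in even], [o / SCALE_K for o in odd]


def idwt97(low, high):
    """Inverse of dwt97: undo the lifting steps in reverse order."""
    even = [v / SCALE_K for v in low]
    odd = [v * SCALE_K for v in high]
    even = _update(even, odd, -LIFT_D)
    odd = _predict(even, odd, -LIFT_C)
    even = _update(even, odd, -LIFT_B)
    odd = _predict(even, odd, -LIFT_A)
    out = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out


def quantize(c, delta=8.0):
    """Uniform quantizer index q: the coded information says c lies in [q*delta, (q+1)*delta)."""
    return math.floor(c / delta)
```

Because every lifting step is undone exactly by its negated counterpart, the transform reconstructs its input up to floating-point rounding, whatever the precision of the coefficients; only the quantizer is lossy.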

Comparison of Temporal Interpolation
[Figure: foreman-qcif, implicit MC¹ vs. explicit MC², and ad-hoc fusion by simple averaging]
¹X. Li, “Video processing via implicit and mixture motion model,” IEEE Trans. on Cir. Sys. for Video Tech., vol. 17, no. 8, pp. 953-963, Aug. 2007.
²A. M. Tourapis, H.-Y. Cheong, M. L. Liou, and O. C. Au, “Temporal interpolation of video sequences using zonal based algorithms,” Proc. of ICIP, vol. 3, pp. 895-898, 2001.

R-D Performance Comparison (I)
[Figure: R-D curves for hall-qcif (30 frames) and foreman-qcif (30 frames)]

R-D Performance Comparison (II)
[Figure: R-D curves for container-qcif (30 frames) and football-qcif (30 frames)]

Dualities between Conventional and WZ Video Coding
• Exploitation of motion-related temporal dependency
  • In traditional video coding, prediction is based on original frames (overhead is involved)
  • In WZ video coding, interpolation is based on reconstructed key frames (no overhead)
  • Importance of SI generation¹
• R-D optimization shifted from encoder to decoder
  • In traditional video coding, the decoder is often fixed but the encoder enjoys considerable flexibility
  • In WZ video coding, rate control through the feedback channel gives the decoder great flexibility without touching the encoder²
  • Importance of matching the SW lossless encoder to the statistics of the virtual correlation channel (unequal error protection, UEP, is desirable)
¹L. Lu, D. He, and A. Jagmohan, “Robust multi-frame side information generation for distributed video coding,” Proc. of ICIP 2007.
²B. Girod, A. M. Aaron, S. Rane, and D. Rebollo-Monedero, “Distributed video coding,” Proceedings of the IEEE, vol. 93, no. 1, pp. 71-83, Jan. 2005.

Acknowledgement
• Ligang Lu and Dake He for inviting me to participate in this special session
• Zixiang Xiong for sharing his students’ implementation of the LDPC-based Slepian-Wolf coding algorithm
• E. Simoncelli for stimulating discussions on distributed motion representations
