Adaptive Rate-Distortion Based Wyner-Ziv Video Coding Lina Karam Image, Video, and Usability (IVU) Lab Department of Electrical Engineering Arizona State University Tempe, AZ 85287 karam@asu.edu ivulab.asu.edu
Outline • Motivation • Existing DVC Approaches • BLAST-DVC: Rate-distortion based BitpLane SelecTive decoding for pixel-domain Distributed Video Coding • AQT-DVC: Rate-distortion based Adaptive QuanTization for transform-domain Distributed Video Coding • Enhanced AQT-DVC • Conclusion and future directions
Motivation: Spatial and Temporal Redundancy (Figure: CIF (352 x 288) Mother and Daughter sequence, Frame 60 and Frame 61 along the time axis)
Motion Estimation and Compensation (Figure: CIF Mother & Daughter, Reference Frame (Frame 197) and Current Frame (Frame 198))
Residual Error (No Motion Compensation): Difference (Residual) Frame = Frame 198 - Frame 197
Motion Estimation and Compensation (Figure: CIF Mother & Daughter) Current Frame (Frame 198) = Reference Frame (Frame 197) + Error
Full Search Motion Estimation: 8x8 block motion vectors superimposed on the Reference Frame (Frame 197)
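Full-search block matching can be sketched in a few lines of Python with NumPy. This is a minimal illustration, assuming grayscale frames stored as 2-D arrays of equal shape, a sum-of-absolute-differences (SAD) cost and a ±7-pixel integer search window; the names `full_search_me` and `motion_compensate` are hypothetical. Real encoders such as H.264 add sub-pixel refinement, variable block sizes and rate-distortion optimized mode decisions on top of this.

```python
import numpy as np


def full_search_me(ref, cur, block=8, search=7):
    """Exhaustive block matching: one (dy, dx) vector per block of `cur`."""
    ref = ref.astype(np.int64)
    cur = cur.astype(np.int64)
    h, w = cur.shape
    mvs = np.zeros((h // block, w // block, 2), dtype=np.int64)
    for i, by in enumerate(range(0, h - block + 1, block)):
        for j, bx in enumerate(range(0, w - block + 1, block)):
            target = cur[by:by + block, bx:bx + block]
            best_sad, best_mv = None, (0, 0)
            # Try every displacement in the window that keeps the block inside the frame.
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    sad = int(np.abs(ref[y:y + block, x:x + block] - target).sum())
                    if best_sad is None or sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
            mvs[i, j] = best_mv
    return mvs


def motion_compensate(ref, mvs, block=8):
    """Predict the current frame by copying the displaced reference blocks."""
    pred = np.zeros_like(ref)
    for i in range(mvs.shape[0]):
        for j in range(mvs.shape[1]):
            dy, dx = mvs[i, j]
            by, bx = i * block, j * block
            pred[by:by + block, bx:bx + block] = ref[by + dy:by + dy + block,
                                                     bx + dx:bx + dx + block]
    return pred
```

The nested displacement loop is what makes the encoder expensive: every block is compared against (2 × search + 1)² candidates, which is the cost DVC moves to the decoder.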
Motion Compensation: Motion-Compensated Reference (Frame 197), PSNR = 40.8 dB, MSE = 5.4
Residual Error (16x16 blocks, full-pixel): PSNR = 39.4 dB, MSE = 7.5
Residual Error (4x4 blocks, quarter-pixel): PSNR = 45 dB, MSE = 2.1
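The PSNR and MSE values above are two views of the same quantity. For 8-bit video, PSNR = 10 log10(255² / MSE). The sketch below (helper names are illustrative) recomputes PSNR from each quoted MSE as a consistency check.

```python
import math

import numpy as np


def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for a given mean-squared error (8-bit video: peak = 255)."""
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two frames of equal shape (undefined if they are identical)."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return psnr_from_mse(float(np.mean(diff ** 2)), peak)


# Cross-check the (MSE, PSNR) pairs quoted on the slides.
for mse in (5.4, 7.5, 2.1):
    print(f"MSE = {mse}: PSNR = {psnr_from_mse(mse):.1f} dB")
```

This prints 40.8, 39.4 and 44.9 dB. These match the slides to within the rounding of the quoted MSE values.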
Variable block size (16x16 to 4x4) + quarter-pel + multi-frame motion compensation + R-D optimization (H.264, 2004): 85%
(Figure: H.264 decoder, from T.-A. Liu, T.-M. Lin, S.-Z. Wang, et al., “A low-power dual-mode video decoder for mobile applications,” IEEE Communications Magazine, vol. 44, issue 8, pp. 119-126, Aug. 2006.) • The encoder performs both motion estimation and motion compensation. • Motion estimation is much more computationally complex and consumes much more power than motion compensation.
Distributed Video Coding: Motivation Conventional video coding • MPEG-x or H.26x • High-complexity video encoder due to motion estimation. Emerging applications • Video compression on mobile devices • A low-complexity video encoder is preferred to reduce hardware cost and extend battery life. • Video compression for sensor networks • A low-complexity video encoder is also preferred to reduce hardware cost and extend battery life. • Inter-sensor communication may not be allowed or needs to be minimized. Two main frameworks • Multi-view/multi-camera • Single-view/single-camera (Wyner-Ziv video coding)
Distributed Video Coding: Objectives • Intraframe encoding and interframe decoding • Move complexity (motion estimation) from the encoder to the decoder • Achieve the rate-distortion performance of interframe coding • Distributed source coding • Compress consecutive frames separately • Decode the frames jointly at the decoder • Motivated by the work of Slepian-Wolf (1973) and Wyner-Ziv (1976) • Slepian-Wolf: two statistically dependent sources can be compressed losslessly in a distributed fashion at a total rate equal to their joint entropy • Wyner-Ziv: for Gaussian memoryless sources and mean-square-error distortion, lossy coding in a distributed fashion (side information available only at the decoder) achieves the same rate-distortion performance as coding in a non-distributed fashion.
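The following numerical illustration of the Slepian-Wolf result uses a textbook binary model that does not come from the slides. X is a uniform bit and Y = X XOR Z, where Z is a Bernoulli(p) "error" bit. Coding X and Y separately, without exploiting their correlation, costs H(X) + H(Y) = 2 bits per pair. Slepian-Wolf coding needs only the joint entropy H(X, Y) = 1 + h(p), for example Y at 1 bit and X at H(X|Y) = h(p) bits, even though X is encoded without access to Y.

```python
import math


def binary_entropy(p):
    """h(p) in bits; h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def slepian_wolf_rates(p):
    """Rates in bits/sample for X ~ Bern(1/2), Y = X xor Z, Z ~ Bern(p)."""
    h = binary_entropy(p)
    return {
        "separate_coding": 2.0,    # H(X) + H(Y): each source coded on its own
        "joint_entropy": 1.0 + h,  # H(X, Y): minimum total rate, joint or distributed
        "rate_for_X_given_Y": h,   # H(X | Y): rate for X once Y is available at the decoder
    }


print(slepian_wolf_rates(0.1))
```

With p = 0.1, the total rate falls from 2 bits to roughly 1.47 bits per pair. The decoder still recovers both sources losslessly.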
Distributed Video Coding (DVC): How? Back to Mother & Daughter: Current Frame (Frame 198) = Reference Frame (Frame 197) + “Error” • The DVC problem becomes: correct or reduce the “Error” without using motion estimation at the encoder and without knowing what the “Error” is! • This is similar to a channel coding problem => channel codes can be used
Distributed Video Coding (DVC): Example, QCIF (176x144) Foreman • Encoder: the odd-numbered frames (Frames 1, 3, 5) are intra-coded; for the even-numbered frames (Frames 2, 4), only parity bits or syndrome bits are sent • Decoder: - Recovers the even-numbered frames from the intra-coded odd-numbered frames - The odd-numbered frames are considered to be a distorted version of the even-numbered frames, i.e., Frame(2n) = Frame(2n-1) + “Error” - The “Error” is corrected using the parity bits or syndrome bits
Distributed Video Coding (DVC): Example • Issue 1: the “Error” can be large (Figure: Frame 55 and Frame 56) => many parity bits must be sent => large bitrate • Strategy: at the decoder, try to reconstruct the even frames from the received odd frames (e.g., by bi-directional motion-compensated interpolation).
Distributed Video Coding (DVC): Example, QCIF (176x144) Foreman • Decoder: side information generation (Figure: Frames 1 to 5, with Frames 2 and 4 interpolated from their neighbors) • The interpolated frames are called “side information” • Issue 2: How can high-quality side information be generated? • Issue 3: How do we determine the number of parity or syndrome bits needed? - Sending too many wastes bits - Sending too few might leave large distortions uncorrected
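Side information generation can be sketched as symmetric bidirectional block matching. For each block of the missing frame, the decoder searches for a motion vector v such that the block at -v in the previous key frame matches the block at +v in the next key frame. It then averages the two matched blocks. This is a simplified, hypothetical sketch (`interpolate_side_info` is an illustrative name) that assumes integer-pel search, a SAD cost, constant motion between the key frames and no vector smoothing. Codecs such as DISCOVER refine this with hierarchical sub-pixel search and smoothing.

```python
import numpy as np


def interpolate_side_info(prev_key, next_key, block=8, search=4):
    """Estimate the frame halfway between two key frames, assuming linear motion."""
    prev_key = prev_key.astype(np.float64)
    next_key = next_key.astype(np.float64)
    h, w = prev_key.shape
    if h % block or w % block:
        raise ValueError("frame size must be a multiple of the block size")
    side = np.empty((h, w))
    for by in range(0, h, block):
        for bx in range(0, w, block):
            best_sad, best_pair = None, None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Trajectory through (by, bx): at -v in the previous frame, +v in the next.
                    y0, x0, y1, x1 = by - dy, bx - dx, by + dy, bx + dx
                    if (min(y0, x0, y1, x1) < 0 or max(y0, y1) + block > h
                            or max(x0, x1) + block > w):
                        continue
                    a = prev_key[y0:y0 + block, x0:x0 + block]
                    b = next_key[y1:y1 + block, x1:x1 + block]
                    sad = np.abs(a - b).sum()
                    if best_sad is None or sad < best_sad:
                        best_sad, best_pair = sad, (a, b)
            # v = (0, 0) is always in bounds, so best_pair is never None here.
            side[by:by + block, bx:bx + block] = (best_pair[0] + best_pair[1]) / 2.0
    return side
```

Because the search is run only at the decoder, the encoder stays intra-only. The quality of this estimate directly determines how many parity bits must be requested later.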
Existing Approaches: PRISM (Puri et al., IEEE Trans. IP, Oct. 2007) • Syndrome-based Wyner-Ziv coding that divides the codeword space into cosets • After quantization, a bitplane representation is used - The most significant bits can be inferred from the side information - The least significant bits (syndrome bits) need to be encoded and sent to the decoder • Issues: - The syndrome coding rate is fixed in advance • Coding can stop if the CRC check fails => correctness is not guaranteed • Coding performance decreases significantly when the source statistics are unknown; in practice, the source correlation is not known in advance and is hard to estimate • (Figure: example bitplane representation 1 0 1 1 0 0 1 1)
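The coset/syndrome idea can be illustrated with the textbook (7,4) Hamming code. This is a toy example, not the code PRISM actually uses. The encoder sends only the 3-bit syndrome of a 7-bit source word x, which identifies the coset that contains x. The decoder then picks the member of that coset closest to its side information y. Decoding is exact whenever x and y differ in at most one bit, so 7 source bits are conveyed with 3 transmitted bits.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code: column c-1 is the binary form of c (c = 1..7).
H = np.array([[(c >> b) & 1 for c in range(1, 8)] for b in range(3)], dtype=np.int64)


def syndrome(word):
    """3-bit syndrome H.word (mod 2); this is all the encoder transmits."""
    return H.dot(word) % 2


def syndrome_decode(s, side_info):
    """Return the word in the coset with syndrome `s` that is closest to `side_info`."""
    diff = (syndrome(side_info) + s) % 2            # equals H.(x xor y)
    pos = int(diff[0] + 2 * diff[1] + 4 * diff[2])  # 0 means y is already in the coset
    x_hat = side_info.copy()
    if pos:
        x_hat[pos - 1] ^= 1                         # flip the single differing bit
    return x_hat
```

If the side information differs from x in two or more bits, this decoder returns a wrong word. This is the failure mode behind the slide's point that a fixed syndrome rate requires knowing the correlation in advance.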
Existing Approaches: Feedback-channel-based DVC (Aaron et al., 2004; Girod et al., 2005) • Bitplane coding • Rate-Compatible Punctured Turbo (RCPT) codes are used to generate the parity bits (Slepian-Wolf coding) for each bitplane • A feedback channel is used to request parity bits as needed • The number of parity bits to send does not need to be determined in advance • Hybrid FEC/ARQ-like scheme • The feedback channel is used to acknowledge decoding correctness (e.g., a CRC can be used to check correctness) • The bitrate is determined on the fly. • Successful decoding can be guaranteed.
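Bitplane coding can be sketched as follows, assuming non-negative integer quantization indices in a NumPy array. The helper names are hypothetical. Each plane is what would be passed, MSB first, to the Slepian-Wolf (RCPT or LDPCA) encoder, which is not shown.

```python
import numpy as np


def to_bitplanes(q, nbits=8):
    """Split non-negative integer indices into bitplanes, most significant first."""
    q = np.asarray(q, dtype=np.int64)
    return [((q >> k) & 1).astype(np.uint8) for k in range(nbits - 1, -1, -1)]


def from_bitplanes(planes):
    """Inverse of to_bitplanes: rebuild the indices from MSB-first planes."""
    q = np.zeros(planes[0].shape, dtype=np.int64)
    for plane in planes:
        q = (q << 1) | plane
    return q
```

Coarser quantization, for example `to_bitplanes(frame >> 4, nbits=4)`, simply produces fewer planes to transmit.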
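Existing Approaches: Feedback-based DVC (Girod’s Group) (Figure: codec block diagram with RCPT coding and MCTI (motion-compensated temporal interpolation) for side information; for pixel-domain coding, there is no DCT or IDCT)
Existing Approaches: DISCOVER (Artigas et al., PCS 2007) • Significant R-D performance improvement (Figure: block diagram with LDPCA* and hierarchical sub-pixel ME with smoothing filter) * LDPCA provided by Girod’s group (Varodayan et al., 2006)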
Issues with Existing Approaches • Issue 1: Existing DVC schemes do not adapt the Slepian-Wolf decoding to the local characteristics of the video => every bitplane is Slepian-Wolf decoded, from MSB to LSB, within the bit budget. - Decoding stops when no error is detected or when the bit budget is exhausted, so some important locations and bitplanes might not be decoded! • Questions: - Can we skip some less important regions and bitplanes without decoding them? - How do we measure the significance of a bitplane?
Issues with Existing Approaches • Issue 2: Existing DVC schemes do not adapt the quantization to the local characteristics of the video => during encoding, a single quantization matrix (one fixed quantizer for each subband) is selected for the whole video. • Question: - Can we adapt the quantization matrix to the local characteristics of the video so as to minimize the bits needed for LDPCA decoding while maximizing the quality?
Proposed Strategy • Divide each video frame into partitions in order to exploit local characteristics • Allocate bits to a partition only if they result in sufficient distortion reduction - Determined using distortion-rate (D-R) ratios: D-R = D/R, where D is the distortion reduction resulting from allocating R bits • The minimum allowed distortion reduction per bit is specified as a target distortion-rate ratio, TD-R - Allocate bits only if the D/R of the partition is > TD-R - D/R indicates how much distortion reduction (quality) a bit can buy, on average, for the considered partition • Bits can be allocated to a partition via Slepian-Wolf (LDPCA) decoding and/or by selecting the quantization matrix • TD-R is used to control the bitrate: set it low for high-bitrate coding and high for low-bitrate coding
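The allocation rule reduces to a filter over partitions. The sketch below is hypothetical. It assumes the distortion reduction D and rate R of each partition have already been estimated (the estimation is covered in the following slides) and applies the strict "D/R > TD-R" test stated above. The (D, R) numbers in the usage lines are made up purely to show how the target trades rate for quality.

```python
from typing import List, Tuple


def allocate(partitions: List[Tuple[float, float]], target_dr: float) -> Tuple[List[int], float]:
    """Return the indices of the partitions that receive bits, and the total rate spent.

    Each entry is (distortion_reduction, rate_in_bits) for one partition (hypothetical inputs).
    """
    chosen = [m for m, (dd, r) in enumerate(partitions) if r > 0 and dd / r > target_dr]
    total = sum(partitions[m][1] for m in chosen)
    return chosen, total


parts = [(120.0, 40.0), (30.0, 60.0), (90.0, 30.0)]  # hypothetical (D, R) per partition
print(allocate(parts, 1.0))   # ([0, 2], 70.0): higher target, fewer bits
print(allocate(parts, 0.25))  # ([0, 1, 2], 130.0): lower target, more bits
```

Partition 1 buys only 0.5 units of distortion reduction per bit. It is therefore skipped unless the target is lowered below that value.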
Challenge: How to Measure the Distortion-Rate Ratio? • The original source information is not available at the decoder, so the distortion D cannot be measured exactly. • The bitrate R cannot be known without decoding. • Proposed approach: distortion-rate ratio estimation performed at the decoder using the side-information frames and the source correlation model - The complexity of the encoder is not increased - More flexibility, as the decoder can selectively decode the bitplanes based on a target distortion-rate ratio; the target distortion-rate ratio can be changed to achieve different R-D operating points - The error probability needs to be estimated at the decoder
BLAST-DVC: Distortion-Rate Ratio Estimation • Source correlation model: let D be the difference between the source information X and its side information Xside. D can be modeled as a random variable with a Laplacian distribution, $f_D(d) = \frac{\alpha}{2} e^{-\alpha |d|}$. • α can be estimated for each partition m, from the pixels n of that partition in the co-located blocks of the two motion-compensated key frames (Brites et al., 2006).
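The following is a minimal sketch of per-partition α estimation. It assumes, in the spirit of Brites et al. (2006), that the residual between the backward and forward motion-compensated key frames, R = (X_B - X_F)/2, stands in for D. It also assumes that α follows from the Laplacian variance relation σ² = 2/α², with a zero-mean residual. The function names and the default 8x8 partition grid (22x18-pixel partitions for QCIF, matching the simulation setup) are illustrative assumptions.

```python
import numpy as np


def laplacian_pdf(d, alpha):
    """Laplacian density f(d) = (alpha / 2) * exp(-alpha * |d|)."""
    return 0.5 * alpha * np.exp(-alpha * np.abs(d))


def estimate_alpha(mc_backward, mc_forward, grid=(8, 8), eps=1e-6):
    """One alpha per partition m, from the residual of two motion-compensated key frames."""
    r = (mc_backward.astype(np.float64) - mc_forward.astype(np.float64)) / 2.0
    rows, cols = grid
    h, w = r.shape
    ph, pw = h // rows, w // cols
    alphas = np.empty(grid)
    for i in range(rows):
        for j in range(cols):
            part = r[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]  # pixels n of partition m
            var = max(float(np.mean(part ** 2)), eps)           # zero-mean residual assumed
            alphas[i, j] = np.sqrt(2.0 / var)                   # sigma^2 = 2 / alpha^2
    return alphas
```

A large α means the side information is trusted in that partition. A small α flags regions, such as those with poorly interpolated motion, where more parity bits will be needed.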
BLAST-DVC: Rate Estimation • Consider the error probability at pixel n in bitplane k of partition m. • Average these error probabilities over the sub-image. • The number of bits needed for the kth bitplane is then computed from this average error probability.
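One plausible way to turn the average error probability into a bit count is the Slepian-Wolf bound for a binary symmetric correlation: R_k ≈ N · h(p̄). Here N is the number of pixels in the sub-image, p̄ is the average error probability and h is the binary entropy. This is an assumption, not necessarily the slide's own formula. A practical LDPCA decoder typically needs somewhat more bits than this bound.

```python
import math
from typing import Sequence


def binary_entropy(p: float) -> float:
    """h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def estimate_bitplane_rate(error_probs: Sequence[float]) -> float:
    """Estimated bits for one sub-image bitplane: N * h(mean error probability)."""
    n = len(error_probs)
    p_avg = sum(error_probs) / n
    return n * binary_entropy(p_avg)
```

A bitplane the side information already predicts perfectly costs 0 bits. A bitplane that is a coin flip (p̄ = 0.5) costs one bit per pixel, no better than sending it uncoded.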
BLAST-DVC: Error Probability Estimation • The probability of bit error can be expressed as $P\left(b_{n,k} \neq b'_{n,k} \mid \mathrm{DBP}\right)$, where $b_{n,k}$ and $b'_{n,k}$ denote the bit in the kth bitplane of the nth pixel in the original sub-image and in the side information (generated through motion-compensated interpolation), respectively, and DBP stands for Decoded Bit Planes.
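Under the Laplacian correlation model, this probability can be evaluated per pixel. Consider only the original values that are consistent with the already decoded bitplanes (DBP), weight each candidate value by the Laplacian centered on the side information, and sum the mass of the candidates whose target bit differs from the side-information bit. The sketch below is hypothetical: it assumes 8-bit integer pixels, an integer side-information value and a discrete Laplacian, and the function name is illustrative.

```python
import numpy as np


def bit_error_prob(side, prefix, n_decoded, alpha, nbits=8):
    """P(b_{n,k} != b'_{n,k} | DBP) for one pixel under a Laplacian model.

    side      : integer side-information value Xside of the pixel
    prefix    : integer value of the n_decoded most significant bits already decoded
    n_decoded : number of decoded MSBs; the target is the next bitplane (0 <= n_decoded < nbits)
    """
    shift = nbits - n_decoded
    lo = prefix << shift
    vals = np.arange(lo, lo + (1 << shift))     # original values consistent with the DBP
    dist = np.abs(vals - side)
    w = np.exp(-alpha * (dist - dist.min()))    # Laplacian weights, rescaled to avoid underflow
    target_bits = (vals >> (shift - 1)) & 1     # candidate values of b_{n,k}
    side_bit = (side >> (shift - 1)) & 1        # b'_{n,k}, read from the side information
    return float(w[target_bits != side_bit].sum() / w.sum())
```

Pixels whose side information lies close to a bin boundary, such as 127 for the MSB, get error probabilities near 0.5. Pixels deep inside a bin get probabilities near 0.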
BLAST-DVC: Distortion Estimation • Goal: estimate the distortion reduction obtained if the target bitplane is decoded. • The average distortion for a sub-image Xn is estimated in two cases: - Target bitplane not LDPCA-decoded: each pixel is partially reconstructed from the previously determined k-1 bitplanes and the side information - Target bitplane LDPCA-decoded: each pixel is partially reconstructed using the decoded target bitplane as well => minimum-distance symbol reconstruction is used • Distortion reduction: ΔDk = (average distortion if the target bitplane is not LDPCA-decoded) - (average distortion if it is LDPCA-decoded)
Minimum Distortion Reconstruction (Figure: the original value, modeled as a Laplacian random variable, and the side information)
Distortion Estimation – Bitplane not decoded • Example: the MSB is 0 and we want to determine the next bit. (Figure: pixel range 0 to 255 with bins 00 and 01; original value y; distortion y - Xside) => The next bit is estimated as 1, from the side information.
Distortion Estimation – Bitplane LDPCA-decoded • Example: the MSB is 0 and we want to determine the next bit. (Figure: pixel range 0 to 255 with bins 00, 01, 10, 11; original value y; distortion y - Recon(y, Xside)) • If y is in bin 01, the bin containing Xside, then Recon(y, Xside) = Xside. • If y is in bin 00, Recon(y, Xside) is the value in bin 00 closest to Xside (minimum-distance reconstruction).
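The two cases can be evaluated numerically per pixel. Under a Laplacian centered on the side information, restricted to the values allowed by the decoded bitplanes, compute the expected squared error of the minimum-distance reconstruction (Xside clipped to the allowed interval). Do this first for the current interval (bitplane not decoded) and then for the half-interval selected by the true next bit (bitplane decoded). This is a hypothetical sketch under those assumptions (8-bit integer pixels, discrete Laplacian, illustrative function name), not necessarily the exact BLAST-DVC computation.

```python
import numpy as np


def distortion_reduction(side, prefix, n_decoded, alpha, nbits=8):
    """Expected squared error without / with decoding the next bitplane, and the difference.

    Reconstruction is minimum-distance: Xside clipped to the interval allowed by the known bits.
    """
    shift = nbits - n_decoded
    lo = prefix << shift
    hi = lo + (1 << shift) - 1
    vals = np.arange(lo, hi + 1)                 # candidate original values y
    dist = np.abs(vals - side)
    w = np.exp(-alpha * (dist - dist.min()))
    w /= w.sum()                                 # Laplacian pmf restricted to the DBP interval

    # Bitplane not decoded: every y is reconstructed within the same interval.
    recon_not = np.clip(side, lo, hi)
    d_not = float(np.sum(w * (vals - recon_not) ** 2))

    # Bitplane decoded: the true next bit of y selects one half of the interval.
    half = 1 << (shift - 1)
    bin_lo = lo + ((vals >> (shift - 1)) & 1) * half
    recon_dec = np.clip(side, bin_lo, bin_lo + half - 1)
    d_dec = float(np.sum(w * (vals - recon_dec) ** 2))
    return d_not, d_dec, d_not - d_dec
```

Clipping to a half-interval can only move the reconstruction closer to y, so the estimated ΔDk is never negative. It is large exactly where the side information is likely to sit in the wrong bin.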
Bitplane Decoding Selection • Once the rate Rk and the distortion reduction ΔDk are obtained, a target distortion-rate ratio t is chosen to determine whether bitplane decoding should be performed. - If ΔDk / Rk < t, the current bitplane is not decoded (NDBP case). - If ΔDk / Rk ≥ t, CRC bits are requested, followed progressively by parity/syndrome bits, one at a time, so that LDPCA error correction can be applied to the current sub-image bitplane until no errors are detected (DBP case).
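The selection rule and the progressive request loop can be sketched as follows. This is a hypothetical outline. `try_decode(n)` stands in for an LDPCA decoder that returns a candidate bitplane after receiving n parity/syndrome bits, or `None` if it has not converged. The CRC is assumed to have been received already, and `zlib.crc32` plays the role of the CRC check.

```python
import zlib
from typing import Callable, Optional, Tuple


def select_and_decode(delta_d: float, rate: float, t: float,
                      side_bits: bytes, crc: int,
                      try_decode: Callable[[int], Optional[bytes]],
                      max_parity: int) -> Tuple[bytes, int]:
    """Decide DBP vs NDBP for one sub-image bitplane and run the request loop.

    Returns the bitplane the decoder keeps and the number of parity/syndrome bits
    requested over the feedback channel. `rate` (the estimate Rk) must be positive.
    """
    if delta_d / rate < t:
        return side_bits, 0                    # NDBP: keep the side-information bits
    for n_parity in range(1, max_parity + 1):  # request one more parity/syndrome bit
        candidate = try_decode(n_parity)
        if candidate is not None and zlib.crc32(candidate) == crc:
            return candidate, n_parity         # DBP: CRC check passed, stop requesting
    raise RuntimeError("no CRC match within max_parity requests")
```

For example, with a mock decoder that only converges after 5 parity bits, a bitplane with ΔDk / Rk = 2 is decoded at cost 5 when t = 1. The same bitplane is skipped at no cost when t = 3.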
Simulation Setup • QCIF video sequences (176x144) • Frame rate: 15 frames per second • Number of partitions per frame = 64 (22x18 pixels each) • Comparison with the following systems: - H.264 Inter: I-B-I-B - H.264 Intra only - DISCOVER by X. Artigas et al.: transform-domain DVC, GOP = 2 - PDDVC (non-adaptive best pixel-domain system): pixel-domain DVC, GOP = 2; a special case of the proposed system without partitioning (1 partition per frame)
Simulation Results (Figure: R-D curves annotated with 22% reduction, 18% reduction, 1.6 dB and 2.0 dB)
Simulation Results (Figure: R-D curves annotated with 18% reduction, 1.4 dB, 18% reduction and 0.8 dB)
Visual Testing Setup • 9 subjects took the test. • Two video sequences were placed side by side, in random order, on a 19” Dell UltraSharp screen. • Scores: - 1: DISCOVER is much better than BLAST-DVC - 2: DISCOVER is better than BLAST-DVC - 3: same quality - 4: DISCOVER is worse than BLAST-DVC - 5: DISCOVER is much worse than BLAST-DVC
Visual Testing (Figure: Hall Monitor and Foreman)
Proposed system: frame bits 3.36 kbits, frame PSNR 32.89 dB. DISCOVER: frame bits 5.34 kbits, frame PSNR 33.21 dB. The sequence average bitrate is 140.38 kbps and the average PSNR is 34.31 dB for DISCOVER. The sequence average bitrate is 121.57 kbps and the average PSNR is 34.29 dB for the proposed system.
Proposed system: frame bits 5.83 kbits, frame PSNR 31.84 dB. DISCOVER: frame bits 8.61 kbits, frame PSNR 33.16 dB. The sequence average bitrate is 167.01 kbps and the average PSNR is 32.38 dB for DISCOVER. The sequence average bitrate is 166.48 kbps and the average PSNR is 31.68 dB for the proposed system.
(Video comparison: DISCOVER, compressed at 15 fps, 167.01 kbps; BLAST-DVC, compressed at 15 fps, 166.48 kbps)
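The relative bit savings implied by the two comparisons above can be computed directly. The labels "first" and "second" are placeholders, since the sequences are not named on these slides.

```python
def percent_saving(reference, proposed):
    """Relative reduction (%) of `proposed` with respect to `reference`."""
    return 100.0 * (reference - proposed) / reference


# Figures quoted above; DISCOVER is the reference in each pair.
comparisons = {
    "first comparison, sequence average (kbps)": (140.38, 121.57),
    "first comparison, example frame (kbits)": (5.34, 3.36),
    "second comparison, sequence average (kbps)": (167.01, 166.48),
    "second comparison, example frame (kbits)": (8.61, 5.83),
}
for label, (discover, proposed) in comparisons.items():
    print(f"{label}: {percent_saving(discover, proposed):.1f}% fewer bits")
```

This gives about 13.4%, 37.1%, 0.3% and 32.3%. Read together with the PSNR figures, the first comparison is a rate saving at essentially equal quality (34.29 vs 34.31 dB). In the second, both codecs spend about the same average rate and DISCOVER is 0.7 dB higher (32.38 vs 31.68 dB).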
AQT-DVC: Transform-Domain Distributed Video Coding with Rate-Distortion Based Adaptive Quantization • Motivation - Transform-domain DVC outperforms pixel-domain DVC, especially for high-motion sequences. - Rate-distortion based adaptive quantization provides a better quantization scheme in terms of rate-distortion performance. • Considerations: - Feedback channel: minimize the traffic on the feedback channel. A bitplane-selective scheme is not applicable because the number of bitplanes might be too large. -> One quantization matrix for each partition (M 4x4 DCT blocks) - Partition size versus LDPCA block size: a smaller partition size preserves the flexibility of the quantization scheme, while a larger LDPCA block size provides better error-correction ability and reduces the feedback-channel traffic. -> One LDPCA code for a bitplane of a subband. -> Because of the different adaptive quantizers, the resulting bitplanes are not rectangular (irregular shape) and have undefined values => LDPCA needs to be modified
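The irregular bitplanes can be illustrated with a small, hypothetical sketch. It makes the following assumptions:
- Each partition contributes non-negative quantization indices for one DCT subband.
- Each partition uses its own number of bitplanes for that subband, set by its selected quantization matrix (0 if the subband is not coded).
- Partitions are aligned on their own MSB.

Positions where a partition has no bit in a given plane are marked as undefined in a mask. How the modified LDPCA decoder treats them is not specified on this slide.

```python
import numpy as np


def irregular_bitplanes(q_indices, bits_per_partition, max_bits):
    """Build MSB-first bitplanes of one subband across partitions with different quantizers.

    q_indices          : list of 1-D arrays of non-negative indices, one per partition
    bits_per_partition : number of bitplanes used by each partition's quantizer
    Returns a list of (bits, defined_mask) pairs, one per bitplane.
    """
    planes = []
    for j in range(max_bits):
        vals, mask = [], []
        for q, nb in zip(q_indices, bits_per_partition):
            q = np.asarray(q, dtype=np.int64)
            if j < nb:   # this partition has a bit in plane j (its own MSB is plane 0)
                vals.append((q >> (nb - 1 - j)) & 1)
                mask.append(np.ones(q.size, dtype=bool))
            else:        # undefined: the partition's quantizer has fewer bitplanes
                vals.append(np.zeros(q.size, dtype=np.int64))
                mask.append(np.zeros(q.size, dtype=bool))
        planes.append((np.concatenate(vals), np.concatenate(mask)))
    return planes
```

The lower planes become increasingly sparse as coarsely quantized partitions run out of bits. This is why a single rectangular LDPCA codeword per subband bitplane no longer fits without modification.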