High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing
(Chinese title: 適用於雲端架構下兼具高效能與平行化設計之分散式視訊編碼)
Cheng, Han-Ping (程瀚平)
Advisor: Prof. Wu, Ja-Ling (吳家麟)
2010/6/2
CMLab, CSIE, NTU
Outline
• Introduction
• DISPAC video codec
• RD performance of DISPAC
• Parallelizing DISPAC decoder
• Decoding speed of DISPAC
• Conclusions and future work
Trends of Cloud Computing
• Cloud computing makes clients slimmer and thinner
Video Coding in Cloud Computing
• The client side needs only a low-complexity encoder and decoder
• Conventional video coding (e.g. H.264)
  • Encode once, decode many times
  • Low-complexity decoder
• Distributed Video Coding (DVC)
  • e.g. video surveillance, wireless sensor networks
  • Low-complexity encoder
Distributed Video Coding
• Conventional video coding paradigm: Source X and Source Y (statistically dependent) feed a joint encoder and a joint decoder
  • RX ≧ H(X), RY ≧ H(Y), RX + RY ≧ H(X, Y)
• Distributed coding: Encoder X and Encoder Y work separately and feed a joint decoder
  • If the dependency exists but is not exploited: RX ≧ H(X), RY ≧ H(Y)
  • RX + RY ≧ ? Slepian & Wolf: H(X, Y)!
• Slepian-Wolf Theorem (1973), Wyner-Ziv Theorem (1976)
Distributed Video Coding
• Encoder X: Source X → Quantizer → Channel Encoder, producing parity bits P
• Encoder Y: Source Y → Quantizer → Source Encoder → Source Decoder → Y, with RY ≧ H(Y)
• Joint decoder: side information is estimated from Y, and the Channel Decoder uses P to recover X' across a virtual channel
  • Separate decoding: dependency exists but is not exploited
  • Joint decoding: correlation is exploited, RX ≧ H(X|Y)
• Channel coding (error control code) analogy: X → Channel Encoder → X+P → Noisy Channel → (X+P)' → Channel Decoder → X'
• RX + RY ≧ ? Wyner & Ziv: H(X, Y)!
• Wyner-Ziv Theorem (1976): extends the result to lossy coding
• DVC is also called Wyner-Ziv (WZ) video coding
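The rate bounds on the two slides above can be checked numerically. The sketch below is illustrative only: the joint distribution `joint` is a hypothetical example of two correlated binary sources, not data from the deck. It shows that H(X|Y) is smaller than H(X) when the sources are correlated, which is the saving a joint decoder can exploit.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical joint pmf of two correlated binary sources X and Y
# (illustrative numbers, not taken from the slides).
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

# Marginal distributions of X and Y.
p_x = [sum(p for (x, _), p in joint.items() if x == v) for v in (0, 1)]
p_y = [sum(p for (_, y), p in joint.items() if y == v) for v in (0, 1)]

h_x = entropy(p_x)              # rate needed for X when Y is ignored
h_y = entropy(p_y)              # rate needed for Y
h_xy = entropy(joint.values())  # Slepian-Wolf bound on RX + RY
h_x_given_y = h_xy - h_y        # chain rule: H(X|Y) = H(X, Y) - H(Y)

print(f"H(X) = {h_x:.3f}, H(Y) = {h_y:.3f}")
print(f"H(X, Y) = {h_xy:.3f}, H(X|Y) = {h_x_given_y:.3f}")
```

With separate decoding the total rate is H(X) + H(Y); with joint decoding it can drop to H(X, Y) = H(Y) + H(X|Y).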
Video Coding in Cloud Computing
• WZ encoder (low complexity) → WZ encoded bitstream → WZ to H.264 transcoder (cloud computational resource) → H.264 encoded bitstream → H.264 decoder (low complexity)
• WZ to H.264 video transcoder
Motivation
• There is still a gap between Wyner-Ziv video coding and conventional video coding (e.g. H.264/AVC)
• Most reported WZ codecs have a high time delay in the decoder
• Trends of parallel computing
  • e.g. multi-core CPU, GPU
• Parallelizability of the decoder is essential
DISPAC Video Codec
• DIStributed video coding with PArallelized design for Cloud computing (DISPAC)
• To improve rate-distortion (RD) performance
  • Combine coding tools developed in the recent literature with some newly developed modules
• To reduce decoding time delay
  • Highly parallelized decoder
DISPAC Video Codec
• Combine coding tools of two state-of-the-art WZ codecs:
  • DISCOVER codec (Distributed coding for video services)
    • X. Artigas et al., “The DISCOVER codec: architecture, techniques and evaluation”, PCS, 2007
  • MLWZ codec (Motion-learning based Wyner-Ziv video coding)
    • R. Martin et al., “Statistical motion learning for improved transform domain Wyner-Ziv video coding”, IET Image Processing, 2010
DISCOVER Video Codec
[Figure: sequences of key frames and WZ frames for GOP 2 and GOP 4]
Ref. X. Artigas et al., PCS, 2007
Quantization
• Eight quantization matrices (each entry is the number of quantization levels of one band of the 4x4 DCT; rows separated by "/"):
  Q1: 16 8 0 0 / 8 0 0 0 / 0 0 0 0 / 0 0 0 0
  Q2: 32 8 0 0 / 8 0 0 0 / 0 0 0 0 / 0 0 0 0
  Q3: 32 8 4 0 / 8 4 0 0 / 4 0 0 0 / 0 0 0 0
  Q4: 32 16 8 4 / 16 8 4 0 / 8 4 0 0 / 4 0 0 0
  Q5: 32 16 8 4 / 16 8 4 4 / 8 4 4 0 / 4 4 0 0
  Q6: 64 16 8 8 / 16 8 8 4 / 8 8 4 4 / 8 4 4 0
  Q7: 64 32 16 8 / 32 16 8 4 / 16 8 4 4 / 8 4 4 0
  Q8: 128 64 32 16 / 64 32 16 8 / 32 16 8 4 / 16 8 4 0
• 32 = 2^5 => use 5 bits
• 8 = 2^3 => use 3 bits
• 0 => 0 bits (not transmitted)
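The "levels to bits" rule on this slide can be written out as a short sketch. It uses matrix Q4 as it appears on the bit plane extraction slide; the helper name `bits_for_levels` is made up for illustration, and the code assumes every nonzero entry is a power of two, as in the matrices shown.

```python
# Q4 as listed on the bit plane extraction slide: number of quantization
# levels for each band of a 4x4 DCT block.
Q4 = [
    [32, 16, 8, 4],
    [16, 8, 4, 0],
    [8, 4, 0, 0],
    [4, 0, 0, 0],
]


def bits_for_levels(levels):
    """Bits (bit planes) needed for a band with `levels` quantization levels.

    A band with 0 levels is not transmitted and costs 0 bits.
    Assumes `levels` is a power of two, as in the slide's matrices.
    """
    if levels == 0:
        return 0
    if levels & (levels - 1):
        raise ValueError("expected a power of two")
    return levels.bit_length() - 1  # log2(levels)


bits_per_band = [[bits_for_levels(v) for v in row] for row in Q4]
bits_per_block = sum(sum(row) for row in bits_per_band)  # 30 for Q4

for row in bits_per_band:
    print(row)
print("bits per 4x4 block before channel coding:", bits_per_block)
```

The total is the number of raw bit-plane bits per block, not the transmitted rate, since only parity information from the channel encoder is sent.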
Quantization: DCT coefficient bands
• S(n,k) is DCT coefficient k (k = 1 … 16) of 4x4 block n (Block1, Block2, Block3, …)
• DCT coefficient band b1: { S(1,1), S(2,1), S(3,1), …, S(N,1) } (DC band)
• DCT coefficient band b2: { S(1,2), S(2,2), S(3,2), …, S(N,2) } (AC bands)
• …
• DCT coefficient band b16: { S(1,16), S(2,16), S(3,16), …, S(N,16) }
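Grouping coefficients into bands is a transpose of the "block by coefficient" layout. The sketch below uses string labels instead of real coefficients so the grouping is visible; the three-block input is a hypothetical stand-in for the N blocks of a frame.

```python
# Three hypothetical 4x4 blocks, each flattened to 16 coefficient labels.
# "S2,5" stands for coefficient 5 of block 2.
num_blocks = 3
blocks = [[f"S{n},{k}" for k in range(1, 17)] for n in range(1, num_blocks + 1)]

# Band k collects coefficient k from every block: a transpose of the layout.
bands = list(zip(*blocks))

print("b1  (DC):", bands[0])
print("b2  (AC):", bands[1])
print("b16 (AC):", bands[15])
```

Each band is then quantized and bit-plane coded on its own, which is why the quantization matrices assign one level count per band.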
Bit Plane Extraction
• Quantization matrix Q4: 32 16 8 4 / 16 8 4 0 / 8 4 0 0 / 4 0 0 0
• For each DCT coefficient band, the quantized symbols are split into bit planes and each bit plane is channel encoded (LDPCA)
• Bit planes of the DC band (32 levels): bit plane 1 (MSB) to bit plane 5 (LSB)
• Example DC-band symbols: 30 1 0 4 1 7 6 6 7 3 7 5
  • 4 = 00100, 1 = 00001, 0 = 00000, 30 = 11110
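A minimal sketch of the extraction step, using four of the DC-band symbols shown on the slide. The function name `extract_bit_planes` is hypothetical; the LDPCA encoding that follows is not modelled here.

```python
def extract_bit_planes(symbols, num_bits):
    """Split quantized symbols into bit planes, most significant plane first.

    Plane k holds bit k of every symbol, so each plane has one bit per symbol.
    """
    return [
        [(s >> (num_bits - 1 - k)) & 1 for s in symbols]
        for k in range(num_bits)
    ]


# Four DC-band symbols from the slide; the DC band of Q4 has 32 levels,
# so every symbol is represented with 5 bits.
symbols = [4, 1, 0, 30]
planes = extract_bit_planes(symbols, num_bits=5)

for index, plane in enumerate(planes, start=1):
    print(f"Bit plane {index}:", *plane)
```

Each resulting plane would be passed to the channel encoder separately, starting from the most significant one.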
DISCOVER Video Codec 白育姍
• Encoder X: Source X → Quantizer → Channel Encoder, producing parity bits P
• Joint decoder: Channel Decoder → X', with side information estimation from Y across a virtual channel; RX ≧ H(X|Y)
• Encoder Y: Source Y → Quantizer → Source Encoder → Source Decoder → Y; RY ≧ H(Y)
• Without joint decoding, dependency exists but is not exploited
Ref. X. Artigas et al., PCS, 2007
Side Information Creation
• Reference frames: XB and XF
• Divide the frame into non-overlapping 16x16 blocks
• Motion estimation (search window: ±32)
• Low-pass filter (3x3 mean filter)
Side Information Creation
[Figure: reference frames XB and XF]
Side Information Creation
• Adaptive search range N between XB and XF
[Figure: block with margin N on each side and vectors (xu, yu), (xR, yR), (xL, yL), (xB, yB)]
Side Information Creation
• Half-pixel motion estimation between XB and XF
Side Information Creation
• Spatial motion smoothing between XB and XF with a weighted vector median filter
• Candidates: the nine motion vectors x1 … x9 of a 3x3 neighbourhood of blocks
• Each candidate is scored using its MSE (MSE1 for x1, MSE2 for x2, …)
• The candidate with the minimum result is chosen; here the result of x6 is minimum, so xwvmf = x6 (final motion vector)
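The selection rule on these slides can be sketched as follows. The formula image did not survive in this transcript, so the cost used here (a weighted sum of Euclidean distances from a candidate to all vectors) is an assumption based on the usual definition of a weighted vector median filter. The weights are plain inputs; in an MSE-driven scheme they would be derived from the block matching errors, which is not modelled here. All vectors and weights below are hypothetical.

```python
import math


def weighted_vector_median(vectors, weights):
    """Return the candidate vector with the smallest weighted distance sum.

    Assumed cost for candidate c: sum_j weights[j] * ||c - vectors[j]||.
    The output is always one of the input vectors.
    """
    def cost(candidate):
        return sum(w * math.dist(candidate, v) for v, w in zip(vectors, weights))

    return min(vectors, key=cost)


# Hypothetical motion vectors of a 3x3 neighbourhood (x1 ... x9):
# mostly consistent motion plus one outlier.
vectors = [(2, 0), (2, 0), (2, 1), (2, 0), (10, -8), (2, 0), (2, 0), (2, 0), (2, 0)]
weights = [1.0] * len(vectors)  # hypothetical; equal weights give a plain vector median

smoothed = weighted_vector_median(vectors, weights)
print("smoothed motion vector:", smoothed)
```

With these inputs the outlier (10, -8) is rejected because its distance to every other vector is large.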
Side Information Creation
• Bidirectional motion compensation between XB and XF
• Block interpolation: 0.75*XB + 0.25*XF
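As a small illustration of the interpolation formula above, the sketch below blends two motion-compensated blocks with the 0.75 / 0.25 weights from the slide. The rounding and clipping to the 8-bit range are assumptions, and the pixel values are made up.

```python
def blend_blocks(block_b, block_f, w_b=0.75, w_f=0.25):
    """Weighted average of two equally sized blocks (lists of pixel rows).

    Results are rounded and clipped to the 8-bit range [0, 255] (assumed).
    """
    return [
        [min(255, max(0, round(w_b * b + w_f * f))) for b, f in zip(row_b, row_f)]
        for row_b, row_f in zip(block_b, block_f)
    ]


# Hypothetical 2x2 motion-compensated blocks taken from XB and XF.
from_xb = [[100, 200], [40, 255]]
from_xf = [[200, 0], [40, 255]]

side_info_block = blend_blocks(from_xb, from_xf)
print(side_info_block)
```

Unequal weights such as these would give more influence to the reference frame that is temporally closer to the interpolated frame; whether that is the reason for 0.75 and 0.25 here is not stated on the slide.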
DISCOVER Video Codec 白育姍
• Laplacian distribution
Ref. X. Artigas et al., PCS, 2007
CNM Parameter Estimation
• Residual frame generation: the residual frame R is computed from XB and XF
• Residual frame DCT transform (4x4): R → T
  • Example coefficients of T: 120 258 35 -30 -24 -6 20 200 0.5 -40 10 5
• CNM parameter computation: the correlation noise model (CNM) parameter is computed from T
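The formula for the CNM parameter did not survive in this transcript, so the following is only a sketch under stated assumptions. For a Laplacian density f(d) = (α/2)·exp(−α|d|) the variance is 2/α², hence α = sqrt(2/σ²). Estimating σ² as the variance of the coefficient magnitudes |T| is an assumption modelled on band-level estimators described in the DVC literature, not necessarily what the slide shows. The input values are the example coefficients listed above.

```python
import math


def laplacian_alpha(coefficients):
    """Estimate the Laplacian parameter alpha = sqrt(2 / variance).

    Assumption: the variance is taken over the magnitudes |T| of the
    residual DCT coefficients, i.e. E[|T|^2] - (E[|T|])^2.
    """
    magnitudes = [abs(c) for c in coefficients]
    mean = sum(magnitudes) / len(magnitudes)
    variance = sum(m * m for m in magnitudes) / len(magnitudes) - mean * mean
    if variance <= 0:
        raise ValueError("variance must be positive to define alpha")
    return math.sqrt(2.0 / variance)


# Example residual DCT coefficients as they appear on the slide.
t_coefficients = [120, 258, 35, -30, -24, -6, 20, 200, 0.5, -40, 10, 5]

alpha = laplacian_alpha(t_coefficients)
print(f"alpha = {alpha:.5f}")
```

A larger residual spread gives a smaller alpha, meaning a wider Laplacian and less confidence in the side information.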
DISCOVER Video Codec 白育姍
Ref. X. Artigas et al., PCS, 2007
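Correlation Noise Distribution Modeling
• The correlation noise between the WZ frame and the side information is modelled as a Laplacian distribution
• The CNM parameter is the parameter of that Laplacian distribution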
Conditional Bit Probability Computation
• Goal: the probability that the k-th bit is one, given the side information (Y) and the previous k-1 decoded bits
• The probability is taken from the Laplacian pdf of X-Y
• Example: the candidate symbols range from 0011000 (24) to 0011111 (31)
  • Assume the quantization step size is 32
  • (31-24+1) x 32 = 256, so 256 probabilities need to be summed
• WZ frame: 176/4 x 144/4 blocks
R.P. Westerlaken et al., “Analyzing symbol and bit plane-based LDPC in distributed video coding”, ICIP, 2007.
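The computation can be sketched as below, with two deliberate simplifications that are not from the slide: the Laplacian is integrated in closed form through its CDF instead of summing 256 per-value probabilities, and coefficients are assumed non-negative with symbol q covering the interval [q·step, (q+1)·step). The function names, `y` and `alpha` values are hypothetical.

```python
import math


def laplacian_cdf(x, mu, alpha):
    """CDF of a Laplacian with density (alpha / 2) * exp(-alpha * |x - mu|)."""
    if x < mu:
        return 0.5 * math.exp(alpha * (x - mu))
    return 1.0 - 0.5 * math.exp(-alpha * (x - mu))


def interval_prob(lo, hi, mu, alpha):
    """Probability mass of the Laplacian on [lo, hi)."""
    return laplacian_cdf(hi, mu, alpha) - laplacian_cdf(lo, mu, alpha)


def prob_next_bit_is_one(prefix_bits, num_bits, step, y, alpha):
    """P(next bit = 1 | side information y, already decoded prefix bits).

    Symbols sharing the decoded prefix form one contiguous range; the lower
    half of that range has next bit 0 and the upper half has next bit 1.
    """
    remaining = num_bits - len(prefix_bits)
    if remaining < 1:
        raise ValueError("all bits are already decoded")
    prefix = int("".join(str(b) for b in prefix_bits) or "0", 2)
    first = prefix << remaining   # smallest symbol with this prefix
    half = 1 << (remaining - 1)   # number of symbols with next bit = 0
    lo, mid, hi = first * step, (first + half) * step, (first + 2 * half) * step
    p_zero = interval_prob(lo, mid, y, alpha)
    p_one = interval_prob(mid, hi, y, alpha)
    total = p_zero + p_one
    return 0.5 if total == 0.0 else p_one / total


# Slide example: prefix 0011 of a 7-bit symbol covers symbols 24..31,
# i.e. values 768..1023 for a quantization step size of 32.
p_centre = prob_next_bit_is_one([0, 0, 1, 1], num_bits=7, step=32, y=896, alpha=0.05)
p_high = prob_next_bit_is_one([0, 0, 1, 1], num_bits=7, step=32, y=1000, alpha=0.05)
p_low = prob_next_bit_is_one([0, 0, 1, 1], num_bits=7, step=32, y=800, alpha=0.05)
print(p_centre, p_high, p_low)
```

These probabilities would be converted to soft inputs for the LDPCA decoder, one per bit of the current bit plane.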
Reconstruction
• Channel decode (LDPCA) the bit planes of the DC band:
  Bit plane 1: 0 0 0 1
  Bit plane 2: 0 0 0 1
  Bit plane 3: 1 0 0 1
  Bit plane 4: 0 0 0 1
  Bit plane 5: 0 1 0 0
• Reading each column from bit plane 1 to bit plane 5 gives the symbols 4, 1, 0, 30
• Decoded symbols (zigzag order): 7 4 6 1 7 7 0 6 1 30 5 3
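The regrouping of decoded bit planes into symbols can be sketched as follows, using the five bit planes shown on the slide. This covers only the regrouping step; the final reconstruction of coefficient values from symbols and side information (the topic of the Kubasov et al. reference) is not modelled. The function name is hypothetical.

```python
def symbols_from_bit_planes(planes):
    """Rebuild quantized symbols from bit planes ordered MSB first."""
    symbols = [0] * len(planes[0])
    for plane in planes:
        # Shift the bits gathered so far and append the bit from this plane.
        symbols = [(s << 1) | bit for s, bit in zip(symbols, plane)]
    return symbols


# Decoded bit planes of the DC band as shown on the slide (plane 1 = MSB).
decoded_planes = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]

decoded_symbols = symbols_from_bit_planes(decoded_planes)
print(decoded_symbols)
```

The recovered symbols match four of the DC-band values shown on the encoder side (4, 1, 0, 30).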
Reconstruction
D. Kubasov et al., “Optimal reconstruction in Wyner–Ziv video coding with multiple side information”, IEEE Workshop on MMSP, 2007
DISCOVER Video Codec 白育姍
• Poor RD performance for high-motion sequences and large GOP sizes
Ref. X. Artigas et al., PCS, 2007
DISCOVER Video Codec 白育姍
• Room for improvement
Ref. X. Artigas et al., PCS, 2007
MLWZ Video Codec 白育姍
• Inputs: SI (Y) and WZ (R)
• Each position in the search range has an SMF value, e.g. SMF1 = 0.1, SMF2 = 0.02, …, SMF81 = 0.1
• Update SMF
• Normalize SMF
Ref. R. Martin et al., IET Image Processing, 2010
MLWZ Video Codec 白育姍
• Side information re-estimation: the SI is re-estimated over the search range
Ref. R. Martin et al., IET Image Processing, 2010
MLWZ Video Codec 白育姍
• Correlation noise distribution modeling between the DCT coefficient of the SI and the DCT coefficient of the WZ frame
• DISCOVER: a Laplacian distribution with one Laplacian parameter
• MLWZ: a sum of Laplacians!
Ref. R. Martin et al., IET Image Processing, 2010
MLWZ Video Codec 白育姍
• Improves RD performance for high-motion sequences and large GOP sizes
• Room for improvement
Ref. R. Martin et al., IET Image Processing, 2010
DISPAC Video Codec 白育姍 邱柏叡
• Improve subjective quality
• Half-pixel motion estimation: improve SI for motion learning
• Improve initial SI and motion learning
• 邱柏叡
  • For low-motion parts
  • Reduce decoding time and improve RD performance
  • For high-motion parts
DISPAC Video Codec 白育姍 邱柏叡 程瀚平 邱柏叡
RD Performance of DISPAC
• Test sequences, covering low to high motion: Soccer, Foreman, Coastguard, Hall Monitor
• QCIF, 15 Hz, all frames (150 for Soccer, Foreman and Coastguard; 164 for Hall Monitor)
• GOP size: 2, 4, 8
• Bitrate and PSNR: luminance component only
RD Performance (GOP=2)
RD Performance (GOP=4)
RD Performance (GOP=8)
• Gaps annotated on the plots: 3.6 dB, 3.1 dB, 3.1 dB, 1.6 dB, 0.9 dB, 0.2 dB, 2.6 dB, 2.6 dB