Overview of the Scalable Video Coding Extension of the H.264/AVC Standard

Heiko Schwarz, Detlev Marpe, and Thomas Wiegand
CSVT, Sept. 2007
MC2008, VCLAB

Outline
• Introduction
  • Problems
  • Definition
  • Functionality
  • Goal
  • Competition
  • Applications
  • Targets
• History of SVC
• Structure of SVC
• Temporal Scalability
• Spatial Scalability
• Quality Scalability
• Combined Scalability
• Profiles of SVC
• Conclusions

Introduction - problem
[Figure: client bit-rates of 8 Mb/s, 512 Kb/s, 1 Mb/s, 6 Mb/s, and 4 Mb/s]
• Non-scalable video streaming
  • Multiple video streams are needed for heterogeneous clients

Introduction - definition
[Figure: sub-streams 1 to n ordered from low to high quality; sub-streams k1, k2, …, ki are used for reconstruction]
• Scalable video stream
• Scalability
  • Removal of parts of the video bit-stream to adapt it to the various needs of end users and to varying terminal capabilities or network conditions

Introduction - functionality
• Functionality of SVC
  • Graceful degradation when “right” parts of the bit-stream are lost
  • Bit-rate adaptation to match the channel throughput
  • Format adaptation for backwards compatible extension
  • Power adaptation for trade-off between runtime and quality

Introduction - goal
[Figure: sub-streams k1, k2, …, ki = (quality) H.264/AVC bit-stream]
• Goal of SVC
• Scalability mode
  • Fidelity reduction (SNR scalability)
  • Picture size reduction (spatial scalability)
  • Frame rate reduction (temporal scalability)
  • Sharpness reduction (frequency scalability)
  • Selection of content (ROI or object-based scalability)

Introduction - competition
• SVC is an old research topic (> 20 years) and has been included in H.262/MPEG-2, H.263, and MPEG-4 Visual.
• Rarely used because of
  • The characteristics of traditional video transmission systems
  • Significant loss of coding efficiency and large increase in decoder complexity
• Competition
  • Simulcast
  • Transcoding

Introduction - applications
• Applications
  • Heterogeneous clients
  • Unequal protection
  • Surveillance
• Problems
  • Increased decoder complexity
  • Decreased coding efficiency
• Temporal scalability is more often supported than spatial and quality scalability.

Introduction - targets
• Targets
  • Little decrease in coding efficiency
  • Little increase in decoding complexity
  • Support of temporal, spatial, and quality scalability
  • A backward compatible base layer
  • Simple bit-stream adaptations after encoding

History of SVC
• October 2003: MPEG issues a call for proposals on Scalable Video Coding
  • 12 wavelet-based proposals
  • 2 extensions of H.264/AVC
• ~October 2004: MSRA vs. HHI proposal (wavelet-based vs. H.264/AVC extension)
• October 2004: HHI proposal adopted as starting point (due to reduced encoder and decoder complexity and improvements in coding efficiency)
• January 2005: MPEG and VCEG agree to jointly finalize the SVC project as an Amendment of H.264/AVC
• Spring 2007: Finalization

Structure of SVC
[Figure: SVC encoder structure with blocks for spatial decimation, temporal scalable coding, prediction, base layer coding, SNR scalable coding, and multiplex]

Outline
• Introduction
• History of SVC
• Structure of SVC
• Temporal Scalability
  • Hierarchical prediction structure
• Spatial Scalability
• Quality Scalability
• Combined Scalability
• Profiles of SVC
• Conclusions

Temporal Scalability
[Figure: Hierarchical prediction structures. The numbers give the coding order of the pictures, listed in display order]
• Hierarchical B pictures (two GOPs): 0 4 3 5 2 7 6 8 1 12 11 13 10 15 14 16 9
• Non-dyadic hierarchical prediction: 0 3 4 2 6 7 5 8 9 1 12 13 11 15 16 14 17 18 10
• Hierarchical prediction with zero delay: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
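The coding-order numbers of the hierarchical B picture structure (0 4 3 5 2 7 6 8 1 ...) can be reproduced with a short sketch. It assumes a dyadic hierarchy in which the next key picture is coded first and each interval is then split at its midpoint, depth first. The function names `dyadic_coding_order` and `temporal_level` are illustrative and do not come from the SVC specification or the JSVM software.

```python
def dyadic_coding_order(gop_size, num_gops=1):
    """Return coding-order numbers, listed in display order.

    Assumption: key pictures every `gop_size` frames; every interval is
    split at its midpoint, and the left half is finished before the right.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    order = [None] * (gop_size * num_gops + 1)  # order[display] = coding number
    counter = 0

    def assign(display_index):
        nonlocal counter
        order[display_index] = counter
        counter += 1

    def split(lo, hi):
        # Code the middle picture, then recurse into both halves.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        assign(mid)
        split(lo, mid)
        split(mid, hi)

    assign(0)  # first key picture
    for g in range(num_gops):
        lo, hi = g * gop_size, (g + 1) * gop_size
        assign(hi)  # the next key picture is coded before the B pictures
        split(lo, hi)
    return order


def temporal_level(display_index, gop_size):
    """Temporal level of a picture: 0 for key pictures, larger for finer levels."""
    level = gop_size.bit_length() - 1  # log2(gop_size) for powers of two
    pos = display_index % gop_size
    while pos and pos % 2 == 0:
        pos //= 2
        level -= 1
    return level if pos else 0


print(dyadic_coding_order(8, num_gops=2))
print([temporal_level(i, 8) for i in range(9)])
```

Dropping every picture whose temporal level exceeds a chosen value leaves a decodable stream at a reduced frame rate, because no picture references a picture of a higher level.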

Temporal Scalability
• Combination with multiple reference pictures
• Arbitrary modification of the prediction structure
• Issue of quantization
  • Lower layers need higher fidelity → smaller QPs are used in lower layers
  • Quantization error propagates less from higher layers → larger QPs are used in higher layers

Temporal Scalability
Video coding experiment with H.264/MPEG4-AVC
• Foreman, CIF 30Hz @ 1320kbps
• Performance as a function of N
• Cascaded QP assignment: QP(P) ≈ QP(B0)-3 ≈ QP(B1)-4 ≈ QP(B2)-5
[Figure: picture-type patterns (I, P, B0, B1, B2) for N = 1, 2, 4, and 8, with temporal scalability for N > 1]
This slide is copied from JVT-W132-Talk
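The cascaded QP assignment can be written as a small helper. This is a sketch that reads the slide relation as QP(Bk) = QP(P) + 3 + k; the function name `cascaded_qp` and the clamp to 51 (the largest QP for 8-bit H.264/AVC video) are assumptions added for illustration.

```python
def cascaded_qp(qp_p, b_index=None, max_qp=51):
    """QP for a P picture (b_index=None) or for a B picture of type B<b_index>.

    Assumes QP(Bk) = QP(P) + 3 + k, following the slide's cascaded assignment.
    """
    if b_index is None:
        return qp_p
    return min(max_qp, qp_p + 3 + b_index)


for label, idx in [("P", None), ("B0", 0), ("B1", 1), ("B2", 2)]:
    print(label, cascaded_qp(28, idx))
```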

Temporal Scalability
[Figure: results for CIF]
• Coding efficiency of hierarchical prediction
  • JSVM11, High profile with CABAC
  • Only one reference frame

Temporal Scalability
[Figure: comparison with IPPP, with and without delay constraint]
• Providing temporal scalability usually has no negative impact on coding efficiency

Outline
• Introduction
• History of SVC
• Structure of SVC
• Temporal Scalability
• Spatial Scalability
  • Inter-layer prediction
  • Inter-layer motion prediction
  • Inter-layer residual prediction
  • Inter-layer intra prediction
• Quality Scalability
• Combined Scalability
• Profiles of SVC
• Conclusions

Spatial Scalability
[Figure: SVC encoder with several spatial layers obtained by spatial decimation. Each enhancement layer uses hierarchical MCP & intra-prediction, producing texture and motion data for base layer coding, with inter-layer prediction (intra, motion, residual) from the layer below. The lowest layer is an H.264/AVC compatible coder that produces the H.264/AVC compatible base layer bit-stream. All layers are multiplexed into the scalable bit-stream]

Spatial Scalability
[Figure: two spatial layers (Spatial 0, Spatial 1) with temporal levels 0 to 2 and intra pictures]
• Similar to MPEG-2, H.263, and MPEG-4
• Arbitrary resolution ratio
• The same coding order in all spatial layers
• Combination with temporal scalability
• Inter-layer prediction

Spatial Scalability
• The prediction signals are formed by
  • MCP inside the enhancement layer (temporal), suited to small motion and high spatial detail
  • Up-sampling from the lower layer (spatial)
  • Average of the above two predictions (temporal + spatial)
• Inter-layer prediction
  • Three kinds of inter-layer prediction
    • Inter-layer motion prediction
    • Inter-layer residual prediction
    • Inter-layer intra prediction
  • Base mode MB
    • Only the residual is transmitted, with no additional side information

Spatial Scalability
[Figure: an 8×8 block with corners (x1,y1) and (x2,y2) in the reference layer corresponds to a 16×16 block with corners (2x1,2y1) and (2x2,2y2) in the enhancement layer]
• Inter-layer motion prediction
  • base_mode_flag = 1
    • The reference layer is inter-coded
    • Data are derived from the reference layer
      • MB partitioning
      • Reference indices
      • MVs
  • motion_pred_flag
    • 1: MV predictors are obtained from the reference layer
    • 0: MV predictors are obtained by conventional spatial prediction
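For dyadic spatial scalability, deriving enhancement-layer motion data from the reference layer amounts to scaling the block geometry and the motion vector by two while copying the reference index. The sketch below illustrates this under that assumption. `MotionBlock` and `derive_enhancement_motion` are hypothetical names, not syntax elements of the standard.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MotionBlock:
    x: int                 # top-left corner, in luma samples
    y: int
    width: int
    height: int
    mv: Tuple[int, int]    # motion vector (horizontal, vertical)
    ref_idx: int           # reference picture index


def derive_enhancement_motion(block, scale=2):
    """Map a reference-layer block to the enhancement layer (dyadic case).

    Position, size, and motion vector are scaled; the reference index is kept.
    """
    return MotionBlock(
        x=block.x * scale,
        y=block.y * scale,
        width=block.width * scale,
        height=block.height * scale,
        mv=(block.mv[0] * scale, block.mv[1] * scale),
        ref_idx=block.ref_idx,
    )


base = MotionBlock(x=8, y=16, width=8, height=8, mv=(3, -2), ref_idx=1)
print(derive_enhancement_motion(base))
```

An 8×8 reference-layer partition becomes a 16×16 enhancement-layer macroblock partition, which matches the block sizes shown on the slide.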

Spatial Scalability
• Inter-layer residual prediction
  • residual_pred_flag = 1
  • Predictor
    • Block-wise up-sampling by a bi-linear filter from the corresponding 8×8 sub-MB in the reference layer
    • Performed on a transform block basis
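Block-wise bi-linear up-sampling of an 8×8 reference-layer residual block to 16×16 can be sketched as follows. This is a generic floating-point bi-linear interpolator with clamping at the block border, written for illustration only. It is not the normative integer filter of the standard, and `upsample_block_bilinear` is a hypothetical name.

```python
def upsample_block_bilinear(block, factor=2):
    """Up-sample a 2-D block (list of rows) by `factor` with bi-linear weights.

    Sample positions are clamped to the block, so no samples from
    neighbouring blocks are used (block-wise operation).
    """
    n_rows, n_cols = len(block), len(block[0])
    out = []
    for i in range(n_rows * factor):
        # Source row coordinate of the output sample centre, clamped to the block.
        sy = min(max((i + 0.5) / factor - 0.5, 0.0), n_rows - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, n_rows - 1)
        wy = sy - y0
        row = []
        for j in range(n_cols * factor):
            sx = min(max((j + 0.5) / factor - 0.5, 0.0), n_cols - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, n_cols - 1)
            wx = sx - x0
            top = block[y0][x0] * (1 - wx) + block[y0][x1] * wx
            bottom = block[y1][x0] * (1 - wx) + block[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


residual_8x8 = [[r - c for c in range(8)] for r in range(8)]
predictor_16x16 = upsample_block_bilinear(residual_8x8)
print(len(predictor_16x16), len(predictor_16x16[0]))
```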

Spatial Scalability
• Inter-layer intra prediction
  • base_mode_flag = 1
  • The reference layer is intra-coded
  • Up-sampling from the reference layer
    • Luma: one-dimensional 4-tap FIR filter
    • Chroma: bi-linear filter

Spatial Scalability
• Past spatial scalable video coding:
  • Inter-layer intra prediction requires complete decoding of the base layer.
  • Multiple motion compensation and deblocking filter operations are needed.
  • Full decoding + inter-layer prediction: complexity > simulcast.
• Single-loop decoding
  • Inter-layer intra prediction is restricted to MBs for which the co-located base layer block is intra-coded

Spatial Scalability
[Figure: Single-loop vs. multi-loop decoding (I, B, and P pictures; inter-coded blocks)]
This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf

Spatial Scalability
• Generalized spatial scalability in SVC
  • Arbitrary ratio
    • Only restriction: Neither the horizontal nor the vertical resolution can decrease from one layer to the next.
  • Cropping
    • Containing new regions
    • Higher quality of interesting regions
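The single restriction on layer resolutions is easy to check in code. The helper below is a minimal sketch; the name `valid_layer_resolutions` and the (width, height) tuple representation are assumptions made for illustration.

```python
def valid_layer_resolutions(layers):
    """Check that neither width nor height decreases from one layer to the next.

    `layers` is a list of (width, height) tuples ordered from the base layer
    to the highest enhancement layer.
    """
    return all(
        w1 >= w0 and h1 >= h0
        for (w0, h0), (w1, h1) in zip(layers, layers[1:])
    )


print(valid_layer_resolutions([(176, 144), (352, 288), (704, 576)]))  # dyadic ratios
print(valid_layer_resolutions([(320, 240), (480, 360)]))              # ratio 1.5
print(valid_layer_resolutions([(352, 288), (320, 288)]))              # width shrinks
```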

Spatial Scalability
• Coding efficiency
  • Multi-loop > Single-loop

Spatial Scalability
• Coding efficiency (IPPP…)
  • Multi-loop > Single-loop

Spatial Scalability
• Encoder control (JSVM)
  • Base layer
    • p0 is optimized for the base layer
  • Enhancement layer
    • p1 is optimized for the enhancement layer
    • Decisions of p1 depend on p0
  • Efficient base layer coding but inefficient enhancement layer coding

Spatial Scalability
• Encoder control (optimization)
  • Base layer
    • Considering enhancement layer coding
    • Eliminating choices of p0 that disadvantage enhancement layer coding
  • Enhancement layer
    • No change
  • Weighting factor w
    • w = 0: JSVM encoder control
    • w = 1: Single-loop encoder control (base layer is not controlled)
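The role of w can be illustrated with a toy selection of base-layer parameters p0. The slide does not state the cost function, so the linear blend below is an assumption: w = 0 minimizes only the base-layer cost and w = 1 minimizes only the enhancement-layer cost. All names and numbers are hypothetical.

```python
def joint_base_layer_cost(w, cost_base, cost_enh):
    """Blend base-layer and enhancement-layer costs with weight w in [0, 1].

    Assumed form: (1 - w) * cost_base + w * cost_enh.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * cost_base + w * cost_enh


def choose_base_params(candidates, w):
    """Pick the candidate p0 with the lowest blended cost.

    `candidates` maps a label to (cost_base, cost_enh), where cost_enh is the
    enhancement-layer cost obtained when that p0 is used in the base layer.
    """
    return min(candidates, key=lambda name: joint_base_layer_cost(w, *candidates[name]))


# Hypothetical rate-distortion costs for two base-layer choices.
candidates = {"a": (10.0, 30.0), "b": (12.0, 20.0)}
for w in (0.0, 0.5, 1.0):
    print(w, choose_base_params(candidates, w))
```

Intermediate values of w trade a small loss in base-layer efficiency for a gain in enhancement-layer efficiency.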

Spatial Scalability
• Coding efficiency of optimal encoder control
  • Optimized encoder vs. JSVM encoder (QPE = QPB + 4)

Outline
• Introduction
• History of SVC
• Structure of SVC
• Temporal Scalability
• Spatial Scalability
• Quality Scalability
  • CGS
  • MGS
  • Drift control
• Combined Scalability
• Profiles of SVC
• Conclusions

Quality Scalability
• Coarse-grain quality scalability (CGS)
  • A special case of spatial scalability
    • Identical sizes (resolution) for base and enhancement layers
  • Smaller quantization step sizes for higher enhancement residual layers
  • Designed for only a few selected bit-rate points
    • Number of supported bit-rate points = number of layers
  • Switching between layers can only occur at IDR access units

Quality Scalability
• Medium-grain quality scalability (MGS)
  • More enhancement layers are supported
    • Refinement quality layers of the residual
  • Key pictures
    • Drift control
  • Switching can occur at any access unit
  • CGS + key pictures + refinement quality layers

Quality Scalability
• Drift control
  • Drift: The effect caused by unsynchronized MCP at the encoder and decoder side
  • Trade-off of MCP in quality SVC
    • Coding efficiency ↔ drift

Quality Scalability
[Figure: base layer and refinement (possibly lost or truncated)]
• MPEG-4 quality scalability with FGS
  • The base layer is stored and used for MCP of following pictures
  • Refinement data are not used for MCP
  • Drift: Drift-free
  • Complexity: Low
  • Efficiency: Efficient base layer but inefficient enhancement layer

Quality Scalability
[Figure: base layer and refinement (possibly lost or truncated)]
• MPEG-2 quality scalability (without FGS)
  • Only 1 reference picture is stored and used for MCP of following pictures
  • Drift: Both base layer and enhancement layer
    • Frequent intra updates are necessary
  • Complexity: Low
  • Efficiency: Efficient enhancement layer but inefficient base layer

Quality Scalability
[Figure: base layer and refinement (possibly lost or truncated)]
• 2-loop prediction
  • Several closed encoder loops run at different bit-rate points in a layered structure
  • Drift: Enhancement layer
  • Complexity: High
  • Efficiency: Efficient base layer and moderately efficient enhancement layer

Quality Scalability
[Figure: base layer and refinement (possibly lost or truncated)]
• SVC concepts
  • Key picture
    • Trade-off between coding efficiency and drift
    • MPEG-4 FGS corresponds to all pictures being key pictures
    • MPEG-2 quality scalability corresponds to all pictures being non-key pictures

Quality Scalability
[Figure: base layer and refinement (possibly lost or truncated) for the picture sequence P B2 B1 B2 P B2 B1 B2 P]
• Drift control with hierarchical prediction
  • Key pictures
    • The base layer is stored and used for the MCP of following pictures
  • Other pictures
    • The enhancement layer is stored and used for the MCP of following pictures
  • The GOP size adjusts the trade-off between enhancement layer coding efficiency and drift
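The key-picture rule can be sketched as a function that tells which reconstruction a picture uses as its MCP reference. It assumes that key pictures are the pictures at multiples of the GOP size, as in the sequence P B2 B1 B2 P. The name `mcp_reference_quality` is hypothetical.

```python
def mcp_reference_quality(picture_index, gop_size):
    """Return which reconstruction serves as the MCP reference.

    Key pictures (index divisible by gop_size) predict from the base-layer
    reconstruction, so lost refinement data cannot cause drift there. All
    other pictures predict from the enhancement-layer reconstruction.
    """
    if gop_size < 1:
        raise ValueError("gop_size must be positive")
    return "base" if picture_index % gop_size == 0 else "enhancement"


print([mcp_reference_quality(i, 4) for i in range(9)])
```

With this rule, drift caused by missing refinement data is confined to the pictures between two key pictures, so a larger GOP size gives better enhancement-layer efficiency at the cost of a longer drift interval.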

Quality Scalability
[Figure: Comparisons of drift control (labels: high efficiency, low efficiency, drift-free, drift)]

Quality Scalability
[Figure: Comparisons of coding efficiency for high dQP and low dQP]
Qstep = 2^((QP-4)/6)
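Reading the slide's relation as Qstep = 2^((QP-4)/6), the quantization step size doubles for every increase of QP by 6. The following sketch evaluates this relation and the step-size ratio implied by a QP difference dQP between two quality layers. The function names are illustrative.

```python
def qstep(qp):
    """Quantization step size for a given QP: Qstep = 2 ** ((QP - 4) / 6)."""
    return 2.0 ** ((qp - 4) / 6.0)


def qstep_ratio(dqp):
    """Ratio between the step sizes of two layers whose QPs differ by dqp."""
    return 2.0 ** (dqp / 6.0)


for qp in (4, 10, 22, 28):
    print(qp, round(qstep(qp), 3))
print(round(qstep_ratio(2), 3), round(qstep_ratio(6), 3))
```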

Quality Scalability
[Figure labels: only base layer; MGS with key pictures using optimized encoder control]

Outline
• Introduction
• History of SVC
• Structure of SVC
• Temporal Scalability
• Spatial Scalability
• Quality Scalability
• Combined Scalability
  • SVC encoder structure
  • Dependency and quality refinement layers
  • Bit-stream format
  • Bit-stream switching
• Profiles of SVC
• Conclusions

Combined Scalability
[Figure: SVC encoder structure. Each dependency layer performs a temporal decomposition; the same motion/prediction information is used within a dependency layer]

Combined Scalability
[Figure: Dependency and quality refinement layers. The scalable bit-stream contains dependency layers D = 0, 1, 2, each with quality refinement layers Q = 0, 1, 2]

Combined Scalability
[Figure: two dependency layers (D0, D1), each with quality layers Q0 and Q1, over pictures of temporal levels T0, T2, T1, T2, T0]

Combined Scalability
[Figure: Bit-stream format. NAL unit header, NAL unit header extension, and NAL unit payload; the extension carries the fields P, T, D, and Q]
• P (priority_id): indicates the importance of a NAL unit
• T (temporal_id): indicates the temporal level
• D (dependency_id): indicates the spatial/CGS layer
• Q (quality_id): indicates the MGS/FGS layer
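Because every NAL unit carries T, D, and Q, a sub-stream can be extracted by inspecting headers only. The sketch below filters a list of already-parsed NAL units for a target operating point. `NalUnit` and `extract_substream` are hypothetical names, header bit parsing is omitted, and the rule that lower dependency layers keep all of their quality layers is an assumption of this sketch rather than a statement of the normative extraction process.

```python
from typing import List, NamedTuple


class NalUnit(NamedTuple):
    priority_id: int
    temporal_id: int
    dependency_id: int
    quality_id: int
    payload: bytes = b""


def extract_substream(nal_units: List[NalUnit], max_t: int, max_d: int, max_q: int) -> List[NalUnit]:
    """Keep the NAL units needed for the operating point (max_t, max_d, max_q)."""
    kept = []
    for nal in nal_units:
        if nal.temporal_id > max_t or nal.dependency_id > max_d:
            continue
        # Assumption: the quality limit applies to the target dependency layer
        # only; lower dependency layers keep all their quality layers.
        if nal.dependency_id == max_d and nal.quality_id > max_q:
            continue
        kept.append(nal)
    return kept


# Toy stream: 2 dependency layers x 2 quality layers x 3 temporal levels.
stream = [NalUnit(0, t, d, q) for d in range(2) for q in range(2) for t in range(3)]
print(len(extract_substream(stream, max_t=1, max_d=0, max_q=0)))
print(len(extract_substream(stream, max_t=2, max_d=1, max_q=0)))
```

No payload is decoded or re-encoded, which is what makes bit-stream adaptation after encoding simple.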
