Overview of the H.264/AVC Video Coding Standard

Overview of the H.264/AVC Video Coding Standard T. Wiegand, G. Sullivan, G. Bjontegaard, and A. LuthraOverview of the H.264/AVC Video Coding Standard,IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003. CMPT 820: Multimedia Systems

Outline • Overview • Network Abstraction Layer (NAL) • Video Coding Layer (VCL) • Profiles and Applications • Feature Highlights • Conclusions

Evolution of Video Compression Standards MPEG ITU-T H.261 Video Telephony MPEG-1 Video-CD H.262/MPEG-2 Digital TV/DVD H.263 Video Conferencing MPEG-4 Visual Object-based Coding H.264 MPEG-4 AVC

H.264/AVC Coding Standard • Various Applications • Broadcast: cable, satellites, terrestrial, and DSL • Storage: DVDs (HD DVD and Blu-ray) • Video Conferencing: over different networks • Multimedia Streaming: live and on-demand • Multimedia Messaging Services (MMS) • Challenge: • How to handle all these applications and networks • Flexibility and customizability

Structure of H.264/AVC Codec Layered design • Network Abstraction Layer (NAL) • formats video and meta data for variety of networks • Video Coding Layer (VCL) • represents video in an efficient way Scope of H.264 standard

Network Abstraction Layer • Provide network friendliness for sending video data over various network transports, such as: • RTP/IP for Internet applications • MPEG-2 streams for broadcast services • ISO File formats for storage applications • We present a few NAL concepts

NAL Units • Packets consist of video data • short packet header: one byte • Support two types of transports • stream-oriented: no free unit boundaries  use a 3-byte start code prefix • packet-oriented: start code prefix is a waste • Can be classified into: • VCL units: data for video pictures • Non-VCL units: meta data and additional info

Non-VCL NAL Units • Two types of non-VCL NAL units • Parameter sets: headers shared by a large number of VCL NAL units • a VCL NAL unit has a pointer to its picture parameter set • a picture parameter set points to its sequence parameter set • Supplemental enhancement info (SEI): optional info for higher-quality reconstruction and/or better application usability • Sent over in-band or out-of-band channels

Access Units • A set of NAL units • Decoding an access unit results in one picture • Structure: • Delimiter: for seeking in a stream • SEI: timing and other info • primary coded picture: VCL • redundant coded picture: for error recovery

Video Sequences and IDR Frames • Sequence: an independently decodable NAL unit stream  don’t need NALs from other sequences • with one sequence parameter set • starts with an instantaneous decoding refresh (IDR) access unit • IDR frames: random access points • Intra-coded frames • no subsequent picture of an IDR frame will require reference to pictures prior to the IDR frame • decoders mark buffered reference pictures unusable once seeing an IDR frame

Video Coding Layer (VCL) • (Like other) hybrid video coding: H.264/AVC represents pictures in macroblocks • Motion compensation: temporal redundancy • Transform: spatial redundancy • Small improvements add up to huge gain • Combining many coding tools together • Pictures/Frames/Fields • Fields: top/bottom field contains even/odd rows • Interlaced: two fields were captured at diff time • Progressive: otherwise

Macroblocks and Slices • Fixed size MBs: 16x16 for luma, and 8x8 for chroma • Slice: a set of MBs that can be decoded without use of other slices • I slice: intra-prediction (I-MB) • P slice: possibly one inter-prediction signal (I- and P-MBs) • B slice: up to two inter-prediction signals (I- and B-MBs) • SP slice: efficient switch among streams • SI slice: used in conjunction with SP slices

Flexible Macroblock Ordering (FMO) • MBs in a slice: in raster-order • Slice group: more flexible • Each slice group contains one or several slices • Possible usages: • Region-of-interest (ROI) • Checker-board for video conferencing

Adaptive Field Coding • Two fields of a frame can be coded as: • A single frame (frame mode) • Two separate fields (field mode) • A single frame with adaptive mode (mixed mode) • Picture-adaptive frame/field (PAFF) • frame/field decision is made at frame level • 16% - 20% bit rate reduction over frame only • Macroblock-adaptive frame/field (MBAFF) • frame/field decision is made at MB level • 14% - 16% bit rate reduction over PAFF  suitable for Interlaced and high motion [Ref] http://scien.stanford.edu/2005projects/ee398/projects/presentations/Guerrero Chan Tsang - Project Presentation - Fast Macroblock Adaptive Coding in H264.ppt

Intra-frame Prediction • In spatial domain, using samples to the left and/or on above to predict samples in a MB • Types of intra-frame prediction: • Intra_4x4: detailed luma block • Intra_16x16: smooth luma blocks • Chroma_8x8: similar to Intra_16x16 as chroma components are smooth • I_PCM: bypass prediction/transform, send samples • anomalous pictures, loseless, and predictable bit rate

Intra_4x4 Prediction • Samples in 4x4 block are predicted using 13 neighboring sample • 8 prediction mode: 1 DC and 8 directional • Sample D is used if E-H is not available

Sample Intra_4x4 Prediction • Interpolation is used in some modes [Ref] Foreman sequence, http://www.vcodex.com/files/h264_intrapred.pdf

Intra_16x16 Prediction • 4 modes • Vertical • Horizontal • DC • Planer (Diagonal)

Inter-Prediction in P Slices • Two-level segmentation of MBs • Luma • MBs are divided into at most 4 partitions (as small as 8x8) • 8x8 partitions are divided into at most 4 partitions • Chroma – half size horizontally and vertically • Maximum of 16 motion vectors for each MB

Examples of Segmentation [Ref] http://www.vcodex.com/files/h264_interpred.pdf

A aa B C bb D F G H I J E a b c d e f g cc dd h i j k m ee ff n p q r K L M s N O P R S gg T hh U Inter-Prediction Accuracy • ¼-pixel for luma, 1/8-pixel for Chroma • Half-pixel samples: 6-tap FIR filter • Quarter-pixel samples: average of neighbors • Chroma predictions: bilinear interpolation

Multiframe Inter-Prediction in P Slices • More than one prior reference pictures (by diff. MBs) • Encoders/decoders buffer the same reference pictures for inter-prediction • Reference index is used when coding MVs • MVs for regions smaller than 8x8 uses the same index for all MVs in the 8x8 region • P_skip mode: • Don’t send residual signals nor MVs nor reference index • Use buffered frame 0 as the reference picture • Use neighbor’s MVs • Large areas with no change or constant motion like slow panning can be represented with very few bits.

Multiframe Inter-Prediction in B Slices • Weighted average of 2 predictions • B-slices can be used as reference • Two reference picture lists are used • One out of four pred. methods for each partition: • list 0 • list 1 • bi-predictive • direct prediction: inferred from prior MBs • The MB can be coded in B_skip mode (similar to P_skip)

4x4 Integer Transform • Why smaller transform: • Only use add and shift, an exact inverse transform is possible  no decoding mismatch • Not too much residue to code • Less noise around edge (ringing or mosquito noise) • Less computations and shorter data type (16-bit) • An approximation to 4x4 DCT: , where

2nd Transform and Quantization Parameter • 2nd Transform: Intra_16x16 and Chroma modes are for smooth area • DC coefficients are transformed again to cover the whole MB • Quantization step is adjusted by an exponential function of quantization parameter  to cover a wider range of QS • QP increases by 6 => QS doubles • QP increases by 1 => QS increases by 12% => bit rate decreases by 12%

Entropy Coding • Non-transform coefficients: an infinite-extent codeword table • Transform coefficients: • Context-Adaptive Variable Length Coding (CAVLC) • several VLC tables are switched dep. on prior transmitted data  better than a single VLC table • Context-Adaptive Binary Arithmetic Coding (CABAC) • flexible symbol probability than CAVLC  5 – 15% rate reduction • efficient: multiplication free

In-loop Deblocking Filter • Operate within coding loop • Use filtered frames as ref. frames  improve coding efficiency • Adaptive deblocking, need to determine • Blocking effects or object edges • Strong or weak deblocking • Intuitions • Large difference near a block edge -> likely a block artifact • If the difference is too large to be explained by the QP difference -> likely a real edge E.g., Filter p0 and q0 if

Hypothetical Reference Decoder (HRD) • Standard receiver buffer models  encoders must produce bit streams that are decodable to HRD • Two buffers • Coded picture buffer (CPB) • models the bit arrival and removal time • Decoded picture buffer (DPB) • models the frame decoded and output time in reference frame lists

Profiles and Applications • Defines a set of coding tools and algorithms • Conformance points for interoperability • 3 Profiles for different applications • Baseline – video conferencing • Main – broadcast, media storage, digital cinema • Extended – streaming over IP (wire/wireless) • 15 Levels • pic size • decoding rate (MB/s) • bit rate • buffer size [Ref] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. WediVideo coding with H.264/AVC: tools, performance and complexityIEEE Circuits and Systems Magazine 4(1) pp. 7 - 28 May 2004

Feature Highlights -- Prediction • Variable blocksize MCs • Quarter-sample accurate MCs • MVs over pic. Boundaries • Multiple reference pictures • Weighted Bi-directional prediction • Decoupling of referencing from display orders • Decoupling prediction mode from reference capability (uses B frames as reference) • Improved Skip/Direct modes • Intra prediction in Spatial domain • In-loop deblocking filter

Feature Highlights -- Transform • Small block-size transform • 2-level block transform (repeated DC transform) • Short data-type transform (16-bit) • Exact inverse transform • Context-adaptive entropy coding • Arithmetic entropy coding

Feature Highlights -- Network • Parameter set structure (efficient) • NAL unit syntax structure (flexibility) • Flexible slice size • Flexible macroblock ordering (FMO) • Arbitrary slice ordering (ASO) • Redundant pictures • Data partitioning (unequal error protection) • SP/SI switching pictures

Conclusions • Key improvements • Enhanced prediction (intra- and inter-) • Small block size exact match transform • Adaptive in-loop deblocking filter • Enhanced entropy coding method [Ref] G Sullivan and T. Wiegand, Video Compression—From Concepts to the H.264/AVC Standard, Proc. of IEEE, 93(1), Jan 2005

Overview of the H.264/AVC Video Coding Standard