1 / 38

Overview of the H.264/AVC Video Coding Standard

Overview of the H.264/AVC Video Coding Standard. T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra Overview of the H.264/AVC Video Coding Standard, IEEE Transactions on Circuits and Systems for Video Technology , Vol. 13, No. 7, pp. 560-576, July 2003. CMPT 820: Multimedia Systems.

terena
Download Presentation

Overview of the H.264/AVC Video Coding Standard

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Overview of the H.264/AVC Video Coding Standard T. Wiegand, G. Sullivan, G. Bjontegaard, and A. LuthraOverview of the H.264/AVC Video Coding Standard,IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003. CMPT 820: Multimedia Systems

  2. Outline • Overview • Network Abstraction Layer (NAL) • Video Coding Layer (VCL) • Profiles and Applications • Feature Highlights • Conclusions

  3. Evolution of Video Compression Standards MPEG ITU-T H.261 Video Telephony MPEG-1 Video-CD H.262/MPEG-2 Digital TV/DVD H.263 Video Conferencing MPEG-4 Visual Object-based Coding H.264 MPEG-4 AVC

  4. H.264/AVC Coding Standard • Various Applications • Broadcast: cable, satellites, terrestrial, and DSL • Storage: DVDs (HD DVD and Blu-ray) • Video Conferencing: over different networks • Multimedia Streaming: live and on-demand • Multimedia Messaging Services (MMS) • Challenge: • How to handle all these applications and networks • Flexibility and customizability

  5. Structure of H.264/AVC Codec Layered design • Network Abstraction Layer (NAL) • formats video and meta data for variety of networks • Video Coding Layer (VCL) • represents video in an efficient way Scope of H.264 standard

  6. Outline • Overview • Network Abstraction Layer (NAL) • Video Coding Layer (VCL) • Profiles and Applications • Feature Highlights • Conclusions

  7. Network Abstraction Layer • Provide network friendliness for sending video data over various network transports, such as: • RTP/IP for Internet applications • MPEG-2 streams for broadcast services • ISO File formats for storage applications • We present a few NAL concepts

  8. NAL Units • Packets consist of video data • short packet header: one byte • Support two types of transports • stream-oriented: no free unit boundaries  use a 3-byte start code prefix • packet-oriented: start code prefix is a waste • Can be classified into: • VCL units: data for video pictures • Non-VCL units: meta data and additional info

  9. Non-VCL NAL Units • Two types of non-VCL NAL units • Parameter sets: headers shared by a large number of VCL NAL units • a VCL NAL unit has a pointer to its picture parameter set • a picture parameter set points to its sequence parameter set • Supplemental enhancement info (SEI): optional info for higher-quality reconstruction and/or better application usability • Sent over in-band or out-of-band channels

  10. Access Units • A set of NAL units • Decoding an access unit results in one picture • Structure: • Delimiter: for seeking in a stream • SEI: timing and other info • primary coded picture: VCL • redundant coded picture: for error recovery

  11. Video Sequences and IDR Frames • Sequence: an independently decodable NAL unit stream  don’t need NALs from other sequences • with one sequence parameter set • starts with an instantaneous decoding refresh (IDR) access unit • IDR frames: random access points • Intra-coded frames • no subsequent picture of an IDR frame will require reference to pictures prior to the IDR frame • decoders mark buffered reference pictures unusable once seeing an IDR frame

  12. Outline • Overview • Network Abstraction Layer (NAL) • Video Coding Layer (VCL) • Profiles and Applications • Feature Highlights • Conclusions

  13. Video Coding Layer (VCL) • (Like other) hybrid video coding: H.264/AVC represents pictures in macroblocks • Motion compensation: temporal redundancy • Transform: spatial redundancy • Small improvements add up to huge gain • Combining many coding tools together • Pictures/Frames/Fields • Fields: top/bottom field contains even/odd rows • Interlaced: two fields were captured at diff time • Progressive: otherwise

  14. Macroblocks and Slices • Fixed size MBs: 16x16 for luma, and 8x8 for chroma • Slice: a set of MBs that can be decoded without use of other slices • I slice: intra-prediction (I-MB) • P slice: possibly one inter-prediction signal (I- and P-MBs) • B slice: up to two inter-prediction signals (I- and B-MBs) • SP slice: efficient switch among streams • SI slice: used in conjunction with SP slices

  15. Flexible Macroblock Ordering (FMO) • MBs in a slice: in raster-order • Slice group: more flexible • Each slice group contains one or several slices • Possible usages: • Region-of-interest (ROI) • Checker-board for video conferencing

  16. Adaptive Field Coding • Two fields of a frame can be coded as: • A single frame (frame mode) • Two separate fields (field mode) • A single frame with adaptive mode (mixed mode) • Picture-adaptive frame/field (PAFF) • frame/field decision is made at frame level • 16% - 20% bit rate reduction over frame only • Macroblock-adaptive frame/field (MBAFF) • frame/field decision is made at MB level • 14% - 16% bit rate reduction over PAFF  suitable for Interlaced and high motion [Ref] http://scien.stanford.edu/2005projects/ee398/projects/presentations/Guerrero Chan Tsang - Project Presentation - Fast Macroblock Adaptive Coding in H264.ppt

  17. Intra-frame Prediction • In spatial domain, using samples to the left and/or on above to predict samples in a MB • Types of intra-frame prediction: • Intra_4x4: detailed luma block • Intra_16x16: smooth luma blocks • Chroma_8x8: similar to Intra_16x16 as chroma components are smooth • I_PCM: bypass prediction/transform, send samples • anomalous pictures, loseless, and predictable bit rate

  18. Intra_4x4 Prediction • Samples in 4x4 block are predicted using 13 neighboring sample • 8 prediction mode: 1 DC and 8 directional • Sample D is used if E-H is not available

  19. Sample Intra_4x4 Prediction • Interpolation is used in some modes [Ref] Foreman sequence, http://www.vcodex.com/files/h264_intrapred.pdf

  20. Intra_16x16 Prediction • 4 modes • Vertical • Horizontal • DC • Planer (Diagonal)

  21. Inter-Prediction in P Slices • Two-level segmentation of MBs • Luma • MBs are divided into at most 4 partitions (as small as 8x8) • 8x8 partitions are divided into at most 4 partitions • Chroma – half size horizontally and vertically • Maximum of 16 motion vectors for each MB

  22. Examples of Segmentation [Ref] http://www.vcodex.com/files/h264_interpred.pdf

  23. A aa B C bb D F G H I J E a b c d e f g cc dd h i j k m ee ff n p q r K L M s N O P R S gg T hh U Inter-Prediction Accuracy • ¼-pixel for luma, 1/8-pixel for Chroma • Half-pixel samples: 6-tap FIR filter • Quarter-pixel samples: average of neighbors • Chroma predictions: bilinear interpolation

  24. Multiframe Inter-Prediction in P Slices • More than one prior reference pictures (by diff. MBs) • Encoders/decoders buffer the same reference pictures for inter-prediction • Reference index is used when coding MVs • MVs for regions smaller than 8x8 uses the same index for all MVs in the 8x8 region • P_skip mode: • Don’t send residual signals nor MVs nor reference index • Use buffered frame 0 as the reference picture • Use neighbor’s MVs • Large areas with no change or constant motion like slow panning can be represented with very few bits.

  25. Multiframe Inter-Prediction in B Slices • Weighted average of 2 predictions • B-slices can be used as reference • Two reference picture lists are used • One out of four pred. methods for each partition: • list 0 • list 1 • bi-predictive • direct prediction: inferred from prior MBs • The MB can be coded in B_skip mode (similar to P_skip)

  26. 4x4 Integer Transform • Why smaller transform: • Only use add and shift, an exact inverse transform is possible  no decoding mismatch • Not too much residue to code • Less noise around edge (ringing or mosquito noise) • Less computations and shorter data type (16-bit) • An approximation to 4x4 DCT: , where

  27. 2nd Transform and Quantization Parameter • 2nd Transform: Intra_16x16 and Chroma modes are for smooth area • DC coefficients are transformed again to cover the whole MB • Quantization step is adjusted by an exponential function of quantization parameter  to cover a wider range of QS • QP increases by 6 => QS doubles • QP increases by 1 => QS increases by 12% => bit rate decreases by 12%

  28. Entropy Coding • Non-transform coefficients: an infinite-extent codeword table • Transform coefficients: • Context-Adaptive Variable Length Coding (CAVLC) • several VLC tables are switched dep. on prior transmitted data  better than a single VLC table • Context-Adaptive Binary Arithmetic Coding (CABAC) • flexible symbol probability than CAVLC  5 – 15% rate reduction • efficient: multiplication free

  29. In-loop Deblocking Filter • Operate within coding loop • Use filtered frames as ref. frames  improve coding efficiency • Adaptive deblocking, need to determine • Blocking effects or object edges • Strong or weak deblocking • Intuitions • Large difference near a block edge -> likely a block artifact • If the difference is too large to be explained by the QP difference -> likely a real edge E.g., Filter p0 and q0 if

  30. Hypothetical Reference Decoder (HRD) • Standard receiver buffer models  encoders must produce bit streams that are decodable to HRD • Two buffers • Coded picture buffer (CPB) • models the bit arrival and removal time • Decoded picture buffer (DPB) • models the frame decoded and output time in reference frame lists

  31. Outline • Overview • Network Abstraction Layer (NAL) • Video Coding Layer (VCL) • Profiles and Applications • Feature Highlights • Conclusions

  32. Profiles and Applications • Defines a set of coding tools and algorithms • Conformance points for interoperability • 3 Profiles for different applications • Baseline – video conferencing • Main – broadcast, media storage, digital cinema • Extended – streaming over IP (wire/wireless) • 15 Levels • pic size • decoding rate (MB/s) • bit rate • buffer size [Ref] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. WediVideo coding with H.264/AVC: tools, performance and complexityIEEE Circuits and Systems Magazine 4(1) pp. 7 - 28 May 2004

  33. Outline • Overview • Network Abstraction Layer (NAL) • Video Coding Layer (VCL) • Profiles and Applications • Feature Highlights • Conclusions

  34. Feature Highlights -- Prediction • Variable blocksize MCs • Quarter-sample accurate MCs • MVs over pic. Boundaries • Multiple reference pictures • Weighted Bi-directional prediction • Decoupling of referencing from display orders • Decoupling prediction mode from reference capability (uses B frames as reference) • Improved Skip/Direct modes • Intra prediction in Spatial domain • In-loop deblocking filter

  35. Feature Highlights -- Transform • Small block-size transform • 2-level block transform (repeated DC transform) • Short data-type transform (16-bit) • Exact inverse transform • Context-adaptive entropy coding • Arithmetic entropy coding

  36. Feature Highlights -- Network • Parameter set structure (efficient) • NAL unit syntax structure (flexibility) • Flexible slice size • Flexible macroblock ordering (FMO) • Arbitrary slice ordering (ASO) • Redundant pictures • Data partitioning (unequal error protection) • SP/SI switching pictures

  37. Outline • Overview • Network Abstraction Layer (NAL) • Video Coding Layer (VCL) • Profiles and Applications • Feature Highlights • Conclusions

  38. Conclusions • Key improvements • Enhanced prediction (intra- and inter-) • Small block size exact match transform • Adaptive in-loop deblocking filter • Enhanced entropy coding method [Ref] G Sullivan and T. Wiegand, Video Compression—From Concepts to the H.264/AVC Standard, Proc. of IEEE, 93(1), Jan 2005

More Related