Overview of H.264 / MPEG-4 Part 10 2004. 10. 20. Soon-kak Kwon, A. Tamhankar, K. R. Rao Dongeui University, T-Mobile, University of Texas at Arlington
Contents • Introduction • Layered Structure • Video Coding Algorithm • Error Resilience • Comparison of Coding Efficiency • Conclusions
Introduction • Scope of Image and Video Coding Standards • Only the bitstream syntax and the decoder are standardized, which: • Permits encoder optimization beyond the obvious • Permits complexity reduction for implementation • Provides no guarantee of quality [Figure: Input (image / video) → Pre-Processing → Encoding → Decoding → Post-Processing & Error Recovery → Output (image / video); only the decoding stage lies within the scope of the standard]
Introduction • Video Coding Standards

| Standard | Main applications | Year |
|---|---|---|
| JPEG, JPEG2000 | Image | 1992-1999, 2000 |
| JBIG | Fax | 1995-2000 |
| H.261 | Video conferencing | 1990 |
| H.262, H.262+ | DTV, SDTV | 1995, 2000 |
| H.263, H.263++ | Videophone | 1998, 2000 |
| MPEG-1 | Video CD | 1992 |
| MPEG-2 | DTV, SDTV, HDTV, DVD | 1995 |
| MPEG-4 | Interactive video | 2000 |
| MPEG-7 | Multimedia content description interface | 2001 |
| MPEG-21 | Multimedia framework | 2002 |
| H.264 / MPEG-4 Part 10 | Advanced Video Coding | 2003 |
| H.264 / MPEG-4 Part 10 Fidelity Range Extensions (High profile) | Studio editing, post processing, digital cinema | August 2004 |
Introduction • MPEG-1 • Formally ISO/IEC 11172-2 ('93), developed by ISO/IEC JTC1 SC29 WG11 (MPEG); use is fairly widespread, but mostly overtaken by MPEG-2 • Superior quality compared to H.261 when operated at higher bit rates (≥ 1 Mbps for CIF 352x288 resolution) • Provides approximately VHS quality at 1-2 Mbps using SIF 352x240/288 resolution • Additional technical features: • Bi-directional motion prediction (B-pictures) • Half-pel motion vector resolution • Slice-structured coding • DC-only “D” pictures
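Introduction • Predictive Coding with B Pictures [Figure: picture sequence I B P B P in display order; each P picture is predicted from the preceding I or P picture, and each B picture is bi-directionally predicted from the I/P pictures on either side of it]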
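Because a B picture is predicted from an anchor that follows it in display order, the encoder has to send that anchor first. A minimal sketch of this reordering, assuming the simple pattern above where each B picture uses only its two nearest I/P anchors (the function name `coding_order` is illustrative):

```python
def coding_order(display_types: str) -> list[str]:
    """Reorder pictures from display order to a possible transmission order.

    Simplified model: each B picture references the nearest I/P anchor on
    each side, so the following anchor must be sent before the B pictures.
    """
    pictures = [f"{t}{i}" for i, t in enumerate(display_types.split())]
    order, waiting_b = [], []
    for pic in pictures:
        if pic.startswith("B"):
            waiting_b.append(pic)    # hold until the next anchor has been sent
        else:
            order.append(pic)        # anchor (I or P) first
            order.extend(waiting_b)  # then the B pictures displayed before it
            waiting_b.clear()
    order.extend(waiting_b)          # trailing B pictures (no later anchor in this model)
    return order

print(coding_order("I B P B P"))  # ['I0', 'P2', 'B1', 'P4', 'B3']
```

The decoder buffers each anchor until the B pictures that precede it in display order have been decoded and output, which is the source of the extra delay B pictures introduce.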
Introduction • MPEG-2 / H.262 • Formally ISO/IEC 13818-2 & ITU-T H.262, developed (1994) jointly by ITU-T and ISO/IEC SC29 WG11 (MPEG); now in wide use for DVD and standard- and high-definition DTV (the most commonly used video coding standard) • Primary new technical feature: support for interlaced-scan pictures • Also: • Various forms of scalability (SNR, spatial, temporal and hybrid) • I-picture concealment motion vectors • Essentially the same as MPEG-1 for progressive-scan pictures, and MPEG-1 forward compatibility is required • Not especially useful below 2-3 Mbps (typical ranges: ~2-5 Mbps for SDTV broadcast, 6-8 Mbps for DVD, 18 Mbps for HDTV); picture skipping is not easy
Introduction • H.263: The Next Generation • ITU-T Rec. H.263 (v1: 1995): the next generation of video coding performance, developed by ITU-T; the current premier ITU-T video standard (it has overtaken H.261 as the dominant videoconferencing codec) • Superior quality to prior standards at all bit rates (except perhaps for interlaced video) • Wins by a factor of two at very low rates • Version 2 (late 1997 / early 1998) and version 3 (2000) were later developed with a large number of new features • Profiles defined early 2001 • H.263+ and H.263++ are the extensions to H.263 (versions 2 and 3)
Introduction • MPEG-4 Visual: Baseline H.263 and Many Creative Extras • MPEG-4 Visual (formally ISO/IEC 14496-2, v1: early 1999): contains the H.263 baseline design and adds essentially all prior features plus many creative new extras: • Segmented coding of shapes • Scalable wavelet coding of still textures • Mesh coding • Face animation coding • Coding of synthetic and semi-synthetic content • 10- and 12-bit sampling • More … • v2 (early 2000) and v3 (early 2001) were added later
Introduction • Relationship to Other Standards • The same design is to be approved in both ITU-T / VCEG and ISO/IEC / MPEG • In ITU-T / VCEG this is a new and separate standard • ITU-T Recommendation H.264 • ITU-T Systems (H.32x) is being modified to support it • In ISO/IEC / MPEG this is a new “part” in the MPEG-4 suite • Coded design separate from prior MPEG-4 Visual (Part 2) • New Part 10, called “Advanced Video Coding” (AVC), similar to how “AAC” was added to MPEG-2 as a separate audio codec • Not backward or forward compatible with prior standards • MPEG-4 Systems / File Format is being modified to support it • H.222.0 | MPEG-2 Systems is also being modified to support it • IETF is working on RTP payload packetization
Introduction • History of H.264 / MPEG-4 Part 10 • ITU-T Q.6/SG16 started work on H.26L (L: Long Range) • July 2001: H.26L demonstrated at the MPEG (Moving Picture Experts Group) call for technology • December 2001: ITU-T VCEG (Video Coding Experts Group) and ISO/IEC MPEG started a joint project, the Joint Video Team (JVT) • May 2003: final approval from ISO/IEC and ITU-T • The standard is named H.264 by ITU-T and MPEG-4 Part 10 by ISO/IEC • Fidelity Range Extensions (August 2004): Amendment 1 • Transport of MPEG-4 AVC over MPEG-2 TS: Amendment 3
Introduction • Purpose of H.264 / MPEG-4 Part 10 • Higher coding efficiency than previous standards (MPEG-1, MPEG-2, MPEG-4 Part 2, H.261, H.263) • Simple syntax specifications • Seamless integration of video coding into all current protocols • More error robustness • Various applications such as video broadcasting, video streaming, video conferencing, D-Cinema and HDTV • Network friendliness • Balance between coding efficiency, implementation complexity and cost, based on state-of-the-art VLSI design technology
Introduction • H.264 / MPEG-4 Part 10 Architecture
Introduction • Applications of H.264 / MPEG-4 Part 10: a broad range of applications for video content, including but not limited to: • Video streaming over the internet • CATV: cable TV on optical networks, copper, etc. • DBS: direct broadcast satellite video services • DSL: digital subscriber line video services • DTTB: digital terrestrial television broadcasting, cable modem, DSL • ISM: interactive storage media (optical disks, etc.) • MMM: multimedia mailing • MSPN: multimedia services over packet networks • RTC: real-time conversational services (videoconferencing, videophone, etc.) • RVS: remote video surveillance • SSM: serial storage media (digital VTR, etc.) • D-Cinema: content contribution, content distribution, studio editing, post processing
Introduction • Profiles and Levels for particular applications • Profile: a subset of the entire bitstream syntax; decoder designs differ according to the profile they support • Four profiles: Baseline, Main, Extended and High

| Profile | Applications |
|---|---|
| Baseline | Video conferencing, videophone |
| Main | Digital storage media, television broadcasting |
| Extended | Streaming video |
| High | Content contribution, content distribution, studio editing, post processing |
Introduction • Specific coding parts for the Profiles
Introduction • Common coding parts for the Profiles • I slice (intra-coded slice): a slice coded using prediction only from decoded samples within the same slice • P slice (predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, with at most one motion vector and reference index to predict the sample values of each block • CAVLC (Context-based Adaptive Variable Length Coding) for entropy coding
Introduction • Coding parts for Baseline Profile • Common parts: I slice, P slice, CAVLC • FMO (flexible macroblock order): macroblocks need not be coded in raster-scan order; a map assigns each macroblock to a slice group • ASO (arbitrary slice order): the address of the first macroblock of a slice may be smaller than the address of the first macroblock of some preceding slice of the same coded picture • RS (redundant slice): a slice carrying redundant coded data for the same picture area, coded at the same or a different rate, in addition to the primary coded data of that slice
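To make the FMO map concrete, the sketch below builds a "dispersed" (checkerboard-like) macroblock-to-slice-group map. The formula follows my reading of the dispersed map type in the standard and the function name is hypothetical, so treat it as an illustration to check against the specification:

```python
def dispersed_slice_group_map(width_mbs: int, height_mbs: int,
                              num_groups: int) -> list[list[int]]:
    """Slice-group id of each macroblock for a dispersed (checkerboard-like) map."""
    ids = [
        ((i % width_mbs) + ((i // width_mbs) * num_groups) // 2) % num_groups
        for i in range(width_mbs * height_mbs)  # i = macroblock address in raster order
    ]
    # Reshape into rows of macroblocks for display.
    return [ids[r * width_mbs:(r + 1) * width_mbs] for r in range(height_mbs)]

for row in dispersed_slice_group_map(6, 3, 2):
    print(row)
# [0, 1, 0, 1, 0, 1]
# [1, 0, 1, 0, 1, 0]
# [0, 1, 0, 1, 0, 1]
```

With two slice groups, the horizontal and vertical neighbors of every macroblock belong to the other group, so if one group is lost each missing macroblock can be concealed from received neighbors.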
Introduction • Coding parts for Main Profile • Common parts: I slice, P slice, CAVLC • B slice (bi-directionally predictive-coded slice): a slice coded using inter prediction from previously decoded reference pictures, with at most two motion vectors and reference indices to predict the sample values of each block • Weighted prediction: a scaling operation that applies a weighting factor to the samples of motion-compensated prediction data in P or B slices • CABAC (Context-based Adaptive Binary Arithmetic Coding) for entropy coding
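A sketch of explicit weighted prediction for samples predicted from a single reference list, assuming 8-bit samples. Here `w`, `o` and `log_wd` stand for the weight, offset and log2 weight denominator an encoder would signal; the function names are illustrative:

```python
def clip1(value: int, bit_depth: int = 8) -> int:
    """Clip a sample value to [0, 2**bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, value))

def weighted_prediction(pred: list[int], w: int, o: int, log_wd: int) -> list[int]:
    """Explicit weighted prediction for samples predicted from one reference list."""
    out = []
    for x in pred:
        if log_wd >= 1:
            val = ((x * w + (1 << (log_wd - 1))) >> log_wd) + o  # rounded scaling, then offset
        else:
            val = x * w + o
        out.append(clip1(val))
    return out

# With log_wd = 5 a weight of 32 means 1.0; a weight of 16 halves the prediction,
# which is how a fade towards black can be followed.
print(weighted_prediction([100, 200], w=16, o=0, log_wd=5))  # [50, 100]
```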
Introduction • Coding parts for Extended Profile • Common parts: I slice, P slice, CAVLC • SP slice: a specially coded slice for efficient switching between video streams, coded similarly to a P slice • SI slice: a switching slice, coded similarly to an I slice • Data partitioning: the coded data of a slice is placed in separate data partitions, each of which can be carried in a different NAL unit • Flexible macroblock order (FMO) • Arbitrary slice order (ASO) • Redundant slice (RS) • B slice • Weighted prediction
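Introduction • Profile specifications

| Coding tool | Baseline | Main | Extended | High |
|---|---|---|---|---|
| I & P slices | X | X | X | X |
| Deblocking filter | X | X | X | X |
| ¼-pel motion compensation | X | X | X | X |
| Variable block size (16x16 to 4x4) | X | X | X | X |
| CAVLC/UVLC | X | X | X | X |
| Error resilience tools: flexible MB order, ASO, redundant slices | X | | X | |
| SP/SI slices | | | X | |
| B slices | | X | X | X |
| Interlaced coding | | X | X | X |
| CABAC | | X | | X |
| Data partitioning | | | X | |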
Introduction • Application requirements

| Application | Requirements | H.264 profiles | MPEG-4 profiles |
|---|---|---|---|
| Broadcast television | Coding efficiency, reliability (over a controlled distribution channel), interlace, low-complexity decoder | Main | ASP (Advanced Simple) |
| Streaming video | Coding efficiency, reliability (over an uncontrolled packet-based network channel), scalability | Extended | ARTS (Advanced Real Time Simple) or FGS (Fine Granular Scalability) |
| Video storage and playback | Coding efficiency, interlace, low-complexity encoder and decoder | Main | ASP |
| Videoconferencing | Coding efficiency, reliability, low latency, low-complexity encoder and decoder | Baseline | SP (Simple) |
| Mobile video | Coding efficiency, reliability, low latency, low-complexity encoder and decoder, low power consumption | Baseline | SP |
| Studio distribution | Lossless or near-lossless, interlace, efficient transcoding | Main, High | Studio Profile |
Introduction • Level: corresponds to the processing power and memory capability of a codec

| Level number | Picture type & frame rate |
|---|---|
| 1 | QCIF @ 15 fps |
| 1.1 | QCIF @ 30 fps |
| 1.2 | CIF @ 15 fps |
| 1.3 | CIF @ 30 fps |
| 2 | CIF @ 30 fps |
| 2.1 | HHR @ 15 or 30 fps |
| 2.2 | SDTV @ 15 fps |
| 3 | SDTV: 720x480x30i, 720x576x25i; 10 Mbps (max) |
| 3.1 | 1280x720x30p |
| 3.2 | 1280x720x60p |
| 4 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 20 Mbps (max) |
| 4.1 | HDTV: 1920x1080x30i, 1280x720x60p, 2Kx1Kx30p; 50 Mbps (max) |
| 4.2 | HDTV: 1920x1080x60i, 2Kx1Kx60p |
| 5 | SHDTV/D-Cinema: 2.5Kx2Kx30p |
| 5.1 | SHDTV/D-Cinema: 4Kx2Kx30p |
Introduction • Parameter set limits for each Level

| Level number | Max macroblock processing rate (MB/s) | Max frame size (MBs) | Max decoded picture buffer size (1024 bytes) | Max video bit rate (1000 bits/s or 1200 bits/s) | Max CPB size (1000 bits or 1200 bits) | Vertical MV component range (luma frame samples) | Min compression ratio | Max number of MVs per two consecutive MBs |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 485 | 99 | 148.5 | 64 | 175 | [-64, +63.75] | 2 | - |
| 1.1 | 3 000 | 396 | 337.5 | 192 | 500 | [-128, +127.75] | 2 | - |
| 1.2 | 6 000 | 396 | 891.0 | 384 | 1 000 | [-128, +127.75] | 2 | - |
| 1.3 | 11 880 | 396 | 891.0 | 768 | 2 000 | [-128, +127.75] | 2 | - |
| 2 | 11 880 | 396 | 891.0 | 2 000 | 2 000 | [-128, +127.75] | 2 | - |
| 2.1 | 19 800 | 792 | 1 782.0 | 4 000 | 4 000 | [-256, +255.75] | 2 | - |
| 2.2 | 20 250 | 1 620 | 3 037.5 | 4 000 | 4 000 | [-256, +255.75] | 2 | - |
| 3 | 40 500 | 1 620 | 3 037.5 | 10 000 | 10 000 | [-256, +255.75] | 2 | 32 |
| 3.1 | 108 000 | 3 600 | 6 750.0 | 14 000 | 14 000 | [-512, +511.75] | 4 | 16 |
| 3.2 | 216 000 | 5 120 | 7 680.0 | 20 000 | 20 000 | [-512, +511.75] | 4 | 16 |
| 4 | 245 760 | 8 192 | 12 288.0 | 20 000 | 25 000 | [-512, +511.75] | 4 | 16 |
| 4.1 | 245 760 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16 |
| 4.2 | 491 520 | 8 192 | 12 288.0 | 50 000 | 62 500 | [-512, +511.75] | 2 | 16 |
| 5 | 589 824 | 22 080 | 41 310.0 | 135 000 | 135 000 | [-512, +511.75] | 2 | 16 |
| 5.1 | 983 040 | 36 864 | 69 120.0 | 240 000 | 240 000 | [-512, +511.75] | 2 | 16 |
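As a worked example of how the level limits are used, the sketch below finds the lowest level whose maximum frame size and maximum macroblock processing rate both admit a given format. It checks only these two columns (bit rate, DPB size and the other limits are ignored), so it is a rough screen rather than a conformance check; the names are illustrative:

```python
import math
from typing import Optional

# (level, max MB processing rate in MB/s, max frame size in MBs), from the level limits table
LEVEL_LIMITS = [
    ("1", 1485, 99), ("1.1", 3000, 396), ("1.2", 6000, 396), ("1.3", 11880, 396),
    ("2", 11880, 396), ("2.1", 19800, 792), ("2.2", 20250, 1620),
    ("3", 40500, 1620), ("3.1", 108000, 3600), ("3.2", 216000, 5120),
    ("4", 245760, 8192), ("4.1", 245760, 8192), ("4.2", 491520, 8192),
    ("5", 589824, 22080), ("5.1", 983040, 36864),
]

def lowest_level(width: int, height: int, fps: float) -> Optional[str]:
    """Lowest level whose frame-size and MB-rate limits both admit the format."""
    frame_mbs = math.ceil(width / 16) * math.ceil(height / 16)  # coded size in MBs
    mb_rate = frame_mbs * fps                                   # MBs per second
    for level, max_mbps, max_fs in LEVEL_LIMITS:
        if frame_mbs <= max_fs and mb_rate <= max_mbps:
            return level
    return None  # exceeds every level in the table

print(lowest_level(352, 288, 30))    # 1.3
print(lowest_level(1920, 1080, 30))  # 4
```

For 1920x1080 at 30 frames/s, the coded frame is 120 x 68 = 8160 MBs (1080 lines round up to 1088), and 8160 x 30 = 244 800 MB/s, which just fits within Level 4.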
Layered Structure • Two layers: Network Abstraction Layer (NAL) and Video Coding Layer (VCL) • NAL • Abstracts the VCL data, hence the name Network ‘Abstraction’ Layer • Adds header information about the VCL format • Makes the data appropriate for conveyance by transport layers or storage media • The NAL unit (NALU) defines a generic format for use in both packet-based and bitstream-based systems • VCL • Core coding layer • Concentrates on attaining maximum coding efficiency
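Each NAL unit starts with a one-byte header (forbidden_zero_bit, nal_ref_idc, nal_unit_type). A minimal sketch, assuming an Annex B byte stream in which NAL units are separated by 0x000001 start codes; the helper names are illustrative and the payload bytes in the example are made up, not a decodable stream:

```python
NAL_TYPES = {1: "non-IDR slice", 2: "data partition A", 3: "data partition B",
             4: "data partition C", 5: "IDR slice", 6: "SEI", 7: "SPS",
             8: "PPS", 9: "access unit delimiter"}

def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    units = []
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1:
        begin = pos + 3
        nxt = stream.find(b"\x00\x00\x01", begin)
        end = len(stream) if nxt == -1 else nxt
        # A NAL unit never ends with a zero byte, so trailing zeros belong to a
        # 4-byte start code or to zero padding between units.
        unit = stream[begin:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units

def parse_nal_header(unit: bytes) -> tuple[int, int, int]:
    """Return (forbidden_zero_bit, nal_ref_idc, nal_unit_type) from the first byte."""
    b = unit[0]
    return b >> 7, (b >> 5) & 0x3, b & 0x1F

stream = (b"\x00\x00\x00\x01\x67\x42\x00\x1e"  # SPS (4-byte start code)
          b"\x00\x00\x01\x68\xce\x38\x80"      # PPS
          b"\x00\x00\x01\x65\x88\x84")         # IDR slice (truncated payload)
for unit in split_annexb(stream):
    _, ref_idc, nal_type = parse_nal_header(unit)
    print(ref_idc, NAL_TYPES.get(nal_type, "other"))
```

Start-code emulation prevention inside the payload is what makes this simple search safe: a 0x000001 pattern cannot occur within a NAL unit.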
Layered Structure • Elements of VCL
Layered Structure • Supported picture format: 4:2:0 chroma sampling [Figure: CIF format: Y 352 x 288, Cb and Cr 176 x 144 each; QCIF format: Y 176 x 144, Cb and Cr 88 x 72 each]
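The plane sizes follow directly from 2:1 horizontal and vertical chroma subsampling. A small sketch, assuming 8-bit samples and picture dimensions that are multiples of 16 (the function name is illustrative):

```python
def yuv420_layout(width: int, height: int, bit_depth: int = 8) -> dict:
    """Plane sizes, macroblock count and raw frame size for 4:2:0 video."""
    chroma = (width // 2, height // 2)        # Cb and Cr are subsampled 2:1 both ways
    samples = width * height + 2 * chroma[0] * chroma[1]
    bytes_per_sample = (bit_depth + 7) // 8
    return {
        "luma": (width, height),
        "chroma": chroma,
        "macroblocks": (width // 16) * (height // 16),  # assumes multiples of 16
        "frame_bytes": samples * bytes_per_sample,
    }

cif = yuv420_layout(352, 288)
qcif = yuv420_layout(176, 144)
print(cif["macroblocks"], cif["frame_bytes"])    # 396 152064
print(qcif["macroblocks"], qcif["frame_bytes"])  # 99 38016
```

The macroblock counts, 396 for CIF and 99 for QCIF, are the maximum frame sizes that appear for the lower levels in the level limits table.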
Video Coding Algorithm • Block diagram for H.264 encoder [Figure: video input minus prediction → transform & quantization → entropy coding → bitstream output; reconstruction loop: inverse quantization & inverse transform, plus prediction → deblocking filter → picture buffering; motion estimation, motion compensation, intra prediction and intra/inter mode decision form the prediction]
Video Coding Algorithm • Block diagram for H.264 decoder [Figure: bitstream input → entropy decoding → inverse quantization & inverse transform, plus prediction → deblocking filter → video output and picture buffering; intra prediction, motion compensation and intra/inter mode selection form the prediction]
VC Algorithm : Intra Prediction • Exploits spatial redundancy between neighboring blocks in a frame • 4 x 4 luma block • 9 prediction modes: 8 directional predictions and 1 DC prediction (vertical: 0, horizontal: 1, DC: 2, diagonal down-left: 3, diagonal down-right: 4, vertical-right: 5, horizontal-down: 6, vertical-left: 7, horizontal-up: 8) • Samples a, b, …, p of the current block are predicted from the previously reconstructed samples A, B, …, M above and to the left [Figure: 4x4 block a-p with reconstructed neighbors M and A-H above and I-L to the left; arrows show the prediction directions of modes 0-8]
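A sketch of the three simplest 4x4 luma modes, assuming all neighboring samples are available (the standard's fallback rules for unavailable neighbors are omitted); the function name is illustrative:

```python
def intra4x4_basic(mode: int, above: list[int], left: list[int]) -> list[list[int]]:
    """Predict a 4x4 luma block (rows a-d, e-h, i-l, m-p) from its neighbors.

    above = [A, B, C, D], left = [I, J, K, L]; both assumed available.
    """
    if mode == 0:  # vertical: copy the row above down every column
        return [list(above) for _ in range(4)]
    if mode == 1:  # horizontal: copy each left sample across its row
        return [[left[y]] * 4 for y in range(4)]
    if mode == 2:  # DC: rounded mean of the 8 neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are sketched here")

print(intra4x4_basic(2, [10, 20, 30, 40], [50, 60, 70, 80])[0])  # [45, 45, 45, 45]
```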
VC Algorithm : Intra Prediction • Example of a 4 x 4 luma block [Figure: prediction directions of modes 4 and 8 on the 4x4 block a-p with neighbors A-M] • Mode 4 (diagonal down-right): samples a and d are predicted by round(I/4 + M/2 + A/4) and round(B/4 + C/2 + D/4) • Mode 8 (horizontal-up): samples a and d are predicted by round(I/2 + J/2) and round(J/4 + K/2 + L/4)
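The two worked examples can be checked with a sketch of the full 4x4 predictions for modes 4 and 8, written with the integer (x + 2) >> 2 and (x + 1) >> 1 rounding that the round(...) expressions stand for; all neighbors are assumed available and the function name is illustrative:

```python
def intra4x4_mode4_mode8(above, left, corner):
    """Return (mode 4, mode 8) predictions for a 4x4 luma block.

    above = [A, B, C, D], left = [I, J, K, L], corner = M; pred[y][x] is the
    sample in row y, column x (a = pred[0][0], d = pred[0][3]).
    """
    def top(i):   # sample above the block at column i; column -1 is M
        return corner if i < 0 else above[i]

    def side(j):  # sample left of the block at row j; row -1 is M
        return corner if j < 0 else left[j]

    ddr = [[0] * 4 for _ in range(4)]  # mode 4: diagonal down-right
    hu = [[0] * 4 for _ in range(4)]   # mode 8: horizontal-up
    for y in range(4):
        for x in range(4):
            if x > y:
                ddr[y][x] = (top(x - y - 2) + 2 * top(x - y - 1) + top(x - y) + 2) >> 2
            elif x < y:
                ddr[y][x] = (side(y - x - 2) + 2 * side(y - x - 1) + side(y - x) + 2) >> 2
            else:
                ddr[y][x] = (top(0) + 2 * corner + side(0) + 2) >> 2
            z, k = x + 2 * y, y + (x >> 1)
            if z in (0, 2, 4):
                hu[y][x] = (left[k] + left[k + 1] + 1) >> 1
            elif z in (1, 3):
                hu[y][x] = (left[k] + 2 * left[k + 1] + left[k + 2] + 2) >> 2
            elif z == 5:
                hu[y][x] = (left[2] + 3 * left[3] + 2) >> 2
            else:
                hu[y][x] = left[3]  # bottom-right region repeats L
    return ddr, hu

ddr, hu = intra4x4_mode4_mode8([10, 20, 30, 40], [50, 60, 70, 80], corner=30)
print(ddr[0][0], ddr[0][3], hu[0][0], hu[0][3])  # 30 30 55 70
```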
VC Algorithm : Intra Prediction • 16 x 16 luma • 4 prediction modes (vertical: 0, horizontal: 1, DC: 2, plane: 3) • Plane: works well in areas of smoothly varying luminance; a linear ‘plane’ function is fitted to the upper (H) and left (V) neighboring samples • 8 x 8 luma (FRExt only): similar to 4 x 4 luma, with low-pass filtering of the neighboring samples used for prediction to improve prediction performance [Figure: plane prediction]
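A sketch of 16x16 plane prediction for 8-bit samples, following the usual formulation in which horizontal and vertical gradients H and V are estimated from the top and left neighbors and a plane is then evaluated over the block; the function name is illustrative:

```python
def clip1(v: int) -> int:
    return max(0, min(255, v))  # 8-bit samples assumed

def intra16x16_plane(top: list[int], left: list[int], corner: int) -> list[list[int]]:
    """Plane prediction of a 16x16 luma block.

    top[x] is the sample above column x, left[y] the sample left of row y
    (16 each), corner is the sample above-left of the block.
    """
    def t(i: int) -> int:
        return corner if i < 0 else top[i]

    def s(j: int) -> int:
        return corner if j < 0 else left[j]

    h = sum((k + 1) * (t(8 + k) - t(6 - k)) for k in range(8))  # horizontal gradient
    v = sum((k + 1) * (s(8 + k) - s(6 - k)) for k in range(8))  # vertical gradient
    a = 16 * (left[15] + top[15])
    b = (5 * h + 32) >> 6
    c = (5 * v + 32) >> 6
    return [[clip1((a + b * (x - 7) + c * (y - 7) + 16) >> 5) for x in range(16)]
            for y in range(16)]

# A horizontal ramp in the top row (with a flat left column) is reproduced exactly.
top = [4 * x + 8 for x in range(16)]
pred = intra16x16_plane(top, left=[4] * 16, corner=4)
print(pred[0][:4], pred[15][:4])  # [8, 12, 16, 20] [8, 12, 16, 20]
```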
VC Algorithm : Intra Prediction • Chroma prediction always operates on the full MB: 8x8 for 4:2:0, 8x16 for 4:2:2, 16x16 for 4:4:4 • 4 prediction modes, similar to the 16x16 luma modes but in a different order (DC: 0, horizontal: 1, vertical: 2, plane: 3)
VC Algorithm : Inter Prediction • Exploits temporal redundancy • Prediction of variable block sizes • Sub-pel motion compensation • Deblocking filter • Management of multiple reference pictures
VC Algorithm : Inter Prediction • Prediction of variable block size • An MB can be partitioned into smaller blocks • 4 partitionings of the 16 x 16 MB (16x16, 16x8, 8x16, 8x8) and 4 partitionings of each 8 x 8 sub-MB (8x8, 8x4, 4x8, 4x4) • Large partitions suit homogeneous areas, small partitions suit detailed areas • The two kinds of partitions cannot be mixed, i.e. an MB cannot contain both 16x8 and 4x8 partitions • Only when the 8 x 8 (sub-MB) partitioning is selected can each 8 x 8 block be partitioned further
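The partitioning rules can be summarized by counting the motion-compensated partitions in an MB, each of which carries its own motion vector per reference list. A small sketch with illustrative names:

```python
from typing import Optional

MB_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2, "8x8": 4}
SUB_MB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}

def count_partitions(mb_type: str, sub_types: Optional[list[str]] = None) -> int:
    """Number of motion-compensated partitions in one MB (one MV each per list)."""
    if mb_type != "8x8":
        if sub_types:
            raise ValueError("sub-MB partitions require the 8x8 MB partitioning")
        return MB_PARTITIONS[mb_type]
    if sub_types is None or len(sub_types) != 4:
        raise ValueError("8x8 partitioning needs one sub-MB type per 8x8 block")
    return sum(SUB_MB_PARTITIONS[s] for s in sub_types)

print(count_partitions("16x8"))                              # 2
print(count_partitions("8x8", ["8x8", "8x4", "4x8", "4x4"]))  # 9
```

An MB split into sixteen 4x4 partitions therefore carries up to 16 motion vectors per reference list.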
VC Algorithm : Inter Prediction • Sub-pel motion compensation • Better compression performance than integer-pel MC • At the expense of increased complexity • Its advantage is greatest at high bit rates and high resolutions [Figure: encoder block diagram annotated with 1/4-pel motion vector accuracy (6-tap filter); MB partitions 16x16, 16x8, 8x16, 8x8 and sub-MB partitions 8x8, 8x4, 4x8, 4x4, numbered 0-3 in coding order]
VC Algorithm : Inter Prediction • Sub-pel accuracy • A distinct MV can be sent for each sub-MB partition • ME can be based on multiple reference pictures that lie in the past or in the future in display order • The reference picture for ME is selected at the MB-partition level, so sub-MB partitions within the same 8 x 8 MB partition must use the same reference picture
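VC Algorithm : Inter Prediction • Half-pel: interpolated from neighboring integer-pel samples using a 6-tap finite impulse response (FIR) filter with weights (1, -5, 20, 20, -5, 1)/32 • Quarter-pel: produced by bilinear interpolation between neighboring half- or integer-pel samples • For the half-pel sample b between integer samples G and H on a row E, F, G, H, I, J, and the quarter-pel sample a between G and b:

$$b = \operatorname{round}\!\left(\frac{E - 5F + 20G + 20H - 5I + J}{32}\right), \qquad a = \operatorname{round}\!\left(\frac{G + b}{2}\right)$$

where b is clipped to the valid sample range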
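A one-dimensional sketch of the two interpolation steps for 8-bit samples; the 2-D case (positions that need filtering in both directions) is omitted and the function names are illustrative:

```python
def clip1(v: int) -> int:
    return max(0, min(255, v))  # 8-bit samples assumed

def half_pel(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """Half-pel sample b between G and H: 6-tap filter (1, -5, 20, 20, -5, 1)/32."""
    b1 = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    return clip1((b1 + 16) >> 5)  # +16 then >>5 rounds the division by 32

def quarter_pel(g: int, b: int) -> int:
    """Quarter-pel sample a: rounded average of integer sample G and half-pel b."""
    return (g + b + 1) >> 1

b = half_pel(0, 10, 20, 30, 40, 50)
print(b, quarter_pel(20, b))        # 25 23
print(half_pel(0, 0, 255, 255, 0, 0))  # 255 (filter overshoot is clipped)
```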
VC Algorithm : Inter Prediction • Deblocking filter (adaptive) • Reduces blocking artifacts at block boundaries and prevents the propagation of accumulated coding noise • Filtering is applied to the horizontal and vertical edges of the 4 x 4 blocks in a macroblock, adaptively at several levels (slice, block edge, sample)
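A sketch of the sample-level decision and the normal-strength luma filter across one edge position, assuming a boundary strength between 1 and 3. The thresholds `alpha` and `beta` and the clipping value `tc0` are plain inputs here; in the standard they come from tables indexed by QP and boundary strength, which are not reproduced:

```python
def clip3(lo: int, hi: int, v: int) -> int:
    return max(lo, min(hi, v))

def filter_luma_edge(p, q, alpha, beta, tc0):
    """Normal-strength luma filtering across one edge position.

    p = [p0, p1, p2] and q = [q0, q1, q2] are samples on either side of the
    edge, with p0 and q0 nearest to it. 8-bit samples assumed.
    """
    p0, p1, p2 = p
    q0, q1, q2 = q
    # Sample-level decision: large steps are treated as real image edges.
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return list(p), list(q)
    ap, aq = abs(p2 - p0), abs(q2 - q0)
    tc = tc0 + int(ap < beta) + int(aq < beta)
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p0 = clip3(0, 255, p0 + delta)
    new_q0 = clip3(0, 255, q0 - delta)
    avg = (p0 + q0 + 1) >> 1
    new_p1 = p1 + clip3(-tc0, tc0, (p2 + avg - (p1 << 1)) >> 1) if ap < beta else p1
    new_q1 = q1 + clip3(-tc0, tc0, (q2 + avg - (q1 << 1)) >> 1) if aq < beta else q1
    return [new_p0, new_p1, p2], [new_q0, new_q1, q2]

print(filter_luma_edge([60, 60, 60], [70, 70, 70], alpha=15, beta=4, tc0=2))
# ([64, 62, 60], [66, 68, 70]): a small blocking step is smoothed
```

A step of 180 between the two sides exceeds `alpha` in the same setting, so it is left untouched as a genuine edge.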
VC Algorithm : Inter Prediction • Management of multiple reference pictures (short term, long term) • Takes care of marking some stored pictures as ‘unused’ and deciding which pictures to delete from the buffer
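A simplified sketch of sliding-window marking: when the buffer is full, the oldest short-term reference frame is marked unused. The class and method names are hypothetical, frame numbers are plain integers in decoding order, and explicit memory management commands are only hinted at by `mark_long_term`:

```python
from collections import deque

class RefPictureBuffer:
    """Simplified sliding-window marking of short-term reference frames."""

    def __init__(self, max_num_ref_frames: int):
        self.max_refs = max_num_ref_frames
        self.short_term = deque()  # frame numbers, oldest first
        self.long_term = set()

    def add_short_term(self, frame_num: int) -> None:
        # When the buffer is full, mark the oldest short-term frame "unused"
        # (drop it) so the newly decoded frame can be stored.
        if len(self.short_term) + len(self.long_term) >= self.max_refs:
            if not self.short_term:
                raise RuntimeError("buffer full of long-term frames")
            self.short_term.popleft()
        self.short_term.append(frame_num)

    def mark_long_term(self, frame_num: int) -> None:
        # Move a short-term frame to long-term storage, where the sliding
        # window no longer removes it.
        self.short_term.remove(frame_num)
        self.long_term.add(frame_num)

buf = RefPictureBuffer(max_num_ref_frames=3)
buf.add_short_term(0)
buf.mark_long_term(0)  # e.g. keep a background frame available
for n in range(1, 5):
    buf.add_short_term(n)
print(list(buf.short_term), buf.long_term)  # [3, 4] {0}
```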
VC Algorithm : Transform & Quantization • Transform • Integer transform, multiplier free: additions and shifts in 16-bit arithmetic • Hierarchical structure: 4 x 4 integer DCT + Hadamard transform [Figure: assignment of the DC coefficients (dark samples) of a luma MB; the numbers 0, 1, …, 15 give the coding order of the (4x4) integer DCT blocks, and (0,0), (0,1), (0,2), …, (3,3) index the DC coefficient of each 4x4 block] • The Hadamard transform is applied only when the (16x16) intra prediction mode is used with the (4x4) IntDCT • Chroma is handled similarly; the chroma block size of an MB depends on the format (4:2:0, 4:2:2 or 4:4:4)
VC Algorithm : Transform • 4 x 4 integer DCT • X: input pixels, Y: output coefficients

$$Y = \left(C_f\, X\, C_f^{T}\right) \otimes E_f$$

where ⊗ implies element-by-element multiplication
$$X = C_i^{T}\left(Y \otimes E_i\right) C_i$$

4 x 4 inverse IntDCT. In both the forward and the inverse transform, the quantization step (determined by QP) is embedded in the scaling matrices E_f and E_i.
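The multiplier-free claim can be checked directly: the sketch below computes C_f X C_f^T once by matrix multiplication and once with a butterfly that uses only additions, subtractions and shifts. The E_f scaling is left out because it is folded into quantization, and C_f is assumed to be the usual core matrix [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]; the function names are illustrative:

```python
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def core_transform_matrix(x):
    """C_f X C_f^T computed with explicit multiplications."""
    return matmul(matmul(CF, x), transpose(CF))

def butterfly(v):
    """1-D core transform using only additions, subtractions and shifts."""
    s0, s1 = v[0] + v[3], v[1] + v[2]
    d0, d1 = v[0] - v[3], v[1] - v[2]
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]

def core_transform_fast(x):
    rows = [butterfly(r) for r in x]           # X C_f^T: transform every row
    cols = [butterfly(c) for c in zip(*rows)]  # then every column
    return transpose(cols)                     # back to row-major layout

block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
assert core_transform_fast(block) == core_transform_matrix(block)
print(core_transform_fast(block)[0][0])  # 140, the sum of all 16 samples (DC)
```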
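VC Algorithm : Transform • Luma DC coefficients for Intra 16x16 MB • The 16 DC coefficients X_D of the 16 (4x4) blocks are transformed using the Walsh-Hadamard transform:

$$Y_D = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} X_D \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \right) // 2$$

where // denotes division with rounding to the nearest integer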
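A sketch of the luma DC transform; the rounding rule used for // (halves away from zero) is an assumption, and the names are illustrative:

```python
H4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]

def matmul4(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def div2_round(v: int) -> int:
    """v / 2 with halves rounded away from zero (rounding rule assumed)."""
    return (v + 1) // 2 if v >= 0 else -((1 - v) // 2)

def luma_dc_transform(xd):
    """Y_D = (H X_D H) // 2 for the 4x4 array of Intra 16x16 luma DC coefficients."""
    return [[div2_round(v) for v in row] for row in matmul4(matmul4(H4, xd), H4)]

# A flat array of DC values compacts into a single coefficient.
print(luma_dc_transform([[8] * 4 for _ in range(4)])[0])  # [64, 0, 0, 0]
```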
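VC Algorithm : Transform • Chroma DC coefficients • After intra/inter prediction and the (4x4) IntDCT, the 2 x 2 DC coefficients X_D of each chroma component (4:2:0) are transformed with the Walsh-Hadamard transform:

$$Y_D = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} X_D \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

[Figure: 4:2:0 chroma blocks of one MB; block 16 holds the 2x2 DC coefficients of U and block 17 those of V; blocks 18-21 hold the AC coefficients of U and blocks 22-25 those of V] • For the 4:2:2 and 4:4:4 chroma formats the Hadamard block size is increased.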
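The 2x2 chroma DC transform needs only additions and subtractions, as this sketch (illustrative name) shows by expanding the two matrix products:

```python
def chroma_dc_transform(xd):
    """Y_D = H2 X_D H2 with H2 = [[1, 1], [1, -1]] (2x2 chroma DC, 4:2:0)."""
    (a, b), (c, d) = xd
    return [[a + b + c + d, a - b + c - d],
            [a + b - c - d, a - b - c + d]]

print(chroma_dc_transform([[1, 2], [3, 4]]))  # [[10, -2], [-4, 0]]
```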
VC Algorithm : Transform • Block diagram emphasizing the transform [Figure: H.264 encoder block diagram with the transform & quantization and inverse quantization & inverse transform stages highlighted] • 4 x 4 integer DCT transform:

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}$$

• Hadamard transform of DC coefficients for 16 x 16 intra luma and 8 x 8 chroma blocks
VC Algorithm : Quantization • The multiplication needed to complete the exact transform is combined with the multiplication of scalar quantization • Encoder: post-scaling and quantization • Decoder: inverse quantization and pre-scaling

$$Y = \operatorname{round}\!\left(X \cdot \frac{SF}{Q_{step}}\right)$$

X: quantizer input • Y: quantizer output • Qstep: quantizer step size, selected by the quantization parameter QP, which takes a total of 52 values for 8 bits per decoded sample; Qstep doubles for every increment of 6 in QP. FRExt extends the QP range beyond 52 by 6 for each additional bit of decoded sample • SF: scaling term
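A floating-point sketch of the step-size rule and the quantizer equation, assuming 8-bit video. The base step sizes for QP 0 to 5 are the commonly tabulated values, the rounding of halves away from zero is an assumption, and practical encoders replace the division with integer multiplication and shifts:

```python
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]  # Qstep for QP 0..5

def qstep(qp: int) -> float:
    """Quantizer step size; it doubles for every increment of 6 in QP."""
    if not 0 <= qp <= 51:
        raise ValueError("8-bit video uses QP 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))

def quantize(x: float, sf: float, qp: int) -> int:
    """Y = round(X * SF / Qstep), rounding halves away from zero."""
    v = x * sf / qstep(qp)
    return int(abs(v) + 0.5) * (1 if v >= 0 else -1)

print(qstep(4), qstep(10), qstep(51))  # 1.0 2.0 224.0
print(quantize(100, 1.0, 28))          # 6  (Qstep = 16, 100 / 16 = 6.25)
```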
VC Algorithm : Transform, Quantization, Rescale and Inverse Transform [Figure: encoder part: input block → forward transform → 2x2 or 4x4 DC transform (chroma or Intra 16x16 luma only) → post-scaling and quantization → encoder output / decoder input; decoder part: inverse quantization and pre-scaling, 2x2 or 4x4 DC inverse transform (chroma or Intra 16x16 luma only) and inverse transform → output block]
VC Algorithm : Entropy Coding • With CAVLC, all syntax elements other than the residual transform coefficients are encoded with Exp-Golomb codes (UVLC) • Scan order for reading the residual data (quantized transform coefficients): zig-zag or alternate scan • Context-based Adaptive Variable Length Coding (CAVLC) in all profiles • Context-based Adaptive Binary Arithmetic Coding (CABAC) in the Main Profile [Figure: zig-zag scan and alternate scan of a 4x4 block]
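A sketch that generates the 4x4 zig-zag order by walking the anti-diagonals and uses it to serialize a block of quantized coefficients (the alternate scan is a fixed table in the standard and is not reproduced here; names are illustrative):

```python
def zigzag_order(n: int = 4) -> list[int]:
    """Raster indices (row * n + col) of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col (one anti-diagonal)
        cols = list(range(max(0, s - n + 1), min(s, n - 1) + 1))
        if s % 2 == 1:                         # odd diagonals run down and to the left
            cols.reverse()
        order.extend((s - c) * n + c for c in cols)
    return order

def scan(block: list[list[int]], order: list[int]) -> list[int]:
    """Serialize a block of quantized coefficients in the given scan order."""
    flat = [v for row in block for v in row]
    return [flat[i] for i in order]

print(zigzag_order())  # [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]
coeffs = [[7, 3, 0, 0], [-2, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(scan(coeffs, zigzag_order()))  # nonzero values first, then a run of zeros
```

Grouping the low-frequency coefficients first leaves a long run of trailing zeros, which is what the entropy coder exploits.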
Exponential Golomb codes (for data elements other than transform coefficients; these codes are fixed rather than adaptive, and are also called Universal Variable Length Codes (UVLC))
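A sketch of the unsigned Exp-Golomb code ue(v) and the signed mapping used for se(v), with bit strings standing in for a real bit writer and reader (names are illustrative):

```python
def ue_encode(k: int) -> str:
    """Unsigned Exp-Golomb codeword ue(v) for codeNum k >= 0, as a bit string."""
    info = bin(k + 1)[2:]                # k + 1 in binary, always starts with '1'
    return "0" * (len(info) - 1) + info  # M leading zeros, then M + 1 bits

def ue_decode(bits: str) -> tuple[int, str]:
    """Decode one ue(v) codeword; return (codeNum, remaining bits)."""
    m = bits.index("1")                  # count the leading zeros
    return int(bits[m:2 * m + 1], 2) - 1, bits[2 * m + 1:]

def se_to_code_num(v: int) -> int:
    """Signed mapping used by se(v): 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * v - 1 if v > 0 else -2 * v

print([ue_encode(k) for k in range(5)])  # ['1', '010', '011', '00100', '00101']
print(ue_decode("00100" + "011"))        # (3, '011')
```

Because the codeword table is fixed, small values such as 0 (1 bit) and 1 or 2 (3 bits) are cheap, which suits header fields whose values are usually small.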