Introduction to H.26L (TML-8) • ITU Telecommunication Standardization Sector (ITU-T), Study Group 16, Video Coding Experts Group (VCEG) • http://kbs.cs.tu-berlin.de/~stewe/vceg/archive.htm#TML8 • 卓傳育
Progression of H.26x • H.261: ISDN, p x 64 kbps • H.263: PSTN, very low bit-rate video (< 64 kbps), four optional modes • H.263 v2 (H.263+): #11 extension of H.263, 12 optional modes • H.263++ (v3): backward compatible with H.263+ • H.26L: under development, not necessarily backward compatible with H.263+
H.26L (Long-Term) • 1997/12 Started • 1999/8 TML 1 (Berlin) • 1999/10 TML 2 (Red Bank) • 2000/2 TML 3 (Geneva) • 2000/5 TML 4 (Osaka) • 2000/8 TML 5 (Portland), 5.9, 5.91 • 2001/1 TML 6 (Eibsee) • 2001/7 TML 8 (post-Austin)
The H.26L function set • High compression performance: • 50% greater bit-rate savings relative to H.263 at all bit rates. • Simplification, a “back to basics” approach: • simple and straightforward design using well-known building blocks. • Flexible application to the delay constraints of a variety of services: • low delay. • Error resilience. • Complexity scalability in encoder and decoder: • scalability between image quality and amount of encoder processing. • Full specification of decoding (no mismatch). • High-quality applications: • good quality also at high bit rates. • Network friendliness.
TML-8 changes from TML-6 • Run coding of coded MBs M29, M57 • 1/8 pixel prediction accuracy M45 • DQUANT on MB level M31 • Vectors pointing outside the picture M34 • Dropping RDquant M72 • Drop of isolated chroma AC coefficients M32 • New sections 7 and 8 for data partitioning and NAL (input not yet received) M52 • Some bug fixes • New section for entropy coding, including the description of UVLC and CABAC (D. Marpe) • Section “Motion Estimation and Mode Decision” with a new high-complexity mode (H. Schwarz)
Some of the differences from H.263 • Only one regular VLC is used for symbol coding • 1/4 pixel positions are used for motion prediction • A number of different block sizes are used for motion prediction • Residual coding is based on 4x4 blocks, and an integer transform is used • Multiple reference frames may be used for prediction, and this is considered to replace any use of B-frames
Typical Video Coder • Uniform quantizer: 32 step sizes, increasing nonlinearly by a factor of about 1.12 per step • 4x4 integer DCT (fixed) • Single universal VLC • Loop filter • No mismatch • Variable block sizes • Five prediction modes with B-pictures • Intra prediction modes: 6 (4x4) + 4 (16x16) = 10 modes
Subdivision of a picture into macroblocks: a QCIF image is 11 macroblocks wide and 9 macroblocks high.
Transform and inverse transform • 4x4 block size • Instead of the DCT, an integer transform with basically the same coding properties as a 4x4 DCT is used.
Transform:
A = 13a + 13b + 13c + 13d
B = 17a + 7b - 7c - 17d
C = 13a - 13b - 13c + 13d
D = 7a - 17b + 17c - 7d
Inverse transform:
a' = 13A + 17B + 13C + 7D
b' = 13A + 7B - 13C - 17D
c' = 13A - 7B - 13C + 17D
d' = 13A - 17B + 13C - 7D
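As a quick check of the equations above, the following Python sketch applies the forward transform to the rows and then the columns of a 4x4 block, and inverts it the same way. The function names are illustrative, not taken from TML-8. Because the four basis vectors are orthogonal and each has squared length 4 x 13^2 = 17^2 + 7^2 + 7^2 + 17^2 = 676, a forward/inverse round trip multiplies every sample by 676 per dimension, which is why a separate normalization step is needed.

```python
# Rows of the 4x4 integer transform: A, B, C, D as combinations of a, b, c, d.
T = [
    [13, 13, 13, 13],
    [17, 7, -7, -17],
    [13, -13, -13, 13],
    [7, -17, 17, -7],
]


def forward_1d(x):
    """[a, b, c, d] -> [A, B, C, D]."""
    return [sum(T[k][n] * x[n] for n in range(4)) for k in range(4)]


def inverse_1d(X):
    """[A, B, C, D] -> [a', b', c', d'], i.e. multiplication by the transpose of T."""
    return [sum(T[k][n] * X[k] for k in range(4)) for n in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_2d(block):
    """Transform every row of a 4x4 block, then every column."""
    rows = [forward_1d(r) for r in block]
    return transpose([forward_1d(c) for c in transpose(rows)])


def inverse_2d(coeffs):
    """Inverse-transform every column, then every row."""
    cols = transpose([inverse_1d(c) for c in transpose(coeffs)])
    return [inverse_1d(r) for r in cols]


if __name__ == "__main__":
    x = [3, -1, 4, 1]
    print(inverse_1d(forward_1d(x)))  # each input sample multiplied by 676
```

All arithmetic stays in integers, so the 676 (1-D) and 676 x 676 (2-D) gains come out exactly.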
2x2 transform/inverse transform of chroma DC coefficients
DC0 DC1   ->   DCC(0,0) DCC(1,0)
DC2 DC3        DCC(0,1) DCC(1,1)
Definition of the two-dimensional 2x2 transform:
DCC(0,0) = (DC0+DC1+DC2+DC3)/2
DCC(1,0) = (DC0-DC1+DC2-DC3)/2
DCC(0,1) = (DC0+DC1-DC2-DC3)/2
DCC(1,1) = (DC0-DC1-DC2+DC3)/2
Definition of the inverse transform:
DC0 = (DCC(0,0)+DCC(1,0)+DCC(0,1)+DCC(1,1))/2
DC1 = (DCC(0,0)-DCC(1,0)+DCC(0,1)-DCC(1,1))/2
DC2 = (DCC(0,0)+DCC(1,0)-DCC(0,1)-DCC(1,1))/2
DC3 = (DCC(0,0)-DCC(1,0)-DCC(0,1)+DCC(1,1))/2
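The 2x2 transform is its own inverse: applying the same sum/difference pattern twice, dividing by 2 each time, returns the original DC values. A minimal Python sketch with hypothetical function names; it uses plain floating-point division (exact for these half-integer values) rather than any integer rounding an implementation might apply.

```python
def chroma_dc_forward(dc0, dc1, dc2, dc3):
    """Return (DCC(0,0), DCC(1,0), DCC(0,1), DCC(1,1)) for the layout DC0 DC1 / DC2 DC3."""
    return (
        (dc0 + dc1 + dc2 + dc3) / 2,
        (dc0 - dc1 + dc2 - dc3) / 2,
        (dc0 + dc1 - dc2 - dc3) / 2,
        (dc0 - dc1 - dc2 + dc3) / 2,
    )


def chroma_dc_inverse(c00, c10, c01, c11):
    """Return (DC0, DC1, DC2, DC3); the same butterfly as the forward transform."""
    return (
        (c00 + c10 + c01 + c11) / 2,
        (c00 - c10 + c01 - c11) / 2,
        (c00 + c10 - c01 - c11) / 2,
        (c00 - c10 - c01 + c11) / 2,
    )


if __name__ == "__main__":
    coeffs = chroma_dc_forward(10, 20, 30, 41)
    print(coeffs, chroma_dc_inverse(*coeffs))
```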
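Intra prediction mode (Intra_pred_mode)
I A B C D
E a b c d
F e f g h
G i j k l
H m n o p
[Figure: pixels a to p of the 4x4 block are predicted from the neighbouring pixels A to I; arrows mark the directions of modes 1 to 5] • Intra 4x4 • Imode, nc, AC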
Intra prediction mode (Intra_pred_mode) • Mode 0: DC prediction • Mode 1: • Mode 2: Vertical prediction • Mode 3: Diagonal prediction • Mode 4: Horizontal prediction • Mode 5:
Intra prediction mode (Intra_pred_mode) • Mode 0: DC prediction (default) • All pixels are predicted by (A+B+C+D+E+F+G+H)//8 • If four of these pixels are outside the picture, the average of the remaining four is used for prediction. • If all 8 pixels are outside the picture, the prediction for all pixels in the block is 128. • Always used for chroma blocks • Mode 2: Vertical prediction (if A, B, C, D are inside the picture) • a,e,i,m are predicted by A, b,f,j,n by B, etc. • Mode 4: Horizontal prediction (if E, F, G, H are inside the picture) • a,b,c,d are predicted by E, e,f,g,h by F, etc.
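A minimal Python sketch of modes 0, 2 and 4. It assumes the usual H.26x convention that // means division rounded to the nearest integer (implemented as (sum + 4) >> 3 for non-negative pixel sums), and represents neighbours that lie outside the picture as None. The result pred[y][x] follows the a to p layout (a = pred[0][0], d = pred[0][3], m = pred[3][0]); function names are hypothetical.

```python
def predict_dc(above, left):
    """Mode 0. above = [A, B, C, D] or None, left = [E, F, G, H] or None."""
    if above is not None and left is not None:
        value = (sum(above) + sum(left) + 4) >> 3   # (A+...+H)//8, rounded
    elif above is not None:
        value = (sum(above) + 2) >> 2               # average of the four available pixels
    elif left is not None:
        value = (sum(left) + 2) >> 2
    else:
        value = 128                                 # all eight pixels outside the picture
    return [[value] * 4 for _ in range(4)]


def predict_vertical(above):
    """Mode 2: column x copies above[x] (a, e, i, m <- A; b, f, j, n <- B; ...)."""
    return [list(above) for _ in range(4)]


def predict_horizontal(left):
    """Mode 4: row y copies left[y] (a, b, c, d <- E; e, f, g, h <- F; ...)."""
    return [[left[y]] * 4 for y in range(4)]
```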
Intra prediction mode (Intra_pred_mode) • To be used only if all of A, B, C, D, E, F, G, H, I are inside the picture: • Mode 3: Diagonal prediction • m is predicted by (H+2G+F)//4 • i,n are predicted by (G+2F+E)//4 • e,j,o are predicted by (F+2E+I)//4 • a,f,k,p are predicted by (E+2I+A)//4 • b,g,l are predicted by (I+2A+B)//4 • c,h are predicted by (A+2B+C)//4 • d is predicted by (B+2C+D)//4
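Every diagonal rule above has the form (P1 + 2*P2 + P3)//4 taken along the edge H, G, F, E, I, A, B, C, D, with the centre tap moving one position per diagonal (x - y constant). A Python sketch of that observation; the function name is hypothetical and // is again taken as rounded division, (s + 2) >> 2.

```python
def predict_diagonal(corner, above, left):
    """Mode 3. corner = I, above = [A, B, C, D], left = [E, F, G, H]."""
    # Neighbours from bottom-left to top-right: H G F E I A B C D.
    edge = list(reversed(left)) + [corner] + list(above)
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            c = 4 + x - y                 # centre tap: I for the main diagonal a, f, k, p
            pred[y][x] = (edge[c - 1] + 2 * edge[c] + edge[c + 1] + 2) >> 2
    return pred
```

For example, m = pred[3][0] uses edge indices 0, 1, 2, i.e. (H + 2G + F)//4, and d = pred[0][3] uses (B + 2C + D)//4.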
Intra prediction mode (Intra_pred_mode) • Mode 1 (to be used only if all of A, B, C, D are inside the picture) • a is predicted by (A+B)/2 • e is predicted by B • b,i are predicted by (B+C)/2 • f,m are predicted by C • c,j are predicted by (C+D)/2 • d,g,h,k,l,n,o,p are predicted by D • Mode 5 (to be used only if all of E, F, G, H are inside the picture) • a is predicted by (E+F)/2 • b is predicted by F • c,e are predicted by (F+G)/2 • f,d are predicted by G • i,g are predicted by (G+H)/2 • h,j,k,l,m,n,o,p are predicted by H
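Both modes follow one pattern. Along an index k (k = 2x + y for mode 1, k = x + 2y for mode 5), even k gives the average of two neighbouring edge pixels, odd k copies an edge pixel, and k >= 5 copies the last edge pixel. The sketch below is a hypothetical Python rendering of that pattern; it reads "/2" as truncating integer division, since no rounding offset is written in the rules.

```python
def _pattern_value(p, k):
    """Prediction at pattern index k for the edge p = [P0, P1, P2, P3]."""
    if k >= 5:
        return p[3]
    t, odd = divmod(k, 2)
    if odd:
        return p[t + 1]
    return (p[t] + p[t + 1]) // 2      # "/2" read as truncating division (assumption)


def predict_mode1(above):
    """Mode 1 from above = [A, B, C, D]; pred[y][x] with k = 2*x + y."""
    return [[_pattern_value(above, 2 * x + y) for x in range(4)] for y in range(4)]


def predict_mode5(left):
    """Mode 5 from left = [E, F, G, H]; pred[y][x] with k = x + 2*y."""
    return [[_pattern_value(left, x + 2 * y) for x in range(4)] for y in range(4)]
```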
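Coding of Intra prediction modes
[Figure: 4x4 blocks of a macroblock labelled
0 0 2 2
1 1 3 3
4 4 6 6
5 5 7 7
and a current block C with neighbouring blocks A and B]
Rows: mode of B; columns: mode of A; "outside" = block outside the picture.
B\A      | outside   | 0         | 1         | 2         | 3         | 4
outside  | 0 - - - - | 0 1 - - - | 1 0 - - - | - - - - - | - - - - - | - - - - -
0        | 0 2 - - - | 0 2 1 3 4 | 1 0 2 3 4 | 0 2 1 3 4 | 3 0 1 2 4 | 0 1 2 4 3
1        | - - - - - | 0 1 2 3 4 | 1 0 2 3 4 | 0 2 1 3 4 | 1 0 3 2 4 | 1 0 2 4 3
2        | 2 0 - - - | 2 0 1 3 4 | 1 2 0 3 4 | 2 0 3 1 4 | 2 3 0 1 4 | 2 0 4 1 3
3        | - - - - - | 0 3 2 1 4 | 1 3 0 2 4 | 2 0 3 1 4 | 3 0 2 1 4 | 0 1 3 4 2
4        | - - - - - | 0 2 4 3 1 | 0 1 2 4 3 | 0 2 3 4 1 | 0 1 2 3 4 | 0 2 4 1 3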
Prediction of chroma blocks (the four 4x4 blocks of a chroma macroblock are arranged A B / C D; S0 and S1 lie above A and B, S2 and S3 to the left of A and C) • If S0, S1, S2, S3 are all inside the frame: A = (S0 + S2 + 4)/8, B = (S1 + 2)/4, C = (S3 + 2)/4, D = (S1 + S3 + 4)/8 • If only S0 and S1 are inside the frame: A = (S0 + 2)/4, B = (S1 + 2)/4, C = (S0 + 2)/4, D = (S1 + 2)/4 • If only S2 and S3 are inside the frame: A = (S2 + 2)/4, B = (S2 + 2)/4, C = (S3 + 2)/4, D = (S3 + 2)/4 • If S0, S1, S2, S3 are all outside the frame: A = B = C = D = 128
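A Python sketch of the chroma DC prediction. It assumes, from the form of the formulas, that S0 and S1 are the sums of the four pixels above blocks A and B, and S2 and S3 the sums of the four pixels to the left of blocks A and C. The function name and argument layout are hypothetical.

```python
def predict_chroma_dc(above, left):
    """above: the 8 pixels above the 8x8 chroma block, or None if outside the frame.
    left: the 8 pixels to its left, or None. Returns the DC predictions (A, B, C, D)."""
    if above is not None:
        s0, s1 = sum(above[:4]), sum(above[4:])   # assumed: S0 over block A, S1 over block B
    if left is not None:
        s2, s3 = sum(left[:4]), sum(left[4:])     # assumed: S2 left of block A, S3 left of block C
    if above is not None and left is not None:
        return ((s0 + s2 + 4) // 8, (s1 + 2) // 4, (s3 + 2) // 4, (s1 + s3 + 4) // 8)
    if above is not None:
        return ((s0 + 2) // 4, (s1 + 2) // 4, (s0 + 2) // 4, (s1 + 2) // 4)
    if left is not None:
        return ((s2 + 2) // 4, (s2 + 2) // 4, (s3 + 2) // 4, (s3 + 2) // 4)
    return (128, 128, 128, 128)
```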
Intra mode based on 16x16 macroblocks (16x16 intra mode) • Particularly suitable for regions with little detail, also referred to as ‘flat’ regions. • Prediction modes • IMODE = 0 (vertical) • Pred(i,j) = P(i,-1), i,j=0..15 • IMODE = 1 (horizontal) • Pred(i,j) = P(-1,j), i,j=0..15 • IMODE = 2 (DC prediction) • Pred(i,j) = i,j=0..15 • IMODE = 3 (plane prediction) • Pred(i,j) = (a + b*(i-7) + c*(j-7) + 16)/32, where: a = 16*(P(-1,15) + P(15,-1)), b = 5*(H/4)/16, c = 5*(V/4)/16
Residual coding • Based on the 4x4 transform. • Only a single scan is used for 16x16 intra coding. • Normalization factor: a' = 676a • To avoid the division, normalization is performed by 49/2^15 on the encoder side and 48/2^15 on the decoder side.
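Taking the normalization factors as 49/2^15 (encoder) and 48/2^15 (decoder), their product nearly cancels the gain of 676 x 676 introduced by a two-dimensional forward and inverse pass of the 4x4 transform. The arithmetic as a short Python check (variable names are illustrative):

```python
# Gain of a 2-D forward + inverse pass of the 4x4 integer transform:
# 676 per dimension (4 * 13^2 = 17^2 + 7^2 + 7^2 + 17^2 = 676).
ROUND_TRIP_GAIN = 676 * 676

# Normalization factors, read as 49/2^15 (encoder side) and 48/2^15 (decoder side).
ENC_SCALE = 49 / 2**15
DEC_SCALE = 48 / 2**15


def relative_error():
    """Deviation of the combined normalization from exactly undoing ROUND_TRIP_GAIN."""
    return ENC_SCALE * DEC_SCALE * ROUND_TRIP_GAIN - 1.0


if __name__ == "__main__":
    print(f"relative error: {relative_error():.4%}")
```

The combined factor 49 x 48 / 2^30 exceeds 1/676^2 by about 0.1%, and both multiplications can be done with an integer multiply and a 15-bit shift.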
Ordering of blocks for CBPY and residual coding of 4x4 blocks
CBPY 8x8 block order:
0 1
2 3
Luma residual coding 4x4 block order:
0 1 4 5
2 3 6 7
8 9 12 13
10 11 14 15
Chroma residual coding 4x4 block order (U, V):
2x2 DC: U 16, V 17
AC: U 18 19 / 20 21, V 22 23 / 24 25
Signalling of mode information for 16x16 intra coding • Three parameters have to be signaled. They are all included in MB-type. • Imode: 0,1,2,3 • AC: • 0 means there are no ac coefficients in the 16x16 block. • 1 means that there is at least one ac coefficient and all 16 blocks are scanned. • nc: CBP for chroma
Reference frame (Ref_frame)
Code_number | Reference frame
0 | The last decoded frame (1 frame back)
1 | 2 frames back
2 | 3 frames back
.. | ..
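Numbering of the vectors for the different blocks depending on the inter mode ("/" separates rows):
• Mode 1 (16x16): 0
• Mode 2 (16x8): 0 / 1
• Mode 3 (8x16): 0 1
• Mode 4 (8x8): 0 1 / 2 3
• Mode 5 (8x4): 0 1 / 2 3 / 4 5 / 6 7
• Mode 6 (4x8): 0 1 2 3 / 4 5 6 7
• Mode 7 (4x4): 0 1 2 3 / 4 5 6 7 / 8 9 10 11 / 12 13 14 15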
Fractional pixel accuracy (1:1, 1:2, 1:4) • Interpolation: 6-tap filtering horizontally (6H) and vertically (6V), then bilinear • Step I: generation of ½ pixel positions with the 6-tap filter (1, -5, 20, 20, -5, 1)/32 • Step II: generation of ¼ pixel positions by linear interpolation [Figure: grids of integer, ½ and ¼ pixel positions]
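A Python sketch of Step I and Step II in one dimension. Only the filter taps come from the slide; the rounding offset (+16 before dividing by 32), clipping to 0..255, repetition of the edge pixel outside the row, and rounding the linear average upward are assumptions for illustration.

```python
TAPS = (1, -5, 20, 20, -5, 1)   # 6-tap filter, normalized by 32


def clip8(v):
    return max(0, min(255, v))


def half_pel(row, i):
    """½ pixel value between row[i] and row[i + 1] (taps on row[i - 2] .. row[i + 3])."""
    n = len(row)
    s = sum(t * row[min(max(i - 2 + k, 0), n - 1)] for k, t in enumerate(TAPS))
    return clip8((s + 16) >> 5)


def quarter_pel(row, i):
    """¼ pixel value between row[i] and the ½ position to its right, by linear interpolation."""
    return (row[i] + half_pel(row, i) + 1) >> 1
```

On a step edge such as [0, 0, 0, 255, 255, 255], the half-pixel value at the edge is 128 and the quarter-pixel value next to the dark side is 64.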
Fractional pixel accuracy • Interpolation position with more low-pass filtering
A a 1 b B
c d e f g
2 h 3 i 4
j k l m n
C o 5 p D
• Interpolating m as (‘3’ + ‘4’ + ‘5’ + ‘D’)/4 is avoided because of possible rounding effects. Instead we define: m = (‘A’ + ‘B’ + ‘C’ + ‘D’ + 2)/4
1/8 pixel accuracy • For a higher-complexity or higher-coding-efficiency profile (not yet defined)
Position | Filter
Integer | 1
1/8 | (-3 12 -37 485 71 -21 6 -1)/512
2/8 | (-3 12 -37 229 71 -21 6 -1)/256
3/8 | (-6 24 -76 387 229 -60 18 -4)/512
4/8 | (-3 12 -39 158 158 -39 12 -3)/256
5/8 | (-4 18 -60 229 387 -76 24 -6)/512
6/8 | (-1 6 -21 71 229 -37 12 -3)/256
7/8 | (-1 6 -21 71 485 -37 12 -3)/512
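The filter table can be checked and applied with a short Python sketch. Each filter's taps sum to its divisor, so a flat area is reproduced exactly. The tap placement (pixels i-3 to i+4 around the interval between row[i] and row[i+1]), rounding to nearest, edge repetition and clipping are assumptions, not taken from TML-8.

```python
# (taps, divisor) for each 1/8 pixel position, copied from the table.
FILTERS = {
    1: ((-3, 12, -37, 485, 71, -21, 6, -1), 512),
    2: ((-3, 12, -37, 229, 71, -21, 6, -1), 256),
    3: ((-6, 24, -76, 387, 229, -60, 18, -4), 512),
    4: ((-3, 12, -39, 158, 158, -39, 12, -3), 256),
    5: ((-4, 18, -60, 229, 387, -76, 24, -6), 512),
    6: ((-1, 6, -21, 71, 229, -37, 12, -3), 256),
    7: ((-1, 6, -21, 71, 485, -37, 12, -3), 512),
}


def eighth_pel(row, i, pos):
    """Sample at i + pos/8 (pos = 1..7), between row[i] and row[i + 1].

    Assumed: taps cover row[i - 3] .. row[i + 4], pixels beyond the ends repeat
    the edge pixel, rounding to nearest, clipping to 0..255.
    """
    taps, div = FILTERS[pos]
    n = len(row)
    s = sum(t * row[min(max(i - 3 + k, 0), n - 1)] for k, t in enumerate(taps))
    return max(0, min(255, (s + div // 2) // div))
```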
Prediction of vector components (E: current block; D, B, C: its upper-left, upper and upper-right neighbours; A: its left neighbour) • The prediction is normally formed as the median of A, B and C. • If A and D are outside the picture, their values are assumed to be zero. • If D, B, C are outside the picture, the prediction is equal to A. • If C is outside the picture or not yet available due to the order of vector data (see Figure 2), C is replaced by D.
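A Python sketch of the median rule, with vectors as (x, y) tuples and None for a neighbour that is outside the picture or not yet available. The order in which the special cases are applied, and treating a missing B as zero, are assumptions; the rules above do not fix them. Function names are hypothetical.

```python
ZERO = (0, 0)


def median3(a, b, c):
    return max(min(a, b), min(max(a, b), c))


def predict_mv(A, B, C, D):
    """Median prediction from neighbours A (left), B (up), C (up-right), D (up-left)."""
    # D, B and C all outside: the prediction is A.
    if B is None and C is None and D is None:
        return A if A is not None else ZERO
    # A and D outside the picture count as zero vectors.
    if A is None:
        A = ZERO
    if D is None:
        D = ZERO
    # C outside or not yet available: replaced by D.
    if C is None:
        C = D
    if B is None:
        B = ZERO                          # assumption, not covered by the rules
    return (median3(A[0], B[0], C[0]), median3(A[1], B[1], C[1]))
```

The median is taken separately for the horizontal and vertical components, so the predicted vector need not equal any single neighbour.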
Directional segmentation prediction (block sizes 8x16, 16x8, 8x4 and 4x8; neighbours A, B, C, D of the current block E) • Vector block size 8x16: • Left block: A is used as prediction • Right block: C is used as prediction • Vector block size 16x8: • Upper block: B is used as prediction • Lower block: A is used as prediction • Vector block size 8x4: • For white blocks: "Median prediction" is used • For shaded blocks: A is used as prediction • Vector block size 4x8: • For white blocks: "Median prediction" is used • For shaded blocks: B is used as prediction
Chroma vectors • Chroma has half the resolution of luma • Chroma_vector = Luma_vector/2 with truncation, which means that the chroma vectors have a resolution of 1/8 pixel. • For fractional-pixel interpolation of the chroma prediction, bilinear interpolation is used.
Coded Block Pattern (CBP) • The CBP indicates which 8x8 blocks, luma and chroma, contain transform coefficients. • For chroma, 3 possibilities are defined: • nc=0: no chroma coefficients at all. • nc=1: there are nonzero 2x2 transform coefficients and all chroma AC coefficients are 0, so no EOB is sent for chroma AC coefficients. • nc=2: there may be nonzero 2x2 coefficients, and at least one nonzero chroma AC coefficient is present. In this case 10 EOBs (2 for the DC coefficients and 2x4=8 for the 8 4x4 blocks) must be sent for chroma in a macroblock. • The total CBP for a macroblock is • CBP = CBPY + 16xnc
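A Python sketch of assembling the macroblock CBP. It assumes bit i of CBPY flags the 8x8 luma block with index i in the 0 1 / 2 3 block order; the function name is hypothetical.

```python
def coded_block_pattern(luma_coded, nc):
    """luma_coded: four booleans for the 8x8 luma blocks 0..3; nc: 0, 1 or 2.

    Returns CBP = CBPY + 16 * nc, with bit i of CBPY set when luma block i
    has coefficients (bit order assumed).
    """
    if nc not in (0, 1, 2):
        raise ValueError("nc must be 0, 1 or 2")
    cbpy = sum(1 << i for i, coded in enumerate(luma_coded) if coded)
    return cbpy + 16 * nc
```

With four luma bits and three chroma cases, the CBP takes values 0 to 47.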
Dquant • Dquant makes it possible to change QUANT at the macroblock level. Dquant is present for non-skipped macroblocks: - if CBP indicates that there are nonzero transform coefficients in the MB, or - if the MB is 16x16-based intra coded • The value of Dquant shall be interpreted in the same way as Motion Vector Data. Its value may range from -16 to +16, which enables QP to be changed to any value in the range 0-31. • QUANTnew = modulo32(QUANTold + Dquant + 32) (also known as "arithmetic wrap")
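The wrap-around update in Python (the function name is hypothetical). Because Dquant covers 33 values in -16..+16, every target QP in 0..31 can be reached from any current QP in a single step.

```python
def update_quant(quant_old, dquant):
    """QUANTnew = modulo32(QUANTold + Dquant + 32)."""
    if not -16 <= dquant <= 16:
        raise ValueError("Dquant must lie in -16..16")
    return (quant_old + dquant + 32) % 32
```

For example, update_quant(30, 5) wraps to 3 and update_quant(2, -5) wraps to 29.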
Scanning and quantization
• Simple scan:
0 1 5 6
2 4 7 12
3 8 11 13
9 10 14 15
• Double scan:
0 1 2 5
0 2 3 6
1 3 4 7
4 5 6 7
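A Python sketch of the simple scan, assuming the first grid gives, for each coefficient position read row by row, its index in the scan order. The double scan is not sketched, because the numbers shown do not say which positions belong to which of its two 8-coefficient passes. Names are illustrative.

```python
# SCAN_POS[y][x] = index of coefficient (x, y) in the simple scan (assumed row-by-row reading).
SCAN_POS = [
    [0, 1, 5, 6],
    [2, 4, 7, 12],
    [3, 8, 11, 13],
    [9, 10, 14, 15],
]


def simple_scan(block):
    """Reorder a 4x4 block (block[y][x]) into a list of 16 coefficients in scan order."""
    out = [None] * 16
    for y in range(4):
        for x in range(4):
            out[SCAN_POS[y][x]] = block[y][x]
    return out
```

Labelling each coefficient by its raster index 4y + x and scanning it gives 0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15, the familiar zig-zag order.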
Quantization • 32 different QP values are used. • The step size increases by about 12% from one QP to the next. • No dead zone. • An increase of QP by 6 means that the step size is about doubled.
QPluma    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
QPchroma  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
QPluma   16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
QPchroma 16 17 17 18 19 20 20 21 22 22 23 23 24 24 25 25
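The QP mapping and the step-size rule in Python (names are illustrative). With a ratio of 1.12 per QP step, six steps give 1.12^6 ≈ 1.97, which is the "about doubled" statement.

```python
# QPchroma for QPluma = 0..31, copied from the table.
QP_CHROMA = list(range(18)) + [17, 18, 19, 20, 20, 21, 22, 22, 23, 23, 24, 24, 25, 25]


def qp_chroma(qp_luma):
    if not 0 <= qp_luma <= 31:
        raise ValueError("QP must be in 0..31")
    return QP_CHROMA[qp_luma]


def step_size_ratio(delta_qp, per_step=1.12):
    """Approximate ratio between quantizer step sizes delta_qp QP values apart."""
    return per_step ** delta_qp
```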
Summary • Uniform quantizer: 32 step sizes, increasing nonlinearly by a factor of about 1.12 per step • 4x4 integer DCT (fixed) • Single universal VLC • Loop filter • No mismatch • Variable block sizes • Five prediction modes with B-pictures • Intra prediction modes: 6 (4x4) + 4 (16x16) = 10 modes
Entropy Coding • Universal Variable Length Coding (UVLC) • Context-based Adaptive Binary Arithmetic Coding (CABAC)
Universal Variable Length Coding (UVLC) • Exponential Golomb codes • Codewords have the following form:
1
0 x0 1
0 x1 0 x0 1
0 x2 0 x1 0 x0 1
0 x3 0 x2 0 x1 0 x0 1
...
A codeword has a length of L bits and carries INFO = xn ... x1 x0; its code number is 2^(L/2) + INFO - 1, with L/2 rounded down.
Code number | Codeword in explicit form
0 | 1
1 | 001
2 | 011
3 | 00001
4 | 00011
5 | 01001
6 | 01011
7 | 0000001
8 | 0000011
9 | 0001001
10 | 0001011
11 | 0100001
... | ...
• It is used to code all syntax elements (MB_Type, Intra_pred_mode, etc.).
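An encoder and decoder for these codewords in Python, written from the codeword pattern (function names are hypothetical). A code number n is written with m = floor(log2(n + 1)) INFO bits, each preceded by a 0, followed by a terminating 1, so L = 2m + 1 and n = 2^m - 1 + INFO.

```python
def uvlc_encode(code_number):
    """Return the codeword 0 x_(m-1) ... 0 x_0 1 as a string of '0'/'1'."""
    if code_number < 0:
        raise ValueError("code number must be non-negative")
    m = (code_number + 1).bit_length() - 1      # number of INFO bits
    info = code_number + 1 - (1 << m)
    bits = []
    for i in range(m - 1, -1, -1):              # most significant INFO bit first
        bits.append("0")
        bits.append(str((info >> i) & 1))
    bits.append("1")
    return "".join(bits)


def uvlc_decode(bits, pos=0):
    """Decode one codeword starting at bits[pos]; returns (code_number, next_pos)."""
    info, m = 0, 0
    while bits[pos] == "0":                     # each '0' is followed by one INFO bit
        info = (info << 1) | int(bits[pos + 1])
        pos += 2
        m += 1
    return (1 << m) - 1 + info, pos + 1


if __name__ == "__main__":
    stream = "".join(uvlc_encode(n) for n in [5, 0, 11])
    print(stream)
```

Since every codeword ends in a 1 at an odd position, the decoder needs no length field to find codeword boundaries in a concatenated stream.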
Context-based Adaptive Binary Arithmetic Coding (CABAC) • Context modeling provides estimates of conditional probabilities of the coding symbols. • Arithmetic codes permit a non-integer number of bits to be assigned to each symbol of the alphabet. • Adaptive arithmetic codes permit the entropy coder to adapt itself to non-stationary symbol statistics.
Context Models for Macroblock Type (C: current macroblock; A: left neighbour; B: upper neighbour) • Intra pictures • Intra4x4 and Intra16x16 • ctx_mb_type_intra(C) = A + 2*B • P- and B-pictures • 10 different macroblock types for P-frames and 18 different macroblock types for B-frames
Other Context Models • Context Models for Motion Vector Data • Context Models for Reference Frame Parameter • Context Models for Coded Block Pattern • Context Models for Intra Prediction Mode • Context Models for Run/Level
Test model issues • Motion Estimation and Mode Decision • Quantization • Elimination of single coefficients in inter macroblocks
Motion Estimation and Mode Decision • Low-complexity mode • Finding the optimum prediction mode [Figure: Prediction, Block_difference, Hadamard transform, SA(T)D, SA(T)D0, SA(T)Dmin] • Intra mode decision: SA(T)D0 = QP0(QP) x Order_of_prediction_mode (see above) • Motion vector search: SA(T)D0 = QP0(QP) x (Bits_to_code_vector + 2 x code_number_of_ref_frame) • For selecting intra modes and for fractional pixel search, a 4-point Hadamard transform is used:
1  1  1  1
1  1 -1 -1
1 -1 -1  1
1 -1  1 -1
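A Python sketch of SA(T)D for a 4x4 block, applying the 4-point Hadamard matrix in both directions to the block difference and summing absolute values. No scaling of the result is applied, and adding SA(T)D0 is not reproduced; whether TML-8 normalizes the sum is not stated here. Names are illustrative.

```python
# 4-point Hadamard matrix from the slide.
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def hadamard_4x4(d):
    """Return H4 * d * H4^T for a 4x4 block d."""
    tmp = [[sum(H4[i][k] * d[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    return [[sum(tmp[i][k] * H4[j][k] for k in range(4)) for j in range(4)] for i in range(4)]


def satd_4x4(block, prediction):
    """Sum of absolute Hadamard-transformed differences (unnormalized)."""
    diff = [[b - p for b, p in zip(rb, rp)] for rb, rp in zip(block, prediction)]
    return sum(abs(v) for row in hadamard_4x4(diff) for v in row)
```

Unlike a plain SAD, this measure reflects how the residual would spread over transform coefficients: a constant difference of 1 concentrates into one coefficient, while a single-pixel difference of 1 spreads over all 16, yet both give 16 here.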
Motion Estimation and Mode Decision • Low-complexity mode • Encoding on macroblock level • Table for intra prediction modes to be used at the encoder side (intra coding) • Inter mode selection • 35 combinations of block sizes and reference frames (7 x 5) • Integer pixel search • Fractional pixel search • Decision between intra and inter
[Figure: 5x5 grid numbered 0 (centre) to 24:
15  9 11 13 16
17  3  1  4 18
19  5  0  6 20
21  7  2  8 22
23 10 12 14 24
and a diagram of positions labelled A to I, 1 to 8 and a to h]
Motion Estimation • High-complexity mode • For each block or macroblock the motion vector is determined by full search on integer-pixel positions followed by sub-pixel refinement. • Integer-pixel search • MC_range is used for all INTER-modes and reference frames. • The prediction vector of the 16x16 block is used as center of the spiral search for all INTER-modes. • The search range is not forced to contain the (0,0)-vector.
Motion Estimation • Fractional pixel search • Finding the best motion vector • Finding the best reference frame
Mode decision I-frame: P-frame: B-frame: For I, P For B
INTER 16x16 mode decision • INTER 4x4 mode decision