420 likes | 498 Views
Learn about the H.261 video compression standard developed by CCITT for videoconferencing and videotelephony, its encoder and decoder architecture, frame types, motion estimation limitations, and more.
E N D
H.261 Video CompressionOverview • Developed by CCITT (Consultative Committee for International Telephone and Telegraph) in 1988-1990 • Designed for videoconferencing, videotelephone applications over ISDN telephone lines. • Bit-rate is p x 64 Kb/sec, where p ranges from 1 to 30. • Frame types are CCIR 601 CIF (352 x 288) and QCIF (176 x 144) images with 4:2:0 subsampling.
Quantises the High Spatial Frequencies more coarsely than the low spatial frequencies Generic Encoder Architecture Takes difference between current image and prediction of current image (from previous image) Converts image from Spatial Data to Spatial Frequency Data Discrete Cosine Transform Quantizer Zig-Zag Scan & RLC Huffman Encoder Huffman Codes the Quantised, Spatial Frequencies + Image - Inverse Quantiser Zig-Zag scans & RLC the spatial frequencies Generates Prediction Image from Motion Compensated Previous Image (obtained using Motion Vectors and Previous Image Regenerates the next previous image from the Prediction Error Image and the Prediction Image of the Current Frame Encoded Image Stream Multiplexor Inverse DCT + + Combines the Huffman codes of the Motion Vectors and Prediction Errors into one stream Huffman Encoder Motion Compensator Image Store of Previous Frame Motion Estimator Estimates Motion between Current and Previous Images for Macroblocks as a motion Vector Huffman Codes Motion Vectors
Regenerates the next previous image from the Prediction Error Image and the Prediction Image of the Current Frame Generic Decoder Architecture Converts image from Spatial Frequency Data to Spatial Data De-Quantises the Spatial Frequencies Decodes the Huffman Encoded Prediction Errors Huffman Decoder Inverse Zig-Zag Scan & RLC Inverse Quantiser Inverse DCT + Image + Generates Prediction Image from Motion Compensated Previous Image (obtained using Motion Vectors and Previous Image Inverse Zig-Zag scans & RLC the spatial frequencies De- Multiplexor Encoded Image Stream Huffman Decoder Motion Compensator Splits up the incoming Stream into the Huffman codes of the Motion Vectors and Prediction Errors Image Store of Previous Frame Decodes the Huffman Encoded Motion Vectors
H.261 Video Compression: Limitations of Motion Estimation Algorithm • Object or Camera translation - The motion estimation algorithm only detects translation motion of either the object or the camera in x and y directions between successive images in video. This is a gross simplification of what processes may occur to instigate change between succeive images in video. • Object rotation – objects in the scene may rotate whichmakes objects that are viewed by a camera have a different aspect of the image. • Object change of shape – objects in a scene may change shape. An example of this may be clouds or human walking. • Camera rotation and tilt – The camera may rotate or tilt. This cannot be modelled as translation motion and thus will incur prediction error from a predictor that only uses translation. • Object Occlusion – When one object moves in front of an second object the the second object will have part of it occluded. When an object rotates then part of the object will go out of view or come into view due to occlusion. • Camera positive and negative zoom – When a camera zooms, this scales the image up or down and brings out or in other part of a scene. • Ambient lighting conditions – When the ambient lighting conditions change, due to a light being switched on/off or the sun going behind clouds, then the luminance of the image changes and can not be modelled with just translation. • Scene cuts – When there is a scene cut then the image completely changes with no relation between successive images • Object or camera absolute motion – Cameras or objects do not move to the nearest pixel. They might move to a fraction of a pixel.
H.261 Video Compression I Frames & P Frames • Frame Sequence • Two frame types: Intra-frames (I-frames) and Inter-frames (P-frames): • I-frame provides an accessing point, it uses basically JPEG. • P-frames use "pseudo-differences" from previous frame ("predicted"), so frames depend on each other.
H.261 Video CompressionHierarchical block structure • H.261 divides images into a hierarchical block structure of Group of Blocks (GOB), Macroblocks (MB) and Blocks. • e.g. for a CIF image
H.261 Video Compression Intra-frame Coding • Macroblocks are 16 x 16 pixel areas on Y plane of original image. • A macroblock usually consists of 4 Y blocks, 1 Cr block, and 1 Cb block. • Quantization is by constant value for all DCT coefficients (i.e., no quantization table as in JPEG).
H.261 Video Compression: Inter-frame (P-frame) Coding • Previous image is called reference image, the image to encode target image. • The difference image (not the target image itself) is encoded. • Need to use the decoded image as reference image, not the original. • "Mean Absolute Error" (MAE) or "Mean Squared Error" (MSE) = sum(E*E)/N to decide best block
H.261 Video Compression: Motion Vector Coding • Motion estimation is done on a macroblock basis. • Motion Vector Difference (MVD) are transmitted using Huffman codes. • MVD is the difference between the Motion Vector (MV) of the current macroblock and the MV of the proceeding macroblock. • MV consists of a pair of horizontal x and vertical y components in the two dimensional spatial domain. • example MVDx(n) = MVx(n) - MVx(n-1) MVDx(n) = 15 - (-10) = 25
H.261 Video Compression: Inter-frame (P-frame) Coding • "Control" -- controlling the bit-rate. If the transmission buffer is too full, then bit-rate will be reduced by changing the quantization factors. • "memory" -- used to store the reconstructed image (blocks) for the purpose of motion vector search for the next P-frame.
H.261 Video Compression: Methods for Motion Vector Searches • C(x + k, y + l) -- pixels in the macro block with upper left corner (x, y) in the Target frame. • R(x + i + k, y + j + l) -- pixels in the macro block with upper left corner (x + i, y + j) in the Reference frame. • Cost function is: • Where MAE stands for Mean Absolute Error. • Goal is to find a vector (u, v) such that MAE(u, v) is minimum.
H.261 Video Compression: Methods for Motion Vector Searches • Full Search Method • Sequentially search the whole [-p, p] region --> very slow
H.261 Video Compression: Methods for Motion Vector Searches • Two-Dimensional Logarithmic Search • Similar to binary search. MAE function is initially computed within a window of [-p/2, p/2] at nine locations as shown in the figure. • Repeat until the size of the search region is one pixel wide: • Find one of the nine locations that yields the minimum MAE • Form a new searching region with half of the previous size and centered at the location found in step 1.
H.261 Video Compression: Hierarchical Motion Estimation • Form several low resolution version of the target and reference pictures • Find the best match motion vector in the lowest resolution version. • Modify the motion vector level by level when going up
H.261 Video Compression: Syntax of Hierarchical Motion Estimation • Many macroblocks will be exact matches (or close enough). So send address of each block in image --> Addr • Sometimes no good match can be found, so send INTRA block --> Type • Will want to vary the quantization to fine tune compression, so send quantization value --> Quant • Motion vector --> vector • Some blocks in macroblock will match well, others match poorly. So send bitmask indicating which blocks are present (Coded Block Pattern, or CBP). • Send the blocks (4 Y, 1 Cr, 1 Cb) as in JPEG.
H.261 Video Compression: Hierarchical Motion Estimation • Need to delineate boundaries between pictures, so send Picture Start Code --> PSC • Need timestamp for picture (used later for audio synchronization), so send Temporal Reference --> TR • Is this a P-frame or an I-frame? Send Picture Type --> PType • Picture is divided into regions of 11 x 3 macroblocks called Groups of Blocks --> GOB • Might want to skip whole groups, so send Group Number (Grp #) • Might want to use one quantization value for whole group, so send Group Quantization Value --> GQuant • Overall, bitstream is designed so we can skip data whenever possible while still unambiguous.
H.263 Video CompressionOverview • H. 263 is a new improved standard for low bit-rate video, adopted in March 1996. As H. 261, it uses the transform coding for intra-frames and predictive coding for inter-frames. • Bit-rate is 8 Kb/sec and above. • Designed for low bit rate video transmission in mobile networks. • Frame types are CCIR 601 16CIF(1408 x 1152), 4CIF(704 x 576), CIF (352 x 288), QCIF (176 x 144) and SQCIF (128 x 96), images with 4:2:0 subsampling. • Advanced Options: • Half-pixel precision in motion compensation • Unrestricted motion vectors • Syntax-based arithmetic coding • Advanced prediction and PB-frames
H.263 Video CompressionHierarchical block structure • H.263 divides images into a hierarchical block structure of Group of Blocks (GOB), Macroblocks (MB) and Blocks. • e.g. for a CIF image
H.263 Video CompressionAdvanced Modes • Normal: Motion vector differences and DCT coefficients are Huffman coded, a Motion vector is coded per each macroblock, motion vectors are restricted to point within image and images are coded using Intra/Inter modes only. • Syntax-based Arithmetic: Difference Motion Vectors and DCT coefficients are encoded using Arithmetic codes. • Advanced Prediction: Four 8x8 Motion vectors per Macroblock are coded instead of one 16x16 Motion vector per Macroblock. • Unrestricted Motion Vectors: Motion vectors are allowed to point outside the image. • Prediction-Bidirectional: Two images are coded as one. Picture N+2 is forward predicted from picture N and picture N+1 is Bidirectionally coded from pictures N and N+2.
H.263 Video CompressionMacroblock Based Motion Vector Prediction • Motion vectors MVx and MVy are computed with one vector per Macroblock. • The predictor vector is the median predictor of the three candidate Motion Vectors (MV1, MV2, MV3). • MVD, the difference between MV and the predictor is Huffman encoded. • After motion compensation, blocks that have prediction errors above a threshold have their prediction errors intra coded using DCT.
H.263 Video CompressionBlock Based Motion Vector Prediction • Four motion vectors are computed per Macroblock. • Motion vectors MVx and MVy are computed per block. • The predictor vector is the weighted sum of the three prediction Motion Vectors (MV1, MV2, MV3). • MVD, the difference between MV and the predictor is Huffman encoded.
H.263 Video CompressionHalf Pixel Prediction by Bilinear Interpolation • Motion estimation is carried out on an integer pixel resolution. Once this has been carried out motion estimation can be extended to 1/2 pixel resolution using bilinear interpolation to interpolate the fractional pixels. • a = A • b = (A + B)/2 • c = (A + C)/2 • d = (A + B + C + D)/4
MPEG-1 Video CompressionOverview • "Moving Picture Coding Experts Group", established in 1988 to create standard for delivery of video and audio. It became s two-Channel Coding Standard (1992). • MPEG-1 Target: VHS quality on a CD-ROM (352 x 288 + CD audio @ 1.5 Mbits/sec) • Standard had three parts: Video, Audio, and System (control interleaving of streams)
MPEG-1 Video CompressionVideo Encoder • MPEG-1 specifies four picture types • I-pictures: blocks are coded using DCT only (based on JPEG & H.261) • P-pictures: blocks are forward motion compensated • B-pictures: blocks are forward and reverse motion compensated • D-pictures: only the DC coefficients of DCT are coded (for fast forward mode)
MPEG-1 Video Compression Motion Estimation Search • Larger gaps between I and P frames allowed, so expand motion vector search range. • To get better encoding, allow motion vectors to be specified to fraction of a pixel (1/2 pixels).
MPEG-1 Video Compression Need for B-frame • In prediction coding many macroblocks need information not in the reference frame. • Solution is to code bidirectionally e.g. B frames
MPEG-1 Video CompressionB-frame • B-frames search for macroblock in past and future frames. • Typical pattern is IBBPBBPBB IBBPBBPBB IBBPBBPBB • Actual pattern is up to encoder, and need not be regular. • Bidirectional Coding provides greater compression
MPEG-1 Video CompressionCoding B-frame • B frame macroblocks can specify two motion vectors (one to past and one to future), indicating result is to be averaged.
MPEG-1 Video CompressionSlices • Added notion of slice for synchronization after loss/corrupt data. Example: picture with 7 slices:
MPEG-1 Video CompressionSyntax • Bitstream syntax must allow random access, forward/backward play, etc.
MPEG-2 Video CompressionOverview • It is a Multi-Channel Coding Standard (1994,1997). It defines the multi-channel extension to MPEG-1 audio (MPEG-2 BC) • It defines an audio coding standard at lower sampling frequencies (16 kHz, 22.05 kHz, 24 kHz) than MPEG-1 (32 kHz, 44.1 kHz, 48 kHz) • It defines a higher quality multi-channel standard than achievable with MPEG-1 extensions (MPEG-2 Advanced Audio Coding, AAC) • Frame sizes as large as 16383 x 16383 • MPEG-3: Originally for HDTV (1920 x 1080), got folded into MPEG-2
MPEG-2 Video CompressionOverview • Unlike MPEG-1 which is a standard for storing and playing video on a single computer at low bit-rates, MPEG-2 is a standard for digital TV. It meets the requirements for HDTV, DVD (Digital Video/Versatile Disc) and existing TV (PAL, SECAM and NTSC).
MPEG-2 Video Compression:Field and Frame Motion Prediction • Support both field prediction and frame prediction. There are four modes. • Field Prediction: Predictions are made independently for each field. Predictions can be made between even fields and between odd fields or between even and odd fields. • Frame Prediction: Predictions are made from one or more previously decoded frames. • (16 x 8) motion compensation: Two motion vectors are used for each MB. The first is applied to the upper 16x8 region the second to the lower 16x8 region. • Dual-Prime Prediction: Predictions are made from two reference fields (previous odd and even) and are averaged to form the final field prediction (next odd or even).
MPEG-2 Video Compression:DCT Coding • After motion compensation, the prediction errors are coded using the DCT. • For frame prediction, the even (black) and odd (white) fields are structured for DCT coding: • For field prediction, (16x8) and Dual-Prime prediction, the even (black) and odd (white) fields are structured for DCT coding:
MPEG-2 Video Compression:Subsampling and Macroblock Organisation • Typically MPEG-2 uses 4:2:0 subsampling • It also allows 4:2:2 • and 4:4:4 chroma subsampling.
MPEG-2 Video Compression:Scaleability • Scalable Coding Extensions: (so the same set of signals works for both HDTV and standard TV) • SNR (quality) Scalability -- similar to JPEG DCT-based Progressive mode, adjusting the quantization steps of the DCT coefficients. • Spatial Scalability -- similar to hierarchical JPEG, multiple spatial resolutions. • Temporal Scalability -- different frame rates.
MPEG-2 Video Compression:Macroblock Quantisation • Non-linear macroblock quantization factor of AC coefficients. • If type 0, then there is a linear relation between quantiser code and step size. Suited for lower resolution video. • If type 1, then there is a non-linear relation between quantiser code and step size. It is finer for small data values and coarser for larger data values. Suited for higher resolution video.