Introduction on MPEG Video Coding Standards. Yung-Ching Chang ( 張永清) Visual Communication Laboratory, CS, NTHU. Lossy Coding of Still Image - JPEG. Uncompressed Bitrate for Video. Motion Compensated Predictive Coding. Video Compression Standards. CCITT H.261 ITU-T Study Group 15
Introduction on MPEG Video Coding Standards • Yung-Ching Chang (張永清) • Visual Communication Laboratory, CS, NTHU
Video Compression Standards • CCITT H.261 • ITU-T Study Group 15 • Videophone and video conferencing • 1988-1990: p x 64 kbps (p = 1 … 30) • ITU-T H.263 • PSTN and mobile networks: 10 to 24 kbps • 1994: H.263, H.263+, …
Video Compression Standards (cont’d) • MPEG-1 Video (ISO/IEC 11172-2) • 1.2 ~ 1.5Mbps • Video for digital storage media, CD-ROM • Sep 1990 • MPEG-2 Video (ISO/IEC 13818-2) • 2 ~ 30 Mbps • Digital broadcast TV, HDTV, Video services on network • Nov 1993 • MPEG-4 (ISO/IEC 14496) • An emerging coding standard • Universal access
MPEG-1 vs. H.261 (Conceptually) • H.261 • Shorter algorithm delay • Lower compression complexity • Lower memory requirement • Limited flexibility in bit rate control • MPEG-1 • Longer algorithm delay • Higher compression complexity • Higher memory requirement • More coding modes, which give greater bit rate flexibility
Algorithm Delay • H.261 • MPEG-1 • A B-picture cannot be coded until the next P- or I-picture has been coded
Compression Complexity • H.261 • MPEG-1
Memory Requirement • H.261 • MPEG-1
Bit Rate Flexibility • H.261 • MPEG-1 • The GOP structure and B-pictures offer more flexibility in the coding bit rate
MPEG-1 vs. H.261 (Technically) • MPEG-1 • Bi-directional motion compensation (B-picture) • Group of pictures (GOP) • Half-pel motion compensation • Visually weighted quantization • No picture size or bit rate constraints • Flexible slice structure instead of GOB
MPEG-1 Coding Hierarchy • Video sequence: divided into GOPs • Group of pictures (GOP): I B B P B B P B B P B B I B B P … • [Figure: video sequence divided into GOPs, with motion estimation between pictures]
MPEG-1 Coding Hierarchy (cont’d) • Picture: divided into slices (Slice 1 to Slice 13 in the figure) • Slice: divided into macroblocks of 16 × 16 pixels • Macroblock: made up of 8 × 8 blocks of Y, Cb and Cr
Some Coding Schemes • GOP • Random access • Prevents error propagation • B-picture • Pros: best prediction and compression; handles object occlusion and entrance into the scene; noise averaging • Cons: encoder delay, high complexity, large encoder buffer required • Slice • Synchronization unit • Suited to localized image properties
Group of Pictures • Group of pictures (GOP) • A GOP contains at least one I-picture • Must start with an I-picture in bitstream order • Can have any number of P-pictures and B-pictures • Display order: 1I 2B 3B 4P 5B 6B 7P 8B 9B 10I 11B 12B 13P 14B 15B 16P 17B 18B 19I … • Bitstream order: 1I 4P 2B 3B 7P 5B 6B 10I 8B 9B 13P 11B 12B 16P 14B 15B 19I 17B 18B …
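The reordering from display order to bitstream order can be sketched as follows. This is a minimal illustration, not an encoder: the function name and the "1I"/"2B" label format are assumptions made for the example. Each B-picture is held back until the next I- or P-picture has been emitted.

```python
def display_to_bitstream(display):
    """Reorder picture labels such as '2B' from display order to bitstream order."""
    bitstream = []
    pending_b = []  # B-pictures waiting for their next reference picture
    for pic in display:
        if pic.endswith("B"):
            pending_b.append(pic)
        else:
            # The I- or P-picture is sent first, then the B-pictures displayed before it
            bitstream.append(pic)
            bitstream.extend(pending_b)
            pending_b = []
    bitstream.extend(pending_b)  # trailing B-pictures with no later reference in this list
    return bitstream


display = "1I 2B 3B 4P 5B 6B 7P 8B 9B 10I".split()
print(" ".join(display_to_bitstream(display)))
# prints: 1I 4P 2B 3B 7P 5B 6B 10I 8B 9B
```

The printed order matches the first ten pictures of the bitstream order listed on the slide.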
Group of Pictures (cont’d) • Closed GOP • Does not reference pictures in the previous GOP • Can be easily removed while editing • Open GOP • References pictures in the previous GOP • Examples in display order • Closed GOP: I B B P B B P B B P • Open GOP: B B I B B P B B P B B P (the leading B-pictures reference the previous P or I) • Closed GOP: B B I B B P B B P B B P (the leading B-pictures reference only the next I)
System Stream Layer • An MPEG stream is segmented into packs • Each pack contains information about the system clock, the bit rate, and the number of video and audio streams • Video streams and audio streams are multiplexed • A pack can contain multiple packets, e.g. three packets for video stream 1, video stream 2 and audio stream 1 • Packet • Each packet contains a segment of data from one video stream or audio stream • Has a presentation time and/or a decoding time • The payloads of contiguous packets are combined to form an elementary stream
Coding MPEG Video • Rate control within a sequence • Allocate bits to each picture • A reasonable ratio is I:P:B = 8:5:1 • Give I- and P-pictures the same visual quality and spend fewer bits on B-pictures; because B-pictures are never used as references, their lower quality does not propagate • If there is little motion or change, the I-picture should get more bits; if there is a lot of motion or change, take bits from the I-picture and give them to the P-pictures • Video stream of VCD: 1394.4 kbps at 30 pictures per second; a typical GOP is IBBPBBPBBPBBPBB or IBBBPBBBPBBBPBBBP
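As a worked example of the 8:5:1 ratio, the sketch below splits the bit budget of one GOP among its pictures. It assumes the ratio applies per picture and that the GOP gets a share of the bit rate proportional to its length; the function name and parameters are hypothetical, and a real encoder would also adapt the targets to picture content.

```python
def allocate_gop_bits(gop, bit_rate, picture_rate, weights=None):
    """Return a target number of bits per picture type for one GOP.

    gop          -- picture types in the GOP, e.g. "IBBPBBPBBPBBPBB"
    bit_rate     -- stream bit rate in bits per second
    picture_rate -- pictures per second
    weights      -- relative bits per picture (assumed I:P:B = 8:5:1)
    """
    if weights is None:
        weights = {"I": 8, "P": 5, "B": 1}
    gop_bits = bit_rate * len(gop) / picture_rate   # budget for the whole GOP
    total_weight = sum(weights[t] for t in gop)     # 8 + 4*5 + 10*1 = 38 for the GOP below
    unit = gop_bits / total_weight                  # bits represented by one weight unit
    return {t: weights[t] * unit for t in weights}


targets = allocate_gop_bits("IBBPBBPBBPBBPBB", 1_394_400, 30)
for picture_type, bits in targets.items():
    print(picture_type, round(bits))
```

With the figures from the slide, the GOP budget is 697,200 bits, which works out to roughly 146.8 kbit for the I-picture, 91.7 kbit per P-picture and 18.3 kbit per B-picture.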
Rate Control within a Picture • Allocate target bits to each macroblock • If the generated bits exceed the target bits • Increase the quantizer scale • Discard the high-frequency DCT coefficients • If the generated bits are fewer than the target bits • Decrease the quantizer scale • Insert macroblock stuffing bits • How to allocate bits? • Smaller quantizer scale for smooth areas to avoid blocking effects • Larger quantizer scale for rough areas to save bits
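The feedback rule above can be sketched as a single update step. This is an illustrative simplification, not a full rate controller: the function name and the fixed step of 1 are assumptions, while the clamp to 1..31 follows the quantizer scale range given in the slides.

```python
def next_quantizer_scale(scale, generated_bits, target_bits, step=1):
    """Adjust the quantizer scale after coding a macroblock (simplified feedback)."""
    if generated_bits > target_bits:
        scale += step   # over budget: quantize more coarsely
    elif generated_bits < target_bits:
        scale -= step   # under budget: quantize more finely
    return max(1, min(31, scale))  # keep within the legal range 1..31


print(next_quantizer_scale(10, generated_bits=1200, target_bits=1000))  # 11
print(next_quantizer_scale(10, generated_bits=800, target_bits=1000))   # 9
```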
Slice Selection • Each slice header requires 40 bits • For a video (30 pictures/s) with a vertical resolution of 240 lines, there are 15 slices per picture if each row of macroblocks is a slice • If a picture contains only one slice: 1200 bps for slice headers • If a picture contains 15 slices: 18000 bps for slice headers • A slice is the minimum independently decodable unit • For an error-free environment, one slice per picture may be appropriate • If the environment is noisy, one slice per row of macroblocks may be more desirable • Each slice has a quantizer scale in the range 1 to 31
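The slice overhead figures follow directly from the header size and the picture rate. A small sketch of the arithmetic (the names are chosen for this example only):

```python
SLICE_HEADER_BITS = 40   # per slice, as stated on the slide
MACROBLOCK_SIZE = 16     # a macroblock row is 16 lines high


def slice_overhead_bps(slices_per_picture, pictures_per_second=30):
    """Bits per second spent on slice headers."""
    return SLICE_HEADER_BITS * slices_per_picture * pictures_per_second


rows = 240 // MACROBLOCK_SIZE        # 15 macroblock rows in a 240-line picture
print(slice_overhead_bps(1))         # 1200
print(slice_overhead_bps(rows))      # 18000
```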
Motion Estimation • The search range is much longer than in H.261 • 1024 for full-pixel or 512 for half-pixel accuracy • Full search is not suitable over such a range; a faster search algorithm is required • [Figure: picture sequence I B B P]
Coding I-Pictures • Macroblock types in I-pictures • intra-d: encoded in intra mode with the default quantization • intra-q: encoded in intra mode with updated quantization • Each intra-q macroblock requires 5 extra bits for the quantizer scale, which ranges from 1 to 31 • A macroblock is divided into four luminance blocks and two chrominance blocks; all six blocks have to be DCT coded
Coding Blocks in I-Pictures • Apply the DCT to each block, as defined in H.261 • Quantize the coefficients with the uniform quantizer for I-pictures • The final quantizer scale for DC is always 8 • The final quantizer scale for each AC coefficient is the corresponding value in the quantization matrix multiplied by the quantizer scale of this macroblock
Coding blocks in I-Pictures (cont’d) • The quantized DC is DPCM + entropy coded • The quantized ACs are zig-zag scanned and then entropy coded • Example:
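A minimal sketch of the two steps above, before entropy coding: the zig-zag scan of a quantized block and DPCM of the DC values. The function names, the sample coefficients and the initial predictor of 0 are assumptions for illustration; the standard defines its own predictor reset rules.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk the anti-diagonals; alternate direction on odd and even diagonals
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def dc_dpcm(dc_values, predictor=0):
    """Code each DC value as the difference from the previous one."""
    diffs = []
    for dc in dc_values:
        diffs.append(dc - predictor)
        predictor = dc
    return diffs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][0] = 12, 5, -3, 1
scanned = zigzag_scan(block)
print(scanned[:6])             # [12, 5, -3, 1, 0, 0]
print(dc_dpcm([12, 14, 13]))   # [12, 2, -1]
```

The scan groups the low-frequency coefficients at the front, so the long run of zeros at the end can be coded compactly.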
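Coding P-Pictures • Seven macroblock types in P-pictures • -m: motion compensated, requires a motion vector • -c: a coded block pattern indicates which blocks are DCT coded • -q: the quantizer scale is changed • skipped: nothing is coded; in a P-picture the macroblock is copied from the reference picture with a zero motion vector (reusing the previous macroblock's motion vector applies to skipped macroblocks in B-pictures)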
Coding P-Pictures (cont’d) • Coded block pattern (CBP) • Indicates which blocks are DCT coded • If all quantized coefficients in a block are zero, the block is not coded; if no block is coded, the macroblock is skipped • CBP = 32 × B_Y0 + 16 × B_Y1 + 8 × B_Y2 + 4 × B_Y3 + 2 × B_Cb + B_Cr • Selection of macroblock type (decision tree) • MC, coded, quant: Pred-mcq • MC, coded, not quant: Pred-mc • MC, not coded: Pred-m • No MC, non-intra, coded, quant: Pred-cq • No MC, non-intra, coded, not quant: Pred-c • No MC, non-intra, not coded: Skipped • No MC, intra, quant: Intra-q • No MC, intra, not quant: Intra-d
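The CBP formula can be sketched directly. The helper names are hypothetical; each flag is 1 when the corresponding block has at least one non-zero quantized coefficient.

```python
def block_is_coded(quantized_block):
    """A block is coded only if it has at least one non-zero quantized coefficient."""
    return any(coef != 0 for row in quantized_block for coef in row)


def coded_block_pattern(y0, y1, y2, y3, cb, cr):
    """CBP = 32*B_Y0 + 16*B_Y1 + 8*B_Y2 + 4*B_Y3 + 2*B_Cb + B_Cr."""
    flags = [int(bool(f)) for f in (y0, y1, y2, y3, cb, cr)]
    return (32 * flags[0] + 16 * flags[1] + 8 * flags[2]
            + 4 * flags[3] + 2 * flags[4] + flags[5])


zero_block = [[0] * 8 for _ in range(8)]
print(block_is_coded(zero_block))               # False
print(coded_block_pattern(1, 0, 0, 0, 0, 1))    # 33
print(coded_block_pattern(1, 1, 1, 1, 1, 1))    # 63
```

A CBP of 0 means that no block is coded, which is the case where the macroblock carries no DCT data.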
Coding Blocks in P-Pictures • Intra blocks are coded as in I-pictures • Inter blocks • The DCT is applied to the residual • Quantize the coefficients with the dead-zone quantizer • The final quantizer scale for each AC coefficient is the corresponding value in the quantization matrix multiplied by the quantizer scale of this macroblock
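The difference between the uniform quantizer used for intra blocks and the dead-zone quantizer used for inter blocks can be illustrated with a simplified integer sketch. These functions are assumptions for illustration and do not reproduce the exact arithmetic of the standard: the uniform version rounds to the nearest level, while the dead-zone version truncates toward zero, so small residual coefficients become zero.

```python
def quantize_uniform(coef, step):
    """Round to the nearest quantization level (simplified intra-style quantizer)."""
    sign = -1 if coef < 0 else 1
    return sign * ((2 * abs(coef) + step) // (2 * step))


def quantize_dead_zone(coef, step):
    """Truncate toward zero, widening the zero bin (simplified inter-style quantizer)."""
    sign = -1 if coef < 0 else 1
    return sign * (abs(coef) // step)


step = 8  # e.g. matrix entry times quantizer scale, per the slide
for coef in (3, 7, -12, 20):
    print(coef, quantize_uniform(coef, step), quantize_dead_zone(coef, step))
```

For a coefficient of 7 and a step of 8, the uniform quantizer gives level 1 while the dead-zone quantizer gives 0.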
Coding B-Pictures • Eleven macroblock types in B-pictures • -i: interpolated, -c: coded block pattern, -f: forward, -b: backward, -q: quantizer scale changed
Coding B-Pictures (cont’d) • Selection of macroblock type • Because B-pictures have the lowest bit rate, try the skipped type first • Do forward motion estimation, backward motion estimation and interpolation, then pick the best of the three • Decision tree (* stands for i, f or b) • Non-intra, coded, quant: Pred-*cq • Non-intra, coded, not quant: Pred-*c • Non-intra, not coded: Pred-* or skipped • Intra, quant: Intra-q • Intra, not quant: Intra-d
Decoding a Sequence for VCR Commands • Decoding for fast forward • Discard the B-pictures and decode only the I- and P-pictures • Or discard the P- and B-pictures and decode only the I-pictures • Decoding for reverse play • Requires a large buffer to store the whole bitstream of a GOP, which is then decoded and displayed in reverse order • Pictures in display order: B0 B1 I2 B3 B4 P5 B6 B7 P8 B9 B10 P11 • Pictures in decoding order: I2 B0 B1 P5 B3 B4 P8 B6 B7 P11 B9 B10 • Pictures in new order: I2 P5 P8 P11 B10 B9 B7 B6 B4 B3 B1 B0
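The picture selection for fast forward and the "new order" for reverse play can be sketched as below. The function names are hypothetical, and the sketch only computes picture indices: the reference pictures are taken first in forward order, followed by the B-pictures in reverse display order.

```python
def fast_forward_pictures(display_types, keep="IP"):
    """Display indices of the pictures decoded during fast forward."""
    return [i for i, t in enumerate(display_types) if t in keep]


def reverse_play_order(display_types):
    """I- and P-pictures in forward order, then B-pictures in reverse display order."""
    anchors = [i for i, t in enumerate(display_types) if t in ("I", "P")]
    b_pictures = [i for i, t in enumerate(display_types) if t == "B"]
    return anchors + b_pictures[::-1]


gop = "B B I B B P B B P B B P".split()
print(fast_forward_pictures(gop))             # [2, 5, 8, 11]
print(fast_forward_pictures(gop, keep="I"))   # [2]
print(reverse_play_order(gop))                # [2, 5, 8, 11, 10, 9, 7, 6, 4, 3, 1, 0]
```

The last line reproduces the new order given on the slide.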
Pre- and Post-Processing • Pre-processing • Apply a median filter to remove noise • Apply a low-pass filter to smooth image edges; removing high frequencies helps prevent the ringing effect • Post-processing • Blocking artifacts are more visible in the low-frequency blocks • Low-pass filter at block boundaries • Wide low-pass filter at adjacent smooth blocks
Pre- and Post-Processing (cont’d) • Ringing artifacts appear along sharp edges, in other words, in the high-frequency blocks • Detect the edges in a ringing block with the Sobel masks; mark a pixel as an edge if the response is over a threshold • Apply a simple low-pass filter to the non-edge area
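The deringing steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the image is a list of rows of integers, the edge measure is |gx| + |gy| from the 3 × 3 Sobel masks, the "simple low-pass filter" is taken to be a 3 × 3 mean, and border pixels are left unfiltered. The function names and the threshold values are hypothetical.

```python
def sobel_magnitude(img, r, c):
    """|gx| + |gy| at an interior pixel, using the 3x3 Sobel masks."""
    gx = ((img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1])
          - (img[r - 1][c - 1] + 2 * img[r][c - 1] + img[r + 1][c - 1]))
    gy = ((img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1])
          - (img[r - 1][c - 1] + 2 * img[r - 1][c] + img[r - 1][c + 1]))
    return abs(gx) + abs(gy)


def dering(img, threshold):
    """Smooth non-edge interior pixels with a 3x3 mean; keep edge pixels untouched."""
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are copied as-is
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            if sobel_magnitude(img, r, c) > threshold:
                continue  # marked as edge: do not blur
            window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = sum(window) // 9
    return out


noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 19                      # isolated ripple in a flat area
print(dering(noisy, threshold=50)[2][2])   # 11
```

A sharp step edge produces a large Sobel response and is preserved, while the isolated ripple in the flat area is averaged down.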
MPEG-2 Compared to MPEG-1 • Frame/field adaptive motion compensation and DCT • Dual prime motion compensation (for P-pictures when there are no B-pictures) • Nonlinear quantization table with increased accuracy for small values • Alternate scan for DCT coefficients • New VLC tables for DCT coefficient coding • In addition to 4:2:0, also supports 4:2:2 and 4:4:4 • Supports a maximum motion vector range of -2048 to +2047.5 (always half-pixel motion vectors)
Frame/field DCT • Frame DCT • Field DCT
Additional Chrominance Formats • 4:2:0 • 4:2:2 • 4:4:4 • [Figure: Y, Cb and Cr sample layouts for each format]
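One practical consequence of the chrominance format is the number of 8 × 8 blocks in a 16 × 16 macroblock. The sketch below derives it from the horizontal and vertical chroma subsampling factors; the table and function names are chosen for this example.

```python
# (horizontal, vertical) chroma subsampling factors relative to luminance
CHROMA_SUBSAMPLING = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def blocks_per_macroblock(chroma_format):
    """Number of 8x8 blocks (Y + Cb + Cr) in one 16x16 macroblock."""
    h_div, v_div = CHROMA_SUBSAMPLING[chroma_format]
    luma_blocks = 4                                   # 16x16 luminance = four 8x8 blocks
    chroma_width, chroma_height = 16 // h_div, 16 // v_div
    chroma_blocks = (chroma_width // 8) * (chroma_height // 8)
    return luma_blocks + 2 * chroma_blocks            # Cb and Cr have the same size


for fmt in CHROMA_SUBSAMPLING:
    print(fmt, blocks_per_macroblock(fmt))
# 4:2:0 -> 6, 4:2:2 -> 8, 4:4:4 -> 12
```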
MPEG-4 Components • Face • 66 Facial animation parameters • Primary facial expressions • 14 Visemes • VO (Video Object) • Shape • Motion vectors • Texture • Texture • From VOP • Still texture (Discrete Wavelet Transform) • AO (Audio Object) • MPEG Layer 1-3 • AAC(Advanced Audio Coder) • TTS (Text-To-Speech) • 2D Mesh • Triangular patches • Motion vector
Content-based Audio-Visual Representation • Audio-Visual Object (AVO) • Video object component (video object plane, VOP) • natural or synthetic • 2D or 3D • Audio object component • mono, stereo or multichannel
Video Object Planes (VOP) • Characteristics of VOPs • may have different spatial and temporal resolutions • may be associated with different degrees of accessibility sub-VOPs • may be separated or overlapping • VOP types • Traditional I, P, B types • S-VOP (Sprite) for background
Video Object Plane Type • [Figure: VOP types over time: S-VOP, I-VOP, P-VOPs and B-VOPs]
Content-based Object Manipulation • Object manipulation • change of the spatial position of a VOP • application of a spatial scaling factor to a VOP • change of the speed with which a VOP moves • insertion of new VOPs • deletion of an object in the scene • change of the scene area
Segmentation Process • Depending on the application, segmentation can be performed • Online (real-time) or offline (non-real-time) • Automatically or semi-automatically • Examples • Video conferencing • real-time, automatic • separate the foreground (communication partner) from the background • Object tracking in video • may allow offline and semi-automatic segmentation • separate a moving object from the others
Compression • Improved coding efficiency • 5-64 kbps for mobile applications • up to 20 Mbps for TV/film applications • subjectively better quality compared to existing standards • Coding of multiple concurrent data streams • can code multiple views of a scene efficiently, e.g. stereo video
Coding VO in MPEG-4 • Reduce temporal redundancy • Motion estimation for arbitrarily shaped VOPs • padding and modified block (polygon) matching motion estimation • [Figure: I-VOP, P-VOP and B-VOP over time]
Coding Procedure of VOP • BAB (Binary Alpha Block) • Motion Vector • CAE (Context-Based Arithmetic Encoding) • Rate Control by Sub-sampling • Texture • Motion Vector • DCT • Rate Control by Quantization Step
New Coding Features • For each macroblock, the motion vectors can be computed on a 16 × 16 or 8 × 8 block basis • Unrestricted motion estimation: prediction can extend over the image boundary • Overlapped block motion compensation • Each component of texture can range from 1 to 12 bits • More robust coding