Video Compression


Video Compression Standards
• JPEG: ISO and ITU-T
  • for compression of still images
• Motion JPEG (MJPEG)
• H.261: ITU-T SG XV
  • for audiovisual services at p x 64 kbps
• MPEG-1, 2, 4, 7: ISO/IEC JTC1/SC29/WG11
  • for compression of combined video and audio
• H.263: ITU-T SG XV
  • for videophone at bit rates below 64 kbps
• JBIG: ISO
  • for compression of bilevel images
• Non-standardized techniques
  • DVI: de facto standard from Intel for storage compression and real-time decompression
  • QuickTime: Apple Macintosh

Frame/Picture Types
• I frame: intra-coded frame
  • provides points for random access
  • used as a reference for coding other frames
  • uses JPEG, except that the quantization threshold values are the same for all DCT components
• P frame: predictively coded frame, based on the reference frame (the previous I or P frame)
• B frame: bidirectionally predictively coded frame, based on the previous and following I and/or P frames
• D frame: DC-coded frame
  • intra-coded frame in which the AC coefficients are neglected
  • used for fast-forward and rewind modes

Group of Pictures (GOP) Structure

Display and Transmission Order
• Transmission order and display order may differ
• Reference frames must be transmitted first
  • Display order: 1 2 3 4 5 6 7 8 9 = I B B B P B B B I (P frames use forward prediction; B frames use bidirectional prediction)
  • Transmission order: 1 5 2 3 4 9 6 7 8 = I P B B B I B B B
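The reordering rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the frame types of a GOP are given as a string in display order; the function name `transmission_order` is hypothetical. Each I or P frame is emitted before the B frames that precede it in display order, because those B frames are predicted from it.

```python
def transmission_order(display_types):
    """Return 1-based frame numbers in transmission (coding) order.

    display_types: frame types in display order, e.g. "IBBBPBBBI".
    """
    order = []
    pending_b = []  # B frames waiting for their following reference frame
    for number, ftype in enumerate(display_types, start=1):
        if ftype == "B":
            pending_b.append(number)
        elif ftype in ("I", "P"):
            order.append(number)      # send the reference frame first
            order.extend(pending_b)   # then the B frames that depend on it
            pending_b.clear()
        else:
            raise ValueError(f"unsupported frame type: {ftype!r}")
    # Trailing B frames have no following reference in this string; a real
    # encoder would hold them until the next GOP's I frame is sent.
    order.extend(pending_b)
    return order


display = "IBBBPBBBI"
order = transmission_order(display)
print(order)                                    # [1, 5, 2, 3, 4, 9, 6, 7, 8]
print("".join(display[n - 1] for n in order))   # IPBBBIBBB
```

Running it on the example sequence reproduces the transmission order 1 5 2 3 4 9 6 7 8.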

Motion Estimation and Compensation
• Macroblock: the motion compensation unit
• Motion estimation extracts the motion information from a video sequence
• Motion information
  • one motion vector for a forward-predicted macroblock
  • two motion vectors for a bidirectionally predicted macroblock
• Motion compensation reconstructs an image using blocks from the previous image along with the motion information, i.e., the motion vectors
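The standards define how motion vectors are coded, not how the encoder finds them. A common textbook choice is exhaustive (full-search) block matching with the sum of absolute differences (SAD) as the matching criterion; that choice is an assumption of this sketch. Frames are assumed to be lists of rows of luminance values, a 2 x 2 block stands in for a 16 x 16 macroblock to keep the example small, and `sad`, `full_search` and `compensate` are illustrative names.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Motion estimation: try every displacement within +/-search_range and
    return ((dx, dy), cost) for the lowest SAD. Candidates that would fall
    outside the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


def compensate(ref, bx, by, mv, n):
    """Motion compensation: fetch the n x n prediction for the block at
    (bx, by) from ref, displaced by the motion vector mv = (dx, dy)."""
    dx, dy = mv
    return [row[bx + dx:bx + dx + n] for row in ref[by + dy:by + dy + n]]


# A 2 x 2 bright object moves one pixel to the right between frames.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (2, 3):
        ref[y][x] = 9
        cur[y][x + 1] = 9

mv, cost = full_search(cur, ref, bx=3, by=2, n=2, search_range=2)
print(mv, cost)                       # (-1, 0) 0
print(compensate(ref, 3, 2, mv, 2))   # [[9, 9], [9, 9]]
```

Full search is the simplest but most expensive option. Practical encoders often use faster heuristic searches that test fewer candidates.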

Implementation Issues
• For P-frames, the encoding of each macroblock depends on the output of the motion estimation unit
  • If the contents of the two macroblocks are the same, only the address of the MB in the reference frame is encoded
  • If they are very close, both the motion vector and the difference matrices are encoded
  • If no close match is found, the MB is encoded in the same way as in an I-frame
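The three-way decision for a P-frame macroblock can be sketched as follows. The function `encode_p_macroblock`, its mode labels, and the `close_threshold` parameter are hypothetical. Real encoders use their own decision rules, often based on rate-distortion cost rather than a fixed SAD threshold.

```python
def encode_p_macroblock(cur_mb, pred_mb, close_threshold):
    """Choose how to code one P-frame macroblock, given the best-matching
    prediction block found by motion estimation.

    Returns (mode, payload). close_threshold is a hypothetical tuning value.
    """
    # Difference (residual) matrix between the current MB and its prediction
    diff = [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(cur_mb, pred_mb)]
    error = sum(abs(d) for row in diff for d in row)

    if error == 0:
        return ("motion_vector_only", None)             # contents identical
    if error <= close_threshold:
        return ("motion_vector_and_difference", diff)   # very close match
    return ("intra", cur_mb)                            # no close match: code as in an I-frame


print(encode_p_macroblock([[5, 5], [5, 5]], [[5, 5], [5, 5]], 8))
# ('motion_vector_only', None)
print(encode_p_macroblock([[5, 6], [5, 5]], [[5, 5], [5, 5]], 8))
# ('motion_vector_and_difference', [[0, 1], [0, 0]])
print(encode_p_macroblock([[90, 0], [0, 90]], [[5, 5], [5, 5]], 8))
# ('intra', [[90, 0], [0, 90]])
```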

Implementation Schematics
Bitstream Format

Performance
• I-frame
  • similar to JPEG
  • 10:1 – 20:1
• P-frame
  • 20:1 – 30:1
• B-frame
  • 30:1 – 50:1

Video Compression: H.261

H.261 Overview
• ITU-T standard for the compression/decompression of digital video (1990)
• Intended for video conferencing and video telephony over ISDN at p x 64 kbps, p = 1, 2, ..., 30
• Real-time encoding/decoding (< 150 ms)
• Low-cost VLSI implementation

Picture Preparation
• An image consists of 3 rectangular matrices (components)
  • Luminance Y
  • Chrominance Cb (blue) and Cr (red)
  • 4:1:1 format (each chrominance matrix is subsampled by 2 both horizontally and vertically, a scheme usually called 4:2:0)
• Image formats
  • CIF (Common Intermediate Format): 352 x 288
    • used for video conferencing
    • 30 fps, progressive scanning
  • QCIF (Quarter CIF): 176 x 144
    • used for video telephony
    • 15 / 7.5 fps, progressive scanning
  • QCIF is mandatory; CIF is optional
• Bandwidth requirement of CIF at 15 fps
  • Y = 352 x 288 x 8 bits/pixel x 15 frames/s ≈ 12.2 Mbps
  • Cb + Cr = 2 x ¼ x Y ≈ 6.1 Mbps
  • Total ≈ 18.3 Mbps → a compression ratio of roughly 50:1 is needed to transmit at 384 kbps (p = 6)
• Only I- and P-frames are used in H.261
  • 3 P-frames between each pair of I-frames
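The bandwidth arithmetic can be checked with a short script. This is a sketch assuming 8 bits per sample and two chrominance components, each with one quarter as many samples as Y. The helper names `raw_bitrate` and `h261_channel_rate` are hypothetical. The exact required ratio comes out at about 47.5:1, in the region of the 50:1 quoted for CIF at 384 kbps.

```python
def raw_bitrate(width, height, fps, bits_per_sample=8):
    """Uncompressed bit rate in bit/s: Y plus two chrominance components,
    each chrominance component having 1/4 as many samples as Y."""
    y = width * height * bits_per_sample * fps
    return y + 2 * (y // 4)


def h261_channel_rate(p):
    """H.261 channel rate of p x 64 kbps, for p = 1 ... 30."""
    if not 1 <= p <= 30:
        raise ValueError("p must be in 1..30")
    return p * 64_000


raw = raw_bitrate(352, 288, 15)        # CIF at 15 fps
channel = h261_channel_rate(6)         # 384 kbps
print(raw)                              # 18247680 (about 18.3 Mbps)
print(round(raw / channel, 1))          # 47.5
```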

H.261 Encoding Format: frame format, GOB structure, macroblock format
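The frame/GOB/macroblock hierarchy can be made concrete with a little arithmetic. In H.261 a GOB covers 176 x 48 luminance pixels (11 x 3 macroblocks). Each macroblock carries four 8 x 8 Y blocks plus one Cb block and one Cr block. The hypothetical helper `h261_layout` below counts these units for CIF and QCIF pictures.

```python
MB_SIZE = 16              # a macroblock covers 16 x 16 luminance pixels
GOB_W, GOB_H = 176, 48    # one GOB = 11 x 3 macroblocks
BLOCKS_PER_MB = 6         # 4 Y blocks + 1 Cb block + 1 Cr block (each 8 x 8)


def h261_layout(width, height):
    """Count GOBs, macroblocks and 8 x 8 blocks in one picture."""
    gobs = (width // GOB_W) * (height // GOB_H)
    mbs_per_gob = (GOB_W // MB_SIZE) * (GOB_H // MB_SIZE)
    macroblocks = gobs * mbs_per_gob
    return {
        "gobs": gobs,
        "mbs_per_gob": mbs_per_gob,
        "macroblocks": macroblocks,
        "blocks": macroblocks * BLOCKS_PER_MB,
    }


print(h261_layout(352, 288))  # CIF:  12 GOBs, 33 MBs per GOB, 396 MBs, 2376 blocks
print(h261_layout(176, 144))  # QCIF:  3 GOBs, 33 MBs per GOB,  99 MBs,  594 blocks
```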

H.261 Video Encoder

Entropy Encoding
• Run-length encoding
  • (run, amplitude) pairs
• Huffman encoding
  • Huffman tables are predefined by the H.261 standard
    • a table for motion vectors
    • a table for quantized DCT coefficients
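Before run-length coding, the quantized 8 x 8 coefficients are read in zigzag order, so that the zeros at high frequencies cluster into long runs. The sketch below assumes a quantized block given as a list of 8 rows. It produces (run, amplitude) pairs, where run is the number of zeros preceding each nonzero amplitude. In this sketch the DC coefficient is kept aside and an "EOB" marker ends the block. The resulting symbols would then be mapped to codewords through the predefined Huffman tables, which are not reproduced here. `zigzag_order` and `run_length` are illustrative names.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):              # s = row + col indexes each anti-diagonal
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            cells.reverse()                 # even diagonals run bottom-left to top-right
        order.extend(cells)
    return order


def run_length(block):
    """Return (dc, pairs): the DC coefficient and the AC coefficients as
    (run, amplitude) pairs in zigzag order, terminated by "EOB"."""
    coeffs = [block[r][c] for r, c in zigzag_order(len(block))]
    dc, pairs, run = coeffs[0], [], 0
    for value in coeffs[1:]:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")                     # the remaining coefficients are all zero
    return dc, pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0], block[0][2] = 12, 5, -3, 2
print(run_length(block))  # (12, [(0, 5), (1, -3), (1, 2), 'EOB'])
```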

Video Compression: H.263

H.263
• Low-bit-rate standard for teleconferencing applications
• Optimizes H.261 to operate below 64 kbps, e.g., over a V.34 modem
• Achieves about 2.5 times the compression of H.261
• An extension of H.261
  • 2 image formats → 5 image formats
  • Motion-compensated prediction has been refined
  • Supports B-frames (which use only P-frames as references)
• Used in IETF RTSP (Real Time Streaming Protocol)
• Used in RealPlayer G2

Picture Preparation
• Digitization formats
  • QCIF (Quarter CIF): 176 x 144
    • used for video telephony
    • 15 / 7.5 fps, progressive scanning
  • Sub-QCIF (S-QCIF): 128 x 96
    • 15 / 7.5 fps, progressive scanning
• Frame types
  • I, P, B frames

Picture Processing
• Unrestricted motion vectors
  • For the pixels of a potential close-match MB that fall outside the frame boundary, the edge pixels of the frame are used instead
  • If the resulting MB produces a close match, the motion vector is allowed to point outside the frame area, if necessary
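The edge-pixel rule amounts to clamping coordinates to the frame before reading a pixel. With that rule, a reference block may start outside the frame. This is a minimal sketch, assuming the frame is a list of rows; `pixel_clamped` and `fetch_block` are hypothetical names.

```python
def pixel_clamped(frame, x, y):
    """Return frame[y][x], replacing any position outside the frame with the
    nearest edge pixel."""
    height, width = len(frame), len(frame[0])
    return frame[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]


def fetch_block(frame, x0, y0, n):
    """Fetch an n x n reference block whose top-left corner (x0, y0) may lie
    outside the frame, as allowed by unrestricted motion vectors."""
    return [[pixel_clamped(frame, x0 + x, y0 + y) for x in range(n)]
            for y in range(n)]


ref = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
print(fetch_block(ref, -1, -1, 2))  # [[1, 1], [1, 1]]
print(fetch_block(ref, 2, 1, 2))    # [[6, 6], [9, 9]]
```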

Error Resilience
• The target networks for H.263 are wireless networks and the PSTN → relatively high error rates
• Error propagation
  • Because of errors in the motion vectors and motion compensation information, errors within a GOB may propagate to other regions of the frame
• To minimize error propagation
  • Error tracking
  • Independent segment decoding
  • Reference picture selection

Error Tracking
• Error detection methods
  • Out-of-range motion vectors
  • Invalid variable-length codewords
  • Out-of-range DCT coefficients
  • Excessive number of coefficients within a MB
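The range-based checks in the list can be sketched as simple tests on decoded values. The limits are parameters whose defaults are illustrative assumptions, not values quoted from the H.263 specification. Detecting invalid codewords needs the actual VLC tables, so that check is left out. `detect_errors` is a hypothetical name.

```python
def detect_errors(motion_vectors, blocks,
                  mv_limit=(-16.0, 15.5),      # assumed MV range in pixels
                  level_limit=(-127, 127),     # assumed range of quantized levels
                  max_coeffs=64):              # an 8 x 8 block holds 64 coefficients
    """Return a list of error descriptions for one decoded macroblock.

    motion_vectors: list of (dx, dy) pairs in pixels
    blocks: one list of decoded coefficient levels per 8 x 8 block
    """
    problems = []
    lo, hi = mv_limit
    for dx, dy in motion_vectors:
        if not (lo <= dx <= hi and lo <= dy <= hi):
            problems.append(f"out-of-range motion vector ({dx}, {dy})")
    for i, coeffs in enumerate(blocks):
        if len(coeffs) > max_coeffs:
            problems.append(f"block {i}: {len(coeffs)} coefficients (> {max_coeffs})")
        if any(not level_limit[0] <= c <= level_limit[1] for c in coeffs):
            problems.append(f"block {i}: out-of-range DCT coefficient")
    return problems


print(detect_errors([(1.5, -2.0)], [[12, 0, -3]]))          # []
print(len(detect_errors([(40, 0)], [[0] * 70 + [300]])))    # 3
```

When any check fires, the decoder can treat the affected region as corrupted, for example by concealing it and not trusting it as a reference.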

Independent Segment Decoding
• Each GOB is treated as a separate subvideo that is independent of the other GOBs in the frame
• Motion estimation and compensation are limited to the boundary pixels of a GOB rather than those of the frame
• (figure: effect of a GOB being corrupted)
• Used with error tracking

Reference Picture Selection: NAK mode, ACK mode

MPEG Video Compression

MPEG
• MPEG (Moving Picture Experts Group)
  • ISO/IEC JTC1/SC29/WG11
  • standard for synchronized video and audio
  • consists of System, Video, Audio, …
    • System: for multiplexing and synchronization
• MPEG-1
  • ISO/IEC 11172
  • Intended for the storage of VHS-quality audio-visual information on CD-ROM at bit rates up to 1.5 Mbps
  • Video resolution: SIF (up to 352 x 288 pixels)
  • Compressed bandwidth ≈ 1.5 Mbps
    • about 1.1 Mbps for video, 128 kbps for audio, the remainder for system data
  • Allows random access, fast forward, rewind
• MPEG-2
  • Intended for the recording and transmission of studio-quality audio and video
• MPEG-4
  • Initially concerned with a similar range of applications to H.263, at very low bit rates (4.8 – 64 kbps)
  • Later extended to interactive multimedia applications over the Internet and various types of entertainment networks
• MPEG-7
  • Describes the structure and features of the content of (compressed) multimedia information
  • Used in search engines

MPEG-1

MPEG-1 Frames
• Spatial resolution: 352 x 288 pixels (SIF)
• Progressive scanning with a refresh rate of 30 Hz (NTSC) or 25 Hz (PAL)
• The standard allows the use of
  • I-frames only
  • I- and P-frames only
  • I-, P- and B-frames
• No D-frames are supported
  • I-frames are used for random-access functions
• Example sequences
  • IBBPBBPBBI… for PAL
  • IBBPBBPBBPBBI… for NTSC
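The example sequences follow a regular pattern. This sketch describes a GOP by two hypothetical parameters: `n`, the distance between I-frames, and `m`, the distance between anchor (I or P) frames. It generates the frame types of one GOP in display order. The (n, m) parameterization is a common convention, assumed here rather than taken from the slides.

```python
def gop_pattern(n, m):
    """Frame types of one GOP in display order.

    n: GOP length (distance between successive I-frames)
    m: distance between anchor frames (I or P); m = 1 means no B-frames
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    return "".join("I" if i == 0 else "P" if i % m == 0 else "B"
                   for i in range(n))


print(gop_pattern(9, 3))    # IBBPBBPBB     (PAL example)
print(gop_pattern(12, 3))   # IBBPBBPBBPBB  (NTSC example)
print(gop_pattern(4, 1))    # IPPP          (I- and P-frames only)
print(gop_pattern(1, 1))    # I             (I-frames only)
```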

Use of B Frame

Overview
• The compression algorithm is based on H.261
• MB: 16x16 in the Y plane, 8x8 in each of the Cb and Cr planes
• Differences from H.261
  • Time-stamps (temporal references) enable the decoder to resynchronize more quickly when one or more MBs are corrupted or missing
  • Introduction of B-frames
  • The search window in the reference frame is larger
  • To improve the accuracy of the motion vectors, a finer (half-pixel) resolution is used
• Typical compression ratios
  • I-frame: 10:1
  • P-frame: 20:1
  • B-frame: 50:1
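The per-frame ratios combine into an overall ratio for a GOP. The calculation assumes every uncompressed frame has the same size. For the PAL-style GOP IBBPBBPBB, the nine frames compress to 1/10 + 2/20 + 6/50 = 8/25 of one raw frame, an overall ratio of 9 / (8/25) = 28.125:1. The helper `gop_compression_ratio` is a hypothetical name, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def gop_compression_ratio(gop, ratios):
    """Overall compression ratio of a GOP, given the compression ratio of each
    frame type. Assumes every uncompressed frame has the same size."""
    compressed = sum(Fraction(1, ratios[t]) for t in gop)  # in units of one raw frame
    return len(gop) / compressed


typical = {"I": 10, "P": 20, "B": 50}
ratio = gop_compression_ratio("IBBPBBPBB", typical)
print(ratio, float(ratio))   # 225/8 28.125
```

The B-frames contribute most of the gain. With I- and P-frames only (IPPPPPPPP), the same helper gives a noticeably lower overall ratio.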

MPEG System
• MPEG standard
  • Video coding
  • Audio coding
  • System coding
• Timing and synchronization
  • Presentation Time Stamps (PTS)
  • Decoding Time Stamps (DTS)
  • System Clock Reference (SCR)
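Because B-frames change the order of frames, a reference frame is decoded before it is displayed. DTS and PTS therefore differ for I- and P-frames, while for B-frames they coincide. PTS and DTS are counted in units of a 90 kHz clock. The sketch below uses a deliberately simplified model: an integer frame rate, one frame decoded per frame period, and a one-frame reordering delay. The SCR is not modelled, and `timestamps` is a hypothetical name.

```python
TICKS_PER_SECOND = 90_000  # PTS and DTS are counted in units of a 90 kHz clock


def timestamps(decode_order, fps):
    """Assign (frame, DTS, PTS) in 90 kHz ticks to frames listed in
    transmission/decoding order (1-based display numbers).

    Simplified model: one frame is decoded per frame period, and display
    starts one frame period after decoding starts, so
    PTS = display number x period.
    """
    period = TICKS_PER_SECOND // fps   # assumes an integer frame rate
    return [(frame, k * period, frame * period)
            for k, frame in enumerate(decode_order)]


for frame, dts, pts in timestamps([1, 5, 2, 3, 4, 9, 6, 7, 8], fps=25):
    print(frame, dts, pts)
```

At 25 fps one frame period is 3600 ticks. Frame 5 (a P-frame) is decoded at tick 3600 but presented at tick 18000, while each B-frame is presented as soon as it is decoded.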

MPEG-1 Video Bitstream Structure
Composition format
• GOP layer: video coding unit
  • the first picture must be an I-frame, to allow editing
• Picture layer: primary coding unit
• Slice layer: resynchronization unit
• Macroblock layer: motion compensation unit
• Block layer: DCT unit
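The layered structure maps naturally onto nested data types. This sketch models each layer as a Python dataclass and enforces the rule that a GOP starts with an I picture. The class and field names are hypothetical, and the six blocks per macroblock assume 4:2:0 sampling (4 Y + 1 Cb + 1 Cr).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:                 # DCT unit: 8 x 8 = 64 coefficients
    coeffs: List[int] = field(default_factory=lambda: [0] * 64)


@dataclass
class Macroblock:            # motion compensation unit: 4 Y + 1 Cb + 1 Cr blocks
    motion_vector: Tuple[int, int] = (0, 0)
    blocks: List[Block] = field(default_factory=lambda: [Block() for _ in range(6)])


@dataclass
class Slice:                 # resynchronization unit: a run of macroblocks
    macroblocks: List[Macroblock] = field(default_factory=list)


@dataclass
class Picture:               # primary coding unit
    picture_type: str        # "I", "P" or "B"
    slices: List[Slice] = field(default_factory=list)


@dataclass
class GOP:                   # video coding unit
    pictures: List[Picture]

    def __post_init__(self):
        # Enforce the rule that a GOP begins with an I picture.
        if not self.pictures or self.pictures[0].picture_type != "I":
            raise ValueError("a GOP must start with an I picture")


gop = GOP([Picture("I"), Picture("B"), Picture("B"), Picture("P")])
print(len(gop.pictures), len(Macroblock().blocks))  # 4 6
```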

MPEG Frame Structure: MPEG-1 and MPEG-2

Constrained Parameter Set
• horizontal size <= 720 pels
• vertical size <= 576 pels
• total number of macroblocks/picture <= 396
• total number of macroblocks/second <= 396 x 25 = 330 x 30 (= 9900)
• picture rate <= 30 fps
• bit rate <= 1.86 Mbps
• decoder buffer <= 376,832 bits
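The macroblock limits follow directly from SIF. At 352 x 288 and 25 fps there are 22 x 18 = 396 macroblocks per picture, and at 352 x 240 and 30 fps there are 22 x 15 = 330. Both give 9900 macroblocks per second. The sketch below checks a stream against the limits listed above (the decoder buffer limit is not checked); `violations` and `ceil_div` are hypothetical names.

```python
LIMITS = {                      # limits as listed in the constrained parameter set
    "horizontal size": 720,
    "vertical size": 576,
    "macroblocks/picture": 396,
    "macroblocks/second": 396 * 25,
    "picture rate": 30,
    "bit rate": 1_860_000,
}


def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def violations(width, height, fps, bit_rate):
    """Return the names of the constraints a stream would violate."""
    mbs_per_picture = ceil_div(width, 16) * ceil_div(height, 16)
    actual = {
        "horizontal size": width,
        "vertical size": height,
        "macroblocks/picture": mbs_per_picture,
        "macroblocks/second": mbs_per_picture * fps,
        "picture rate": fps,
        "bit rate": bit_rate,
    }
    return [name for name, limit in LIMITS.items() if actual[name] > limit]


print(violations(352, 288, 25, 1_150_000))  # []  (SIF, PAL)
print(violations(352, 240, 30, 1_150_000))  # []  (SIF, NTSC)
print(violations(720, 576, 25, 1_150_000))  # ['macroblocks/picture', 'macroblocks/second']
```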

MPEG Encoding Scheme

MPEG Decoding Scheme

MPEG-2

MPEG-2 Video
• Jointly developed by ISO/IEC (IS 13818-2) and ITU-T (H.262)
• Permits data rates up to 100 Mbps
• Supports interlaced video formats
• Supports HDTV
• Can be used for video over satellite, cable, and other broadband channels
• Backward compatible with MPEG-1 and H.261

MPEG-1 and MPEG-2

MPEG-2 Profiles and Levels

Main Profile at Main Level (MP@ML)
• Target application: digital TV broadcasting
• Interlaced scanning: 2 fields per frame
  • Field mode: suitable for live sports
  • Frame mode: suitable for studio-based programs
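An interlaced frame consists of two fields: the even-numbered lines and the odd-numbered lines. In field mode each field is coded as a separate picture. In frame mode the two fields are woven together and coded as one picture. This is a minimal sketch, assuming a frame is a list of lines; `split_fields` and `weave` are hypothetical names.

```python
def split_fields(frame):
    """Split an interlaced frame (a list of lines) into its two fields."""
    top = frame[0::2]      # lines 0, 2, 4, ...
    bottom = frame[1::2]   # lines 1, 3, 5, ...
    return top, bottom


def weave(top, bottom):
    """Interleave two fields back into a frame (inverse of split_fields)."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.extend([top_line, bottom_line])
    frame.extend(top[len(bottom):])  # odd line count: the top field has one extra line
    return frame


frame = [f"line {i}" for i in range(5)]
top, bottom = split_fields(frame)
print(top)                          # ['line 0', 'line 2', 'line 4']
print(bottom)                       # ['line 1', 'line 3']
print(weave(top, bottom) == frame)  # True
```

Because the two fields are captured at different instants, fast motion makes the woven frame comb-like. This is why field mode suits live sports.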

HDTV
• 3 standards
  • ATV (Advanced Television) in North America
  • DVB (Digital Video Broadcasting) in Europe
  • MUSE (Multiple Sub-Nyquist Sampling Encoding) in Japan and the rest of Asia
• ITU-R HDTV specification
  • 16/9 aspect ratio
  • 1920 samples/line, 1152 (1080 visible) lines/frame
  • Interlaced scanning with 4:2:0 format
• ATV standard: Grand Alliance standard
  • ITU-R spec + 1280 x 720, 16/9 aspect ratio
  • Video compression: MP@HL
  • Audio compression: Dolby AC-3
• DVB standard
  • 4/3 aspect ratio, 1440 x 1152 (1080 visible)
  • Video compression: SSP@H1440 (Spatially Scalable Profile at High-1440 level)
• MUSE standard
  • 16/9 aspect ratio, 1920 x 1034
  • Video compression: similar to MP@HL

MPEG-4

Goal of MPEG-4 (1)
• The initial goal was to refine H.261 with a compression ratio 10 times better, but this failed
• Consequently, the focus shifted to developing a standard for
  • Flexible bitstreams that are scalable for receivers with different capabilities, such as resolution
  • Extendable configuration, so that transmitters can download new applications and algorithms into receivers
  • Content-based interactivity for multimedia data access, manipulation and bitstream editing, and for hybrid natural and synthetic data
  • Network independence, so that it can be used with any communication network to provide universal accessibility

Goal of MPEG-4 (2)
• MPEG-4 standards for
  • Multimedia content generation
  • Network interface for multimedia transport
  • Interactivity for users
• Content-based interactivity
  • Defined by the SNHC (Synthetic and Natural Hybrid Coding) group
  • Coding of a synthetic human face and body
  • Animation of the face and body
  • Media integration of text and graphics
  • Texture coding for view-dependent applications
  • Static and dynamic mesh coding with texture mapping
  • Interface for text-to-speech synthesis and synthetic audio

AVO: Audio/Visual Object
• Primitive AVOs
  • 2D fixed background
  • Picture of a walking and talking lady without the background
  • Voice associated with that person
• Compound AVO
  • e.g., an AVO that contains both the audio and visual components of a talking and walking person
• MPEG-4 treats audiovisual activities and the associated operations, including compression, decompression, multiplexing and synchronization, as objects, similar to OOP
  • Viewed as the configuration, communication, and instantiation of classes of objects
• VOP (Video Object Plane)
  • a video object at any given time
  • The video encoder encodes each VOP separately

Content-based Video Coding

User Interaction
• User interaction operations with the decoded scene, following the design of the scene's author:
  • Changing the viewing/listening point by navigating through the scene
  • Dragging objects to different positions
  • Triggering a sequence of events by clicking on a specific object, including starting and stopping a video stream
  • Selecting the desired language when multiple language tracks are available

Scalability and Accessibility
• MPEG-4 video object coding supports spatial and temporal scalability
  • This allows the receiver to decode only part of a bitstream and reconstruct images or image sequences
  • Useful for video delivery over multimedia networks with limited bandwidth
  • Useful for receivers that can only display a limited resolution
• Universal accessibility to support various communication media
  • MPEG-4 provides error robustness and resilience for noisy environments such as mobile networks
  • Supports audio and video compression algorithms in error-prone environments at low bit rates (< 64 kbps)
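Spatial scalability can be illustrated with a two-layer split. A low-resolution base layer is decodable on its own, and an enhancement layer carries the difference between the original and the upsampled base. This is a generic sketch, not MPEG-4's actual coding tools. The 2 x 2 averaging filter and pixel-replication upsampling are assumptions chosen for simplicity, frames must have even width and height, and all function names are hypothetical.

```python
def downsample(frame):
    """Base layer: halve the resolution by averaging each 2 x 2 group of pixels."""
    return [[(frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
             for x in range(0, len(frame[0]), 2)]
            for y in range(0, len(frame), 2)]


def upsample(frame):
    """Double the resolution by pixel replication."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in (0, 1)]
        out.extend([wide, list(wide)])
    return out


def encode_layers(frame):
    """Split a frame into a base layer and an enhancement (residual) layer."""
    base = downsample(frame)
    prediction = upsample(base)
    enhancement = [[f - p for f, p in zip(f_row, p_row)]
                   for f_row, p_row in zip(frame, prediction)]
    return base, enhancement


def decode(base, enhancement=None):
    """A limited receiver decodes only the base layer; a full receiver adds the enhancement."""
    prediction = upsample(base)
    if enhancement is None:
        return prediction
    return [[p + e for p, e in zip(p_row, e_row)]
            for p_row, e_row in zip(prediction, enhancement)]


frame = [[10, 12, 50, 52],
         [14, 16, 54, 56],
         [90, 90, 20, 20],
         [90, 90, 20, 20]]
base, enhancement = encode_layers(frame)
print(base)                                # [[13, 53], [90, 20]]
print(decode(base, enhancement) == frame)  # True
```

A bandwidth-limited or low-resolution receiver can simply ignore the enhancement layer and display `decode(base)`.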

Audio Compression
• Compressed using one of several algorithms, depending on the available bit rate of the transmission channel and the sound quality required, e.g.
  • G.723.1 (CELP) for interactive multimedia applications over the Internet
  • Dolby AC-3 or MPEG audio Layer 2 for interactive TV applications over entertainment networks
