Acknowledgement: Many of the materials are adapted from Prof. Jechang Jeong’s excellent presentation on international coding standards. Computer Vision– Coding Standards Hanyang University Jong-Il Park
Topics to be covered
• International coding standards
• Background and brief history
• Key techniques in
  - JPEG
  - MPEG-1, 2, 4
* Only image/video coding techniques will be covered
Multimedia Everywhere
• Towards Multimedia : Consumer Electronics, Computer, Telecommunication, Broadcasting => Multimedia
Still Picture Compression Standards
• 1980 : ITU-T T.4 : G3 FAX for PSTN
  - Modified Huffman and Modified READ
• 1984 : ITU-T T.6 : G4 FAX for ISDN
  - Modified MR (Modified Modified READ, MMR)
• 1992 : JPEG (ISO/IEC 10918, ITU-T T.81) : Color Still Pictures
  - Used for Color Fax, Electronic Still Camera, Color Printer, Computer Applications, etc.
  - Lossless/Lossy Modes, Baseline/Extended Modes, Progressive/Sequential Modes
  - DPCM + DCT + Q + RLE + Huffman/Arithmetic Codes
  - Motion JPEG can be used for Moving Pictures
• 1993 : JBIG (ISO/IEC 11544, ITU-T T.82) : Bi-level Pictures
  - Improvement on T.4 and T.6
• Recently : JPEG-LS, JBIG2, etc.
Moving Picture Compression Standards
• 1982 : ITU-R BT.601 : Studio Quality PCM Component Video
  - Common to 525/60 and 625/50 Systems
  - 13.5 MHz Sampling, 8 bit/sample, 4:2:2 Format
• 1990 : ITU-T H.261 : Video Phone/Conference Application via ISDN
  - Bitrate = p x 64 kbps, p = 1-30
  - MC DPCM + DCT + Q + RLE + Huffman Codes
  - Reference Model 1 - 8
• 1992 : MPEG-1 Video : DSM Applications (e.g. Video CD)
  - Bitrate = 1.5 Mbps
  - MC DPCM + DCT + Q + RLE + Huffman Codes
  - GOP Structure for Random Access and Error Recovery (I, P, B Frames)
  - Simulation Model 1 - 3
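To put the BT.601 figures next to the MPEG-1 target, the raw studio bit rate can be worked out from the sampling parameters listed above. This is a minimal sketch: the function name is illustrative, and it assumes that in 4:2:2 each of the two color-difference signals is sampled at half the luminance rate (the result is the gross rate, blanking intervals included).

```python
def raw_bitrate_422(luma_hz=13.5e6, bits_per_sample=8):
    """Gross bit rate (bits/s) of 4:2:2 component video."""
    chroma_hz = luma_hz / 2                       # per color-difference signal in 4:2:2
    samples_per_second = luma_hz + 2 * chroma_hz  # Y + two chroma signals
    return samples_per_second * bits_per_sample


bt601 = raw_bitrate_422()
print(bt601 / 1e6, "Mbps raw")            # 216.0 Mbps raw
print(bt601 / 1.5e6, ": 1 versus MPEG-1")  # 144.0 : 1 versus MPEG-1
```

The second line only compares two bit rates; MPEG-1 reaches its 1.5 Mbps target partly by coding a lower-resolution picture, so the ratio is not a pure coding gain.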
Moving Picture Compression Standards (Cont.)
• 1994 : MPEG-2 Video (ISO/IEC 13818-2, ITU-T H.262) : Generic Algorithm for Various Applications (Broadcasting, Communication, Network, DSM, etc.)
  - 5 Profiles of Functionality (Simple, Main, Spatial Scalable, SNR Scalable, High)
  - 4 Levels of Resolution (Low, Main, High-1440, High)
  - Deals with Interlaced Scan as well as Progressive Scan
  - Field/Frame ME & DCT, Dual Prime ME, Intra VLC, Alternate Scan, Nonuniform Q, etc.
• 1993 : ITU-R CMTT.721 : 140 Mbps Contribution Quality Video
  - Adaptive DPCM, Componentwise
• 1993 : ITU-R CMTT.723 : 34-45 Mbps Contribution Quality Video
  - MC DPCM + DCT + Q + RLE + Huffman Codes
Moving Picture Compression Standards (Cont.)
• 1995 : ITU-T H.263 : Videophone via PSTN
  - Bitrate < 64 kbps (V.34 modem = 33.6 kbps, Recent modem = 56 kbps)
  - Improved version of H.261
• 1998 : MPEG-4
  - Bitrates < 2 Mbps
  - Targets : Multimedia database access, Wireless multimedia communication
  - Components of H.263 are incorporated
  - Content-based compression
  - Synthetic and natural video/audio
  - Multiple tools/algorithms/profiles => Flexibility
• 1999 : MPEG-4 Version 2, MPEG-7
Continuous-tone still image
• JPEG (Joint Photographic Experts Group)
  - Applications : color FAX, digital still camera, multimedia computer, internet
• JPEG Standard consists of
  - a lossy baseline coding system
  - an extended coding system for greater compression, higher precision, or progressive reconstruction applications
  - a lossless independent coding system for reversible compression
• References
  - ITU-T Recommendation T.81, “Information Technology - Digital Compression and Coding of Continuous-Tone Still Images - Requirements and Guidelines”, 1992
  - K. R. Rao, J. J. Hwang, “Techniques & Standards for Image, Video & Audio Coding”, Prentice Hall PTR, 1996
Baseline system
• Baseline system : most widely used among JPEG standards
• Data precision
  - 8 bits for input and output
  - 11 bits for quantized DCT coefficients
• Algorithm
  - DCT + quantization + variable length coding
• Compression Guideline
  - 0.25 ~ 0.5 bits/pixel : moderate to good quality, some applications
  - 0.5 ~ 0.75 bits/pixel : good to very good quality, many applications
  - 0.75 ~ 1.5 bits/pixel : excellent quality, most applications
  - 1.5 ~ 2.0 bits/pixel : indistinguishable (visually lossless) quality, most demanding applications
Block diagram of baseline system • Baseline system encoder • Baseline system decoder
Quantization and inverse quantization
• Quantization table
  - No default values for quantization tables
  - Application may specify the tables
  - Q(u, v) : quantization table entry, an integer from 1 to 255
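The quantization step can be sketched as follows: each DCT coefficient F(u, v) is divided by its table entry Q(u, v) and rounded to the nearest integer, and the decoder multiplies back. The function names and the 2x2 table values below are illustrative assumptions (the standard fixes no default table), and a 2x2 excerpt stands in for a full 8x8 block.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(F, Q):
    """FQ(u,v) = round(F(u,v) / Q(u,v))."""
    return [[round_half_away(f / q) for f, q in zip(f_row, q_row)]
            for f_row, q_row in zip(F, Q)]


def dequantize(FQ, Q):
    """Inverse quantization: F'(u,v) = FQ(u,v) * Q(u,v)."""
    return [[fq * q for fq, q in zip(fq_row, q_row)]
            for fq_row, q_row in zip(FQ, Q)]


# Hypothetical 2x2 excerpt of a coefficient block and of a quantization table.
F = [[-415.4, -30.2],
     [4.5, -21.9]]
Q = [[16, 11],
     [12, 12]]

FQ = quantize(F, Q)
print(FQ)                 # [[-26, -3], [0, -2]]
print(dequantize(FQ, Q))  # [[-416, -33], [0, -24]]
```

The difference between F and the dequantized values is the information lost in this stage; larger table entries discard more.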
Example
• f(x,y) => FDCT => F(u,v) => Quant. => FQ(u,v) => Inverse Q & IDCT => r(x,y)
• e(x,y) : presumably the error between f(x,y) and r(x,y)
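The chain in this example can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the slide's actual data: it uses a direct (slow) 8x8 DCT, a flat quantization step q = 16 instead of a real table, a synthetic gradient block as f(x,y), and it takes e(x,y) to be f(x,y) minus the reconstruction r(x,y).

```python
import math

N = 8  # JPEG block size; the 0.25 factor below is specific to N = 8


def _c(k):
    """DCT normalization factor: 1/sqrt(2) for k = 0, otherwise 1."""
    return math.sqrt(0.5) if k == 0 else 1.0


def _basis(a, k):
    return math.cos((2 * a + 1) * k * math.pi / (2 * N))


def fdct(f):
    """Forward 8x8 DCT: spatial block f(x,y) -> coefficients F(u,v)."""
    return [[0.25 * _c(u) * _c(v) * sum(f[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct(F):
    """Inverse 8x8 DCT: coefficients F(u,v) -> spatial block."""
    return [[0.25 * sum(_c(u) * _c(v) * F[u][v] * _basis(x, u) * _basis(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def round_trip(f, q=16):
    """Level shift, FDCT, quantize with a flat step q, then invert each step."""
    shifted = [[p - 128 for p in row] for row in f]
    F = fdct(shifted)
    FQ = [[math.floor(c / q + 0.5) for c in row] for row in F]
    F_hat = [[c * q for c in row] for row in FQ]
    r = [[min(255, max(0, math.floor(p + 128 + 0.5))) for p in row]
         for row in idct(F_hat)]
    e = [[f[x][y] - r[x][y] for y in range(N)] for x in range(N)]
    return FQ, r, e


# Synthetic smooth block (an assumption; any 8x8 block of 8-bit samples works).
f = [[100 + 2 * x + 3 * y for y in range(N)] for x in range(N)]
FQ, r, e = round_trip(f)
print("nonzero quantized coefficients:", sum(c != 0 for row in FQ for c in row))
print("largest absolute error:", max(abs(v) for row in e for v in row))
```

For a smooth block like this one, most quantized coefficients come out as zero, which is what the entropy coding stage exploits.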
Entropy coding
• DC Coefficient Coding
  - Differential Coding : DC coefficients of adjacent blocks are strongly correlated
  - VLC (Huffman Coding)
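Differential DC coding can be sketched as below: each block's DC value is predicted from the previous block's DC value, and only the difference is passed to the VLC stage. The sketch assumes the predictor starts at 0 and that the VLC is chosen by the size category of the difference (the number of bits needed for its magnitude); function names and sample values are illustrative.

```python
def dc_differences(dc_values):
    """DIFF_i = DC_i - DC_(i-1), with the predictor initialized to 0."""
    prev, diffs = 0, []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs


def size_category(diff):
    """Number of bits needed for |diff|: 0 -> 0, +-1 -> 1, +-2..3 -> 2, ..."""
    return abs(diff).bit_length()


def dc_reconstruct(diffs):
    """Decoder side: accumulate the differences to recover the DC values."""
    prev, values = 0, []
    for d in diffs:
        prev += d
        values.append(prev)
    return values


dc = [52, 55, 54, 54, 60]                 # hypothetical quantized DC values
diffs = dc_differences(dc)
print(diffs)                              # [52, 3, -1, 0, 6]
print([size_category(d) for d in diffs])  # [6, 2, 1, 0, 3]
```

Because neighboring DC values are close, most differences fall into small categories, which receive short codes.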
Entropy coding (Cont.)
• AC Coefficient Coding
  - Zigzag Scanning
  - VLC (Variable Length Coding, Huffman Coding)
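A minimal sketch of the AC path: the quantized block is read in zigzag order, and the AC coefficients are turned into (zero run, value) pairs that the Huffman coder would then encode. Function names and the sample block are illustrative; the special code for runs of 16 zeros used by baseline JPEG is omitted for brevity, so this is not a complete encoder.

```python
EOB = (0, 0)  # end-of-block marker: only zeros remain


def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def run_length_ac(block):
    """Turn the AC coefficients into (zero_run, value) pairs."""
    scanned = [block[r][c] for r, c in zigzag_order(len(block))]
    pairs, run = [], 0
    for value in scanned[1:]:          # index 0 is the DC coefficient
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    if run > 0:                        # trailing zeros collapse into EOB
        pairs.append(EOB)
    return pairs


# Hypothetical quantized block: a few low-frequency values, zeros elsewhere.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = -26, -3, -3
block[1][1], block[0][2] = -2, -6

print(zigzag_order()[:6])    # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(run_length_ac(block))  # [(0, -3), (1, -3), (0, -2), (0, -6), (0, 0)]
```

The zigzag scan visits low frequencies first, so the many zero-valued high-frequency coefficients end up together at the tail and are covered by a single end-of-block marker.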
E.g. JPEG Compression
• Original image (24 bpp)
• JPEG compressed image (8:1 -- 3 bpp)
• JPEG compressed image (32:1 -- 0.75 bpp)
• JPEG compressed image (128:1 -- 0.1875 bpp)
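The bits-per-pixel figures in these captions follow directly from the compression ratio and the 24 bpp source. A short check, with illustrative function names and a hypothetical 512 x 512 image for the size estimate:

```python
def bits_per_pixel(ratio, source_bpp=24):
    """Average bits per pixel after compressing a source_bpp image by ratio:1."""
    return source_bpp / ratio


def compressed_bytes(width, height, bpp):
    """Approximate compressed size in bytes (headers ignored)."""
    return width * height * bpp / 8


for ratio in (8, 32, 128):
    print(f"{ratio}:1 -> {bits_per_pixel(ratio)} bpp")
# 8:1 -> 3.0 bpp
# 32:1 -> 0.75 bpp
# 128:1 -> 0.1875 bpp

print(compressed_bytes(512, 512, bits_per_pixel(32)))  # 24576.0
```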
MPEG Digital Video Technology
• MPEG-1 (ISO/IEC 11172) and MPEG-2 (ISO/IEC 13818)
• Applications :
  - MPEG-1 : Digital Storage Media (CD-ROM…)
  - MPEG-2 : Higher bit rates and broader generic applications (Consumer electronics, Telecommunications, Digital Broadcasting, HDTV, DVD, VOD, etc.)
• Coding scheme :
  - Spatial redundancy : DCT + Quantization
  - Temporal redundancy : Motion estimation and compensation
  - Statistical redundancy : VLC
• References :
  - ISO/IEC 11172-2 (MPEG-1), ISO/IEC 13818-2 (MPEG-2)
  - K. R. Rao, J. J. Hwang, “Techniques & Standards for Image, Video & Audio Coding”, Prentice Hall PTR, 1996
MPEG Overview
• MPEG :
  - Moving Picture Experts Group
  - Specifies a standard compression, transmission, and decompression scheme for video and audio
  - ISO/IEC 11172 : MPEG-1
  - ISO/IEC 13818 : MPEG-2
• The first three parts of each standard are :
  - Part 1 : Systems
  - Part 2 : Video
  - Part 3 : Audio
MPEG compression of video • How to remove spectral, spatial, temporal, and statistical redundancy?
Removing spatial redundancy
• Pixel Coding using the DCT
• As human eyes are insensitive to high-frequency color changes, the R, G, B signal is converted into a luminance signal (Y) and two color difference signals (U, V). More redundancy can be removed from U and V than from Y.
• The top left DCT component is taken as the DC value for the block.
• DCT coefficients to the right are increasingly higher horizontal spatial frequencies.
• DCT coefficients below are increasingly higher vertical spatial frequencies.
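The color conversion and the extra reduction applied to the color difference signals can be sketched as follows. This is an assumption-laden illustration: it uses the BT.601 luminance weights with the Cb/Cr scaling factors (0.564 and 0.713) in place of the slide's U and V, omits the +128 offset used for 8-bit storage, and subsamples chrominance by averaging 2x2 neighborhoods, which is one common choice among several.

```python
def rgb_to_ycbcr(r, g, b):
    """Luminance plus two scaled color differences (no +128 offset)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    return y, cb, cr


def subsample_2x2(plane):
    """Halve a plane in both directions by averaging each 2x2 group."""
    return [[(plane[i][j] + plane[i][j + 1]
              + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, len(plane[0]), 2)]
            for i in range(0, len(plane), 2)]


# A gray pixel carries no color difference; a pure red one does.
print(rgb_to_ycbcr(128, 128, 128))
print(rgb_to_ycbcr(255, 0, 0))

# Subsampling keeps one chroma sample per four pixels.
print(subsample_2x2([[1, 3], [5, 7]]))  # [[4.0]]
```

With both chroma planes subsampled this way, the sample count drops from 3 per pixel to 1.5 per pixel before any transform coding takes place.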
Temporal redundancy
• Inter-frame prediction & motion estimation
• Coding only the motion vectors and the prediction error, instead of each full frame, greatly reduces the overall bit rate.
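Motion estimation is commonly done by block matching. The sketch below is one simple variant, assumed for illustration rather than taken from the slides: an exhaustive (full) search over a small window using the sum of absolute differences (SAD) as the matching cost. The block size, search range, and test frames are all hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block of cur and a shifted block of ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(n) for x in range(n))


def full_search(cur, ref, bx, by, n=4, search=2):
    """Return (dx, dy, cost) of the best match for the n x n block at (bx, by)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= h
                    and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Reference frame with a non-repeating pattern; the current frame is the
# same picture moved one pixel to the right.
ref = [[x * x + 3 * y * y + x * y for x in range(8)] for y in range(8)]
cur = [[row[max(x - 1, 0)] for x in range(8)] for row in ref]

print(full_search(cur, ref, bx=2, by=2))  # (-1, 0, 0)
```

The vector (-1, 0) says the block came from one pixel to the left in the reference frame; with a cost of 0 the prediction error is empty, so only the vector needs to be coded.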
Putting it all together
• I, P, B Frames
• Intra (I) frames contain full picture information and are coded without reference to other frames.
• Predicted (P) frames are predicted from past I or P frames.
• Bi-directionally predicted (B) frames offer the greatest compression and use past and future I and P frames for motion compensation.
Building the elementary stream • This slide shows how the actual blocks, slices, frames etc. are all put together to form the elementary stream • Along with the actual picture data, header information is required to reconstruct the I, B, P frames. This header structure is shown. • The next stage is to take this ES and convert it into something that can be transmitted and decoded at the other end.
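As a rough illustration of how a decoder finds these headers, the layers of an MPEG-1/2 video elementary stream begin with byte-aligned start codes: the prefix 0x000001 followed by one identifying byte. The sketch below scans a byte string for them. The function name and the sample bytes are hypothetical, only a few of the code values defined in ISO/IEC 11172-2 and 13818-2 are named, and no header fields are parsed.

```python
PREFIX = b"\x00\x00\x01"

# A subset of the start code values; everything else is reported as "other".
START_CODE_NAMES = {
    0x00: "picture",
    0xB3: "sequence_header",
    0xB7: "sequence_end",
    0xB8: "group_of_pictures",
}


def scan_start_codes(data):
    """Return (offset, layer name) for each start code found in data."""
    found = []
    i = data.find(PREFIX)
    while i != -1 and i + 3 < len(data):
        code = data[i + 3]
        if 0x01 <= code <= 0xAF:       # slice start codes occupy this range
            name = "slice"
        else:
            name = START_CODE_NAMES.get(code, "other")
        found.append((i, name))
        i = data.find(PREFIX, i + 3)
    return found


# Hypothetical fragment: headers only, with dummy payload bytes AA and BB.
stream = bytes.fromhex("000001B3" "AA" "000001B8" "00000100" "BB"
                       "00000101" "000001B7")
for offset, name in scan_start_codes(stream):
    print(offset, name)
```

The nesting visible in the output (sequence, group of pictures, picture, slice) mirrors the hierarchy of the elementary stream; macroblocks and blocks sit inside each slice without start codes of their own.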
Ordering frames • Frame Reordering
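Frame reordering is needed because a B frame cannot be decoded until the future I or P frame it refers to has arrived, so the encoder transmits that reference frame first. A minimal sketch, with an illustrative function name and frames represented as strings made of a type letter and a display index:

```python
def display_to_coded(frames):
    """Reorder frames from display order to coded (transmission) order."""
    coded, pending_b = [], []
    for frame in frames:
        if frame.startswith("B"):
            pending_b.append(frame)      # hold until the next I or P frame
        else:
            coded.append(frame)          # the reference frame goes first...
            coded.extend(pending_b)      # ...then the B frames that need it
            pending_b = []
    coded.extend(pending_b)              # B frames with no later reference
    return coded


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(display_to_coded(display))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder reverses the process, holding each decoded reference frame until the B frames that precede it in display order have been shown.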
MPEG-4
• MPEG-4 (ISO/IEC 14496)
• Applications :
  - Internet Multimedia
  - Wireless Multimedia Communication
  - Multimedia Contents for Computers and Consumer Electronics
  - Interactive Digital TV
• Coding scheme :
  - Spatial redundancy : DCT + Quantization, Wavelet Transform
  - Temporal redundancy : Motion estimation and compensation
  - Statistical redundancy : VLC (Huffman Coding, Arithmetic Coding)
  - Shape Coding : Context-based Arithmetic Coding
• References :
  - ISO/IEC 14496