MPEG: The International Compression Standard for Moving Pictures

Fall 2005

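1 Overview • MPEG stands for Moving Picture Experts Group, a global body that develops platform-independent standards for video compression. • MPEG's work began in 1988. • JPEG and MPEG are both expert groups operating under ISO, and their memberships overlap considerably. JPEG concentrates on still-image compression, whereas MPEG targets the compression of moving pictures; the two problems are closely related.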

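MPEG • The International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) jointly established ISO/IEC JTC1/SC29/WG11, which is responsible for developing standards for the coding, decoding, and synchronization of television picture data and sound data. • The main MPEG standards are MPEG-1, MPEG-2, MPEG-4, and MPEG-7, which is still being developed.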

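How an MPEG standard document is created • Working Draft (WD): the working document prepared by a Working Group (WG). • Committee Draft (CD): a document promoted from the WD prepared by the WG. This is the first form of an ISO document; it is formally examined and voted on within ISO. • Draft International Standard (DIS): a document promoted from the CD once the voting member countries are satisfied with the CD's content and explanations. • International Standard (IS): the document published after approval by vote of the voting member countries, other ISO departments, and other committees.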

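MPEG-1, the group's first result, was released in 1992 and is the basis of VCD. Because of its limited resolution of 352×288 pixels, MPEG-1 is suited only to home use, and both the video quality and the data rate it achieves are fairly low. • MPEG-2 was released in 1995. Its 720×576 pixels and higher resolutions greatly improved video quality. • MPEG-4 was published in December 1999. • MPEG-7 is a standard for a multimedia content description interface. Since MPEG was founded, its tasks and direction have changed considerably. MPEG-1 and MPEG-2 are mature coding standards; current attention centers on MPEG-4 and MPEG-7.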

The MPEG family • MPEG-1: ISO/IEC 11172 • MPEG-2: ISO/IEC 13818 • MPEG-4: ISO/IEC 14496 • MPEG-7: ISO/IEC 15938 • MPEG-21: ISO/IEC 21000

Components • video coding • audio coding • system definition, which describes the combination of individual data streams into a common stream.

2 Video Coding • An image must consist of three components: • luminance Y • two color difference signals Cr and Cb • color subsampling • 14 different pixel aspect ratios, for example: • 1:1 • 16:9 • 4:3

refresh frequency • 23.976Hz, 24Hz, 25Hz, 29.97Hz, 30Hz, 50Hz, 59.94Hz, and 60Hz • An MPEG macro block is partitioned into 16×16 pixels for the luminance component and 8×8 pixels for each of the two chrominance components. • A macro block is formed of six blocks of 8×8 pixels: first four blocks for the luminance component then the two chrominance blocks.

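Macro blocks • The key to high compression is removing as much redundancy as possible. For still images, the MPEG and JPEG algorithms are almost identical. The image is first converted to YUV space. The Y component is divided into 16×16 blocks and the U and V components into 8×8 blocks; each 16×16 luminance block is then split into four 8×8 blocks, so that the DCT can be applied to 8×8 blocks. • A unit consisting of 16×16 pixels of luminance information and two 8×8 blocks of chrominance information is called a macro block. A still image is made up of many such macro blocks. An NTSC image with a resolution of 352×240 consists of 22×15 = 330 macro blocks; a PAL image with a resolution of 352×288 consists of 22×18 = 396 macro blocks.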
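The macro block counts quoted above follow from dividing each picture dimension by 16. A minimal sketch of that arithmetic; the function name `macro_block_grid` and the round-up at the picture edges are illustrative assumptions, not part of any MPEG API:

```python
def macro_block_grid(width, height, mb_size=16):
    """Return (columns, rows, total) of macro blocks covering a picture."""
    # Round up so that a partial block at the right or bottom edge still counts.
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return cols, rows, cols * rows


for name, (w, h) in {"NTSC": (352, 240), "PAL": (352, 288)}.items():
    cols, rows, total = macro_block_grid(w, h)
    print(f"{name} {w}x{h}: {cols} x {rows} = {total} macro blocks")
```

For the two resolutions named on the slide this gives 22 x 15 = 330 and 22 x 18 = 396 macro blocks.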

Composition of a macro block

Efficient coding • exploits the temporal redundancies of successive images • Random access • requires images to be coded individually • MPEG supports four types of image coding: • I • P • B • D

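I frame (intra frame): a picture in which the whole image is coded in the manner of JPEG. It is an independent frame whose information is determined by its own picture alone; it is produced without reference to other pictures and serves as the reference for P and B pictures. • P picture (forward-predicted picture, Predicted Picture): coded with motion compensation relative to the preceding I or P picture. • B picture (Bidirectional Prediction): coded with bidirectional motion compensation relative to the preceding and the following I or P picture.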

I frames (intra coded pictures) • coded without using information about other frames (intraframe coding). • An I frame is treated as a still image. Here MPEG falls back on the results of JPEG. Unlike JPEG, real-time compression must be possible. The compression rate is thus the lowest within MPEG. • I frames form the anchors for random access.

I frames are encoded as in JPEG. • A DCT is applied to the 8×8 blocks defined within the macro blocks. • The DC-coefficients are then DPCM coded: the differences between consecutive blocks of each component are calculated and transformed into variable-length code words. • AC-coefficients are run-length encoded and then transformed into variable-length code words. • MPEG distinguishes two types of macro blocks: • those that contain only coded data • those that additionally contain a parameter used for scaling the characteristic curve used for subsequent quantization.
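The two coefficient-coding steps named above can be sketched as follows. This is a simplified illustration, not the bitstream syntax of the standard: the initial predictor of 0, the `"EOB"` end-of-block marker, and the function names are assumptions, and the final mapping to variable-length code words is left out.

```python
def dpcm_encode(dc_values, predictor=0):
    """Replace each DC coefficient by its difference from the previous one."""
    diffs = []
    prev = predictor  # assumed start value; the real standard defines its own
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs


def dpcm_decode(diffs, predictor=0):
    """Invert dpcm_encode by accumulating the differences."""
    values = []
    prev = predictor
    for d in diffs:
        prev += d
        values.append(prev)
    return values


def run_length_encode(ac_coeffs):
    """Turn AC coefficients (already in scan order) into (zero_run, level) pairs."""
    pairs = []
    run = 0
    for c in ac_coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs


dc = [52, 55, 53, 53]                   # DC coefficients of consecutive blocks
ac = [5, 0, 0, -3, 0, 1, 0, 0, 0]       # AC coefficients of one block
print(dpcm_encode(dc))
print(run_length_encode(ac))
```

Because neighbouring blocks tend to have similar DC values, the DPCM differences are small numbers, and the long zero runs among the quantized AC coefficients collapse into a few pairs; both properties are what make the subsequent variable-length coding effective.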

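I frames use intraframe coding: they exploit only the spatial correlation within a single picture and make no use of temporal correlation. Because an I frame does not depend on other frames, it is an entry point for random access and also a reference frame for decoding. • I frames are used mainly for receiver initialization and channel acquisition, and for program switching and insertion. Their compression ratio is relatively low. • I frames appear periodically in the picture sequence; the encoder chooses how often.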

P frames (predictive coded pictures) • require information about previous I and/or P frames for encoding and decoding. • Decoding a P frame requires decompression of the last I frame and any intervening P frames. • The compression ratio is considerably higher than for I frames. • A P frame allows the following P frame to be accessed if there are no intervening I frames.

The most similar macro block in the preceding image must be determined. • MPEG does not specify an algorithm for motion estimation, but rather specifies the coding of the result. • The motion vector (the spatial displacement between the two macro blocks) and the small difference between the macro blocks need to be encoded. • The search range, that is, the maximum length of the motion vector, is not defined by the standard. As the search range is increased, the motion estimation becomes better, although the computation becomes slower.
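Since MPEG leaves the search algorithm open, the following is only one possible choice: an exhaustive block-matching search that minimizes the sum of absolute differences (SAD). The function names, the SAD criterion, the default search range of 7, and the convention that the vector points from the current block to its match in the reference image are all assumptions made for this sketch.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between an n x n block in cur and one in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
    return total


def full_search(cur, ref, cx, cy, n=16, search_range=7):
    """Return (dx, dy, cost) of the best match for the block at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best = None  # (cost, dx, dy)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that would fall outside the reference image.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


# Synthetic test pattern: the current frame is the reference frame
# shifted 2 pixels to the right and 1 pixel down.
SIZE = 16
ref = [[(x * 7 + y * 13) % 251 for x in range(SIZE)] for y in range(SIZE)]
cur = [[((x - 2) * 7 + (y - 1) * 13) % 251 for x in range(SIZE)] for y in range(SIZE)]

print(full_search(cur, ref, cx=8, cy=8, n=4, search_range=3))
```

On this synthetic pattern the search recovers the displacement (-2, -1) with a residual of zero. The cost grows with the square of `search_range`, which is the trade-off between estimation quality and computation time mentioned above.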

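Motion compensation • Motion compensation is one of the most widely used techniques in current video compression. The moving parts of adjacent pictures in a frame sequence are continuous: the image in the current picture can be regarded as a displacement of the picture at some earlier instant, and the magnitude and direction of the displacement may differ from place to place in the picture. • Motion compensation works at the macro block level, mainly to remove the temporal redundancy of predicted and interpolated pictures and so raise the compression ratio. It is a form of prediction performed not per pixel but per 16×16 image block. • Motion compensation treats the current block as a displaced copy of an image block from some earlier instant; the displacement (motion vector) comprises the direction and the magnitude of the motion.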

Diagram of macro block prediction and motion compensation

[Figures: block motion estimation; Tennis video sequence, frame 0 and frame 1; frame difference; motion vectors from motion estimation]

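A P picture is assembled by copying "quasi macro blocks" from the I picture. The boundaries of a "quasi macro block" do not coincide with the 16×16 macro block grid of the I picture; it is a similar block found somewhere in the I picture. This copying process is called "motion". Because P lies in the future of I, it is called "forward prediction". • A similar block that has been copied over does not match the true P picture exactly and must be corrected; this process is motion compensation. After "compensation", the P picture differs very little from the original uncompressed image.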

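For each 16×16 block, the motion vector and the prediction error must be coded and transmitted so that the decoder can use them to reconstruct the picture. • Macro blocks in different regions may have different motion vectors. The range from which a motion vector is chosen depends on the temporal resolution between frames and of the image within the block, and on the nature of the frame sequence. For example, when the picture content of two 16×16 macro blocks is completely static, the motion vector of the macro block is zero (its coordinates have not changed).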

P frames can consist of macro blocks as in I frames, as well as six different predictive macro blocks. • In coding P-frame-specific macro blocks • differences between macro blocks as well as the motion vector need to be considered. • The difference values between all six 8×8 pixel blocks of a macro block being coded and the best matching macro block are transformed using a two-dimensional DCT.

Further data reduction is achieved by not further processing blocks where all DCT coefficients are zero. This is coded by inserting a six-bit value into the encoded data stream. • Otherwise, the DC- and AC-coefficients are then encoded using the same technique. • Next, run-length encoding is applied and a variable length coding is determined according to an algorithm similar to Huffman. • motion vectors of adjacent macro blocks are DPCM coded. The result is again transformed into variable-length coded words using a table.

B frames • B frames(bidirectionally predictive coded pictures) require information from previous and following I and/or P frames. • B frames yield the highest compression ratio attainable in MPEG. • A B frame is defined as the difference from a prediction based on a previous and a following I or P frame. • It cannot ever serve as a reference for prediction coding of other pictures.

A macro block can be derived from macroblocks of previous and following P and/or I frames. • a prediction can interpolate two similar macro blocks. • two motion vectors are encoded • one difference block is determined between the macro block to be encoded and the interpolated macro block. • Subsequent quantization and entropy encoding are performed as for P-frame-specific macro blocks. • B frames need not be stored in the decoder.
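The interpolation step for a B-frame macro block can be sketched as follows: the prediction is the average of the two matched blocks, and only the difference from that prediction is passed on to quantization. The rounding rule `(a + b + 1) // 2` and the function names are assumptions made for this illustration; the 2×2 blocks stand in for real 16×16 macro blocks.

```python
def interpolate(prev_block, next_block):
    """Average two matched blocks sample by sample (rounding half up)."""
    return [
        [(a + b + 1) // 2 for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_block, next_block)
    ]


def difference_block(current, prediction):
    """Residual that remains to be transformed, quantized, and entropy coded."""
    return [
        [c - p for c, p in zip(row_cur, row_pred)]
        for row_cur, row_pred in zip(current, prediction)
    ]


prev_match = [[10, 20], [30, 40]]   # block found via the forward motion vector
next_match = [[20, 20], [31, 44]]   # block found via the backward motion vector
current = [[16, 20], [30, 42]]      # block of the B frame being coded

prediction = interpolate(prev_match, next_match)
print(prediction)
print(difference_block(current, prediction))
```

When the current block lies between its two references in appearance, the residual is close to zero, which is consistent with B frames reaching the highest compression ratio.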

D frames • D frames (DC coded pictures) are intraframe-coded and can be used for efficient fast forward. • During the DCT, only the DC-coefficients are coded; the AC coefficients are ignored.

D frames contain only the low-frequency components of an image. • A D-frame always consists of one type of macro block and only the DC-coefficients of the DCT are coded. • D frames are used for fast-forward display. This could also be realized by a suitable placement of I frames.

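P frames and B frames use interframe coding: they exploit both spatial and temporal correlation. • P frames use forward temporal prediction only, which improves compression efficiency and picture quality. A P frame may contain intra-coded parts: each macro block in a P frame may be either forward predicted or intra coded. • B frames use bidirectional temporal prediction, which greatly increases the compression ratio. Because B frames use future frames as references, the transmission order of the frames in an MPEG-1 bitstream differs from the display order. • In terms of compression, I pictures compress least. P pictures store only the error signal between the current frame and the reference frame and therefore compress considerably more. B pictures compress most, which is also why B frames cannot serve as prediction references.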

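MPEG frame sequences • MPEG achieves high compression by eliminating the temporal redundancy between consecutive frames. However violent the action on screen, the difference between two consecutive frames is usually small. JPEG compresses only the information in a single image, so it is MPEG that has to deal with temporal redundancy. • Fundamentally, this is differential coding. The sender first transmits a base frame, then encodes the differences of the subsequent frames, compresses them, and transmits them. The receiver can reconstruct all frames from the first base frame and the received differences.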
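The differential coding idea described above can be sketched in a few lines. This is a lossless toy version under simplifying assumptions: frames are flat lists of pixel values, each difference is taken against the immediately preceding frame, and no motion compensation, transform, or quantization is applied. The function names are illustrative.

```python
def encode_differential(frames):
    """Return the first frame unchanged plus one difference list per later frame."""
    base = list(frames[0])
    diffs = []
    prev = base
    for frame in frames[1:]:
        diffs.append([cur - old for cur, old in zip(frame, prev)])
        prev = frame
    return base, diffs


def decode_differential(base, diffs):
    """Rebuild every frame by adding each difference to the previous frame."""
    frames = [list(base)]
    for diff in diffs:
        frames.append([old + d for old, d in zip(frames[-1], diff)])
    return frames


frames = [
    [10, 10, 10, 10],
    [10, 12, 10, 10],
    [10, 12, 13, 10],
]
base, diffs = encode_differential(frames)
print(diffs)
print(decode_differential(base, diffs) == frames)
```

Most entries of each difference list are zero, which is what makes the differences cheaper to code than the frames themselves.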

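MPEG extends this idea, although MPEG is of course more complex. • Computing the difference between the current frame and the previous frame is very effective for objects that move within the field of view, because those objects are already present in the previous frame. • It does not work for content that is absent from the previous frame. An entirely new scene, for example, cannot be compressed this way: the difference between the old and the new scene is large, and the new scene will probably have to be sent in full.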

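In what pattern should the different frame types be arranged in a frame sequence? • I frames must appear periodically in every frame sequence. Differential coding works when frames differ only slightly, but frames stay close to a fixed frame only for a relatively short time; once new objects appear, the scene changes. This includes objects hidden behind a moving object: when a person moves through a scene, objects that were hidden behind the person in an earlier frame appear in later frames. Letting I frames appear periodically ensures that differences are computed relative to a recent scene and stops errors from propagating.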

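How are P frames and B frames reconstructed from other frames? • The order in which frames are displayed is not the order in which they are transmitted. The P frame is transmitted before the first two B frames, and the second I frame is transmitted before the last two B frames. The P frame and the two I frames can then be buffered, so that the B frames received next can be decoded at the viewer's end.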
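The reordering described above can be sketched as a small function. The display sequence I B B P B B I is an assumption inferred from the description (two I frames, one P frame, and two pairs of B frames); the function name and the frame labels are illustrative.

```python
def transmission_order(display_order):
    """Reorder frame types from display order to transmission order.

    Each reference frame (I or P) is sent before the B frames that precede
    it in display order, because those B frames need it for decoding.
    """
    sent = []
    waiting_b = []  # B frames held back until their next reference is sent
    for index, frame_type in enumerate(display_order):
        label = f"{frame_type}{index}"
        if frame_type == "B":
            waiting_b.append(label)
        else:
            sent.append(label)
            sent.extend(waiting_b)
            waiting_b = []
    # Trailing B frames without a following reference are sent as they are.
    sent.extend(waiting_b)
    return sent


print(transmission_order("IBBPBBI"))
```

The number after each letter is the display position, so the printed list shows directly that P3 travels ahead of B1 and B2, and I6 ahead of B4 and B5.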

Quantization • AC-coefficients of B and P frames are usually very large values, whereas those of I frames are very small. • MPEG quantization adjusts itself accordingly. • If the data rate increases too much, quantization becomes more coarse. • If the data rate falls, then quantization is performed with finer granularity.
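The feedback between data rate and quantization described above can be sketched as a simple controller. Everything here is hypothetical: the standard does not prescribe a rate-control algorithm, and the function name, the fixed step, and the comparison against a per-picture bit budget are assumptions. The bounds 1 to 31 assume a 5-bit quantizer scale.

```python
def adjust_quantizer_scale(scale, bits_produced, target_bits, step=1, lo=1, hi=31):
    """Return the quantizer scale to use for the next picture.

    A larger scale means coarser quantization and fewer bits.
    """
    if bits_produced > target_bits:
        scale += step      # data rate too high: quantize more coarsely
    elif bits_produced < target_bits:
        scale -= step      # data rate has fallen: quantize more finely
    return max(lo, min(hi, scale))


scale = 8
for produced in (1200, 1500, 900, 1000):   # bits spent on successive pictures
    scale = adjust_quantizer_scale(scale, produced, target_bits=1000)
    print(produced, "->", scale)
```

A real encoder would typically react to buffer fullness rather than to a single picture, but the direction of the adjustment is the same.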

3 Audio Coding • MPEG audio coding is compatible with the coding of audio data used for Compact Disc Digital Audio (CD-DA) and Digital Audio Tape (DAT). • The most important criterion is the choice of a sample rate of 44.1 kHz or 48 kHz (additionally 32 kHz) at 16 bits per sample value. Each audio signal is compressed to either 64, 96, 128, or 192 Kbit/s.
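To put the target bit rates in perspective, the uncompressed rate of one audio signal can be computed from the sample rate and the sample size. A small sketch of that arithmetic; it assumes one channel per signal and takes 1 Kbit as 1000 bits, and the function name is illustrative.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample=16, channels=1):
    """Uncompressed PCM bit rate in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


for rate in (32000, 44100, 48000):
    raw = pcm_bit_rate(rate)
    for target in (64000, 96000, 128000, 192000):
        print(f"{rate} Hz: {raw} bit/s -> {target} bit/s, ratio {raw / target:.2f}:1")
```

At 44.1 kHz, for example, one channel carries 705600 bit/s before compression, so a target of 128 Kbit/s corresponds to a ratio of roughly 5.5:1.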

Three quality levels (layers) are defined with different encoding and decoding complexity. • An implementation of a higher layer must be able to decode the MPEG audio signals of lower layers • FFT is applied for audio, and the spectrum is divided into 32 nonoverlapping subbands • noise level in each subband is determined using a psychoacoustic model.

In the first and second layers, the appropriately quantized spectral components are simply PCM-encoded. • The third layer additionally performs Huffman coding. • MPEG provides for two types of stereo sound. • Two channels are processed completely independently. • In the joint stereo mode, MPEG achieves a higher compression ratio by exploiting redundancies between the two channels

The minimal value is always 32Kbit/s. • The layers support different maximal bit rates: • layer 1 allows for a maximum of 448Kbit/s • layer 2 for 384Kbit/s • layer 3 for 320Kbit/s. • For layers 1 and 2, not all combinations of bit rate and mode are allowed, and a decoder is not required to support a variable bit rate. • In layer 3, a variable bit rate is specified by allowing the bit rate index to be switched.

4 Data Streams • An audio stream is composed of frames, which are made up of audio access units, which in turn are divided into slots. • An audio access unit is the smallest compressed audio sequence that can be completely decoded independently of all other data.

Video Stream • A video stream is comprised of 6 layers: • sequence layer • the beginning of the sequence layer includes two entries: the constant bit rate of the sequence and the minimum storage capacity required during decoding. • A video buffer verifier influences the quantizer and forms a type of control loop. • group of pictures layer • This layer contains at least an I frame, which must be one of the first images. • the difference between decoding order and display order

picture layer • contains a whole still image. • image number. • slice layer • Each slice consists of macro blocks • A slice also includes the scaling used for DCT quantization of all its macro blocks. • macro block layer • block layer

System Definition • specifies the combination of audio and video data streams • the coordination of input data streams with output data streams, clock adjustment, and buffer management. • One could define a protocol to supply the header upon request. • MPEG does not prescribe compression in real-time. • MPEG defines the decoding process but not the decoder itself.
