MPEG • MPEG-Video: This deals with the compression of video signals to about 1.5 Mbits/s. • MPEG-Audio: This deals with the compression of digital audio signals at rates of 64, 128 or 192 kbits/s per channel. • MPEG-System: This deals with synchronisation and multiplexing of multiple compressed audio and video bit streams.
MPEG Versions • MPEG-1 for CD-ROM. • MPEG-2 for broadcast quality. • Some other MPEG standards (e.g. MPEG-7) are not compression systems. • We will be talking about MPEG-2.
Data and Compression rates • MPEG-1 video has data rates up to 1.5 Mbits/s. • MPEG-2 video has data rates of 3 to 15 Mbits/s for broadcast and 15 to 30 Mbits/s for high definition. • The commercial uncompressed digital video data stream (SDI) has a data rate of 270 Mbits/s, although this includes audio and provision for 10-bit video coding.
Data and Compression rates • However, even if we consider transmitting only monochrome television pictures of size 576 x 720 at 25 frames per second and 8 bits resolution, we have a data rate of 576 x 720 x 25 x 8 = 82.944 Mbits/s. • We have to double this (at least) for colour, giving nearly 166 Mbits/s. • Therefore MPEG can give 100:1 compression or more.
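The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the variable and function names are illustrative, and the factor of two for colour is the slide's own lower-bound estimate rather than an exact figure.

```python
# Raw data rate of the uncompressed picture described on the slide.
WIDTH, HEIGHT = 720, 576       # pixels per line, lines per frame
FRAME_RATE = 25                # frames per second
BITS_PER_SAMPLE = 8            # bits of resolution per pixel

mono_bps = WIDTH * HEIGHT * FRAME_RATE * BITS_PER_SAMPLE
colour_bps = 2 * mono_bps      # "at least double" for colour (the slide's estimate)


def compression_ratio(raw_bps, coded_bps):
    """Ratio of the raw data rate to the coded data rate."""
    return raw_bps / coded_bps


print(f"Monochrome: {mono_bps / 1e6:.3f} Mbits/s")
print(f"Colour:     {colour_bps / 1e6:.3f} Mbits/s")
print(f"Ratio at 1.5 Mbits/s: {compression_ratio(colour_bps, 1.5e6):.1f}:1")
```

At the 1.5 Mbits/s rate quoted for MPEG-1, the ratio comes out slightly above 100:1, which is consistent with the slide's claim.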
Spatial and temporal redundancy • MPEG makes use of temporal and spatial redundancy. • Temporal redundancy means that we are unnecessarily transmitting the same information (data) over time. • E.g. backgrounds do not need to be sent every frame.
Spatial and temporal redundancy • Spatial redundancy means we are unnecessarily transmitting detail information (spatial information) which cannot be perceived by the eye. • Removing this is what JPEG does on still images. • By not carrying this unnecessary (redundant) information we can achieve compression. • Note that while the spatial compression is lossy, the temporal compression is not.
Comparison with JPEG • MPEG uses the same spatial compression method as JPEG. • The temporal compression uses other techniques.
Overview of MPEG • MPEG takes incoming frames and produces a spatially compressed image. • MPEG also predicts motion in the scene and estimates where blocks of pixels have moved to in another frame. • MPEG can then transmit vector (or motion) information only to predict the next frame. • However since the prediction can be inaccurate MPEG also transmits an error picture (spatially compressed) with the vector predictions.
Prediction and macroblocks • MPEG divides each frame into blocks of size 16 x 16 pixels called macroblocks. • The idea is to find which block in the predicted frame the pixels in the reference frame have moved to. • This can be done by comparing each macroblock in the reference frame with possible positions in the predicted frame and finding the closest match. • We then send a prediction "vector" which describes the movement of each block.
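The search described on this slide can be sketched as an exhaustive block match. The slide does not say how "closest match" is measured, so this sketch assumes the sum of absolute differences (SAD). The function names, the search range and the tiny demo frames are illustrative only.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a block in ref and a block in cur."""
    return sum(
        abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
        for j in range(size)
        for i in range(size)
    )


def find_motion_vector(ref, cur, bx, by, size=16, search=7):
    """Search cur for the best match to the ref block whose top-left is (bx, by).

    Returns ((dx, dy), cost), where (dx, dy) is the displacement of the block.
    """
    height, width = len(cur), len(cur[0])
    best_vec = (0, 0)
    best_cost = sad(ref, cur, bx, by, bx, by, size)   # "no movement" candidate
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cx, cy = bx + dx, by + dy
            # Skip candidate positions that fall outside the frame.
            if cx < 0 or cy < 0 or cx + size > width or cy + size > height:
                continue
            cost = sad(ref, cur, bx, by, cx, cy, size)
            if cost < best_cost:
                best_vec, best_cost = (dx, dy), cost
    return best_vec, best_cost


# Demo on 8 x 8 frames with a 2 x 2 block standing in for a 16 x 16 macroblock:
# a bright patch moves 3 pixels right and 1 pixel down.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for j in range(2):
    for i in range(2):
        ref[2 + j][2 + i] = 200
        cur[3 + j][5 + i] = 200

vector, cost = find_motion_vector(ref, cur, bx=2, by=2, size=2, search=4)
print(vector, cost)   # (3, 1) 0
```

An exhaustive search is the simplest way to show the idea. It is also the slowest, since every candidate position costs one full block comparison.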
Difference pictures • Unfortunately, 16 x 16 blocks are quite large, so it is unlikely that all the pixels in one block will have moved to another. • There will generally, therefore, be errors in the prediction made by moving blocks around. • We know the error at the sending end, because it is simply the difference between the actual picture and the predicted picture. • So if we send the error as well as the prediction, we can reconstruct the actual picture.
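The relationship between the actual picture, the predicted picture and the error picture can be shown on a tiny example. The pixel values and function names below are made up for illustration. The error picture is kept uncompressed here, so the reconstruction is exact. In MPEG the error picture is itself spatially compressed, so the reconstruction is only approximate.

```python
def subtract(actual, predicted):
    """Error picture: actual minus predicted, pixel by pixel."""
    return [[a - p for a, p in zip(a_row, p_row)]
            for a_row, p_row in zip(actual, predicted)]


def add(predicted, error):
    """Reconstruction: predicted plus error, pixel by pixel."""
    return [[p + e for p, e in zip(p_row, e_row)]
            for p_row, e_row in zip(predicted, error)]


actual = [[10, 12, 14],
          [20, 22, 24]]
predicted = [[10, 11, 14],      # prediction made by moving blocks around
             [20, 25, 24]]

error = subtract(actual, predicted)        # known at the sending end
reconstructed = add(predicted, error)      # rebuilt at the receiving end

print(error)            # [[0, 1, 0], [0, -3, 0]]
print(reconstructed)    # [[10, 12, 14], [20, 22, 24]]
```

Most of the error values are zero or small when the prediction is good, which is why the error picture compresses well.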
I-Frames • I (Intrapicture): I-frames do not use any motion prediction. They use spatial compression only. That is, the complete frame is transmitted in a JPEG-like form. They are needed for several reasons including: • To start an MPEG sequence off (since there is nothing to predict the first frame from). • So that an MPEG stream may be joined at a point other than the start. • To recover from errors and degradation caused by repeated reference to previous frames. • Sometimes called keyframes.
P-Frames • P (Predicted Picture): P-frames send only motion prediction information and a spatially compressed error picture. • The actual frame is constructed from a previous frame, with the pixels in the "macroblocks" moved to their new location. • Since this may be far from perfect, the compressed error picture is added to compensate. • The previous frame could be an I-frame or another P-frame. • In the situation where nothing moves in the scene, the P-frame information is zero and the actual constructed frame is the same as the previous one (maximum compression).
B-Frames • Imagine the situation where an object moves to reveal a (stationary) background. • Since this background may be fully revealed in a later frame, we could use this future frame as a reference and backwardly predict previous frames. • Also, if we know the positions of blocks in future and previous frames, we can predict intermediate frames. • B (Bi-directional prediction): Allows interpolation and prediction from both previous and future (I and P) frames. • B-frames allow the most compression.
B-Frames • There are clearly associated problems with bi-directional frames. • We have to wait for future incoming video before they can be coded. This causes delay. • We have to transmit future frames before intermediate B-frames so that the decoder has the future and previous references available to construct the actual frame from the B-frames.
Groups of pictures (GOP) • The MPEG sequence therefore consists of a combination of I-, P- and B-frames. • This sequence is called a group of pictures (GOP). • Usually the group repeats (but it does not have to). For example, a typical group of 12 frames is: • B1 B2 I3 B4 B5 P6 B7 B8 P9 B10 B11 P12 • The subscripts (the numbers after each letter) indicate the original video frame order.
Groups of pictures (GOP) (order of sending) • However, as indicated above, the order is different in the actual bit stream because frames cannot be predicted without the appropriate reference. • The corresponding sending order (bitstream) would therefore be: • I3 B1 B2 P6 B4 B5 P9 B7 B8 P12 B10 B11
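The reordering can be written as a simple rule: hold back each B-frame until the next I- or P-frame has been sent. The sketch below applies that rule to the 12-frame example. The function name is illustrative, and the handling of B-frames left over at the end of the list is an assumption, since their future reference would belong to the next group.

```python
def display_to_bitstream(gop):
    """Reorder frame labels from display order to sending order."""
    sent = []
    waiting_b = []                 # B-frames waiting for their future reference
    for frame in gop:
        if frame.startswith("B"):
            waiting_b.append(frame)
        else:
            # An I- or P-frame is sent first, then the B-frames that needed it.
            sent.append(frame)
            sent.extend(waiting_b)
            waiting_b = []
    # Assumption: trailing B-frames are simply appended; a real stream would
    # send them after the first reference frame of the following group.
    sent.extend(waiting_b)
    return sent


display_order = "B1 B2 I3 B4 B5 P6 B7 B8 P9 B10 B11 P12".split()
bitstream_order = display_to_bitstream(display_order)
print(" ".join(bitstream_order))
# I3 B1 B2 P6 B4 B5 P9 B7 B8 P12 B10 B11
```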
Exercise • A video sequence is coded using the following GOP: • B3 B4 P1 P2 I5 • Suggest a suitable corresponding bitstream sequence.
Quality of service and variable quantisation • The amount of redundancy (both spatial and temporal) in moving video pictures varies, depending on the programme content. • Sometimes almost zero data is transmitted, for example during a still frame, while in action sequences the amount of data produced is large. • It is desirable to produce a constant data rate.
Quality of service and variable quantisation • The data is therefore buffered (stored) and often transmitted at a constant rate. • This allows the system to nearly fill the buffer when the data produced is large, but operate with an empty buffer when little data is produced. • Sometimes, when there is a lot of change between one frame and the next, the buffer would overflow if some action were not taken to prevent this from happening.
Quality of service and variable quantisation • When this happens, the system therefore applies larger quantisation steps to the DCT coefficients (rejecting more high-frequency components) to prevent system failure. • Sometimes only the DC component remains. • This results in poorer quality pictures (blocking and smearing) at times of low spatial and temporal redundancy.
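The effect of a larger quantisation step can be seen on a single row of eight pixels. This is a simplified sketch: it uses a one-dimensional DCT and one uniform step for every coefficient, whereas MPEG applies a two-dimensional DCT to 8 x 8 blocks. The pixel values and function names are made up for illustration.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, s in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs


def quantise(coeffs, step):
    """Divide each coefficient by the step and round to the nearest integer."""
    return [round(c / step) for c in coeffs]


def nonzero_count(levels):
    """Number of quantised coefficients that still need to be sent."""
    return sum(1 for v in levels if v != 0)


row = [52, 55, 61, 66, 70, 61, 64, 73]    # illustrative pixel values
coeffs = dct_1d(row)                      # coeffs[0] is the DC component

for step in (4, 16, 64):
    levels = quantise(coeffs, step)
    print(f"step {step:2d}: {levels} ({nonzero_count(levels)} non-zero)")
```

With the largest step in this example, every coefficient except the first rounds to zero, which is the "only the DC component remains" case from the slide.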
Quality of service and variable quantisation • This can be seen on most digital television systems. • Therefore the quality of service depends on the (previously agreed) output data rate.
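The link between output data rate, buffer fullness and quantisation step can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: the linear rule mapping buffer fullness to step size, the idea that coded size falls in inverse proportion to the step, and all of the numbers. It is not how any particular encoder performs rate control.

```python
def simulate(frame_bits, channel_bits_per_frame, buffer_size, base_step=4.0):
    """Toy rate control: use a coarser quantisation step as the buffer fills.

    frame_bits gives the bits each frame would need at the base step.
    Returns a list of (step, fullness) pairs, one per frame.
    """
    fullness = 0.0
    history = []
    for bits in frame_bits:
        # Assumed rule: step grows linearly from 1x (empty) to 8x (full).
        step = base_step * (1 + 7 * fullness / buffer_size)
        # Assumed model: coded size is inversely proportional to the step.
        coded = bits * base_step / step
        # The channel drains a fixed amount per frame period.
        fullness = max(0.0, fullness + coded - channel_bits_per_frame)
        history.append((step, fullness))
    return history


# Quiet scene, then an action sequence, then quiet again (kbits per frame).
frames = [100] * 3 + [900] * 6 + [100] * 3

high_rate = simulate(frames, channel_bits_per_frame=400, buffer_size=1000)
low_rate = simulate(frames, channel_bits_per_frame=200, buffer_size=1000)

max_step_high = max(step for step, _ in high_rate)
max_step_low = max(step for step, _ in low_rate)
print(f"largest step at 400 kbits/frame: {max_step_high:.1f}")
print(f"largest step at 200 kbits/frame: {max_step_low:.1f}")
```

In this model the lower output rate forces a larger maximum step during the action sequence, that is, a poorer picture for the same content.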
Further reading • www.mpeg.org • Watkinson, The Art of Digital Video, Focal Press. • www.snellwilcox.com/reference/pdfs/ecomp.pdf