CM613 Multimedia storage and retrieval: Video compression. D. Miller

Need for compression
The active region of a digital television frame, sampled according to CCIR Recommendation 601, is 720 pixels by 576 lines at a frame rate of 25 Hz. Using 8 bits for each Y, U or V pixel, the uncompressed bit rates for 4:2:2 and 4:2:0 signals are therefore:
4:2:2: 720 x 576 x 25 x 8 + 360 x 576 x 25 x (8 + 8) = 166 Mbit/s
4:2:0: 720 x 576 x 25 x 8 + 360 x 288 x 25 x (8 + 8) = 124 Mbit/s
MPEG-2 is capable of compressing the bit rate of standard-definition 4:2:0 video down to about 3-15 Mbit/s. At the lower bit rates in this range, the impairments introduced by the MPEG-2 coding and decoding process become increasingly objectionable. For digital terrestrial television broadcasting of standard-definition video, a bit rate of around 6 Mbit/s is thought to be a good compromise between picture quality and transmission bandwidth efficiency.
see: Tudor 1995, http://www.bbc.co.uk/rd/pubs/papers/paper_14/paper_14.shtml

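The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `uncompressed_bit_rate` and its parameters are made up for this example, and the compression ratio printed at the end simply divides the 4:2:0 rate by the 6 Mbit/s figure quoted above.

```python
def uncompressed_bit_rate(width, height, fps, bits, chroma_width, chroma_height):
    """Bits per second for one luminance plane plus two chrominance planes (U and V)."""
    luma = width * height * fps * bits
    chroma = chroma_width * chroma_height * fps * (bits + bits)
    return luma + chroma

# 4:2:2 halves the chrominance resolution horizontally only.
rate_422 = uncompressed_bit_rate(720, 576, 25, 8, 360, 576)
# 4:2:0 halves it both horizontally and vertically.
rate_420 = uncompressed_bit_rate(720, 576, 25, 8, 360, 288)

print(round(rate_422 / 1e6), "Mbit/s")   # 166 Mbit/s
print(round(rate_420 / 1e6), "Mbit/s")   # 124 Mbit/s
print("about", round(rate_420 / 6e6), ": 1 at 6 Mbit/s")   # about 21 : 1
```
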
Video
• A time-ordered sequence of frames (still images).
• Digital video contains a great deal of redundancy. Much of this redundancy is predictable, which makes video very suitable for compression.
• JPEG exploits spatial redundancy; MPEG also exploits temporal redundancy.
• “In video most consecutive pictures look the same. So if I knew what one picture looked like, then in theory I could build all the others by slightly adjusting that one. This is called prediction. But things move around in video, so we have to estimate that motion to work out how to shift the pixels around in order to create the next image.” Anil Kokaram, www.mee.tcd.ie/sigmedia
• The characteristics of human vision allow these lossy compressions to be acceptable.

Temporal redundancy
• As there is a good chance of the next video frame being almost the same as the previous one, you might:
  - Show the same frame twice.
  - Only start a completely new frame from time to time.
  - Keep and re-use some bits of the previous frame (or next frame).
  - Identify significant changes and just code the differences.

Possible techniques: Difference Coding
• Suppose you compared each frame of a sequence with its predecessor and updated only the pixels that have changed.
• In the example above, only a fraction of the number of pixel values ends up being transmitted.
http://www.newmediarepublic.com/dvideo/compression/adv07.html

Possible techniques: Difference Coding
• If the coding is required to be lossless then every changed pixel must be updated.
• There is an overhead associated with indicating which pixels are to be updated, and if the number of pixels to be updated is large, then this overhead can adversely affect compression.
• Two modifications can alleviate this problem, but at the cost of introducing some loss.
• Firstly, the intensity of many pixels will change only slightly, and when coding is allowed to be lossy, only pixels that change significantly need be updated. Thus, not every changed pixel will be updated. (threshold)
• Secondly, difference coding need not operate only at the pixel level; it can operate at the block level. (resolution)
http://www.newmediarepublic.com/dvideo/compression/adv07.html

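A minimal sketch of pixel-level difference coding with an optional threshold follows. The function names, the tiny 3 x 4 frames and the threshold value are all made up for illustration; frames are plain lists of luminance values.

```python
def encode_differences(prev, curr, threshold=0):
    """Return (row, col, new_value) for each pixel whose change exceeds the threshold."""
    updates = []
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (old, new) in enumerate(zip(prev_row, curr_row)):
            if abs(new - old) > threshold:
                updates.append((r, c, new))
    return updates

def apply_differences(prev, updates):
    """Rebuild a frame by copying the previous frame and applying the updates."""
    frame = [row[:] for row in prev]
    for r, c, value in updates:
        frame[r][c] = value
    return frame

prev = [[10, 10, 10, 10],
        [10, 10, 10, 10],
        [10, 10, 10, 10]]
curr = [[10, 11, 10, 10],
        [10, 90, 95, 10],
        [10, 10, 10, 12]]

lossless = encode_differences(prev, curr, threshold=0)   # every changed pixel
lossy = encode_differences(prev, curr, threshold=4)      # only significant changes

print(len(lossless), "updates, exact:", apply_differences(prev, lossless) == curr)   # 4 updates, exact: True
print(len(lossy), "updates, exact:", apply_differences(prev, lossy) == curr)         # 2 updates, exact: False
```

Each update carries its position as well as its value, which is the overhead the slide refers to: the more pixels change, the more positions have to be signalled.
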
Possible techniques: Block Based Difference Coding
• If the frames are divided into non-overlapping blocks and each block is compared with its counterpart in the previous frame, then only blocks that change significantly need be updated.
• If, for example, only those blocks of the Table Tennis frame that contain the ball, lower arm and bat were updated, the resulting image might be an acceptable substitute for the original.
• There are fewer blocks than pixels to specify where updates are needed, but the trade-offs are:
  - some pixels may be updated unnecessarily;
  - discontinuities at the edges of blocks may become noticeable.
• Difference coding, no matter how sophisticated, is almost useless where there is a lot of motion.
• Only objects that remain stationary within the image can be effectively coded. If there is a lot of motion, or indeed if the camera itself is moving, then very few pixels will remain unchanged.
• Even a very slow pan of a still scene will have too many changes to allow difference coding to be effective, even though much of the image content remains from frame to frame.
http://www.newmediarepublic.com/dvideo/compression/adv07.html

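The weakness under camera motion can be illustrated with a small sketch. Everything here is an assumption made for the example: the function name `changed_blocks`, the use of mean absolute difference as the change measure, the 2 x 2 block size and the synthetic textured scene.

```python
def changed_blocks(prev, curr, block=2, threshold=0):
    """List (block_row, block_col) for blocks whose mean absolute difference exceeds the threshold."""
    changed = []
    height, width = len(curr), len(curr[0])
    for top in range(0, height, block):
        for left in range(0, width, block):
            total = 0
            count = 0
            for r in range(top, min(top + block, height)):
                for c in range(left, min(left + block, width)):
                    total += abs(curr[r][c] - prev[r][c])
                    count += 1
            if total / count > threshold:
                changed.append((top // block, left // block))
    return changed

# A synthetic textured scene, one column wider than the 8 x 8 frame.
scene = [[(7 * r + 13 * c) % 50 for c in range(9)] for r in range(8)]
prev = [row[:8] for row in scene]

# Case 1: a single small object changes; the camera is still.
local = [row[:] for row in prev]
local[3][5] += 40

# Case 2: nothing in the scene changes, but the camera pans by one pixel.
panned = [row[1:9] for row in scene]

print(len(changed_blocks(prev, local)), "of 16 blocks need updating")    # 1 of 16 blocks need updating
print(len(changed_blocks(prev, panned)), "of 16 blocks need updating")   # 16 of 16 blocks need updating
```

In the panned case every block has to be sent again even though almost all of the picture content is still present, just displaced by one pixel.
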
Possible techniques: Motion Compensation
• The concept of shifting pieces of the frame around so as to best subtract just the player is called motion compensation. It consists of:
  - 1: Motion estimation (motion vector search)
  - 2: Motion-compensation-based prediction
  - 3: Derivation of the prediction error, i.e. the difference.

Possible techniques: Motion Compensation
• Block based motion compensation, like other interframe compression techniques, produces an approximation of a frame by reusing data contained in the frame’s predecessor. This has three stages.
• First, the frame to be approximated, the current frame, is chopped up into uniform non-overlapping blocks.
• Then each block in the current frame is compared to areas of similar size from the previous frame in order to find an area that is similar. A block from the current frame for which a similar area is sought is known as a target block.
• The location of the similar or matching block in the past frame might be different from the location of the target block in the current frame. The relative difference in locations is known as the motion vector. If the target block and matching block are found at the same location in their respective frames, then the motion vector that describes their difference is known as a zero vector.
http://www.newmediarepublic.com/dvideo/compression/adv07.html

Possible techniques: Motion Compensation
[Figure: the current frame (being coded) and the previous frame]
The motion vector detailing the position (in the past frame) of the target block’s match is encoded in place of the target block itself. Because fewer bits are required to code a motion vector than to code actual blocks, compression is achieved.
In the example used above, a perfect replica of the coded image can be reconstructed after decompression. In general this is not possible with block based motion compensation and thus the technique is lossy.
http://www.newmediarepublic.com/dvideo/compression/adv07.html

Possible techniques: Motion Compensation Searching
[Figure: the target block in the current frame; the search area in the previous frame is usually limited to a region close to the target block]
In the diagram the search area is square, but it is better to have it rectangular since most motion is horizontal.
Block matching takes place only on the luminance component of frames. The colour components of the blocks are included when coding the frame, but they are not usually used when evaluating the appropriateness of potential substitutes or candidate blocks.
http://www.newmediarepublic.com/dvideo/compression/adv07.html

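The block-matching search described above can be sketched as an exhaustive search over a limited window. This is an illustration under stated assumptions, not the method of any particular encoder: the names `sad` and `full_search` are made up, the sum of absolute differences (SAD) is assumed as the matching criterion (the slides do not name one), and the frames are small synthetic luminance arrays.

```python
def sad(curr, prev, top, left, dy, dx, size):
    """Sum of absolute differences between the target block and one candidate block."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(curr[top + r][left + c] - prev[top + dy + r][left + dx + c])
    return total

def full_search(curr, prev, top, left, size=4, search=2):
    """Test every candidate within +/-search pixels; return (cost, (dy, dx))."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate would fall outside the previous frame
            cost = sad(curr, prev, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best

# Previous frame: a synthetic luminance texture.
prev = [[(7 * r + 13 * c) % 50 for c in range(8)] for r in range(8)]
# Current frame: the same content moved one pixel to the right.
curr = [[row[0]] + row[:-1] for row in prev]

cost, vector = full_search(curr, prev, top=2, left=2)
print(vector, cost)   # (0, -1) 0
```

The vector (0, -1) says that the match for the target block lies one pixel to the left in the previous frame. A cost of 0 means the prediction error is zero here, which corresponds to the special case of a perfect replica; in general the residual is non-zero.
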
Possible techniques: Motion Compensation
Searching for motion vectors:
• Exhaustive search is computationally very intensive.
• Block matching algorithms that find suitable matches for target blocks but require fewer evaluations have been developed. Such algorithms test only some of the candidate blocks from the search area and choose a match from this subset of blocks. Hence they are known as sub-optimal algorithms.
• Because they do not examine all of the candidate blocks, the choice of matching block might not be as good as that chosen by an exhaustive search. The quality-cost trade-off is usually worthwhile, however.
http://www.newmediarepublic.com/dvideo/compression/adv07.html

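One well-known sub-optimal strategy is the three-step search, which the slides do not name; it is used here only as an example of the idea. The sketch below assumes SAD as the matching cost, an 8 x 8 target block and a starting step of 4 pixels (reaching at most 7 pixels in each direction); the function names and the synthetic frames are made up.

```python
def sad(curr, prev, top, left, dy, dx, size):
    """Sum of absolute differences between the target block and one candidate block."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(curr[top + r][left + c] - prev[top + dy + r][left + dx + c])
    return total

def three_step_search(curr, prev, top, left, size=8, step=4):
    """Test 8 neighbours around the best candidate so far, halving the step each round."""
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(curr, prev, top, left, 0, 0, size)
    evaluations = 1
    while step >= 1:
        centre = best
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dy == 0 and dx == 0:
                    continue  # the centre has already been evaluated
                cand = (centre[0] + dy, centre[1] + dx)
                y, x = top + cand[0], left + cand[1]
                if y < 0 or x < 0 or y + size > height or x + size > width:
                    continue  # candidate would fall outside the previous frame
                evaluations += 1
                cost = sad(curr, prev, top, left, cand[0], cand[1], size)
                if cost < best_cost:
                    best_cost, best = cost, cand
        step //= 2
    return best, best_cost, evaluations

SIZE = 32

def clamp(value):
    return max(0, min(SIZE - 1, value))

# Smooth synthetic luminance frame, and a displaced copy of it as the current frame.
prev = [[(r * r + 2 * c * c) // 12 for c in range(SIZE)] for r in range(SIZE)]
curr = [[prev[clamp(r + 2)][clamp(c - 3)] for c in range(SIZE)] for r in range(SIZE)]

vector, cost, evaluations = three_step_search(curr, prev, top=12, left=12)
print("vector", vector, "cost", cost)
print(evaluations, "evaluations; an exhaustive +/-7 search would need", 15 * 15)
```

The search never returns a worse match than the zero vector it starts from, but it can miss the best candidate when the matching cost does not fall steadily towards the true displacement.
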
Possible techniques: Motion Compensation
• The effectiveness of compression techniques that use block based motion compensation depends on the extent to which the following assumptions hold.
• Objects move in a plane that is parallel to the camera plane. Thus the effects of zoom and object rotation are not considered, although tracking in the plane parallel to object motion is.
• Illumination is spatially and temporally uniform. That is, the level of lighting is constant throughout the image and does not change over time.
• Occlusion of one object by another, and uncovered background, are not considered.
http://www.newmediarepublic.com/dvideo/compression/adv07.html

Possible techniques: Bi-directional Motion Compensation
• Bidirectional motion compensation uses matching blocks from both a past frame and a future frame to code the current frame.
• A future frame is a frame that is displayed after the current frame.
• Considering the chess board example, suppose that a player is fortunate enough to have a once-lost queen replace a pawn on the board.
• If the queen does not appear on the board before the current move, then no block containing the queen can be copied from the previous state of play and used to describe the current state.
• After the next move, however, the queen might be on the board. If, in addition to the state of play immediately before the current move, the state of play immediately following is also available to the receiver, then the current image of the chess board can be reproduced by taking blocks from both the past and future frames.
http://www.newmediarepublic.com/dvideo/compression/adv07.html

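A minimal sketch of the choice an encoder faces for one block: predict it from the past frame, from the future frame, or from a combination of the two. The assumptions here are that the two candidates are combined by averaging, that the candidate with the lowest sum of absolute differences is chosen, and that the candidate blocks have already been located; all names and pixel values are made up.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def average(a, b):
    """Pixel-wise rounded average of two blocks."""
    return [[(x + y + 1) // 2 for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def choose_prediction(target, past_block, future_block):
    """Return the name and the block of the candidate closest to the target."""
    candidates = {
        "forward": past_block,
        "backward": future_block,
        "bidirectional": average(past_block, future_block),
    }
    mode = min(candidates, key=lambda name: sad(target, candidates[name]))
    return mode, candidates[mode]

# Chess example: the square is empty in the past frame, but the queen
# is already on it in the current and future frames.
empty_square = [[20, 20], [20, 20]]
queen = [[200, 180], [190, 210]]
print(choose_prediction(queen, empty_square, queen)[0])   # backward

# A gradual fade: the current block lies halfway between past and future.
darker = [[100, 100], [100, 100]]
brighter = [[120, 120], [120, 120]]
halfway = [[110, 110], [110, 110]]
print(choose_prediction(halfway, darker, brighter)[0])    # bidirectional
```
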
MPEG-2 Standard
• Development started in 1990, mainly to establish a standard for digital broadcast TV.
• MPEG-1 was up to 1.5 Mbit/s. MPEG-2 is typically over 4 Mbit/s but can be up to 80 Mbit/s.
• Includes audio and video standards.
• Borrows techniques from the JPEG standard for spatial compression.
• Uses some of the techniques introduced earlier for temporal compression.

MPEG-2
• Moving Picture Experts Group (est. 1988): set up to create standards for video and audio.
• MPEG-1
  - Up to 1.5 Mbit/s (1.2 Mbit/s for video, 256 kbit/s for audio).
  - Quality approximates VHS (analogue) video and CD audio.
  - Picture resolution 352 x 288 for PAL video at 25 frames/s.
  - Supports several methods of generating a prediction of a block:
    forward prediction (by searching a previous picture);
    backward prediction (by searching a future picture);
    bi-directional prediction (by averaging the result of searching a previous and a future picture).
• But a new standard was required for digital television etc.

MPEG-2 Motion prediction
To discuss these basics we can draw a crude analogy with a block-based jigsaw puzzle.
[Figure: Sequence 1 and Sequence 2, each showing Frame n, Frame n+1, Frame n+2 and Frame n+3]

MPEG-2 Motion prediction
Assume only forward prediction is available. Imagine “frame n” as a block jigsaw puzzle. Could I make “frame n+1” from it?
[Figure: Sequence 1 and Sequence 2, each showing Frame n and Frame n+1]

MPEG-2 Motion prediction
Assume only forward prediction is available. Imagine “frame n+1” as a block jigsaw puzzle. Could I make “frame n+2” from it?
[Figure: Sequence 1 and Sequence 2, each showing Frame n+1 and Frame n+2]

MPEG-2 Motion prediction
Assume both forward prediction and backward prediction are available. Imagine “frame n+1” and “frame n+3” as block jigsaw puzzles. Could I make “frame n+2” from them?
[Figure: Sequence 1 and Sequence 2, each showing Frame n+1, Frame n+2 and Frame n+3]

MPEG-2 organising resources for motion prediction
The encoder chooses which prediction mode (forward, backward or bi-directional) will give the best quality results and transmits this choice to the decoder along with the image data. To support these prediction modes MPEG-2 defines three picture types:
• I-picture (Intra picture): coded without reference to other pictures.
• P-picture (Predictive picture): can use the previous I- or P-picture for motion compensation.
• B-picture (Bi-directional picture): can use the previous and next I- or P-pictures for motion compensation.
The coder rearranges the frame sequence from the natural (“display”) order into a “bitstream” order, so that the decoder has the right frames available at the right time.
see: Tudor 1995

MPEG-2 organising resources for motion prediction
The picture types typically occur in a repeating sequence called a Group of Pictures (GoP):
Display order:   B1 B2 I3 B4 B5 P6 B7 B8 P9 B10 B11 P12
Bitstream order: I3 B1 B2 P6 B4 B5 P9 B7 B8 P12 B10 B11
[P-pictures can use the previous I- or P-picture for motion compensation. B-pictures can use the previous and next I- or P-pictures for motion compensation.]
In the GoP above, the I-picture and P-pictures are received by the decoder ahead of their natural sequence because the B-pictures need them to look into the future.
see: Tudor 1995

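The reordering of this GoP can be reproduced with a short sketch: B-pictures are held back until the I- or P-picture that follows them in display order has been sent. The function name is made up, and the handling of trailing B-pictures (which would wait for the next GoP's first reference picture) is deliberately simplified.

```python
def display_to_bitstream(display_order):
    """Move each I- or P-picture ahead of the B-pictures that precede it in display order."""
    bitstream = []
    pending_b = []
    for picture in display_order:
        if picture.startswith("B"):
            pending_b.append(picture)      # needs a future reference picture first
        else:
            bitstream.append(picture)      # I- or P-picture goes out immediately
            bitstream.extend(pending_b)    # then the B-pictures that were waiting for it
            pending_b = []
    bitstream.extend(pending_b)            # simplification: trailing B-pictures are appended as-is
    return bitstream

display = ["B1", "B2", "I3", "B4", "B5", "P6", "B7", "B8", "P9", "B10", "B11", "P12"]
print(" ".join(display_to_bitstream(display)))
# I3 B1 B2 P6 B4 B5 P9 B7 B8 P12 B10 B11
```
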
MPEG-2 organising resources for motion prediction
Typical relative sizes:
• I-picture (Intra picture) = 6 x n bits
• P-picture (Predictive picture) = 2 x n bits
• B-picture (Bi-directional picture) = n bits
see: Tudor 1995

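As a worked example, these relative sizes can be applied to a 12-picture GoP made of one I-picture, three P-pictures and eight B-pictures. The comparison with coding every picture as an I-picture is an illustration added here, not a figure from the source.

```python
# Relative picture sizes in units of n bits.
RELATIVE_SIZE = {"I": 6, "P": 2, "B": 1}

gop = ["B", "B", "I", "B", "B", "P", "B", "B", "P", "B", "B", "P"]

gop_units = sum(RELATIVE_SIZE[kind] for kind in gop)   # 6 + 3 * 2 + 8 * 1
intra_units = RELATIVE_SIZE["I"] * len(gop)            # every picture coded as an I-picture

print(gop_units, "x n bits for the GoP")            # 20 x n bits for the GoP
print(intra_units, "x n bits if all intra-coded")   # 72 x n bits if all intra-coded
print(intra_units / gop_units)                      # 3.6
```
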
“Energy saving” analogy: reducing entropy
[Figure panels: Original sequence; Discrete Cosine Transforms of original sequence; Displaced Frame Differences (without motion compensation); Motion-compensated DFDs; DCTs of DFDs]
Source: Anil Kokaram, www.mee.tcd.ie

MPEG-2 Buffering
If a fixed bit rate channel is used, then buffering is required.
[Figure: Encoder, Output buffer, Input buffer, Decoder]
• The encoder's output buffer is filled at a variable rate, because the encoder output bit rate is variable (it depends on how much change is going on between frames).
• The buffer is emptied at a constant rate by the channel.
• A feedback mechanism detects when the buffer is at risk of overflowing or underflowing. This is used to adjust the degree of quantisation, and hence the quality of the images being transmitted.

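The feedback idea can be sketched with a toy model. Everything numeric here is hypothetical: the buffer capacity, the "complexity" figures, the rule that bits produced fall in proportion to the quantiser scale, and the linear mapping from buffer fullness to a quantiser scale between 1 and 31. Only the channel figure of 240,000 bits per frame follows from the 6 Mbit/s and 25 frames/s mentioned earlier. Real encoders use more elaborate rate control.

```python
CAPACITY = 1_800_000      # hypothetical encoder output buffer size, in bits
CHANNEL_BITS = 240_000    # bits drained per frame period: 6 Mbit/s at 25 frames/s

def quantiser_for(fullness):
    """Map buffer fullness to a quantiser scale from 1 (fine) to 31 (coarse)."""
    scale = 1 + (30 * fullness) // CAPACITY
    return max(1, min(31, scale))

def simulate(complexities, fullness=0):
    """Return (quantiser, bits_produced, fullness_after) for each frame."""
    history = []
    for complexity in complexities:
        q = quantiser_for(fullness)
        bits = complexity // q   # toy model: coarser quantisation produces fewer bits
        fullness = max(0, fullness + bits - CHANNEL_BITS)   # an empty buffer cannot go negative
        history.append((q, bits, fullness))
    return history

quiet, busy = 2_000_000, 8_000_000   # hypothetical per-frame complexity values
history = simulate([quiet] * 10 + [busy] * 10 + [quiet] * 10)
for frame, (q, bits, fullness) in enumerate(history):
    print(frame, q, bits, fullness)
```

The printout shows the quantiser scale rising as the buffer fills and relaxing as it drains. This toy model does not prevent overflow; it only shows the direction of the feedback.
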
Sources and recommended further reading
• Kokaram, A.C. Lecture notes: The essence of image and video compression. Trinity College, Dublin. Available at: http://www.mee.tcd.ie/~ack/teaching/1e8/lecture3.pdf (accessed 03/06)
• Li, Z.N. and Drew, M.S. (2004) Fundamentals of multimedia. Prentice Hall.
• http://www.newmediarepublic.com/dvideo/compression.html (accessed 03/06)
• Tudor, P.N. (1995) MPEG-2 video compression. Electronics and communication engineering journal. Available at: www.bbc.co.uk/rd/pubs/papers/paper_14/paper_14.shtml (accessed 03/06)
• See also: http://www.cs.bris.ac.uk/~janko/city/DBT_03_VideoCompression_MPEG.pdf (accessed 03/06)