VIDEO COMPRESSION FUNDAMENTALS, part 2 Pamela C. Cosman
Extra flavors and refinements • Many different variations/improvements possible for motion compensation • Increased accuracy of motion vectors • Unrestricted motion vectors • Multiple frame prediction • Variable sized blocks • Motion compensation for objects
Accuracy of Motion Vectors
• Digital images are sampled on a grid. What if the actual motion is not a whole number of grid steps?
• Solution: interpolation between grid points in the reference frame adds a half-pixel grid
• The reference frame then effectively has 4 times as many positions where the best-match block may be found
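The half-pixel grid can be built by simple bilinear interpolation. Below is a minimal NumPy sketch; the function name and the plain averaging scheme are illustrative (real codecs differ, e.g. H.264 uses a 6-tap filter for half-pel positions):

```python
import numpy as np

def half_pel_grid(ref):
    """Upsample a reference frame to half-pixel resolution by bilinear
    interpolation, so motion search can test half-pel positions."""
    h, w = ref.shape
    up = np.zeros((2 * h - 1, 2 * w - 1))
    up[::2, ::2] = ref                                    # integer positions
    up[::2, 1::2] = (ref[:, :-1] + ref[:, 1:]) / 2        # horizontal half-pels
    up[1::2, ::2] = (ref[:-1, :] + ref[1:, :]) / 2        # vertical half-pels
    up[1::2, 1::2] = (ref[:-1, :-1] + ref[:-1, 1:] +
                      ref[1:, :-1] + ref[1:, 1:]) / 4     # diagonal half-pels
    return up
```

Searching this upsampled grid is what gives the "4 times as many positions" mentioned above.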
Unrestricted Motion Vectors
• Suppose the camera is panning to the left
• Now consider the lower-left macroblock in the current frame
• What is the best match for it in the reference frame?
(Figure: reference frame and current frame, with the lower-left macroblock marked)
Unrestricted Motion Vectors
• If the macroblock were allowed to hang over the edge, then the best match would be like this:
• But then the motion vector points outside the frame!
• The encoder and decoder can agree on some standard extrapolation to deal with this case
(Figure: best-match block hanging over the left edge of the reference frame)
Unrestricted Motion Vectors
• The edge pixels in the reference frame are just replicated outside the frame, for as many extra rows or columns as necessary
• In this way, a motion vector pointing outside the frame is acceptable. Can get better matches!
(Figure: reference frame padded by edge replication, next to the current frame)
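Edge replication is exactly what NumPy's `edge` padding mode does, so the idea can be sketched in one line (the function name is illustrative):

```python
import numpy as np

def pad_reference(ref, pad):
    """Replicate edge pixels outward by `pad` rows/columns on every side,
    so motion vectors may point outside the original frame boundary."""
    return np.pad(ref, pad, mode="edge")
```

A motion search can then run over the padded array with all candidate windows guaranteed to be in-bounds.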
Arbitrary Multiple Reference Frames • In H.261, the reference frame for prediction is always the previous frame • In MPEG and H.263, some frames are predicted from both the previous and the next frames (bi-prediction) • In H.264, any frame may be designated to be used as reference: • Encoder and decoder maintain synchronized buffers of available frames (previously decoded) • Reference frame is specified as index into this buffer
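The synchronized buffers can be sketched as a fixed-capacity list indexed from the most recent decoded frame. This is a deliberate simplification: real H.264 reference management also has long-term references and explicit reordering commands, none of which appear here.

```python
from collections import deque

class ReferenceBuffer:
    """Toy decoded-picture buffer: encoder and decoder each keep an
    identical copy, so a reference frame can be named by its index."""
    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)   # oldest frames fall out

    def add(self, decoded_frame):
        self.frames.appendleft(decoded_frame)  # index 0 = most recent

    def get(self, ref_idx):
        return self.frames[ref_idx]
```

Because both sides apply the same `add` operations in the same order, transmitting just `ref_idx` is enough to identify the prediction source.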
Multiple Frame Prediction
• H.264 allows multiple frames to be used as references
Some Advantages of Multiple References • If object leaves scene and then comes back, can have a reference for it in long term past • Similarly, if the camera pans to the right, and then back to the left, then the scene that reappears has a reference • If there’s an error, and the receiver sends feedback to say where the error is, then the encoder can use another reference frame • Helpful even if there’s no feedback
Variable Block-Size MC • Motivation: size of moving/stationary objects is variable • Many small blocks may take too many bits to encode • Few large blocks give lousy prediction • Choices: In H.264, each 16x16 macroblock may be: • Kept whole, or • Divided horizontally (vertically) into two sub-blocks of size 16x8 (8x16) • Divided into 4 sub-blocks (8x8) • In the last case, the 4 sub-blocks may be divided once more into 2 or 4 smaller blocks.
H.264 Variable Block Sizes: Tree-Structured Motion Compensation
(Figure: macroblock partitions 16x16, 16x8, 8x16, 8x8; each 8x8 sub-macroblock may be further split into 8x4, 4x8, or 4x4)
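The encoder's partition decision can be sketched as a cost comparison. The code below is a toy version of just the 16x16-vs-four-8x8 choice; the SAD-plus-penalty cost and the `LAMBDA` value are assumptions standing in for a real rate-distortion criterion:

```python
import numpy as np

LAMBDA = 50  # assumed per-partition penalty, a stand-in for MV coding cost

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def choose_partition(cur, pred_whole, pred_quads):
    """cur: 16x16 current block; pred_whole: best 16x16 prediction (one MV);
    pred_quads: four 8x8 predictions (one MV each, in order top-left,
    top-right, bottom-left, bottom-right)."""
    cost_whole = sad(cur, pred_whole) + LAMBDA            # one MV to code
    subs = [cur[0:8, 0:8], cur[0:8, 8:16],
            cur[8:16, 0:8], cur[8:16, 8:16]]
    cost_quad = sum(sad(s, p) for s, p in zip(subs, pred_quads)) + 4 * LAMBDA
    return "16x16" if cost_whole <= cost_quad else "8x8"
```

The penalty term captures the tradeoff from the slide: four partitions predict better but cost four motion vectors.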
Motion Scale Example
(Figure: frames at T=1 and T=2)
Variable Output Rate
• Suppose the control parameters of a video encoder are kept constant: quantization parameter, motion estimation search window size, etc.
• Then the # of coded bits per macroblock (and per frame) will vary
• Typically, more bits are produced when there is high motion or fine detail
• Example: # of bits per frame varies from 1300 to 9000 (32-225 kbits per second)
(Figure: bits per frame vs. frame number, varying between about 1000 and 9000 over 200 frames)
Rate Control • Streams are usually coded for target rates, for example, 3 Mbit/second • How are bits allocated among frames? • Macroblocks in I-frames are all intra coded • Macroblocks in P/B frames can be coded as: • Intra (DCT blocks) • Motion vectors only • Motion vectors and difference DCT blocks • Nothing at all (skipped)
Rate Control • The frames will have differing numbers of bits • This variation in bit rate can be a problem for many practical delivery and storage mechanisms • Constant bit rate channel (such as a circuit-switched channel) cannot transport a variable-bitrate data stream • Even a packet-switched channel is limited by link rates and congestion at any point in time
Buffering
• The variable data rate produced by an encoder can be smoothed by buffering prior to transmission
• First In/First Out (FIFO) buffer at the output of the encoder; another one at the input to the decoder
• The decoder buffer fills at the constant channel rate and is emptied by the decoder at a variable rate
(Figure: encoder → FIFO buffer → constant-rate channel → FIFO buffer → decoder; bit rate is variable out of the encoder and into the decoder)
Decoder Buffer Contents
• Takes 0.5 sec before the first complete coded frame is received
• Then the decoder can extract and decode frames at the correct rate of 25 fps until…
• At about 4 sec, the buffer empties and the decoder stalls (pauses decoding)
• Problem: the video clip freezes until more data arrives
• Partial solution: add a deliberate delay at the decoder (e.g., a 1 sec delay before decoding frame 1, allowing the buffer to reach higher fullness)
(Figure: decoder buffer fullness over 0-9 seconds, first frame decoded at 0.5 sec, stall near 4 sec)
Variable Bit Rate • Example shows that variable coded bit rate can be adapted to a constant bit rate delivery medium using buffers. This entails • Cost of buffer storage space • Delay • Not possible to cope with arbitrary variation of bit rate using this method, unless buffer size and decoding delay allowed to get arbitrarily large. • So… encoder needs to keep track of buffer fullness…
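The buffer-fullness bookkeeping the encoder must do can be sketched as a simple simulation. The function name and the over/underflow handling are illustrative:

```python
def simulate_buffer(frame_bits, channel_rate_bps, fps, buffer_size):
    """Track encoder FIFO fullness: each frame's bits go in, and the
    constant-rate channel drains a fixed number of bits per frame
    interval. Returns (status, fullness) after each frame."""
    drain = channel_rate_bps / fps        # bits removed per frame period
    fullness, trace = 0.0, []
    for bits in frame_bits:
        fullness = fullness + bits - drain
        if fullness > buffer_size:
            trace.append(("overflow", fullness))   # data would be lost
        elif fullness < 0:
            fullness = 0.0                         # channel idles
            trace.append(("underflow", fullness))
        else:
            trace.append(("ok", fullness))
    return trace
```

A rate-control algorithm watches this fullness and adjusts the quantizer before overflow or underflow actually occurs.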
Rate Control • Goal: with the transmission system at the target rate for the video sequence, the encoder & decoder buffers of fixed size never overflow or underflow • This is the problem of rate control • MPEG does not specify how to achieve this • In addition to preventing overflow/underflow, the rate control algorithm should also make the sequence look good
Choice of Rate Control Algorithm • Choice of rate control depends on application 1) Offline encoding of video for DVD storage • Processing time not a constraint • Complex algorithm can be employed • Two-pass encoding: • Encoder collects statistics about the video in the 1st pass • Encoder encodes the video on the 2nd pass • Goal is to “fit” the video on the DVD while: • maximizing the overall quality of the video • preventing buffer overflow or underflow during decoding
Choice of Rate Control • 2) Encoding of live video for broadcast • One encoder and multiple decoders • Decoder processing and buffering are limited • Encoder may use expensive fast hardware • Delay of a few seconds usually OK • Medium-complexity rate-control algorithm • Perhaps two-pass encoding of each frame
Choice of Rate Control • 3) Encoding for two-way videoconferencing • Each terminal does both encoding and decoding • Delay must be kept to a minimum (say <0.5 sec) • Low-complexity rate control • Buffering minimized to keep delay small • Encoder must tightly control output rate • This may cause the output quality to vary significantly, e.g., may drop when there is increased movement or detail in the scene
Rate Control
• Various possible approaches to rate control
• For example, calculate a target bit rate Ri for a frame based on:
• The number of frames in the group of pictures
• The number of bits available for the remaining frames in the group
• The current buffer contents relative to the maximum acceptable fullness
• The estimated complexity of the frame
Rate Control: Example Algorithm
• Let S be the mean absolute value of the difference frame after motion compensation (a measure of frame complexity)
1) Calculate S for the frame
2) Compute the quantizer step size Q using the model
3) Encode the current frame using parameter Q
4) Update the model parameters X1 and X2 based on the actual number of bits generated for the frame
• There are also macroblock-level rate control algorithms for when "tight" rate control is needed
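One concrete form such a model can take (an assumption of this sketch, not something the slide specifies) is the quadratic rate model bits ≈ X1·S/Q + X2·S/Q², solved for Q given the frame's bit budget. The parameter-update rule below is likewise illustrative:

```python
import math

def quantizer_from_model(S, target_bits, X1, X2):
    """Solve the assumed rate model  bits = X1*S/Q + X2*S/Q**2  for the
    quantizer step Q, given frame complexity S (mean absolute difference
    after motion compensation) and a target bit budget."""
    # Rearranged: target_bits*Q^2 - X1*S*Q - X2*S = 0, take positive root.
    a, b, c = target_bits, -X1 * S, -X2 * S
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

def update_model(S, actual_bits, Q, X1, X2, alpha=0.5):
    """Crude refresh (assumed): nudge X1 toward the value that would
    have predicted the actual bit count, holding X2 fixed."""
    implied_X1 = (actual_bits - X2 * S / Q**2) * Q / S
    return X1 + alpha * (implied_X1 - X1), X2
```

The encoder would call `quantizer_from_model` before coding each frame (step 2) and `update_model` afterwards (step 4).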
Standards • Standards Groups (MPEG, VCEG) • H.261: Videophone/videoconferencing (1990) • MPEG-1: Low bit rates for dig. storage (1992) • MPEG-2: Generic coding algorithms (1994) • H.263: Very low bit rate coding (1995) • MPEG-4: Flexibility and computer vision approaches (1998) • H.264: Recent improvements (2003)
Advantages/Disadvantages
• Advantages of standardization:
• Interoperability
• Different platforms supported
• Vendors can compete for improved implementations
• Worldwide technical community can build on each other's work
• Several standards have been hugely successful
• Disadvantages of standardization:
• Improvements in price and performance come from the battle to create and own a proprietary approach
• Proprietary codecs generally exhibit higher quality than a standard
• Standards are slow moving, developed by committee, try to avoid patents
H.261: real-time, low complexity, low delay • Motivated by the definition and planned deployment of ISDN (Integrated Services Digital Network) • Rate of p*64 kbits/s where p is integer 1…30 • For example, p=2→ 128 kbits/s with video coding at 112 kbits/s and audio at 16 kbits/s • Applications: videophone, videoconferencing • Videoconferencing compression: • Operate in real time • Not much coding delay • Low complexity • No particular advantage to shifting the complexity onto encoder or decoder (each user will require both encoding and decoding capabilities)
H.261 Basics • Standardization started 1984, finished 1990 • Uncompressed CIF (4:2:0 chrom. sampling, 15 frames per sec.) requires 18.3 Mbps • To get this down to p x 64 Kbps requires 10:1 up to 300:1 compression • H.261 achieves compression using the same basic elements discussed before: • Motion compensation (for temporal redundancy) • DCT + Quantization (for spatial redundancy) • Variable length coding (run-length, Huffman)
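The uncompressed-CIF figure and the required compression ratios quoted above can be checked with a few lines of arithmetic (4:2:0 means the two chroma planes are each quarter-size):

```python
# Uncompressed CIF at 15 fps: 352x288 luma plus two 176x144 chroma
# planes (4:2:0), 8 bits per sample.
luma = 352 * 288
chroma = 2 * (176 * 144)
bits_per_frame = (luma + chroma) * 8
bps = bits_per_frame * 15
print(bps / 1e6)                              # ~18.2 Mbps, the ~18.3 Mbps figure
print([bps / (p * 64_000) for p in (30, 1)])  # roughly 10:1 up to ~285:1
```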
H.261 Motion Compensation • Motion compensation done on macroblocks of size 16 x 16, same as MPEG-1 and -2 • However, consider application fields: videoconferencing, videophone • A call is set up, conducted, and terminated. • These events always occur together, in sequence • Don’t need random access into the video • Need low delay • Also, expect slow-moving objects • Question: What features should these facts lead to?
H.261 Motion Estimation
• Slow movement: for each block of pixels in the current frame, the search window is only ±15 pixels in each direction
(Figure: ±15-pixel search window in the previous frame at T=1, for a block in the current frame at T=2)
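Exhaustive search over the ±15-pixel window can be sketched directly. The names are illustrative, and real encoders usually use faster suboptimal search patterns instead of testing all 31×31 displacements:

```python
import numpy as np

def full_search(cur_block, ref, top, left, search=15):
    """Full-search block matching: test every displacement within
    +/-search pixels and keep the one with the smallest sum of
    absolute differences (SAD). Returns ((dy, dx), SAD)."""
    n = cur_block.shape[0]
    best = (None, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > ref.shape[0] or x + n > ref.shape[1]:
                continue                      # restricted MVs: stay in-frame
            cand = ref[y:y + n, x:x + n]
            s = np.abs(cur_block.astype(int) - cand.astype(int)).sum()
            if s < best[1]:
                best = ((dy, dx), s)
    return best
```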
H.261 Motion Compensation • No B pictures: don’t want the delay or complexity associated with them • H.261 uses forward motion compensation from the previous picture only • First frame is Intra-frame. NO frame after that has to be Intra. Every subsequent frame may use prediction from the one before • This means that to decode a particular frame in the sequence, it is possible that we will have to decode from the very beginning. No random access.
ISO MPEG • Originally set up in 1988, committee had 3 work items: • MPEG-1: targeted at 1.5 Mbps • MPEG-2: targeted at 10 Mbps • MPEG-3: targeted at 40 Mbps • Later, became clear that algorithms developed for MPEG-2 would accommodate higher rates, so 3rd work item dropped • Later MPEG-4 added • Goals: • MPEG-1: compression of video/audio for CD playback • MPEG-2: storage and broadcast of TV-quality audio and video • MPEG-4: coding of audio-visual objects • Also MPEG-7 and MPEG-21 which are about multimedia content and not compression
MPEG-1 Audiovisual coder for digital storage media • Goal: Coding full-motion video & associated audio at bit rates up to about 1.5 Mbps • Brief history of MPEG-1 • October 1988: working group formed • September 1989: 14 proposals made • October 1989: video subjective tests performed • March 1990: simulation model • November 1992: international standard • Solution to a specific problem: • Compress an audio-video source (~210 Mbps) to fit into a CD-ROM originally designed to handle uncompressed audio alone (requires aggressive compression 200:1)
MPEG-1 major differences
• Unlike videoconferencing, for digital storage media random access capability is important
• Random access is provided by INTRA frames
• To avoid a long delay between the frame a user is looking for and the frame where decoding starts, INTRA frames should occur frequently
• But then the coding efficiency goes down
• Improve compression efficiency using B frames
B Frames
• B pictures - forward, backward, & interpolatively motion compensated from previous/next I/P frames
• Bidirectionally predicted blocks allow effective prediction of uncovered background
• Bidirectional prediction can reduce noise (if good predictions are available from both past and future)
• B pictures are not used for prediction → substantial reduction in bits (I:P:B 5:3:1)
• Increases motion estimation complexity in 2 ways:
• Search 2 frames
• Search a bigger window if the anchor frame is farther away
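The three B-picture prediction modes can be sketched in a few lines. Interpolative prediction is a plain average here, which is why independent noise in the two references tends to cancel:

```python
import numpy as np

def bidirectional_predict(past_block, future_block, mode="interp"):
    """Toy B-picture prediction for one block: forward (past reference
    only), backward (future reference only), or interpolative (average
    of both references)."""
    if mode == "forward":
        return past_block.astype(float)
    if mode == "backward":
        return future_block.astype(float)
    # Averaging halves the variance of independent noise in the two refs.
    return (past_block.astype(float) + future_block.astype(float)) / 2
```

For a block of background uncovered only in the future reference, the encoder would pick `"backward"` mode.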
MPEG-2 Generic Coding Algorithms • Goal: digital video transmission in range 2-15 Mbps • Generic coding algorithms to support: • Digital storage media, existing TV (PAL, SECAM, NTSC), cable, direct broadcast satellite, HDTV, computer graphics, video games • Brief history: • July 1990: working group established • Nov 1991: Subjective tests on 32 proposals • March 1993: technical contents of main level frozen • Nov 1994: international standard (parts 1-3)
Main differences MPEG-1 and -2 • MPEG-2 aimed at higher bit rates • Can be used for larger picture formats • MPEG-2 has a wider range of bit rates • Tool kit approach allows use of different subsets of algorithms • MPEG-2 supports scalable coding • SNR scalable, spatially scalable • MPEG-2 supports interlacing • This permeates everything: motion compensation, DCTs, ZigZag ordering for variable length coding
Overview of MPEG-4 Visual • MPEG-4 Visual is meant to handle many types of data, including • Moving video (rectangular frames) • Video objects (arbitrary-shaped regions of moving video) • 2D and 3D mesh objects (representing deformable objects) • Animated human faces and bodies • Static texture (still images)
Video Objects • MPEG-4 moves away from traditional view of video as a sequence of rectangular frames • Instead, collection of video objects • A video object is a flexible entity that a user can access (seek, browse) and manipulate (cut, paste) • A video object (VO) is an arbitrarily-shaped area of scene that may exist for an arbitrary length of time • An instance of a VO at a particular time is called a video object plane (VOP) • Definition encompasses traditional view of rectangular frames too
Static Sprite Coding • Background may be coded as a static sprite • The sprite may be much larger than the visible area of the scene Source: http://mpeg.telecomitalialab.com/standards/mpeg-4/mpeg-4.htm
Global Motion Compensation
• The encoder sends up to 4 global motion vectors (GMVs) for each VOP, together with the location of each GMV in the VOP
• For each pixel position, an individual MV is calculated by interpolating between the GMVs, and the pixel position is motion compensated according to this interpolated vector
(Figures: GMVs and the interpolated vector; GMC compensating for rotation; GMC compensating for camera zoom)
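The per-pixel interpolation can be sketched as bilinear interpolation between the four GMVs. Placing the GMVs at the VOP corners is an assumption of this sketch (the standard lets the encoder signal their locations):

```python
def interp_mv(gmvs, w, h, x, y):
    """Bilinearly interpolate a per-pixel motion vector from four GMVs
    assumed to sit at the corners of a w-by-h VOP, in the order
    top-left, top-right, bottom-left, bottom-right."""
    tl, tr, bl, br = gmvs
    u, v = x / (w - 1), y / (h - 1)        # normalized position in [0, 1]

    def lerp(a, b, t):
        return a + (b - a) * t

    top = (lerp(tl[0], tr[0], u), lerp(tl[1], tr[1], u))
    bot = (lerp(bl[0], br[0], u), lerp(bl[1], br[1], u))
    return (lerp(top[0], bot[0], v), lerp(top[1], bot[1], v))
```

With corner GMVs that all point the same way this reduces to a global translation; differing corners produce the rotation- and zoom-like fields shown in the figures.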
Global Motion Estimation
• Global motion estimation between 2 images, assuming 2-D affine motion
• Compression example: error images before and after global motion compensation
(Figure: Soccer sequence, global motion estimation between 1st and 10th frames)
Coding Synthetic Visual Scenes: Animated 2-D Mesh Coding
• A 2-D mesh is made up of triangular patches
• Deformation or motion can be modelled by warping the triangles
• Surface texture may be compressed as static texture
• Mesh and texture information might both be transmitted for key frames
• For intermediate frames, no texture is transmitted: mesh parameters are transmitted and the decoder animates the mesh
Motion Vectors for Meshes • A mesh is warped by transmitting vectors which displace the nodes • Mesh MVs are predictively coded • Texture residual can be coded with a very small number of bits • MPEG-4 also allows 3-D meshes • The vertices need not be in one plane • 3-D mesh samples the surface of a solid body
Shape-Adaptive DCT
• The shape-adaptive DCT uses a one-dimensional DCT, where the number of points in the transform matches the number of opaque values in each column (or row)
• Steps: shift opaque pixels vertically to the top of each column → 1-D column DCT → shift horizontally to the left of each row → 1-D row DCT → final coefficients
• More complex than the normal 8x8 DCT, but improves coding efficiency for boundary MBs
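A toy version of those steps, using an orthonormal N-point DCT-II whose length tracks the opaque count per column and row. This is a sketch only; the real SA-DCT specifies further details (e.g., DC handling) that are omitted here:

```python
import numpy as np

def dct1(v):
    """Orthonormal N-point DCT-II of a 1-D vector."""
    n = len(v)
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    scale = np.sqrt(np.where(k == 0, 1.0 / n, 2.0 / n))
    return scale * (basis @ v)

def sa_dct(block, mask):
    """Toy shape-adaptive DCT of an 8x8 block with a boolean opacity
    mask: shift opaque samples to the top of each column, apply a
    column DCT whose length equals the opaque count, then shift the
    results left in each row and apply row DCTs."""
    tmp = np.zeros_like(block, dtype=float)
    for c in range(block.shape[1]):
        vals = block[mask[:, c], c].astype(float)       # shift vertically
        if len(vals):
            tmp[:len(vals), c] = dct1(vals)             # 1-D column DCT
    counts = mask.sum(axis=0)
    out = np.zeros_like(tmp)
    for r in range(block.shape[0]):
        vals = [tmp[r, c] for c in range(block.shape[1]) if counts[c] > r]
        if vals:                                        # shift horizontally
            out[r, :len(vals)] = dct1(np.array(vals))   # 1-D row DCT
    return out
```

With a fully opaque mask this degenerates to an ordinary separable 8x8 DCT, which is a useful sanity check.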
Face and Body Animation
• Two basic steps:
1) Define the basic shape of the face or body model (typically carried out once at the start of the session)
2) Send animation parameters to animate the model
• Encoder has a choice of:
• Generic facial definition parameters (FDPs)
• Custom FDPs for a specific face
• In a similar way, a body object is rendered from a set of Body Definition Parameters (BDPs) and animated using Body Animation Parameters
Face Animation
• The generic face can be modified by Facial Definition Parameters (FDPs) into a particular face
• The FDP decoder creates a neutral face: one which carries no expression
• Change expressions by moving the vertices
• Not necessary to transmit data for each vertex; instead use Facial Animation Parameters (FAPs)
• Some combinations of vertex movements are common, so these are coded as high-level FAPs: visemes (speech mouth shapes) and expressions (such as a smile)
• These can be used alone, or as predictions for more accurate FAPs
• Resulting data rate is small, e.g., 2-3 kbps
H.264 Brief history • The work started in VCEG (in 1998) as a parallel activity with the final version of H.263 • First test model produced in 1999. Many small steps over the next 4 years: • Many tweaks to the integer transform and to the variable block size • 1/8 pixel accurate MVs added in and then dropped • Many tweaks on the deblocking filter • Etc. etc. • Final version March 2003 • Final results: 2-fold improvement in compression (compared to H.263 and MPEG-2) & significantly better than MPEG-4 ASP