VIDEO COMPRESSION FUNDAMENTALS Pamela C. Cosman
Compressing Digital Video • Exploit spatial redundancy within frames (like JPEG: transforming, quantizing, variable length coding) • Exploit temporal redundancy between frames [Figure: Previous Frame and Current Frame. Only the sun has changed position between these 2 frames]
Simplest Temporal Coding - DPCM • Frame 0 (still image) • Difference frame 1 = Frame 1 – Frame 0 • Difference frame 2 = Frame 2 – Frame 1 • If there is no movement in the scene, all difference frames are 0. Can be greatly compressed! • If there is movement, it can be seen in the difference images [Figure: frames 0, 1, 2, 3]
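The DPCM scheme above can be sketched in a few lines. This is a minimal illustration, assuming frames are small lists of pixel rows; the function names and the sample frames are made up for the example and are not from the slides.

```python
def frame_diff(curr, prev):
    """Pixel-wise difference curr - prev for two equal-sized frames."""
    return [[c - p for c, p in zip(crow, prow)] for crow, prow in zip(curr, prev)]

def frame_add(diff, prev):
    """Pixel-wise sum diff + prev, the inverse of frame_diff."""
    return [[d + p for d, p in zip(drow, prow)] for drow, prow in zip(diff, prev)]

def dpcm_encode(frames):
    """Keep frame 0 as a still image, then store Frame n - Frame n-1."""
    coded = [frames[0]]
    for n in range(1, len(frames)):
        coded.append(frame_diff(frames[n], frames[n - 1]))
    return coded

def dpcm_decode(coded):
    """Rebuild each frame by adding its difference frame to the previous frame."""
    frames = [coded[0]]
    for diff in coded[1:]:
        frames.append(frame_add(diff, frames[-1]))
    return frames

# Hypothetical 2x2 frames: nothing moves in frame 1, one pixel changes in frame 2.
frames = [
    [[10, 10], [10, 10]],
    [[10, 10], [10, 10]],
    [[10, 99], [10, 10]],
]
coded = dpcm_encode(frames)
decoded = dpcm_decode(coded)
```

Here `coded[1]` is all zeros because nothing moved, while `coded[2]` is non-zero only at the pixel that changed.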
Difference Frames • Differences between two frames can be caused by • Camera motion: the outlines of background or stationary objects can be seen in the Diff Image • Object motion: the outlines of moving objects can be seen in the Diff Image • Illumination changes (sun rising, headlights, etc.) • Scene Cuts: Lots of stuff in the Diff Image • Noise
Difference Frames • If the only difference between two frames is noise (nothing moved), then you won’t recognize anything in the Difference Image • But, if you can see something in the Diff Image and recognize it, there’s still correlation in the difference image • Goal: remove the correlation by compensating for the motion
Types of Motion • Translation: simple movement of typically rigid objects • Camera pans vs. movement of objects [Figure: Frame n and Frame n+1] • Rotation: spinning about an axis • Camera versus object rotation • Zooms: in/out • Camera zoom vs. object zoom (movement in/out) [Figure: Frame n, Frame n+1 (Rotation), Frame n+2 (Zoom)]
Describing Motion • Translational • Move (object) from (x,y) to (x+dx,y+dy) • Rotational • Rotate (object) by (r rads) (counter/clockwise) • Zoom • Move (in/out) from (object) to increase its size by (t times) Which is easiest? Which are we most likely to encounter?
Motion Estimation • Determining parameters for the motion descriptions • For some portion of the frame, estimate its movement between 2 frames- the current frame and the reference frame • What is some portion? • Individual pixels (all of them)? • Lines/edges (have to find them first) • Objects (must define them) • Uniform regions (just chop up the frame)
General Idea • For a region PC in the current frame, find a region PR in the search window in the reference frame so that Error(PR,PC) is minimized • Issues: error measures, search techniques, choice of search window, choice of reference frame, choice of region PC [Figure: current frame with the portion of interest PC; reference frame with the search window]
Block-based Motion Estimation • PC is a block of pixels (in the current frame) • The search window is a rectangular segment (in the reference frame) [Figure: T=1 (reference), T=2 (current)]
Motion Vectors • A motion vector (MV) describes the offset between the location of the block being coded (in the current frame) and the location of the best-match block in the reference frame [Figure: T=1 (reference), T=2 (current)]
Motion Compensation • The blocks being predicted are on a grid • The blocks used for prediction are NOT [Figure: numbered blocks on the grid of the frame being predicted, and the correspondingly numbered blocks used for prediction, which are not on the grid]
Motion Vector Search • 1. Mean squared error • Select a block in the reference frame to minimize $\sum (b(B_{ref}) - b(B_{curr}))^2$ • 2. Mean abs. error • Select block to minimize $\sum |b(B_{ref}) - b(B_{curr})|$ • Given error measure, how to efficiently determine best-match block in search window? • Full search: best results, most computation • Logarithmic search: heuristic, faster • Hierarchical motion estimation
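A full search with the absolute-error measure might look like the sketch below. It assumes frames are lists of pixel rows, a square block of side `n`, and a search window of plus or minus `radius` pixels around the block position; these names and the test frames are illustrative only. For a fixed block size, minimizing the sum of absolute differences selects the same block as minimizing the mean.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between the n x n block of cur at (cx, cy)
    and the n x n block of ref at (rx, ry)."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(n) for i in range(n))

def full_search(ref, cur, cx, cy, n, radius):
    """Evaluate every in-bounds position in the search window.
    Returns (dx, dy, error) for the best-match block."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue  # candidate block would fall outside the reference frame
            err = sad(ref, cur, rx, ry, cx, cy, n)
            if best is None or err < best[2]:
                best = (dx, dy, err)
    return best

def make_frame(w, h, px, py):
    """Hypothetical test frame: black background with a bright 2x2 patch at (px, py)."""
    frame = [[0] * w for _ in range(h)]
    for j in range(2):
        for i in range(2):
            frame[py + j][px + i] = 200
    return frame

ref = make_frame(6, 6, 1, 1)   # patch at (1, 1) in the reference frame
cur = make_frame(6, 6, 3, 2)   # patch has moved to (3, 2) in the current frame
mv = full_search(ref, cur, 3, 2, 2, 2)
```

The motion vector found for the block at (3, 2) is (-2, -1) with zero error: the best-match block sits two pixels to the left and one pixel up in the reference frame.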
Motion Vector Search • Full search: Evaluate every position in the search window • Logarithmic search: First examine the positions marked 1. Choose the best of these (lowest error measure) and examine the positions marked 2 surrounding it. Choose the best of these, and examine the positions marked 3. Final result = best of these [Figure: search window with positions marked 1, 2, 3]
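One way to sketch the logarithmic search is a three-step pattern: test the centre and its 8 neighbours at a spacing of 4 pixels, re-centre on the best, then repeat at spacings of 2 and 1. The exact pattern of positions in the original figure is not recoverable, so the 9-point pattern, the starting step of 4, and the synthetic frames below are assumptions made for illustration.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(n) for i in range(n))

def log_search(ref, cur, cx, cy, n, step=4):
    """Heuristic search: at each stage examine 9 positions spaced `step` apart
    around the current best displacement, then halve the step."""
    h, w = len(ref), len(ref[0])
    best_dx, best_dy = 0, 0
    best_err = sad(ref, cur, cx, cy, cx, cy, n)
    while step >= 1:
        centre_dx, centre_dy = best_dx, best_dy
        for sy in (-step, 0, step):
            for sx in (-step, 0, step):
                dx, dy = centre_dx + sx, centre_dy + sy
                rx, ry = cx + dx, cy + dy
                if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                    continue  # skip candidates outside the reference frame
                err = sad(ref, cur, rx, ry, cx, cy, n)
                if err < best_err:
                    best_dx, best_dy, best_err = dx, dy, err
        step //= 2
    return best_dx, best_dy, best_err

# Hypothetical smooth 16x16 frames: cur is ref displaced by (5, 2),
# i.e. cur[y][x] == ref[y + 2][x + 5] wherever both are defined.
ref = [[10 * abs(x - 9) + 10 * abs(y - 6) for x in range(16)] for y in range(16)]
cur = [[10 * abs(x - 4) + 10 * abs(y - 4) for x in range(16)] for y in range(16)]
result = log_search(ref, cur, 4, 4, 2)
```

On this smooth test image the search reaches the true displacement (5, 2) after at most 27 block comparisons. On less regular content the heuristic can settle on a worse match than a full search would find, which is the trade-off the slide describes.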
Hierarchical Motion Estimation • Use an averaging filter on the image, then downsample by a factor of 2 • Conduct a search on the downsampled image (only ¼ of the size) • Given the results of the search on the downsampled image, return to the full resolution image and refine the search there
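The three steps above could be sketched as follows. The sketch assumes a 2x2 box average as the averaging filter, even block coordinates and sizes, a full search at the coarse level, and a refinement of plus or minus 1 pixel at full resolution; all of these choices and names are assumptions for illustration.

```python
def downsample(frame):
    """Average each 2x2 neighbourhood and keep one sample per neighbourhood."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
              + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]

def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(n) for i in range(n))

def search(ref, cur, cx, cy, n, radius, start=(0, 0)):
    """Full search over +/- radius around the displacement `start`."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(start[1] - radius, start[1] + radius + 1):
        for dx in range(start[0] - radius, start[0] + radius + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue
            err = sad(ref, cur, rx, ry, cx, cy, n)
            if best is None or err < best[2]:
                best = (dx, dy, err)
    return best

def hierarchical_search(ref, cur, cx, cy, n, radius):
    """Search the half-resolution frames first, then refine at full resolution."""
    coarse = search(downsample(ref), downsample(cur),
                    cx // 2, cy // 2, n // 2, radius // 2)
    start = (2 * coarse[0], 2 * coarse[1])   # scale the coarse vector back up
    return search(ref, cur, cx, cy, n, 1, start)

def make_frame(px, py):
    """Hypothetical 16x16 test frame with a bright 4x4 patch at (px, py)."""
    frame = [[0] * 16 for _ in range(16)]
    for j in range(4):
        for i in range(4):
            frame[py + j][px + i] = 200
    return frame

ref = make_frame(2, 2)
cur = make_frame(8, 4)
mv = hierarchical_search(ref, cur, 8, 4, 4, 8)
```

The coarse search works on frames with one quarter of the pixels and half the search radius, and the full-resolution pass then checks only 9 positions around the scaled-up vector.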
Motion Compensation • The standards do not specify HOW the encoder will find the motion vectors (MVs) • The encoder can use exhaustive/fast search, MSE/MAE/other error metric, etc. • The standard DOES specify • The allowable syntax for specifying the MVs • What the decoder will do with them • The decoder grabs the indicated block from the reference frame and glues it in place
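The "grab and glue" step at the decoder can be sketched as below. This is an illustration only: it assumes one motion vector per n x n block stored in a dictionary keyed by the block's top-left corner, and that every vector points inside the reference frame. It does not reproduce the syntax of any particular standard.

```python
def motion_compensate(ref, mvs, n):
    """Build the predicted frame. For each n x n block on the grid at (bx, by),
    copy the n x n block of ref located at (bx + dx, by + dy)."""
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for by in range(0, h, n):
        for bx in range(0, w, n):
            dx, dy = mvs[(bx, by)]
            for j in range(n):
                for i in range(n):
                    pred[by + j][bx + i] = ref[by + dy + j][bx + dx + i]
    return pred

# Hypothetical 4x4 reference frame whose pixel value encodes its position (10*y + x).
ref = [[10 * y + x for x in range(4)] for y in range(4)]
mvs = {
    (0, 0): (0, 0),     # grid-aligned source block
    (2, 0): (-1, 0),    # source block one pixel to the left, off the grid
    (0, 2): (0, 0),
    (2, 2): (-1, -1),   # source block one pixel left and one pixel up
}
pred = motion_compensate(ref, mvs, 2)
```

The destination blocks sit on the 2x2 grid, while the source blocks for (2, 0) and (2, 2) are taken from off-grid positions in the reference frame.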
The video compression standards define syntax and semantics for the bit stream between encoder and decoder • Encoder is not specified by MPEG except that it produces a compliant bit stream • Compliant decoder must interpret all legal MPEG bit streams • This allows future encoders of better performance to remain compatible with existing decoders • Also allows for commercially secret encoders to be compatible with standard decoders [Figure: ENCODER (not this) → bit stream (standard defines this) → DECODER (not this)] [Figure: Very secret encoder, today’s ho-hum encoder, and tomorrow’s nifty encoder all produce the bit stream the standard specifies. Today’s decoder still works!]
Motion Compensation Example [Figure: Frame n-1, Frame n, and the motion compensated Frame n]
Objects versus Macroblocks • Real moving objects will not coincide with boundaries of macroblocks • If encoder sends MV=(MotX,MotY), object well coded, but background poorly coded • If encoder sends MV=(0,0), background well coded, but moving object poorly coded • Either approach is valid [Figure: two cases. Background well encoded (no motion vector), with prediction error on the moving object. Moving object well encoded with motion vector, with prediction error on the background]
Motion Compensation • This glued together frame is called the motion compensated frame • The encoder can also form the difference between the motion compensated frame and the actual frame. • This is called the motion compensated difference frame • This difference frame formed using MC should have less correlation between pixels than the difference frame formed without using MC
Motion Compensated Difference Frames • Suppose we are doing lossless coding • Encoder has sequence of frames: …, F(n-2), F(n-1) • Next: encode F(n) • Past frames have been losslessly encoded, so the decoder knows F(n-1) perfectly already • Encoder sends the motion vectors for frame F(n) relative to frame F(n-1), to form motion compensated frame M(n) • Encoder knows M(n), Decoder knows M(n)
Motion Compensation Example [Figure: F(n-1), F(n), and the motion compensated frame M(n)]
Encoding Difference Frames • Encoder forms motion compensated diff frame: MCD(n) = F(n) – M(n) • Encoder losslessly encodes MCD(n) • Decoder can then do F(n) = MCD(n) + M(n) → knows F(n) exactly • With no motion compensation, encoder could do frame diff: FD(n) = F(n) – F(n-1) • Encoder losslessly encodes FD(n) • Decoder can then do F(n) = FD(n) + F(n-1) → knows F(n) exactly • If successive frames are very similar: • fewer bits to send Motion Vectors + MCD(n) instead of FD(n) • fewer bits to send FD(n) instead of F(n)
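A small numeric sketch of the lossless case follows. It assumes a single global motion vector (the whole frame shifts one pixel to the right), which is a simplification of per-block vectors, and the frames are invented for the example.

```python
def sub(a, b):
    """Pixel-wise a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def add(a, b):
    """Pixel-wise a + b."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def energy(frame):
    """Sum of absolute pixel values, a rough proxy for how much there is to code."""
    return sum(abs(v) for row in frame for v in row)

def shift_right(frame, k):
    """Motion compensated frame for a global shift of k pixels to the right."""
    return [[0] * k + row[:-k] for row in frame]

F_prev = [[0, 9, 9, 0, 0], [0, 9, 9, 0, 0]]   # F(n-1)
F      = [[0, 0, 9, 9, 0], [0, 0, 9, 8, 0]]   # F(n): object moved right, one pixel dimmed
M      = shift_right(F_prev, 1)               # M(n), known to encoder and decoder
MCD    = sub(F, M)                            # MCD(n) = F(n) - M(n)
FD     = sub(F, F_prev)                       # FD(n) = F(n) - F(n-1)
```

Both routes rebuild F(n) exactly, but here `energy(MCD)` is 1 while `energy(FD)` is 35, so the motion compensated difference leaves much less to encode once the motion vector has been sent.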
Motion compensated difference frames • Decoder knows F(n-1) and, once you send the motion vectors, it knows M(n) • Without motion compensation: send FD(n) • With motion compensation: send motion vectors, then send MCD(n) [Figure: reference frame F(n-1); original frame F(n); difference image FD(n) = F(n) – F(n-1); motion compensated frame M(n); motion compensated difference image MCD(n) = F(n) – M(n)]
Motion Compensated Difference Frames • But we are NOT doing lossless coding • Encoder has sequence of frames: …, F(n-2), F(n-1) • Next: encode F(n) • Past frames have been lossy encoded, so the decoder has versions …, G(n-2), G(n-1) • Encoder knows …, G(n-2), G(n-1) also • Encoder sends the motion vectors for frame F(n) relative to frame G(n-1), to form motion compensated frame M(n)
Encoding Difference Frames • Encoder forms motion compensated difference frame: MCD(n) = F(n) – M(n) • Encoder lossy encodes MCD(n) • Call the decoder version MCD*(n) • If the decoder received MCD(n) exactly, could do: F(n) = MCD(n) + M(n) • But with MCD*(n), decoder can do G(n) = MCD*(n) + M(n) → knows F(n) approximately
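The lossy loop can be sketched as below. To isolate the effect of quantization, the sketch assumes zero motion (so M(n) is simply G(n-1)), a uniform quantizer with step `q` standing in for the lossy coder, frames flattened to 1-D lists, and a first frame sent without loss. None of these choices come from the slides.

```python
def quantize(value, q):
    """Uniform quantizer: round to the nearest multiple of q."""
    return q * round(value / q)

def encode_sequence(frames, q):
    """Closed-loop lossy coding: predict from the decoder's G(n-1), not from F(n-1)."""
    G_prev = frames[0][:]            # assumption: frame 0 reaches the decoder exactly
    sent, recon = [], [G_prev]
    for F in frames[1:]:
        M = G_prev                                   # zero-motion prediction M(n)
        MCD = [f - m for f, m in zip(F, M)]          # MCD(n) = F(n) - M(n)
        MCD_star = [quantize(d, q) for d in MCD]     # what the decoder receives
        G = [d + m for d, m in zip(MCD_star, M)]     # G(n) = MCD*(n) + M(n)
        sent.append(MCD_star)
        recon.append(G)
        G_prev = G
    return sent, recon

q = 4
frames = [[10, 20, 30], [13, 21, 36], [17, 19, 41], [22, 18, 47]]
sent, recon = encode_sequence(frames, q)
max_error = max(abs(f - g) for F, G in zip(frames, recon) for f, g in zip(F, G))
```

Because the encoder predicts from the same G(n-1) the decoder holds, the reconstruction error in every frame stays within half a quantizer step and does not accumulate from frame to frame.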
Motion estimation philosophy • Goal of motion estimation is NOT to provide a careful analysis of the actual motion • Goal is to achieve a given quality of representation of the video while globally minimizing the bit rate required to send • The motion information • The prediction error information • Most of the time, for a given representation quality • fewer bits to send MV+MCD(n) instead of sending FD(n) • fewer bits to send FD(n) instead of sending F(n) itself.
Motion Compensation for Chrominance • Luminance is highly correlated, more so than chrominance • The “best” motion vectors are available by searching in the luminance plane • Motion vectors for chrominance are not computed separately, simply scaled as needed
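Scaling a luminance motion vector for subsampled chrominance might look like this. The divisors follow from the subsampling ratios, but truncating toward zero and working in whole-pixel units are assumptions of this sketch; each standard defines its own exact rounding rule and vector units.

```python
# (horizontal divisor, vertical divisor) for each chroma format
CHROMA_DIVISORS = {
    "4:4:4": (1, 1),   # chrominance at full resolution
    "4:2:2": (2, 1),   # half horizontal resolution
    "4:2:0": (2, 2),   # half horizontal and half vertical resolution
}

def scale_mv(mv, chroma_format):
    """Scale a luminance motion vector (dx, dy) to chrominance resolution.
    Assumption: results are truncated toward zero."""
    sx, sy = CHROMA_DIVISORS[chroma_format]
    return (int(mv[0] / sx), int(mv[1] / sy))
```

For example, `scale_mv((6, -3), "4:2:0")` gives `(3, -1)` under the truncation assumption.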
Motion Estimation/Compensation Summary • At the encoder: • For each block in the frame being coded, examine the search window(s) in the reference frame to find the best match block (do this for luminance only) • Form the MC difference image = original image minus motion compensated image • Scale the motion vectors for the chrominance, form the motion compensated chrominance frames, and form chrominance difference image
Motion Estimation/Compensation Summary • At the decoder: • Decode the reference frames (Y,Cr,Cb) • For each block in a temporally coded Y frame, use the motion vector to select a block from the reference frame and glue it in place • Add the Y difference image • For each block in temporally coded Cr,Cb frames, first scale the motion vector, then do the previous 2 steps with Cr and Cb data
Temporal Location of Reference • The reference frame need not occur before the temporally coded frames which use it • Why? Scene changes, allow better matches
Flavors of Motion Estimation • 1. Forward predicted blocks: the best-match block occurs in the reference frame before the block’s frame • 2. Backward predicted blocks: the best-match block occurs in the reference frame after the block’s frame • 3. Interpolatively predicted blocks: the best-match block is the average of the best-match blocks from reference frames before & after • The motion compensation direction can be selected independently for each block in a frame.
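A per-block choice among the three flavors can be sketched as follows, assuming the forward and backward best-match blocks have already been found and using the sum of absolute differences as the error measure. The function names and sample blocks are illustrative.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def choose_prediction(cur, fwd, bwd):
    """Pick forward, backward, or interpolated prediction for one block.
    fwd / bwd are the best-match blocks from the earlier / later reference frame."""
    interp = [[(f + b) / 2 for f, b in zip(rf, rb)] for rf, rb in zip(fwd, bwd)]
    candidates = [("forward", fwd), ("backward", bwd), ("interpolated", interp)]
    mode, pred = min(candidates, key=lambda c: sad(cur, c[1]))
    return mode, sad(cur, pred)

# A block whose brightness lies halfway between the two references.
fading = choose_prediction([[10, 10], [10, 10]],
                           [[8, 8], [8, 8]],
                           [[12, 12], [12, 12]])

# A block that only matches the later reference, e.g. just after a scene change.
after_cut = choose_prediction([[10, 10], [10, 10]],
                              [[50, 50], [50, 50]],
                              [[10, 10], [10, 10]])
```

In the first case the average of the two references matches exactly, so the interpolated mode wins; in the second only the backward reference matches, which is the situation the scene-change remark on the previous slide has in mind.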
MPEG Frame Types • Intra (I) pictures: coded by themselves, as still images. No temporal coding. No motion vectors.
MPEG Frame Types • Forward Motion Compensated predicted (P) pictures – forward motion compensated from the previous I or P frame
MPEG Frame Types • Motion Compensated interpolated (B) pictures – forward, backward, and interpolatively motion compensated from previous/next I/P frames
Motion Vector Coding • How are the motion vectors actually encoded for transmission to the decoder? • Start by taking the difference between the current motion vector and the most recent previous one of the same type (forward/backward/interpolative) • Encode the difference using variable length coding • Horizontal and vertical components coded separately
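The differencing step can be sketched as below. It assumes all vectors are of the same type, that the predictor starts at (0, 0), and it leaves out the variable length coding of the resulting differences; the starting value and names are assumptions.

```python
def mv_differences(mvs):
    """Replace each motion vector by its difference from the previous one.
    Horizontal and vertical components are handled separately."""
    prev = (0, 0)                      # assumption: predictor starts at zero
    diffs = []
    for dx, dy in mvs:
        diffs.append((dx - prev[0], dy - prev[1]))
        prev = (dx, dy)
    return diffs

def mv_reconstruct(diffs):
    """Decoder side: accumulate the differences to recover the vectors."""
    prev = (0, 0)
    mvs = []
    for ddx, ddy in diffs:
        prev = (prev[0] + ddx, prev[1] + ddy)
        mvs.append(prev)
    return mvs

mvs = [(3, -1), (3, -1), (4, -1), (4, 0)]   # hypothetical vectors of neighbouring blocks
diffs = mv_differences(mvs)
```

Neighbouring blocks tend to move together, so most differences are zero or small, which is what makes the variable length code effective.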
MPEG Frame Structure Terminology • A block contains 8x8 pixels • The DCT unit • A macroblock (MB) contains 4 blocks from the luminance, plus the corresponding chrominance blocks • 4 blocks from each of Cr/Cb if 4:4:4 format • 2 blocks from each of Cr/Cb if 4:2:2 format • 1 block from each of Cr/Cb if 4:1:1 or 4:2:0 format • The motion compensation unit
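The block counts above can be tabulated with a small helper. It assumes the 4 luminance blocks form a 16x16 pixel area, and the 352x240 frame size used in the usage note is only a hypothetical example.

```python
# Chrominance blocks contributed by EACH of Cr and Cb, per macroblock
CHROMA_BLOCKS_PER_COMPONENT = {"4:4:4": 4, "4:2:2": 2, "4:1:1": 1, "4:2:0": 1}

def blocks_per_macroblock(chroma_format):
    """4 luminance blocks plus the Cr and Cb blocks for the given format."""
    return 4 + 2 * CHROMA_BLOCKS_PER_COMPONENT[chroma_format]

def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks, assuming both dimensions are multiples of 16."""
    return (width // 16) * (height // 16)
```

For instance, `blocks_per_macroblock("4:2:0")` is 6, and a 352x240 frame would hold `macroblocks_per_frame(352, 240)` = 330 macroblocks.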
MPEG Frame Structure Terminology • A slice is a collection of macroblocks, tracing in a raster scan from upper left to lower right • The resynchronization unit • A picture is a frame, either progressive (non-interlaced) or interlaced • The primary coding unit • A Group of Pictures (GOP) contains ≥ 1 frame. • The unit for random access into the sequence
MPEG GOP Structure • A Group of Pictures (GOP) may contain • All I pictures • I & P pictures only • I, P, & B Pictures • A common GOP format for 30 frames/sec: • I-picture spacing 15 frames (1/2 second) • P-picture spacing 3 frames (1/10 second)
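The common GOP format above can be written out as a string of picture types in display order. This sketch assumes the GOP starts with its I-picture and that every picture that is neither I nor P is a B-picture; the function name is illustrative.

```python
def gop_pattern(i_spacing, p_spacing):
    """Picture types for one GOP in display order.
    i_spacing: frames between I-pictures; p_spacing: frames between I/P pictures."""
    types = []
    for k in range(i_spacing):
        if k == 0:
            types.append("I")
        elif k % p_spacing == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)

common = gop_pattern(15, 3)
```

With an I-picture spacing of 15 and a P-picture spacing of 3, `common` is `"IBBPBBPBBPBBPBB"`. A P-picture spacing of 1 gives a GOP with I and P pictures only.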
Frame Ordering • Display order (encoder input order): • But consider coding dependencies: • Frame 2 (B) needs frame 4 (P) to be decoded first, etc. • So it is better to transmit frame 4 before frame 2
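The reordering can be sketched as follows: hold back each B-picture until the next I or P picture it depends on has been sent. Frames are numbered from 1 in display order, matching the slide. How trailing B-pictures at the end of the input are handled is an assumption of this sketch.

```python
def coding_order(types):
    """Convert display order to transmission order.
    types: string of picture types in display order, e.g. "IBBPBBP".
    Returns 1-based frame numbers in the order they would be sent."""
    order, pending_b = [], []
    for number, picture_type in enumerate(types, start=1):
        if picture_type == "B":
            pending_b.append(number)      # wait for the next I/P reference
        else:
            order.append(number)          # send the reference first
            order.extend(pending_b)       # then the B-pictures that needed it
            pending_b = []
    order.extend(pending_b)  # assumption: leftover B-pictures are sent last
    return order

order = coding_order("IBBPBBP")
```

For the display sequence I B B P B B P the transmission order is 1, 4, 2, 3, 7, 5, 6, so frame 4 (P) arrives before frame 2 (B).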
Types of Coding Modes • What if the best-match block in the reference frame is a great match? • Then the motion vector is all you need to send • What if it is a terrible match? • Then don’t use the motion vector at all, just code the block by itself, with something like JPEG (called intra mode coding) • What if it is a so-so match? • Then you can send the MV, and also send the frame difference information for that macroblock
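The three cases above amount to a per-macroblock decision on the match error. The sketch below uses two thresholds to separate great, so-so, and terrible matches; the threshold values and the mode labels are hypothetical, and a real encoder is free to decide in any way it likes.

```python
def choose_mode(match_error, great=50, terrible=2000):
    """Pick a coding mode for one macroblock from its best-match error.
    `great` and `terrible` are made-up thresholds for illustration."""
    if match_error <= great:
        return "inter: motion vector only"
    if match_error >= terrible:
        return "intra: code the block by itself"
    return "inter: motion vector plus difference"
```

For example, `choose_mode(10)` selects the motion-vector-only mode, `choose_mode(500)` sends the vector plus the difference, and `choose_mode(5000)` falls back to intra coding.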
Coding Mode I (Inter-Coding) • Inter coding refers to coding with motion vectors [Figure: Previous Frame, Current Frame, Macro Block, Motion Vector]
Coding Mode II (Intra-Coding) • INTRA coding refers to coding without motion vectors • The MB is coded all by itself, in a manner similar to JPEG [Figure: Previous Frame, Current Frame, Macro Block]
I-Picture Coding • Two possible coding modes for macroblocks in I-frames • Intra: code the 4 blocks with the current quantization parameters • Intra with modified quantization: scale the quantization matrix before coding this MB • All macroblocks in intra pictures are coded • Quantized DC coefficients are losslessly DPCM coded, then Huffman coded as in JPEG