Digital Image Processing – Fall 2008, Prof. Dmitry Goldgof • Digital Video Processing • Vasant Manohar, Computer Science and Engineering, University of South Florida • http://www.csee.usf.edu/~vmanohar • vmanohar@cse.usf.edu
Outline • Basics of Video • Digital Video • MPEG • Summary
Basics of Video • Static scene capture → Image; bring in motion → Video • Image sequence: a 3-D signal • 2 spatial dimensions & 1 time dimension • Continuous I(x, y, t) → discrete I(m, n, tk)
Video Camera • Frame-by-frame capturing • CCD sensors (Charge-Coupled Devices) • 2-D array of solid-state sensors • Each sensor corresponds to a pixel • Stored in a buffer and sequentially read out • Widely used
Video Display • CRT (Cathode Ray Tube) • Large dynamic range • Bulky for large displays • CRT physical depth has to be proportional to screen width • LCD flat-panel display • Uses an electric field to change the optical properties, and thereby the brightness/color, of the liquid crystal • Generating the electric field • By an array of transistors: active-matrix thin-film transistors (TFT) • "An active-matrix TFT display has a transistor located at each pixel, allowing the display to be switched more frequently and with less current to control pixel luminance. A passive-matrix LCD has a grid of conductors with pixels located at the grid intersections"
Composite vs. Component Video • Component video • Three separate signals for tri-stimulus color representation or luminance-chrominance representation • Pro: higher quality • Con: needs high bandwidth and synchronization • Composite video • Multiplexed into a single signal • Historical reason: transmitting color TV through a monochrome channel • Pro: saves bandwidth • Con: crosstalk • S-video • Luminance signal + a single multiplexed chrominance signal
Progressive vs. Interlaced Videos • Progressive • Every pixel on the screen is refreshed in order (monitors) or simultaneously (films) • Interlaced • The screen is refreshed twice per frame: the electron gun at the back of your CRT lights the phosphors on the even-numbered rows of pixels first, and then the odd-numbered rows • An NTSC frame-rate of 29.97 means the screen is redrawn 59.94 times a second • In other words, 59.94 half-frames per second, or 59.94 fields per second
Progressive vs. Interlaced Videos • How interlaced video can cause problems • Suppose you resize a 720 x 480 interlaced video to 576 x 384 (20% reduction) • How does resizing work? • It takes a sample of the pixels from the original source and blends them together to create the new pixels • In the case of interlaced video, you might end up blending scan lines from two completely different images! (a field-separation workaround is sketched after the figures below)
Progressive vs. Interlaced Videos • [Figure] Image at full 720 x 480 resolution; observe the distinct scan lines
Progressive vs. Interlaced Videos • [Figure] Image after being resized to 576 x 384; some scan lines blended together!
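A minimal sketch of the usual workaround: split the frame into its two fields before any resizing, so lines captured at different instants are never blended. It assumes an 8-bit grayscale frame stored row-major; the buffer layout and function name are illustrative.

/* Copy even rows (field 0) and odd rows (field 1) into separate
 * half-height buffers; each field can then be resized on its own. */
#include <string.h>

void split_fields(const unsigned char *frame, int width, int height,
                  unsigned char *even_field, unsigned char *odd_field)
{
    for (int y = 0; y < height; y++) {
        unsigned char *dst = (y % 2 == 0) ? even_field : odd_field;
        memcpy(dst + (y / 2) * width, frame + y * width, width);
    }
}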
Aspect Ratio • When you view pure NTSC video on your monitor, people look a little fatter than normal. Why? • TV video is stored at a 3:2 ratio of pixel dimensions (720 x 480), while monitors display picture data at 4:3 • A lot of capture cards crop off 16 pixels at the horizontal edges and capture at 704 x 480 or 352 x 480 • Aspect ratios in movies • 5:3 mostly used in animation movies • 16:9 widescreen • 21:9 CinemaScope
Aspect Ratio • Converting widescreen pictures to 4:3 TV format • letterbox format (black bars above and below the picture) • losing parts of the picture • If we convert a 21:9 picture, we might lose a large part of the picture (blue – 16:9, red – 4:3); the letterbox geometry is worked out in the sketch below
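The letterbox bar sizes follow directly from the two ratios; a small worked sketch with example numbers (the 720 x 540 target frame is just an illustration of a 4:3 display):

#include <stdio.h>

int main(void)
{
    int target_w = 720, target_h = 540;       /* 4:3 frame (example)  */
    double src_aspect = 21.0 / 9.0;           /* widescreen source    */

    int pic_h = (int)(target_w / src_aspect); /* height after scaling */
    int bar = (target_h - pic_h) / 2;         /* bar above and below  */

    printf("picture: %dx%d, bars: %d px top and bottom\n",
           target_w, pic_h, bar);             /* 720x308, 116 px bars */
    return 0;
}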
Why Digital? • "Exactness" • Exact reproduction without degradation • Accurate duplication of processing results • Convenient & powerful computer-aided processing • Can perform rather sophisticated processing through hardware or software • Easy storage and transmission • 1 DVD can store a three-hour movie! • Transmission of high-quality video through a network in reasonable time
Digital Video Coding • The basic idea is to remove redundancy in video and encode what remains • Perceptual redundancy • The Human Visual System is less sensitive to color and high frequencies • Spatial redundancy • Pixels in a neighborhood have close luminance levels • Low frequency • How about temporal redundancy? • Differences between successive frames are usually small. Shouldn't we exploit this?
Hybrid Video Coding • "Hybrid" ~ combination of spatial, perceptual, and temporal redundancy removal • Issues to be handled • Not all regions are easily inferable from the previous frame • Occlusion ~ solved by backward prediction, using future frames as reference • The decision of whether to use prediction or not is made adaptively • Drifting and error propagation • Solved by encoding reference regions or frames at constant intervals of time • Random access • Solved by encoding frames without prediction at constant intervals of time • Bit allocation • according to statistics • constant and variable bit-rate requirements • MPEG combines all of these features!
MPEG • MPEG – Moving Pictures Experts Group • Coding of moving pictures and associated audio • Picture part • Can achieve compression ratio of about 50:1 through storing only the difference between successive frames • Even higher compression ratios possible • Audio part • Compression of audio data at ratios ranging from 5:1 to 10:1 • MP3 ~ “MPEG-1 audio Layer-3”
Bit Rate • Defined in two ways • bits per second (all inter-frame compression algorithms) • bits per frame (most intra-frame compression algorithms except DV and MJPEG) • What does this mean? • If you encode something in MPEG and specify it to be 1.5 Mbps, it doesn't matter what the frame-rate is: the stream takes the same amount of space; a lower frame-rate will look sharper but less smooth • If you do the same with a codec like Huffyuv or Intel Indeo, you get the same per-frame image quality at any frame-rate, but the smoothness and file sizes change as the frame-rate changes
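A back-of-the-envelope sketch of the first case: at a constant bit-rate, the stream size depends only on the duration, never on the frame-rate (the numbers are examples):

#include <stdio.h>

int main(void)
{
    double mbps = 1.5;                /* CBR MPEG-1 stream, example */
    double seconds = 2.0 * 3600.0;    /* a two-hour movie           */
    double megabytes = mbps * seconds / 8.0;
    printf("%.0f s at %.1f Mbps = %.0f MB, at any frame-rate\n",
           seconds, mbps, megabytes); /* 7200 s -> 1350 MB          */
    return 0;
}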
Data Hierarchy • Sequence: entire video sequence • Group of Pictures (GOP): basic unit allowing for random access • Picture: primary coding unit, with three color components, different picture formats, and progressive or interlaced scanning modes • Slice (or Group of Blocks): basic unit for resynchronization, refresh, and error recovery (skipped if erroneous) • Macroblock: motion compensation unit • Block: transform and compression unit
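One way to picture this hierarchy is as nested C structures. This is a hedged sketch with illustrative field names, not the syntax defined by the MPEG standard:

typedef struct { short coef[8][8]; } Block;   /* transform/compression unit */

typedef struct {
    int mv_x, mv_y;        /* motion vector (motion-compensation unit)   */
    Block blocks[6];       /* 4 luminance + 2 chrominance blocks (4:2:0) */
} Macroblock;

typedef struct {           /* resynchronization / error-recovery unit */
    int num_macroblocks;
    Macroblock *macroblocks;
} Slice;

typedef struct {           /* primary coding unit */
    char type;             /* 'I', 'P' or 'B'     */
    int num_slices;
    Slice *slices;
} Picture;

typedef struct {           /* basic unit for random access */
    int num_pictures;
    Picture *pictures;
} GroupOfPictures;

typedef struct {           /* the entire video sequence */
    int width, height;
    int num_gops;
    GroupOfPictures *gops;
} Sequence;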
MPEG-1 Compression Aspects • Lossless and Lossy compression are both used for a high compression rate • Down-sampled chrominance • Perceptual redundancy • Intra-frame compression • Spatial redundancy • Correlation/compression within a frame • Based on “baseline” JPEG compression standard • Inter-frame compression • Temporal redundancy • Correlation/compression between like frames • Audio compression • Three different layers (MP3)
Perceptual Redundancy • Here is an image represented with 8-bits per pixel
Perceptual Redundancy • The same image at 7-bits per pixel
Perceptual Redundancy • At 6-bits per pixel
Perceptual Redundancy • At 5-bits per pixel
Perceptual Redundancy • At 4-bits per pixel
Perceptual Redundancy • It is clear that we don't need all these bits! • Our previous example illustrated the eye's sensitivity to luminance • We can build a perceptual model • Give more importance to what is perceivable to the Human Visual System • Usually this is a function of the spatial frequency
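The bit-depth reduction shown in the previous slides can be sketched in a few lines of C, assuming 8-bit pixels (the function name is illustrative):

/* Keep only the top `bits` of each 8-bit pixel by zeroing the
 * least significant bits. */
void reduce_depth(unsigned char *img, int npixels, int bits)
{
    unsigned char mask = (unsigned char)(0xFF << (8 - bits));
    for (int i = 0; i < npixels; i++)
        img[i] &= mask;   /* e.g. bits = 5 keeps 32 luminance levels */
}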
Video Coloring Scheme • Translate the RGB system into a YUV-style (luminance/chrominance) system • Human perception is less sensitive to chrominance than to brightness • Separate brightness (luminance) from chrominance; the chrominance resolution then does not have to be as good → lower necessary bit-rate • Coloring scheme (JPEG coloring blocks): luminance Y plus chrominance Cr, Cb, Cg in place of the normal Red, Green, Blue • Translation formulas: Y = Wr·R + Wg·G + Wb·B, Cr = Wr′(R − Y), Cb = Wb′(B − Y), Cg = Wg′(G − Y)
Video Coloring Scheme • Chrominance means "the difference between one color and a reference color of the same brightness and chromaticity" • Macroblock: composed of six blocks (4:2:0 or 4:1:1 format) • Four blocks of Y (luminance) • One block of Cb (blue chrominance) • One block of Cr (red chrominance) • Down-sampled chrominance • Y Cb Cr coordinates and four sub-sampling formats • Ref: Y. Wang, J. Ostermann, Y.-Q. Zhang, Digital Video Processing & Communications, Prentice-Hall, 2001
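A sketch of the translation formulas with the ITU-R BT.601 luminance weights (Wr = 0.299, Wg = 0.587, Wb = 0.114), followed by 4:2:0 down-sampling by simple 2x2 averaging. Real codecs use proper subsampling filters and clamping rules; this is only to make the formulas concrete.

#include <stdlib.h>

void rgb_to_ycbcr420(const unsigned char *r, const unsigned char *g,
                     const unsigned char *b, int w, int h,
                     unsigned char *y, unsigned char *cb, unsigned char *cr)
{
    /* full-resolution chrominance, kept temporarily */
    double *cb_full = malloc(w * h * sizeof *cb_full);
    double *cr_full = malloc(w * h * sizeof *cr_full);

    for (int i = 0; i < w * h; i++) {
        double Y = 0.299 * r[i] + 0.587 * g[i] + 0.114 * b[i];
        y[i] = (unsigned char)(Y + 0.5);
        cb_full[i] = 0.564 * (b[i] - Y) + 128.0;  /* Cb = Wb'(B - Y) */
        cr_full[i] = 0.713 * (r[i] - Y) + 128.0;  /* Cr = Wr'(R - Y) */
    }
    /* 4:2:0 -- one chrominance sample per 2x2 luminance block */
    for (int j = 0; j < h / 2; j++)
        for (int i = 0; i < w / 2; i++) {
            int p = 2 * j * w + 2 * i;
            cb[j * (w / 2) + i] = (unsigned char)((cb_full[p] + cb_full[p + 1]
                + cb_full[p + w] + cb_full[p + w + 1]) / 4.0 + 0.5);
            cr[j * (w / 2) + i] = (unsigned char)((cr_full[p] + cr_full[p + 1]
                + cr_full[p + w] + cr_full[p + w + 1]) / 4.0 + 0.5);
        }
    free(cb_full);
    free(cr_full);
}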
Intra-frame Compression • Intra-frame Coding: • Reduces spatial redundancy to reduce the necessary transmission rate • Encoding of I-blocks is practically identical to the JPEG standard • Makes use of the DCT transform along with zigzag ordering • Lossy data compression
Fundamentals of JPEG • Encoder: DCT → Quantizer → Entropy coder → compressed image data • Decoder: Entropy decoder → Dequantizer → IDCT
Fundamentals of JPEG • JPEG works on 8×8 blocks • Extract 8×8 block of pixels • Convert to DCT domain • Quantize each coefficient • Different stepsize for each coefficient • Based on sensitivity of human visual system • Order coefficients in zig-zag order • Similar frequencies are grouped together • Run-length encode the quantized values and then use Huffman coding on what is left
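A compact sketch of this intra pipeline for a single 8x8 block: forward DCT, per-coefficient quantization, and zig-zag scan. The quantization table is left to the caller and is illustrative, not the table from the JPEG standard; the direct O(N^4) DCT is written for clarity, not speed (compile with -lm):

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define N 8

/* 2-D type-II DCT of an 8x8 block (direct form; real codecs use
 * fast factorizations). */
void dct8x8(const double in[N][N], double out[N][N])
{
    for (int u = 0; u < N; u++)
        for (int v = 0; v < N; v++) {
            double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double sum = 0.0;
            for (int x = 0; x < N; x++)
                for (int y = 0; y < N; y++)
                    sum += in[x][y]
                         * cos((2 * x + 1) * u * M_PI / (2.0 * N))
                         * cos((2 * y + 1) * v * M_PI / (2.0 * N));
            out[u][v] = 0.25 * cu * cv * sum;
        }
}

/* Quantize each coefficient with its own step size: coarser steps for
 * the high frequencies the eye is less sensitive to. */
void quantize(const double coef[N][N], const int qtable[N][N], int q[N][N])
{
    for (int u = 0; u < N; u++)
        for (int v = 0; v < N; v++)
            q[u][v] = (int)round(coef[u][v] / qtable[u][v]);
}

/* Zig-zag scan: order coefficients from low to high frequency so the
 * trailing zeros group together for run-length coding. */
void zigzag(const int q[N][N], int out[N * N])
{
    int i = 0;
    for (int s = 0; s < 2 * N - 1; s++)       /* walk the anti-diagonals */
        for (int k = 0; k < N; k++) {
            int x = (s % 2 == 0) ? s - k : k; /* alternate direction     */
            int y = s - x;
            if (x >= 0 && x < N && y >= 0 && y < N)
                out[i++] = q[x][y];
        }
}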
Random Access and Inter-frame Compression (Temporal Redundancy) • Only perform repeated encoding of the parts of a picture frame that are rapidly changing • Do not repeatedly encode background elements and still elements • Random access capability • Prediction that does not depend upon the user accessing the first frame (skipping through movie scenes, arbitrary-point pick-up)
3-D Motion → 2-D Motion • [Figure] A 3-D motion vector (MV) in the scene projects to a 2-D MV in the image plane
Sample (2-D) Motion Field • [Figure] Anchor frame, target frame, and the motion field between them
2-D Motion Corresponding to Camera Motion • [Figure] Camera zoom; camera rotation around the Z-axis (roll)
General Considerations for Motion Estimation • Two categories of approaches: • Feature-based (more often used in object tracking and 3-D reconstruction from 2-D) • Intensity-based, relying on the constant-intensity assumption (more often used for motion-compensated prediction, as required in video coding and frame interpolation) • Three important questions • How to represent the motion field? • What criteria to use to estimate motion parameters? • How to search for motion parameters?
Motion Representation • Global: entire motion field is represented by a few global parameters • Pixel-based: one MV at each pixel, with some smoothness constraint between adjacent MVs • Block-based: entire frame is divided into blocks, and motion in each block is characterized by a few parameters; also mesh-based (flow of corners, approximated inside) • Region-based: entire frame is divided into regions, each region corresponding to an object or sub-object with consistent motion, represented by a few parameters
Examples • [Figure] Half-pel Exhaustive Block Matching Algorithm (EBMA): target frame, anchor frame, predicted target frame, and motion field
Examples • [Figure] Three-level Hierarchical Block Matching Algorithm: predicted target frame
Examples • [Figure] EBMA vs. mesh-based motion estimation
Motion Compensated Prediction • Divide current frame, i, into disjoint 16×16 macroblocks • Search a window in previous frame, i-1, for closest match • Calculate the prediction error • For each of the four 8×8 blocks in the macroblock, perform DCT-based coding • Transmit motion vector + entropy coded prediction error (lossy coding)
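A minimal sketch of the matching step: integer-pel exhaustive block matching (EBMA) over a ±range search window using the sum of absolute differences. It assumes 8-bit grayscale frames stored row-major; the half-pel refinement shown in the earlier figure is omitted, and names are illustrative.

#include <limits.h>
#include <stdlib.h>

typedef struct { int dx, dy; } MotionVector;

/* SAD between the 16x16 macroblock at (x,y) in the current frame and
 * the candidate at (x+dx, y+dy) in the reference frame. */
static long sad16(const unsigned char *cur, const unsigned char *ref,
                  int width, int x, int y, int dx, int dy)
{
    long sad = 0;
    for (int r = 0; r < 16; r++)
        for (int c = 0; c < 16; c++)
            sad += abs(cur[(y + r) * width + (x + c)]
                     - ref[(y + r + dy) * width + (x + c + dx)]);
    return sad;
}

/* Try every displacement in [-range, +range] and keep the best. */
MotionVector ebma_search(const unsigned char *cur, const unsigned char *ref,
                         int width, int height, int x, int y, int range)
{
    MotionVector best = { 0, 0 };
    long best_sad = LONG_MAX;
    for (int dy = -range; dy <= range; dy++)
        for (int dx = -range; dx <= range; dx++) {
            /* stay inside the reference frame */
            if (x + dx < 0 || y + dy < 0 ||
                x + dx + 16 > width || y + dy + 16 > height)
                continue;
            long s = sad16(cur, ref, width, x, y, dx, dy);
            if (s < best_sad) { best_sad = s; best = (MotionVector){ dx, dy }; }
        }
    return best;
}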
Decoding with non-random access • To decode and play frames located in section G, all frames before section G must be decoded as well • Synchronization algorithm issues • If section G is far along in the movie, this could take a considerable amount of time
Decoding with random access • Introduce "I" frames: frames that are NOT predictively encoded, by design; frames that are still encoded using a prediction algorithm are called "P" frames • When decoding any frame after an I frame (frame G in this example), we only have to decode past frames until we reach an I-frame • saves time when skipping from frame to frame • Since I-frames are not predictively encoded, they reduce the compression ratio • Depending on the concentration of I frames, there is a tradeoff: • More I frames → faster random access • Fewer I frames → better compression ratio
MPEG-1 Video Coding • Most MPEG-1 implementations use a large number of I frames to ensure fast access • Somewhat low compression ratio by itself • For predictive coding, P frames depend on only a small number of past frames • Using fewer past frames reduces error propagation • To further enhance compression in an MPEG-1 file, introduce a third frame type, the "B" (bi-directional) frame • B frames are encoded using predictive coding from only two other frames: a past frame and a future frame • Looking at both the past and the future helps reduce prediction error due to rapid changes from frame to frame (e.g., a fight scene or fast-action scene)
Predictive coding hierarchy: I, P and B frames • I frames (black) do not depend on any other frame and are encoded separately • Called "anchor frames" • P frames (red) depend on the last P frame or I frame (whichever is closer) • Also called "anchor frames" • B frames (blue) depend on two frames: the closest past P or I frame, and the closest future P or I frame • B frames are NOT used to predict other B frames; only P frames and I frames are used for predicting other frames
MPEG-1 Temporal Order of Compression • I frames are generated and compressed first • Have no frame dependence • P frames are generated and compressed second • Only depend upon the past I frame values • B frames are generated and compressed last • Depend on surrounding frames • Prediction from a future anchor frame is needed, so the coding order differs from the display order (see the reordering sketch below)
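An illustrative sketch of that reordering for a hypothetical I B B P B B P display sequence: each B frame can only be coded after both of its anchors, so anchors move ahead of the B frames that depend on them. Running this prints both orders.

#include <stdio.h>

int main(void)
{
    const char display[] = "IBBPBBP";        /* display order (example) */
    int n = (int)sizeof display - 1;
    printf("display order: ");
    for (int i = 0; i < n; i++) printf("%c%d ", display[i], i);

    printf("\ncoding order:  ");
    int last_anchor = -1;                    /* last emitted I or P     */
    for (int i = 0; i < n; i++) {
        if (display[i] == 'B') continue;     /* wait for next anchor    */
        printf("%c%d ", display[i], i);      /* emit anchor first       */
        for (int j = last_anchor + 1; j < i; j++)
            printf("%c%d ", display[j], j);  /* then the B frames between */
        last_anchor = i;
    }
    printf("\n");      /* prints: I0 P3 B1 B2 P6 B4 B5 */
    return 0;
}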
Adaptive Predictive Coding inMPEG-1 • Coding each block in P-frame • Predictive block using previous I/P frame as reference • Intra-block ~ encode without prediction • use this if prediction costs more bits than non-prediction • good for occluded area • can also avoid error propagation • Coding each block in B-frame • Intra-block ~ encode without prediction • Predictive block • use previous I/P frame as reference (forward prediction) • or use future I/P frame as reference (backward prediction) • or use both for prediction
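A hedged sketch of that adaptive decision for one block: compare a proxy for the intra coding cost (mean-removed block energy) against a proxy for the prediction cost (SAD against the motion-compensated predictor), and pick whichever looks cheaper. Real encoders use rate-distortion costs; this rule and the names are illustrative.

#include <stdlib.h>

/* Energy of the block around its mean: a proxy for intra coding cost. */
static long intra_cost(const unsigned char *blk, int stride, int n)
{
    long sum = 0, cost = 0;
    for (int r = 0; r < n; r++)
        for (int c = 0; c < n; c++)
            sum += blk[r * stride + c];
    long mean = sum / (n * n);
    for (int r = 0; r < n; r++)
        for (int c = 0; c < n; c++)
            cost += labs(blk[r * stride + c] - mean);
    return cost;
}

/* SAD against the motion-compensated predictor: a proxy for inter cost. */
static long inter_cost(const unsigned char *blk, const unsigned char *pred,
                       int stride, int n)
{
    long cost = 0;
    for (int r = 0; r < n; r++)
        for (int c = 0; c < n; c++)
            cost += abs(blk[r * stride + c] - pred[r * stride + c]);
    return cost;
}

/* Choose intra when prediction would cost more (e.g., occluded areas). */
int use_intra(const unsigned char *blk, const unsigned char *pred,
              int stride, int n)
{
    return intra_cost(blk, stride, n) < inter_cost(blk, pred, stride, n);
}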
Codec Adjustments • For smoothing out bit rate • A few applications prefer an approximately constant bit-rate (CBR) video stream • e.g., a prescribed number of bits per second • very-short-term bit-rate variations can be smoothed by a buffer • variations cannot be too large over the longer term, else the buffer overflows • For reducing bit rate by exploiting Human Visual System (HVS) temporal properties • Noise/distortion in a video frame is much less visible right after a sharp temporal transition (scene change) • so a few frames right after a scene change can be compressed with fewer bits • Changing the frame types • I I I I I I … → lowest compression ratio (like MJPEG) • I P P … P I P P … → moderate compression ratio • I B B P B B P B B I … → highest compression ratio
MPEG Library • The MPEG Library is a C library for decoding MPEG-1 video streams and dithering them to a variety of color schemes • Most of the code in the library comes directly from an old version of the Berkeley MPEG player (mpeg_play) • The library can be downloaded from http://starship.python.net/~gward/mpeglib/mpeg_lib-1.3.1.tar.gz • It works well on all modern Unix and Unix-like platforms with an ANSI C compiler. I have tested it on "grad". NOTE - This is not the best library available, but it works well for MPEG-1 and it is fairly easy to use. If you are inquisitive, you should check the MPEG Software Simulation Group at http://www.mpeg.org/MPEG/MSSG/ where you can find a free MPEG-2 video coder/decoder.
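A hedged sketch of a decode loop with this library, based on the OpenMPEG / GetMPEGFrame / CloseMPEG calls described in its documentation. Verify the exact signatures and the ImageDesc fields against mpeg.h in the distribution before relying on this.

#include <stdio.h>
#include <stdlib.h>
#include "mpeg.h"

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    FILE *in = fopen(argv[1], "rb");
    if (!in) { perror(argv[1]); return 1; }

    ImageDesc img;                   /* filled in by OpenMPEG          */
    if (!OpenMPEG(in, &img)) return 1;

    char *frame = malloc(img.Size);  /* buffer for one dithered frame  */
    int n = 0;
    while (GetMPEGFrame(frame))      /* false after the last frame     */
        n++;                         /* ...process the frame here...   */
    printf("decoded %d frames (%dx%d)\n", n, img.Width, img.Height);

    CloseMPEG();
    fclose(in);
    free(frame);
    return 0;
}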