Topic for lecture 2. Topic: video compression The ultimate compression task? Color image (300 x 300 x 24bit): 2.16Mbit/image x 30 image/s = 64.8Mbps Motion picture: 90min = 64.8Mbps x 60 x 90 = 349.92Gbit
Topic for lecture 2 • Topic: video compression • The ultimate compression task? • Color image (300 x 300 x 24bit): • 2.16Mbit/image x 30 image/s = 64.8Mbps • Motion picture: 90min = 64.8Mbps x 60 x 90 = 349.92Gbit • 56.6K modem => Raw download time (excl. sound and overhead) ~ 1717 hours or ~ 72 days!!!
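The arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch that reuses the slide's own figures (300 x 300 pixels, 24 bit, 30 images/s, 90 minutes, a 56.6 Kbps modem); the variable names are illustrative.

```python
# Raw (uncompressed) video size, using the numbers from the slide.
WIDTH, HEIGHT, BITS_PER_PIXEL = 300, 300, 24
FRAME_RATE = 30            # images per second
DURATION_S = 90 * 60       # 90-minute motion picture, in seconds
MODEM_BPS = 56_600         # modem rate as given on the slide

bits_per_image = WIDTH * HEIGHT * BITS_PER_PIXEL
bit_rate = bits_per_image * FRAME_RATE
total_bits = bit_rate * DURATION_S
download_hours = total_bits / MODEM_BPS / 3600
download_days = download_hours / 24

print(f"{bits_per_image / 1e6:.2f} Mbit per image")
print(f"{bit_rate / 1e6:.1f} Mbps raw bit rate")
print(f"{total_bits / 1e9:.2f} Gbit for the whole film")
print(f"{download_hours:.0f} hours, i.e. about {download_days:.0f} days")
```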
Agenda for lecture 2 • What makes video compression possible? • Implementations of motion compensation • Block matching • The YCbCr color representation • MPEG
Video compression • A sequence of images that needs to be compressed for storage and/or transmission • Ignore audio, as images >> audio in data volume • Straightforward methods • Motion JPEG • 3D DCT
Temporal redundancy • Less than 10% of the pixels change by more than 1% between frames • Temporal redundancy or interframe correlation • Temporal redundancy > spatial redundancy • Origin: slow camera and object movements
Motion compensated coding • Second generation of temporal compression methods • More efficient (especially with rapid changes) but also more complex: • OK, since the cost of computing power is decreasing faster than the cost of bandwidth • Basic idea: the only differences between two images are the moving objects • Estimate the motion and simply code this information • From the prediction and the initial frame we can encode/decode all other frames
Practical issues • Due to noise, camera movements, light changes etc., both the object and the background change => • Calculate the prediction error (difference) and code this • It is very hard to track and describe a general object (contour and texture), so a block of pixels is used as the ’object’ instead • The estimated motion is represented as pure translation: no rotation and scaling • This is justified since we have high frame rates and ’slow’ changes • The estimated translation is denoted the displacement vector or motion vector
Procedure for motion compensated coding • Image sequence => image => blocks of pixels • Step 1: Motion analysis: • Estimate the motion vector of the current block, i.e. the position of the block in the previous image(s) • Step 2: Prediction and differencing • Predict what the block found in the previous image(s) will look like in the current image • Subtract the predicted block from the current block => difference • Step 3: Entropy encoding of the difference and motion vector • Encoded difference and motion vector << raw image => video compression • Step 3 we already know
Motion analysis and prediction • In general we seek the trajectory of a block so we can predict its current position, e.g. using weights • In practice this is too complicated and instead a 0th order predictor is applied: • Predicted block(x,y,t) = block(a,b,t-1) • MPEG uses two 0th order predictors • The only open issue is step 1: how do we find the block in the previous frame that best matches the block in the current frame? • Three methods: • Block matching (by far the most applied method) • Pel-recursion (block = 1 pixel) • Optical flow (block = 1 pixel)
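The 0th order predictor and the difference step can be illustrated on a toy example. The sketch below assumes the motion vector is already known and uses plain Python lists as frames; the helper names (`get_block`, `subtract`, `add`) and the 4 x 4 test frames are hypothetical, chosen only for illustration.

```python
def get_block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def subtract(a, b):
    """Element-wise a - b for two equally sized blocks."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Element-wise a + b for two equally sized blocks."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy frames: a bright 2x2 object moves one pixel to the right.
previous = [
    [0, 0, 0, 0],
    [9, 9, 0, 0],
    [9, 9, 0, 0],
    [0, 0, 0, 0],
]
current = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]

top, left, size = 1, 1, 2   # block position in the current frame
dy, dx = 0, -1              # assumed motion vector: the block came from one pixel to the left

# Encoder: 0th order prediction, then difference.
predicted = get_block(previous, top + dy, left + dx, size)
difference = subtract(get_block(current, top, left, size), predicted)

# Decoder: same prediction plus the transmitted difference.
reconstructed = add(predicted, difference)

print(difference)      # all zeros here, so very cheap to entropy code
print(reconstructed)
```

Only the motion vector and the (here all-zero) difference would need to be coded for this block.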
Block matching (1) • Principle • All pixels in a block are assumed to have the same motion vector • Search window • Maximum displacement is determined by frame rate and context • Usually a square region • Usually p=q => square block • The smaller the block size => the better the prediction, but the more overhead (motion vectors) • Usually block size = 16 x 16
Block matching (2) • Overlapping blocks improve reconstructed image quality but increase the bit rate • Usually non-overlapping blocks are applied • Block matching via a similarity measure (the best match minimises it): • Sum of squared differences (SSD): S(u,v) = (u-v)^2, summed over the block • Mean absolute differences (MAD): S(u,v) = |u-v|, averaged over the block
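The two measures can be written out for whole blocks as follows. This is a minimal sketch on plain Python lists; the function names and the 2 x 2 sample blocks are illustrative only.

```python
def ssd(block_a, block_b):
    """Sum of squared differences over all pixel pairs (u, v)."""
    return sum(
        (u - v) ** 2
        for row_a, row_b in zip(block_a, block_b)
        for u, v in zip(row_a, row_b)
    )


def mad(block_a, block_b):
    """Mean absolute difference over all pixel pairs (u, v)."""
    pixel_count = sum(len(row) for row in block_a)
    total = sum(
        abs(u - v)
        for row_a, row_b in zip(block_a, block_b)
        for u, v in zip(row_a, row_b)
    )
    return total / pixel_count


a = [[10, 12], [14, 16]]
b = [[11, 12], [10, 18]]
print(ssd(a, b))  # differences -1, 0, 4, -2
print(mad(a, b))
```

Both values are 0 for identical blocks and grow as the blocks become less alike, so the best match is the candidate with the smallest value.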
Searching strategies • Full search: • Finds the global minimum but requires heavy processing! • Assume only one minimum in the search region => a less computationally demanding search strategy • Accept a local minimum => • Larger difference but less processing • Searching strategies with one (local) minimum: • Coarse-fine three-step search • 2D logarithmic search • Conjugate direction search • Etc.
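A full search simply tests every displacement inside the search window. The sketch below uses the sum of absolute differences as the matching cost and an 8 x 8 toy frame; the names `sad`, `full_search` and `make_frame` are hypothetical helpers, not part of any standard.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between the current block and a displaced reference block."""
    return sum(
        abs(cur[top + y][left + x] - ref[top + dy + y][left + dx + x])
        for y in range(size)
        for x in range(size)
    )


def full_search(cur, ref, top, left, size, search_range):
    """Test every displacement in the search window; return (motion_vector, cost)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= top + dy <= height - size) and (0 <= left + dx <= width - size)
            if not inside:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


def make_frame(obj_top, obj_left, size=8):
    """Black frame with one small 2x2 textured object (toy test data)."""
    frame = [[0] * size for _ in range(size)]
    for y, row in enumerate([[5, 6], [7, 8]]):
        for x, value in enumerate(row):
            frame[obj_top + y][obj_left + x] = value
    return frame


ref = make_frame(2, 2)   # previous frame
cur = make_frame(3, 4)   # object moved 1 down and 2 right
mv, cost = full_search(cur, ref, top=3, left=4, size=2, search_range=2)
print(mv, cost)
```

With a search range of +/-p the loop evaluates up to (2p+1)^2 candidates per block, which is the heavy processing the faster strategies try to avoid.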
Coarse-fine three-step search • Step 1) Test 9 points within a fixed pattern • Step 2+3) Centre the pattern around the best match and reduce the distance within the pattern
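A sketch of the three-step idea, assuming the common choice of step sizes 4, 2 and 1 (which covers displacements up to +/-7). To keep it short, the block comparison is abstracted into a hypothetical `cost(dy, dx)` callback; here a synthetic bowl-shaped cost with a single minimum stands in for a real SSD or MAD.

```python
def three_step_search(cost, first_step=4):
    """Coarse-to-fine search: 9-point pattern, halve the step after each round."""
    best = (0, 0)
    step = first_step
    while step >= 1:
        cy, cx = best
        # 3 x 3 pattern centred on the best match so far.
        candidates = [
            (cy + sy * step, cx + sx * step)
            for sy in (-1, 0, 1)
            for sx in (-1, 0, 1)
        ]
        best = min(candidates, key=lambda mv: cost(*mv))
        step //= 2
    return best


def bowl(dy, dx):
    """Synthetic cost with one minimum at displacement (3, -5)."""
    return (dy - 3) ** 2 + (dx + 5) ** 2


print(three_step_search(bowl))
```

The search evaluates 27 candidates here, compared with 225 for a full search over the same +/-7 window, but it is only guaranteed to find the best match when the cost has a single minimum.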
YCbCr color representation • A camera captures color in RGB format • We would like a representation where the intensity and color are separated: • So we can transmit and decode both a color and a gray-scale signal • [R,G,B]: [50,50,50] has the same color as [100,100,100], only the intensity differs • HSI (hue-saturation-intensity) • HSI is complex to calculate so we seek a simpler representation • YUV-representation is a simple approximation: • Y = Luminance (intensity) = 0.299 R + 0.587 G + 0.114 B • The non-uniform weighting comes from the HVS • U = B – intensity = ”pure” blue color = 0.492 (B - Y) • V = R – intensity = ”pure” red color = 0.877 (R - Y) • Rough approximation but very simple to compute
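The YUV formulas translate directly into code. A minimal sketch, with an illustrative function name, that also shows why gray pixels carry no color information and that U and V can become negative:

```python
def rgb_to_yuv(r, g, b):
    """YUV from RGB using the weights given on the slide."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


# Two grays: different intensity (Y), but U and V are (numerically) zero for both.
print(rgb_to_yuv(50, 50, 50))
print(rgb_to_yuv(100, 100, 100))

# Pure blue: large positive U, negative V.
print(rgb_to_yuv(0, 0, 255))
```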
YCbCr color representation (3) • The HVS is more sensitive to intensity (Y) than to color (Cb and Cr) so more bits can be used to represent the intensity • Formats: • 4:4:4 (24 bits per pixel) • 4:2:2 (16 bits per pixel) • 4:2:0 (12 bits per pixel) • [Figure: sampling grids for the three formats, marking the positions of Y samples and of Cb and Cr samples]
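The bit counts for the three formats follow from counting samples. The sketch below assumes 8 bits per sample and the usual reading of the J:a:b notation (a region J pixels wide and 2 rows high, with a chroma samples in the first row and b additional ones in the second); the function name is illustrative.

```python
def bits_per_pixel(j, a, b, bits_per_sample=8):
    """Average bits per pixel for J:a:b chroma subsampling."""
    pixels = 2 * j                   # region of J pixels by 2 rows
    luma_samples = pixels            # one Y sample per pixel
    chroma_samples = 2 * (a + b)     # (a + b) samples for Cb and the same for Cr
    return (luma_samples + chroma_samples) * bits_per_sample / pixels


for fmt in [(4, 4, 4), (4, 2, 2), (4, 2, 0)]:
    print(fmt, bits_per_pixel(*fmt))
```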
MPEG • MPEG = Moving Picture Experts Group • International standard for compression of video (image, sound, and system info.), driven by growth in the digital media market (e.g. CD-ROM, DVD). Both transmission and storage • MPEG-1: 1991 • MPEG-2: 1994 • MPEG-2 is MPEG-1 compatible, hence only MPEG-2 used today • MPEG is NOT an algorithm but rather a framework with several algorithms and MANY user settings • Fixed protocol, hence fixed decoders (the encoder is not specified!) • Asymmetrical codec: encoding vs. decoding effort ~ 100:1 (JPEG ~ 1:1) • MPEG compression is lossy
MPEG-1 • MPEG-2 is an ”add-on” to MPEG-1 • Typical bit rate for MPEG-1 = 1.5Mbps • Meaning that an MPEG-1 decoder can decode and show real-time video that has been compressed to 1.5Mbps. MPEG: Trade-off between video quality and bandwidth • Allows resolutions up to 4095 x 4095 at 60Hz • Most used is the CPB (constrained parameters bitstream) • Fixed resolutions and frame rates => HW implementations • Max. resolution = 768 x 576 at 30Hz • Max. bit rate = 1.856Mbps
MPEG-1 compression rate • BT.601 (digital TV signal): • 704 x 576 x 24 bit x 25 Hz = 243 Mbps • Compression factor: 243 Mbps / 1.5 Mbps = 162 • JPEG = 10-20 • YCbCr 4:2:0 format: 12 bit per pixel • Basic operation: down-scale to SIF (source input format) • Fixed resolution => HW solutions • 352 x 288 (ignore lines and/or interpolate) • 352 x 288 x 12 bit x 25 Hz = 30.4 Mbps => comp. factor = 20 • But can be higher or lower • In general: Fewer input data => better image quality (for fixed bit rate)
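The compression factors can be checked numerically. The sketch assumes a SIF size of 352 x 288 for 25 Hz material, which is the size that yields the 30.4 Mbps figure, and the 1.5 Mbps MPEG-1 target rate; the variable names are illustrative.

```python
TARGET_BPS = 1.5e6  # typical MPEG-1 bit rate

# BT.601 digital TV signal: 704 x 576 pixels, 24 bit, 25 Hz.
bt601_bps = 704 * 576 * 24 * 25

# SIF in YCbCr 4:2:0: assumed 352 x 288 pixels, 12 bit per pixel, 25 Hz.
sif_bps = 352 * 288 * 12 * 25

print(f"BT.601: {bt601_bps / 1e6:.0f} Mbps, factor {bt601_bps / TARGET_BPS:.0f}")
print(f"SIF:    {sif_bps / 1e6:.1f} Mbps, factor {sif_bps / TARGET_BPS:.0f}")
```

Down-scaling and chroma subsampling alone account for a factor of 8 (243 Mbps to 30.4 Mbps); the remaining factor of about 20 has to come from the actual coding.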
MPEG-1 principle (1) • Full-motion-compensated DCT and difference coding • Frames: 1,2,3,4,5,6,7,8,9, … • 1: (DCT-JPEG) • 2,3,4,5,6,7,8,9, …: difference coding • The difference is DCT coded and quantized => lossy compression • Problems? • Error propagation • No random access
MPEG-1 principle (2) • I-picture: intra-coded • Similar to JPEG • P-picture: predictive coded via forward prediction • B-picture: predictive coded via: • forward, backward, or bi-directional prediction • Errors in I and P are limited to max one GOP (group of pictures) • Errors in B are limited to one picture • N = number of pictures per GOP (distance between I-pictures), M = distance between I/P-pictures • High N and M => good coding but error propagation • Usually: 13<N<16 and 0<M<4 • Recommended: I each ½ sec. and whenever scene changes • Coding order vs. visualisation order
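Coding order differs from visualisation order because a B-picture can only be decoded after the later I- or P-picture it refers to. A minimal sketch of the reordering, assuming pictures are labelled by type and display index (the labels and function name are illustrative):

```python
def coding_order(display_order):
    """Reorder pictures so every B-picture follows both of its reference pictures."""
    out, pending_b = [], []
    for picture in display_order:
        if picture.startswith("B"):
            pending_b.append(picture)   # hold back until the next I/P has been sent
        else:
            out.append(picture)         # I or P: send first ...
            out.extend(pending_b)       # ... then the B-pictures that precede it in display
            pending_b = []
    out.extend(pending_b)               # trailing B-pictures are left as-is in this sketch
    return out


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]
print(coding_order(display))
```

The decoder buffers the reference pictures and restores the display order on output.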
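[Figure: MPEG-1 data hierarchy. Entire sequence => pictures (type: I, P, B) => macroblocks (MB = Macro Block). In 4:2:0 format one macroblock holds 16 x 16 Y samples plus 8 x 8 Cb and 8 x 8 Cr samples, i.e. 6 blocks of 8 x 8]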
Coding one Block (8x8) • Similar to JPEG except for adaptive quantization • DCT, quantization, zig-zag scan, entropy coding • Adaptive quantization controls the quality/amount of data • Intra vs. Inter coding: • I-blocks: Intra • P,B-blocks: Depending on DIFF: 0, motion vectors, Inter, Intra.
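The zig-zag scan step can be sketched as below, assuming the JPEG-style ordering that walks the anti-diagonals from the low-frequency corner outwards. The function names and the 4 x 4 sample block are illustrative; real blocks are 8 x 8.

```python
def zigzag_indices(n=8):
    """(row, col) positions of an n x n block in zig-zag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Sort by anti-diagonal; alternate the walking direction on each diagonal.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]


quantized = [
    [9, 4, 1, 0],
    [3, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(zigzag_scan(quantized))
```

After quantization the non-zero coefficients cluster in the low-frequency corner, so the scan ends in a long run of zeros that entropy coding represents cheaply.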
Coding one Block (8x8) • Encoding • Decoding
What to remember • Video compression is done by removing the temporal redundancy • Principle: (at block level) • Step 1: Motion analysis => motion vector • Step 2: Calculate the error/difference (subtraction) • Step 3: Entropy encoding of motion vector and difference • Motion analysis: • Pel-recursion • Optical flow • Block matching (the currently applied method) • Block matching • Block of pixels (16 x 16) • Similarity measure • Search region • Different search strategies to avoid the full search
What to remember • Video compression is done by removing the temporal redundancy • Principle: (at (macro)block level) • Step 1: Motion analysis (block matching) => motion vector • Step 2: Calculate the error/difference (subtraction) • Step 3: ’JPEG’-coding (DCT, quantization and entropy encoding) • MPEG-1: • Bit rate ~1.5Mbps • Asymmetrical codec ~ 100:1 ( JPEG ~1:1 ) • Compression rate < 400 (down scaling + YCbCr 4:2:0 => ~20) • Coding-style: I B B P B B P B B I • Questions? • Presentations: email me tbm@cvmt.dk • The end
Pel-recursion (1) • The block consists of only one pixel (= pel) • Problem formulation: • Displaced frame difference function: • DFD(x,y,dx,dy) = i(x,y,t) – i(x-dx,y-dy,t-1) • Find (dx,dy) which minimises DFD^2 => most similar pixel => best displacement vector • Solution: • Setting the partial derivatives = 0 • Non-linear programming problem: • Iterative algorithm • Steepest descent method • Newton-Raphson’s method • others
Pel-recursion (2) • Algorithm: • Find the motion vector (dx,dy) for the first pixel • The motion vectors are correlated => • Use ’old’ (dx,dy) as initial guess for the iterative algorithm => recursion
Optical flow • The block consists of only one pixel • Similar to pel-recursion but calculated in a different manner
Comparing the 3 types of motion analysis • The three: pel-recursion, optical flow and block matching • Optical flow and pel-recursion calculate one motion vector for each pixel => • More precise => predicted block and current block are more similar => smaller difference => more compact coding of the difference • More overhead as more motion vectors are to be coded • More complex to calculate • Pixel methods avoid the block artefacts of block matching • Block matching is (at present) more suitable • Used in all coding standards
Temporal methods • Two methods which exploit both the spatial and temporal redundancies • Frame replenishment • Motion compensation • Both utilise prediction => short summary
Frame replenishment (1) • Exploit the temporal redundancy • First generation of temporal compression method • If: value changed significantly: | i(x,y,t) – i(x,y,t-1) | > TH • Then: code value and position: i(x,y,t) x,y • Else: code nothing => re-use i(x,y,t-1) • Enhancements: • Send differences instead of values • Remove noise from the images prior to processing
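The basic if/then/else rule of frame replenishment can be sketched as follows, without the listed enhancements. Frames are plain Python lists, and the function names and sample data are illustrative.

```python
def encode_frame(current, previous, threshold):
    """Code (x, y, value) only for pixels that changed by more than the threshold TH."""
    return [
        (x, y, value)
        for y, row in enumerate(current)
        for x, value in enumerate(row)
        if abs(value - previous[y][x]) > threshold
    ]


def decode_frame(updates, previous):
    """Re-use the previous frame and overwrite only the transmitted pixels."""
    frame = [row[:] for row in previous]
    for x, y, value in updates:
        frame[y][x] = value
    return frame


previous = [[10, 10, 10], [10, 10, 10]]
current = [[10, 11, 10], [10, 50, 10]]

updates = encode_frame(current, previous, threshold=5)
print(updates)                          # only the large change is coded
print(decode_frame(updates, previous))  # the small change (10 -> 11) is lost
```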
Frame replenishment (2) • A fixed bit rate of 1Mbps means that the decoder can only decode and play back real-time video compressed to 1Mbps • Many changes between two images => many pixels to be coded • To keep the same bit rate => TH must be higher => only large changes are coded => poorer reconstruction, a.k.a. the dirty window effect
2D logarithmic search • Test 5 points within a fixed pattern • Centre the pattern around the best match • When best match is in the centre or on the border: reduce distance in pattern
Conjugate direction search • Step 1: Test 3 horizontal points next to each other • Step 2: Move to minimum point • Continue step 1 and 2 until a minimum is found. Then repeat the process in the vertical direction
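YCbCr color representation (2) • YUV: $\begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$ • YUV-representation can have negative values, so YUV-representation is scaled and shifted to avoid this => YCbCr-representation • YCbCr: $\begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 0.257 & 0.504 & 0.098 \\ -0.148 & -0.291 & 0.439 \\ 0.439 & -0.368 & -0.071 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} + \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix}$ • Cb and Cr are denoted the chrominances • YCbCr is the representation utilised in image/video compression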
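The YCbCr conversion is a 3 x 3 matrix product plus the offset (16, 128, 128). A minimal sketch using the coefficients from this slide and assuming 8-bit RGB input in the range 0 to 255; the names are illustrative.

```python
MATRIX = [
    (0.257, 0.504, 0.098),     # Y row
    (-0.148, -0.291, 0.439),   # Cb row
    (0.439, -0.368, -0.071),   # Cr row
]
OFFSET = (16, 128, 128)


def rgb_to_ycbcr(r, g, b):
    """Scaled and shifted YUV: all components stay non-negative for 8-bit RGB."""
    return tuple(
        offset + m[0] * r + m[1] * g + m[2] * b
        for m, offset in zip(MATRIX, OFFSET)
    )


print(rgb_to_ycbcr(0, 0, 0))        # black
print(rgb_to_ycbcr(255, 255, 255))  # white
print(rgb_to_ycbcr(0, 0, 255))      # pure blue
```

Black maps to (16, 128, 128) and white to roughly (235, 128, 128), so gray levels sit at Cb = Cr = 128 and no component goes negative.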
Audio in MPEG-1 • 16 bit samples at sampling rates of 16, 22.05, 24, 32, 44.1 and 48 kHz • Stereo at 44.1 kHz = 1.4 Mbps • Compression based on psycho-acoustic redundancy: • Three methods: • Layer 1: Target rate = 384 Kbps • Layer 2: Target rate = 256 Kbps • Layer 3: Target rate = 128 Kbps • Layer 3 is the most advanced and often applied • It has a nickname, which? • [Figure: two plots with axes dB versus Hz]
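The uncompressed stereo rate and the compression factor implied by each layer's target rate can be checked as follows. The sketch assumes that 44.1 refers to a sampling rate of 44.1 kHz, with 16 bit per sample and 2 channels; the names are illustrative.

```python
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 44_100   # assumed: 44.1 kHz sampling rate
CHANNELS = 2              # stereo

pcm_bps = BITS_PER_SAMPLE * SAMPLE_RATE_HZ * CHANNELS
print(f"Uncompressed stereo: {pcm_bps / 1e6:.1f} Mbps")

# Target rates per layer, as listed on the slide.
target_bps = {"Layer 1": 384_000, "Layer 2": 256_000, "Layer 3": 128_000}
ratio = {layer: pcm_bps / rate for layer, rate in target_bps.items()}
for layer, factor in ratio.items():
    print(f"{layer}: compression factor {factor:.1f}")
```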
MPEG-2 • Defined in 1994 • Developed for DTV but has lots of other applications • Based on MPEG-1 (backward compatible) • Bit rates: 1.5Mbps – 60Mbps. Target: 2-15Mbps (best: 4) • Lots of new features including: • Support for fields, support for 4:4:4 and 4:2:2 • Alternative zig-zag scan, better motion vectors • Scalability to allow any subset of a stream to be decoded and visualised, etc. • MPEG-3: Purpose: HDTV • Merged with MPEG-2 => no MPEG-3 standard
MPEG-4 • Both for real video and synthetic video • Very low bit rates < 64Kbps => efficient coding • Content based coding: code the objects • Shape, texture and sprite (background objects) • Interactivity • Popular coding standards: