Interframe Coding
Heejune AHN
Embedded Communications Laboratory, Seoul National Univ. of Technology
Fall 2008 (last updated 2008. 10. 12)
Agenda
• Interframe Coding Concept
• Block Matching Algorithm
• Fast Block Matching Algorithms
• Block Matching Algorithm Variations
• Enhanced Motion Models
• Implementation Cases
1. Interframe Coding
• Motivation
  • Video has high temporal correlation between frames.
  • Var[ X(t+1) - X(t) ] << Var[ X(t+1) ]
[Figure: two successive video frames and their DFD (Displaced Frame Difference)]
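The variance inequality above can be checked numerically on two frames. The following is a minimal sketch, assuming 8-bit luminance samples stored in flat arrays; the function name diff_variance is hypothetical and not part of the lecture material.

```c
#include <assert.h>
#include <stddef.h>

/* Variance of the n samples x[i] - y[i].
   Pass y = NULL to get the variance of x itself. */
double diff_variance(const unsigned char *x, const unsigned char *y, size_t n)
{
    double sum = 0.0, sum_sq = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        double d = (double)x[i] - (y ? (double)y[i] : 0.0);
        sum += d;
        sum_sq += d * d;
    }
    /* Var = E[d^2] - (E[d])^2 */
    return sum_sq / n - (sum / n) * (sum / n);
}
```

Calling diff_variance(next, prev, n) gives Var[ X(t+1) - X(t) ], and diff_variance(next, NULL, n) gives Var[ X(t+1) ]; for typical video the first should be much smaller.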
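Motion Estimation and Compensation
• Motion estimation (ME)
  • Find the motion parameters that best predict the current frame from the reference frames
• Motion compensation (MC)
  • Encoder: subtract the predicted values from the current frame to obtain the DFD frame
  • Decoder: add the predicted values to the decoded DFD frame
[Figure: encoder (current frame, ME on reference frames producing motion parameters, MC, residual encoding producing texture info) and decoder (MC on reference frames, reconstruction)]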
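The subtract/add pair of motion compensation can be sketched as below. This is only an illustration under assumptions: the prediction is taken as already computed, the residual is stored losslessly as short, and the names mc_residual and mc_reconstruct are hypothetical.

```c
#include <assert.h>

/* Encoder side: residual (DFD) = current - prediction, for n samples. */
void mc_residual(const unsigned char *cur, const unsigned char *pred,
                 short *res, int n)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        res[i] = (short)(cur[i] - pred[i]);
}

/* Decoder side: reconstruction = prediction + residual, clipped to [0, 255]. */
void mc_reconstruct(const unsigned char *pred, const short *res,
                    unsigned char *rec, int n)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        int v = pred[i] + res[i];
        rec[i] = (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}
```

In a real codec the residual passes through transform coding and quantization, so the decoder's reconstruction only approximates the current frame.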
Performance Criteria
• Coding performance
  • Residual signal has low energy (variance measure)
• Complexity
  • Computational and implementation complexity
• Storage and delay
  • Number of required frames
• Side information
  • Size and complexity of motion parameters
• Error resilience
  • Behaviour when data is partially lost
• Some factors trade off against each other
  • Coding performance against complexity, storage, side information, and error resilience
2D Motion
[Figure: previous frame and current frame along the time axis t, image coordinates x and y; a moving object on a stationary background, shifted by a "displacement vector"]
Prediction for the luminance signal S(x,y,t) within the moving object:
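One common way to write this prediction, assuming a displacement vector (d_x, d_y) and a frame interval \Delta t (symbols introduced here for illustration, not taken from the slide), is:

```latex
\hat{S}(x, y, t) = S(x - d_x,\; y - d_y,\; t - \Delta t)
```

That is, each sample inside the moving object is predicted from the sample at the displaced position in the previous frame.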
2. Block Matching Algorithm
• BMA (Block Matching Algorithm)
  • Segment the frame into equal-sized rectangular blocks
  • One 2-D translational motion vector (mvx, mvy) per block
[Figure: frames X(t) and X(t+1), real motion versus the block MV]
Difference Measure
• MSE (Mean Squared Error)
• MAE (Mean Absolute Error) and SAE (Sum of Absolute Errors)
• CCF (Cross Correlation Function)
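The first three measures can be sketched for an n x n block as follows. This is a minimal illustration, assuming 8-bit samples and a row stride equal to the frame width; the function names are hypothetical. CCF is left out of the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum of absolute errors between two n x n blocks (stride = frame width). */
long block_sae(const unsigned char *c, const unsigned char *r, int n, int stride)
{
    long sae = 0;
    int x, y;

    assert(n > 0 && stride >= n);
    for (y = 0; y < n; y++)
        for (x = 0; x < n; x++)
            sae += abs(c[y * stride + x] - r[y * stride + x]);
    return sae;
}

/* Sum of squared errors, the numerator of the MSE. */
long block_sse(const unsigned char *c, const unsigned char *r, int n, int stride)
{
    long sse = 0;
    int x, y;

    assert(n > 0 && stride >= n);
    for (y = 0; y < n; y++)
        for (x = 0; x < n; x++) {
            int d = c[y * stride + x] - r[y * stride + x];
            sse += (long)d * d;
        }
    return sse;
}

/* MAE and MSE are the per-sample averages of SAE and SSE. */
double block_mae(const unsigned char *c, const unsigned char *r, int n, int stride)
{
    return (double)block_sae(c, r, n, stride) / ((double)n * n);
}

double block_mse(const unsigned char *c, const unsigned char *r, int n, int stride)
{
    return (double)block_sse(c, r, n, stride) / ((double)n * n);
}
```

For a fixed block size, SAE and MAE always rank candidates identically, so SAE is the cheaper choice because it avoids the division.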
Full Search Algorithm
• “Full Search” does not mean searching the whole frame, but every position within a limited search window
• Method
  • Raster order or spiral order (Figure 6.6)
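Full Search Complexity
• (2w+1) x (2w+1) search points (for search window [-w, w])
• N x N block computation at each point

    /* SAE between the N x N current block g and the block in the
       reference frame f displaced by (mvx, mvy) */
    int SAE(uchar *f, uchar *g, int mvx, int mvy)
    {
        int x, y, sae = 0;
        for (y = 0; y < N; y++)
            for (x = 0; x < N; x++)
                sae += ABS(*(f + (y + mvy) * width + (x + mvx))
                           - *(g + y * width + x));
        return sae;
    }

    /* Full search: test every (mvx, mvy) in the window, keep the minimum */
    mvx_min = mvy_min = 0;
    min = SAE(f, g, 0, 0);
    for (mvy = -w; mvy <= w; mvy++)
        for (mvx = -w; mvx <= w; mvx++) {
            sae = SAE(f, g, mvx, mvy);
            if (min > sae) {
                mvx_min = mvx; mvy_min = mvy; min = sae;
            }
        }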
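As a rough worked count, assuming w = 7 and N = 16 (illustrative values, not from the slide), the full search costs per block:

```latex
(2w+1)^2 \cdot N^2 = 15^2 \cdot 16^2 = 225 \cdot 256 = 57\,600 \ \text{absolute-difference operations}
```

The count grows quadratically in both the window size and the block size, which is what motivates the fast algorithms.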
3. Fast BMAs
• Complexity reduction approaches
  • Reduce test points
    • Monotonic variation assumption: the closer to the optimal point, the smaller the difference
  • Change the test-point order (more likely positions first)
    • Binary search rather than linear search
    • Benefit from early stop of the block difference calculation
  • Reduce the computation at one point
    • Sub-sampled values
• Note
  • Trade-off: less computation against prediction accuracy
[Figure: search axis from -w through 0 to +w]
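The "early stop" idea can be sketched as an SAE routine that gives up as soon as the partial sum can no longer beat the best candidate found so far. This is a minimal sketch; the name sae_early_stop and the choice to test once per row are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* SAE of an n x n block that stops once the partial sum reaches `best`.
   Returns the full SAE, or some value >= best if the candidate cannot win. */
long sae_early_stop(const unsigned char *c, const unsigned char *r,
                    int n, int stride, long best)
{
    long sae = 0;
    int x, y;

    assert(n > 0 && stride >= n);
    for (y = 0; y < n; y++) {
        for (x = 0; x < n; x++)
            sae += abs(c[y * stride + x] - r[y * stride + x]);
        if (sae >= best)      /* tested once per row to keep the inner loop tight */
            return sae;
    }
    return sae;
}
```

The saving is largest when good candidates are tested first, which is why early stop pairs naturally with reordering the test points.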
TSS (3-Step Search)
• Step 0: Search the center (0,0); set n = w
• Step 1: n = floor(n / 2)
• Step 2: Search the 8 points at distance n around the current center and move the center to the minimum
• Step 3: If n == 1, stop; otherwise go to Step 1
• Properties
  • Logarithmic/binary search (only 3 steps when w = 8)
  • Search distance decreases: w/2 => w/4 => w/8 ... until 1
  • Complexity: O(log2 w)
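The TSS steps above can be sketched in C as follows. This is a minimal sketch rather than the lecture's code: the names tss and sae_at, the block size N = 8, and the rule of skipping candidates that fall outside the reference frame are all assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define N 8   /* block size (assumed for this sketch) */

/* SAE between the current block at (bx, by) and the reference block
   displaced by (mvx, mvy); LONG_MAX if that block leaves the frame. */
static long sae_at(const unsigned char *ref, const unsigned char *cur,
                   int width, int height, int bx, int by, int mvx, int mvy)
{
    long sae = 0;
    int x, y;

    if (bx + mvx < 0 || by + mvy < 0 ||
        bx + mvx + N > width || by + mvy + N > height)
        return LONG_MAX;
    for (y = 0; y < N; y++)
        for (x = 0; x < N; x++)
            sae += abs(ref[(by + y + mvy) * width + bx + x + mvx] -
                       cur[(by + y) * width + bx + x]);
    return sae;
}

/* Three-step search in the window [-w, w]; stores the chosen vector in
   (*mvx, *mvy) and returns its SAE. */
long tss(const unsigned char *ref, const unsigned char *cur,
         int width, int height, int bx, int by, int w, int *mvx, int *mvy)
{
    int cx = 0, cy = 0, n, dx, dy;
    long best = sae_at(ref, cur, width, height, bx, by, 0, 0);   /* Step 0 */

    assert(w >= 2);
    for (n = w / 2; n >= 1; n /= 2) {                            /* Steps 1, 3 */
        int nx = cx, ny = cy;
        for (dy = -n; dy <= n; dy += n)                          /* Step 2 */
            for (dx = -n; dx <= n; dx += n) {
                long s;
                if (dx == 0 && dy == 0)
                    continue;                  /* center already evaluated */
                s = sae_at(ref, cur, width, height, bx, by, cx + dx, cy + dy);
                if (s < best) { best = s; nx = cx + dx; ny = cy + dy; }
            }
        cx = nx;                               /* re-center on the minimum */
        cy = ny;
    }
    *mvx = cx;
    *mvy = cy;
    return best;
}
```

With w = 8 the loop runs for n = 4, 2, 1, so at most 1 + 3 x 8 = 25 positions are evaluated instead of the 289 of a full search.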
2D Logarithmic Search
• Step 0: Search the center (0,0)
• Step 1: Search 4 points at step size S around the center
• Step 2: Find the minimum; if it is at the center, set S = S/2; otherwise move the center to the minimum location
• Step 3: If S = 1, go to Step 4; otherwise go to Step 1
• Step 4: Search the 8 neighbors and choose the minimum
• Properties
  • Similar to TSS, but more accurate
  • Complexity ~ O(log2 w), but the loop count is not fixed
[Figure: example search path with points numbered 1 to 5 by search step]
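The steps above can be sketched in C as follows. As before this is an illustration under assumptions: the names tdl and sae_at, the block size N = 8, the plus-shaped layout of the 4 test points, and the clamping of candidates to [-w, w] are not taken from the slide.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define N 8   /* block size (assumed for this sketch) */

/* SAE between the current block at (bx, by) and the reference block
   displaced by (mvx, mvy); LONG_MAX if that block leaves the frame. */
static long sae_at(const unsigned char *ref, const unsigned char *cur,
                   int width, int height, int bx, int by, int mvx, int mvy)
{
    long sae = 0;
    int x, y;

    if (bx + mvx < 0 || by + mvy < 0 ||
        bx + mvx + N > width || by + mvy + N > height)
        return LONG_MAX;
    for (y = 0; y < N; y++)
        for (x = 0; x < N; x++)
            sae += abs(ref[(by + y + mvy) * width + bx + x + mvx] -
                       cur[(by + y) * width + bx + x]);
    return sae;
}

/* 2-D logarithmic search in [-w, w], starting with step size s. */
long tdl(const unsigned char *ref, const unsigned char *cur,
         int width, int height, int bx, int by, int w, int s,
         int *mvx, int *mvy)
{
    static const int plus[4][2] = { {0, -1}, {-1, 0}, {1, 0}, {0, 1} };
    int cx = 0, cy = 0, nx, ny, i, dx, dy;
    long c, best = sae_at(ref, cur, width, height, bx, by, 0, 0);  /* Step 0 */

    assert(s >= 1 && s <= w);
    while (s > 1) {                                   /* Steps 1 to 3 */
        nx = cx;
        ny = cy;
        for (i = 0; i < 4; i++) {
            int px = cx + plus[i][0] * s, py = cy + plus[i][1] * s;
            if (abs(px) > w || abs(py) > w)
                continue;                             /* outside the window */
            c = sae_at(ref, cur, width, height, bx, by, px, py);
            if (c < best) { best = c; nx = px; ny = py; }
        }
        if (nx == cx && ny == cy)
            s /= 2;                 /* minimum at the center: halve the step */
        else {
            cx = nx;                /* otherwise re-center, same step size */
            cy = ny;
        }
    }
    nx = cx;                                          /* Step 4 */
    ny = cy;
    for (dy = -1; dy <= 1; dy++)
        for (dx = -1; dx <= 1; dx++) {
            if ((dx == 0 && dy == 0) ||
                abs(cx + dx) > w || abs(cy + dy) > w)
                continue;
            c = sae_at(ref, cur, width, height, bx, by, cx + dx, cy + dy);
            if (c < best) { best = c; nx = cx + dx; ny = cy + dy; }
        }
    *mvx = nx;
    *mvy = ny;
    return best;
}
```

The loop terminates because every pass either halves S or strictly lowers the best SAE, which is why the iteration count depends on the data.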
Examples
• TSS (Three Step Search)
• Logarithmic Search
• Cross Search
• One-at-a-time Search
• Nearest Neighbors Search
• From other sources
  • TSS (Three Step Search)
  • TDL (Two-Dimensional Logarithmic)
  • CDS (Conjugate Direction Search)
  • CSA (Cross Search Algorithm)
  • OSA (Orthogonal Search Algorithm)
Fast BMA Performance • Complexity
Issues in Fast MC Algorithm
• Local minimum error
  • Fast MC evaluates only a few positions
  • In many cases the error surface is not a “monotonic”, single-hill curve
  • The search can therefore end in a local minimum
  • See Figure 6.15
Hierarchical MC
• Reduced image
  • Sub-sampled, filtered
  • N levels, each with half the resolution of the one below
• Search the top (N) level fully
  • Reduced search window range (w / 2^(N-1))
• Search the lower N-1 levels
  • Only 9 positions each (the propagated vector and its 8 neighbors)
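Building one reduced level can be sketched as below. The slide only says "sub-sampled, filtered"; the 2 x 2 averaging filter and the name downsample2 are assumptions made for this sketch.

```c
#include <assert.h>

/* Halve the resolution: each output sample is the rounded mean of a
   2 x 2 group of input samples. dst must hold (width/2) * (height/2) bytes. */
void downsample2(const unsigned char *src, int width, int height,
                 unsigned char *dst)
{
    int x, y;

    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    for (y = 0; y < height / 2; y++)
        for (x = 0; x < width / 2; x++) {
            const unsigned char *p = src + 2 * y * width + 2 * x;
            dst[y * (width / 2) + x] =
                (unsigned char)((p[0] + p[1] + p[width] + p[width + 1] + 2) / 4);
        }
}
```

Applying it N-1 times yields the top level; a vector found at one level is doubled before it is refined at the next lower level.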
Benefits of Hierarchical Search
• Escape local minima
• Complexity reduction
  • e.g. search window w = 32: full search tests (2 × 32 + 1)^2 = 4225 positions
  • HBMA with N = 4: (2 × 4 + 1)^2 + 3 × 9 = 108 positions
[Figure: sub-sampled signal versus original signal]
4. Variations of BMA: Multi-frame MC
• Multiple frame MC
  • Forward prediction: since H.261
  • Backward and bidirectional prediction: since MPEG-1
  • Multiple reference pictures (each MB selects its own reference picture): since H.264
[Figure: forward, backward, and bidirectional prediction; bidirectional uses the average]
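Bidirectional prediction as an average of the two predictions can be sketched as below. The rounding rule (add 1 before halving) and the name bidir_average are assumptions of this sketch.

```c
#include <assert.h>

/* Bidirectional prediction: rounded average of the forward and
   backward motion-compensated predictions, for n samples. */
void bidir_average(const unsigned char *fwd, const unsigned char *bwd,
                   unsigned char *pred, int n)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        pred[i] = (unsigned char)((fwd[i] + bwd[i] + 1) / 2);
}
```

Averaging two predictions tends to reduce noise in the prediction, at the cost of sending two motion vectors per block.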
4. Variations of BMA: Multi-frame MC
• Multiple frame distance
  • Search range = frame difference x window
  • Since displacement = velocity x time
  • e.g. w = 8: 64 points (1 frame difference), 256 points (2 frame difference)
• Practice
  • For (mvx2, mvy2), search only [-w, w] around (mvx1, mvy1)
[Figure: search ranges [-w, +w] and [-2w, +2w] in frames t-1 and t-2, with vectors (mvx1, mvy1) and (mvx2, mvy2)]
MV at Boundary
• Restriction on MV range
  • MVs must point inside the reference picture
  • In H.261/MPEG-1, MPEG-2, MPEG-4
• Unrestricted MV
  • Extrapolates the reference picture (extends it with the boundary pixel values)
  • In H.263 Annex D, H.264
[Figure: search window [-w, +w] at the picture boundary in frame t-1, and the extrapolated frame t-1]
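Boundary extension for unrestricted MVs can be sketched by clamping coordinates instead of physically padding the picture. The helper names clamp and ref_pixel_extended are hypothetical.

```c
#include <assert.h>

/* Limit v to the range [lo, hi]. */
static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Reference sample at (x, y). Coordinates outside the picture return the
   nearest boundary pixel, i.e. the picture is extended with its edge values. */
unsigned char ref_pixel_extended(const unsigned char *ref,
                                 int width, int height, int x, int y)
{
    assert(width > 0 && height > 0);
    return ref[clamp(y, 0, height - 1) * width + clamp(x, 0, width - 1)];
}
```

Implementations often pad the reference picture once instead, so that the inner matching loop needs no per-sample clamping.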
Sub-pixel Motion Estimation
• Note
  • Objects do not necessarily move by an integer number of pixels
  • We have only integer-pixel samples
• Sub-pixel estimation
  • Get the fractional-pel values in the reference frame
  • Normally using linear interpolation
  • Half-pel/quarter-pel
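Half-pel values obtained by linear interpolation can be sketched as below. The coordinate convention (positions given in half-pel units), the rounding, and the name half_pel are assumptions of this sketch.

```c
#include <assert.h>

/* Reference sample at half-pel position (hx/2, hy/2), bilinear with rounding.
   hx and hy are in half-pel units. The caller must keep the right and lower
   neighbors inside the picture whenever hx or hy is odd. */
unsigned char half_pel(const unsigned char *ref, int width, int hx, int hy)
{
    int x = hx >> 1, y = hy >> 1;        /* integer-pel part */
    int fx = hx & 1, fy = hy & 1;        /* 1 if halfway to the next sample */
    const unsigned char *p = ref + y * width + x;
    int a, b, c, d;

    assert(hx >= 0 && hy >= 0 && width > 0);
    a = p[0];                            /* top-left sample */
    b = p[fx];                           /* top-right (or a again) */
    c = p[fy * width];                   /* bottom-left (or a again) */
    d = p[fy * width + fx];              /* bottom-right */
    return (unsigned char)((a + b + c + d + 2) / 4);
}
```

At an integer position all four taps coincide and the sample is returned unchanged; a quarter-pel value could be obtained by interpolating again between half-pel values.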
5. Enhanced Motion Models
• More Motion Estimation Models
  • Rigid 2D translation (BMA)
  • + Transformation
• Global Motion
  • + Illumination variation
  • + Zoom-in/out
• Object Model
  • + Overlapping of objects
  • + 3D rotation
  • + Non-rigid objects (deformation)
• Some are from the computer vision area
  • But at present most tools are too complex for application to video coding
  • Some are included in MPEG-4 Part 2’s object-oriented coding
Examples
• Region-based motion compensation
  • How to get/describe shape and motion
• Global motion (picture warping)
  • Also called camera motion
• Mesh-based deformation
6. Implementation • Video Encoder and Decoder Complexity Profiling
SW Optimization
• Algorithm-level optimization: independent of CPU
• Data structure design (most modern CPUs, RISC)
  • Memory cache optimization
    • Current blocks into cache
  • Loop unrolling (See Fig. 6.21)
    • Reduces pointer operations and jumps (helps pipelining)
• CPU-specific optimization
  • SIMD (Single Instruction, Multiple Data)
    • Packed instructions (See Fig. 6.22)
    • TI DSP, Intel MMX, etc.
  • MIMD (multiple parallel processing cores)
  • VLIW (Very Long Instruction Word) of TI DSP
  • GPU
• DMA utilization
• Coprocessor utilization
  • DCT, ME, pre/post-processing
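Loop unrolling applied to the SAE kernel can be sketched as below. This is not the book's Fig. 6.21; the fixed 8 x 8 block size and the names row_sae8 and sae8x8_unrolled are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* SAE of one 8-sample row with the inner loop fully unrolled:
   no loop counter, no compare, no jump per sample. */
static int row_sae8(const unsigned char *c, const unsigned char *r)
{
    return abs(c[0] - r[0]) + abs(c[1] - r[1]) +
           abs(c[2] - r[2]) + abs(c[3] - r[3]) +
           abs(c[4] - r[4]) + abs(c[5] - r[5]) +
           abs(c[6] - r[6]) + abs(c[7] - r[7]);
}

/* SAE of an 8 x 8 block; only the row loop remains. */
int sae8x8_unrolled(const unsigned char *c, const unsigned char *r, int stride)
{
    int y, sae = 0;

    assert(stride >= 8);
    for (y = 0; y < 8; y++, c += stride, r += stride)
        sae += row_sae8(c, r);
    return sae;
}
```

The unrolled row is also the natural unit to replace with a packed SIMD instruction on CPUs that provide one.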
HW Optimization
• Criteria
  • Performance, cycle count, gate count, data flow
• Example #1: Full Search
  • Parallelization
  • M function blocks give an M-fold speed-up
[Figure: search window memory (DRAM/SRAM) and current MB (SRAM) feeding four parallel SAE units, followed by a comparator]
• Example #2: Fast Search
  • TSS and hierarchical search (fixed number of clock cycles)
  • Pipelining blocks for speed-up
[Figure: search window memory (DRAM/SRAM) and current MB (SRAM) feeding a four-stage pipeline: Step 1 (+/-4), Step 2 (+/-2), Step 3 (+/-1), Step 4 (+/-1/2)]
        Step 1    Step 2    Step 3    Step 4
t = 1   block 1
t = 2   block 2   block 1
t = 3   block 3   block 2   block 1
t = 4   block 4   block 3   block 2   block 1