MOTION ESTIMATION IMPLEMENTATION IN VERILOG
ECE 734 PROJECT
SHREYAS SRIVASTAVA
OBJECTIVE AND OUTCOMES
• Develop a block-based motion estimation implementation.
• Design criteria: fast enough for real-time HD video with “reasonable power consumption”.
• Outcome: developed and tested a Verilog implementation [1] based on H.264 block-based motion estimation, capable of encoding 720p HD video.
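PROBLEM STATEMENT
• For a given block in the current frame, find the best match among the candidate blocks in the reference frame.
• Matching criterion: the sum of absolute differences (SAD) between the current block and each candidate block. The candidate with the smallest SAD is the best match.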
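The matching step can be sketched in software as a small reference model. This is a minimal illustration and not the project's Verilog; the function names `sad` and `full_search`, the list-of-lists frame layout, and the tiny 2×2 example are all assumptions made here for clarity.

```python
def sad(cur, ref, dx, dy):
    """Sum of absolute differences between block `cur` and the equally sized
    window of `ref` whose top-left corner is at column dx, row dy."""
    n = len(cur)
    return sum(
        abs(cur[i][j] - ref[dy + i][dx + j])
        for i in range(n)
        for j in range(n)
    )


def full_search(cur, ref):
    """Try every candidate position in `ref`; return (best_sad, (dx, dy))."""
    n = len(cur)
    best = None
    for dy in range(len(ref) - n + 1):
        for dx in range(len(ref[0]) - n + 1):
            cost = sad(cur, ref, dx, dy)
            # Keep the first candidate with the lowest SAD seen so far.
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Hypothetical example: a 2x2 block located at column 1, row 2 of a 4x4 area.
ref = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 50, 60, 10],
    [10, 70, 80, 10],
]
cur = [[50, 60], [70, 80]]
best_sad, best_mv = full_search(cur, ref)
```

Here the search returns a SAD of 0 at offset (1, 2), since the block matches that window exactly. A hardware design evaluates the same criterion, only with many differences computed in parallel.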
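H.264-BASED MOTION ESTIMATION
• Variable block size motion estimation within each macroblock.
• Each pair of reference and current 16×16 macroblocks generates 41 motion vectors, one per sub-block across the seven partition sizes: 1 (16×16) + 2 (16×8) + 2 (8×16) + 4 (8×8) + 8 (8×4) + 8 (4×8) + 16 (4×4) = 41.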
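The count of 41 motion vectors follows from the seven block sizes that H.264 inter prediction allows inside a 16×16 macroblock. The short sketch below reproduces the count; the names `PARTITION_SIZES` and `blocks_per_macroblock` are chosen here for illustration.

```python
# Block sizes (width, height) allowed by H.264 inter prediction.
PARTITION_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def blocks_per_macroblock(width, height, mb_size=16):
    """Number of width x height blocks that tile one macroblock."""
    return (mb_size // width) * (mb_size // height)


counts = {size: blocks_per_macroblock(*size) for size in PARTITION_SIZES}
total_motion_vectors = sum(counts.values())
```

Each block of each size gets its own motion vector, so `total_motion_vectors` is the number of minimum SADs the design has to track per macroblock.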
STEPS TAKEN
• Estimating the design requirements for H.264-based variable-size block motion estimation (VSBME), with support for 720p HD.
• Selecting a suitably fast architecture to perform the SAD computation.
• Writing a Verilog design.
• Validating the design.
• Evaluating the performance of the design.
PERFORMANCE SPECIFICATION
• Desired throughput for the design: 108,000 macroblocks per second for 720p HD video (3,600 macroblocks per frame at 30 fps).
• [Table showing the number of macroblocks to be processed per second [3]]
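The 108,000 figure can be checked with a few lines of arithmetic. The sketch assumes a 1280×720 frame and 30 frames per second for 720p; the function name `macroblocks_per_second` is hypothetical.

```python
def macroblocks_per_second(width, height, fps, mb_size=16):
    """Macroblocks an encoder must process per second for a given format."""
    # Ceiling division so partial macroblocks at the frame edge still count.
    mbs_wide = -(-width // mb_size)
    mbs_high = -(-height // mb_size)
    return mbs_wide * mbs_high * fps


# 1280x720 at 30 fps: 80 x 45 = 3,600 macroblocks per frame.
rate_720p = macroblocks_per_second(1280, 720, 30)
```

Under these assumptions `rate_720p` evaluates to 108,000, which matches the throughput target on the slide.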
POWER SPECIFICATION
• Power figures for the motion estimation block alone are hard to obtain, so the power reported below is for the encoder block.
• These are therefore ballpark figures indicating the range of possible power values.
SELECTED ARCHITECTURE OVERVIEW [1]
• Motion estimation engine: a 16×16 PE array that performs 256 absolute-difference operations per cycle, one per processing element.
ARCHITECTURE OVERVIEW
[Block diagram: control unit and AGU; 16 search-area (SA) SRAMs, SA0 to SA15; 16×16 PE array producing 16 4×4 SADs; adder and comparator stage producing the 41 minimum SADs and 41 MVs]
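The step from 16 4×4 SADs to 41 block SADs can be sketched in software as follows. This is an illustrative model of the idea only, not the adder tree of [1] or the project's RTL; `sads_4x4` and `merge_sads` are hypothetical names, and the comparator stage that keeps the minimum over all candidates is left out.

```python
def sads_4x4(cur, cand):
    """Return a 4x4 grid g where g[r][c] is the SAD of the 4x4 sub-block at
    sub-block row r, column c of two 16x16 blocks."""
    return [
        [
            sum(
                abs(cur[4 * r + i][4 * c + j] - cand[4 * r + i][4 * c + j])
                for i in range(4)
                for j in range(4)
            )
            for c in range(4)
        ]
        for r in range(4)
    ]


def merge_sads(g):
    """Combine 16 4x4 SADs into the SADs of all 41 blocks.
    Keys are (width, height) in pixels; values are lists in raster order."""
    def tiles(w, h):
        # w and h are measured in 4x4 sub-blocks.
        return [
            sum(g[r + i][c + j] for i in range(h) for j in range(w))
            for r in range(0, 4, h)
            for c in range(0, 4, w)
        ]

    shapes = [(1, 1), (2, 1), (1, 2), (2, 2), (4, 2), (2, 4), (4, 4)]
    return {(4 * w, 4 * h): tiles(w, h) for w, h in shapes}


# Hypothetical example: every pixel differs by 1, so each 4x4 SAD is 16.
cur = [[0] * 16 for _ in range(16)]
cand = [[1] * 16 for _ in range(16)]
merged = merge_sads(sads_4x4(cur, cand))
```

Because larger blocks are sums of the 4×4 results, the absolute differences are computed only once per candidate and reused for all seven block sizes.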
PERFORMANCE
• Macroblock and search-data pipelining ensures no stalls except when switching macroblock rows.
• Cycles required to process 30 frames of 720p HD: (30 frames) × (3,600 macroblocks/frame) × (1,024 cycles/macroblock) + (30 × 48 × (720/16)) stall cycles = 110,656,800 cycles.
• Required frequency at 30 fps: ≈ 110.66 MHz.
• Reported frequency: 115 MHz (TSMC 40 nm).
• Reported power: 405 mW.
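The cycle budget can be reproduced directly from the numbers on the slide. The constant names below are chosen here, and reading the 48 as stall cycles per macroblock row is an interpretation of the slide's formula.

```python
FPS = 30
FRAME_HEIGHT = 720
MB_SIZE = 16
MBS_PER_FRAME = 3600
CYCLES_PER_MB = 1024
STALL_CYCLES_PER_ROW = 48  # assumed: stall incurred at each macroblock row switch

mb_rows = FRAME_HEIGHT // MB_SIZE                      # 45 rows per frame
compute_cycles = FPS * MBS_PER_FRAME * CYCLES_PER_MB   # 110,592,000
stall_cycles = FPS * STALL_CYCLES_PER_ROW * mb_rows    # 64,800
total_cycles = compute_cycles + stall_cycles           # 110,656,800

# Cycles needed per second of video, expressed in MHz.
required_mhz = total_cycles / 1e6
```

The stall term is under 0.1% of the total, so the required clock is set almost entirely by the 1,024 cycles spent on each macroblock, and the reported 115 MHz leaves a margin of a few percent.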
VALIDATION
• Hierarchical validation: individually tested the processing element, the processing array, and the control unit.
• Built a testbench with the search-area SRAM initialized and ran RTL simulation to check that the datapath computations are correct at each cycle.
FUTURE WORK
• A low-power design could explore faster search algorithms such as diamond search and hexagon-based search.
REFERENCES
[1] Kim et al., "A Fast VLSI Architecture for Full-Search Variable Block Size Motion Estimation in MPEG-4 AVC/H.264."
[2] Jun-Fu Shen, Tu-Chih Wang, and Liang-Gee Chen, "A Novel Low-Power Full Search Block-Matching Motion-Estimation Design for H.263+," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 7, July 2001.
[3] www.fvc.com/FVC/FVCWEB/files/Static%20Macroblocks.070307.ppt
[4] Koziri et al., "Novel Low-Power Motion Estimation Design for H.264."