Video Compression using SAD Algorithm
Introduction
• A Spartan-3 FPGA can accommodate only one SAD unit.
• The complete SAD algorithm for compressing the video requires:
  • 25 SAD units
  • 32 muxes (3/1, 2/1, 9/1, 17/1 and 25/1, in different quantities)
  • counters
  • several other delay units
• Hence the software implementation of this project is reduced to the following block diagram.
I & P sub-images
Each sub-image is 8 × 32 pixels (8 rows by 32 columns).
Each sub-image can be divided into 4 blocks of 8 × 8 pixels.
The pixels are read column-wise, from left to right.
Each column has 8 gray scale values, which are read as a group.
Eight consecutive columns form one 8 × 8 pixel block, so 64 values are read per block.
The I sub-image is the reference sub-image.
The P sub-image is the predicted sub-image.
The formation of the I and P sub-images is shown below.
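The column-wise read order described above can be sketched in Python as follows. This is only an illustration: the function name `split_into_blocks` and the test pattern are assumptions, not part of the original design.

```python
def split_into_blocks(sub_image, block_width=8):
    """Split an 8-row sub-image into blocks, each read column by column.

    sub_image is a list of rows (hypothetical layout: 8 rows x 32 columns).
    Returns a list of blocks; each block is a flat list of 64 gray scale values.
    """
    rows = len(sub_image)
    cols = len(sub_image[0])
    blocks = []
    for start in range(0, cols, block_width):
        block = []
        for c in range(start, start + block_width):  # columns, left to right
            for r in range(rows):                    # 8 values per column
                block.append(sub_image[r][c])
        blocks.append(block)
    return blocks


# Hypothetical test pattern: each pixel value encodes its position.
sub_image = [[r * 32 + c for c in range(32)] for r in range(8)]
blocks = split_into_blocks(sub_image)
print(len(blocks), len(blocks[0]))
```

Each of the 4 resulting blocks holds the 64 values that one SAD unit scans.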
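[Figure: formation of the I and P sub-images.
I sub-image path: image -> segmentation -> 2D-DCT -> Q -> inverse Q -> 2D-IDCT -> I sub-image.
P sub-image path: I sub-image -> block matching algorithm -> estimated frame; error frame = original frame - estimated frame; error frame -> DCT -> Q -> inverse Q -> IDCT -> reconstructed difference; P sub-image = estimated frame + reconstructed difference.]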
SAD Unit
The sum of absolute differences (SAD) between 2 frames is calculated by taking the absolute difference between each pair of corresponding gray scale values and then summing all of these differences.
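As a software reference for this definition, a minimal Python sketch is given here. The function name `sad` and the sample values are illustrative assumptions. The hardware computes the same quantity differently.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel lists."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


# Hypothetical gray scale values for two short pixel groups.
reference = [10, 20, 30, 255]
candidate = [12, 18, 30, 0]
print(sad(reference, candidate))
```

A lower SAD means the candidate block matches the reference block more closely.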
Consider 2 operands A and B on which SAD is being performed.
SAD is calculated in 6 steps:
1. Determining the smaller of the operands A and B
2. Inverting the smaller value
3. Passing both operands to an adder tree
4. Adding the correction term
5. Carry save adder implementation (33 => 2)
6. Carry lookahead adder implementation (2 => 1)
Steps 1, 2 & 3
The first 3 steps of SAD are performed by the input module shown in fig sad3.
[Figure: sad3]
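The first step is to find out whether A or B is smaller.
If B is smaller than A, A - B is computed; otherwise B - A is computed.
Subtraction is performed in 2's complement: $A - B = A + \bar{B} + 1$, where $\bar{B}$ is the bitwise inverse of B.
The addition $A + \bar{B}$ is performed by the adder tree; the remaining +1 is supplied by the correction term.
The carry generated by this sum is then evaluated: carry = 1 means that A is greater than B.
The outputs of the adder tree are:
$A_{out} = A$ and $B_{out} = \bar{B}$ (for carry = 1)
$A_{out} = \bar{A}$ and $B_{out} = B$ (for carry = 0)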
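The carry-based selection can be checked with a small Python model. This is a sketch under assumptions: 8-bit unsigned operands, and the names `WIDTH`, `MASK` and `invert_smaller` are made up for illustration.

```python
WIDTH = 8                    # operand width in bits (assumed)
MASK = (1 << WIDTH) - 1      # 0xFF


def invert_smaller(a, b):
    """Return (a_out, b_out) with the smaller operand bitwise inverted.

    The carry out of a + ~b is 1 exactly when a > b.
    """
    carry = (a + (~b & MASK)) >> WIDTH
    if carry:                # a > b: b is the smaller operand, invert b
        return a, ~b & MASK
    return ~a & MASK, b      # a <= b: invert a


a_out, b_out = invert_smaller(200, 50)
# Adding the two outputs plus 1, modulo 2**WIDTH, gives |a - b|.
print((a_out + b_out + 1) & MASK)
```

The `+ 1` in the last line is the part that the hardware defers to the correction term.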
Step 4
The operands A and B are 8 bits long.
The input module is elaborated in fig sad3.
A correction term of 8 is added along with the sad3 outputs.
All 17 inputs are given to a carry save adder, whose output is 1 sum and 1 carry.
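The following Python sketch shows why a correction term of 8 gives the right result. It assumes 8 operand pairs of 8-bit values, so 16 module outputs plus the correction term make 17 adder inputs. All function names and sample values are hypothetical.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def invert_smaller(a, b):
    """Invert the smaller of two 8-bit operands (steps 1 and 2)."""
    carry = (a + (~b & MASK)) >> WIDTH
    return (a, ~b & MASK) if carry else (~a & MASK, b)


def adder_inputs(col_a, col_b):
    """Build the adder inputs: two terms per pair plus one correction term."""
    terms = []
    for a, b in zip(col_a, col_b):
        terms.extend(invert_smaller(a, b))
    terms.append(len(col_a))     # correction term: +1 per pair, 8 for 8 pairs
    return terms


def sad_from_terms(terms, pairs):
    """Each pair sums to 2**WIDTH - 1 + |a - b|; remove the constant excess."""
    return sum(terms) - pairs * (1 << WIDTH)


col_a = [10, 200, 30, 40, 255, 0, 77, 128]    # hypothetical pixel column
col_b = [12, 100, 30, 45, 0, 255, 70, 129]
terms = adder_inputs(col_a, col_b)
print(len(terms), sad_from_terms(terms, pairs=8))
```

For 8 pairs the excess is 8 × 256 = 2048, so in this model the subtraction is the same as keeping only the low 11 bits of the sum.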
Step 5 - Carry Save Adder
A general adder passes the carry generated at each stage on to the next stage, which increases the delay.
A CSA only sums the input values, without waiting for the carry of the previous stage.
After all the inputs are summed, a single sum and a single carry are generated.
Here we use a tree of CSAs to add all the 16 inputs.
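A carry save adder tree can be modelled in Python as repeated 3-to-2 reduction. This is a behavioural sketch only: the grouping of inputs is an assumption and does not claim to match the tree used in the design.

```python
def carry_save_add(x, y, z):
    """3-to-2 compressor: returns (sum_word, carry_word) with x+y+z == sum+carry."""
    sum_word = x ^ y ^ z                              # bitwise sum, no carry propagation
    carry_word = ((x & y) | (x & z) | (y & z)) << 1   # carries, shifted to the next bit
    return sum_word, carry_word


def csa_tree(terms):
    """Reduce any number of inputs to two words using layers of CSAs."""
    terms = list(terms)
    while len(terms) < 2:
        terms.append(0)
    while len(terms) > 2:
        full = len(terms) - len(terms) % 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(*terms[i:i + 3]))
        reduced.extend(terms[full:])                  # leftovers pass to the next layer
        terms = reduced
    return terms[0], terms[1]


inputs = list(range(1, 18))                           # 17 hypothetical input words
sum_word, carry_word = csa_tree(inputs)
print(sum_word + carry_word == sum(inputs))
```

No carry ripples inside the tree; the only carry-propagating addition left is the final one that combines the two output words.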
The 2 outputs, sum and carry, are sent to a carry lookahead adder, which reduces the 2 terms to one SAD output.
[Figure: tree of CSAs, showing the resultant terms after the adder tree]
Step 6 - Carry Lookahead Adder
Steps 4 to 6 can be summarized by the following figure.
Here, the 2 -> 1 reduction block is the CLA.
[Figure: summary of steps 4 to 6]
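The 2 -> 1 reduction can be sketched in Python with the generate and propagate signals that a carry lookahead adder uses. The 16-bit default width and the function name are assumptions. The loop evaluates the carry recurrence one bit at a time, whereas hardware expands it so that the carries are produced in parallel.

```python
def carry_lookahead_add(a, b, width=16):
    """Add two words using generate/propagate signals.

    Returns (total, carry_out), where total is truncated to `width` bits.
    """
    generate = a & b           # bit i creates a carry by itself
    propagate = a ^ b          # bit i passes an incoming carry along
    carry = 0                  # carry into bit 0
    carries = 0                # carry into each bit position
    for i in range(width):
        carries |= carry << i
        g_i = (generate >> i) & 1
        p_i = (propagate >> i) & 1
        carry = g_i | (p_i & carry)   # c[i+1] = g[i] OR (p[i] AND c[i])
    total = (propagate ^ carries) & ((1 << width) - 1)
    return total, carry


# Hypothetical sum and carry words from the CSA tree.
print(carry_lookahead_add(1234, 4321))
```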
SAD output
When one block of data is sent to the SAD unit, 64 values are scanned, and the resultant output from one SAD unit is 22 bits wide.
9/1 mux
All 22 bits from each SAD unit are given to the next stage, which is a 9/1 multiplexer.
As shown above, 25 comparisons are required in total.
For maximum compression, two blocks of data are omitted after comparison, i.e. 25 - 8 - 8 = 9.
Hence a 9/1 mux is needed; it selects the SAD unit with the least value, which is 22 bits long.
Select counter
The select mux selects bits from the SAD output and compares two corresponding values.
The lesser of the 2 values is selected and stored in a register, and all further comparisons are made against this least value.
A control signal is then sent to the memory, and a counter keeps track of which SAD unit is being compared.
The final output, the disparity vector from the counter, is a 5-bit number that identifies the SAD unit with the least value (i.e. maximum compression).
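The register-and-counter selection can be sketched in Python as a running minimum. The function name, the zero-based unit numbering and the sample values are assumptions made for illustration.

```python
def select_min_sad(sad_values):
    """Return (unit_index, least_value) for a list of SAD outputs.

    Mimics a register holding the least value seen so far and a counter
    identifying the SAD unit currently being compared.
    """
    best_value = sad_values[0]     # register: least value so far
    best_index = 0                 # latched counter value (unit number)
    for counter, value in enumerate(sad_values[1:], start=1):
        if value < best_value:     # on a tie the earlier unit is kept
            best_value = value
            best_index = counter
    return best_index, best_value


# Hypothetical outputs of the SAD units feeding the mux.
outputs = [40, 12, 95, 12, 7, 300, 51, 64, 9]
print(select_min_sad(outputs))
```

With 25 SAD units, a zero-based index ranges from 0 to 24 and therefore fits in 5 bits.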