
Hardware Optimized DCT-IDCT Implementation on Verilog HDL. Rahul Srikumar, ECE734: VLSI Array Structures for DSP, 05/10/13.


Presentation Transcript


  1. Hardware Optimized DCT-IDCT Implementation on Verilog HDL • Rahul Srikumar • ECE734: VLSI Array Structures for DSP • 05/10/13

  2. Contents • Algorithm • Implementations • Performance • Results • Conclusion • Future Work

  3. Algorithm • 8-point DCT • 2D DCT = C * X * Transpose(C) • C – 8*8 coefficient matrix • X – 8*8 input block

  4. Algorithm (Cont’d) • 1D DCT = C * X • 2D DCT = (1D DCT) * Transpose(C) • 1D IDCT = Transpose(C) * 2D DCT • 2D IDCT = (1D IDCT) * C
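The row-column decomposition on these slides can be checked with a small software model. The sketch below is not the Verilog design; it assumes an orthonormal DCT-II coefficient matrix (the slides do not state the normalization) and uses hypothetical helper names (`dct_matrix`, `matmul`, `transpose`, `dct_2d`, `idct_2d`). It computes C * X * Transpose(C) by running the same 1D pass twice with a transpose in between, which is how a single 1D unit plus a transpose buffer could be reused.

```python
import math

N = 8  # transform size used on the slides


def dct_matrix(n=N):
    """Orthonormal DCT-II coefficient matrix C (normalization is an assumption)."""
    c = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                  for j in range(n)])
    return c


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct_2d(x, c):
    """C * X * Transpose(C), computed as two 1D passes."""
    pass1 = matmul(c, x)                 # 1D DCT = C * X
    pass2 = matmul(c, transpose(pass1))  # same 1D pass on the transposed result
    return transpose(pass2)              # equals C * X * Transpose(C)


def idct_2d(y, c):
    """Transpose(C) * Y * C, computed the same way with Transpose(C)."""
    ct = transpose(c)
    pass1 = matmul(ct, y)                # 1D IDCT = Transpose(C) * Y
    pass2 = matmul(ct, transpose(pass1))
    return transpose(pass2)


# Round trip on an arbitrary 8*8 block of signed 8-bit sample values.
C = dct_matrix()
block = [[((r * N + col) * 7) % 256 - 128 for col in range(N)] for r in range(N)]
restored = idct_2d(dct_2d(block, C), C)
max_error = max(abs(restored[r][col] - block[r][col])
                for r in range(N) for col in range(N))
```

With floating-point arithmetic the round-trip error is at rounding-noise level; a fixed-point implementation with the word lengths given on the next slide would show a larger, quantization-driven error.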

  5. Implementations Part 1 • Input word length – 8 bits • 1D DCT internal word length – 11 bits • 2D DCT output word length – 9 bits • 2D IDCT output word length – 8 bits • 4 implementations were evaluated: • Serial In (SI) – 1 pixel at a time • 2 Parallel In (2PI) – 2 pixels at a time • 4 Parallel In (4PI) – 4 pixels at a time • 8 Parallel In (8PI) – 8 pixels at a time
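One possible reading of the 11-bit internal word length is accumulator growth: adding eight signed 8-bit samples needs three extra bits. The slides do not state this reasoning, so the sketch below is only an illustration under that assumption, and `signed_bits` is a hypothetical helper.

```python
def signed_bits(lo, hi):
    """Smallest two's-complement width that can hold every integer in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


IN_BITS = 8                           # input word length from the slide
lo = -(1 << (IN_BITS - 1))            # most negative signed 8-bit value
hi = (1 << (IN_BITS - 1)) - 1         # most positive signed 8-bit value

# Worst-case sum of the 8 samples in one row (assumption: signed inputs).
sum_bits = signed_bits(8 * lo, 8 * hi)
```

Here `sum_bits` evaluates to 11, matching the internal word length quoted on the slide.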

  6. Implementations Part 2 • 8 registers of 8 bits each are used for coefficient storage, which is very efficient compared to the 64 registers otherwise required for 8*8 DCT/IDCT computation. • 2 RAMs, each of 64 locations (8 bits wide), are used. • RAMs are enabled in the order en_ram1_write -> (en_ram1_read, en_ram2_write) -> en_ram2_read
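The enable ordering above can be modeled as three phases, with each 64-location RAM written in one order and read in another. The slides do not say how the RAMs are addressed; the sketch below assumes a RAM acts as a transpose buffer (linear writes, column-wise reads). `PHASES`, `transposed_read_addr`, and `through_ram` are hypothetical names, and only the enable signal names come from the slide.

```python
N = 8  # the block is N*N = 64 locations, matching the RAM depth on the slide

# Enable signals active in each phase, in the order given on the slide.
PHASES = [
    ("en_ram1_write",),
    ("en_ram1_read", "en_ram2_write"),
    ("en_ram2_read",),
]


def transposed_read_addr(i):
    """Address of the i-th read when walking the stored block column by column."""
    row, col = i % N, i // N
    return row * N + col


def through_ram(values):
    """Write 64 values linearly, then read them back in transposed order.

    Assumption: this is how a RAM could double as the transpose step between
    the two 1D passes; the slides do not confirm the addressing scheme.
    """
    ram = [None] * (N * N)
    for addr, value in enumerate(values):   # write phase: addresses 0..63
        ram[addr] = value
    return [ram[transposed_read_addr(i)] for i in range(N * N)]


first_column = through_ram(list(range(N * N)))[:N]   # elements of column 0
```

Passing data through two such buffers in sequence restores the original order, which is consistent with a pipeline where the middle phase reads one RAM while writing the other.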

  7. Performance 1 • Serial In (1 pixel at a time) • Read 8 inputs = 8 cycles • Register 8 inputs + sign extension = 1 cycle • Add/Sub = 1 cycle • Absolute value = 1 cycle • Multiplication = 1 cycle • Final addition = 2 cycles • Total = 14 cycles

  8. Performance 2 • 2 Parallel In (2 pixels at a time) • Register 8 inputs + sign extension = 4 cycles • Add/Sub = 1 cycle • Absolute value = 1 cycle • Multiplication = 1 cycle • Final addition = 2 cycles • Total = 9 cycles

  9. Performance 3 • 4 Parallel In (4 pixels at a time) • Register 8 inputs + sign extension = 2 cycles • Add/Sub = 1 cycle • Absolute value = 1 cycle • Multiplication = 1 cycle • Final addition = 2 cycles • Total = 7 cycles

  10. Performance 4 • 8 Parallel In (8 pixels at a time) • Register 8 inputs + sign extension = 1 cycle • Add/Sub = 1 cycle • Absolute value = 1 cycle • Multiplication = 1 cycle • Final addition = 2 cycles • Total = 6 cycles
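The four cycle budgets above differ only in the input stage. The following sketch reproduces the totals from the per-stage figures on the slides; the function and dictionary names are hypothetical, and the model covers a single 8-input row only.

```python
PIXELS_PER_ROW = 8

# Stages shared by all four implementations, with cycle counts from the slides.
COMMON_STAGES = {
    "add_sub": 1,
    "absolute_value": 1,
    "multiplication": 1,
    "final_addition": 2,
}


def input_cycles(pixels_per_cycle):
    """Cycles spent bringing in and registering the 8 inputs of one row."""
    if pixels_per_cycle == 1:
        # Serial In: 8 read cycles plus 1 cycle to register and sign-extend.
        return PIXELS_PER_ROW + 1
    # Parallel In: registering and sign extension overlap with the reads.
    return PIXELS_PER_ROW // pixels_per_cycle


def total_cycles(pixels_per_cycle):
    return input_cycles(pixels_per_cycle) + sum(COMMON_STAGES.values())


totals = {p: total_cycles(p) for p in (1, 2, 4, 8)}
# totals == {1: 14, 2: 9, 4: 7, 8: 6}
```

The shared stages cost 5 cycles in every variant, so widening the input bus from 1 to 8 pixels can remove at most 8 of the 14 cycles per row.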

  11. Synthesis • Target Platform : ALTERA Cyclone IV GX FPGA • Tool Used : Quartus II • Language Used : Verilog

  12. Results 1 • Serial In has the lowest synthesized combinational area because it needs the fewest wires to feed in the data.

  13. Results 2 • Serial In has the lowest synthesized area because it requires the fewest storage elements and counters to process the data.

  14. Results 3 • 8 Parallel In takes 236 cycles, in contrast to 246 cycles for Serial In.

  15. Conclusion • Serial In occupies ~6% less area than 8 Parallel In, with a comparatively smaller performance degradation (~4%).
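The ~4% figure follows from the cycle counts reported in Results 3. A minimal check, with hypothetical names, is shown below. The ~6% area figure comes from the synthesis reports and cannot be recomputed from the numbers in this transcript.

```python
SERIAL_IN_CYCLES = 246      # from Results 3
PARALLEL_8_CYCLES = 236     # from Results 3


def slowdown_percent(slow, fast):
    """Extra cycles of the slower design, relative to the faster one."""
    return 100.0 * (slow - fast) / fast


degradation = slowdown_percent(SERIAL_IN_CYCLES, PARALLEL_8_CYCLES)
# About 4.2, i.e. the ~4% quoted in the conclusion.
```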

