1 / 9

Parallel Implementation of Fast Fourier Transform on a Multi-core System

Parallel Implementation of Fast Fourier Transform on a Multi-core System. Tao Liu Chi-Li Yu Nov. 29, 2007. Goal. Implement and optimize 2D FFT on FPGA platform. Evaluate multi-core architectures with various number of cores.

Download Presentation

Parallel Implementation of Fast Fourier Transform on a Multi-core System

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parallel Implementation of Fast Fourier Transform on a Multi-core System Tao Liu Chi-Li Yu Nov. 29, 2007

  2. Goal • Implement and optimize 2D FFT on FPGA platform. • Evaluate multi-core architectures with various number of cores. • Design memory structures suitable for the various multi-core architectures

  3. Basic method and the problem • N-point 1D FFT • Generated by Xilinx LigiCORE. • Throughput rate: 1 sample per clock. • Up to 150MHz. • N*N Matrix • Stored in a dual-port SRAM constructed by Xilinx BRAM. • Total Latency: Row-wise + Colum-wise = N2+N2 =2N2 • Our target is to reduce the latency.

  4. Quad-Core Architecture • 4 (N/2)-point 1D-FFTs: • Lower latency: Only¼latency (N2/2 clocks) for local 2D-FFT. • Overhead: 2 Radix-2 butterflies are required for preprocessing. • Extra latency: 2*(N/2)*(N/2) = N2/2 clocks • Total latency: N2 clocks (Single-core: 2N2)

  5. 8-Core Architecture • 8 (N/2)-point 1D-FFTs: • Latency : N2/4 • 16 banks of memory • 8 Radix-2 butterflies • Extra latency is reduced: N2/8 clocks • Total latency: 3N2/8

  6. 16-Core Architecture Radix-4 BTY • 16 (N/4)-point FFT • 16 banks of memory • 4 Radix-4 butterflies • Latency: N2/4 • Hardware resource of the FPGA is not enough!

  7. Implementation • We implemented the architectures with Verilog Hardware Description Language. • Used Xilinx ISE Foundation to synthesize the designs. • The target FPGA platform is Digilent XUP V2 Pro.

  8. Comparisons • *: 128x128 2D FFT. • Target FPGA : Xilinx XC2VP100, which contains 44096 slices.

  9. Conclusion • Implemented 2D FFT on an FPGA • Evaluated various multi-core architecture • Designed and optimized memory structures for every multi-core architecture • Experimental results meet with theoretical predication

More Related