90 likes | 250 Views
Parallel Implementation of Fast Fourier Transform on a Multi-core System. Tao Liu Chi-Li Yu Nov. 29, 2007. Goal. Implement and optimize 2D FFT on FPGA platform. Evaluate multi-core architectures with various number of cores.
E N D
Parallel Implementation of Fast Fourier Transform on a Multi-core System Tao Liu Chi-Li Yu Nov. 29, 2007
Goal • Implement and optimize 2D FFT on FPGA platform. • Evaluate multi-core architectures with various number of cores. • Design memory structures suitable for the various multi-core architectures
Basic method and the problem • N-point 1D FFT • Generated by Xilinx LigiCORE. • Throughput rate: 1 sample per clock. • Up to 150MHz. • N*N Matrix • Stored in a dual-port SRAM constructed by Xilinx BRAM. • Total Latency: Row-wise + Colum-wise = N2+N2 =2N2 • Our target is to reduce the latency.
Quad-Core Architecture • 4 (N/2)-point 1D-FFTs: • Lower latency: Only¼latency (N2/2 clocks) for local 2D-FFT. • Overhead: 2 Radix-2 butterflies are required for preprocessing. • Extra latency: 2*(N/2)*(N/2) = N2/2 clocks • Total latency: N2 clocks (Single-core: 2N2)
8-Core Architecture • 8 (N/2)-point 1D-FFTs: • Latency : N2/4 • 16 banks of memory • 8 Radix-2 butterflies • Extra latency is reduced: N2/8 clocks • Total latency: 3N2/8
16-Core Architecture Radix-4 BTY • 16 (N/4)-point FFT • 16 banks of memory • 4 Radix-4 butterflies • Latency: N2/4 • Hardware resource of the FPGA is not enough!
Implementation • We implemented the architectures with Verilog Hardware Description Language. • Used Xilinx ISE Foundation to synthesize the designs. • The target FPGA platform is Digilent XUP V2 Pro.
Comparisons • *: 128x128 2D FFT. • Target FPGA : Xilinx XC2VP100, which contains 44096 slices.
Conclusion • Implemented 2D FFT on an FPGA • Evaluated various multi-core architecture • Designed and optimized memory structures for every multi-core architecture • Experimental results meet with theoretical predication