Using Vector Capabilities of GPUs to Accelerate FFT
Vasily Volkov and Brian Kazian
CS 258, Spring 2008
Sun Niagara II Specs
• 8 SPARC cores @ 1.4 GHz (up to 8 threads each)
• 16 KB instruction / 8 KB data caches
• 4 MB shared L2 cache
• One FPU per core
• Four dual-channel FBDIMM memory controllers
• Theoretical peak of 11 Gflop/s across the 8 FPUs
• Very high memory bandwidth (60 GB/s)
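
The 11 Gflop/s peak is consistent with 8 FPUs at 1.4 GHz if each FPU completes one floating-point operation per cycle. That issue rate is an assumption made here, not something the slide states. The sketch below reproduces the arithmetic and also shows how a measured FFT time could be turned into a nominal Gflop/s figure using the 5 N log2 N operation count that FFTW's own benchmarks use for complex transforms. The slides do not say which convention their measurements follow, and the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Peak rate in Gflop/s: FPUs x clock (GHz) x flops per FPU per cycle.
   flops_per_cycle = 1 is an assumption, not a figure from the slides. */
double peak_gflops(int fpus, double clock_ghz, double flops_per_cycle)
{
    assert(fpus > 0 && clock_ghz > 0.0 && flops_per_cycle > 0.0);
    return fpus * clock_ghz * flops_per_cycle;
}

/* Integer log2 of a power-of-two transform size. */
static int log2_pow2(size_t n)
{
    int bits = 0;
    assert(n != 0 && (n & (n - 1)) == 0);  /* powers of two only */
    while (n > 1) {
        n >>= 1;
        ++bits;
    }
    return bits;
}

/* Nominal Gflop/s for one complex FFT of size n that took `seconds`,
   counting 5 * n * log2(n) floating-point operations. */
double fft_gflops(size_t n, double seconds)
{
    assert(seconds > 0.0);
    return 5.0 * (double)n * log2_pow2(n) / seconds / 1e9;
}
```

With these assumptions, `peak_gflops(8, 1.4, 1.0)` evaluates to about 11.2, which matches the rounded figure on the slide. Dividing a measured `fft_gflops` value by that peak gives a rough fraction-of-peak number.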
FFT on Niagara
• Decided to install and benchmark the FFTW library
• Very similar in execution to CUFFT
• Offers competitive performance on a variety of platforms
• Compiled on Niagara II with pthreads enabled
• Uses double precision, whereas the G80 runs use single precision
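
The precision difference matters when comparing the two platforms, because a double-precision transform moves twice as many bytes as a single-precision one of the same length. A minimal sketch of that arithmetic follows, assuming one interleaved complex buffer per transform; the function name is hypothetical and not part of FFTW or CUFFT.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one buffer of n complex points, each stored as a real part
   and an imaginary part of bytes_per_real bytes. */
size_t complex_buffer_bytes(size_t n, size_t bytes_per_real)
{
    assert(bytes_per_real == sizeof(float) || bytes_per_real == sizeof(double));
    return 2 * n * bytes_per_real;
}
```

For n = 2^18 points this gives 2 MiB in single precision and 4 MiB in double precision, so the double-precision buffer alone is as large as the 4 MB shared L2 cache listed in the specs.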
Results
• The Hybrid gave the best results
• Thread count must be tuned for the problem size
• Limited by the number of threads in comparison to CUDA
• Issues with data alignment in cache
• FFTW performance was not stellar out of the box
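
The slide does not say what the alignment issue was. One common mitigation, sketched below, is to allocate FFT buffers on a cache-line boundary so that no element straddles two lines. The 64-byte line size and the helper names are assumptions for illustration; FFTW also offers its own aligned allocator, `fftw_malloc`, which may be the better choice when FFTW is in use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size in bytes; check the target's documentation. */
#define CACHE_LINE 64

/* Allocate room for n doubles starting on a cache-line boundary.
   Returns NULL on failure; release the buffer with free(). */
double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    /* C11 aligned_alloc expects a size that is a multiple of the alignment. */
    size_t padded = (bytes + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (padded == 0)
        padded = CACHE_LINE;
    return aligned_alloc(CACHE_LINE, padded);
}

/* Nonzero if p lies on a cache-line boundary. */
int is_cache_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

A caller would obtain a buffer with `alloc_aligned_doubles(2 * n)` for n interleaved complex points and can confirm the placement with `is_cache_aligned` before planning the transform.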