Using Vector Capabilities of GPUs to Accelerate FFT

Vasily Volkov and Brian Kazian
CS 258, Spring 2008

Sun Niagara II Specs
• 8 SPARC cores @ 1.4 GHz (up to 8 threads each)
• 16 KB instruction / 8 KB data caches per core
• 4 MB shared L2 cache
• One FPU per core
• Four dual-channel FBDIMM memory controllers
• Theoretical peak of about 11 Gflop/s across the 8 FPUs
• Extremely large memory bandwidth (60 GB/s)
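The 11 Gflop/s limit follows from the core count and clock rate, assuming each FPU completes one double-precision floating-point operation per cycle (an assumption consistent with the stated limit, not spelled out on the slide). Dividing the quoted bandwidth by that peak gives the machine's bytes-per-flop balance:

```latex
\begin{aligned}
P_{\text{peak}} &= 8\ \text{FPUs} \times 1.4 \times 10^{9}\ \tfrac{\text{cycles}}{\text{s}} \times 1\ \tfrac{\text{flop}}{\text{cycle}} = 11.2\ \text{Gflop/s} \\
\frac{60\ \text{GB/s}}{11.2\ \text{Gflop/s}} &\approx 5.4\ \text{bytes/flop}
\end{aligned}
```

For scale, a single double-precision complex value occupies 16 bytes.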

FFT on Niagara
• Installed and benchmarked the FFTW library
• Very similar to CUFFT in how transforms are planned and executed
• Offers competitive performance on a variety of platforms
• Compiled on Niagara II with pthreads enabled
• Uses double precision, as opposed to single precision on the G80

Single FFT Comparison

FFTW with Built-in Threading

Batched FFTW

Hybrid FFTW

Results
• The hybrid approach gave the best results
• Thread count must be tuned to the problem size
• Limited by the number of threads (64 hardware threads) compared with CUDA
• Issues with data alignment in cache
• FFTW does not deliver stellar performance out of the box
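The slides report that the hybrid scheme won and that the thread count had to be tuned per problem size, but they do not show how the threads were divided. One plausible reading, offered here as an assumption: the 64 hardware threads are split into G groups, each group takes a slice of the batch, and each transform inside a group uses T = 64/G threads (in FFTW, set with `fftw_plan_with_nthreads`). The sketch below only enumerates the power-of-two (G, T) splits and divides the batch among groups; a tuning loop would time each split and keep the fastest. `hybrid_config`, `hybrid_configs`, and `group_share` are hypothetical names, and no FFTW calls are made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hybrid configuration: the batch is split across `groups`
   slices, and each transform in a slice runs with `threads_per_fft` threads. */
typedef struct {
    int groups;
    int threads_per_fft;
} hybrid_config;

/* Enumerate the power-of-two splits groups x threads_per_fft == hw_threads
   (hw_threads must itself be a power of two, e.g. 64 on Niagara II).
   Writes at most `max` entries into `out`; returns the number written. */
int hybrid_configs(int hw_threads, hybrid_config *out, int max)
{
    assert(hw_threads > 0 && (hw_threads & (hw_threads - 1)) == 0);
    int count = 0;
    for (int g = 1; g <= hw_threads && count < max; g *= 2)
        out[count++] = (hybrid_config){ g, hw_threads / g };
    return count;
}

/* Number of transforms given to group `g` when `batch` transforms are
   divided as evenly as possible among `groups` groups (the first
   batch % groups groups receive one extra transform). */
size_t group_share(size_t batch, int groups, int g)
{
    size_t base = batch / (size_t)groups;
    return base + ((size_t)g < batch % (size_t)groups ? 1 : 0);
}
```

At the extremes, 1 × 64 corresponds to pure built-in threading and 64 × 1 to pure batching; the intermediate splits are what a hybrid tuning pass would explore.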
