The FFT on a GPU. Graphics Hardware 2003, July 27, 2003. Kenneth Moreland, Sandia National Labs; Edward Angel, U. of New Mexico. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Overview • Introduction • Motivation, FFT review. • FFT Techniques • Exploitable FFT properties. • Implementation • Results • Performance, applications, conclusions.
Motivation • The Fourier transform is a principal tool for digital image processing. • Filtering. • Correction. • Compression. • Classification. • Generation. • Shouldn't our graphics hardware support such a tool, then?
The Discrete Fourier Transform • Converts data in the spatial or temporal domain into the frequencies that compose the data.
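The slide's DFT and IDFT equations did not survive extraction. As a minimal sketch, the Python below assumes the common convention F(u) = Σ_x f(x)·e^(−j2πux/N), with the 1/N factor on the inverse; the original slide may normalize differently. The O(N²) loops are for clarity only, and `dft`/`idft` are illustrative names.

```python
import cmath
import math

def dft(f):
    """Naive O(N^2) DFT: F(u) = sum_x f(x) * exp(-j*2*pi*u*x/N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * math.pi * u * x / n) for x in range(n))
            for u in range(n)]

def idft(F):
    """Inverse DFT with the 1/N factor: f(x) = (1/N) * sum_u F(u) * exp(+j*2*pi*u*x/N)."""
    n = len(F)
    return [sum(F[u] * cmath.exp(2j * math.pi * u * x / n) for u in range(n)) / n
            for x in range(n)]

# One cycle of a cosine over 8 samples lands in frequency bins u = 1 and u = 7.
signal = [math.cos(2 * math.pi * x / 8) for x in range(8)]
spectrum = dft(signal)
print([round(abs(c), 6) for c in spectrum])  # [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0]
```

Applying `idft(spectrum)` recovers `signal` up to floating-point rounding.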
The Discrete Fourier Transform • A 2D transform can be computed by applying the 1D transform in one direction (e.g., along the rows), then in the other (along the columns).
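A hedged check of the separability claim, using a naive transform redefined so the block stands alone: 1D DFTs over every row and then every column match the direct 2D double sum. `dft2_rowcol` and `dft2_direct` are hypothetical helper names, and the 3 × 4 image is made up for illustration.

```python
import cmath
import math

def dft(f):
    """Naive 1D DFT, F(u) = sum_x f(x) * exp(-j*2*pi*u*x/N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * math.pi * u * x / n) for x in range(n))
            for u in range(n)]

def dft2_rowcol(img):
    """2D DFT computed as 1D DFTs along every row, then along every column."""
    rows = [dft(row) for row in img]               # result indexed [y][u]
    cols = [dft(list(c)) for c in zip(*rows)]      # transform each column: [u][v]
    return [list(r) for r in zip(*cols)]           # transpose back to [v][u]

def dft2_direct(img):
    """Direct double sum over both indices, used only to check dft2_rowcol."""
    m, n = len(img), len(img[0])
    return [[sum(img[y][x] * cmath.exp(-2j * math.pi * (u * x / n + v * y / m))
                 for y in range(m) for x in range(n))
             for u in range(n)]
            for v in range(m)]

image = [[1, 2, 0, 3],
         [4, 0, 1, 1],
         [2, 2, 5, 0]]   # 3 x 4 real image
fast, slow = dft2_rowcol(image), dft2_direct(image)
print(max(abs(fast[v][u] - slow[v][u]) for v in range(3) for u in range(4)) < 1e-9)  # True
```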
The Fast Fourier Transform • Divide-and-conquer algorithm. • The input sequence is divided into two subsequences consisting of the values at even and odd indices, respectively.
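The even/odd split can be sketched as the textbook recursive radix-2 Cooley-Tukey FFT. This assumes N is a power of two and illustrates the recursion only, not the GPU implementation; `fft` is an illustrative name. Each level combines the half-size transforms E and O as F(u) = E(u) + W^u O(u) and F(u + N/2) = E(u) − W^u O(u), with W = e^(−j2π/N).

```python
import cmath
import math

def fft(f):
    """Recursive radix-2 decimation-in-time FFT; len(f) must be a power of two."""
    n = len(f)
    if n == 1:
        return [complex(f[0])]
    even = fft(f[0::2])   # transform of the values at even indices
    odd = fft(f[1::2])    # transform of the values at odd indices
    out = [0j] * n
    for u in range(n // 2):
        w = cmath.exp(-2j * math.pi * u / n) * odd[u]   # twiddle factor W^u times O(u)
        out[u] = even[u] + w
        out[u + n // 2] = even[u] - w
    return out

print(fft([1, 2, 3, 4]))  # approximately [10, -2+2j, -2, -2-2j]
```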
Index Magic • Do not use recursion. • Instead, use dynamic programming: iterate over the entire array, computing all values for each recursion depth together, like a bottom-up mergesort. • Indexing is non-obvious. • Unlike mergesort, the recursive step does not divide the array into contiguous chunks. • At any iteration, which partition does a given index belong to, and where can one find the applicable values of its sub-partitions?
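To see why the indexing is non-obvious, the hypothetical helper `leaf_order` below records the order in which the recursive even/odd split visits the original indices. For N = 8 it matches the bit-reversed index order, a standard property of radix-2 decimation in time.

```python
def leaf_order(indices):
    """Order in which the recursive even/odd split visits the original indices."""
    if len(indices) == 1:
        return list(indices)
    return leaf_order(indices[0::2]) + leaf_order(indices[1::2])

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

n, bits = 8, 3
print(leaf_order(list(range(n))))                # [0, 4, 2, 6, 1, 5, 3, 7]
print([bit_reverse(i, bits) for i in range(n)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```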
Index Magic • Common solution: rearrange the data by reversing the bits of the indices. • The FFT can then operate on contiguous partitions. • Requires an extra data copy. • Our solution: determine the indexing in place. Note that the paper has a typo.
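The "common solution" can be sketched as a bottom-up radix-2 FFT: one bit-reversal copy, then log2(N) passes over contiguous partitions. This illustrates the conventional approach the slide contrasts against, not the authors' in-place indexing scheme, which is not reproduced here; `fft_iterative` is an illustrative name and N must be a power of two.

```python
import cmath
import math

def fft_iterative(f):
    """Bottom-up radix-2 FFT: one bit-reversal copy, then log2(N) butterfly passes."""
    n = len(f)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    # The extra data copy: move each input to its bit-reversed index so that
    # every partition at every recursion depth becomes a contiguous chunk.
    a = [0j] * n
    for i in range(n):
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        a[rev] = complex(f[i])
    size = 2
    while size <= n:                          # one pass per recursion depth
        half = size // 2
        for start in range(0, n, size):       # every contiguous partition of this size
            for k in range(half):
                w = cmath.exp(-2j * math.pi * k / size) * a[start + k + half]
                a[start + k + half] = a[start + k] - w
                a[start + k] = a[start + k] + w
        size *= 2
    return a
```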
Fourier Symmetry of Real Sequences • In general, even the frequency spectra of real functions contain imaginary values. • These capture the magnitude and phase shift of the sinusoids. • A brute-force complex FFT of real data doubles the computation and storage costs. • But Fourier transforms of real functions have conjugate symmetry: F(N − u) is the complex conjugate of F(u). • The values at u = 0 and u = N/2 are real (because they are their own conjugates).
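A quick numerical check of these symmetry facts, assuming the same naive DFT convention (e^(−j2πux/N)) and an arbitrary real sequence with even N chosen for illustration:

```python
import cmath
import math

def dft(f):
    """Naive DFT, F(u) = sum_x f(x) * exp(-j*2*pi*u*x/N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * math.pi * u * x / n) for x in range(n))
            for u in range(n)]

f = [3.0, -1.0, 4.0, 1.5, -5.0, 9.0, 2.0, 6.0]   # arbitrary real samples, N = 8
F = dft(f)
n = len(f)

# Conjugate symmetry: F(N - u) equals the complex conjugate of F(u).
sym_err = max(abs(F[(n - u) % n] - F[u].conjugate()) for u in range(n))
# F(0) and F(N/2) are their own conjugates, so their imaginary parts vanish,
# while other bins (here u = 1) generally do have imaginary parts.
print(sym_err < 1e-9, abs(F[0].imag) < 1e-9, abs(F[n // 2].imag) < 1e-9)  # True True True
```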
Fourier Transform of Real Functions • Pick two real functions, f(x) and g(x). • Let h(x) = f(x) + j g(x). • Note that there is no loss of information. • The FFT of h takes half the time of brute-force FFTs of f and g computed separately. • Simply treat one row of the image as the real components and another row as the imaginary components.
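The packing step can be sketched as follows; the two sample rows are made up for illustration. One complex transform of h = f + j g yields H(u) = F(u) + j G(u) by linearity, and f and g remain exactly recoverable from h.

```python
import cmath
import math

def dft(f):
    """Naive DFT, F(u) = sum_x f(x) * exp(-j*2*pi*u*x/N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * math.pi * u * x / n) for x in range(n))
            for u in range(n)]

f = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.0]   # one real row of an image
g = [0.0, 1.0, 1.0, 4.0, -3.0, 2.0, 0.0, 0.5]    # another real row
h = [complex(a, b) for a, b in zip(f, g)]        # h(x) = f(x) + j g(x): nothing is lost

H = dft(h)                                        # one complex transform instead of two
F, G = dft(f), dft(g)
# By linearity, H(u) = F(u) + j G(u) for every u.
print(max(abs(H[u] - (F[u] + 1j * G[u])) for u in range(len(h))) < 1e-9)  # True
```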
Untangling Fourier Transform Pairs • The Fourier transform is linear: H(u) = F(u) + j G(u). • We can “untangle” F and G using their conjugate symmetry. • Add and subtract H(u) and the complex conjugate of H(N − u) so that the conjugate terms of F and G cancel.
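Writing out that step under the DFT convention assumed here, with indices taken modulo N (so H(N − 0) means H(0)). The bar denotes the complex conjugate, and the second line uses the conjugate symmetry of the real functions f and g:

```latex
\begin{aligned}
H(u) &= F(u) + j\,G(u) \\
\overline{H(N-u)} &= \overline{F(N-u)} - j\,\overline{G(N-u)} = F(u) - j\,G(u) \\
F(u) &= \tfrac{1}{2}\left[H(u) + \overline{H(N-u)}\right] \\
G(u) &= \tfrac{1}{2j}\left[H(u) - \overline{H(N-u)}\right]
\end{aligned}
```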
Untangling Fourier Transform Pairs
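The same untangling in Python, as a sketch under the assumed DFT convention; `untangle` is a hypothetical helper, and the mirrored index N − u is taken modulo N so that u = 0 pairs with itself. The sample rows are made up for illustration.

```python
import cmath
import math

def dft(f):
    """Naive DFT, F(u) = sum_x f(x) * exp(-j*2*pi*u*x/N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * math.pi * u * x / n) for x in range(n))
            for u in range(n)]

def untangle(H):
    """Split H = DFT(f + j*g) into F = DFT(f) and G = DFT(g), assuming f and g are real."""
    n = len(H)
    F, G = [], []
    for u in range(n):
        h_mirror = H[(n - u) % n].conjugate()   # conj(H(N - u)); index N wraps to 0
        F.append((H[u] + h_mirror) / 2)
        G.append((H[u] - h_mirror) / 2j)
    return F, G

f = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.0]
g = [0.0, 1.0, 1.0, 4.0, -3.0, 2.0, 0.0, 0.5]
F, G = untangle(dft([complex(a, b) for a, b in zip(f, g)]))   # one transform, two spectra
```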
Packing Transforms of Real Functions • We can store the Fourier transform in an array the same size as the input. • Throw away the conjugate duplicates. • Throw away the imaginary values known to be zero. [Figure: packed layout of real values and imaginary values.]
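One possible packing, sketched for even N: keep Re F(0) through Re F(N/2) and Im F(1) through Im F(N/2 − 1), which is exactly N real numbers. The ordering below mirrors FFTW's documented "halfcomplex" format; the layout in the authors' textures may differ, and both helper names are hypothetical.

```python
import cmath
import math

def dft(f):
    """Naive DFT, F(u) = sum_x f(x) * exp(-j*2*pi*u*x/N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * math.pi * u * x / n) for x in range(n))
            for u in range(n)]

def pack_real_spectrum(F):
    """Pack the spectrum of a real, even-length signal into N reals:
    [Re F0, Re F1, ..., Re F(N/2), Im F(N/2 - 1), ..., Im F1].
    Im F0 and Im F(N/2) are zero; bins above N/2 are conjugate duplicates."""
    half = len(F) // 2
    return [F[u].real for u in range(half + 1)] + [F[u].imag for u in range(half - 1, 0, -1)]

def unpack_real_spectrum(p):
    """Rebuild all N complex values using F(N - u) = conj(F(u))."""
    n = len(p)
    half = n // 2
    F = [0j] * n
    F[0] = complex(p[0], 0.0)
    F[half] = complex(p[half], 0.0)
    for u in range(1, half):
        F[u] = complex(p[u], p[n - u])      # Im F(u) is stored at index N - u
        F[n - u] = F[u].conjugate()
    return F

f = [3.0, -1.0, 4.0, 1.5, -5.0, 9.0, 2.0, 6.0]
F = dft(f)
packed = pack_real_spectrum(F)              # 8 complex values -> 8 real numbers
restored = unpack_real_spectrum(packed)
```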
Column-wise FFT • After the row transforms, two columns (u = 0 and u = N/2) hold only real values. • Use the same “tangled” approach on them. • All other columns hold complex numbers. • Use the regular FFT. [Figure: the two real columns are paired to form one complex column.]
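A CPU sketch of the column pass for a small made-up real image, assuming even width N and the naive DFT used above. After the row transforms, columns u = 0 and u = N/2 are real (their kernels are 1 and ±1), so they are paired and untangled; columns 1 through N/2 − 1 get a regular transform. Columns above N/2 are conjugate duplicates and are not computed. `untangle` and `column` are hypothetical helpers.

```python
import cmath
import math

def dft(f):
    """Naive DFT, F(u) = sum_x f(x) * exp(-j*2*pi*u*x/N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * math.pi * u * x / n) for x in range(n))
            for u in range(n)]

def untangle(H):
    """Recover DFT(a) and DFT(b) from H = DFT(a + j*b) for real a, b."""
    n = len(H)
    mirror = [H[(n - u) % n].conjugate() for u in range(n)]
    return ([(H[u] + mirror[u]) / 2 for u in range(n)],
            [(H[u] - mirror[u]) / 2j for u in range(n)])

image = [[1.0, 2.0, 0.0, 3.0],
         [4.0, 0.0, 1.0, 1.0],
         [2.0, 2.0, 5.0, 0.0],
         [0.5, 1.0, 3.0, 2.0]]               # real image, width N even
n = len(image[0])
rows = [dft(r) for r in image]                # row spectra, indexed [y][u]

def column(u):
    """Column u of the row spectra, as a list over y."""
    return [row[u] for row in rows]

# Columns u = 0 and u = N/2 are real: pair them as one complex column and untangle.
real0 = [z.real for z in column(0)]
real_mid = [z.real for z in column(n // 2)]
col_spectra = {}
col_spectra[0], col_spectra[n // 2] = untangle(
    dft([complex(a, b) for a, b in zip(real0, real_mid)]))

# Columns 1 .. N/2 - 1 are genuinely complex: regular transform.
for u in range(1, n // 2):
    col_spectra[u] = dft(column(u))
# col_spectra[u][v] now holds the 2D transform at (u, v) for u = 0 .. N/2.
```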
Packing 2D Transforms of Real Functions • Rows transformed from complex values are already packed appropriately. • The two rows transformed from real values are untangled and packed to follow suit. [Figure: packed layout of real values and imaginary values.]
Available Resources • nVidia GeForce FX 5800 Ultra. • Full 32-bit floating-point pipeline and frame buffers. • Fully programmable vertex and fragment units. • Cg • High-level language for vertex and fragment programs. • Traditional CPU: 1.7 GHz Intel Xeon. • Freely available high-performance FFT implementations.
Implementation • Using a SIMD model for parallel computation. • Draw a quadrilateral parallel to the screen. • The rasterizer invokes the same fragment program “in parallel” over all pixels covered by the quadrilateral. • Inputs and outputs depend on the location of the pixel on which the fragment program is running. • We require many rendering passes. • Use the “render to texture” extension. • Use two frame buffers: one holding the values from the last pass and one storing the results of the current computation.
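A minimal CPU simulation of this SIMD model, assuming a Stockham-style radix-2 formulation (a swapped-in technique, not necessarily the paper's indexing). Each output index plays the role of a pixel: it runs the same hypothetical `fragment_program`, reads only from the previous pass's buffer, and writes only its own location. Because each pixel computes which two inputs it needs, no bit-reversal copy is required. `pingpong_fft` is an illustrative name; N must be a power of two.

```python
import cmath
import math

def fragment_program(i, src, n, ns):
    """Runs once per output 'pixel' i; gathers its inputs from the previous pass.
    After this pass the buffer holds sub-transforms of length 2*ns."""
    span = 2 * ns
    q, r = divmod(i, span)       # which sub-transform, and which coefficient in it
    k = r % ns
    j = q * ns + k               # even-half input; the odd-half input is N/2 further on
    w = cmath.exp(-2j * math.pi * k / span) * src[j + n // 2]
    return src[j] + w if r < ns else src[j] - w

def pingpong_fft(f):
    """Radix-2 FFT in log2(N) passes that alternate between two buffers."""
    n = len(f)
    buffers = [[complex(v) for v in f], [0j] * n]
    read = 0                                  # index of the buffer holding the last pass
    ns = 1
    while ns < n:
        src, dst = buffers[read], buffers[1 - read]
        for i in range(n):                    # the "quad" covers every output pixel
            dst[i] = fragment_program(i, src, n, ns)
        read = 1 - read                       # swap: this pass's output is next pass's input
        ns *= 2
    return buffers[read]
```

The gather form (each output decides what to read) matches a fragment program, which can write only to its own pixel.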
Implementation [Figure: pipeline from Images through FFT, Untangle, FFT, Untangle to Frequency Spectra; buffers hold the real and imaginary parts of the tangled and untangled transforms F and G, with Scale and Pass steps.]
Fragment Programs • Written in Cg, compiled for GeForce FX.
Applications • Digital image filtering.
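Filtering in the frequency domain amounts to multiplying the spectrum by a filter and transforming back. A hedged 1D sketch (one image row; a 2D filter applies the same idea to the 2D spectrum) with an ideal low-pass cutoff, an illustrative choice rather than one taken from the talk; `lowpass` is a hypothetical helper.

```python
import cmath
import math

def dft(f, sign=-1):
    """Naive DFT; sign=-1 gives the forward transform, sign=+1 the unscaled inverse."""
    n = len(f)
    return [sum(f[x] * cmath.exp(sign * 2j * math.pi * u * x / n) for x in range(n))
            for u in range(n)]

def lowpass(signal, cutoff):
    """Ideal low-pass filter: zero every bin whose frequency exceeds `cutoff`, then invert."""
    n = len(signal)
    spectrum = dft(signal)
    for u in range(n):
        if min(u, n - u) > cutoff:     # bins u and N - u carry the same frequency
            spectrum[u] = 0j
    # Inverse transform with the 1/N factor; the result is real up to rounding error.
    return [(z / n).real for z in dft(spectrum, sign=+1)]

n = 16
smooth = [math.cos(2 * math.pi * x / n) for x in range(n)]
noisy = [s + 0.5 * math.cos(2 * math.pi * 6 * x / n) for x, s in enumerate(smooth)]
filtered = lowpass(noisy, cutoff=2)   # removes the frequency-6 component
```

An ideal cutoff like this causes ringing on real images, so smoother filter shapes are the usual choice in practice.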
Applications • Texture generation. • Volume rendering.
Performance • Computation speed: 2.5 GigaFLOPS • Texture read rate: 3.4 GB/sec
Conclusions • The Fourier transform on the GPU has many potential applications. • A well-established CPU FFT (FFTW) still has an edge over the GPU implementation. • Both the software and the hardware of the GPU are first generation. • Room for improvement.
Get the Cg Code • http://www.cgshaders.org ? • http://www.cs.unm.edu/~kmorel/documents/fftgpu • kmorel@sandia.gov
Questions?