
FFT: Accelerator Project



Presentation Transcript


  1. FFT: Accelerator Project
     Rohit Prakash, Anand Silodia

  2. Work done till now
     • Studied various FFT algorithms
     • Implemented radix-4, recursive and iterative algorithms
     • Optimized these
     • Compared the results with FFTW
     • Result: FFTW fares better than our implementation
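As an illustration of the kind of algorithm listed above, here is a minimal sketch of a recursive radix-4 decimation-in-time FFT. It is not the authors' implementation (their code is not part of the transcript); the function names `fft_radix4` and `dft` are hypothetical, and the input length is assumed to be a power of 4.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used here only as a reference for checking results."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4:
        raise ValueError("length must be a power of 4")
    # Transform the four interleaved sub-sequences x[r], x[r+4], x[r+8], ...
    a, b, c, d = (fft_radix4(x[r::4]) for r in range(4))
    q = n // 4
    out = [0j] * n
    for k in range(q):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        t0 = a[k]
        t1 = w * b[k]
        t2 = w * w * c[k]
        t3 = w * w * w * d[k]
        # 4-point butterfly: only additions and multiplications by +/-1, +/-i
        out[k] = t0 + t1 + t2 + t3
        out[k + q] = t0 - 1j * t1 - t2 + 1j * t3
        out[k + 2 * q] = t0 - t1 + t2 - t3
        out[k + 3 * q] = t0 + 1j * t1 - t2 - 1j * t3
    return out
```

Comparing `fft_radix4(x)` with `dft(x)` on a short input (for example, length 16) is a quick way to check the butterfly signs.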

  3. Current Objectives
     • Validate the number of complex calculations in our implementation against the theoretical number of computations
     • Document the work done till now
     • Make a website of the project
     • Study the FFTW code (and figure out the reasons for its efficiency)
     • Run the code with the Intel compiler (icc) / Visual C++

  4. Validating the computations
     • Incorrect theoretical formula (cnx.org)
     • Theoretical formula (for no. of complex computations):
       (11/4)*n*log4(n) = 8960 (Correct)
       (3/4)*n*log4(n) = 3840 (Incorrect)
     • Actual: 8960
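The two expressions on this slide can be evaluated with a small helper. This is only a sketch: the slide does not state the transform size, and n = 1024 is an assumption chosen because it reproduces the 3840 figure for the 3/4 expression. The names `log4_exact` and `op_count` are hypothetical.

```python
from fractions import Fraction


def log4_exact(n):
    """Return log base 4 of n as an integer; n must be a positive power of 4."""
    if n < 1:
        raise ValueError("n must be a positive power of 4")
    k = 0
    while n > 1:
        if n % 4:
            raise ValueError("n must be a positive power of 4")
        n //= 4
        k += 1
    return k


def op_count(coeff, n):
    """Evaluate coeff * n * log4(n) exactly for a rational coefficient."""
    return coeff * n * log4_exact(n)


N = 1024  # assumed size; not stated on the slide
three_quarters = op_count(Fraction(3, 4), N)    # 3840
eleven_quarters = op_count(Fraction(11, 4), N)  # 14080
```

Under that assumed size the 11/4 expression evaluates to 14080, so the 8960 figures on the slide presumably rest on a different size or counting convention that the transcript does not record.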

  5. Documentation and website
     • Website of the project: www.cse.iitd.ac.in/~cs1030186/btp
     • Includes the details and results of our experiments (till last week)

  6. Running on the Intel compiler (icc)
     • No improvement
     • Possible reasons:
       • Tested on an Intel Pentium Mobile
       • This processor does not support optimizations such as exploiting SSE3 instructions (-fast flag)

  7. FFTW code
     • 56,489+ LOC (contains code written in OCaml and C)
     • We decided to study why FFTW is so fast (before going into the code itself)
     • Texts we came across in this context:
       • The Design and Implementation of FFTW3 (Matteo Frigo and Steven G. Johnson)
       • Documentation of FFTW

  8. Why is FFTW fast?
     • The transform is computed by an executor, composed of highly optimized, composable blocks of C code called codelets
     • At runtime, a ‘planner’ finds an efficient way to compose codelets: it measures the speed of different plans and chooses the best using a dynamic programming algorithm
     • The executor interprets the plan with negligible overhead
     • Codelets are generated automatically and are fast
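The measure-then-choose idea described on this slide can be sketched in a few lines. This is a toy model, not FFTW's planner: it times two whole candidate algorithms rather than composing codelets, uses no dynamic programming, and every name here (`CANDIDATES`, `plan`, `execute`) is hypothetical. Input lengths are assumed to be powers of 2.

```python
import cmath
import time


def dft_naive(x):
    """Naive O(n^2) DFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft_recursive(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft_recursive(x[0::2]), fft_recursive(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + half] = even[k] + t, even[k] - t
    return out


# Candidate "plans" the toy planner may choose between.
CANDIDATES = {"naive": dft_naive, "recursive": fft_recursive}
_plan_cache = {}  # best candidate name per transform size


def plan(n, repeats=3):
    """Time every candidate on a sample of size n and remember the fastest."""
    if n not in _plan_cache:
        sample = [complex(i % 7, i % 3) for i in range(n)]
        timings = {}
        for name, fn in CANDIDATES.items():
            start = time.perf_counter()
            for _ in range(repeats):
                fn(sample)
            timings[name] = time.perf_counter() - start
        _plan_cache[n] = min(timings, key=timings.get)
    return _plan_cache[n]


def execute(x):
    """Run the transform using whichever candidate the planner selected."""
    return CANDIDATES[plan(len(x))](x)
```

Planning costs a few timed runs up front; the cached choice is then reused for every later transform of the same size, which is the trade-off the slide describes.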

  9. Contd…
     • The executor implements the recursive divide-and-conquer Cooley-Tukey FFT algorithm
     • Basically, it adapts to the hardware in order to maximize performance
     • ‘Performance has little to do with the number of operations. Fast code must exploit instruction level parallelism of the processor. It is important to write the code in such a way that C compiler can schedule it efficiently’

  10. Contd…
      • It uses some tricky optimizations like –
      • It also exploits SIMD instructions

  11. Further plan?
      • Since FFTW supports MPI and adapts itself to the given hardware architecture, we may use it as it is.

  12. References
      • www.fftw.org
      • The Design and Implementation of FFTW3 (Matteo Frigo and Steven G. Johnson)
      • The Fastest Fourier Transform in the West (Matteo Frigo and Steven G. Johnson)

  13. Thank You
