FFT Accelerator Project

FFT Accelerator Project. Date: February 23, 2007. Rohit Prakash (2003CS10186), Anand Silodia (2003CS50210). Current objectives: validate the number of complex multiplications; run the code with the Intel compiler and compare the results for a single run and for multiple runs; tabulate all the results.


Presentation Transcript


  1. FFT Accelerator Project
     Date: February 23, 2007
     Rohit Prakash (2003CS10186), Anand Silodia (2003CS50210)

  2. Current Objectives
     • Validate the number of complex multiplications
     • Run the code with the Intel compiler and compare the results:
       – For a single run
       – For multiple runs
     • Tabulate all the results
     • Analyse these using VTune

  3. Number of Complex Multiplications
     • Our results
       – (11/4) * n * log4(n) = 8960
     • Result from the web
       – (3/4) * n * log4(n) = 3840
     • The inner loop is trivial and does not require any “complex multiplications”
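The published count can be checked with a little arithmetic: a radix-4 FFT of n points has log4(n) stages of n/4 butterflies each, so three non-trivial twiddle multiplications per butterfly give (3/4) * n * log4(n). The sketch below assumes n is a power of 4; the helper names `log4` and `complex_mults` are illustrative and not taken from the project code.

```python
def log4(n: int) -> int:
    """Return log base 4 of n, requiring n to be an exact power of 4."""
    if n < 1:
        raise ValueError("n must be a positive power of 4")
    stages = 0
    while n > 1:
        if n % 4 != 0:
            raise ValueError("n must be a power of 4")
        n //= 4
        stages += 1
    return stages


def complex_mults(n: int, per_butterfly: int) -> int:
    """Multiplications for an n-point radix-4 FFT.

    Each of the log4(n) stages runs n/4 butterflies.
    """
    return (n // 4) * log4(n) * per_butterfly


# The three problem sizes used in the results slides are all powers of 4.
for points in (1024, 4096, 262144):
    print(points, complex_mults(points, 3))
```

With three multiplications per butterfly and n = 1024, this gives 3840, the figure quoted for the published result.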

  4. Inner loop of our Algorithm
     T ← A[k+j]
     U ← w*A[k+j+m/4]
     V ← w*w*A[k+j+m/2]
     X ← w*w*w*A[k+j+3*m/4]
     A[k+j] ← T+U+V+X
     A[k+j+m/4] ← T+(i)U-V-(i)X
     A[k+j+2m/4] ← T-U+V-X
     A[k+j+3m/4] ← T-(i)U-V+(i)X
     w ← w*w_m
     Total number of multiplications in this loop: 11
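One way to read the tally of 11 is sketched below: one multiplication for U, two for V, three for X, four products with i in the output expressions, and one to advance w. This breakdown is an assumption about how the slide counts, and the function name `butterfly_running_twiddle` is hypothetical. The products with i are counted only to match the slide; each amounts to swapping the real and imaginary parts with a sign change.

```python
def butterfly_running_twiddle(a0, a1, a2, a3, w, w_m):
    """Apply one radix-4 butterfly using a running twiddle factor w.

    Returns the four outputs, the advanced twiddle, and the number of
    complex multiplications tallied the way the slide appears to count.
    """
    mults = 0
    T = a0
    U = w * a1            # 1 multiplication
    mults += 1
    V = w * w * a2        # 2 multiplications
    mults += 2
    X = w * w * w * a3    # 3 multiplications
    mults += 3
    # Four products with i appear in the output expressions below.
    out0 = T + U + V + X
    out1 = T + 1j * U - V - 1j * X
    out2 = T - U + V - X
    out3 = T - 1j * U - V + 1j * X
    mults += 4
    w = w * w_m           # 1 multiplication to advance the twiddle
    mults += 1
    return (out0, out1, out2, out3), w, mults


outputs, next_w, count = butterfly_running_twiddle(1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j, 1 + 0j, 1j)
print(outputs, next_w, count)
```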

  5. New Inner loop of our Algorithm
     • T ← A[k+j]
     • U ← twiddle[k]*A[k+j+m/4]
     • V ← twiddle[2*k]*A[k+j+m/2]
     • X ← twiddle[3*k]*A[k+j+3*m/4]
     • A[k+j] ← T+U+V+X
     • A[k+j+m/4] ← T+i*U-V-i*X
     • A[k+j+2m/4] ← T-U+V-X
     • A[k+j+3m/4] ← T-i*U-V+i*X
     Total number of multiplications in this loop: 3
     (3/4) * n * log4(n) = 3840
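A minimal sketch of a complete in-place radix-4 transform built around this loop, under several assumptions: the input length is a power of 4, the input is first permuted into base-4 digit-reversed order, and the twiddle table holds exp(+2*pi*i*t/n), the sign convention implied by the T + i*U - V - i*X output. The table index `j * (n // m)` and all function names are illustrative choices, not the project's code. Products with i are done by swapping real and imaginary parts, so only the three table multiplications are counted.

```python
import cmath


def digit_reverse_base4(index: int, num_digits: int) -> int:
    """Reverse the base-4 digits of index (num_digits digits wide)."""
    result = 0
    for _ in range(num_digits):
        result = (result << 2) | (index & 3)
        index >>= 2
    return result


def radix4_fft(x):
    """Return (transform, multiplication count) for a power-of-4 length input."""
    n = len(x)
    stages = 0
    while 4 ** stages < n:
        stages += 1
    if n == 0 or 4 ** stages != n:
        raise ValueError("length must be a power of 4")

    # Precomputed twiddle table (assumed convention: positive exponent).
    twiddle = [cmath.exp(2j * cmath.pi * t / n) for t in range(n)]
    a = [x[digit_reverse_base4(p, stages)] for p in range(n)]

    mults = 0
    m = 4
    while m <= n:
        quarter = m // 4
        stride = n // m  # maps the stage-local index j into the full table
        for k in range(0, n, m):
            for j in range(quarter):
                t = j * stride
                T = a[k + j]
                U = twiddle[t] * a[k + j + quarter]
                V = twiddle[2 * t] * a[k + j + 2 * quarter]
                X = twiddle[3 * t] * a[k + j + 3 * quarter]
                mults += 3
                # i*z is a swap of parts with one sign change, not a multiply.
                iU = complex(-U.imag, U.real)
                iX = complex(-X.imag, X.real)
                a[k + j] = T + U + V + X
                a[k + j + quarter] = T + iU - V - iX
                a[k + j + 2 * quarter] = T - U + V - X
                a[k + j + 3 * quarter] = T - iU - V + iX
        m *= 4
    return a, mults


result, count = radix4_fft([complex(v, 0) for v in range(16)])
print(count)
```

For 1024 points the counter comes to 3840, matching (3/4) * n * log4(n).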

  6. Stuff we tried
     • Improved the “bit reversal”
     • Better than last time
     • Though inefficient (O(n log n)), it still runs faster than the previous implementation
     • Many faster algorithms still exist
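For reference, a straightforward O(n log n) bit-reversal permutation reverses the log2(n) bits of every index one bit at a time. This is a generic sketch, assuming a power-of-two length, and is not necessarily the implementation the slide refers to; the function names are illustrative.

```python
def reverse_bits(index: int, num_bits: int) -> int:
    """Reverse the lowest num_bits bits of index, one bit per iteration."""
    reversed_index = 0
    for _ in range(num_bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def bit_reverse_permute(values):
    """Return a copy of values in bit-reversed index order."""
    n = len(values)
    num_bits = n.bit_length() - 1
    if n == 0 or (1 << num_bits) != n:
        raise ValueError("length must be a power of two")
    out = list(values)
    for i in range(n):
        r = reverse_bits(i, num_bits)
        if i < r:  # swap each pair exactly once
            out[i], out[r] = out[r], out[i]
    return out


print(bit_reverse_permute(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Each of the n indices costs log2(n) shift-and-mask steps, which is where the O(n log n) bound comes from.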

  7. System Specifications
     • Processor: Intel Pentium 4 CPU, 3.00 GHz
     • Cache size: 1 MB
     • RAM: 1 GB
     • Flags supported: sse, sse2

  8. Results: user time (ms) for 1024 points (single iteration)

  9. Results: user time (ms) for 1024 points (10 iterations)

  10. Results: user time for 4096 points (single iteration)

  11. Results: user time (ms) for 4096 points (10 iterations)

  12. Results: user time (ms) for 262144 points (single iteration)

  13. Results: user time (ms) for 262144 points (10 iterations)

  14. Analysis
      • Results are comparable for the following reasons:
      • Change in bit reversal
      • Number of computations
      • FFTW: compiling option gcc
      • The code needs to be rewritten for an arbitrary number of points

  15. Tabular Representation (1024 points)

  16. Tabular Representation (4096 points)

  17. Tabular Representation (262144 points)

  18. VTune Analysis
      • TODO
      • VTune (not available)

  19. Further Improvements
      • Fast digit reversal
      • Fast “twiddle compute”
      • TODO:
        – Comparison with the Intel Math Kernel Library
        – Study the FFTW implementation
        – VTune analysis
        – Try the Winograd algorithm
        – Code more efficiently

  20. References
      • Alan H. Karp, “Bit Reversal on Uniprocessors”
      • Angelo A. Yong, “A Better FFT Bit-Reversal Algorithm”

  21. Thank You
