FFT: Accelerator Project

Rohit Prakash, Anand Silodia

Work done so far
• Studied various FFT algorithms
• Implemented radix-4, recursive and iterative FFT algorithms
• Optimized these implementations
• Compared the results with FFTW
Result:
• FFTW performs better than our implementation

Current Objectives
• Validate the number of complex computations in our implementation against the theoretical count
• Document the work done so far
• Make a website for the project
• Study the FFTW code (and figure out the reasons for its efficiency)
• Run the code with the Intel compiler (icc) / Visual C++

Validating the computations
• The theoretical formula given on cnx.org is incorrect
• Theoretical number of complex computations:
  (11/4)·n·log₄(n) = 8960 (correct)
  (3/4)·n·log₄(n) = 3840 (incorrect, cnx.org)
• Actual count in our implementation: 8960
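One way to see where the 11/4 coefficient can come from, assuming the usual radix-4 decimation-in-time structure, counting each complex multiplication and each complex addition as one operation, counting every twiddle multiplication (including trivial ones such as w⁰ = 1), and treating multiplications by ±1 and ±i as free:

```latex
\begin{aligned}
\text{stages} &= \log_4 n, \qquad \text{butterflies per stage} = \tfrac{n}{4},\\
\text{per butterfly: } & 3 \text{ twiddle multiplications} + 8 \text{ additions},\\
\text{multiplications} &= 3 \cdot \tfrac{n}{4} \log_4 n = \tfrac{3}{4}\, n \log_4 n,\\
\text{additions} &= 8 \cdot \tfrac{n}{4} \log_4 n = 2\, n \log_4 n,\\
\text{total} &= \left(\tfrac{3}{4} + 2\right) n \log_4 n = \tfrac{11}{4}\, n \log_4 n.
\end{aligned}
```

Under this counting, (3/4)·n·log₄(n) is the number of complex multiplications alone, which would account for it undercounting the total number of complex computations.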

Documentation and website
• Project website: www.cse.iitd.ac.in/~cs1030186/btp
• It includes the details and results of our experiments (up to last week)

Running with the Intel compiler (icc)
• No improvement
• Possible reason:
  • We tested on an Intel Pentium Mobile processor
  • It does not support optimizations such as SSE3 instructions (used by the -fast flag)
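When comparing compilers, it can help to confirm which instruction set a build actually targets. A minimal sketch, assuming a compiler that predefines the __SSE3__ macro when SSE3 code generation is enabled (GCC and Clang do, for example with -msse3; whether a given icc version and flag such as -fast defines it should be checked against that compiler's documentation). The function name built_with_sse3 is hypothetical.

```c
/* Hypothetical helper: returns 1 if this file was compiled with SSE3
   code generation enabled (i.e. __SSE3__ is predefined), 0 otherwise.
   It reports the compile-time target, not what the running CPU supports. */
int built_with_sse3(void)
{
#ifdef __SSE3__
    return 1;
#else
    return 0;
#endif
}
```

Building the same file with and without the SSE3 option and printing this value shows whether the flag changed the code-generation target at all.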

FFTW code
• 56,489+ lines of code (written in OCaml and C)
• We decided to study why FFTW is so fast before going into the code itself
• Texts we came across in this context:
  • The Design and Implementation of FFTW3 (Matteo Frigo and Steven G. Johnson)
  • FFTW documentation

Why is FFTW fast?
• The transform is computed by an executor, composed of highly optimized, composable blocks of C code called codelets
• At runtime, a 'planner' finds an efficient way to compose codelets: it measures the speed of different plans and chooses the best using a dynamic programming algorithm
• The executor interprets the plan with negligible overhead
• Codelets are generated automatically and are fast

Why is FFTW fast? (contd.)
• The executor implements the recursive divide-and-conquer Cooley-Tukey FFT algorithm
• In essence, it adapts to the hardware in order to maximize performance
• 'Performance has little to do with the number of operations. Fast code must exploit instruction level parallelism of the processor. It is important to write the code in such a way that C compiler can schedule it efficiently'

Why is FFTW fast? (contd.)
• It uses some tricky optimizations like –
• It also exploits SIMD instructions

Further plan?
• Since FFTW supports MPI and adapts itself to the given hardware architecture, we may use it as is.

References
• www.fftw.org
• Matteo Frigo and Steven G. Johnson, The Design and Implementation of FFTW3
• Matteo Frigo and Steven G. Johnson, The Fastest Fourier Transform in the West

Thank You
