Mapping the FFT Algorithm to the IBM Cell Processor
Andy Polidore
Advisors: Brendan Burns, Joseph Czechowski
Motivation
• MRI imaging
• Fast Fourier Transforms (FFT)
  • Efficient algorithm for computing a Discrete Fourier Transform (DFT)
  • The DFT converts time-domain data to the frequency domain
  • 2D FFT: perform a 1D FFT on each row of an image, then perform a 1D FFT on each resulting column
• The Cell
  • Nine cores
  • 1 Power Processing Unit (PPU)
  • 8 Synergistic Processing Units (SPUs)
Strategy
• Cell comes with a 2D FFT routine
  • Needs to be called twice
  • First call organizes the data in contiguous column form
• Striping
• Limited SPU memory
  • Quad buffering
[Diagram: PPU and SPU 0. Labels: Input Buffer, Input, DMA In, FFT, FFT out, Output Buffer, DMA Out]
[Diagram: PPU with SPU 0, SPU 1, SPU 7. Labels: Input Buffer, Input, DMA In, FFT, FFT out, Output Buffer]
[Diagram: PPU with SPU 0, SPU 1, SPU 2 and a Sync Point. Labels: Input Buffer, Input, DMA In, FFT, FFT out, Output Buffer, DMA Out]
Quad buffering
• Why is it required?
  • Space problems
  • Maximizing processing power
• Buffers
  • IN handles incoming data
  • FFTin and FFTout are used to process the data
  • OUT stores the data ready to be DMA'ed back to main memory
Buffering
Step  A        B        C        D
0     FILL     -------  -------  -------
1     FFTIN    FFTOUT   FILL     -------
2     FFTOUT   OUT      FFTIN    FILL
3     OUT      FILL     FFTOUT   FFTIN
4     FILL     FFTIN    OUT      FFTOUT
5     FFTIN    FFTOUT   FILL     OUT
6     FFTOUT   OUT      FFTIN    FILL
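In the buffering schedule above, each buffer appears to walk the same cycle, FILL, FFTIN, FFTOUT, OUT, with the four buffers offset from one another. The sketch below reproduces that schedule from two small tables. The names `role_at`, `fill_phase`, and `first_active` are hypothetical, and the offsets are read off the slide rather than taken from the project's code.

```c
#include <stdio.h>

#define NUM_BUFFERS 4

enum role { IDLE, FILL, FFTIN, FFTOUT, OUT };

/* Steady-state cycle every buffer walks through, one role per step. */
static const enum role cycle[4] = { FILL, FFTIN, FFTOUT, OUT };

/* Phase of each buffer (A, B, C, D): a step at which it is in FILL.
   Assumption read off the slide; B's phase of -1 means it would have
   been filling one step before the schedule starts. */
static const int fill_phase[NUM_BUFFERS] = { 0, -1, 1, 2 };

/* First step at which each buffer has work; before that it is idle. */
static const int first_active[NUM_BUFFERS] = { 0, 1, 1, 2 };

/* Role of buffer `buf` (0 = A .. 3 = D) at a given step. */
enum role role_at(int step, int buf)
{
    if (step < first_active[buf])
        return IDLE;
    return cycle[(((step - fill_phase[buf]) % 4) + 4) % 4];
}

/* Print the schedule for steps 0 .. last_step, one row per step. */
void print_schedule(int last_step)
{
    static const char *names[] = { "-------", "FILL", "FFTIN", "FFTOUT", "OUT" };

    printf("Step  A        B        C        D\n");
    for (int step = 0; step <= last_step; step++) {
        printf("%-6d", step);
        for (int buf = 0; buf < NUM_BUFFERS; buf++)
            printf("%-9s", names[role_at(step, buf)]);
        printf("\n");
    }
}
```

Calling `print_schedule(6)` from a `main` prints the seven rows of the slide. From step 2 onward every role is held by exactly one buffer at each step, so a fill, an FFT, and a DMA back to main memory can all be in progress at once.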
Striping
[Diagram: main memory striped across SPU 0 through SPU 7]
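One way to picture the striping is as a mapping from image rows to SPUs. The slide does not say whether rows are dealt out round-robin or in contiguous blocks, so the sketch below assumes round-robin; `spu_for_row` and `rows_for_spu` are hypothetical helper names.

```c
#include <stddef.h>

#define NUM_SPUS 8

/* Round-robin striping (an assumption): row r is handled by SPU r mod 8. */
unsigned spu_for_row(size_t row)
{
    return (unsigned)(row % NUM_SPUS);
}

/* Number of rows a given SPU receives out of total_rows.
   The first (total_rows mod 8) SPUs get one extra row. */
size_t rows_for_spu(unsigned spu, size_t total_rows)
{
    if (spu >= NUM_SPUS)
        return 0;
    return total_rows / NUM_SPUS + (spu < total_rows % NUM_SPUS ? 1 : 0);
}
```

With a 256-row image each SPU receives 32 rows under this scheme. When the row count is not a multiple of 8, the work differs by at most one row between SPUs.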
Challenges
• Simulator
  • Testing is slow
• Alignment
• Compiler
• C coding
  • Working with bytes
• Parallel processing
  • Data movement
  • Debugging
Knowledge Gained
• Mastering Linux
• C makefiles, linking, etc.
• Data movement strategies
• Multi-core processing
• Debugging!
Results and Conclusions
• Success?
• Future Work
  • Arbitrary-size input