Programming on IBM Cell Triblade
Jagan Jayaraj, Pei-Hung Lin, Mike Knox and Paul Woodward
University of Minnesota
April 1, 2009
Rayleigh–Taylor instability
• An instability of the interface between two fluids of different densities, which occurs when the lighter fluid pushes against the heavier fluid.
• The R-T instability simulation is implemented with the multifluid Piecewise-Parabolic Method (PPM).
• The program is written in Fortran.
TriBlade
• Two QS22 blades, each with two PowerXCell 8i CPUs
• One LS21 blade with two dual-core AMD Opterons
• 16 GB of memory for the LS21 and 8 GB of memory for the QS22
LCSE Cell Cluster
• 6 Triblades
• 4 QS22 Cell blades
• 2 QS20 Cell blades
• 4 AMD quad-core systems
Login instructions
• Account credentials should be in your email.
• Guest account: lcse / lcse$ncsa!
• Login steps:
  • SSH to frodo.lcse.umn.edu
  • Once logged in to frodo, SSH to your assigned Cell processor host
  • AMD hosts: rra001a ~ rra006a
  • Cell hosts: rra001b / rra001c ~ rra006b / rra006c
Software available
• Cell SDK 3.1
• OpenMPI 1.3
• DaCS Fortran bindings
• Compilers
  • AMD: gfortran, gcc 4.1.2
  • PPU: ppuxlf, ppu-gcc
  • SPU: spuxlf, spu-gcc
• Example code is available in /mnt/scratch/NCSA_Example
Compilation and Execution
• Compile on the AMD node:
  • make ppm4f-x86
• Compile on the Cell node:
  • make ppm4f-ppu
• Run on the AMD node:
  • ./ppm4f-x86
Triblade programming paradigm
• Three levels of parallelism:
  • within-Cell
  • within-node
  • node-to-node
• Compute-communication overlap
  • DMA
  • DaCS
  • MPI
Programming for IBM Cell Tri-blade
• Single code for Roadrunner and non-RR systems
  • Uses many #ifdef, #if, #endif… directives
  • The preprocessor generates three codes
• Minimize the manual translation for SPU code
  • A Fortran-to-Cell-C translator handles the tedious portions of the SPU code
• Fortran codes for PPU and AMD
  • Fortran binding programs for C intrinsic libraries
• Keep memory footprint small
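[Diagram: build flow from a single source code]
• Single source code → preprocessor → SPU Fortran code, PPU Fortran code, AMD Fortran code
• SPU Fortran code → translation → SPU C code → SPU C compiler → embedded SPU executable
• PPU Fortran code → PPU Fortran compiler → PPU executable
• AMD Fortran code → GNU Fortran compiler → AMD executable
• Fortran binding programs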
Division of labor
• Define the jobs for AMD, PPU and SPU clearly
  • AMD: I/O, MPI, relay data to Cell…
  • PPU: Transfer data, manage SPUs
  • SPU: Just compute
Items to take care of
• Three codes for three different ISAs
• Different endianness between the PPU and AMD
  • Byte-swapping is needed
• 64-bit/32-bit conversion
  • The SPU supports only 32-bit addresses, but DaCS requires 64-bit address mode
Translator
• Fortran to C with Cell extensions
• Needs directives
• Built with ANTLR
• Handles:
  • Vector and scalar loops
  • DMAs (including list DMAs)
  • Variable declarations
  • Conditional vector moves