Porting VisIt to BG/P
Brad Whitlock
October 14, 2009
www.vacet.org
Overview
• Objectives
• Building 3rd party libraries
• Building VisIt
• Running VisIt on BG/P
• Improvements
• Impact
• Future work
Objectives
• Port VisIt to IBM's BlueGene/P platform so VisIt can run on LLNL's Dawn and eventually Sequoia
  • Dawn is a 500-teraflop IBM BG/P system with 36,864 nodes and 147,456 CPUs
  • 4 850 MHz PowerPC cores and 4 GB of memory per node
  • Compute nodes run the CNK operating system
  • Code must be cross-compiled for CNK
• Identify weaknesses in VisIt that prevent it from scaling to tens or hundreds of thousands of processors
Building 3rd party libraries
• Built all libraries on the login nodes for a regular Linux PowerPC version of VisIt
  • Ran into runtime problems with the xlC compiler, so reverted to g++ for the time being
• Cross-compiled all libraries for CNK
  • VisIt's 3rd party libraries have no support for this platform, so special builds were required
  • Mesa built unmangled and without X11
  • VTK was tricky to build
    • No OpenGL, so VTK was built with Mesa as its OpenGL
    • No X11, so a custom render window was created
  • Used a CMake toolchain file
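Cross-compiling with CMake hinges on the toolchain file mentioned above. A minimal sketch of what such a file might look like for BG/P compute nodes, assuming the conventional /bgsys compiler wrapper paths (the exact paths and the install layout on Dawn are assumptions, not taken from the slides):

```cmake
# Hypothetical BG/P cross-compiling toolchain file; paths are assumptions.
# BlueGeneP-static is a platform name CMake recognizes for static CNK builds.
set(CMAKE_SYSTEM_NAME BlueGeneP-static)
set(CMAKE_SYSTEM_PROCESSOR ppc)

# GNU cross compilers targeting the CNK compute-node kernel (g++ rather than
# xlC because of the runtime problems noted above).
set(CMAKE_C_COMPILER   /bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc-bgp-linux-gcc)
set(CMAKE_CXX_COMPILER /bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc-bgp-linux-g++)

# Find headers and libraries only in the cross environment, never on the
# login node, so login-node Linux libraries cannot leak into CNK builds.
set(CMAKE_FIND_ROOT_PATH /bgsys/drivers/ppcfloor/gnu-linux)
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
```

The file is then passed to each 3rd party library's configure step, e.g. `cmake -DCMAKE_TOOLCHAIN_FILE=bgp-toolchain.cmake .`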
Building VisIt
• No X11, so graphical components can't be built for CNK (the gui is not built)
• Added a new --enable-engine-only mode to VisIt's build system that builds only the compute engine and its plugins
• VisIt previously always required mangled Mesa
  • This support had to become conditional on VTK having mangled Mesa support
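The engine-only mode might be invoked like this; the --enable-engine-only flag is from the slide, but the other arguments and paths are illustrative, not VisIt's documented invocation:

```shell
# Configure an engine-only VisIt build for the CNK compute nodes.
# Only --enable-engine-only is from the slides; the prefix and compiler
# name here are hypothetical.
./configure --enable-engine-only \
            --prefix=/usr/gapps/visit \
            CXX=powerpc-bgp-linux-g++
make && make install
```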
Running VisIt on Dawn
• Dawn uses mpirun to start VisIt on the compute nodes
  • Minor differences required environment variables to be exported via the mpirun command, which could be handled via a host profile in VisIt
• VisIt ran at 1k, 2k, 4k, 8k, and 16k nodes
• VisIt ran with 1 and 4 trillion zone datasets (June 2009)
• Encountered scaling problems early
  • Launch was slow because each processor read the plugin directory to obtain plugin information
  • VisIt commands were sent from rank 0 to the other ranks 1 KB at a time until the full message was delivered
  • The non-spinning bcast substitute used for sending commands relied on point-to-point communication that performed poorly at scale
  • Certain metadata consumed too much memory (each processor has only ~700 MB available)
  • The synchronization step for SR mode used slow point-to-point communication
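A launch on Dawn might look roughly like the following; the -env flag is how BG/P's mpirun is commonly described as exporting environment variables to compute nodes, but the flag spellings, node counts, and paths here are assumptions to be checked against the system documentation, not commands from the slides:

```shell
# Hypothetical launch of VisIt's parallel compute engine on Dawn.
# -env exports a variable to the compute-node environment; VN mode runs
# one MPI task per core (4 per node). All names and paths are made up.
mpirun -np 4096 -mode VN \
       -env "VISITHOME=/usr/gapps/visit" \
       /usr/gapps/visit/bin/engine_par
```

In practice the slide notes this is hidden behind a VisIt host profile, so users never type the mpirun command themselves.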
Improvements
• Broadcast plugin information from rank 0 to the other ranks, improving plugin load time 9x
• Broadcast VisIt commands from rank 0 in a single chunk instead of 1 KB at a time
• Use standard bcast in the engine main loop instead of the poorly performing non-spinning substitute geared toward shared nodes
• Switched to an alternate metadata representation to free up most available memory for calculations
• Mark Miller replaced the SR-mode synchronization step with a much faster version, reducing its time from 20 minutes to 2 seconds
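The first two improvements follow the usual MPI pattern of replacing many small point-to-point transfers with a broadcast of a length followed by the payload. A minimal sketch of that pattern, not VisIt's actual code (the function and variable names are invented, and running it requires an MPI environment):

```c
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Broadcast a variable-length command buffer from rank 0 to all ranks
 * using one MPI_Bcast for the length and one for the data, instead of
 * looping over 1 KB point-to-point sends. Non-root ranks pass buf=NULL;
 * every rank gets back an allocated copy that the caller frees. */
char *broadcast_command(const char *buf, int len, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    /* Every rank learns the payload size first. */
    MPI_Bcast(&len, 1, MPI_INT, 0, comm);

    char *cmd = malloc(len);
    if (rank == 0)
        memcpy(cmd, buf, len);

    /* One collective transfer of the whole command, not 1 KB pieces. */
    MPI_Bcast(cmd, len, MPI_CHAR, 0, comm);
    return cmd;
}
```

The same shape applies to the plugin-information fix: rank 0 scans the plugin directory once and broadcasts the result, rather than having tens of thousands of processors hit the filesystem simultaneously.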
Impact
• So far this project's impact on customers has been small
  • They do not yet run on Dawn
  • They might not notice small improvements at today's everyday processor counts (<2k)
• At higher processor counts (>4k), the optimizations added by this work prevent bottlenecks in the compute engine, improving scalability
Future work
• Resolve the load problems with the xlC compiler so we can use the best optimizations, including BG/P's dual FPUs
• Improve the 3rd party library build process for BG/P by adding support to the build_visit script
• Continue profiling plots and improving performance
• Reduce memory usage where possible
• Investigate I/O patterns and attempt optimizations