ASTA Progress Report • Robert Henschel • April 23, 2009
Contents • The Problem(s) • Benchmarking and Tracing • First Steps Toward a Solution • Future work
The Problem(s) • Researchers at Indiana University use a 3D hydrodynamic code to study planet formation • Code is a finite-difference scheme on an Eulerian grid • User has a legacy Fortran code (a mix of Fortran 66, 77 and 90) that has been made OpenMP parallel • Originally parallelized for small problem sizes (16K computational cells) and small core counts (8-16), but now used for much larger problem sizes (16M computational cells) • Code generates large amounts of data (1-4 TB per simulation) that need to be analyzed interactively • User would like to run many simulations (a generic sketch of the code structure follows below)
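For orientation only, here is a minimal sketch of the general code structure being described: an explicit finite-difference update on a 3D Eulerian grid with the outermost grid loop parallelized via OpenMP. The array names, grid size and 7-point stencil are illustrative assumptions, not the user's actual hydrodynamic scheme.

```fortran
! Illustrative sketch only: a generic explicit finite-difference update on a
! 3D Eulerian grid, parallelized with OpenMP over the outermost grid index.
! Array names, grid size and the stencil are placeholders and are not taken
! from the user's actual hydrodynamic code.
program fd_sketch
  implicit none
  integer, parameter :: nx = 64, ny = 64, nz = 64
  real, parameter    :: c = 0.1
  real, dimension(nx, ny, nz) :: u, unew
  integer :: i, j, k

  u = 1.0
  unew = u

  !$omp parallel do private(i, j, k)
  do k = 2, nz - 1
     do j = 2, ny - 1
        do i = 2, nx - 1
           ! simple diffusion-like update as a stand-in for the real scheme
           unew(i, j, k) = u(i, j, k) + c *                        &
                ( u(i+1, j, k) + u(i-1, j, k)                      &
                + u(i, j+1, k) + u(i, j-1, k)                      &
                + u(i, j, k+1) + u(i, j, k-1) - 6.0 * u(i, j, k) )
        end do
     end do
  end do
  !$omp end parallel do

  print *, 'sample value: ', unew(nx/2, ny/2, nz/2)
end program fd_sketch
```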
The Problem(s) • So the main issues to address are: • Scaling the code to larger core counts (64) and acquiring time on shared memory machines • Transferring multiple TB of data from the compute site to IU for interactive analysis • Automation of the simulation workflow
Benchmarking and Tracing • To determine how to improve scalability, we first benchmarked the code, traced it with VampirTrace on 64 cores of Pople, and profiled it with Pfmon • Analysis was performed on three Altix systems, at PSC, NASA, and ZIH
Benchmarking and Tracing • Based on the benchmarking, two bottleneck subroutines were identified: the subroutine that calculates the gravitational potential for the boundary cells, and the subroutine that calculates the gravitational potential with the boundary potential as an input • These subroutines inherently require that all cells communicate with each other (see the sketch below)
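To make the all-to-all structure concrete, the hedged sketch below shows a direct-summation form of a boundary-potential calculation: every boundary cell accumulates a contribution from every computational cell, so the work grows as the product of the two counts and each thread sweeps the entire mass array. The routine and variable names are assumptions for illustration, not the user's actual subroutine.

```fortran
! Illustrative sketch only: direct summation of the gravitational potential
! at the boundary cells. Every boundary zone sums contributions from every
! mass-carrying cell, which is why the routine requires all cells to
! communicate with each other. Names and units are placeholders.
subroutine boundary_potential(ncell, nbound, x, mass, xb, phib)
  implicit none
  integer, intent(in)  :: ncell, nbound
  real,    intent(in)  :: x(3, ncell)    ! cell-center coordinates
  real,    intent(in)  :: mass(ncell)    ! cell masses (density * volume)
  real,    intent(in)  :: xb(3, nbound)  ! boundary-cell coordinates
  real,    intent(out) :: phib(nbound)   ! potential at the boundary cells
  real, parameter :: g = 6.674e-8        ! gravitational constant (cgs)
  integer :: ib, ic
  real :: r

  !$omp parallel do private(ib, ic, r)
  do ib = 1, nbound
     phib(ib) = 0.0
     do ic = 1, ncell                    ! every boundary cell visits every cell
        r = sqrt( (xb(1,ib) - x(1,ic))**2   &
                + (xb(2,ib) - x(2,ic))**2   &
                + (xb(3,ib) - x(3,ic))**2 )
        phib(ib) = phib(ib) - g * mass(ic) / r
     end do
  end do
  !$omp end parallel do
end subroutine boundary_potential
```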
Benchmarking and Tracing • Based on these results, we identified two subroutines to restructure and devised a way to reformulate them to reduce off-node communication (one possible approach is sketched below) • Several iterations of improvement were performed, and the user reported a speedup of 1.8 in the boundary generation subroutine • A more extensive restructuring of both subroutines has been devised, but has yet to be implemented
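The reformulation actually applied is not reproduced here. As one hedged illustration of the general idea of reducing off-node (remote NUMA) memory traffic on an Altix-class shared-memory machine, the loop order of the previous sketch can be interchanged so that each thread streams once through the cells it owns and accumulates private partial boundary potentials, which are combined afterwards in a single array reduction:

```fortran
! Hedged sketch only: one generic way to cut remote-memory traffic on a
! ccNUMA system, not necessarily the restructuring applied to the user's
! code. The loops are interchanged so each thread passes once over the
! cells it owns and accumulates a private partial result for every
! boundary cell; the partial results are combined by an array reduction.
subroutine boundary_potential_blocked(ncell, nbound, x, mass, xb, phib)
  implicit none
  integer, intent(in)  :: ncell, nbound
  real,    intent(in)  :: x(3, ncell), mass(ncell), xb(3, nbound)
  real,    intent(out) :: phib(nbound)
  real, parameter :: g = 6.674e-8        ! gravitational constant (cgs)
  integer :: ib, ic
  real :: r

  phib = 0.0
  !$omp parallel do private(ib, ic, r) reduction(+:phib) schedule(static)
  do ic = 1, ncell            ! outer loop over cells: one streaming pass each
     do ib = 1, nbound        ! boundary array is small and stays in cache
        r = sqrt( (xb(1,ib) - x(1,ic))**2   &
                + (xb(2,ib) - x(2,ic))**2   &
                + (xb(3,ib) - x(3,ic))**2 )
        phib(ib) = phib(ib) - g * mass(ic) / r
     end do
  end do
  !$omp end parallel do
end subroutine boundary_potential_blocked
```

With this ordering, traffic over the large cell arrays is contiguous and can stay node-local under a first-touch data placement, while the small boundary array remains cache-resident; whether this matches the restructuring applied to the user's code is not claimed here.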
Transferring Data • Due to the large problem size, each of the user's simulations generates several terabytes of data, which is then analyzed interactively using IDL, a proprietary data analysis and plotting package • Transferring this amount of data to IU via traditional methods (ftp, scp, etc.) is extremely time consuming and tedious • By mounting the Data Capacitor at PSC on Pople, the user can write their data directly to IU and then access it from servers in their department (a minimal sketch follows below)
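As a purely illustrative sketch of that idea (the mount point and file name below are hypothetical, not taken from the report), the simulation-side change is simply to write output under the WAN-mounted Data Capacitor path rather than local scratch, so the files land at IU as they are produced:

```fortran
! Purely illustrative; '/dc-wan/planet_run' is a hypothetical mount point and
! directory for the WAN-mounted Data Capacitor, not a path from the report.
! The write itself is unchanged; only the destination directory differs from
! writing to local scratch.
program write_to_dc
  implicit none
  integer, parameter :: nx = 64, ny = 64, nz = 64
  real :: rho(nx, ny, nz)

  rho = 0.0   ! stand-in for one snapshot of simulation output

  open(unit=20, file='/dc-wan/planet_run/rho_000001.dat', &
       form='unformatted', status='replace')
  write(20) rho
  close(20)
end program write_to_dc
```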
Transferring Data • The IU Data Capacitor is a Lustre file system that can be mounted over the WAN • Some modifications and tuning were involved • Although the data are written directly to the Data Capacitor, network issues can introduce significant I/O overhead
Transferring Data • In tracing the code at PSC we discovered that I/O to the Data Capacitor was much slower than I/O to the local Lustre scratch disk
Transferring Data • Working with the Data Capacitor team, we were able to track down the network issue and eliminate the I/O overhead, resulting in a speedup of 30% for the user • Files now appear at IU as they are generated by the user's simulation
Automating the Workflow • The user would like to run ~10-15 simulations at the same time and have the data transfer and some preliminary analysis occur automatically • Some of the analysis can be automated • Generation of images • Calculation of gravitational torques • We are currently working on automating the user's workflow
Future Work • Fully restructure the calculation of the gravitational potential • Devise and implement a workflow framework