NERSC User Group Meeting: The DOE ACTS Collection
Osni Marques, Lawrence Berkeley National Laboratory (OAMarques@lbl.gov)
What is the ACTS Collection? (http://acts.nersc.gov)
• Advanced CompuTational Software Collection
• Tools for developing parallel applications
• ACTS started as an "umbrella" project
Goals
• Extended support for experimental software
• Make ACTS tools available on DOE computers
• Provide technical support (acts-support@nersc.gov)
• Maintain the ACTS information center (http://acts.nersc.gov)
• Coordinate efforts with other supercomputing centers
• Enable large-scale scientific applications
• Educate and train
ACTS Timeline
[Timeline diagram spanning 1999, 2003, and 2007: the ACTS Toolkit, a software tool box/software collection, and a software sustainability center; a user community spanning physics, engineering, mathematics, computer sciences, biology, chemistry, bioinformatics, and medicine; connected to challenge codes, numerical simulations, computing systems, scientific computing centers, and computer vendors through a pool of software tools, ACTS workshops and training, a testing and acceptance phase, and interoperability work.]
Challenges in the Development of Scientific Codes
Research in the computational sciences is fundamentally interdisciplinary, and the development of complex simulation codes on high-end computers is not a trivial task.
Productivity
• Time to the first solution (prototype)
• Time to solution (production)
• Other requirements
Complexity
• Increasingly sophisticated models
• Model coupling
• Interdisciplinarity
Performance
• Increasingly complex algorithms
• Increasingly complex architectures
• Increasingly demanding applications
Software issues
• Libraries written in different languages
• Discussions about standardizing interfaces are often sidetracked into implementation issues
• Difficulties managing multiple libraries developed by third parties
• Need to use more than one language in one application
• The code is long-lived and different pieces evolve at different rates
• Need to swap competing implementations of the same idea and test without modifying the code
• Need to compose an application with others that were not originally designed to be combined
Current ACTS Tools and their Functionalities
[Table of current ACTS tools and their availability at NERSC: Installed, To be installed, Upon request, or Under test. * Also in LibSci. ** USG.]
ACTS Tools: numerical functionalities
[Three slides of tables mapping the ACTS tools to their numerical functionalities.]
Software Interfaces
Problem domain (Hypre): linear system interfaces over linear solvers (GMG, FAC, Hybrid, AMGe, ILU, ...) and data layouts (structured, composite, block-structured, unstructured, CSR).
Command line (PETSc):
• -ksp_type [cg,gmres,bcgs,tfqmr,…]
• -pc_type [lu,ilu,jacobi,sor,asm,…]
More advanced:
• -ksp_max_it <max_iters>
• -ksp_gmres_restart <restart>
• -pc_asm_overlap <overlap>
• -pc_asm_type [basic,restrict,interpolate,none]
Function call (ScaLAPACK); a complete calling sketch follows below:
CALL BLACS_GET( -1, 0, ICTXT )
CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
:
CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
:
CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, INFO )
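To make the ScaLAPACK calling sequence above concrete, here is a minimal sketch (not from the slides) of a complete PDGESV solve driven from C. The Cblacs_* wrappers, the trailing-underscore name mangling of the Fortran routines, and all sizes are assumptions that vary by platform and build; link lines are site-specific.

/* Minimal sketch: solving A x = b with ScaLAPACK's PDGESV from C.
   Assumes a build with C-callable Cblacs_* wrappers and Fortran symbols
   decorated with a trailing underscore; both conventions vary by platform. */
#include <stdlib.h>
#include <mpi.h>

extern void Cblacs_get(int icontxt, int what, int *val);
extern void Cblacs_gridinit(int *icontxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol, int *myrow, int *mycol);
extern void Cblacs_gridexit(int icontxt);
extern int  numroc_(int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
extern void descinit_(int *desc, int *m, int *n, int *mb, int *nb, int *irsrc,
                      int *icsrc, int *ictxt, int *lld, int *info);
extern void pdgesv_(int *n, int *nrhs, double *a, int *ia, int *ja, int *desca,
                    int *ipiv, double *b, int *ib, int *jb, int *descb, int *info);

int main(int argc, char **argv) {
  int n = 1000, nrhs = 1, nb = 64;   /* problem and block sizes (illustrative) */
  int nprow = 2, npcol = 2;          /* 2x2 process grid (illustrative) */
  int ictxt, myrow, mycol, izero = 0, ione = 1, info;

  MPI_Init(&argc, &argv);
  Cblacs_get(-1, 0, &ictxt);                         /* default BLACS context */
  Cblacs_gridinit(&ictxt, "Row-major", nprow, npcol);
  Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);

  /* local dimensions of the block-cyclically distributed A and b */
  int mloc = numroc_(&n, &nb, &myrow, &izero, &nprow);
  int nloc = numroc_(&n, &nb, &mycol, &izero, &npcol);
  int lld  = (mloc > 1) ? mloc : 1;

  int desca[9], descb[9];
  descinit_(desca, &n, &n,    &nb, &nb, &izero, &izero, &ictxt, &lld, &info);
  descinit_(descb, &n, &nrhs, &nb, &nb, &izero, &izero, &ictxt, &lld, &info);

  double *a   = malloc((size_t)mloc * nloc * sizeof *a);
  double *b   = malloc((size_t)mloc * sizeof *b);
  int    *ipiv = malloc((mloc + nb) * sizeof *ipiv);
  /* ... fill the local pieces of A and b here ... */

  pdgesv_(&n, &nrhs, a, &ione, &ione, desca, ipiv, b, &ione, &ione, descb, &info);
  /* info == 0 on success; b now holds the local piece of the solution x */

  free(a); free(b); free(ipiv);
  Cblacs_gridexit(ictxt);
  MPI_Finalize();
  return 0;
}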
Use of ACTS Tools
• 3D incompressible Euler on a tetrahedral grid, up to 11 million unknowns, based on a legacy NASA code, FUN3D (W. K. Anderson); fully implicit, steady-state, parallelized with PETSc (courtesy of Kaushik and Keyes).
• Model of the heart mechanics (blood-muscle-valve) by an adaptive and parallel version of the immersed boundary method, using PETSc, Hypre, and SAMRAI (courtesy of Boyce Griffith, New York University).
• Micro-FE bone modeling using ParMetis, Prometheus, and PETSc; models up to 537 million DOF (Adams, Bayraktar, Keaveny, and Papadopoulos).
• Molecular dynamics and thermal flow simulations using codes based on Global Arrays (GA); GA has been employed in large simulation codes such as NWChem, GAMESS-UK, Columbus, Molpro, Molcas, MWPhys/Grid, etc.
• Electronic structure optimization of (UO2)3(CO3)6 performed with TAO (courtesy of de Jong).
• Problems on different grid types solved with Hypre.
Use of ACTS Tools
• Two ScaLAPACK routines, PZGETRF and PZGETRS, are used to solve linear systems in the spectral-algorithm-based AORSA code (Batchelor et al.), which is intended for the study of electromagnetic wave-plasma interactions. The code reaches 68% of peak performance on 1936 processors of an IBM SP.
• Induced current (white arrows) and charge density (colored plane and gray surface) in crystallized glycine due to an external field (Louie, Yoon, Pfrommer, and Canning); eigenvalue problems solved with ScaLAPACK.
• Omega3P is a parallel distributed-memory code for the modeling and analysis of accelerator cavities, which requires the solution of generalized eigenvalue problems. A parallel exact shift-invert eigensolver based on PARPACK and SuperLU has allowed the solution of a problem of order 7.5 million with 304 million nonzeros; finding 10 eigenvalues requires about 2.5 hours on 24 processors of an IBM SP.
• OPT++ is used in protein energy minimization problems (shown here: protein T162 from CASP5; courtesy of Meza, Oliva, et al.).
ScaLAPACK (UTK, UCB, ...)
http://acts.nersc.gov/scalapack
Version 1.7.5 released in January 2007; NSF funding for further development.
Software hierarchy:
• ScaLAPACK (global): linear systems, least squares, singular value decomposition, eigenvalues; designed for clarity, modularity, performance, and portability.
• PBLAS (global): parallel BLAS.
• BLACS: communication routines targeting linear algebra operations.
• LAPACK and BLAS (local, platform-specific): ATLAS can be used here for automatic tuning.
• MPI/PVM/...: communication layer (message passing).
ScaLAPACK: understanding performance
[Surface plot: execution time (seconds) of PDGESV for various grid shapes (1x60, 2x30, 3x20, 4x15, 5x12, 6x10) and problem sizes from 1000 to 10000. Captions: LU on 2.2 GHz AMD Opteron (4.4 GFlop/s peak performance); 60 processors, dual AMD Opteron 1.4 GHz cluster with Myrinet interconnect, 2 GB memory.]
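The grid shapes in the plot are the factorizations of 60 processes. As a small illustration (not from the slides), the helper below enumerates the NPROW x NPCOL choices and picks the squarest one, which is often a reasonable default for PDGESV since it balances row and column communication; the function name is hypothetical.

/* Illustrative helper (not part of ScaLAPACK): choose a near-square
   NPROW x NPCOL process grid for P processes. ScaLAPACK leaves the grid
   shape to the caller; squarer grids often help dense LU like PDGESV. */
#include <stdio.h>

static void squarest_grid(int p, int *nprow, int *npcol) {
  for (int r = 1; r * r <= p; r++)
    if (p % r == 0) *nprow = r;    /* largest divisor <= sqrt(p) */
  *npcol = p / *nprow;
}

int main(void) {
  int nprow, npcol;
  squarest_grid(60, &nprow, &npcol);
  printf("60 processes -> %d x %d grid\n", nprow, npcol);  /* 6 x 10 */
  return 0;
}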
ScaLAPACK: understanding the 2D block-cyclic distribution
http://acts.nersc.gov/scalapack/hands-on/datadist.html
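A minimal sketch (assuming 0-based indices and a distribution starting at process (0,0)) of the standard block-cyclic index maps: which process coordinate owns a global index, and the local index it maps to. ScaLAPACK's TOOLS routines INDXG2P and INDXG2L compute the same maps with 1-based Fortran conventions.

/* Illustrative sketch (not ScaLAPACK code): 1D pieces of the 2D
   block-cyclic map; apply the same formulas to rows and to columns. */
#include <stdio.h>

/* process coordinate that owns global index i (block size nb, nprocs procs) */
static int owner(int i, int nb, int nprocs) { return (i / nb) % nprocs; }

/* local index of global index i on its owning process */
static int local_index(int i, int nb, int nprocs) {
  return (i / (nb * nprocs)) * nb + i % nb;
}

int main(void) {
  int nb = 2, nprow = 2;   /* blocks of 2 over 2 process rows (illustrative) */
  for (int i = 0; i < 8; i++)
    printf("global row %d -> process row %d, local row %d\n",
           i, owner(i, nb, nprow), local_index(i, nb, nprow));
  return 0;
}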
PETSc (ANL): Portable, Extensible Toolkit for Scientific Computation
[Layered architecture: PDE application codes sit on top of ODE integrators, nonlinear solvers/unconstrained minimization, and visualization; these build on linear solvers (preconditioners + Krylov methods); underneath are the object-oriented matrices, vectors, and index sets, grid management, and a profiling interface; at the bottom, computation and communication kernels (MPI, MPI-IO, BLAS, LAPACK).]
PETSc: Linear Solvers (SLES)
[Flow of a linear solve Ax = b: the user's main routine performs application initialization and evaluates A and b (user code), hands them to the PETSc linear solver component SLES, which combines a Krylov method (KSP) with a preconditioner (PC), and the user post-processes the solution (user code).]
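The slide reflects PETSc's older SLES naming; in current PETSc releases the same component is exposed as KSP. Below is a minimal sketch of the solve flow in the diagram, assuming a recent PETSc (3.6 or later for MatCreateVecs and the three-argument KSPSetOperators); the 1D Laplacian stands in for the application's A and b, and error checking is omitted for brevity.

/* Minimal sketch of the Ax = b flow above with current PETSc (KSP replaced
   the old SLES interface). Error checking trimmed for brevity. */
#include <petscksp.h>

int main(int argc, char **argv) {
  Mat A; Vec x, b; KSP ksp;
  PetscInt i, n = 10, Istart, Iend;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* "Evaluation of A and b" (user code): a 1D Laplacian and a unit RHS */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A); MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);

  /* "PETSc linear solvers": KSP (Krylov method) + PC (preconditioner) */
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);   /* honors -ksp_type, -pc_type, ... at run time */
  KSPSolve(ksp, b, x);

  /* "Post-processing" (user code) would inspect x here */
  KSPDestroy(&ksp); MatDestroy(&A); VecDestroy(&x); VecDestroy(&b);
  PetscFinalize();
  return 0;
}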
PETSc: setting SLES parameters at run time
• -ksp_type [cg,gmres,bcgs,tfqmr,…]
• -pc_type [lu,ilu,jacobi,sor,asm,…]
More advanced:
• -ksp_max_it <max_iters>
• -ksp_gmres_restart <restart>
• -pc_asm_overlap <overlap>
• -pc_asm_type [basic,restrict,interpolate,none]
• many more (see the manual); a sample invocation follows below
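Because these options are read when the application calls KSPSetFromOptions (SLESSetFromOptions in the PETSc version the slide describes), the solver and preconditioner can be changed without recompiling. A hypothetical invocation of the sketch shown earlier, with illustrative program name, process count, and option values:

mpiexec -n 4 ./app -ksp_type bcgs -pc_type asm -pc_asm_overlap 2 -ksp_max_it 500

Any of the options listed above can be combined in the same way.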
Important Questions for Application Developers
• How does performance vary with different compilers?
• Is poor performance correlated with certain OS features?
• Has a recent change caused unanticipated performance?
• How does performance vary with MPI variants?
• Why is one application version faster than another?
• What is the reason for the observed scaling behavior?
• Did two runs exhibit similar performance?
• How are performance data related to application events?
• Which machines will run my code the fastest, and why?
• Which benchmarks predict my code's performance best?
(From http://acts.nersc.gov/events/Workshop2005/slides/Shende.pdf)
TAU (U Oregon): Tuning and Analysis Utilities
• Multi-level performance instrumentation
  • Multi-language automatic source instrumentation
• Flexible and configurable performance measurement
• Widely-ported parallel performance profiling system
  • Computer system architectures and operating systems
  • Different programming languages and compilers
• Support for multiple parallel programming paradigms
  • Multi-threading, message passing, mixed-mode, hybrid
• Support for performance mapping
• Support for object-oriented and generic programming
• Integration in complex software systems and applications
Definitions: profiling and tracing
Profiling
• Recording of summary information during execution (inclusive and exclusive time, number of calls, hardware statistics, etc.)
• Reflects the performance behavior of program entities (functions, loops, basic blocks, user-defined "semantic" entities)
• Very good for low-cost performance assessment
• Helps to expose performance bottlenecks and hotspots
• Implemented through
  • sampling: periodic OS interrupts or hardware counter traps
  • instrumentation: direct insertion of measurement code (sketched below)
Tracing
• Recording of information about significant points (events) during program execution
  • entering/exiting a code region (function, loop, block, etc.)
  • thread/process interactions (send/receive message, etc.)
• Saves information in event records
  • timestamp
  • CPU identifier, thread identifier
  • event type and event-specific information
• An event trace is a time-sequenced stream of event records
• Can be used to reconstruct dynamic program behavior
• Typically requires code instrumentation
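As a concrete illustration of the instrumentation approach above, here is a minimal sketch using TAU's C measurement macros. The timer names are illustrative, the program must be built against a TAU configuration (its stub makefiles supply TAU.h and the measurement library), and this is only one way to use TAU; automatic source instrumentation via PDT, as in the examples that follow, is more common.

/* Minimal sketch of direct (manual) instrumentation with TAU's C API. */
#include <TAU.h>

void compute(int n) {
  /* a TAU timer for this routine (inclusive/exclusive time, call counts) */
  TAU_PROFILE_TIMER(t, "compute", "void (int)", TAU_USER);
  TAU_PROFILE_START(t);
  volatile double s = 0.0;
  for (int i = 0; i < n; i++) s += (double)i * i;   /* work being measured */
  TAU_PROFILE_STOP(t);
}

int main(int argc, char **argv) {
  TAU_PROFILE_INIT(argc, argv);
  TAU_PROFILE_SET_NODE(0);   /* single-process example; MPI codes use the rank */
  compute(1000000);
  return 0;                  /* profile files are written at program exit */
}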
TAU: Example 1 (1/2)
http://acts.nersc.gov/tau/programs/pdgssvx
Example TAU configuration: tau-multiplecounters-mpi-papi-pdt (screenshot callout: set the C compiler).
TAU: Example 1 (2/2)
ParaProf visualization. PAPI provides access to hardware performance counters (see http://icl.cs.utk.edu/papi for details and contact acts-support@nersc.gov for the corresponding TAU events). In this example we are just measuring FLOPS.
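For reference, a hedged sketch of counting floating-point operations with PAPI's low-level C API, roughly what a TAU+PAPI configuration does under the hood; whether the PAPI_FP_OPS preset is available depends on the processor, and the loop is purely illustrative.

/* Minimal sketch: reading a hardware counter with PAPI's low-level API. */
#include <stdio.h>
#include <papi.h>

int main(void) {
  int es = PAPI_NULL;
  long long flops;

  if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) return 1;
  PAPI_create_eventset(&es);
  PAPI_add_event(es, PAPI_FP_OPS);   /* floating-point operation count */

  PAPI_start(es);
  volatile double s = 0.0;
  for (int i = 0; i < 1000000; i++) s += 1.0e-6 * i;   /* measured region */
  PAPI_stop(es, &flops);

  printf("PAPI_FP_OPS: %lld\n", flops);
  return 0;
}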
TAU: Example 2 (1/2)
PESCAN is a code that uses the folded spectrum method for non-self-consistent nanoscale calculations. It uses a planewave basis and conventional Kleinman-Bylander nonlocal pseudopotentials in real space. It is parallelized using MPI and can calculate million-atom systems.

# Makefile for PESCAN
# pull in the stub makefile for the chosen TAU configuration
include $(TAULIBDIR)/Makefile.tau-multiplecounters-mpi-papi-pdt
#include $(TAULIBDIR)/Makefile.tau-callpath-mpi-pdt
# prefix the compilers with $(TAU_COMPILER) to instrument at build time
FC = $(TAU_COMPILER) mpxlf90_r
CC = $(TAU_COMPILER) mpcc_r
⋮
TAU: Example 2 (2/2)
The Case for Software Libraries
[Layered application diagram: an APPLICATION is built from algorithmic implementations, application data layout, CONTROL and I/O, and machine-tuned, machine-dependent modules. Annotations: new developments are difficult to predict; a new architecture may require anywhere from minimal to extensive rewriting of each layer; new or extended physics means extensive rewriting or increased overhead.]
New architecture or software:
• Extensive tuning
• May require new programming paradigms
• Difficult to maintain!
ACTS: value-added services
• Requirements for reusable high-quality software tools
• Integration, maintenance, and support efforts
• Interfaces using script languages (PyACTS)
• Software automation
More information:
• acts-support@nersc.gov
• http://acts.nersc.gov