The ACTS Toolkit (Lecture Notes: What can it do for you?) Tony Drummond and Osni Marques, Lawrence Berkeley National Laboratory (LBNL), National Energy Research Scientific Computing Center (NERSC), acts-support@nersc.gov
What is the ACTS Toolkit? • Advanced Computational Testing and Simulation • Tools for the development of parallel applications • 21 tools • developed (primarily) at DOE labs • originally conceived as autonomous tools • ACTS is an “umbrella” project • collects the tools • leverages numerous independently funded projects
Recent Successful Cases Scattering in a quantum system of three charged particles (Rescigno, Baertschy, Isaacs and McCurdy, Dec. 24, 1999). Cosmic Microwave Background Analysis, BOOMERanG collaboration, MADCAP code (Apr. 27, 2000).
NERSC Activities • Make ACTS tools available on NERSC platforms • Provide technical support (acts-support@nersc.gov) • Perform independent evaluation of tools • Maintain the online ACTS Information Center • Identify new users who can benefit from the toolkit • Work with users to integrate tools into applications http://acts.nersc.gov
ACTS Support • Support at different levels • applications • code optimization • tool selection • tool utilization • tool installation • Leverage with developers • Minimize risk to users
Tools Categorization • Numerical • software that implements numerical algorithms • Structural (“frameworks”) • software that manages data and communication • Infra-structural • runtime, support tools, developer’s bag
Numerical Tools • Aztec: iterative methods for solving sparse linear systems • Hypre: collection of advanced preconditioners • Opt++: solution of nonlinear optimization problems • PETSc: methods for the solution of PDE- and ODE-related problems • PVODE: solvers for large systems of ODEs • ScaLAPACK: dense linear algebra computations • SuperLU: direct methods for sparse linear systems
Structural (Frameworks) • Global Arrays: portable, distributed array library with a shared-memory style of programming • Overture: library of grid functions that derive from P++ arrays • POET (Parallel Object-oriented Environment and Toolkit): allows for “mixing and matching” of components • POOMA (Parallel Object-Oriented Methods and Applications): C++ abstraction layer between algorithm and platform (similar to HPF)
Infra-structural • CUMULVS (Collaborative User Migration User Library for Visualization and Steering), PAWS (Parallel Application WorkSpace): computational steering, data post-processing, interactive visualization • Globus: infrastructure for high performance distributed computing (computational grids) • SILOON (Scripting Interface Languages for Object-Oriented Numerics): scripting features • TAU (Tuning and Analysis Utilities): advanced performance analysis and tuning
Infra-structural (cont.) • Tulip: C++ applications with threads, global pointers and other parallel operations • ATLAS (Automatically Tuned Linear Algebra Software), PHiPAC (Portable High Performance ANSI C): automatic generation of optimized numerical software (mainly BLAS) • Nexus: multithreading, communication and resource management facilities • PADRE (Parallel Asynchronous Data and Routing Engine): abstracts the details of representing and managing distributed data • PETE (Portable Expression Template Engine): efficient C++ operator overloading through expression templates
Numerical Tools • Aztec: iterative methods for solving sparse linear systems • Hypre: collection of advanced preconditioners • Opt++: solution of nonlinear optimization problems • PETSc: methods for the solution of PDE-related problems • PVODE: solvers for large systems of ODEs • ScaLAPACK: dense linear algebra computations • SuperLU: direct methods for sparse linear systems
Aztec • Solves large sparse linear systems of equations of the form Ax = b, such as those that arise in applications that model complex physics with differential equations (e.g., discretized by finite difference or finite element methods)
Aztec • Implements Krylov iterative methods (CG, CGS, Bi-CG-Stab, GMRES, TFQMR) • Suite of preconditioners (Jacobi, Gauss-Seidel, overlapping domain decomposition with sparse LU, ILU, BILU within domains) • Highly efficient and scalable (run on 1000 processors of the “ASCI Red” machine)
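For readers unfamiliar with Krylov methods, the sketch below shows a plain, unpreconditioned conjugate gradient (CG) iteration on a small dense symmetric positive definite system. It is a textbook illustration only, not Aztec code, and the test matrix and right-hand side are made up for the example; Aztec supplies parallel, preconditioned versions of iterations like this one for sparse matrices.

/* Textbook conjugate gradient for a small dense SPD system; illustration only, not Aztec code. */
#include <math.h>
#include <stdio.h>

#define N 4

static double dot(const double *u, const double *v) {
  double s = 0.0;
  for (int i = 0; i < N; i++) s += u[i] * v[i];
  return s;
}

static void matvec(const double A[N][N], const double *x, double *y) {
  for (int i = 0; i < N; i++) {
    y[i] = 0.0;
    for (int j = 0; j < N; j++) y[i] += A[i][j] * x[j];
  }
}

int main(void) {
  /* Made-up SPD test matrix and right-hand side */
  double A[N][N] = {{4,1,0,0},{1,4,1,0},{0,1,4,1},{0,0,1,4}};
  double b[N] = {1,2,3,4}, x[N] = {0,0,0,0};
  double r[N], p[N], Ap[N];

  matvec(A, x, r);                          /* r = b - A*x */
  for (int i = 0; i < N; i++) { r[i] = b[i] - r[i]; p[i] = r[i]; }
  double rr = dot(r, r);

  for (int k = 0; k < N && sqrt(rr) > 1e-12; k++) {
    matvec(A, p, Ap);
    double alpha = rr / dot(p, Ap);         /* step length */
    for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
    double rr_new = dot(r, r);
    double beta = rr_new / rr;              /* update the search direction */
    for (int i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
    rr = rr_new;
  }
  for (int i = 0; i < N; i++) printf("x[%d] = %g\n", i, x[i]);
  return 0;
}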
Aztec (applications) TOUGH2 (Transport Of Unsaturated Groundwater and Heat) code, transport simulation in porous and fractured media (LBNL). Co-flowing Annular Jet Combustor, a parallel 3D pseudo-transient simulation to steady state operation; MPSalsa code (SNL).
Aztec (basic steps) • Prepare your linear system • distribute the matrix • call AZ_transform • set up right-hand side and initial guess • call AZ_reorder_vec on initial guess and right-hand side • select an iterative solver and a preconditioner • call AZ_solve • call AZ_invorder_vec on solution
PETSc • Portable, Extensible Toolkit for Scientific Computing • What can it do? • Support the development of parallel PDE solvers • Implicit or semi-implicit solution methods; finite element, finite difference, or finite volume discretizations • Specification of the mathematics of the problem • Vectors (field variables) and matrices (operators) • How to solve the problem? • Linear, nonlinear, and timestepping (ODE) solvers
PETSc • Parallelism • Uses MPI • Data layout: structured and unstructured meshes • Partitioning and coloring • Viewers • Printing data object information • Visualization of field and matrix data • Profiling and performance tuning • -log_summary • Profiling by stages of an application • User-defined events
PETSc (Simple example)
argc and argv are used to pass runtime options to PETSc and MPI.
/* From: http://www.mcs.anl.gov/petsc/src/sys/examples/tutorials/ex1.c */
/* Program usage: mpirun ex1 [-help] [all PETSc options] */
static char help[] = "This is an introductory PETSc example that illustrates printing.\n\n";
/*
   Concepts: Introduction to PETSc;
   Routines: PetscInitialize(); PetscPrintf(); PetscFinalize();
   Processors: n
*/
#include "petsc.h"
int main(int argc,char **argv)
PETSc (Simple example, cont.)
Every PETSc program should begin with the PetscInitialize routine.
{
  int ierr,rank,size;
  ierr = PetscInitialize(&argc,&argv,(char *)0,help);CHKERRA(ierr);
  /* The following MPI calls return the number of processes
     being used and the rank of this process in the group. */
  ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRA(ierr);
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRA(ierr);
PETSc (Simple example, cont.)
PetscPrintf on PETSC_COMM_WORLD prints a single message (from the first process only); PetscPrintf on PETSC_COMM_SELF prints one message per process. A program must always end with PetscFinalize.
  ierr = PetscPrintf(PETSC_COMM_WORLD,"Number of processors = %d, rank = %d\n",size,rank);CHKERRA(ierr);
  ierr = PetscPrintf(PETSC_COMM_SELF,"[%d] Jumbled Hello World\n",rank);CHKERRA(ierr);
  ierr = PetscFinalize();CHKERRA(ierr);
  return 0;
}
PETSc (applications) Multiphase flow, 4 million cell blocks, 32 million DOF, over 10.6 Gflops on an IBM SP (128 nodes); the entire simulation runs in less than 30 minutes (Pope, Gropp, Morgan, Sepehrnoori, Smith and Wheeler). Prometheus code (unstructured meshes in solid mechanics), 26 million DOF, 640 nodes on NERSC’s Cray T3E (Adams and Demmel).
PETSc’s SLES (Basic Steps) • Define the linear system (Ax = b) • MatCreate, MatSetValue, VecCreate • Create the solver • SLESCreate, SLESSetOperators • Solve the system of equations • SLESSolve • Clean up • SLESDestroy
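As an illustration of these steps, here is a minimal sketch of a uniprocessor SLES solve in C. It uses the SLES calls named above (the PETSc 2.0-era linear solver component; later PETSc releases folded SLES into KSP), so header names and argument lists may differ from the PETSc version you have installed. The tridiagonal test system is made up for the example, and a real (double precision) build is assumed.

/* Sketch: solve a small tridiagonal system with PETSc's SLES component
   (PETSc 2.0-era interface; later releases replaced SLES with KSP).    */
#include "petscsles.h"   /* named "sles.h" in some older 2.x releases */

int main(int argc,char **argv)
{
  Mat    A;
  Vec    x,b;
  SLES   sles;
  int    ierr,i,n = 10,its,col[3];
  double value[3],rhs = 1.0;   /* real (double) build assumed */

  ierr = PetscInitialize(&argc,&argv,(char *)0,0);CHKERRA(ierr);

  /* Define the linear system Ax = b (1-D Laplacian, one process) */
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,n,n,3,PETSC_NULL,&A);CHKERRA(ierr);
  for (i=0; i<n; i++) {
    col[0] = i-1; col[1] = i; col[2] = i+1;
    value[0] = -1.0; value[1] = 2.0; value[2] = -1.0;
    if (i == 0)        { ierr = MatSetValues(A,1,&i,2,&col[1],&value[1],INSERT_VALUES);CHKERRA(ierr); }
    else if (i == n-1) { ierr = MatSetValues(A,1,&i,2,col,value,INSERT_VALUES);CHKERRA(ierr); }
    else               { ierr = MatSetValues(A,1,&i,3,col,value,INSERT_VALUES);CHKERRA(ierr); }
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRA(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRA(ierr);

  ierr = VecCreateSeq(PETSC_COMM_SELF,n,&x);CHKERRA(ierr);
  ierr = VecDuplicate(x,&b);CHKERRA(ierr);
  for (i=0; i<n; i++) { ierr = VecSetValues(b,1,&i,&rhs,INSERT_VALUES);CHKERRA(ierr); }
  ierr = VecAssemblyBegin(b);CHKERRA(ierr);
  ierr = VecAssemblyEnd(b);CHKERRA(ierr);

  /* Create the solver, solve, and clean up */
  ierr = SLESCreate(PETSC_COMM_SELF,&sles);CHKERRA(ierr);
  ierr = SLESSetOperators(sles,A,A,DIFFERENT_NONZERO_PATTERN);CHKERRA(ierr);
  ierr = SLESSetFromOptions(sles);CHKERRA(ierr);   /* solver/preconditioner chosen at run time */
  ierr = SLESSolve(sles,b,x,&its);CHKERRA(ierr);
  ierr = SLESDestroy(sles);CHKERRA(ierr);

  ierr = MatDestroy(A);CHKERRA(ierr);
  ierr = VecDestroy(x);CHKERRA(ierr);
  ierr = VecDestroy(b);CHKERRA(ierr);
  ierr = PetscFinalize();CHKERRA(ierr);
  return 0;
}

Because of SLESSetFromOptions, the Krylov method and preconditioner can be selected at run time with options such as -ksp_type gmres -pc_type ilu.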
PETSc’s SNES (Basic Steps) • Nonlinear equations of the form F(x) = 0 • Unconstrained minimization problems of the form min f(x) • Create the solver • SNESCreate • Create matrices and vectors (e.g., the Jacobian matrix) • MatCreate, MatSetValue, VecCreate • Set the evaluation routines and linear solver defaults • Solve the nonlinear system: SNESSolve • Clean up
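The sketch below walks through these SNES steps for a made-up two-unknown system F(x) = 0. It follows the PETSc 2.0-era interface outlined above (in which SNESCreate takes a problem type and SNESSolve returns an iteration count); later PETSc releases changed several of these prototypes, so treat it as a sketch of the flow rather than version-exact code. FormFunction and FormJacobian are user-written callbacks whose names are chosen here for illustration.

/* Sketch: solve F0 = x0^2 + x0*x1 - 3 = 0,  F1 = x0*x1 + x1^2 - 6 = 0 with SNES
   (PETSc 2.0-era interface; made-up test problem).                             */
#include "petscsnes.h"   /* named "snes.h" in some older 2.x releases */

int FormFunction(SNES snes,Vec x,Vec f,void *ctx)
{
  int    ierr;
  double *xx,*ff;
  ierr = VecGetArray(x,&xx);CHKERRQ(ierr);
  ierr = VecGetArray(f,&ff);CHKERRQ(ierr);
  ff[0] = xx[0]*xx[0] + xx[0]*xx[1] - 3.0;   /* residual of the made-up system */
  ff[1] = xx[0]*xx[1] + xx[1]*xx[1] - 6.0;
  ierr = VecRestoreArray(x,&xx);CHKERRQ(ierr);
  ierr = VecRestoreArray(f,&ff);CHKERRQ(ierr);
  return 0;
}

int FormJacobian(SNES snes,Vec x,Mat *J,Mat *B,MatStructure *flag,void *ctx)
{
  int    ierr,idx[2] = {0,1};
  double *xx,v[4];
  ierr = VecGetArray(x,&xx);CHKERRQ(ierr);
  v[0] = 2.0*xx[0] + xx[1];  v[1] = xx[0];   /* Jacobian of F */
  v[2] = xx[1];              v[3] = xx[0] + 2.0*xx[1];
  ierr = VecRestoreArray(x,&xx);CHKERRQ(ierr);
  ierr = MatSetValues(*J,2,idx,2,idx,v,INSERT_VALUES);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(*J,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*J,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  *flag = SAME_NONZERO_PATTERN;
  return 0;
}

int main(int argc,char **argv)
{
  SNES   snes;
  Vec    x,r;
  Mat    J;
  int    ierr,its,ix[2] = {0,1};
  double x0[2] = {1.0,1.0};                  /* made-up initial guess */

  ierr = PetscInitialize(&argc,&argv,(char *)0,0);CHKERRA(ierr);

  /* Create the solver (the problem-type argument is specific to the 2.0-era API) */
  ierr = SNESCreate(PETSC_COMM_SELF,SNES_NONLINEAR_EQUATIONS,&snes);CHKERRA(ierr);

  /* Create vectors and the Jacobian matrix */
  ierr = VecCreateSeq(PETSC_COMM_SELF,2,&x);CHKERRA(ierr);
  ierr = VecDuplicate(x,&r);CHKERRA(ierr);
  ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,2,2,2,PETSC_NULL,&J);CHKERRA(ierr);

  /* Set the evaluation routines and solver defaults, then solve */
  ierr = SNESSetFunction(snes,r,FormFunction,PETSC_NULL);CHKERRA(ierr);
  ierr = SNESSetJacobian(snes,J,J,FormJacobian,PETSC_NULL);CHKERRA(ierr);
  ierr = SNESSetFromOptions(snes);CHKERRA(ierr);
  ierr = VecSetValues(x,2,ix,x0,INSERT_VALUES);CHKERRA(ierr);
  ierr = VecAssemblyBegin(x);CHKERRA(ierr);
  ierr = VecAssemblyEnd(x);CHKERRA(ierr);
  ierr = SNESSolve(snes,x,&its);CHKERRA(ierr);

  /* Clean up */
  ierr = SNESDestroy(snes);CHKERRA(ierr);
  ierr = MatDestroy(J);CHKERRA(ierr);
  ierr = VecDestroy(x);CHKERRA(ierr);
  ierr = VecDestroy(r);CHKERRA(ierr);
  ierr = PetscFinalize();CHKERRA(ierr);
  return 0;
}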
PETSc’s TS (Basic Steps) • Consider the ODE du/dt = F(u,t), where u is a finite-dimensional vector • Create a TS object • TSCreate • Select a solution method (Euler, BEULER, PSEUDO) • Set the initial time and timestep • TSSetTimeStep • Set the total number of timesteps • TSSetDuration • Set the timestep context • Clean up
ScaLAPACK • A collection of routines for solving: • Linear systems of equations • Least squares problems • Eigenproblems • Singular value problems • Dense linear algebra (BLAS) • Direct solution of linear systems • Dense matrix eigensolvers
ScaLAPACK Software Hierarchy
ScaLAPACK • BLAS: • Common linear algebra computations • Dot products, matrix-vector multiplication, and matrix-matrix multiplication • Matrix-matrix operations can mask the effects of the memory hierarchy (platform specific) • Portability • PBLAS • Interface is very similar to the BLAS • Makes ScaLAPACK codes quite similar to LAPACK ones
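To make the Level-3 BLAS point concrete, here is a small C example that forms C = alpha*A*B + beta*C with dgemm. The slides do not show BLAS code, so the use of the C interface (cblas_dgemm from a CBLAS-providing library such as ATLAS) and the tiny made-up matrices are assumptions for illustration.

/* Small Level-3 BLAS example: C = alpha*A*B + beta*C via cblas_dgemm.
   Assumes a CBLAS interface is available (e.g., from ATLAS). */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
  /* Made-up 2x3 and 3x2 matrices, row-major storage */
  double A[2*3] = {1, 2, 3,
                   4, 5, 6};
  double B[3*2] = {7,  8,
                   9, 10,
                  11, 12};
  double C[2*2] = {0, 0,
                   0, 0};

  cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
              2, 2, 3,          /* M, N, K       */
              1.0, A, 3,        /* alpha, A, lda */
              B, 2,             /* B, ldb        */
              0.0, C, 2);       /* beta, C, ldc  */

  printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
  return 0;
}

The PBLAS and ScaLAPACK routines (e.g., PDGEMM in the example further below) follow the same calling style, with array descriptors added to describe the distributed matrices.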
ScaLAPACK • BLACS: • Message passing designed for linear algebra • Data layout: 1- or 2-dimensional grid of processes • Operations: • Synchronous sends and receives • Broadcasts • Global reductions • Process grouping and multi-membership
ScaLAPACK • ScaLAPACK: • High efficiency on MIMD platforms such as the Intel Paragon, Cray T3E, IBM SP series, and clusters of workstations • Message passing via PVM and MPI (heterogeneous environments) • Efficiency depends on the block partitioning algorithm and on vendor-supplied implementations of the BLACS and BLAS
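Since efficiency hinges on the block partitioning, the short sketch below illustrates the standard 1-D block-cyclic mapping that ScaLAPACK applies along each dimension of the process grid. It is plain C index arithmetic for illustration (the block size and process count are made-up parameters), not a call into the ScaLAPACK or BLACS libraries.

/* 1-D block-cyclic distribution: map a global index to (owning process,
   local index) for block size nb over nprocs processes.  Illustration of
   the layout used along each grid dimension; not library code. */
#include <stdio.h>

static void block_cyclic(int g, int nb, int nprocs, int *proc, int *local)
{
  int block = g / nb;                       /* which block the entry lives in  */
  *proc  = block % nprocs;                  /* blocks are dealt out cyclically */
  *local = (block / nprocs) * nb + g % nb;  /* position in the local storage   */
}

int main(void)
{
  int nb = 2, nprocs = 3;                   /* made-up block size and grid size */
  for (int g = 0; g < 12; g++) {
    int p, l;
    block_cyclic(g, nb, nprocs, &p, &l);
    printf("global %2d -> process %d, local %d\n", g, p, l);
  }
  return 0;
}

With nb = 2 and three processes, global entries 0-1 land on process 0, 2-3 on process 1, 4-5 on process 2, 6-7 back on process 0, and so on.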
ScaLAPACK (simple example)
Example program solving Ax = b via the ScaLAPACK routine PDGESV
*     Initialize the process grid
      CALL SL_INIT( ICTXT, NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*     Distribute the matrix on the process grid
      CALL DESCINIT( DESCA, M, N, MB, NB, RSRC, CSRC, ICTXT, MXLLDA, INFO )
      CALL DESCINIT( DESCB, N, NRHS, NB, NBRHS, RSRC, CSRC, ICTXT, MXLLDB, INFO )
ScaLAPACK (simple example, cont.)
*     Generate matrices A and B and distribute them
      CALL MATINIT( A, DESCA, B, DESCB )
*     Make a copy of A and B for checking purposes
      CALL PDLACPY( 'All', N, N, A, 1, 1, DESCA, A0, 1, 1, DESCA )
      CALL PDLACPY( 'All', N, NRHS, B, 1, 1, DESCB, B0, 1, 1, DESCB )
*     Solve the linear system A * X = B
      CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, INFO )
ScaLAPACK (simple example, cont.)
*     Compute the residual ||A * X - B|| / ( ||X|| * ||A|| * eps * N )
      EPS = PDLAMCH( ICTXT, 'Epsilon' )
      ANORM = PDLANGE( 'I', N, N, A, 1, 1, DESCA, WORK )
      BNORM = PDLANGE( 'I', N, NRHS, B, 1, 1, DESCB, WORK )
      CALL PDGEMM( 'N', 'N', N, NRHS, N, ONE, A0, 1, 1, DESCA, B, 1, 1, DESCB, -ONE, B0, 1, 1, DESCB )
      XNORM = PDLANGE( 'I', N, NRHS, B0, 1, 1, DESCB, WORK )
      RESID = XNORM / ( ANORM*BNORM*EPS*DBLE( N ) )
*     Release the process grid, free the BLACS context, and exit the BLACS
      CALL BLACS_GRIDEXIT( ICTXT )
      CALL BLACS_EXIT( 0 )
ScaLAPACK (applications) Induced current (white arrows) and charge density (colored plane and gray surface) in crystallized glycine due to an external field (Louie, Yoon, Pfrommer and Canning). Cosmic Microwave Background Analysis, BOOMERanG collaboration, MADCAP code (Apr. 27, 2000).
SuperLU • Direct solution of large sparse linear systems • Shared- and distributed-memory implementations • Attained 8.3 Gflops on 512 nodes of the T3E
SuperLU (applications) Scattering in a quantum system of three charged particles (Rescigno, Baertschy, Isaacs and McCurdy, Dec. 24, 1999). SuperLU speedup (matrix dimensions varying from 26028 to 120750).
Structural (Frameworks) • Global Arrays: portable, distributed array library with a shared-memory style of programming • Overture: library of grid functions that derive from P++ arrays • POET (Parallel Object-oriented Environment and Toolkit): allows for “mixing and matching” of components • POOMA (Parallel Object-Oriented Methods and Applications): C++ abstraction layer between algorithm and platform (similar to HPF)
Global Arrays • The programming model is based on an explicit distinction between local and global data • Communication • Accessing GA distributed arrays • Conventional message passing (MPI) • Support for data transfers between local and remote memory • Support for synchronization
Global Array Supported Operations • Implementation-dependent primitive operations • Implementation-independent constructs • Collective primitive operations • create and destroy an array • create an array following a provided template • synchronize all processes
Global Array Supported Operations • Non-collective primitive operations • fetch, store, and accumulate into a rectangular range of an array • gather and scatter array elements • direct access to local elements of an array • Linear algebra operations • vector operations (dot product, scale, etc.) • matrix operations (multiply, eigenvalues, etc.)
Global Array (Basic Steps) • ga_initialize() • must be the first call, before any other GA call • ga_initialize_ltd(limit) • limits memory usage (0 = unlimited) • collective operation • ga_terminate() • deletes all active arrays and cleans up • collective operation
Global Array (Example Program)
      program example1
      include 'mpif.h'
      integer ierr, p, me
      call MPI_Init(ierr)
      call ga_initialize()
      p  = ga_nnodes()
      me = ga_nodeid()
      write(*,*) 'I am ', me, ' number of GA procs = ', p
      call ga_terminate()
      call MPI_Finalize(ierr)
      stop
      end
Infra-structural • CUMULVS (Collaborative User Migration User Library for Visualization and Steering), PAWS (Parallel Application WorkSpace): computational steering, data post-processing, interactive visualization • Globus: infrastructure for high performance distributed computing (computational grids) • SILOON (Scripting Interface Languages for Object-Oriented Numerics): scripting features • TAU (Tuning and Analysis Utilities): advanced performance analysis and tuning
CUMULVS
CUMULVS COMPONENTS • Generated at compilation time • Communicate by invoking the necessary protocols • Application-side library • Viewer library • Fault recovery daemon • one Checkpoint Daemon (CPD) per host • CPDs can manage task migration • CPDs monitor other CPDs for failure and recovery • CPDs coordinate the redundancy of checkpoint data
CUMULVS VIEWERS • Front-end programs attached to a distributed application • Different views of the same data, or different data (sub-regions) • Viewers can use any graphical system for rendering data field views (AVS, Tcl/Tk, a virtual reality interface, or a customized interface)
CUMULVS (Basic Steps) • Set up the input parameters to be steered • Specify the nature and decomposition of the data fields to be visualized; standard data decompositions: block, block-cyclic, particle, and user-defined decompositions • Use the existing interfaces to visualization packages or define a custom viewer on top of other visualization tools • Set up the checkpoint/restart mechanism