The AMP Backplane
Discreet Management of Numerical Libraries and Multiphysics Data
Bill Cochran, Oak Ridge National Laboratory, cochranwk@ornl.gov
The AMP Backplane
Discreet Management of Numerical Libraries and Multiphysics Data

Developers, Collaborators, and Advisors
Oak Ridge National Lab, Argonne National Lab, Idaho National Lab, Los Alamos National Lab

Abdellatif Yacout, Marius Stan, Larry Ott, John Turner, Mike Rogers, Kevin Clarno, Bobby Philip, Bill Cochran, Srdjan Simunovic, Rahul Sampath, Srikanth Allu, Gokan Yesilyurt, Jung Ho Lee, James Banfield, Pallab Barai, Sreekanth Pannala, Phani Nukala, Jay Billings, Richard Martineau, Glen Hansen, Samet Kadioglu, Ray Berry, Cetin Unal, Steven Lee, Gary Dilts, Bogdan Mihaila
The AMP Backplane
Vectors

  Epetra_Vector x;
  Vec y;
  N_Vector z;
  VecAXPBYPCZ ( z , alpha , 1 , 0 , x , y );
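To see why the snippet above is a problem, here is a minimal PETSc-only sketch (an assumed helper, not taken from the slides): VecAXPBYPCZ computes z = alpha*x + beta*y + gamma*z and accepts only PETSc Vec arguments, so the mixed Epetra/PETSc/SUNDIALS version cannot work without a translation layer.

  #include <petscvec.h>

  // z = alpha*x + 1*y + 0*z, i.e. z = alpha*x + y; every argument must be a PETSc Vec.
  void update ( Vec x , Vec y , Vec z , PetscScalar alpha )
  {
    VecAXPBYPCZ ( z , alpha , 1.0 , 0.0 , x , y );
  }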
Why So Many Libraries?
AMP uses:
  Moertel and ML for contact and preconditioning
  SNES and KSP for JFNK
  IDA for time integration
The AMP Backplane

Physics: Mechanics, Temperature, Oxygen Diffusion, Burn Up, Neutronics, Etc.
  Vec multiPhysicsSolution;
  Vec tempPellet, displacementPellet;
  Vec thermoMechanicsPellet;
  SolveThermoMechanics ( thermoMechanicsPellet );

Vectors:
  Epetra_Vector x;
  Vec y;
  N_Vector z;
  VecAXPBYPCZ ( z , alpha , 1 , 0 , x , y );

Matrices:
  Epetra_CrsMatrix P;
  Mat A;
  N_Vector x, y, z;
  P.Multiply ( false , x , y );
  MatMult ( A , y , z );

Meshes:
  stk::mesh::Entity curElement;
  libMesh::FE integrator;
  integrator.reinit ( &curElement );
How Does It Work?
  Virtual methods
  Polymorphism
  Templates
  Iterators
  Standard template library
(The less you know…)
How Do I Use It?
Master six classes:
  AMP::Vector: linear combinations, norms, get/set, etc.
  AMP::Matrix: matrix-vector products, scaling, etc.
  AMP::MeshManager: multiple domains, parallel management, I/O, space allocation, etc.
  AMP::MeshManager::Adapter: entity iteration, boundary conditions, memory management, vector indexing, etc.
  AMP::Variable: describes the desired memory layout, indexes individual physics
  AMP::DOFMap: maps mesh entities to indices in vectors and matrices
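As a hedged illustration of two of these six classes working together, the sketch below reuses only calls that appear elsewhere in these slides (subsetVectorForVariable from the views slide, L2Norm() from the performance slide); the setup of solution and temperatureVar, and the double return type of L2Norm(), are assumptions.

  // Assumed setup: solution is a multiphysics AMP::Vector,
  // temperatureVar is the AMP::Variable naming the temperature physics.
  AMP::Vector::shared_ptr thermalSolution =
      solution->subsetVectorForVariable ( temperatureVar );   // AMP::Variable selects one physics
  double thermalNorm = thermalSolution->L2Norm ();            // vector algebra lives on AMP::Vector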
How Do I Use It In Parallel?
  Step 1: makeConsistent()
  Step 2: ???
  Step 3: Profit!
(Multicore and multi-multicore)
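A hedged sketch of what Step 1 might look like in practice, reusing addValue from the performance slides and makeConsistent() from this one; numElems, elem, and phi are assumed to come from local element assembly, and any arguments makeConsistent() may take are not shown on the slides.

  // Each rank adds its local element contributions ...
  for ( int i = 0 ; i != numElems ; i++ )
    for ( int j = 0 ; j != 8 ; j++ )
      vector->addValue ( elem[8*i+j] , phi[j] );
  // ... then a single collective call reconciles entries shared across ranks.
  vector->makeConsistent ();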
How Discreet Is It?

PETSc view (most vector functionality, enough matrix functionality, works with SNES and KSP):
  AMP::Vector::shared_ptr petscView;
  petscView = AMP::PetscVector::view ( vector );
  Vec petscVec;
  petscVec = petscView->castTo<AMP::PetscVector>().getVec();

SUNDIALS view (most vector functionality, works with IDA):
  AMP::Vector::shared_ptr sundialsView;
  sundialsView = AMP::SundialsVector::view ( vector );
  N_Vector sundialsVec;
  sundialsVec = sundialsView->castTo<AMP::SundialsVector>().getNVector();

Epetra view (default linear algebra engine; single domain/single physics; hopefully, this limitation is eased by Tpetra):
  AMP::Vector::shared_ptr epetraView;
  epetraView = AMP::EpetraVector::view ( vector );
  Epetra_Vector &epetraVec = epetraView->castTo<AMP::EpetraVector>().getEpetra_Vector();

Subsetting by variable (variables describe memory layout, physics, and discretization):
  AMP::Vector::shared_ptr thermalResidual;
  AMP::Vector::shared_ptr thermalSolution;
  thermalResidual = residual->subsetVectorForVariable ( temperatureVar );
  thermalSolution = solution->subsetVectorForVariable ( temperatureVar );
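Once a view hands back a native object, ordinary library calls operate directly on AMP-owned data. A minimal usage sketch for the PETSc view on this slide; VecNorm is standard PETSc, and petscVec is the Vec extracted above.

  #include <petscvec.h>

  PetscReal nrm;
  VecNorm ( petscVec , NORM_2 , &nrm );   // plain PETSc call on the viewed AMP data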
What About Performance?
C++:
  1) Clever compiler optimizations
  2) Virtual methods
Iterative access: FORTRAN-esque speed
  L2Norm(), dot(), min(), axpy(), scale(), …
Non-iterative access:
  for ( i = 0 ; i != numElems ; i++ )
    for ( j = 0 ; j != 8 ; j++ )
      vector->addValue ( elem[8*i+j] , phi[j] );
Digression
  Time to perform a dot product of 2 vectors: 0.05 secs
  Virtual method penalty: 50%
  Time to perform a tight-loop virtual-method dot product: 0.075 secs
  Dot product # floating point ops: 2n - 1
  Dot product FLOPS (FORTRAN style): 40n - 20
  Similar sized matvec w.r.t. FLOPS: 24n - 12
  matvec cache penalty: 40%
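Reading these figures together (my reconstruction; the slide itself states only the numbers): the FORTRAN-style rate follows from dividing the operation count by the 0.05 s tight-loop time, and the matvec figure is that rate discounted by the quoted 40% cache penalty:

$$\frac{2n-1}{0.05\,\mathrm{s}} = 40n - 20, \qquad (1 - 0.4)\,(40n - 20) = 24n - 12.$$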
What About Performance?
C++:
  1) Clever compiler optimizations
  2) Virtual methods
Iterative access: FORTRAN-esque speed
Non-iterative access:
  Batched, per element:
    for ( i = 0 ; i != numElems ; i++ )
      vector->addValues ( 8 , elem + 8*i , phi );
  One entry at a time:
    for ( i = 0 ; i != numElems ; i++ )
      for ( j = 0 ; j != 8 ; j++ )
        vector->addValue ( elem[8*i+j] , phi[j] );
Does it work?
100,000+ unit tests:
  AMP interface
  AMP interface vs PETSc
  AMP interface vs Epetra
  PETSc wrappers
  SUNDIALS wrappers
  Epetra vs PETSc
  Single physics, single domain
  Multiple physics, single domain
  Single physics, multiple domains
  Multiple physics, multiple domains
  Multiple linear algebra engines
  AMP vectors, PETSc views, SUNDIALS views
  Serial and parallel
  Views, clones, clones of views, views of clones
Various bugs found in development
What Can It Do?
  SUNDIALS IDA: time integration
  PETSc SNES: JFNK (quasi-static)
  Trilinos ML: preconditioning
What Can It Do?
Reading Meshes: scaling plot of time (s) versus number of cores at 10.75k, 21.5k, and 43k elements per core, for 32 domains (22M elements) and 128 domains (88M elements); superscaling observed.
What Can It Do? Multiphysics Multidomain Multicore
What’s On The Horizon?
  “PMPIO” checkpointing and restart
  Hot-swap linear algebra engines
  Rudimentary contact search
  On-the-fly d.o.f. extraction
  Better interface for multi* data
What’s Left To Do?
  Performance testing and tuning
  More libraries
  Generalized discretizations
  Bringing everything together