A Software Framework for Easy Parallelization of PDE Solvers Hans Petter Langtangen Xing Cai Dept. of Informatics University of Oslo
Outline of the Talk • Background • Parallelization techniques • based on domain decomposition • at the linear algebra level • Implementational aspects • Numerical experiments
The Question Starting point: a sequential code. How to do the parallelization? We need • a good parallelization strategy • a good and simple implementation of the strategy Resulting parallel solvers should have • good parallel efficiency • good overall numerical performance
Problem Domain • Partial differential equations • Finite elements/differences • Communication through message passing
Domain Decomposition • Solution of the original large problem through iteratively solving many smaller subproblems • Can be used as solution method or preconditioner • Flexibility -- localized treatment of irregular geometries, singularities etc • Very efficient numerical methods -- even on sequential computers • Suitable for coarse grained parallelization
Overlapping DD Alternating Schwarz method for two subdomains Example: solving an elliptic boundary value problem in a composite domain Ω = Ω1 ∪ Ω2 A sequence of approximations is generated by solving the problem in Ω1 and Ω2 in turn, where each subdomain solve takes its artificial Dirichlet values on the internal boundary from the latest approximation in the neighboring subdomain
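The alternating idea can be sketched in a few lines of standalone C++ (a toy 1D illustration, not the talk's Diffpack code): for -u'' = 0 on the unit interval, each exact subdomain solve is just linear interpolation of the current boundary values, and the two overlapping subdomains are visited in turn.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Exact subdomain solve for -u'' = 0 with Dirichlet data u[lo], u[hi]:
// the discrete solution is linear between the two boundary values.
void solveSubdomain(std::vector<double>& u, int lo, int hi) {
    for (int j = lo + 1; j < hi; ++j)
        u[j] = u[lo] + (u[hi] - u[lo]) * double(j - lo) / double(hi - lo);
}

// Alternating Schwarz on [0,1] with u(0) = 0, u(1) = 1 (exact: u = x).
// Subdomain 1 = grid points [0..m2], subdomain 2 = [m1..N]; the
// overlap [m1..m2] supplies the artificial internal boundary values.
std::vector<double> alternatingSchwarz(int N, int m1, int m2, int sweeps) {
    std::vector<double> u(N + 1, 0.0);  // zero initial guess inside
    u[N] = 1.0;
    for (int k = 0; k < sweeps; ++k) {
        solveSubdomain(u, 0, m2);       // uses latest u[m2] from domain 2
        solveSubdomain(u, m1, N);       // uses just-updated u[m1]
    }
    return u;
}
```

With N = 20 and overlap [8..12], the error contracts by roughly (8/12)² per sweep, so a few dozen sweeps reach machine accuracy; a larger overlap converges faster, which is the central trade-off in overlapping DD.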
Convergence of the Solution Single-phase groundwater flow
Coarse Grid Correction • This DD algorithm is a kind of block Jacobi iteration (CBJ) • Problem: often (very) slow convergence • Remedy: coarse grid correction • A kind of two-grid multigrid algorithm • Coarse grid solve on each processor
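In standard two-level additive Schwarz notation (a textbook formulation, not reproduced from the slides, whose formulas were images), the preconditioner with coarse grid correction reads:

```latex
% Two-level additive Schwarz preconditioner: subdomain (block Jacobi)
% solves plus a coarse grid correction. R_i restricts a global vector
% to subdomain i, A_i = R_i A R_i^T; index 0 denotes the coarse grid.
B^{-1} = R_0^T A_0^{-1} R_0 + \sum_{i=1}^{M} R_i^T A_i^{-1} R_i
```

The coarse term transports information globally in one application, which is what removes the slow, mesh-dependent convergence of plain block Jacobi.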
Observations • DD is a good parallelization strategy • The approach is not PDE-specific • A program for the original global problem can be reused (modulo B.C.) for each subdomain • Must communicate overlapping point values • No need for global data • Data distribution implied • Explicit temporal schemes are a special case where no iteration is needed (“exact DD”)
A Known Problem “The hope among early domain decomposition workers was that one could write a simple controlling program which would call the old PDE software directly to perform the subdomain solves. This turned out to be unrealistic because most PDE packages are too rigid and inflexible.” - Smith, Bjørstad and Gropp One remedy: Use of object-oriented programming techniques
Goals for the Implementation • Reuse sequential solver as subdomain solver • Add DD management and communication as separate modules • Collect common operations in generic library modules • Flexibility and portability • Simplified parallelization process for the end-user
The Subdomain Simulator [figure: Subdomain Simulator = sequential solver + communication add-on]
The Communicator • Need functionality for exchanging point values inside the overlapping regions • The communicator works with a hidden communication model • MPI in use, but easy to change
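As a concrete, MPI-free illustration of what the communicator does (a hypothetical in-process mock, not the talk's interface), the exchange of point values in a 1D overlap region can be written as two copy loops; with MPI each loop would become a send/receive pair between neighboring processes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Subdomain 1 holds global points [0..m2], subdomain 2 holds [m1..N];
// the overlap is [m1..m2]. Each subdomain receives the other's values
// on the half of the overlap it does not "own".
void exchangeOverlap(std::vector<double>& u1, std::vector<double>& u2,
                     std::size_t m1, std::size_t m2) {
    std::size_t mid = (m1 + m2) / 2;
    for (std::size_t g = m1; g <= mid; ++g)
        u2[g - m1] = u1[g];             // subdomain 2's local indexing
    for (std::size_t g = mid + 1; g <= m2; ++g)
        u1[g] = u2[g - m1];
}
```

Hiding this behind one communicator call is what lets the rest of the code stay identical for MPI, threads, or a serial test run.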
Realization • Object-oriented programming (C++, Java, Python) • Use inheritance, polymorphism, dynamic binding • Simplifies modularization • Supports reuse of sequential solver (without touching its source code!)
Making the Simulator Parallel [class diagram: SubdomainSimulator → SubdomainFEMSolver; Simulator → SimulatorP; managed by an Administrator] class SimulatorP : public SubdomainFEMSolver, public Simulator { // … just a small amount of code virtual void createLocalMatrix () { Simulator::makeSystem (); } };
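Expanded into a self-contained toy sketch, the pattern looks as follows; the class bodies here are stand-ins, but the multiple-inheritance structure matches the slide: the generic DD side calls a virtual hook, and the user's thin subclass routes it back to the untouched sequential solver.

```cpp
#include <cassert>

struct Simulator {                      // existing sequential solver
    bool systemBuilt = false;
    void makeSystem() { systemBuilt = true; }  // assemble A and b (stub)
    virtual ~Simulator() = default;
};

struct SubdomainFEMSolver {             // generic DD machinery
    virtual void createLocalMatrix() = 0;      // per-subdomain hook
    void solveSubdomain() { createLocalMatrix(); /* + local solve */ }
    virtual ~SubdomainFEMSolver() = default;
};

// The only new code the end-user writes:
class SimulatorP : public SubdomainFEMSolver, public Simulator {
  public:
    virtual void createLocalMatrix() { Simulator::makeSystem(); }
};
```

Dynamic binding does the work: the DD library never sees the sequential solver's source, only the virtual hook.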
Performance • Algorithmic efficiency • efficiency of original sequential simulator(s) • efficiency of domain decomposition method • Parallel efficiency • communication overhead (low) • coarse grid correction overhead (normally low) • load balancing • subproblem size • work on subdomain solves
Application • Single-phase groundwater flow • DD as the global solution method • Subdomain solvers use CG+FFT • Fixed number of subdomains M=32 (independent of P) • Straightforward parallelization of an existing simulator P: number of processors
Diffpack • O-O software environment for scientific computation • Rich collection of PDE solution components - portable, flexible, extensible • www.diffpack.com • H. P. Langtangen: Computational Partial Differential Equations, Springer, 1999
Straightforward Parallelization • Develop a sequential simulator, without paying attention to parallelism • Follow the Diffpack coding standards • Need Diffpack add-on libraries for parallel computing • Add a few new statements for transformation to a parallel simulator
Linear-Algebra-Level Approach • Parallelize matrix/vector operations • inner-product of two vectors • matrix-vector product • preconditioning - block contribution from subgrids • Easy to use • access to all Diffpack v3.0 CG-like methods, preconditioners and convergence monitors • “hidden” parallelization • need only to add a few lines of new code • arbitrary choice of number of procs at run-time • less flexibility than DD
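The first bullet can be mocked in a few lines (an illustrative sketch, not Diffpack code): the global inner product is the sum of local contributions over the non-overlapping parts of the subgrids, followed by an all-reduce. Here the "processors" are loop iterations and the all-reduce is a plain sum; with MPI, each local result would go through MPI_Allreduce.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// offsets[p] .. offsets[p+1] delimits the entries owned by processor p.
double parallelInnerProduct(const std::vector<double>& x,
                            const std::vector<double>& y,
                            const std::vector<std::size_t>& offsets) {
    double global = 0.0;
    for (std::size_t p = 0; p + 1 < offsets.size(); ++p) {
        double local = 0.0;             // one processor's contribution
        for (std::size_t i = offsets[p]; i < offsets[p + 1]; ++i)
            local += x[i] * y[i];
        global += local;                // stands in for MPI_Allreduce
    }
    return global;
}
```

Because CG-like methods touch the matrix and vectors only through such kernels, parallelizing the kernels parallelizes every Krylov solver at once.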
A Simple Coding Example
GridPartAdm* adm;   // access to parallelization functionality
LinEqAdm* lineq;    // administrator for linear system & solver
// ...
#ifdef PARALLEL_CODE
adm->scan (menu);
adm->prepareSubgrids ();
adm->prepareCommunication ();
lineq->attachCommAdm (*adm);
#endif
// ...
lineq->solve ();

set subdomain list = DEFAULT
set global grid = grid1.file
set partition-algorithm = METIS
set number of overlaps = 0
Single-Phase Groundwater Flow Highly unstructured grid Discontinuity in the coefficient K (0.1 & 1)
Measurements 130,561 degrees of freedom Overlapping subgrids Global BiCGStab using (block) ILU prec.
A Finite Element Navier-Stokes Solver • Operator splitting in the tradition of pressure correction, velocity correction, Helmholtz decomposition • This version is due to Ren & Utnes, 1993
The Algorithm • Calculation of an intermediate velocity in a predictor-corrector way:
The Algorithm • Solution of a Poisson Equation • Correction of the intermediate velocity
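The formulas on these two algorithm slides were images and did not survive conversion; as an illustration, a standard incremental pressure-correction splitting of the kind cited takes the following form (the exact Ren & Utnes 1993 scheme may differ in details):

```latex
% Tentative velocity from the momentum equation with the old pressure:
\frac{u^* - u^n}{\Delta t} + (u^n \cdot \nabla)\, u^*
    = -\nabla p^n + \nu \nabla^2 u^* + f^{n+1}
% Poisson equation for the pressure increment \phi:
\nabla^2 \phi = \frac{1}{\Delta t}\, \nabla \cdot u^*
% Correction of the intermediate velocity and pressure update:
u^{n+1} = u^* - \Delta t\, \nabla \phi, \qquad p^{n+1} = p^n + \phi
```

The Poisson step is the expensive elliptic solve, which is why it is the natural target for the DD machinery of the previous slides.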
Simulation Snapshots Pressure
Simulation Snapshots Velocity
Some CPU Measurements The pressure equation is solved by the CG method with “subdomain-wise” MILU prec.
Combined Approach • Use a CG-like method as basic solver (i.e. use a parallelized Diffpack linear solver) • Use DD as preconditioner (i.e. SimulatorP is invoked as a preconditioning solve) • Combine with coarse grid correction • CG-like method + DD prec. is normally faster than DD as a basic solver
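A minimal standalone sketch of the combination (illustrative, not Diffpack's implementation): CG on the 1D Laplacian, preconditioned by exact non-overlapping subdomain (block) solves. A real setup would add overlap and the coarse grid correction.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// y = A x for the 1D Laplacian stencil (-1, 2, -1).
Vec applyA(const Vec& x) {
    std::size_t n = x.size();
    Vec y(n);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = 2.0 * x[i];
        if (i > 0)     y[i] -= x[i - 1];
        if (i + 1 < n) y[i] -= x[i + 1];
    }
    return y;
}

// Exact solve of tridiag(-1,2,-1) z = r on indices [lo,hi) (Thomas).
void blockSolve(const Vec& r, Vec& z, std::size_t lo, std::size_t hi) {
    std::size_t n = hi - lo;
    Vec c(n), d(n);
    c[0] = -0.5;
    d[0] = r[lo] / 2.0;
    for (std::size_t i = 1; i < n; ++i) {
        double m = 2.0 + c[i - 1];
        c[i] = -1.0 / m;
        d[i] = (r[lo + i] + d[i - 1]) / m;
    }
    z[lo + n - 1] = d[n - 1];
    for (std::size_t i = n - 1; i-- > 0; )
        z[lo + i] = d[i] - c[i] * z[lo + i + 1];
}

// DD preconditioner: solve each subdomain block exactly, ignore the
// coupling between the blocks (non-overlapping additive Schwarz).
Vec ddPrecond(const Vec& r) {
    Vec z(r.size(), 0.0);
    std::size_t half = r.size() / 2;
    blockSolve(r, z, 0, half);
    blockSolve(r, z, half, r.size());
    return z;
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Preconditioned CG with the DD preconditioner as M^{-1}.
Vec pcg(const Vec& b, int maxIter, double tol) {
    Vec x(b.size(), 0.0), r = b, z = ddPrecond(r), p = z;
    double rz = dot(r, z);
    for (int k = 0; k < maxIter && std::sqrt(dot(r, r)) > tol; ++k) {
        Vec Ap = applyA(p);
        double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        z = ddPrecond(r);
        double rzNew = dot(r, z);
        for (std::size_t i = 0; i < p.size(); ++i)
            p[i] = z[i] + (rzNew / rz) * p[i];
        rz = rzNew;
    }
    return x;
}
```

CG supplies the global coupling the blocks ignore, which is why the combination usually beats DD used as a stand-alone iteration.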
Two-Phase Porous Media Flow Simulation result obtained on 16 processors
Two-phase Porous Media Flow History of saturation for water and oil
Two-Phase Porous Media Flow SEQ (saturation eq.), PEQ (pressure eq.): BiCGStab + DD prec. for global pressure eq. Multigrid V-cycle in subdomain solves
Nonlinear Water Waves Fully nonlinear 3D water waves Primary unknowns: Parallelization based on an existing sequential Diffpack simulator
Nonlinear Water Waves • CG + DD prec. for global solver • Multigrid V-cycle as subdomain solver • Fixed number of subdomains M=16 (independent of P) • Subgrids from partition of a global 41x41x41 grid
Elasticity • Test case: 2D linear elasticity, 241 x 241 global grid. • Vector equation • Straightforward parallelization based on an existing Diffpack simulator
2D Linear Elasticity • BiCGStab + DD prec. as global solver • Multigrid V-cycle in subdomain solves • I: number of global BiCGStab iterations needed • P: number of processors (P=#subdomains)
Summary • Goal: provide software and programming rules for easy parallelization of sequential simulators • Applicable to a wide range of PDE problems • Two parallelization strategies: • domain decomposition: very flexible, compact visible code/algorithm • parallelization at the linear algebra level: “automatic” hidden parallelization • Performance: satisfactory speed-up
Future Application DD with different PDEs and local solvers • Out in deep sea: Eulerian, finite differences, Boussinesq PDEs, F77 code • Near shore: Lagrangian, finite element, shallow water PDEs, C++ code