
A Software Framework for Easy Parallelization of PDE Solvers

This presentation describes a software framework that simplifies the parallelization of PDE solvers through domain decomposition. The approach is suited to coarse-grained parallelization, reuses existing sequential solvers on the subdomains, and retains good algorithmic efficiency and convergence. Applications include single-phase and two-phase porous media flow, nonlinear water waves, 2D linear elasticity, and an incompressible Navier-Stokes solver.


Presentation Transcript


  1. A Software Framework for Easy Parallelization of PDE Solvers. Hans Petter Langtangen and Xing Cai, Dept. of Informatics, University of Oslo

  2. Outline of the Talk

  3. The Question • Starting point: sequential PDE solvers • How to do the parallelization? We need: • a good parallelization strategy • a good and simple implementation of the strategy • The resulting parallel solvers should have: • good parallel efficiency • good overall numerical performance

  4. Problem Domain • Partial differential equations • Finite elements/differences • Communication through message passing

  5. A Known Problem “The hope among early domain decomposition workers was that one could write a simple controlling program which would call the old PDE software directly to perform the subdomain solves. This turned out to be unrealistic because most PDE packages are too rigid and inflexible.” - Smith, Bjørstad and Gropp One remedy: Use of object-oriented programming techniques

  6. Domain Decomposition • Solution of the original large problem through iteratively solving many smaller subproblems • Can be used as a solution method or as a preconditioner • Flexibility -- localized treatment of irregular geometries, singularities, etc. • Very efficient numerical methods -- even on sequential computers • Suitable for coarse-grained parallelization

  7. Overlapping DD • Alternating Schwarz method for two subdomains • Example: solving an elliptic boundary value problem in a domain composed of two overlapping subdomains, by generating a sequence of subdomain approximations in which each subdomain solve uses the most recent values from the neighbouring subdomain on the artificial internal boundary
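  The slide's formulas are not reproduced in this transcript; for reference, the classical alternating Schwarz iteration for an elliptic model problem $-\nabla\cdot(K\nabla u)=f$ on $\Omega=\Omega_1\cup\Omega_2$ can be written as (a standard formulation, assumed here rather than copied from the slide):

  $$-\nabla\cdot\big(K\nabla u_1^{n}\big)=f\ \text{in}\ \Omega_1,\qquad u_1^{n}=u_2^{n-1}\ \text{on}\ \partial\Omega_1\cap\Omega_2,$$
  $$-\nabla\cdot\big(K\nabla u_2^{n}\big)=f\ \text{in}\ \Omega_2,\qquad u_2^{n}=u_1^{n}\ \text{on}\ \partial\Omega_2\cap\Omega_1,$$

  with the original boundary conditions retained on the physical boundary $\partial\Omega$.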

  8. Additive Schwarz Method • Subproblems can be solved in parallel • Subproblems are of the same form as the original large problem, with possibly different boundary conditions on artificial boundaries
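  In the additive variant all subdomains use data from the previous iteration, so the $M$ subproblems decouple and can be solved simultaneously; a generic statement (again a standard formulation, not copied from the slide) is:

  $$-\nabla\cdot\big(K\nabla u_i^{n}\big)=f\ \text{in}\ \Omega_i,\qquad u_i^{n}=u^{n-1}\ \text{on}\ \partial\Omega_i\setminus\partial\Omega,\qquad i=1,\dots,M,$$

  after which the new global iterate $u^{n}$ is composed from the subdomain solutions $u_i^{n}$ in the overlap regions.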

  9. Convergence of the Solution Single-phase groundwater flow

  10. Coarse Grid Correction • This DD algorithm is a kind of block Jacobi iteration • Problem: often (very) slow convergence • Remedy: coarse grid correction • A kind of two-grid multigrid algorithm • Coarse grid solve on each processor
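  In matrix form, the corresponding two-level additive Schwarz preconditioner for a global system $Au=b$ is usually written as (standard notation, supplied here for reference):

  $$M^{-1}=R_0^{T}A_0^{-1}R_0+\sum_{i=1}^{M}R_i^{T}A_i^{-1}R_i,$$

  where $R_i$ restricts a global vector to subdomain $i$, $A_i=R_iAR_i^{T}$ is the subdomain matrix, and $R_0$, $A_0=R_0AR_0^{T}$ are the restriction to and operator on the coarse grid.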

  11. Observations • DD is a good parallelization strategy • A program for the original global problem can be reused (modulo B.C.) for each subdomain • Communication of overlapping point values is required • The approach is not PDE-specific • No need for global data • Data distribution implied • Explicit temporal schemes are a special case where no iteration is needed (“exact DD”)

  12. Goals for the Implementation • Reuse sequential solver as subdomain solver • Add DD management and communication as separate modules • Collect common operations in generic library modules • Flexibility and portability • Simplified parallelization process for the end-user

  13. Generic Programming Framework

  14. The Administrator • Parameters: solution method or preconditioner, max iterations, stopping criterion, etc. • DD algorithm: subdomain solve + coarse grid correction • Operations: matrix-vector product, inner product, etc.
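  As a rough illustration of the administrator's role (hypothetical names only, not the framework's actual interface), the DD iteration it drives can be sketched like this:

      #include <vector>

      // Hypothetical sketch: the administrator loops over DD iterations,
      // delegating the real work to the subdomain solvers, the communication
      // module and the coarse grid correction.
      struct SubdomainSolver {
        void solveLocalProblem() { /* reuse of the sequential solver goes here */ }
      };

      void ddIteration(std::vector<SubdomainSolver>& subdomains, int maxIterations)
      {
        for (int iter = 0; iter < maxIterations; ++iter) {
          for (SubdomainSolver& s : subdomains)   // one subdomain per process in practice
            s.solveLocalProblem();                // subdomain solve
          // exchange overlapping point values (see the communicator below)
          // apply the coarse grid correction, if enabled
          // evaluate the stopping criterion and break when converged
        }
      }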

  15. The Subdomain Simulator (= sequential solver + communication add-on) • Subdomain Simulator -- a generic representation • C++ class hierarchy • Interface of generic member functions

  16. The Communicator • Need functionality for exchanging point values inside the overlapping regions • Build a generic communication module: the communicator • Encapsulation of communication-related code; the concrete communication model is hidden • MPI in use, but easy to change
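  A minimal sketch of such a communicator, assuming plain MPI and a single neighbouring subdomain (illustrative only, not the module's real interface):

      #include <mpi.h>
      #include <vector>

      // Illustrative communicator hiding the concrete message-passing model:
      // callers only see "exchange overlap values"; MPI stays inside the class.
      class OverlapCommunicator {
      public:
        explicit OverlapCommunicator(int neighbourRank) : neighbour(neighbourRank) {}

        // Send our overlap values to the neighbour and receive its values back.
        // recv must be sized to the number of overlap points beforehand.
        void exchangeOverlap(const std::vector<double>& send,
                             std::vector<double>& recv) const {
          MPI_Sendrecv(send.data(), (int) send.size(), MPI_DOUBLE, neighbour, 0,
                       recv.data(), (int) recv.size(), MPI_DOUBLE, neighbour, 0,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

      private:
        int neighbour;  // rank that owns the neighbouring subdomain
      };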

  17. Realization • Object-oriented programming (C++, Java, Python) • Use inheritance • Simplifies modularization • Supports reuse of sequential solver (without touching its source code!)

  18. Generic Subdomain Simulators • SubdomainSimulator: abstract interface to all subdomain simulators, as seen by the Administrator • SubdomainFEMSolver: special case of SubdomainSimulator for finite element based simulators • These are generic classes, not restricted to specific application areas
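  The class names SubdomainSimulator and SubdomainFEMSolver and the member function createLocalMatrix appear in the slides; the remaining member functions below are assumptions, sketched only to illustrate what a generic interface of this kind looks like:

      // Sketch of the generic hierarchy; only the class names and
      // createLocalMatrix() are taken from the slides, the rest is assumed.
      class SubdomainSimulator {
      public:
        virtual ~SubdomainSimulator() {}
        virtual void initLocalGrid() = 0;       // assumed: set up the subgrid
        virtual void solveLocalProblem() = 0;   // assumed: one subdomain solve
      };

      // Special case for finite element based subdomain simulators.
      class SubdomainFEMSolver : public SubdomainSimulator {
      public:
        virtual void createLocalMatrix() = 0;   // assemble the subdomain system
      };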

  19. Making the Simulator Parallel

      class SimulatorP : public SubdomainFEMSolver, public Simulator
      {
        // ... just a small amount of code
        virtual void createLocalMatrix ()
        { Simulator::makeSystem (); }
      };

  20. Performance • Algorithmic efficiency: efficiency of the original sequential simulator(s), efficiency of the domain decomposition method • Parallel efficiency: communication overhead (low), coarse grid correction overhead (normally low), load balancing (subproblem size, work on subdomain solves)

  21. Summary So Far • A generic approach • Works if the DD algorithm works for the problem at hand • Implementation in terms of class hierarchies • The new parallel-specific code, SimulatorP, is very small and simple to write

  22. Application • Single-phase groundwater flow • DD as the global solution method • Subdomain solvers use CG+FFT • Fixed number of subdomains M=32 (independent of P) • Straightforward parallelization of an existing simulator • P: number of processors

  23. Two-phase Porous Media Flow • Governing equations: saturation equation (SEQ) and pressure equation (PEQ) • DD as preconditioner for a global BiCGStab method solving the pressure equation • Multigrid V-cycle in subdomain solves

  24. Two-Phase Porous Media Flow Simulation result obtained on 16 processors

  25. Two-phase Porous Media Flow History of saturation for water and oil

  26. Nonlinear Water Waves • Fully nonlinear 3D water waves • Primary unknowns: [not reproduced in this transcript] • Parallelization based on an existing sequential Diffpack simulator

  27. Nonlinear Water Waves • DD as preconditioner for global CG solving Laplace eq. • Multigrid V-cycle as subdomain solver • Fixed number of subdomains M=16 (independent of P) • Subgrids from partition of a global 41x41x41 grid

  28. Nonlinear Water Waves 3D Poisson equation in water wave simulation

  29. Application • Test case: 2D linear elasticity, 241 x 241 global grid. • Vector equation • Straightforward parallelization based on an existing Diffpack simulator

  30. 2D Linear Elasticity

  31. 2D Linear Elasticity • DD as preconditioner for a global BiCGStab method • Multigrid V-cycle in subdomain solves • I: number of global BiCGStab iterations needed • P: number of processors (P=#subdomains)

  32. Diffpack • Object-oriented software environment for scientific computation • Rich collection of PDE solution components -- portable, flexible, extensible • www.diffpack.com • H. P. Langtangen, Computational Partial Differential Equations, Springer, 1999

  33. Straightforward Parallelization • Develop a sequential simulator, without paying attention to parallelism • Follow the Diffpack coding standards • Use add-on libraries for parallelization specific functionalities • Add a few new statements for transformation to a parallel simulator

  34. Linear-algebra-level Approach • Parallelize matrix/vector operations • inner-product of two vectors • matrix-vector product • preconditioning - block contribution from subgrids • Easy to use • access to all existing Diffpack iterative methods, preconditioners and convergence monitors • “hidden” parallelization • need only to add a few lines of new code • arbitrary choice of number of procs at run-time • less flexibility than DD
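  To make the idea concrete, the parallel inner product of two distributed vectors typically sums local contributions and reduces them across processes; a minimal sketch (illustrative only, not Diffpack's internal code):

      #include <mpi.h>
      #include <vector>

      // Illustrative parallel inner product: each process holds its own piece
      // of the vectors; entries are assumed to be owned by exactly one process
      // so that overlap points are not counted twice.
      double parallelInnerProduct(const std::vector<double>& x,
                                  const std::vector<double>& y)
      {
        double local = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i)
          local += x[i] * y[i];

        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        return global;
      }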

  35. New Library Tool • class GridPartAdm • Generate overlapping or non-overlapping subgrids • Prepare communication patterns • Update global values • matvec, innerProd, norm

  36. Mesh Partition Example

  37. A Simple Coding Example

      Handle(GridPartAdm) adm;   // access to parallelization functionalities
      Handle(LinEqAdm) lineq;    // administrator for linear system & solver
      // ...
      #ifdef PARALLEL_CODE
      adm->scan (menu);
      adm->prepareSubgrids ();
      adm->prepareCommunication ();
      lineq->attachCommAdm (*adm);
      #endif
      // ...
      lineq->solve ();

  Accompanying run-time input (menu) commands:

      set subdomain list = DEFAULT
      set global grid = grid1.file
      set partition-algorithm = METIS
      set number of overlaps = 0

  38. Single-phase Groundwater Flow • Highly unstructured grid • Discontinuity in the coefficient K

  39. Measurements • 130,561 degrees of freedom • Overlapping subgrids • Global BiCGStab using (block) ILU preconditioning

  40. A Fast FEM N-S Solver • Operator splitting in the tradition of pressure correction, velocity correction, Helmholtz decomposition • This version is due to Ren & Utnes

  41. A Fast FEM N-S Solver • Calculation of an intermediate velocity

  42. A Fast FEM N-S Solver • Solution of a Poisson Equation • Correction of the intermediate velocity
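  The discrete formulas themselves are not reproduced in the transcript; a standard incremental pressure-correction (projection) scheme of this kind, given here only as a generic sketch and not necessarily the exact Ren & Utnes variant, reads:

  $$\frac{\tilde{\mathbf u}-\mathbf u^{n}}{\Delta t}=-(\mathbf u^{n}\cdot\nabla)\mathbf u^{n}+\nu\nabla^{2}\mathbf u^{n}-\frac{1}{\rho}\nabla p^{n}+\mathbf f^{n},$$
  $$\nabla^{2}\phi=\frac{\rho}{\Delta t}\,\nabla\cdot\tilde{\mathbf u},$$
  $$\mathbf u^{n+1}=\tilde{\mathbf u}-\frac{\Delta t}{\rho}\nabla\phi,\qquad p^{n+1}=p^{n}+\phi,$$

  i.e. an intermediate velocity from the momentum equation, a Poisson equation for the pressure correction, and a final velocity and pressure update.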

  43. Test Case: Vortex-Shedding

  44. Simulation Snapshots Pressure

  45. Simulation Snapshots Pressure

  46. Animated Pressure Field

  47. Simulation Snapshots Velocity

  48. Simulation Snapshots Velocity

  49. Animated Velocity Field

  50. Some CPU Measurements • The pressure equation is solved by the CG method
