Reducing Complexity in Algebraic Multigrid. Hans De Sterck, Department of Applied Mathematics, University of Colorado at Boulder. Ulrike Meier Yang, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory.
Reducing Complexity in Algebraic Multigrid • Hans De Sterck, Department of Applied Mathematics, University of Colorado at Boulder • Ulrike Meier Yang, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
Outline • introduction: AMG • complexity growth when using classical coarsenings • Parallel Modified Independent Set (PMIS) coarsening • scaling results • conclusions and future work
Introduction • solve the linear system $Au = f$ • from 3D PDE – sparse! • large problems ($10^9$ dof) - parallel • unstructured grid problems
Algebraic Multigrid (AMG) • multi-level • iterative • algebraic: suitable for unstructured grids!
AMG building blocks Setup Phase: • Select coarse “grids” • Define interpolation • Define restriction and coarse-grid operators Solve Phase
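AMG complexity - scalability • Operator complexity $C_{op} = \sum_l \mathrm{nnz}(A_l) \, / \, \mathrm{nnz}(A_{\mathrm{finest}})$ (nonzeros summed over all levels, divided by the nonzeros of the finest-level operator), e.g., 3D: $C_{op} = 1 + 1/8 + 1/64 + \dots < 8/7$; a measure of memory use and of work in the solve phase • scalable algorithm: O(n) operations per V-cycle ($C_{op}$ bounded) AND number of V-cycles independent of n ($\rho_{AMG}$ independent of n)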
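A minimal sketch of the operator-complexity measure, assuming the usual definition (nonzeros summed over all levels, divided by the nonzeros on the finest level). The function name `operator_complexity` and the list-of-counts input are hypothetical, chosen only for illustration; they are not part of hypre/BoomerAMG.

```python
def operator_complexity(nnz_per_level):
    """Return C_op for a hierarchy given as nonzero counts, finest level first.

    Hypothetical helper: C_op = sum of nonzeros over all levels
    divided by the nonzeros of the finest-level operator.
    """
    if not nnz_per_level or nnz_per_level[0] <= 0:
        raise ValueError("need a finest level with at least one nonzero")
    return sum(nnz_per_level) / nnz_per_level[0]


# Idealized 3D hierarchy: every coarser operator keeps 1/8 of the nonzeros.
ideal_3d = [8 ** (6 - level) for level in range(7)]  # 262144, 32768, ..., 1
c_op = operator_complexity(ideal_3d)

# The partial geometric sum 1 + 1/8 + 1/64 + ... stays below 8/7.
print(c_op < 8 / 7)
```

Feeding in the per-level nonzero counts reported by an AMG setup phase would show how far a real hierarchy departs from this idealized bound.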
AMG Interpolation • after relaxation: $Ae \approx 0$ (relative to e) • heuristic: error after interpolation should also satisfy this relation approximately • derive interpolation from the i-th row of this relation: $a_{ii} e_i + \sum_{j \ne i} a_{ij} e_j \approx 0$
AMG interpolation • “large” $a_{ij}$ should be taken into account accurately • “strong connections”: i strongly depends on j (and j strongly influences i) if $-a_{ij} \ge \theta \max_{k \ne i} (-a_{ik})$, with strong threshold $\theta$, $0 < \theta \le 1$
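The strength test can be sketched in a few lines. This assumes the classical criterion, i strongly depends on j when $-a_{ij} \ge \theta \max_{k \ne i}(-a_{ik})$, and a matrix stored as a dict of rows; the function name `strong_dependencies` and the storage format are hypothetical choices for illustration.

```python
def strong_dependencies(A, theta=0.25):
    """Map each row i to the set of columns j that i strongly depends on.

    A is a dict {i: {j: a_ij}} (assumed sparse storage).
    Assumed criterion: -a_ij >= theta * max_{k != i} (-a_ik).
    """
    S = {}
    for i, row in A.items():
        # Negated off-diagonal entries of row i.
        off = {j: -a for j, a in row.items() if j != i}
        biggest = max(off.values(), default=0.0)
        if biggest <= 0.0:
            S[i] = set()  # no negative off-diagonal entry, no strong connection
            continue
        S[i] = {j for j, v in off.items() if v >= theta * biggest}
    return S


# Small example with one weak coupling (-0.1) between points 1 and 2.
A = {
    0: {0: 2.0, 1: -1.0},
    1: {0: -1.0, 1: 2.0, 2: -0.1},
    2: {1: -0.1, 2: 2.0},
}
S = strong_dependencies(A, theta=0.25)
print(S)  # {0: {1}, 1: {0}, 2: {1}}
```

Strength is measured row by row, so it need not be symmetric: here point 2 strongly depends on point 1, while point 1 does not strongly depend on point 2.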
AMG coarsening • (C1) Maximal Independent Set: Independent: no two C-points are connected Maximal: if one more C-point is added, the independence is lost • (C2) All F-F connections require connections to a common C-point (for good interpolation) • F-points have to be changed into C-points, to ensure (C2); (C1) is violated → more C-points, higher complexity
Classical Coarsenings • Ruge-Stueben (RS) • two passes: 2nd pass to ensure that F-F have common C • disadvantage: highly sequential • CLJP • based on parallel independent set algorithms developed by Luby and later by Jones & Plassmann • also ensures that F-F have common C • hybrid RS-CLJP (“Falgout”) • RS in processor interiors, CLJP at interprocessor boundaries
Classical coarsenings: complexity growth • example: hybrid RS-CLJP (Falgout), 7-point finite difference Laplacian in 3D, $\theta = 0.25$ • increased memory use, long solution times, long setup times = loss of scalability
Our approach to reduce complexity • do not add C-points for strong F-F connections that do not have a common C-point • fewer C-points, reduced complexity, but worse convergence factors expected • can something be gained?
PMIS coarsening (De Sterck, Yang) • Parallel Modified Independent Set (PMIS) • do not enforce condition (C2) • weighted independent set algorithm: points i that influence many equations ($\lambda_i$ large) are good candidates for C-points • add random number between 0 and 1 to $\lambda_i$ to break ties
PMIS coarsening • pick C-points with maximal measures (like in CLJP), then make all their neighbors fine (like in RS) • proceed until all points are either coarse or fine
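The selection loop described above can be sketched serially as follows. This is a simplified illustration, not the hypre/BoomerAMG implementation: it assumes every point appears as a key of the strength graph, treats "neighbors" as the union of strong dependencies and strong influences, and breaks exact ties by point index. The names `pmis` and `grid_9pt` are hypothetical.

```python
import random


def pmis(strong, seed=0):
    """Split points into (coarse, fine) sets with a PMIS-style selection loop.

    strong: dict {i: set of j that i strongly depends on}; every point
    must appear as a key (assumption of this sketch).
    """
    rng = random.Random(seed)
    influences = {i: set() for i in strong}
    for i, deps in strong.items():
        for j in deps:
            influences[j].add(i)  # j strongly influences i
    nbrs = {i: (strong[i] | influences[i]) - {i} for i in strong}
    # Measure: number of points influenced plus a random number in [0, 1).
    # The index is appended so that exact ties are still ordered.
    measure = {i: (len(influences[i]) + rng.random(), i) for i in sorted(strong)}

    undecided, coarse, fine = set(strong), set(), set()
    while undecided:
        # C-points: undecided points whose measure beats all undecided neighbors.
        new_c = {i for i in undecided
                 if all(measure[i] > measure[j] for j in nbrs[i] & undecided)}
        # F-points: undecided neighbors of the new C-points.
        new_f = set()
        for i in new_c:
            new_f |= nbrs[i] & undecided
        coarse |= new_c
        fine |= new_f
        undecided -= new_c | new_f
    return coarse, fine


def grid_9pt(n):
    """Strength graph of an n x n grid where each point depends on its 8 neighbors."""
    strong = {}
    for r in range(n):
        for c in range(n):
            strong[r * n + c] = {
                rr * n + cc
                for rr in range(max(r - 1, 0), min(r + 2, n))
                for cc in range(max(c - 1, 0), min(c + 2, n))
                if (rr, cc) != (r, c)
            }
    return strong


strong = grid_9pt(7)
coarse, fine = pmis(strong, seed=1)
print(len(coarse), len(fine))
```

On this 7 x 7 grid the measures fall between 3 and 4 at corners, 5 and 6 on edges, and 8 and 9 in the interior. Each pass only compares a point with its own neighbors, which is what makes the selection step suitable for parallel execution.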
PMIS: select 1 • select C-pts with maximal measure locally • make neighbors F-pts • remove neighbor edges
[Figure: 7 x 7 grid of initial point measures]
3.7 5.3 5.0 5.9 5.4 5.3 3.4
5.2 8.0 8.5 8.2 8.6 8.9 5.1
5.9 8.1 8.8 8.9 8.4 8.2 5.9
5.7 8.6 8.3 8.8 8.3 8.1 5.0
5.3 8.7 8.3 8.4 8.3 8.8 5.9
5.0 8.8 8.5 8.6 8.7 8.9 5.3
3.2 5.6 5.8 5.6 5.9 5.9 3.0
PMIS: remove and update 1 • select C-pts with maximal measure locally • make neighbors F-pts • remove neighbor edges [Figure: grid after the first pass, showing the measures of the points still undecided]
PMIS: select 2 • select C-pts with maximal measure locally • make neighbors F-pts • remove neighbor edges [Figure: second selection of C-pts among the points still undecided]
PMIS: remove and update 2 • select C-pts with maximal measure locally • make neighbors F-pts • remove neighbor edges [Figure: grid after the second pass, showing the measures of the points still undecided]
PMIS: final grid • select C-pts with maximal measure locally • make neighbors F-pts • remove neighbor edges
Preliminary results: 7pt 3D Laplacian on an unstructured grid (n = 76,527), serial, $\theta = 0.5$, GS • Implementation in CASC/LLNL’s hypre/BoomerAMG library (Falgout, Yang, Henson, Jones, …)
PMIS results: 27-point finite element Laplacian in 3D, $40^3$ dof per proc (IBM Blue) Falgout ($\theta = 0.5$) and PMIS-GMRES(10) ($\theta = 0.25$)
PMIS results: 7-point finite difference Laplacian in 3D, $40^3$ dof per proc (IBM Blue) Falgout ($\theta = 0.5$) and PMIS-GMRES(10) ($\theta = 0.25$)
Conclusions • PMIS leads to reduced, scalable complexities for large problems on parallel computers • using PMIS-GMRES, large problems can be done efficiently, with good scalability, and without requiring much memory (Blue Gene/L)
Future work • parallel aggressive coarsening, multi-pass interpolation to further reduce complexity (using one-pass RS, PMIS, or hybrid) • improved interpolation formulas for more aggressively coarsened grids (Jacobi improvement, …), to reduce need for GMRES • parallel First-Order System Least-Squares-AMG code for large-scale PDE problems • Blue Gene/L applications