Reducing Complexity in Algebraic Multigrid

Hans De Sterck, Department of Applied Mathematics, University of Colorado at Boulder
Ulrike Meier Yang, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory

Outline • introduction: AMG • complexity growth when using classical coarsenings • Parallel Modified Independent Set (PMIS) coarsening • scaling results • conclusions and future work

Introduction • solve $Au = f$ • from 3D PDE – sparse! • large problems ($10^9$ dof) – parallel • unstructured grid problems

Algebraic Multigrid (AMG) • multi-level • iterative • algebraic: suitable for unstructured grids!

AMG building blocks Setup Phase: • Select coarse “grids” • Define interpolation • Define restriction and coarse-grid operators Solve Phase

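AMG complexity - scalability • Operator complexity: $C_{op} = \sum_l \mathrm{nnz}(A_l) / \mathrm{nnz}(A_0)$, the number of nonzeros in the operators on all levels divided by the number of nonzeros in the fine-grid operator; e.g., in 3D with a factor-8 reduction per level: $C_{op} = 1 + 1/8 + 1/64 + \dots < 8/7$ • $C_{op}$ is a measure of memory use and of work in the solve phase • scalable algorithm: O(n) operations per V-cycle ($C_{op}$ bounded) AND number of V-cycles independent of n (convergence factor $\rho_{AMG}$ independent of n)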
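
As a rough illustration of the operator complexity defined above, the sketch below computes $C_{op}$ from a list of per-level nonzero counts. The function name `operator_complexity` and both hierarchies are assumptions made for this example: the first is the idealized 3D series from the slide, the second uses made-up counts to show what slow coarsening does to the ratio.

```python
def operator_complexity(nnz_per_level):
    """Total nonzeros over all levels divided by nonzeros on the finest level.

    nnz_per_level[0] is the fine-grid operator; later entries are coarser levels.
    """
    if not nnz_per_level or nnz_per_level[0] <= 0:
        raise ValueError("need a positive nonzero count for the finest level")
    return sum(nnz_per_level) / nnz_per_level[0]


# Idealized 3D hierarchy (assumption): every coarse operator has 1/8 of the
# nonzeros of the level above it, giving the series 1 + 1/8 + 1/64 + ...
ideal_levels = [8 ** k for k in range(6, -1, -1)]  # 8^6, 8^5, ..., 8^0
c_ideal = operator_complexity(ideal_levels)

# Hypothetical hierarchy in which coarse operators shrink slowly.
# These counts are made up for illustration, not measured.
growing_levels = [1000, 900, 600]
c_growing = operator_complexity(growing_levels)

print(f"ideal 3D hierarchy:  C_op = {c_ideal:.4f} (bound 8/7 = {8 / 7:.4f})")
print(f"hypothetical growth: C_op = {c_growing:.2f}")
```

A bounded ratio close to 1 means the whole hierarchy costs little more than the fine-grid operator alone; a ratio that grows with problem size is the loss of scalability discussed in the slides that follow.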

AMG Interpolation • after relaxation: $Ae \approx 0$ (relative to $e$) • heuristic: error after interpolation should also satisfy this relation approximately • derive interpolation from row $i$ of this relation: $a_{ii} e_i + \sum_{j \neq i} a_{ij} e_j \approx 0$

AMG interpolation • “large” $a_{ij}$ should be taken into account accurately • “strong connections”: $i$ strongly depends on $j$ (and $j$ strongly influences $i$) if $-a_{ij} \geq \theta \max_{k \neq i} (-a_{ik})$, with strong threshold $0 < \theta \leq 1$
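
The strong-connection test can be sketched in a few lines. The code below assumes the classical criterion, in which $i$ strongly depends on $j$ when $-a_{ij}$ is at least $\theta$ times the largest negated off-diagonal entry in row $i$. The function name `strong_dependencies`, the dense list-of-lists storage, and the 3 × 3 example matrix are assumptions made for illustration only.

```python
def strong_dependencies(A, theta=0.25):
    """For each row i, return the set of j that i strongly depends on.

    Assumed criterion: -a_ij >= theta * max_{k != i} (-a_ik).
    A is a dense square matrix stored as a list of lists.
    """
    n = len(A)
    strong = []
    for i in range(n):
        negated = [-A[i][k] for k in range(n) if k != i]
        largest = max(negated, default=0.0)
        if largest <= 0.0:
            # Row has no negative off-diagonal entries: no strong dependencies.
            strong.append(set())
            continue
        strong.append({j for j in range(n)
                       if j != i and -A[i][j] >= theta * largest})
    return strong


# Hypothetical example: point 1 couples strongly to point 0 (-1.0)
# and only weakly to point 2 (-0.1).
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -0.1],
     [0.0, -0.1, 2.0]]
S = strong_dependencies(A, theta=0.25)
print(S)
```

The criterion is relative to each row, so the same entry can be weak in one row and strong in another: here point 1 does not strongly depend on point 2, while point 2 strongly depends on point 1 because -0.1 is the largest coupling in its own row.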

AMG coarsening • (C1) Maximal Independent Set: Independent: no two C-points are connected; Maximal: if one more C-point is added, the independence is lost • (C2) All F-F connections require connections to a common C-point (for good interpolation) • F-points have to be changed into C-points to ensure (C2); (C1) is then violated → more C-points, higher complexity

Classical Coarsenings • Ruge-Stüben (RS): two passes, the 2nd pass ensures that F-F connections have a common C-point; disadvantage: highly sequential • CLJP (Cleary-Luby-Jones-Plassmann): based on parallel independent set algorithms developed by Luby and later by Jones & Plassmann; also ensures that F-F connections have a common C-point • hybrid RS-CLJP (“Falgout”): RS in processor interiors, CLJP at interprocessor boundaries

Classical coarsenings: complexity growth • example: hybrid RS-CLJP (Falgout), 7-point finite difference Laplacian in 3D, $\theta = 0.25$ • increased memory use, long solution times, long setup times = loss of scalability

Our approach to reduce complexity • do not add C-points for strong F-F connections that do not have a common C-point • fewer C-points, reduced complexity, but worse convergence factors expected • can something be gained?

PMIS coarsening (De Sterck, Yang) • Parallel Modified Independent Set (PMIS) • do not enforce condition (C2) • weighted independent set algorithm: points $i$ that influence many equations ($\lambda_i$ large) are good candidates for C-points • add a random number between 0 and 1 to $\lambda_i$ to break ties

PMIS coarsening • pick C-points with maximal measures (like in CLJP), then make all their neighbors fine (like in RS) • proceed until all points are either coarse or fine
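
The selection loop described above can be sketched serially as follows. This is a minimal illustration under stated assumptions, not the BoomerAMG implementation: the strength graph is assumed symmetric and given as a dictionary of neighbor sets, the measure is the neighbor count plus a random number in [0, 1), exact ties are broken by point index, and a point with no connections simply ends up as a C-point. The names `pmis` and `grid_graph` are hypothetical.

```python
import random


def pmis(neighbors, seed=0):
    """Split points into coarse (C) and fine (F) sets.

    neighbors: dict mapping point index -> set of strongly connected points
    (assumed symmetric). Returns (coarse, fine).
    """
    rng = random.Random(seed)
    # Measure = number of connections + random tie-breaker; the index is a
    # second tie-breaker so that the loop always makes progress.
    measure = {i: (len(nbrs) + rng.random(), i) for i, nbrs in neighbors.items()}
    undecided = set(neighbors)
    coarse, fine = set(), set()
    while undecided:
        # C-points: measure larger than that of every undecided neighbor.
        new_c = {i for i in undecided
                 if all(measure[i] > measure[j]
                        for j in neighbors[i] if j in undecided)}
        # F-points: undecided neighbors of the new C-points.
        new_f = {j for i in new_c for j in neighbors[i] if j in undecided}
        coarse |= new_c
        fine |= new_f
        undecided -= new_c | new_f
    return coarse, fine


def grid_graph(rows, cols):
    """5-point-stencil connectivity on a rows x cols grid (example input)."""
    nbrs = {r * cols + c: set() for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (0, 1)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    nbrs[r * cols + c].add(rr * cols + cc)
                    nbrs[rr * cols + cc].add(r * cols + c)
    return nbrs


graph = grid_graph(4, 4)
coarse, fine = pmis(graph, seed=1)
print("C-points:", sorted(coarse))
print("F-points:", sorted(fine))
```

Each pass only compares a point with its own neighbors, which is what makes the selection suitable for parallel execution: every processor can decide its local maxima after exchanging measures along the boundary.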

PMIS: select 1 • select C-pts with maximal measure locally • make neighbors F-pts • remove neighbor edges
Measures on the 7 × 7 example grid (from the slide figure):
3.7 5.3 5.0 5.9 5.4 5.3 3.4
5.2 8.0 8.5 8.2 8.6 8.9 5.1
5.9 8.1 8.8 8.9 8.4 8.2 5.9
5.7 8.6 8.3 8.8 8.3 8.1 5.0
5.3 8.7 8.3 8.4 8.3 8.8 5.9
5.0 8.8 8.5 8.6 8.7 8.9 5.3
3.2 5.6 5.8 5.6 5.9 5.9 3.0

PMIS: remove and update 1 • select C-pts with maximal measure locally • make neighbors F-pts • remove neighbor edges
[Figure: grid after the first pass; measures of the points still undecided: 3.7, 5.3, 5.0, 5.9, 5.2, 8.0, 5.9, 8.1, 5.7, 8.6, 8.1, 5.0, 8.4, 8.6, 5.6]

PMIS: select 2 • select C-pts with maximal measure locally • make neighbors F-pts • remove neighbor edges
[Figure: second selection pass; measures of the undecided points: 5.9, 3.7, 5.3, 5.0, 5.2, 8.0, 5.9, 8.1, 5.7, 8.6, 8.1, 5.0, 8.4, 8.6, 5.6]

PMIS: remove and update 2 • select C-pts with maximal measure locally • make neighbors F-pts • remove neighbor edges
[Figure: grid after the second pass; measures of the points still undecided: 3.7, 5.3, 5.2, 8.0]

PMIS: final grid • select C-pts with maximal measure locally • make neighbors F-pts • remove neighbor edges
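
The walk-through on the preceding slides can be replayed with the 49 measures listed on the "PMIS select 1" slide. The connectivity is not stated in the transcript; the sketch assumes a 9-point stencil (each point connected to its up to eight surrounding grid points), which is consistent with the integer parts 3, 5, and 8 of the corner, edge, and interior measures. The names `grid_neighbors` and `pmis_passes` are hypothetical, and positions are 0-based (row, column) pairs.

```python
# Measures copied from the "PMIS select 1" slide (7 x 7 grid, row by row).
MEASURES = [
    [3.7, 5.3, 5.0, 5.9, 5.4, 5.3, 3.4],
    [5.2, 8.0, 8.5, 8.2, 8.6, 8.9, 5.1],
    [5.9, 8.1, 8.8, 8.9, 8.4, 8.2, 5.9],
    [5.7, 8.6, 8.3, 8.8, 8.3, 8.1, 5.0],
    [5.3, 8.7, 8.3, 8.4, 8.3, 8.8, 5.9],
    [5.0, 8.8, 8.5, 8.6, 8.7, 8.9, 5.3],
    [3.2, 5.6, 5.8, 5.6, 5.9, 5.9, 3.0],
]


def grid_neighbors(r, c, n):
    """Assumed 9-point stencil: the up to eight surrounding grid points."""
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and 0 <= r + dr < n and 0 <= c + dc < n]


def pmis_passes(measures):
    """Return the C-points chosen in each pass, as sorted (row, col) lists."""
    n = len(measures)
    undecided = {(r, c) for r in range(n) for c in range(n)}
    passes = []
    while undecided:
        # A point becomes a C-point if it beats all undecided neighbors.
        new_c = sorted(
            p for p in undecided
            if all(measures[p[0]][p[1]] > measures[q[0]][q[1]]
                   for q in grid_neighbors(p[0], p[1], n) if q in undecided))
        if not new_c:
            # Only possible if neighboring measures tie exactly.
            raise ValueError("tie between neighboring measures")
        # Undecided neighbors of the new C-points become F-points.
        new_f = {q for p in new_c
                 for q in grid_neighbors(p[0], p[1], n) if q in undecided}
        undecided -= set(new_c) | new_f
        passes.append(new_c)
    return passes


passes = pmis_passes(MEASURES)
for k, pts in enumerate(passes, start=1):
    print(f"pass {k}: C-points at (row, col) = {pts}")
print("total C-points:", sum(len(pts) for pts in passes), "of 49")
```

Under the stencil assumption the loop finishes in three passes and marks 9 of the 49 points as coarse.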

Preliminary results: 7-point 3D Laplacian on an unstructured grid (n = 76,527), serial, $\theta = 0.5$, GS • Implementation in CASC/LLNL’s hypre/BoomerAMG library (Falgout, Yang, Henson, Jones, …)

PMIS results: 27-point finite element Laplacian in 3D, $40^3$ dof per proc (IBM Blue) Falgout ($\theta = 0.5$) and PMIS-GMRES(10) ($\theta = 0.25$)

PMIS results: 7-point finite difference Laplacian in 3D, $40^3$ dof per proc (IBM Blue) Falgout ($\theta = 0.5$) and PMIS-GMRES(10) ($\theta = 0.25$)

Conclusions • PMIS leads to reduced, scalable complexities for large problems on parallel computers • using PMIS-GMRES, large problems can be solved efficiently, with good scalability, and without requiring much memory (Blue Gene/L)

Future work • parallel aggressive coarsening, multi-pass interpolation to further reduce complexity (using one-pass RS, PMIS, or hybrid) • improved interpolation formulas for more aggressively coarsened grids (Jacobi improvement, …), to reduce need for GMRES • parallel First-Order System Least-Squares-AMG code for large-scale PDE problems • Blue Gene/L applications
