Multilevel Preconditioning Package version 3.1

Multilevel Preconditioning Packageversion 3.1 Trilinos Users Group Meeting November ’04 Marzio Sala Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company,for the United States Department of Energy under contract DE-AC04-94AL85000.

Outline • Basic concepts of multigrid/multilevel methods • Configuring and building ML • Using ML • Interoperability with Trilinos packages • Example of usage • Numerical results • Continuing Research • Documentation, mailing lists, and getting help

ML Package • Provides parallel multilevel preconditioning for linear solver methods • Developers: • Ray Tuminaro, Jonathan Hu, Marzio Sala, Micheal Gee • Main methods • Geometric: • Grid refinement hierarchy • 2-level FE basis function domain decomposition • Algebraic (smoothed aggregation) • Classical smoothed aggregation • Extensions for non-symmetric systems • Edge-element for Maxwell’s equations

Multilevel Preconditioners • We are interesting in solving A x = b, A  nxn, x, b  n arising from FE discretizations on 2D/3D unstructured grids • The linear system is solved using Krylov accelerators (e.g., CG or GMRES) • A preconditioner is required: • Applicable in massively parallel environments • For large scale computations • Multilevel approaches furnish a very effective way to define scalable preconditioners

Multilevel Preconditioners (2) Main idea: • Define a set of approximate solutions, on coarser “grids” • Need a projection (P) and a restriction (R) to move from one “grid” to the finer/coarser “grid” • On each “grid”, approximate solvers (smoothers) will dump the high frequencies of the error • On the coarser “grid”, a direct solver is used Two families of multilevel methods: geometric and algebraic methods

Geometric Multigrid • Need to define a hierarchy of grids (and the corresponding restriction/prolongation operators) • 1D example: • However, for unstructured grids on complex geometries, the definition of the coarse grids can be problematic: • Computation of restriction/prolongator may be computational expensive for completely uncorrelated grids • Difficult to define boundary conditions on coarser grids

Algebraic Multigrid Same Goal: Solve Ax = b • Build multilevel operators (Ak, Pk, Rk, Sk’s) automatically to define a hierarchy: Akxk=bk, k=1,…,L • Once Pk’s are defined, the rest follows “easily”: • Rk = PkT (usually) • Ak-1 = Rk-1 Ak Pk-1 (triple matrix product) • Smoother (iterative method) Sk • Gauss-Seidel, polynomial, conjugate gradient, etc. • Smoothed aggreation (SA) algebraic multigrid • interpolation operators Pk’s are built automatically by aggregation of fine grid nodes • also main difficulty

Smoothed Aggregation • Builds P by creating aggregates (set of contiguous nodes) • Works with structured and unstructured grids • No need for nodal coordinates, based on matrix entries (graph) only

Multilevel (V) Cycle function xk = MLV(Ak,rk) if (k == 1) xk = Ak-1 rk else xk ← Spre,k(Ak,rk,0); qk ← rk – Ak xk rk-1 ← Rk qk xk-1 ← MLV(Ak-1,rk-1) xk ← xk + Pk xk-1 xk ← Spost,k(Ak,qk,xk); end ⇦ Direct (parallel) solver ⇦ presmoothing ⇦ Usually, RK = PKT ⇦ Ak-1 = Rk Ak Pk ⇦ postsmoothing

Overview of ML capabilities

Common Decisions for ML Users

Configuring and Building ML • Builds by default when Trilinos is configured/built • Configure help $ Trilinos/packages/ml/configure --help • ML interfaces with a variety of packages • within Trilinos • outside Trilinos • Check out the supported packages you may need!

Configuring and Building ML (2)

Matrix Format • ML is not based on any matrix format, you just need • matvec() • getrow() • returns the nonzero indices and values for any local row • Wrappers exist for • Aztec matrices (DMSR, DVBR) • Epetra matrices • Any ML object can be (cheaply) wrapped as an Epetra object and vice-versa

ML Aggregation Choices • MIS • Expensive, usually best with many processors • Uncoupled (UC) • Cheap, usually works well. • Hybrid = UC + MIS • Uncoupled on fine grids, MIS on coarser grids • Local graph partitioning • At least one aggregate per processor • Requires METIS • Global graph partitioning • Aggregates can span processors • Requires ParMETIS Fixed Aggregate size Variable Aggregate size

ML Aggregation Choices (2) Small aggregates (Uncoupled/MIS) Large aggregates (METIS/ParMETIS)

ML Aggregation Choices: METIS Fine level mesh: 32 x 32 nodes 16 aggregates 4 Aggregates on second level

ML Aggregation Choices: Uncoupled/MIS • Uncoupled/MIS aggregation (Greedy algorithm) • parallel can be complicated

ML Smoother Choices • A smoother is an approximate solver that effectively dumps the high-frequencies of the error After smoothing Before smoothing

ML Smoother Choices (2) • Jacobi • Simplest, cheapest, usually least effective. • Damping parameter () needed • Point Gauss-Seidel • Equation satisfied one unknown at a time • Can be problematic in parallel (may need damping) • processor-based (stale off-proc values) • Block Gauss-Seidel • Satisfy several equations simultaneously by modifying several DOFs (inverting subblock associated with DOFs). • Blocks can correspond to aggregates

ML Smoother Choices (3) • AztecOO • Any Aztec preconditioner can be used as smoother. • Most interesting are ILU & ILUT: may be more robust than Gauss-Seidel • IFPACK: • Block Jacobi and (symmetric) Gauss-Seidel • Incomplete factorizations • Exact LU solvers (through Amesos) • Hiptmair • Specialized 2-stage smoother for Maxwell’s Eqns. • MLS • Approximation to inverse based on Chebyshev polynomials of smoothed operator. • In serial, competitive with true Gauss-Seidel • Doesn’t degrade with # processors (unlike processor-based GS)

Coarsest-level Solvers • Any smoother can be used at the coarse level… • … but a direct method should be preferred: • Incomplete factorizations (via Aztec) • Serial or distributed SuperLU • Amesos interface: • LAPACK, KLU, UMFPACK, SuperLU • SuperLU_DIST, MUMPS • Using Amesos with SuperLU_DIST or MUMPS, one can specify the number of processors

ML Cycling Choices W Cycle FMV Cycle • “Multigrid” options: • V is default (usually works best) • W more expensive, may be more robust • Full MG (V cycle) more expensive (less conventional within preconditioners) • “Domain decomposition” options (for 2-level methods): • Additive methods (requires no matvec in the preconditioning step and only one application of the smoother) • Various hybrid methods (generally more effective, but more expensive)

Multilevel Methods: Issues to be aware of • Severely stretched grids or anisotropies • Loss of diagonal dominance • Atypical stencils • Jumps in material properties • Non-symmetric matrices • Boundary conditions • Systems of PDEs • Non-trivial null space (Maxwell’s equations, elasticity)

What can go wrong? • Small aggregates  high complexity • Large aggregates  poor convergence • Different size aggregates  both • Try different aggregation methods, drop tolerances • Stretched grids  poor convergence • Variable regions  poor convergence • Try different smoothers, drop tolerances • Ineffective smoothing  poor convergence (perhaps due to non-diagonal dominance or non-symmetry in operator) • Try different smoothers

Things to Try if Multigrid Isn’t Working • Smoothers • Vary number of smoothing steps. • More `robust’ smoothers: block Gauss-Seidel, ILU, MLS polynomial (especially if degradation in parallel). • Vary damping parameters (smaller is more conservative). • Try different aggregation schemes • Try fewer levels • Try drop tolerances, if … • high complexity (printed out by ML). • Severely stretched grids • Anisotropic problems • Variable regions • Reduce prolongator damping parameter • Concerned about operators properties (e.g. highly non-symmetric).

Analyzing ML matrices *** Analysis of ML_Operator `A matrix level 0' *** Number of global rows = 256 Number of equations = 1 Number of stored elements = 1216 Number of nonzero elements = 1216 Mininum number of nonzero elements/row = 3 Maximum number of nonzero elements/row = 5 Average number of nonzero elements/rows = 4.750000 Nonzero elements in strict lower part = 480 Nonzero elements in strict upper part = 480 Max |i-j|, a(i,j) != 0 = 16 Number of diagonally dominant rows = 86 (= 33.59%) Number of weakly diagonally dominant rows = 67 (= 26.17%) Number of Dirichlet rows = 0 (= 0.00%) ||A||_F = 244.066240 Min_{i,j} ( a(i,j) ) = -14.950987 Max_{i,j} ( a(i,j) ) = 15.208792 Min_{i,j} ( abs(a(i,j)) ) = 0.002890 Max_{i,j} ( abs(a(i,j)) ) = 15.208792 Min_i ( abs(a(i,i)) ) = 2.004640 Max_i ( abs(a(i,i)) ) = 15.208792 Min_i ( \sum_{j!=i} abs(a(i,j)) ) = 2.004640 Max_i ( \sum_{j!=i} abs(a(i,j)) ) = 15.205902 max eig(A) (using power method) = 27.645954 max eig(D^{-1}A) (using power method) = 1.878674 Total time for analysis = 3.147979e-03 (s)

Analyzing ML Preconditioner *** ************************************************ *** *** Analysis of the spectral properties using LAPACK *** *** ************************************************ *** *** Operator = A *** Computing eigenvalues of finest-level matrix *** using LAPACK. This may take some time... *** results are on file `eig_A.m'. min |lambda_i(A)| = 0.155462 max |lambda_i(A)| = 26.3802 spectral condition number = 169.689 *** ************************************************ *** *** Analysis of the spectral properties using LAPACK *** *** ************************************************ *** *** Operator = P^{-1}A *** Computing eigenvalues of ML^{-1}A *** using LAPACK. This may take some time... *** results are on file `eig_PA.m'. min |lambda_i(ML^{-1}A)| = 0.776591 max |lambda_i(ML^{-1}A)| = 1.09315 spectral condition number = 1.40762

ML Interoperability with Trilinos Packages Via Epetra & TSF Accepts user data as Epetra objects Aztecoo TSF Can be wrapped as Epetra_Operator ML Meros TSF interface exists Epetra Amesos IFPACK Anasazi Direct solvers, Smoothers, eigensolver Other solvers Accepts other solvers and MatVecs Other matvecs

ML with Epetra Two classes wrap ML preconditioners as Epetra preconditioners: • ML_Epetra::MultiLevelPreconditioner • A very easy, black-box interface to ML • Can be used as AztecOO preconditioner • All parameters specified using ParameterList • Still not ready for Maxwell users • ML_Epetra::MultilLevelOperator • Users need to build the preconditioner step-by-step • Ready for Maxwell users

ML with Epetra: Example of use List.set(“max levels”, 4); List.set(“smoother (level 0)”, “Jacobi”); List.set(“damping factor”, 0.5); List.set(“smoother (level 1)”, “IFPACK”); List.set(“coarse: type”, “Amesos-MUMPS”); List.set(“coarse: max procs”, 4); List.set(“output level”, 10); Trilinos_Util_CrsMatrixGallery Gallery(“laplace_3d", Comm); Gallery.Set("problem_size", 100*100*100); // linear system matrix & linear problem Epetra_RowMatrix * A = Gallery.GetMatrix(); Epetra_LinearProblem * Problem = Gallery.GetLinearProblem(); // Construct outer solver object AztecOO solver(*Problem); solver.SetAztecOption(AZ_solver, AZ_cg); // Set up multilevel precond. with smoothed aggr. defaults ParameterList MLList; // parameter list for ML options ML_Epetra::SetDefaults(“SA”,MLList); MLList.set(“aggregation: type”, “METIS”); // create preconditioner ML_Epetra::MultiLevelPreconditioner * MLPrec = new ML_Epetra::MultiLevelPreconditioner(*A, MLList, true); solver.SetPrecOperator(MLPrec); // set preconditioner solver.Iterate(500, 1e-12); // iterate at most 500 times delete MLPrec; Triutils AztecOO ML AztecOO Trilinos/packages/ml/examples/ml_example_MultiLevelPreconditioner.cpp

Numerical Experiments: MPSalsa • 3D Thermal convection in a cube • Incompressible Navier-Stokes + energy • Smoother: processor-based ILU • Coarse solver: SuperLU • Sandia ASCI Red machine

Numerical Experiment: Zpinch simulation CG preconditioned with V(1,1) AMG cycle ||r||2/||r0||2 < 10-8 2-stage Hiptmair smoother (4th order polynomials in each stage) Edge element interpolation Platform : Sandia’s Cplant

On-going Research • Smoothed aggregation for non-symmetric systems • Aggregation schemes for stretched elements • New smoothers (via IFPACK) • Compatible relaxation for smoothed aggregation • Adaptive smoothed aggregation • Starts with existing AMG method • Iteratively detects slow-to-converge modes (e.g., rigid body modes) • Goal: Improve prolongator during simulation • Variable-block aggregation • Nonlinear preconditioning

Getting Help • See Trilinos/packages/ml/examples • See guide: • ML User’s Guide, ver. 3.1 (in ml/doc/mlguide.ps) • ML Developer’s Guide, ver. 3.1 (in ml/doc/DevelopersGuide) • Mailing lists • ml-users@software.sandia.gov • ml-announce@software.sandia.gov • Bug reporting, enhancement requests via bugzilla: • http://software.sandia.gov/bugzilla • Email us directly • jhu@sandia.gov • rstumin@sandia.gov • msala@sandia.gov • mwgee@sandia.gov

Multilevel Preconditioning Package version 3.1

Multilevel Preconditioning Package version 3.1

Presentation Transcript

Ischemic Preconditioning (IPC)

Comparisons of Global Retrieved Emissivity for Version 3.1 and Version 3.5

Myocardial preconditioning

Replacing Steam Preconditioning

IFPACK: Robust Algebraic Preconditioning Package

Lattice3D Reporter Tutorial For Version 3.1 and later

ML: Multilevel Preconditioning Package

Modeling Drop Structures in HEC-RAS Version 3.1

Ischemic Preconditioning (IPC)

MLD2P4: a package of parallel algebraic multilevel Preconditioners

HEC-RAS Version 3.1 Tributary Junctions

3.1 version 3

Version 3.1

Preconditioning in Expectation

ML: A Multilevel Preconditioning Package

Modeling Inline Structures using HEC-RAS Version 3.1

HEC-RAS Version 3.1 Troubleshooting

HEC-RAS Version 3.1 Tributary Junctions