
Applications of Algebraic Multigrid to Large Scale Mechanics Problems


Presentation Transcript


  1. Applications of Algebraic Multigrid to Large Scale Mechanics Problems Mark F. Adams 22 October 2004

  2. Outline • Algebraic multigrid (AMG) introduction • Industrial applications • Micro-FE bone modeling • Olympus Parallel FE framework • Scalability studies on IBM SPs • Scaled speedup • Plain speedup • Nodal performance

  3. The Multigrid V-cycle • Diagram: finest grid → restriction (R) to a smaller first coarse grid → prolongation (P = R^T) back to the fine grid, with smoothing at each level • Multigrid = smoothing plus coarse grid correction (projection)

  4. Multigrid V(n1,n2)-cycle • Function u = MG-V(A, f) • if A is small • u ← A^-1 f • else • u ← S^n1(f, u) -- n1 steps of smoother (pre) • r_H ← P^T (f − A u) • u_H ← MG-V(P^T A P, r_H) -- recursion (Galerkin) • u ← u + P u_H • u ← S^n2(f, u) -- n2 steps of smoother (post) • Iteration matrix: T = S (I − P (R A P)^-1 R A) S • multiplicative
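A minimal runnable sketch of this V-cycle in Python/SciPy follows. The names (mg_v, jacobi, Ps) are illustrative, not the Prometheus/PETSc interfaces; a damped-Jacobi smoother stands in for S, and the coarse operators, built once per matrix in the real setup phase, are recomputed here for brevity.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def jacobi(A, b, u, n_steps, omega=2.0/3.0):
    """n_steps of damped Jacobi smoothing (stand-in for the smoother S)."""
    Dinv = 1.0 / A.diagonal()
    for _ in range(n_steps):
        u = u + omega * Dinv * (b - A @ u)
    return u

def mg_v(A, b, Ps, level=0, n1=2, n2=2):
    """One V(n1,n2)-cycle: u = MG-V(A, f) as on the slide.

    Ps[l] is the prolongator from level l+1 (coarse) to level l (fine),
    built e.g. by smoothed aggregation."""
    if level == len(Ps):                        # A is small: solve directly
        return spla.spsolve(A.tocsc(), b)
    u = jacobi(A, b, np.zeros_like(b), n1)      # pre-smoothing
    P = Ps[level]
    rH = P.T @ (b - A @ u)                      # restrict residual, R = P^T
    AH = (P.T @ A @ P).tocsr()                  # Galerkin coarse operator
    uH = mg_v(AH, rH, Ps, level + 1, n1, n2)    # recursion
    u = u + P @ uH                              # prolongate the correction
    return jacobi(A, b, u, n2)                  # post-smoothing
```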

  5. Smoothed Aggregation • Diagram: kernel vectors B and plain-aggregation P0 • Coarse grid space & smoother → MG method • Piecewise constant functions: “plain” aggregation (P0) • Start with kernel vectors B of the operator • e.g., 6 RBMs in elasticity • Nodal aggregation • “Smoothed” aggregation: lower the energy of the functions • One Jacobi iteration: P ← (I − ω D^-1 A) P0
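The sketch below builds a plain-aggregation P0 and applies the one-step Jacobi smoothing above for a scalar problem (one kernel vector, the constant). The helper names and the choice ω = (4/3)/ρ(D^-1 A) are illustrative assumptions; a production smoothed-aggregation code (e.g. Prometheus) would also carry the 6 rigid-body modes for 3D elasticity.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def plain_aggregation_P0(aggregate_of, n_aggs):
    """P0: piecewise-constant prolongator; aggregate_of[i] = aggregate of node i."""
    n = len(aggregate_of)
    return sp.csr_matrix((np.ones(n), (np.arange(n), aggregate_of)),
                         shape=(n, n_aggs))

def smooth_prolongator(A, P0, omega=None):
    """One damped-Jacobi smoothing step: P = (I - omega * D^{-1} A) P0."""
    Dinv = sp.diags(1.0 / A.diagonal())
    DinvA = (Dinv @ A).tocsr()
    if omega is None:
        # common choice: omega = (4/3) / rho(D^{-1} A), rho estimated cheaply
        rho = abs(spla.eigs(DinvA, k=1, return_eigenvectors=False)[0])
        omega = (4.0 / 3.0) / rho
    return (P0 - omega * (DinvA @ P0)).tocsr()
```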

  6. Parallel Smoothers • CG/Jacobi: additive (requires damping for MG) • Damped by CG (Adams SC1999) • Dot products, non-stationary • Gauss-Seidel: multiplicative (optimal MG smoother) • Complex communication and computation (Adams SC2001) • Polynomial smoothers: additive • Chebyshev is ideal for multigrid smoothing • Chebyshev chooses p(λ) such that |1 − λ p(λ)| = min over the interval [λ*, λ_max] • Estimate of λ_max is easy • Use λ* = λ_max / C (no need for the lowest eigenvalue) • C related to the rate of grid coarsening
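A hedged sketch of such a Chebyshev smoother: the three-term recurrence is the standard Chebyshev semi-iteration targeting [λ_max/C, λ_max]. The power-method estimate of λ_max, the 1.1 safety factor, and C = 10 are my illustrative choices, not values from the talk.

```python
import numpy as np

def estimate_lambda_max(A, n_iter=10, seed=0):
    """Crude power-method estimate of the largest eigenvalue of A."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    for _ in range(n_iter):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ (A @ x)          # Rayleigh quotient of the normalized iterate

def chebyshev_smooth(A, b, u, degree=2, C=10.0, lmax=None):
    """degree steps of Chebyshev smoothing on A u = b over [lmax/C, lmax]."""
    if lmax is None:
        lmax = 1.1 * estimate_lambda_max(A)   # small safety margin
    lmin = lmax / C                           # C ~ rate of grid coarsening
    theta, delta = 0.5 * (lmax + lmin), 0.5 * (lmax - lmin)
    sigma = theta / delta
    rho = 1.0 / sigma
    r = b - A @ u
    d = r / theta
    for _ in range(degree):
        u = u + d
        r = r - A @ d
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return u
```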

  7. Outline • Algebraic multigrid (AMG) introduction • Industrial applications • Micro-FE bone modeling • Olympus Parallel FE framework • Scalability studies on IBM SPs • Scaled speedup • Plain speedup • Nodal performance

  8. Aircraft carrier • 315,444 vertices • Shell and beam elements (6 DOF per node) • Linear dynamics – transient (time domain) • About 1 min. per solve (rtol = 10^-6) • 2.4 GHz Pentium 4/Xeon processors • Matrix-vector product runs at 254 Mflops

  9. Solve and setup times (26 Sun processors)

  10. Adagio: “BR” tire • ADAGIO: Quasi-static solid mechanics app (Sandia) • Nearly incompressible visco-elasticity (rubber) • Augmented Lagrange formulation w/ Uzawa-like update • Contact (impenetrability constraint) • Saddle point solution scheme: • Uzawa-like iteration (pressure) w/ contact search • Non-linear CG (with linear constraints and constant pressure) • Preconditioned with • Linear solvers (AMG, FETI, …) • Nodal (Jacobi)

  11. “BR” tire

  12. Displacement history

  13. Outline • Algebraic multigrid (AMG) introduction • Industrial applications • Micro-FE bone modeling • Olympus Parallel FE framework • Scalability studies on IBM SPs • Scaled speedup • Plain speedup • Nodal performance

  14. Trabecular Bone • Image: 5-mm cube of bone, with cortical bone and trabecular bone labeled

  15. Methods: FE modeling Mechanical Testing E, yield, ult, etc. 3D image FE mesh Micro-Computed Tomography CT @ 22 m resolution 2.5 mm cube 44 m elements

  16. Outline • Algebraic multigrid (AMG) introduction • Industrial applications • Micro-FE bone modeling • Olympus Parallel FE framework • Scalability studies on IBM SPs • Scaled speedup • Plain speedup • Nodal performance

  17. Computational Architecture • Diagram: FE mesh input file → Athena (with ParMetis) partitions to SMPs → per-SMP FE input files (in memory) → FEAP instances with material card → pFEAP/Olympus → Prometheus (with METIS/ParMetis) and PETSc; Silo DB files feed Visit for visualization • Athena: parallel FE • ParMetis: parallel mesh partitioner (University of Minnesota) • Prometheus: multigrid solver • FEAP: serial general-purpose FE application (University of California) • PETSc: parallel numerical libraries (Argonne National Labs)
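Purely for orientation, a schematic (and entirely hypothetical) driver showing how these pieces fit together; none of the function names below are the real FEAP, ParMetis, Prometheus, or PETSc interfaces.

```python
def olympus_solve(mesh_input_file, material_card):
    """Hypothetical pseudocode for the Olympus/Athena pipeline above."""
    # Athena: read the FE mesh and partition it across SMP nodes (ParMetis)
    parts = athena_partition(mesh_input_file, partitioner="ParMetis")
    # One serial FEAP instance per partition builds its local FE problem
    local_problems = [feap_setup(p, material_card) for p in parts]
    # pFEAP glues the local FEAP problems into a global parallel system
    A, b = pfeap_assemble(local_problems)
    # Prometheus: smoothed-aggregation AMG preconditioner, PETSc Krylov solve
    return prometheus_amg_solve(A, b)
```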

  18. Outline • Algebraic multigrid (AMG) introduction • Industrial applications • Micro-FE bone modeling • Olympus Parallel FE framework • Scalability studies on IBM SPs • Scaled speedup • Plain speedup • Nodal performance

  19. Scalability • Inexact Newton • CG linear solver • Variable tolerance • Smoothed aggregation AMG preconditioner • Nodal block diagonal smoothers: • 2nd order Chebyshev (additive) • Gauss-Seidel (multiplicative) • Image: 80 µm mesh w/o shell
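A sketch of what “inexact Newton with variable tolerance” can look like. The Eisenstat-Walker-style forcing term eta = min(0.1, sqrt(||r||/||r0||)) is an assumption, as the slide does not say which rule is used, and linear_solve stands in for the AMG-preconditioned CG solver.

```python
import numpy as np

def inexact_newton(residual, jacobian, u0, linear_solve,
                   rtol=1e-6, max_it=20):
    """Newton loop whose linear tolerance tightens as the residual drops."""
    u = u0.copy()
    r = residual(u)
    r0_norm = r_norm = np.linalg.norm(r)
    for _ in range(max_it):
        if r_norm <= rtol * r0_norm:
            break
        # loose tolerance early, tighter near convergence (assumed rule)
        eta = min(0.1, np.sqrt(r_norm / r0_norm))
        du = linear_solve(jacobian(u), -r, rtol=eta)  # e.g. AMG-preconditioned CG
        u = u + du
        r = residual(u)
        r_norm = np.linalg.norm(r)
    return u
```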

  20. 80 µm w/ shell Vertebral Body With Shell • Large deformation elast. • 6 load steps (3% strain) • Scaled speedup • ~131K dof/processor • 7 to 537 million dof • 4 to 292 nodes • IBM SP Power3 • 15 of 16 procs/node used • Double/Single Colony switch

  21. Computational phases • Mesh setup (per mesh): • Coarse grid construction (aggregation) • Graph processing • Matrix setup (per matrix): • Coarse grid operator construction • Sparse matrix triple product RAP (expensive for S.A.) • Subdomain factorizations • Solve (per RHS): • Matrix vector products (residuals, grid transfer) • Smoothers (Matrix vector products)
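The coarse grid operator construction named above is the Galerkin triple product; with restriction R = P^T it is one line in SciPy terms, as sketched below. Smoothed aggregation makes P denser than plain aggregation's P0, which is why this product dominates the matrix-setup cost.

```python
import scipy.sparse as sp

def galerkin_coarse_operator(A, P):
    """Sparse triple product R A P with R = P^T (the 'RAP' of the slide)."""
    return (P.T @ A @ P).tocsr()
```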

  22. Linear solver iterations

  23. 131K dof/proc - flops/sec/proc • 0.47 Teraflop/s on 4088 processors

  24. End-to-end times and (in)efficiency components

  25. Sources of scale inefficiencies in solve phase

  26. 164K dof/proc

  27. First try: flop rates (265K dof/processor) • Chart: bisection bandwidth • 265K dof per proc • IBM switch bug • Bisection bandwidth plateau at 64-128 nodes • Solution: use more processors • Fewer dof per proc • Less pressure on the switch

  28. Outline • Algebraic multigrid (AMG) introduction • Industrial applications • Micro-FE bone modeling • Olympus Parallel FE framework • Scalability studies on IBM SPs • Scaled speedup • Plain speedup • Nodal performance

  29. Speedup with 7.5M dof problem (1 to 128 nodes)

  30. Outline • Algebraic multigrid (AMG) introduction • Industrial applications • Micro-FE bone modeling • Olympus Parallel FE framework • Scalability studies on IBM SPs • Scaled speedup • Plain speedup • Nodal performance

  31. Nodal Performance of IBM SP Power3 and Power4 • IBM Power3, 16 processors per node • 375 MHz, 4 flops per cycle • 16 GB/sec bus (~7.9 GB/sec with the STREAM benchmark) • Implies ~1.5 Gflop/s memory-bandwidth peak per node for Mat-Vec • We get ~1.2 Gflop/s (15 procs × 0.08 Gflop/s) • IBM Power4, 32 processors per node • 1.3 GHz, 4 flops per cycle • Complex memory architecture
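A back-of-the-envelope consistent with these numbers, under assumptions of mine rather than the slide's: CSR-like storage streams an 8-byte value plus a 4-byte column index per nonzero for 2 flops (multiply + add), with vector traffic ignored.

```python
# Hedged estimate of bandwidth-limited Mat-Vec speed on one Power3 node.
stream_bw = 7.9e9                 # measured STREAM bandwidth, bytes/s
bytes_per_flop = (8 + 4) / 2      # assumed: 12 bytes per nonzero, 2 flops
print(stream_bw / bytes_per_flop / 1e9)   # ~1.3 Gflop/s per node
# Block storage (e.g. 3x3 blocks for elasticity) amortizes the index
# traffic, raising the bound toward the ~1.5 Gflop/s quoted above.
```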

  32. Speedup
