Paul Fischer Mathematics and Computer Science Division Argonne National Laboratory

1. Paul Fischer Mathematics and Computer Science Division Argonne National Laboratory

2. Outline Objective: Give an overview of motivation, and some detail, for the development of high-order methods in scientific applications Introduction Architecture / Algorithm Coupling Accuracy Stability Solvers Applications will be interwoven throughout�

3. Motivation: Spectral Element Simulation Examples transitional boundary layers heat transfer enhancement [M. Greiner (UNR)] convection dynamics in deep atmospheres [A. Deane (U.Md)] pattern formation in nonequilibrium flows [H. Greenside (Duke), M. Paul & M. Cross (Caltech)] vascular flows [F. Loth (UIC), S. Lee (MIT), H. Bassiouny (UofC)] MHD [F. Cattaneo, A. Obabko, R. Rosner (UofC)]

4. Heat Transfer in an fcc Lattice of Spheres at ReD = 30,000 void-centric view Max temperature fluctuation: ~ 14 x mean temperature change. E=1536 elements of order N=14 10 hours on 64 nodes for 1 flow-through time sphere-centric view

5. Transition for Pulsatile Flow in a Model Stenosis Calculation by Sonu Sam Varghese, Steve Frankel, Purdue.

6. Spatial Discretization: Spectral Element Method (Patera 84, Maday & Patera 89) Variational method, similar to FEM, using GL quadrature. Domain partitioned into E high-order quadrilateral (or hexahedral) elements (decomposition may be nonconforming - localized refinement) Trial and test functions represented as N th-order tensor-product polynomials within each element. (N ~ 4 -- 15, typ.) EN 3 gridpoints in 3D, EN 2 gridpoints in 2D. Converges exponentially fast with N for smooth solutions.

7. PN - PN-2 Spectral Element Method for Navier-Stokes (MP 89)

8. Local Spectral Element Basis in 2D

9. Architecture / Algorithm Considerations

10. Ubiquitous Loosely-Coupled Network of Processor/Memory Units Paralellism � easy: SP = h P S1 h = O(1) �even for P? 10,000. Concurrency Reasonable communication requirements Difficulty is getting good single processor performance: S1 Moore�s Law: CPU speed doubles every 18 months

11. Moore�s Law: Measured Performance: 1986�2000

12. Ubiquitous Loosely-Coupled Network of Processor/Memory Units Paralellism � easy: SP = h P S1 h = O(1) �even for P? 10,000. Concurrency Reasonable communication requirements Difficulty is getting good single processor performance: Moore�s Law: CPU speed doubles every 18 months Memory speeds � double every 120 months ? Growing processor-memory gap. On current machines, 10% of peak is considered reasonable.

13. Ubiquitous Loosely-Coupled Network of Processor/Memory Units Paralellism � easy: SP = h P S1 h = O(1) �even for P? 10,000. Concurrency Reasonable communication requirements Difficulty is getting good single processor performance: Moore�s Law: CPU speed doubles every 18 months Memory speeds � double every 120 months ? Growing processor-memory gap. On current machines, 10% of peak is considered reasonable. Need algorithms that re-use data, i.e. FLOPS >> # memory accesses

14. Work vs. Memory Examples #ops #mem ratio Vector-Vector: a = a + c b 2n 2n 1 Matrix-Vector: x = Ay 2n2 n2 2 Sparse Mat-Vec: x = Asy 2mn mn 2 Matrix-Matrix: W = AU 2n3 2n2 n Amortize cost of accessing A over n vectors

15. Matrix-Matrix Product Performance on Cached-Based Architectures Time for A*B vs A+B for N=10 is: 2.0 x on DEC Alpha, 1.5 x on IBM SP

16. Examples of data re-use via matrix-matrix products in PDE operator evaluation (kernel of iterative solvers)�

17. Example 1: Spectral Element Derivative Evaluation Local tensor-product form (2D), allows derivative to be evaluated as matrix-matrix product: In Rd: (N+1)d points/element, O(Nd+1) ops/element Similar complexities hold for all SE operations

18. Example 2: Derivative Evaluation for p-type or DG Triangles (Teng, Hesthaven, Warburton,�) Standard: uxe = Dxe ue , e = 1,�, E O(E p4 ) memory references for O(E p2) gridpoints Modified: uxe = rxe Dre ue + sxe Dse ue For triangles, metrics rxe and sxe are constants Dre ue , Dse ue are matrix-matrix products on canonical triangle O(E p2 ) memory references for O(E p2) gridpoints In 3D, savings is more significant

19. Example 3: Block-Jacobi Preconditioning for Triangles / Tets

20. Summary: Architectures It is possible to construct high-order methods having memory-access demands identical to standard low-order schemes. The extra work of the high-order methods can be (potentially) covered by the processor-memory performance gap.

21. Motivation for High-Order Accuracy

22. High-Order Methods for the Incompressible Navier-Stokes Equations Reynolds number Re ~ 1000 - 2000 small amount of diffusion highly nonlinear (small scale structures result) High-order methods yield savings for simulations featuring long time-integration minimal physical dissipation (Re>>1, no turbulence modeling) Examples DNS (direct numerical simulation) LES (large eddy simulation) Chaotic systems

23. Exponential Convergence

24. Excellent transport properties, even for non-smooth solutions

25. Long-time integration: 1-D periodic convection

26. Relative Phase Error for h vs. p Refinement: ut + ux = 0 h-refinement p-refinement Fraction of accurately resolved modes is increased only through increasing order ( N or p )

27. Example Applications of High-Order Methods

28. Scattered Field Solution of Maxwell�s Equations Calculation by Jan Hesthaven & Tim Warburton � USEMe Computed in the time domain Discontinuous Galerkin in space: 250,000 tets, p=4 jan.hesthaven@brown.edu timwar@math.umn.edu

29. Anisotropic Diffusion in Plasma Fusion Reactors For the next generation fusion codes, the DOE Center for Extended Magnetohydrodynamics Modeling (CEMM) at PPPL has posed three challenging model problems to test new discretizations. As part of SciDAC funded research, we have tackled the first of these.

30. Anisotropic Diffusion in a Tokamak (PPPL Challenge Problem) k || = 109 k =1

31. Anisotropic Diffusion in Plasma Fusion Reactors Because of the high-degree of anisotropy, this problem is more like a (difficult) hyperbolic problem than straightforward diffusion.

32. Stability of High-Order Methods

33. Stability Considerations Stability for nodal bases essentially comes from use of Gauss-type nodal distributions � avoids Runge phenomenon This has been generalized to tets / triangles by Hesthaven � points based on steady-state charge distributions Wingate & Taylor � Fekete points that minimize det Vandermonde matrix Other stability issues also arise, because there is little numerical or physical dissipation in the numerical systems�. Problem usually stems from spurious eigenvalues and can be controlled by: Least-Squares Upwinding Bubble functions Hyperviscosity or spectrally vanishing viscosity Filtering

34. Filter-Based Stabilization At end of each time step, Interpolate u onto GLL points for PN-1 Interpolate back to GLL points for PN Results are smoother with linear combination: (F. & Mullen 01) Preserves interelement continuity and spectral accuracy Equivalent to multiplying by (1-a) the N th coefficient in the expansion u(x) = S uk fk (x) (Boyd 98) fk (x) := Lk(x) - Lk-2(x)

35. Numerical Stability Test: Shear Layer Roll-Up (Bell et al. JCP 89, Brown & Minion, JCP 95, F. & Mullen, CRAS 2001)

36. Error in Predicted Growth Rate for Orr-Sommerfeld Problem at Re=7500

37. Filtering permits Red99 > 700 for transitional boundary layer calculations

38. Navier-Stokes Time Advancement: Iterative Linear Solvers

39. Navier-Stokes Time Advancement Nonlinear term: explicit k th-order backward difference formula / extrapolation characteristics (Pironneau �82, MPR �90) Stokes problem: pressure/viscous decoupling: 3 Helmholtz solves for velocity (consistent) Poisson equation for pressure (computationally dominant)

40. PN - PN-2 Spectral Element System Matrices for Stokes (MP 89)

41. Consistent Splitting for Unsteady Stokes(MPR 90, Blair-Perot 93, Couzy 95) E - consistent Poisson operator for pressure, SPD Stiffest substep in N-S time advancement Most compute-intensive phase

42. Pressure Solution Strategy: Epn = gn Projection: compute best approximation from previous time steps Set p* = S gi p n-i, projection onto L previous time steps Choose gi such that || Dp ||E = || p n - p* ||E is minimized Cost: 2 matvecs in E per timestep, L+2 vectors of storage Multilevel preconditioned CG or GMRES to solve E Dp = gn - E p*

43. Initial guess for Axn = bn via projection ( A=E, SPD)

44. Initial guess for Epn = gn via projection onto previous solutions || pn - p*||E = O(Dtl) + O( etol ) (Chan & Wan, 95) Results with/without projection (1.6 million pressure nodes): Similar results for pulsatile carotid artery simulations � 108-fold reduction in initial residual

45. Pressure Solution Strategy: Epn = gn Projection: compute best approximation from previous time steps Multilevel preconditioned CG or GMRES to solve E Dp = gn - E p*

46. Two-Level Overlapping Additive Schwarz Preconditioner for Pressure Solve

47. Overlapping Schwarz + SEM: Fast Local Solvers Exploit local tensor-product structure -- A is separable in canonical domain Fast diagonalization method (FDM) (Lynch et al 64) Local solve cost is ~ 4d K N(d+1)

48. Convergence Rates Schwarz theory says that convergence rate depends only on d/H (H=element size), and not on number of elements or local order N. In practice, however, d ~ N-2 Recent work (Lottes & F. 04a, 04b) indicates that N-dependence is diminished if AC-1 is replaced with a multigrid V-cycle (at nominal cost) using weighted additive Schwarz smoothing. Convergence rate is strongly dependent on element geometry � particularly aspect-ratio.

49. Two-Level Additive Schwarz Performance Iteration count deteriorates in presence of high-aspect ratio elements Nonconforming discretizations eliminate unnecessary elements in the far field and result in better conditioned systems.

50. Navier-Stokes Case Study: Blood Flow in Arteriovenous Grafts

51. A Study of Transition in Arterio-Venous Grafts Provides a port for dialysis patients to obtain high flow rates and avoid repeated vessel injury PTFE plastic tubing surgically attached from artery to vein short-circuit of high to low pressure results in very high flow rates�transitional flow Failure often results after 3 months from occlusion (intimal hyperplasia) forming downstream of attachment to vein, where flow is turbulent

52. In Vitro Validation � Transitional Flow, Arteriovenous Graft

54. Spectral Element Simulation of Transitional Flow in an Arteriovenous-Graft Model, Re=2912

55. Experimental/Numerical Comparison at Re=2912

56. Canine In-Vivo Model of AV Graft NIH RO1, PI: Bassiouny

57. Canine In-Vitro Model Now I�d like to tell you about computational simulation we did. First of all, I need to tell you how we got mathematical model. We used CT images to reconstruct the actual graft geometry. Here you see the picture of plastic cast and you see schematic pointing to the different locations showing CT images for those geometry. Now I�d like to tell you about computational simulation we did. First of all, I need to tell you how we got mathematical model. We used CT images to reconstruct the actual graft geometry. Here you see the picture of plastic cast and you see schematic pointing to the different locations showing CT images for those geometry.

58. Canine Graft Pulsatile Flow Simulation Sang-Wook Lee, UIC In vivo: turbulence is observed CFD does not indicate turbulence: Frequency peak ~30 Hz, not 200-300 Re too low ~ 1200 Re in arterial side is ~ 2400 points to arterial flow as possible controllable source of turbulence

59. Where is Turbulence Coming From ? Suspected that turbulence is generated at arterial anastomosis and sustained through the graft and into the PVS

60. Simulations of Full Graft Geometry Arterial Anastomosis ? PTFE graft ? Venous Anastomosis

61. Simulations of Full Graft Geometry Arterial Anastomosis ? PTFE graft ? Venous Anastomosis

62. Where is Turbulence Coming From ? Turbulence generated at arterial anastomosis is not sustained through the graft and into the PVS�. Recent detective work by students David Smith and Sang-Wook Lee in the UIC Biofluids Lab points to flow-reversal in the DVS as a probable source of turbulence. This conjecture is supported by data from concurrent numerical simulations (Lee) and in-vitro LDA measurements (Smith).

63. Where is Turbulence Coming From ? Three simulations with steady inlet conditionsat ReG = 1200 100, 85, and 70% flow through PVS 70% case shows strong turbulence, despite the fact that ReDVV = 950

64. Coherent Structures for 70/30 Split @ ReG = 1200

65. Conclusions Highlighted several considerations in the development of high-order methods for transport applications Architectures � processor/memory gap Accuracy � minimal dispersion Stability � filtering Solvers � multilevel Schwarz methods

66. Future Directions Expect to see further development of high-order methods as new physical problems are addressed. Recent areas include Vascular flows Electromagnetics, Hesthaven & Warburton (Brown, UNM) MHD codes: LANL, PPPL, ANL / U of C Compressible aero codes: D. Mavriplis UWY, D. Darmofal�s group MIT Atmospheric models: NCAR Geophysics, etc., �

68. Conclusions Hex-based SEM can be effectively used for transitional flow problems in moderately complex domains Avoid severe mesh distortion � impact on accuracy and conditioning Gordon-Hall generally sufficient to blend local element distortion Sub-parametric mappings and / or over-integration can be important for treating first-order derivative terms (i.e., div u , u.grad u ) PDE-based approach to mesh generation can be employed to generate high-quality meshes (dates back to aero-industry)

69.

73. Vascular Flow Simulations Pulsatile carotid artery simulations: ~ 3 million grid points (K = 3000 elements of order N = 10) 20 hours / 256 processors for single cardiac cycle (temporal scales span ~ 3 decades) Fast turn-around time is required for clinical relevance and for large patient-population studies ? fast solvers are important

74. Challenges in Vascular Flow Simulation Automated mesh creation based on medical image data is needed. Boundary conditions must be deduced from ultrasound data. The correct flow-split must be imposed in bifurcating geometries. Strong turbulence can wreak havoc at outflow boundaries. Modeling issues: Non-Newtonian at certain sites? Role of wall compliance?

75. Performance: pressure solve -- Epn=gn Project solution onto previous solutions (~ 6 orders of magnitude residual reduction for carotid simulations) p* = Sk bk pk Solve remaining perturbed problem with conjugate gradients, preconditioned by the two-level additive Schwarz method of Dryja and Widlund (87).

Paul Fischer Mathematics and Computer Science Division Argonne National Laboratory

Paul Fischer Mathematics and Computer Science Division Argonne National Laboratory

Presentation Transcript

Nagi Rao (Nageswara S.V. Rao) Computer Science and Mathematics Division Oak Ridge National Laboratory raons@ornl

Nagi Rao (Nageswara S.V. Rao) Computer Science and Mathematics Division Oak Ridge National Laboratory raons@ornl

Advanced Photon Source, Argonne National Laboratory

Argonne National Laboratory - Jan 11, 2013

Computer Science and Mathematics Division

Paul Hovland (Argonne National Laboratory) Steven Lee (Lawrence Livermore National Laboratory)

David E. Bernholdt Computer Science and Mathematics Division, Oak Ridge National Laboratory and

James “Jeeembo” Kohl Computer Science and Mathematics Division Oak Ridge National Laboratory

John Arrington Argonne National Laboratory

James J. Laidler Chemical Technology Division Argonne National Laboratory

Neena Imam Complex Systems Computer Science and Mathematics Division OAK RIDGE NATIONAL LABORATORY

Peter N. Ostroumov, Physics Division, Argonne National Laboratory

Paul Fischer Mathematics and Computer Science Division Argonne National Laboratory

Yuelin Li Accelerator Systems Division Argonne National Laboratory

Thomas C. Schulthess Computer Science and Mathematics Division

Jerry Nolen Physics Division Argonne National Laboratory

Rusty Lusk Mathematics and Computer Science Division Argonne National Laboratory

Sreekanth Pannala Computational Mathematics Computer Science and Mathematics Division

ARGONNE NATIONAL LABORATORY DECEMBER 6, 2005 Argonne, IL

Paul Hovland (Argonne National Laboratory) Steven Lee (Lawrence Livermore National Laboratory)

Dr. Jacob Barhen Computer Science and Mathematics Division

John Arrington Argonne National Laboratory