1 / 74

Paul Fischer Mathematics and Computer Science Division Argonne National Laboratory

Outline. Objective: Give an overview of motivation, and some detail, for the development of high-order methods in scientific applicationsIntroductionArchitecture / Algorithm CouplingAccuracyStabilitySolversApplications will be interwoven throughout. Motivation: Spectral Element Simul

hao
Download Presentation

Paul Fischer Mathematics and Computer Science Division Argonne National Laboratory

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


    1. Paul Fischer Mathematics and Computer Science Division Argonne National Laboratory

    2. Outline Objective: Give an overview of motivation, and some detail, for the development of high-order methods in scientific applications Introduction Architecture / Algorithm Coupling Accuracy Stability Solvers Applications will be interwoven throughout…

    3. Motivation: Spectral Element Simulation Examples transitional boundary layers heat transfer enhancement [M. Greiner (UNR)] convection dynamics in deep atmospheres [A. Deane (U.Md)] pattern formation in nonequilibrium flows [H. Greenside (Duke), M. Paul & M. Cross (Caltech)] vascular flows [F. Loth (UIC), S. Lee (MIT), H. Bassiouny (UofC)] MHD [F. Cattaneo, A. Obabko, R. Rosner (UofC)]

    4. Heat Transfer in an fcc Lattice of Spheres at ReD = 30,000 void-centric view Max temperature fluctuation: ~ 14 x mean temperature change. E=1536 elements of order N=14 10 hours on 64 nodes for 1 flow-through time sphere-centric view

    5. Transition for Pulsatile Flow in a Model Stenosis Calculation by Sonu Sam Varghese, Steve Frankel, Purdue.

    6. Spatial Discretization: Spectral Element Method (Patera 84, Maday & Patera 89) Variational method, similar to FEM, using GL quadrature. Domain partitioned into E high-order quadrilateral (or hexahedral) elements (decomposition may be nonconforming - localized refinement) Trial and test functions represented as N th-order tensor-product polynomials within each element. (N ~ 4 -- 15, typ.) EN 3 gridpoints in 3D, EN 2 gridpoints in 2D. Converges exponentially fast with N for smooth solutions.

    7. PN - PN-2 Spectral Element Method for Navier-Stokes (MP 89)

    8. Local Spectral Element Basis in 2D

    9. Architecture / Algorithm Considerations

    10. Ubiquitous Loosely-Coupled Network of Processor/Memory Units Paralellism – easy: SP = h P S1 h = O(1) …even for P? 10,000. Concurrency Reasonable communication requirements Difficulty is getting good single processor performance: S1 Moore’s Law: CPU speed doubles every 18 months

    11. Moore’s Law: Measured Performance: 1986—2000

    12. Ubiquitous Loosely-Coupled Network of Processor/Memory Units Paralellism – easy: SP = h P S1 h = O(1) …even for P? 10,000. Concurrency Reasonable communication requirements Difficulty is getting good single processor performance: Moore’s Law: CPU speed doubles every 18 months Memory speeds – double every 120 months ? Growing processor-memory gap. On current machines, 10% of peak is considered reasonable.

    13. Ubiquitous Loosely-Coupled Network of Processor/Memory Units Paralellism – easy: SP = h P S1 h = O(1) …even for P? 10,000. Concurrency Reasonable communication requirements Difficulty is getting good single processor performance: Moore’s Law: CPU speed doubles every 18 months Memory speeds – double every 120 months ? Growing processor-memory gap. On current machines, 10% of peak is considered reasonable. Need algorithms that re-use data, i.e. FLOPS >> # memory accesses

    14. Work vs. Memory Examples #ops #mem ratio Vector-Vector: a = a + c b 2n 2n 1 Matrix-Vector: x = Ay 2n2 n2 2 Sparse Mat-Vec: x = Asy 2mn mn 2 Matrix-Matrix: W = AU 2n3 2n2 n Amortize cost of accessing A over n vectors

    15. Matrix-Matrix Product Performance on Cached-Based Architectures Time for A*B vs A+B for N=10 is: 2.0 x on DEC Alpha, 1.5 x on IBM SP

    16. Examples of data re-use via matrix-matrix products in PDE operator evaluation (kernel of iterative solvers)…

    17. Example 1: Spectral Element Derivative Evaluation Local tensor-product form (2D), allows derivative to be evaluated as matrix-matrix product: In Rd: (N+1)d points/element, O(Nd+1) ops/element Similar complexities hold for all SE operations

    18. Example 2: Derivative Evaluation for p-type or DG Triangles (Teng, Hesthaven, Warburton,…) Standard: uxe = Dxe ue , e = 1,…, E O(E p4 ) memory references for O(E p2) gridpoints Modified: uxe = rxe Dre ue + sxe Dse ue For triangles, metrics rxe and sxe are constants Dre ue , Dse ue are matrix-matrix products on canonical triangle O(E p2 ) memory references for O(E p2) gridpoints In 3D, savings is more significant

    19. Example 3: Block-Jacobi Preconditioning for Triangles / Tets

    20. Summary: Architectures It is possible to construct high-order methods having memory-access demands identical to standard low-order schemes. The extra work of the high-order methods can be (potentially) covered by the processor-memory performance gap.

    21. Motivation for High-Order Accuracy

    22. High-Order Methods for the Incompressible Navier-Stokes Equations Reynolds number Re ~ 1000 - 2000 small amount of diffusion highly nonlinear (small scale structures result) High-order methods yield savings for simulations featuring long time-integration minimal physical dissipation (Re>>1, no turbulence modeling) Examples DNS (direct numerical simulation) LES (large eddy simulation) Chaotic systems

    23. Exponential Convergence

    24. Excellent transport properties, even for non-smooth solutions

    25. Long-time integration: 1-D periodic convection

    26. Relative Phase Error for h vs. p Refinement: ut + ux = 0 h-refinement p-refinement Fraction of accurately resolved modes is increased only through increasing order ( N or p )

    27. Example Applications of High-Order Methods

    28. Scattered Field Solution of Maxwell’s Equations Calculation by Jan Hesthaven & Tim Warburton – USEMe Computed in the time domain Discontinuous Galerkin in space: 250,000 tets, p=4 jan.hesthaven@brown.edu timwar@math.umn.edu

    29. Anisotropic Diffusion in Plasma Fusion Reactors For the next generation fusion codes, the DOE Center for Extended Magnetohydrodynamics Modeling (CEMM) at PPPL has posed three challenging model problems to test new discretizations. As part of SciDAC funded research, we have tackled the first of these.

    30. Anisotropic Diffusion in a Tokamak (PPPL Challenge Problem) k || = 109 k =1

    31. Anisotropic Diffusion in Plasma Fusion Reactors Because of the high-degree of anisotropy, this problem is more like a (difficult) hyperbolic problem than straightforward diffusion.

    32. Stability of High-Order Methods

    33. Stability Considerations Stability for nodal bases essentially comes from use of Gauss-type nodal distributions – avoids Runge phenomenon This has been generalized to tets / triangles by Hesthaven – points based on steady-state charge distributions Wingate & Taylor – Fekete points that minimize det Vandermonde matrix Other stability issues also arise, because there is little numerical or physical dissipation in the numerical systems…. Problem usually stems from spurious eigenvalues and can be controlled by: Least-Squares Upwinding Bubble functions Hyperviscosity or spectrally vanishing viscosity Filtering

    34. Filter-Based Stabilization At end of each time step, Interpolate u onto GLL points for PN-1 Interpolate back to GLL points for PN Results are smoother with linear combination: (F. & Mullen 01) Preserves interelement continuity and spectral accuracy Equivalent to multiplying by (1-a) the N th coefficient in the expansion u(x) = S uk fk (x) (Boyd 98) fk (x) := Lk(x) - Lk-2(x)

    35. Numerical Stability Test: Shear Layer Roll-Up (Bell et al. JCP 89, Brown & Minion, JCP 95, F. & Mullen, CRAS 2001)

    36. Error in Predicted Growth Rate for Orr-Sommerfeld Problem at Re=7500

    37. Filtering permits Red99 > 700 for transitional boundary layer calculations

    38. Navier-Stokes Time Advancement: Iterative Linear Solvers

    39. Navier-Stokes Time Advancement Nonlinear term: explicit k th-order backward difference formula / extrapolation characteristics (Pironneau ’82, MPR ‘90) Stokes problem: pressure/viscous decoupling: 3 Helmholtz solves for velocity (consistent) Poisson equation for pressure (computationally dominant)

    40. PN - PN-2 Spectral Element System Matrices for Stokes (MP 89)

    41. Consistent Splitting for Unsteady Stokes (MPR 90, Blair-Perot 93, Couzy 95) E - consistent Poisson operator for pressure, SPD Stiffest substep in N-S time advancement Most compute-intensive phase

    42. Pressure Solution Strategy: Epn = gn Projection: compute best approximation from previous time steps Set p* = S gi p n-i, projection onto L previous time steps Choose gi such that || Dp ||E = || p n - p* ||E is minimized Cost: 2 matvecs in E per timestep, L+2 vectors of storage Multilevel preconditioned CG or GMRES to solve E Dp = gn - E p*

    43. Initial guess for Axn = bn via projection ( A=E, SPD)

    44. Initial guess for Epn = gn via projection onto previous solutions || pn - p*||E = O(Dtl) + O( etol ) (Chan & Wan, 95) Results with/without projection (1.6 million pressure nodes): Similar results for pulsatile carotid artery simulations – 108-fold reduction in initial residual

    45. Pressure Solution Strategy: Epn = gn Projection: compute best approximation from previous time steps Multilevel preconditioned CG or GMRES to solve E Dp = gn - E p*

    46. Two-Level Overlapping Additive Schwarz Preconditioner for Pressure Solve

    47. Overlapping Schwarz + SEM: Fast Local Solvers Exploit local tensor-product structure -- A is separable in canonical domain Fast diagonalization method (FDM) (Lynch et al 64) Local solve cost is ~ 4d K N(d+1)

    48. Convergence Rates Schwarz theory says that convergence rate depends only on d/H (H=element size), and not on number of elements or local order N. In practice, however, d ~ N-2 Recent work (Lottes & F. 04a, 04b) indicates that N-dependence is diminished if AC-1 is replaced with a multigrid V-cycle (at nominal cost) using weighted additive Schwarz smoothing. Convergence rate is strongly dependent on element geometry – particularly aspect-ratio.

    49. Two-Level Additive Schwarz Performance Iteration count deteriorates in presence of high-aspect ratio elements Nonconforming discretizations eliminate unnecessary elements in the far field and result in better conditioned systems.

    50. Navier-Stokes Case Study: Blood Flow in Arteriovenous Grafts

    51. A Study of Transition in Arterio-Venous Grafts Provides a port for dialysis patients to obtain high flow rates and avoid repeated vessel injury PTFE plastic tubing surgically attached from artery to vein short-circuit of high to low pressure results in very high flow rates–transitional flow Failure often results after 3 months from occlusion (intimal hyperplasia) forming downstream of attachment to vein, where flow is turbulent

    52. In Vitro Validation – Transitional Flow, Arteriovenous Graft

    54. Spectral Element Simulation of Transitional Flow in an Arteriovenous-Graft Model, Re=2912

    55. Experimental/Numerical Comparison at Re=2912

    56. Canine In-Vivo Model of AV Graft NIH RO1, PI: Bassiouny

    57. Canine In-Vitro Model Now I’d like to tell you about computational simulation we did. First of all, I need to tell you how we got mathematical model. We used CT images to reconstruct the actual graft geometry. Here you see the picture of plastic cast and you see schematic pointing to the different locations showing CT images for those geometry. Now I’d like to tell you about computational simulation we did. First of all, I need to tell you how we got mathematical model. We used CT images to reconstruct the actual graft geometry. Here you see the picture of plastic cast and you see schematic pointing to the different locations showing CT images for those geometry.

    58. Canine Graft Pulsatile Flow Simulation Sang-Wook Lee, UIC In vivo: turbulence is observed CFD does not indicate turbulence: Frequency peak ~30 Hz, not 200-300 Re too low ~ 1200 Re in arterial side is ~ 2400 points to arterial flow as possible controllable source of turbulence

    59. Where is Turbulence Coming From ? Suspected that turbulence is generated at arterial anastomosis and sustained through the graft and into the PVS

    60. Simulations of Full Graft Geometry Arterial Anastomosis ? PTFE graft ? Venous Anastomosis

    61. Simulations of Full Graft Geometry Arterial Anastomosis ? PTFE graft ? Venous Anastomosis

    62. Where is Turbulence Coming From ? Turbulence generated at arterial anastomosis is not sustained through the graft and into the PVS…. Recent detective work by students David Smith and Sang-Wook Lee in the UIC Biofluids Lab points to flow-reversal in the DVS as a probable source of turbulence. This conjecture is supported by data from concurrent numerical simulations (Lee) and in-vitro LDA measurements (Smith).

    63. Where is Turbulence Coming From ? Three simulations with steady inlet conditionsat ReG = 1200 100, 85, and 70% flow through PVS 70% case shows strong turbulence, despite the fact that ReDVV = 950

    64. Coherent Structures for 70/30 Split @ ReG = 1200

    65. Conclusions Highlighted several considerations in the development of high-order methods for transport applications Architectures – processor/memory gap Accuracy – minimal dispersion Stability – filtering Solvers – multilevel Schwarz methods

    66. Future Directions Expect to see further development of high-order methods as new physical problems are addressed. Recent areas include Vascular flows Electromagnetics, Hesthaven & Warburton (Brown, UNM) MHD codes: LANL, PPPL, ANL / U of C Compressible aero codes: D. Mavriplis UWY, D. Darmofal’s group MIT Atmospheric models: NCAR Geophysics, etc., …

    68. Conclusions Hex-based SEM can be effectively used for transitional flow problems in moderately complex domains Avoid severe mesh distortion – impact on accuracy and conditioning Gordon-Hall generally sufficient to blend local element distortion Sub-parametric mappings and / or over-integration can be important for treating first-order derivative terms (i.e., div u , u.grad u ) PDE-based approach to mesh generation can be employed to generate high-quality meshes (dates back to aero-industry)

    69.

    73. Vascular Flow Simulations Pulsatile carotid artery simulations: ~ 3 million grid points (K = 3000 elements of order N = 10) 20 hours / 256 processors for single cardiac cycle (temporal scales span ~ 3 decades) Fast turn-around time is required for clinical relevance and for large patient-population studies ? fast solvers are important

    74. Challenges in Vascular Flow Simulation Automated mesh creation based on medical image data is needed. Boundary conditions must be deduced from ultrasound data. The correct flow-split must be imposed in bifurcating geometries. Strong turbulence can wreak havoc at outflow boundaries. Modeling issues: Non-Newtonian at certain sites? Role of wall compliance?

    75. Performance: pressure solve -- Epn=gn Project solution onto previous solutions (~ 6 orders of magnitude residual reduction for carotid simulations) p* = Sk bk pk Solve remaining perturbed problem with conjugate gradients, preconditioned by the two-level additive Schwarz method of Dryja and Widlund (87).

More Related