460 likes | 536 Views
Separation of scales with fast rotation and weak stratification and some ideas for exascale computing Beth Wingate - Los Alamos National Laboratory Supported by DOE ASCR & BER. Outline. Mesoscale dynamics in the ocean – deformation radius
E N D
Separation of scales with fast rotation and weak stratificationand some ideas for exascale computingBeth Wingate - Los Alamos National LaboratorySupported by DOE ASCR & BER
Outline • Mesoscale dynamics in the ocean – deformation radius - Ocean grid spacing relative to the deformation radius for eddy resolving versus climate scales - Charney and the separation of time scales - Time separation in today’s hydrostatic ocean codes • A new result for strong rotation and weak stratification - New nonhydrostatic slow equations - Some numerical results - Some observations • How are these ideas related to next generation ocean models?
Rossby Radius of deformation We’d like to resolve the mesoscale features of the ocean • What is mesoscale dynamics? • Climate simulations • low resolution: 1 deg (100 km) • long duration: 100s of years • fully coupled to atmosphere, etc. • Eddy-resolving sim. • high resolution: 0.1 deg (10 km) • short duration: 50-100 years • ocean only Potential temperature 0.1º x 0.1º grid Potential temperature 0.8º x 0.8º grid
The baroclinic instability occurs near the deformation radius Kinetic energy Surface temperature POP POP POP resolution: 0.8 (low) 0.4 0.2 (high) Eddy Kinetic energy low resolution-0.8º standard POP high resolution-0.1º standard POP POP POP Potential temperature - vertical cross section POP resolution: 0.8 (low) 0.4 0.2 (high) Depth of 6C isotherm high res. depth low resolution-0.8º standard POP high resolution-0.1º standard POP med. res. S N S N low res.
The Rossby Deformation Radius • The length scale where the earth’s tendency to deform the ocean is balanced by gravity’s tendency to flatten it. The length scale where a current ‘deforms’ due to rotation. • Shallow water • 3D N is the buoyancy frequency, f is the rotation frequency, H a vertical scale height
Deformation radius versus grid spacing Smith, Maltrud, Bryan, Hecht, 2000
Separation of Time Scales and Charney • By using measurements of typical length and time scales Charney, based on a paper he published about stability the year before, derived the Quasi-Geostrophic equations which, “Filtered out the fast waves” (Charney, Geof Publ, 1948) • He used geostrophic balance and hydrostatic balance. • Extensively used to understand the large scale features of rotating and stratified flow including explaining baroclinic instability. It also is the source of understanding Rossby waves (planetary scale waves). • Important conceptual model
Motivation: IPCC class ocean models – a matter of convergence under grid refinement Boussinesq equations and the hydrostatic primitive equations State of the art IPCC class OGCMs • In Models the flow is Hydrostatic – valid when horizontal grid spacing is much larger than vertical • Important to resolve Deformation radius – gets smaller with latitude • The higher the resolution, the harder it is to meet both these criteria. But if you do not use the hydrostatic approximation you add back in the waves due to dw/dt.
Hydrostatic equations on parallel architectures • Start with U(t) (slow+fast) • Subtract 2D fast component • Use vertical integration • Advance 2D fast dynamics to t+dt_slow • Subcycle by performing many time steps at dt_fast, or • Use a semi-implicit method • Advance 3D slow dynamics to t+dt_slow • Use explicit time stepping • Combine fast and slow dynamics to obtain U(t+dt_slow)
Dynamics at high latitudes • Nearly solid body rotation • Stratified flow, but weaker than stratification elsewhere. Timmermans et al 2010 Image courtesy NOAA, 2007
Low Rossby limiting dynamics in weak stratificationBeth Wingate, Pedro Embid, Miranda Holmes-Cerfon, Mark Taylor Triply periodic rotating and stratified Boussinesq equations Embid and Majda, 1996, 1998 Majda and Embid, 1998 Schochet, 1994 Klainerman and Majda 1981
Consider the linear waves of this system Buoyancy Frequency: Rotation Frequency: Rossby Deformation Radius: Slow:
Conserved Quantities:Total Energy and Potential Enstrophy In the absence of viscosity and diffusivity: where
Embid and Majda, 1996, 1997 Schochet, 1994 Klainerman and Majda 1981 Nonlocal form in a Hilbert Space 2 Hilbert Space X of vector fields u in L that are divergence free and equipped with the L norm. 2
Method of Multiple Scales To avoid secularity the second order term must be smaller than the leading order term.
Abstract form with slow and fast time scales To lowest order
Solve for the fast leading order solutions The order 1 solution is a function of the leading order solution: Solve with Duhammel’s formula and ensure the u1 solution is not large by using the concept of fast wave averaging. Where solves:
By direct computation here is the fast wave averaged equation: Compute the evolution equation for the Fourier amplitudes
The eigenvalues and eigenfunctions of the leading order equations Look at solutions of the linear problem: 0 gyroscopic/inertial waves Notice that there is a dispersive mode with zero frequency.
The slow dynamics evolves independently of the fast Three wave resonances means we must choose k’ and k’’ such that You can show that the interaction coefficients are zero for the fast-fast interaction. Which means that the slow dynamics evolves independently of the fast.
The new slow equations challenge our ideas of fast and slow dynamics in the ocean. They are nonhydrostatic. 2-D 2-D 3-D 2-D
Taylor-Proudman Theory • Taylor Proudman theorem: In rapidly rotating flow, the flow two-dimensionalizes.
Conservation of the ratio of slow to fast energy: Using the same arguments as Embid and Majda, 1997, there is conservation in time of the slow to fast energy ratio. This means the total energy conservation is composed of both slow and fast dynamics. But the leading order potential enstrophy is composed only of the slow dynamics.
High resolution numerical simulations White noise forcing. Smith and Waleffe, JFM, 2002 Hyperviscosity: Random Forcing:
Ro = .2 White noise forcing wave number 3 Ro = 1 “Low Rossby limiting dynamics for stably stratified flows”, B. Wingate, P. Embid, M. Holmes-Cerfon, M. Taylor, To Appear Journal of Fluid Mechanics, 2010
Observations in the Arctic BasinMary Louise Timmermans, Yale University Beaufort Gyre, Site B Funded by the Beaufort Gyre Exploration Program - National Science Foundation
Theme Area? Mooring data: - deep mesoscale vortices depth-time section of speed depth-time section of potential temperature with isopycnals overlain
What I’m thinking about for nonhydrostatic ocean and exascale computing What is fast in most of the ocean (hydrostatic) is slow in areas of weak stratification. Meaning the ocean itself wanders in and out of the parameter space, touching regions where there are separations of time scales but not necessarily QG. If we are to solve the nonhydrostatic equations for the SLOW mesoscale but step over the fast waves, how do we do it?
DOE POV: Power is limiting factor 200 MW usual scaling goal A city of 80,000 people is estimated to use 45 MW of power per year. Exascale design is for 1 exaflop to cost 20 MW instead of 200MW. 2005 2010 2015 2020
DOE POV: Off chip data movement costs more than FLOPS Intranode SMP Intranode MPI Flop
Scientist POV: My life as a Co-Designer Example of the programming model for using CPU + GPU CPU GPU Chipset DRAM PCIexpress bus DRAM HOST DEVICE Data parallel pieces of an algorithm are executed on the device as KERNELS that are executed as many THREADS grouped together in BLOCKS. BLOCKS grouped together in GRIDS. Each concept has hardware meaning.
Scientist POV: My life as a Co-Designer • The key to getting performance through the use of co-processors (such as the GPU) is the idea of Data-Parallel Computing • Identical operations executed on many data elements in parallel (simplified flow control allows increased ratio of compute logic to control logic) a = 2 * pi + x b = y * z a = 2 * pi + x b = y * a DATA INDEPENDENT DATA DEPENDENT
Scientist POV: My life as a Co-Designer: Example of the programming model for using CPU + GPU main program { …… Logic to execute ocean model dynamics …… Call to ship data in CPU memory to GPU memory ColumnPhysicsKernal<<<dim3 GridSize, dim3 BlockSize>>>(…) …… Call to ship GPU memory back to CPU memory } __global__ ColumnPhysicsKernal(float *x, int *b, …) { Int id = blockIDx.x*blockDIM.x+threadID.x A(id) = sin(x(id)) }
Scientist POV: My life as a Co-Designer My general remarks • Granularity – measure of the ratio of computation in each task to communication in each task • Memory – the device OS is very light weight, but there are different kinds of memory (GLOBAL, LOCAL, CONSTANT, CACHE, TEXTURE) that can be used to optimize performance (you are the compiler). For example TEXTURE MEMORY allows hardware support for operations like filtering – meaning it is Very Fast. (Can we design special chips for on-board elliptic solves or ffts? YES!) • The use of asynchronous data use can be powerful In science, though, sometimes we need to make sure we have all the data before proceeding – then use __synchthreads() • Already exists CUBLAS, CUFFT, CUSPARSE, CURAND (NVIDIA is sending people from its new HPC methods group to LANL) • Need good programming models to optimize: • Parallel efficiency (latency and execution, arithmetic optimization and warps). Think about global memory bandwidth to latency. • Memory Optimization • Profiling
Scientist POV: My life as a Co-Designer More general remarks • On a heterogeneous machine we can have hardware that perform specific operations. We need to ‘reach through’ all the way to chip designers to do this. • Considering the original problem from the math/algorithms point of view through the lens of heterogeneous computing • Our complex climate codes will no longer be dominated by one numerical method. We may use spectral elements for something, finite volumes for another, we may need to use monte carlo to solve elliptic problems. We may need to throw our current algorithms such as the barotropic/baroclinic splitting away entirely and throw compute cycles at the problem with high order methods. We may need to consider if there is a part of the dynamics that can be considered stochastic. • We already do this in the climate model with different methods in the atmosphere, ocean, ice sheets, sea ice. Why not take this further?
Hydrostatic and parallel versus Nonhydrostatic and heterogeneous New solution method for NHBE Current solution method for HPE • Start with U(t) (slow+fast) • Compute 3D slow component (fast-wave (time) averaging & projection) • Subtract 3D slow component • Advance 3D fast dynamics to t+dt_slow • Brute force explicit sub cycling, or • Novel projection/Monte-Carlo (as in Zwanzeig et al), or • Novel projection/PIC on GPU, or • Novel semi-implicit on GPU • Advance 3D slow dynamics to t +dt_slow • Use fast-wave averaging ideas and explicit time stepping • Combine fast and slow dynamics to obtain U(t+dt_slow) • Start with U(t) (slow+fast) • Subtract 2D fast component • Use vertical integration • Advance 2D fast dynamics to t+dt_slow • Subcycle by performing many time steps at dt_fast, or • Use a semi-implicit method • Advance 3D slow dynamics to t+dt_slow • Use explicit time stepping • Combine fast and slow dynamics to obtain U(t+dt_slow)
Summary • Discussed the mesoscale in contemporary ocean models and our approximations. • Showed you a new result that suggests there are slow dynamics in weak stratification and they are nonhydrostatic. • Suggests the nonhydrostatic equations are necessary to predict the mesoscale dynamics. • Importance of conceptual models in the design of next generation numerical ocean models.
Then use the method of fast wave averaging Where solves: And therefore, the solution to leading order is And since the fast linear operator is skew-Hermitian, has an orthogonal decomposition into fast and slow components: Using the fact that , the leading order solution looks like,
A key observation is that the fast operator, , is skew-Hermitian in X:. By the Spectral Theorem, skew-Hermitian operators have purely imaginary eigenvalues and an orthonormal basis of eigenfunctions. Physically this means the basic normal mode solutions of the equations are wave solutions. The null space of , , is orthogonal to the range of because the null space is spanned by the eigenfunctions with zero eigenvalues (slow modes) where the range is spanned by the remaining eigenfunctions with non-zero eigenvalues (fast modes).
Derive the equation for the slow dynamics Knowing the slow dynamics evolves independently of the fast we can find the equations for the slow dynamics by projecting the solution and the equations onto the null space of the fast operator . Then the fast wave averaged equation for the slow modes becomes,
With Conservation Laws where: