
Parallelization Work to Be Completed



Presentation Transcript


  1. Parallelization work to be completed • Finalize the domain-decomposition scheme for the parallelization. • TIMCOM preprocessor in f95 with dynamically allocated memory (inmets, indata, bounds). • TIMCOM main code in f95 with dynamically allocated memory. • EVP solver in f95 with dynamically allocated memory. • Subroutines a2o/o2a with cross-CPU-core data exchange, or • rewrite TIMCOM so that each CPU core can handle the Northern and Southern Hemispheres at the same time. • NetCDF input and output.
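A minimal f95 sketch of what "dynamically allocated memory" means for the items above; the module, variable and size names (i0, j0, k0 for the local subdomain sizes) are placeholders for illustration, not TIMCOM's actual names:

    module ocean_state
      implicit none
      real, allocatable :: u(:,:,:), v(:,:,:), t(:,:,:)   ! local fields
    contains
      subroutine alloc_ocean_state(i0, j0, k0)
        integer, intent(in) :: i0, j0, k0   ! local subdomain sizes, known only at run time
        allocate(u(i0, j0, k0), v(i0, j0, k0), t(i0, j0, k0))
        u = 0.0; v = 0.0; t = 0.0
      end subroutine alloc_ocean_state
    end module ocean_state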

  2. Domain decomposition options • 1) Use the TIMCOM layout. Subroutines such as a2o/o2a must be modified so that TIMCOM and ECHAM data can be exchanged across nodes. Pro: the ghost-zone traffic in the y direction is half that of option 2). Con: an extra (llon*llat − 2*ng*llat) of data must be exchanged across cores. • 2) Use the ECHAM layout. In this case every CPU core must compute ocean subdomains in both the Northern and Southern Hemispheres at the same time, which requires modifying part of the TIMCOM code. Pro: for the same number of CPUs this is faster than 1), because less data has to be exchanged across cores (the llon > 2 condition for exchanges arises less often).
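For illustration only, with hypothetical sizes (llon = 64, llat = 32, ghost-zone width ng = 2): the extra cross-core exchange in option 1) would be llon*llat − 2*ng*llat = 64*32 − 2*2*32 = 2048 − 128 = 1920 points per level, which has to be weighed against its halved y-direction ghost-zone traffic.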

  3. [Figure: schematic of the decomposition of the global glon–glat grid into nproca (= 4) × nprocb (= 3) processor subdomains; EQ marks the equator.]

  4. [Figure: TIMCOM grid definition in the x–y plane, showing the latitude arrays YDEG/Y and YVDEG/YV, the spacings DY, DYV and DX at indices 1 … J0/J1, the domain bounds Y0DEG, Y1DEG, X0DEG, X1DEG, and the ghost-zone width ng.]

  5. Parallel Consideration • Many settings in the current version are still wrong, so it crashes almost immediately; it is nevertheless worth trying. • If possible, it is also suggested to merge the subroutines in mo_ocean that duplicate functionality of the original TIMCOM into the standalone parallelized TIMCOM version, to make it easier to test whether everything behaves correctly. In particular, the goal is a pure ocean model with ng >= 2, written in f90 with dynamically allocated memory. These tests will help when the code is later merged back into ECHAM.

  6. Information for the whole ECHAM domain • nlon: number of longitudes of the global domain • nlat: number of latitudes of the global domain • nlev: number of levels of the global domain. Information valid for all processes of a model instance: • nproca: number of processors for the dimension that counts longitudes • nprocb: number of processors for the dimension that counts latitudes • d_nprocs: number of processors used in the model domain, nproca × nprocb • spe, epe: index numbers of the first and last processors that handle this model domain • mapmesh(ib,ia): array mapping from a logical 2-D mesh to the processor index numbers within the decomposition table (global decomposition); ib = 1, nprocb; ia = 1, nproca
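A sketch of how this per-domain information could be collected in an f95 derived type; the field names follow the slide, but the declaration itself is illustrative and not ECHAM's actual code:

    type global_decomposition
      integer :: nlon, nlat, nlev          ! global grid sizes
      integer :: nproca, nprocb            ! processors per decomposed dimension
      integer :: d_nprocs                  ! = nproca * nprocb
      integer :: spe, epe                  ! first / last processor of this model domain
      integer, allocatable :: mapmesh(:,:) ! (1:nprocb, 1:nproca) -> processor index numbers
    end type global_decomposition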

  7. General local information • pe: processor identifier; this number is used in the MPI send and receive routines. • set_b: index of the processor in the direction of longitudes; it determines the location within the array mapmesh. Processors with ascending numbers handle subdomains with increasing longitudes. • set_a: index of the processor in the direction of latitudes; it determines the location within the array mapmesh. Processors with ascending numbers handle subdomains with decreasing values of absolute latitude.
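Assuming the layout described above (an inference from the slide, not verified against the ECHAM source), the processor id of this subdomain and of its neighbour towards larger longitudes would be looked up roughly like this:

    pe      = mapmesh(set_b, set_a)        ! this processor
    pe_east = mapmesh(set_b + 1, set_a)    ! neighbour with larger longitudes
                                           ! (wrap or clip at set_b = nprocb in practice)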

  8. Grid space decomposition • nglat, nglon: number of latitudes and longitudes in grid space handled by this processor. • nglpx: number of longitudes allocated. • glats(1:2), glate(1:2): start and end values of the global latitude indices. • glons(1:2), glone(1:2): start and end values of the global longitude indices. • glat(1:nglat): global latitude index. • glon(1:nglon): offset to the global longitude index.
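A sketch (loop variables il, jl, ig, jg are illustrative) of how the local grid space relates to global indices via glats/glate and glons/glone, written here for the first of the two index segments:

    do jg = glats(1), glate(1)          ! global latitude indices of segment 1
      jl = jg - glats(1) + 1            ! corresponding local latitude index
      do ig = glons(1), glone(1)        ! global longitude indices of segment 1
        il = ig - glons(1) + 1          ! corresponding local longitude index
        ! work on local field(il, jl), which maps to global point (ig, jg)
      end do
    end do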

  9. The variables in the ECHAM memory_g3b stream (such as sitwt and sitwu) are all local variables. They are not scattered from a main process and later collected from the individual processors; each node computes them itself. Their memory layout, however, still differs from TIMCOM's.

  10. The Lin-Rood Finite Volume (FV) Dynamical Core: Tutorial. Christiane Jablonowski, National Center for Atmospheric Research, Boulder, Colorado. NCAR Tutorial, May 31, 2005

  11. Topics that we discuss today • The Lin-Rood Finite Volume (FV) dynamical core • History: where, when, who, … • Equations & some insights into the numerics • Algorithm and code design • The grid • Horizontal resolution • Grid staggering: the C-D grid concept • Vertical grid and remapping technique • Practical advice when running the FV dycore • Namelist and netCDF variables (input & output) • Dynamics - physics coupling • Hybrid parallelization concept • Distributed-shared memory parallelization approach: MPI and OpenMP • Everything you would like to know

  12. Who, when, where, … • FV transport algorithm developed by S.-J. Lin and Ricky Rood (NASA GSFC) in 1996 • 2D shallow water model in 1997 • 3D FV dynamical core around 1998/1999 • Until 2000: FV dycore mainly used in the data assimilation system at NASA GSFC • Also: transport scheme in ‘Impact’, offline tracer transport • In 2000: FV dycore was added to NCAR’s CCM3.10 (now CAM3) • Today (2005): the FV dycore • might become the default in CAM3 • is used in WACCM • is used in the climate model at GFDL

  13. Dynamical cores of General Circulation Models. [Figure: schematic splitting a GCM into its dynamics and physics components.] FV: no explicit diffusion (besides divergence damping)

  14. The NASA/NCAR finite volume dynamical core • 3D hydrostatic dynamical core for climate and weather prediction: • 2D horizontal equations are very similar to the shallow water equations • 3rd dimension in the vertical direction is a floating Lagrangian coordinate: pure 2D transport with vertical remapping steps • Numerics: finite volume approach • conservative and monotonic 2D transport scheme • upwind-biased orthogonal 1D fluxes, operator splitting in 2D • van Leer second order scheme for time-averaged numerical fluxes • PPM third order scheme (piecewise parabolic method) for prognostic variables • Staggered grid (Arakawa D-grid for prognostic variables)

  15. The 3D Lin-Rood Finite-Volume Dynamical Core. Momentum equation in vector-invariant form: ∂V/∂t = −(ζ + f) k × V − ∇K − (1/ρ)∇p − ∇Φ, with the pressure gradient term evaluated in finite volume form (Lin 1997). Continuity equation: ∂(δp)/∂t + ∇·(V δp) = 0. Thermodynamic equation, also used for tracers q (replace Θ by q): ∂(Θ δp)/∂t + ∇·(V Θ δp) = 0. The prognostic variables are: δp: pressure thickness, Θ = T p^(−κ): scaled potential temperature.

  16. Finite volume principle. Continuity equation in flux form: ∂(δp)/∂t + ∇·(V δp) = 0. Integrate over one time step Δt and over the 2D finite volume Ω with area A, then rearrange to an update for the spatially averaged pressure thickness ⟨δp⟩ = (1/A) ∫_Ω δp dA: ⟨δp⟩(t+Δt) = ⟨δp⟩(t) − (Δt/A) ∫_Ω ∇·(V δp)* dA, where (V δp)* is the time-averaged numerical flux.

  17. Finite volume principle. Apply the Gauss divergence theorem: ∫_Ω ∇·(V δp)* dA = ∮_∂Ω (V δp)*·n dl, where n is the outward unit normal vector on the cell boundary ∂Ω. Discretize the boundary integral as the sum of the fluxes through the cell interfaces.

  18. Orthogonal fluxes across cell interfaces. The flux form ensures mass conservation. [Figure: cell (i,j) with the x-direction fluxes F(i−1/2,j) and F(i+1/2,j) and the y-direction fluxes G(i,j−1/2) and G(i,j+1/2); the fluxes are upwind-biased according to the wind direction.] F: fluxes in x direction, G: fluxes in y direction.
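A 1D sketch of the upwind-biased flux choice (first order only; array names illustrative): the flux through the interface between cells i−1 and i takes its value from the upstream cell, depending on the wind direction.

    do i = 2, im
      if (u(i) >= 0.0) then
        f(i) = u(i) * q(i-1)   ! wind blows in +x: use the cell to the "west"
      else
        f(i) = u(i) * q(i)     ! wind blows in -x: use the cell to the "east"
      end if
    end do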

  19. Quasi semi-Lagrangian approach in x direction. CFLy = v Δt/Δy < 1 is required, but CFLx = u Δt/Δx > 1 is possible: it is implemented as an integer shift plus a fractional flux calculation. [Figure: cell (i,j) with the x-direction flux origin shifted upstream to F(i−5/2,j), while F(i+1/2,j), G(i,j−1/2) and G(i,j+1/2) stay local.]
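A sketch of the integer-shift-plus-fractional-flux idea for CFLx > 1 (positive u assumed; names illustrative, not the CAM3 code):

    c      = u * dt / dx          ! local Courant number in x, may exceed 1
    ishift = floor(c)             ! whole cells swept past the interface
    frac   = c - real(ishift)     ! remaining fractional Courant number
    ! total mass flux = sum over the ishift whole upstream cells
    !                 + frac * (upwind subgrid reconstruction in the next upstream cell)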

  20. Numerical fluxes & subgrid distributions • 1st order upwind: constant subgrid distribution • 2nd order van Leer: linear subgrid distribution • 3rd order PPM (piecewise parabolic method): parabolic subgrid distribution • ‘Monotonicity’ versus ‘positive definite’ constraints • Numerical diffusion. Explicit time stepping scheme: requires short time steps that are stable for the fastest waves (e.g. gravity waves). CGD web page for CAM3: http://www.ccsm.ucar.edu/models/atm-cam/docs/description/

  21. Subgrid distributions: constant (1st order). [Figure: piecewise constant reconstruction over cells x1 … x4, advected by the wind u.]

  22. Subgrid distributions: piecewise linear (2nd order), van Leer. [Figure: piecewise linear reconstruction over cells x1 … x4, advected by the wind u.] See details in van Leer 1977.

  23. Subgrid distributions: piecewise parabolic (3rd order), PPM. [Figure: piecewise parabolic reconstruction over cells x1 … x4, advected by the wind u.] See details in Carpenter et al. 1990 and Colella and Woodward 1984.

  24. Monotonicity constraint • Prevents over- and undershoots • Adds diffusion. [Figure: unconstrained van Leer reconstruction (not allowed) versus the monotonicity-constrained reconstruction, which results in discontinuities at cell edges, over cells x1 … x4 with wind u.] See details of the monotonicity constraint in van Leer 1977.
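One common way to write a van Leer-type monotonicity constraint on the cell slope (a sketch consistent with van Leer 1977, not copied from the CAM3 source; variable names illustrative):

    dq_avg  = 0.5 * (q(i+1) - q(i-1))       ! centred slope estimate
    qmin    = min(q(i-1), q(i), q(i+1))
    qmax    = max(q(i-1), q(i), q(i+1))
    dq_mono = sign(min(abs(dq_avg), 2.0*(q(i) - qmin), 2.0*(qmax - q(i))), dq_avg)
    ! dq_mono vanishes at local extrema, preventing over- and undershoots
    ! at the cost of some numerical diffusion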

  25. Simplified flow chart: stepon → dynpkg; within dynpkg: cd_core with c_sw (subcycled, Δt/2 only: compute C-grid time-mean winds) and d_sw (full Δt: update all D-grid variables), then trac2d and te_map (vertical remapping); d_p_coupling hands the data to physpkg, and p_d_coupling returns the physics tendencies to the dynamics.

  26. Grid staggerings (after Arakawa). [Figure: placement of the wind components u, v and the scalar points on the Arakawa A, B, C and D grids.]

  27. Regular latitude - longitude grid • Converging grid lines at the poles decrease the physical spacing Δx • Digital and Fourier filters remove unstable waves at high latitudes • Pole points are mass-points

  28. Typical horizontal resolutions • The time step is the ‘physics’ time step: • dynamics are subcycled using the time step Δt/nsplit • ‘nsplit’ is typically 8 or 10 • CAM3: check (dtime = 1800 s due to physics?) • WACCM: check (nsplit = 4, dtime = 1800 s for 2° x 2.5°?) [Table of defaults not reproduced here.]
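As a worked example using only the numbers quoted above: with dtime = 1800 s and nsplit = 8 the dynamics substep is 1800/8 = 225 s, with nsplit = 10 it is 180 s, and with the WACCM setting nsplit = 4 it is 450 s.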

  29. Idealized baroclinic wave test case The coarse resolution does not capture the evolution of the baroclinic wave Jablonowski and Williamson 2005

  30. Idealized baroclinic wave test case Finer resolution: Clear intensification of the baroclinic wave

  31. Idealized baroclinic wave test case. Finer resolution: clear intensification of the baroclinic wave; the solution starts to converge

  32. Idealized baroclinic wave test case Baroclinic wave pattern converges

  33. Idealized baroclinic wave test case: convergence of the FV dynamics. [Figure: global L2 error norms of ps; the solution starts converging at 1°; the shaded region indicates the uncertainty of the reference solution.]

  34. Floating Lagrangian vertical coordinate • 2D transport calculations with moving finite volumes (Lin 2004) • Layers are material surfaces, no vertical advection • Periodic re-mapping of the Lagrangian layers onto the reference grid • WACCM: 66 vertical levels with model top around 130 km • CAM3: 26 levels with model top around 3 hPa (40 km) • http://www.ccsm.ucar.edu/models/atm-cam/docs/description/

  35. Physics - Dynamics coupling • Prognostic data are vertically remapped (in cd_core) before dp_coupling is called (in dynpkg) • The vertical remapping routine computes the vertical velocity ω and the surface pressure ps • d_p_coupling and p_d_coupling (module dp_coupling) are the interfaces to the CAM3/WACCM physics package • Copy / interpolate the data from the ‘dynamics’ data structure to the ‘physics’ data structure (chunks), A-grid • Time-split physics coupling: • instantaneous updates of the A-grid variables • the order of the physics parameterizations matters • physics tendencies for the u & v updates on the D grid are collected

  36. Practical tips Namelist variables: • What do IORD, JORD, KORD mean? • IORD and JORD at the model top are different (see cd_core.F90) • Relationship between • dtime • nsplit (what happens if you don’t select nsplit or nsplit =0, default is computed in the routine d_split in dynamics_var.F90) • time interval for the physics & vertical remapping step Input / Output: • Initial conditions: staggered wind components US and VS required (D-grid) • Wind at the poles not predicted but derived User’s Guide: http://www.ccsm.ucar.edu/models/atm-cam/docs/usersguide/

  37. Practical tips. Namelist variables: • IORD, JORD, KORD determine the numerical scheme • IORD: scheme for flux calculations in x direction • JORD: scheme for flux calculations in y direction • KORD: scheme for the vertical remapping step • Available options: • -2: linear subgrid, van Leer, unconstrained • 1: constant subgrid, 1st order • 2: linear subgrid, van Leer, monotonicity constraint (van Leer 1977) • 3: parabolic subgrid, PPM, monotonic (Colella and Woodward 1984) • 4: parabolic subgrid, PPM, monotonic (Lin and Rood 1996, see FFSL3) • 5: parabolic subgrid, PPM, positive definite constraint • 6: parabolic subgrid, PPM, quasi-monotone constraint • Defaults: 4 (PPM) on the D grid (d_sw), -2 on the C grid (c_sw)
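A hedged namelist sketch combining only the variables named on this slide and on slide 36; the group name &camexp and the exact syntax are assumptions and should be checked against your CAM3/WACCM build:

    &camexp            ! group name assumed for illustration
      dtime  = 1800    ! physics time step in seconds
      nsplit = 8       ! dynamics substeps per physics time step
      iord   = 4       ! PPM, monotonic (Lin and Rood 1996) in x
      jord   = 4       ! same scheme in y
      kord   = 4       ! scheme for the vertical remapping step
    /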

  38. ‘Hybrid’ Computer Architecture • SMP: symmetric multi-processor • Hybrid parallelization technique possible: • Shared memory (OpenMP) within a node • Distributed memory approach (MPI) across nodes Example: NCAR’s Bluesky (IBM) with 8-way and 32-way nodes

  39. Schematic parallelization technique. 1D distributed memory parallelization (MPI) across the latitudes. [Figure: the latitudes from the North Pole (NP) over the equator (Eq.) to the South Pole (SP) are split among processors 1-4; the longitudes (0 … 340) are not decomposed.]

  40. Schematic parallelization technique. Each MPI domain contains ‘ghost cells’ (halo regions): copies of the neighboring data that belong to different processors. [Figure: subdomains of processors 2 and 3 around the equator with their ghost-cell rows for PPM highlighted; longitudes 0 … 340.]
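A generic sketch of how such a ghost-cell exchange could look for a 1D latitude decomposition; this is not the CAM3 routine, and the subroutine name, halo width ng and use of blocking MPI_SENDRECV are illustrative only:

    subroutine exchange_halo(q, nlon, nlats, ng, north, south, comm)
      use mpi
      implicit none
      integer, intent(in)    :: nlon, nlats, ng, north, south, comm
      real,    intent(inout) :: q(nlon, 1-ng:nlats+ng)   ! local field with ghost rows
      integer :: ierr, status(MPI_STATUS_SIZE)
      ! send our northernmost ng rows north, receive southern ghost rows from the south
      call MPI_SENDRECV(q(1, nlats-ng+1), nlon*ng, MPI_REAL, north, 1, &
                        q(1, 1-ng),       nlon*ng, MPI_REAL, south, 1, &
                        comm, status, ierr)
      ! send our southernmost ng rows south, receive northern ghost rows from the north
      call MPI_SENDRECV(q(1, 1),          nlon*ng, MPI_REAL, south, 2, &
                        q(1, nlats+1),    nlon*ng, MPI_REAL, north, 2, &
                        comm, status, ierr)
      ! neighbours beyond the poles can be passed as MPI_PROC_NULL
    end subroutine exchange_halo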

  41. Schematic parallelization technique. Shared memory parallelization, in CAM3 most often in the vertical direction, via OpenMP compiler directives. Typical loop: do k = 1, plev … enddo. It can often be parallelized with OpenMP (check dependencies): !$OMP PARALLEL DO … do k = 1, plev … enddo
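A self-contained, runnable version of the loop above as a sketch (array name and sizes are illustrative, not taken from CAM3); the OpenMP directive distributes the k levels over the available threads:

    program omp_k_loop
      implicit none
      integer, parameter :: plon = 128, plat = 64, plev = 26
      real    :: t(plon, plat, plev)
      integer :: i, j, k
      !$OMP PARALLEL DO PRIVATE(i, j)
      do k = 1, plev                       ! each thread works on a subset of levels
        do j = 1, plat
          do i = 1, plon
            t(i, j, k) = real(i + j + k)   ! stand-in for the real per-level work
          end do
        end do
      end do
      !$OMP END PARALLEL DO
      print *, 'sum =', sum(t)
    end program omp_k_loop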

  42. Schematic parallelization technique. Shared memory parallelization, in CAM3 most often in the vertical direction, via OpenMP compiler directives. [Figure: the iterations k = 1 … plev of the !$OMP PARALLEL DO loop are distributed over 4 parallel ‘threads’ on a 4-way SMP node (4 CPUs), each CPU handling a contiguous block of levels.]

  43. Thank you! Any questions? • Tracer transport? • Fortran code • …

  44. References • Carpenter, R. L., K. K. Droegemeier, P. R. Woodward and C. E. Hane, 1990: Application of the Piecewise Parabolic Method (PPM) to Meteorological Modeling. Mon. Wea. Rev., 118, 586-612 • Colella, P., and P. R. Woodward, 1984: The piecewise parabolic method (PPM) for gas-dynamical simulations. J. Comput. Phys., 54, 174-201 • Jablonowski, C., and D. L. Williamson, 2005: A baroclinic instability test case for atmospheric model dynamical cores. Submitted to Mon. Wea. Rev. • Lin, S.-J., and R. B. Rood, 1996: Multidimensional Flux-Form Semi-Lagrangian Transport Schemes. Mon. Wea. Rev., 124, 2046-2070 • Lin, S.-J., and R. B. Rood, 1997: An explicit flux-form semi-Lagrangian shallow water model on the sphere. Quart. J. Roy. Meteor. Soc., 123, 2477-2498 • Lin, S.-J., 1997: A finite volume integration method for computing pressure gradient forces in general vertical coordinates. Quart. J. Roy. Meteor. Soc., 123, 1749-1762 • Lin, S.-J., 2004: A ‘Vertically Lagrangian’ Finite-Volume Dynamical Core for Global Models. Mon. Wea. Rev., 132, 2293-2307 • van Leer, B., 1977: Towards the ultimate conservative difference scheme. IV. A new approach to numerical convection. J. Comput. Phys., 23, 276-299
