
RAMS Parallel Processing Techniques


Presentation Transcript


  1. RAMS Parallel Processing Techniques

  2. RAMS Parallel Design Considerations
  • Concentrating on larger problems (3-dimensional only)
  • No compromises in physics and numerics
  • Target distributed-memory, MIMD architectures
    • workstation cluster as the first platform
  • Single code version for single-processor and parallel platforms
  • No performance degradation on single-processor platforms
  • Dynamic load balancing needed

  3. RAMS Parallel Components and Structure
  • Main structure
  • Memory structure
  • Variable tables
  • Domain decomposition
  • Communication procedures
  • Concurrent computation/communication
  • Bookkeeping arrays
  • File output
  • Nested grid considerations
  • Dynamic load balancing

  4. RAMS Parallel Structure
  • For parallel execution, the serial (single-processor) code is divided into two types of process:
  • Master process
    • Initialization
    • All input/output functions
  • Node ("compute") processes
    • All computation
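  To make the split concrete, here is a minimal sketch (not RAMS code) of the master/node dispatch, assuming an MPI launch in which rank 0 plays the master and every other rank is a compute node; the print statements stand in for the real initialization, I/O, and computation routines:

    program rams_master_node_sketch
       use mpi
       implicit none
       integer :: ierr, myrank, nprocs

       call MPI_Init(ierr)
       call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
       call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

       if (myrank == 0) then
          ! Master process: initialization and all input/output
          print *, 'master: reading input, decomposing domain for', nprocs - 1, 'nodes'
       else
          ! Node ("compute") process: all computation
          print *, 'node', myrank, ': timestepping its sub-domain'
       end if

       call MPI_Finalize(ierr)
    end program rams_master_node_sketch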

  5. RAMS Memory Structure
  • Many options, array spaces, and nests with variable numbers of grid points
  • Dynamically-allocated "A" array
    • "1-D" array holding all model memory
    • "pointers" to the beginning of each field (integer indices in COMMON)
  • Computational routines dynamically dimension their arrays
  • Each node/master process allocates its own memory
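  A minimal sketch of the single "A" array idea, assuming classic Fortran sequence association: one allocatable 1-D array holds all fields, integer "pointers" record where each field begins, and the computational routine dimensions its argument at run time. Names and sizes are illustrative, not the actual RAMS memory layout:

    program a_array_sketch
       implicit none
       integer, parameter :: nx = 10, ny = 8, nz = 20
       real, allocatable :: a(:)        ! one 1-D array holding all model memory
       integer :: iup, ivp              ! "pointers": starting index of each field in A

       allocate(a(2*nx*ny*nz))          ! room for two 3-D fields in this toy case
       iup = 1                          ! UP begins at a(iup)
       ivp = iup + nx*ny*nz             ! VP begins right after UP

       ! The computational routine sees each 1-D slice of A as a 3-D array
       call zero_field(a(iup), nz, nx, ny)
       call zero_field(a(ivp), nz, nx, ny)
    end program a_array_sketch

    subroutine zero_field(f, n1, n2, n3)
       implicit none
       integer, intent(in) :: n1, n2, n3
       real, intent(inout) :: f(n1, n2, n3)   ! dynamically dimensioned inside the routine
       f = 0.0
    end subroutine zero_field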

  6. RAMS Variable Tables
  • A data file defines the characteristics of each model array:
    • existence (whether on node or master processes)
    • output
    • communication

  7. RAMS VTABLE File
  # Velocity, perturbation Exner function
  UP : 1:3: 0:110000:vard:hist:anal:lite:mpti:mpt3:mpt2
  UC : 2:3: 0:110000:vard:hist:mpti:mpt3
  VP : 3:3: 0:110000:vard:hist:anal:lite:mpti:mpt3:mpt2
  VC : 4:3: 0:110000:vard:hist:mpti:mpt3
  WP : 5:3: 0:110000:vard:hist:anal:lite:mpti:mpt3:mpt2
  WC : 6:3: 0:110000:vard:hist:mpti:mpt3
  PP : 7:3: 0:110000:vard:hist:anal:lite:mpti:mpt3:mpt2
  PC : 8:3: 0:110000:vard:hist:mpti:mpt3
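  A rough sketch of how such a table might be read. The file name and the meaning attached to the colon-separated fields below (variable name first, keyword flags such as anal and mpt* controlling output and communication) are assumptions for illustration, not the documented RAMS VTABLE format:

    program vtable_sketch
       implicit none
       character(len=256) :: line
       character(len=16)  :: varname
       integer :: ios, icolon
       logical :: to_anal, node_comm

       open(unit=10, file='VTABLE', status='old', action='read')   ! placeholder file name
       do
          read(10, '(a)', iostat=ios) line
          if (ios /= 0) exit
          if (len_trim(line) == 0 .or. line(1:1) == '#') cycle     ! skip blanks and comments

          icolon = index(line, ':')
          if (icolon <= 1) cycle
          varname   = adjustl(line(1:icolon-1))
          to_anal   = index(line, ':anal') > 0      ! assumed: written to analysis (output) files
          node_comm = index(line, ':mpt')  > 0      ! assumed: involved in node communication

          print *, trim(varname), '  output:', to_anal, '  communicated:', node_comm
       end do
       close(10)
    end program vtable_sketch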

  8. Domain Decomposition
  Nodes exchange "overlap" rows (halo regions) with adjacent nodes

  9. Domain decomposition
  • Each column is assigned a "work" factor (see the sketch below)
  • Factor based on boundary, physics, machine performance, etc.
  • Domain is horizontally decomposed across processors
  • "2-D" decomposition (3 or more nodes)
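  A rough sketch of work-weighted decomposition along one horizontal direction, assuming slightly heavier "work" factors for boundary columns; a real 2-D decomposition splits in both horizontal directions, and the weights here are purely illustrative:

    program decomp_sketch
       implicit none
       integer, parameter :: ncols = 24, nnodes = 4
       real    :: work(ncols), total, tshare, accum
       integer :: owner(ncols), i, inode

       ! Assumed work factors: interior columns cost 1.0, boundary columns a bit more
       work = 1.0
       work(1)     = 1.5
       work(ncols) = 1.5
       total = sum(work)

       ! Walk the columns, cutting a new sub-domain each time the running
       ! sum of work reaches an equal share of the total
       inode = 1
       accum = 0.0
       do i = 1, ncols
          accum = accum + work(i)
          owner(i) = inode
          tshare = total * real(inode) / real(nnodes)
          if (accum >= tshare .and. inode < nnodes) inode = inode + 1
       end do

       print '(a,24i3)', 'column owners: ', owner
    end program decomp_sketch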

  10. Timestep flow - single grid - serial
  [Diagram: Start -> compute long timestep -> small (acoustic) timesteps 1, 2, 3 -> End]

  11. Timestep flow - single grid - parallel
  [Diagram: the same Start -> compute -> small timesteps -> End sequence, shown for Node 1 and Node 2, each working on its own sub-domain]

  12. Concurrent communication/computation
  • Overlap rows are sent first thing in the timestep flow
  • Model processes that can be computed without overlap rows are done first:
    • sub-domain interior is computed for some terms
    • messages are received
    • sub-domain boundary terms are computed
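  A minimal sketch of overlapping the halo exchange with computation, assuming non-blocking MPI calls, a 1-D split, and a single overlap row on each side; the array shape and the "compute" steps are simplified stand-ins:

    program halo_overlap_sketch
       use mpi
       implicit none
       integer, parameter :: nrows = 16, npts = 32
       real    :: field(npts, 0:nrows+1)       ! interior rows 1..nrows plus one halo row each side
       integer :: ierr, myrank, nprocs, left, right
       integer :: reqs(4), stats(MPI_STATUS_SIZE, 4)

       call MPI_Init(ierr)
       call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
       call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
       left  = merge(MPI_PROC_NULL, myrank - 1, myrank == 0)
       right = merge(MPI_PROC_NULL, myrank + 1, myrank == nprocs - 1)
       field = real(myrank)

       ! 1) Overlap rows are sent (and receives posted) first thing in the timestep
       call MPI_Irecv(field(:, 0),       npts, MPI_REAL, left,  1, MPI_COMM_WORLD, reqs(1), ierr)
       call MPI_Irecv(field(:, nrows+1), npts, MPI_REAL, right, 2, MPI_COMM_WORLD, reqs(2), ierr)
       call MPI_Isend(field(:, 1),       npts, MPI_REAL, left,  2, MPI_COMM_WORLD, reqs(3), ierr)
       call MPI_Isend(field(:, nrows),   npts, MPI_REAL, right, 1, MPI_COMM_WORLD, reqs(4), ierr)

       ! 2) Terms on the sub-domain interior that need no overlap rows are computed
       field(:, 2:nrows-1) = field(:, 2:nrows-1) + 1.0

       ! 3) Messages are received, then sub-domain boundary terms are computed
       call MPI_Waitall(4, reqs, stats, ierr)
       field(:, 1)     = field(:, 1)     + 1.0
       field(:, nrows) = field(:, nrows) + 1.0

       call MPI_Finalize(ierr)
    end program halo_overlap_sketch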

  13. Bookkeeping arrays
  • Keep track of communication "paths" (source node / destination node)
  • ipaths(info,itype,ngrids,nproc,nproc) stores, for each path:
    • processor number
    • message bounds
    • machine id
    • grid number
    • "type" of message
  • iget_paths(itype,ngrids,nproc,nproc)
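  An illustrative declaration of such bookkeeping arrays, matching the dimensions listed above; the slot ordering and the example entries are assumptions, not the actual RAMS layout:

    program paths_sketch
       implicit none
       integer, parameter :: ninfo = 5, ntypes = 7, ngrids = 2, nproc = 8
       integer :: ipaths(ninfo, ntypes, ngrids, nproc, nproc)
       integer :: iget_paths(ntypes, ngrids, nproc, nproc)

       ipaths = 0
       iget_paths = 0

       ! Example: for communication type 2 (say, long-timestep overlap) on grid 1,
       ! node 3 sends to node 4; record the destination and mark the receive side
       ipaths(1, 2, 1, 3, 4) = 4       ! assumed slot 1: destination processor number
       iget_paths(2, 1, 3, 4) = 1      ! receiver expects a message on this path

       print *, 'paths defined:', count(iget_paths == 1)
    end program paths_sketch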

  14. Communication "types"
  • Initialization / re-decomposition
    • Master to node; full sub-domains; most variables
  • Long timestep overlap region
    • Node to node; 1 row; prognostic variables
  • Small timestep overlap region (2 events)
    • Node to node; 1 row; selected variables
  • File output
    • Node to master; full sub-domains; prognostic variables
  • All node-to-node types package their variables into a single message before sending (see the packing sketch below)
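  A minimal sketch of packaging several variables into one message, assuming MPI_Pack; the field names and sizes are illustrative, and copying the variables into one contiguous buffer by hand would serve the same purpose:

    program pack_sketch
       use mpi
       implicit none
       integer, parameter :: n = 1000
       real :: up(n), vp(n), wp(n)
       character(len=1), allocatable :: buf(:)
       integer :: ierr, pos, bufsize, myrank

       call MPI_Init(ierr)
       call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
       up = 1.0; vp = 2.0; wp = 3.0

       ! Pack several prognostic fields into a single buffer so the overlap
       ! region goes out as one message rather than one message per variable
       call MPI_Pack_size(3*n, MPI_REAL, MPI_COMM_WORLD, bufsize, ierr)
       allocate(buf(bufsize))
       pos = 0
       call MPI_Pack(up, n, MPI_REAL, buf, bufsize, pos, MPI_COMM_WORLD, ierr)
       call MPI_Pack(vp, n, MPI_REAL, buf, bufsize, pos, MPI_COMM_WORLD, ierr)
       call MPI_Pack(wp, n, MPI_REAL, buf, bufsize, pos, MPI_COMM_WORLD, ierr)

       ! A single send of 'pos' bytes with type MPI_PACKED to the neighboring
       ! node would follow here
       print *, 'node', myrank, 'packed', pos, 'bytes'

       call MPI_Finalize(ierr)
    end program pack_sketch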

  15. File output
  • Nodes transfer the necessary data to the master process
  • Transfer is done with message passing
  • No local disk space or NFS needed for file output
  • Nodes can continue computing
  • Data is available on the master for other activities
  • Parallel I/O not needed
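  A minimal sketch of this output path, assuming MPI point-to-point messages: the compute nodes hand their sub-domains to the master, and only the master touches the disk. The file name and sizes are illustrative:

    program output_sketch
       use mpi
       implicit none
       integer, parameter :: nsub = 1000            ! points in each node's sub-domain (toy size)
       real, allocatable  :: mine(:), whole(:)
       integer :: ierr, myrank, nprocs, inode
       integer :: status(MPI_STATUS_SIZE)

       call MPI_Init(ierr)
       call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
       call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

       if (myrank == 0) then
          ! Master: receive each node's piece and write the assembled field;
          ! the nodes therefore need no local disk or NFS for file output
          allocate(whole(nsub*(nprocs-1)))
          do inode = 1, nprocs - 1
             call MPI_Recv(whole((inode-1)*nsub+1), nsub, MPI_REAL, inode, 10, &
                           MPI_COMM_WORLD, status, ierr)
          end do
          open(20, file='field.out', form='unformatted', status='replace')
          write(20) whole
          close(20)
       else
          ! Node: send its sub-domain to the master, then it is free to keep computing
          allocate(mine(nsub))
          mine = real(myrank)
          call MPI_Send(mine, nsub, MPI_REAL, 0, 10, MPI_COMM_WORLD, ierr)
       end if

       call MPI_Finalize(ierr)
    end program output_sketch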

  16. Nested grid considerations
  • All grids are decomposed independently
  • 2-way interactive communication
  • Interpolation of nested grid boundaries is performed on the fine-grid nodes; coarse grid information is sent to the fine grid
  • Feedback (averaging) of fine-grid information is performed on the fine-grid node; averaged values are sent to the coarse grid
  • Minimizes the data that needs to be communicated
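  A minimal sketch of the fine-to-coarse feedback step, assuming a nest ratio of 3: the averaging is done where the fine grid lives, so only the much smaller array of averaged values would have to be sent to the coarse-grid node:

    program feedback_sketch
       implicit none
       integer, parameter :: nrat = 3                   ! nest ratio (fine points per coarse point)
       integer, parameter :: ncoarse = 5, nfine = ncoarse*nrat
       real :: fine(nfine, nfine), coarse_avg(ncoarse, ncoarse)
       integer :: ic, jc

       call random_number(fine)

       ! Average each nrat x nrat block of fine-grid points onto the coarse cell;
       ! only coarse_avg (ncoarse x ncoarse values) needs to be communicated
       do jc = 1, ncoarse
          do ic = 1, ncoarse
             coarse_avg(ic, jc) = sum(fine((ic-1)*nrat+1:ic*nrat,   &
                                           (jc-1)*nrat+1:jc*nrat)) / real(nrat*nrat)
          end do
       end do

       print *, 'values to communicate:', ncoarse*ncoarse, 'instead of', nfine*nfine
    end program feedback_sketch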

  17. Dynamic load balancing
  • Regions of the domain can require additional physics (microphysics, radiation) as the model progresses
  • Nodes send timing information back to the master at the end of the timestep
  • At opportune times (coinciding with file output times), the master process will:
    • re-compute the "work" factors
    • re-decompose the domain
    • re-initialize the nodes
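  A minimal sketch of the timing side of this, assuming MPI: every process times its own (dummy) timestep work and the master gathers the timings it would fold into new "work" factors before re-decomposing. In the real master/node split the master does not compute; including it here just keeps the sketch short:

    program balance_sketch
       use mpi
       implicit none
       integer :: ierr, myrank, nprocs, i
       real(kind=8) :: t0, t1, mytime
       real(kind=8), allocatable :: times(:)

       call MPI_Init(ierr)
       call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
       call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

       ! Each process times its own timestep work (a dummy loop here,
       ! heavier on higher ranks to mimic uneven physics costs)
       t0 = MPI_Wtime()
       t1 = 0.0d0
       do i = 1, 1000000 * (myrank + 1)
          t1 = t1 + sqrt(real(i, kind=8))
       end do
       mytime = MPI_Wtime() - t0

       ! Timing information goes back to the master at the end of the step
       allocate(times(nprocs))
       call MPI_Gather(mytime, 1, MPI_DOUBLE_PRECISION, times, 1, MPI_DOUBLE_PRECISION, &
                       0, MPI_COMM_WORLD, ierr)

       if (myrank == 0) then
          ! The master would turn these timings into new per-column "work"
          ! factors, re-decompose the domain, and re-initialize the nodes
          print *, 'relative cost per process:', real(times / maxval(times))
       end if

       call MPI_Finalize(ierr)
    end program balance_sketch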

  18. Performance
  • Performance varies, depending on architecture, communication hardware, and model configuration
  • Speedup: S = (time to run serial code) / (time to run parallel code)
  • Efficiency: E = S / (number of processors)
  • Results: 60-110% efficiency
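  As a worked example with illustrative numbers: if the serial code takes 1200 s for a given run and the parallel code takes 200 s on 8 processors, then S = 1200 / 200 = 6 and E = 6 / 8 = 75%. Efficiencies above 100%, as in the reported 60-110% range, can occur when the smaller decomposed sub-domains make better use of each processor's cache and memory.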
