Overview of Nesting in the NMM-B
Tom Black
18 February 2014
● Nests in operational NEMS/NAM
● MPI task usage
  - 1-way exchange
  - 2-way exchange
  - The communicators
● The grid
● Motion
● 2-way exchange
● User specification of nest-related variables
● Sequence of execution
General Characteristics of NMM-B Nests
● Parent-oriented
● Static / moving
● 1-way / 2-way interactive
● Multiple nests run simultaneously
● Telescoping domains
● Bit restartable (static / moving / 1-way / 2-way*)
NEMS Structure
[Figure: NEMS component hierarchy with boxes MAIN, NEMS, Earth Ensemble Mediator, EARTH(1:NM), Atm-Ocn Mediator, Atm, Ice, Ocean, GSM, FIM, NMM, and Domain(1:ND) (parents and children) with its Solver, Dyn, Phy, Chem and Wrt components.]
All boxes represent ESMF components.
Current Operational NAM with 1-Way Static Nests
● Parent runs at 12 km to 84 hr
● Four static nests run to 60 hr:
  - 4 km CONUS nest (3-to-1)
  - 6 km Alaska nest (2-to-1)
  - 3 km HI & PR nests (4-to-1)
● Single relocatable 1.33 km or 1.5 km FireWeather grandchild run to 36 hr (3-to-1 or 4-to-1)
Task Usage for NMM-B 1-Way Nesting
The user distributes the available compute tasks among all domains and fine-tunes those assignments (along with the quilt task assignments) so that parents and their children advance through the forecast at virtually the same rate while all domains integrate concurrently. This lets the user optimize the workload balance.
Relative Compute Resources Used by NAM/Nests
● 12 km parent: 10%
● 6 km Alaska nest: 7%
● 4 km CONUS nest: 57%
● 1.33 km CONUS FireWx nest: 17%
● 3 km Hawaii nest: 5%
● 3 km Puerto Rico nest: 4%
1-Way Integration for Three Generations
All generations integrate concurrently.
[Figure: timelines with time steps Δt par, Δt child and Δt grandchild; the parent updates child BCs.]
NMM-B with 1-Way Nesting using 72 Compute Tasks
● Generation #1: tasks 0-7 (one domain, 8 tasks)
● Generation #2: tasks 8-47 (four domains with 2, 2, 4 and 32 tasks)
● Generation #3: tasks 48-71 (one domain, 24 tasks)
Two Key Timers
● cpl1_recv_tim: child wait time to recv BC data. Appears as 'cplrecv= ' in the stdout file.
● cpl2_wait_tim: parent wait time for the BC send to finish. Appears as 'cpl wait = ' in the stdout file.
If the child wait time is large, the child is too fast relative to the parent.
=> Reduce child tasks, increase parent tasks.
If the parent wait time is large, the parent is too fast relative to the child.
=> Reduce parent tasks, increase child tasks.
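The two-timer rule above can be expressed as a small decision helper. This is a hypothetical sketch, not NMM-B code: it assumes the two timer values have already been read from the stdout file in seconds, and it treats a wait as "large" when it exceeds an arbitrary `tolerance` that the user would tune.

```python
def rebalance_advice(cpl1_recv_tim: float, cpl2_wait_tim: float,
                     tolerance: float = 1.0) -> str:
    """Hypothetical helper applying the two-timer rule.

    cpl1_recv_tim: child wait time to recv BC data (seconds)
    cpl2_wait_tim: parent wait time for the BC send to finish (seconds)
    tolerance:     assumed threshold (seconds) above which a wait is "large"
    """
    child_waits = cpl1_recv_tim > tolerance
    parent_waits = cpl2_wait_tim > tolerance
    if child_waits and cpl1_recv_tim >= cpl2_wait_tim:
        # The child finishes its steps early and sits waiting for BCs.
        return "child too fast: reduce child tasks, increase parent tasks"
    if parent_waits:
        # The parent finishes early and sits waiting for its BC sends.
        return "parent too fast: reduce parent tasks, increase child tasks"
    return "balanced"


print(rebalance_advice(cpl1_recv_tim=42.0, cpl2_wait_tim=0.3))
```

In practice the threshold would more likely be a fraction of total wallclock time; a fixed number of seconds keeps the sketch short.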
2-Way Integration for Three Generations
Only one generation can be active at a given time.
[Figure: timelines with time steps Δt par, Δt child and Δt grandchild; the parent updates child BCs and the child updates the parent.]
Use the 1-Way Task Assignment Strategy for 2-Way Nests?
No. Too many tasks would sit idle, since only the domains in one generation are active at a time. Therefore use a different approach based on the generations of domains.
NMM-B with 1-Way Nesting using 72 Compute Tasks
● Generation #1: tasks 0-7 (one domain, 8 tasks)
● Generation #2: tasks 8-47 (four domains with 2, 2, 4 and 32 tasks)
● Generation #3: tasks 48-71 (one domain, 24 tasks)
If this method is used for 2-way nesting, only 40 of the 72 tasks work in the busiest generation.
Basic Strategy for Task Use by Generations
‣ Generations must wait on each other in 2-way mode.
‣ Since not all domains can execute concurrently, maximize the amount of work done at any given time by assigning ALL compute tasks to the most expensive generation and distributing them among its domains for optimal efficiency.
‣ Then assign to the domains in each remaining generation only as many compute tasks as help minimize that generation's clock time, avoiding subdomains so small that their halo exchanges become too costly.
Rules for 'Generational' Task Usage
‣ A compute task can be in more than one generation but cannot be on more than one domain per generation.
‣ ALL compute tasks are assigned to the most expensive generation.
‣ Each quilt task must still be uniquely assigned to a single domain to retain asynchronous writing of output.
‣ All domains in each generation execute concurrently.
‣ Generations execute sequentially.
The user can now optimize speed in 2-way nesting without imposing large imbalances. Some tasks might be idle in some generations, but every generation runs as fast as possible.
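The rules above are mechanical enough to check automatically. The following is a hypothetical validator (the function name and data layout are illustrative, not from NMM-B). It applies the three assignment rules to a layout, and the usage example encodes the 72-task 2-way layout with an assumed contiguous ordering of domains and one made-up quilt task per domain.

```python
from collections import Counter


def check_generational_layout(compute, quilt, n_compute, busiest):
    """Check a task layout against the 'generational' rules.

    compute:   {generation: {domain: [compute task ids]}}
    quilt:     {domain: [quilt task ids]}
    n_compute: total number of compute tasks (ids 0 .. n_compute-1)
    busiest:   the most expensive generation
    Returns a list of rule violations (empty if the layout is valid).
    """
    problems = []
    for gen, domains in compute.items():
        # A task may be in several generations but on one domain per generation.
        counts = Counter(t for tasks in domains.values() for t in tasks)
        shared = sorted(t for t, c in counts.items() if c > 1)
        if shared:
            problems.append(f"gen {gen}: tasks on more than one domain: {shared}")
    # ALL compute tasks belong to the most expensive generation.
    in_busiest = {t for tasks in compute[busiest].values() for t in tasks}
    if in_busiest != set(range(n_compute)):
        problems.append(f"gen {busiest}: does not use all {n_compute} compute tasks")
    # Each quilt task is uniquely assigned to a single domain.
    qcounts = Counter(t for tasks in quilt.values() for t in tasks)
    qshared = sorted(t for t, c in qcounts.items() if c > 1)
    if qshared:
        problems.append(f"quilt tasks shared by domains: {qshared}")
    return problems


# Hypothetical encoding of the 72-task 2-way layout (domain numbers assumed).
compute = {
    1: {1: list(range(0, 12))},
    2: {2: list(range(0, 4)), 3: list(range(4, 8)),
        4: list(range(8, 16)), 5: list(range(16, 72))},
    3: {6: list(range(12, 54))},
}
quilt = {d: [72 + d] for d in range(1, 7)}  # one made-up quilt task per domain
print(check_generational_layout(compute, quilt, n_compute=72, busiest=2))  # []
```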
NMM-B with 2-Way Nesting using 72 Compute Tasks ('Generational' task usage)
● Generation #1: tasks 0-11 (one domain, 12 tasks)
● Generation #2: tasks 0-71 (four domains with 4, 4, 8 and 56 tasks)
● Generation #3: tasks 12-53 (one domain, 42 tasks)
All 72 of 72 tasks work in the busiest generation.
Preliminary Estimate of 1-Way Compute Task Assignments
There are N compute tasks available and 3 generations with 1, 2 and 2 domains, respectively. Domain #n has grid dimensions IMn x JMn and time step DTn; its work is normalized to the parent time step DT1, since it takes DT1 / DTn steps per parent step.
Domain #1: Work1 = IM1 x JM1
Domain #2: Work2 = IM2 x JM2 x (DT1 / DT2)
Domain #3: Work3 = IM3 x JM3 x (DT1 / DT3)
Domain #4: Work4 = IM4 x JM4 x (DT1 / DT4)
Domain #5: Work5 = IM5 x JM5 x (DT1 / DT5)
Total Work = TW = Work1 + Work2 + Work3 + Work4 + Work5
Domain #1 compute tasks: Work1 / TW x N
Domain #2 compute tasks: Work2 / TW x N
Domain #3 compute tasks: Work3 / TW x N
Domain #4 compute tasks: Work4 / TW x N
Domain #5 compute tasks: Work5 / TW x N
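The 1-way estimate above is a proportional split of N by normalized work. A minimal sketch (the function name is hypothetical), returning unrounded counts that the user would then round and fine-tune:

```python
def one_way_task_estimates(domains, n_tasks):
    """Preliminary 1-way compute-task estimate for each domain.

    domains: list of (IM, JM, DT) tuples; domains[0] is the uppermost parent.
    n_tasks: N, the number of compute tasks available.
    Returns unrounded task counts Work_n / TW x N.
    """
    dt1 = domains[0][2]
    # Work per parent step: grid points times domain steps per parent step.
    work = [im * jm * (dt1 / dt) for im, jm, dt in domains]
    total = sum(work)
    return [w * n_tasks / total for w in work]


# Hypothetical sizes: a parent and two 3:1 children with the same dimensions.
print(one_way_task_estimates([(100, 100, 30), (100, 100, 10), (100, 100, 10)], 70))
# [10.0, 30.0, 30.0]
```

The children get three times the parent's share because each takes three steps per parent step on an equally sized grid.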
Preliminary Estimate of 2-Way Compute Task Assignments
Same setup as the 1-way case. Assume the 2nd generation is the most expensive.
gen #2: Distribute all N tasks within the 2nd generation as was done for all 1-way domains previously.
  Total Work = TW2 = Work2 + Work3
  Domain #2 compute tasks: Work2 / TW2 x N
  Domain #3 compute tasks: Work3 / TW2 x N
Assign as many of the N tasks to generations 1 and 3 as possible without slowing down the run.
gen #1:
  Domain #1 compute tasks: <= N
gen #3:
  Total Work = TW3 = Work4 + Work5
  Domain #4 compute tasks: <= Work4 / TW3 x N
  Domain #5 compute tasks: <= Work5 / TW3 x N
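The 2-way estimate applies the same proportional split within each generation separately. The following hypothetical sketch additionally assumes that the "most expensive" generation is the one with the largest summed normalized work. For that generation the counts use all N tasks; for the others they are only upper bounds.

```python
def two_way_task_estimates(generations, n_tasks):
    """Preliminary 2-way ('generational') compute-task estimates.

    generations: list of generations, each a list of (IM, JM, DT) tuples;
                 generations[0][0] is the uppermost parent.
    Returns (index of the most expensive generation, per-generation task lists).
    The busiest generation's counts use all N tasks; for every other
    generation the counts are upper bounds (use fewer if that is no slower).
    """
    dt1 = generations[0][0][2]
    work = [[im * jm * (dt1 / dt) for im, jm, dt in gen] for gen in generations]
    busiest = max(range(len(work)), key=lambda g: sum(work[g]))
    estimates = []
    for gen_work in work:
        total = sum(gen_work)  # TW for this generation only
        estimates.append([w * n_tasks / total for w in gen_work])
    return busiest, estimates


busiest, est = two_way_task_estimates(
    [[(100, 100, 30)],                    # gen #1 (hypothetical sizes)
     [(100, 100, 10), (100, 100, 10)],    # gen #2
     [(90, 90, 10), (60, 90, 10)]],       # gen #3
    n_tasks=72)
print(busiest, est)
```

Here index 1 (gen #2) is the most expensive, so its two domains split all 72 tasks; the single gen #1 domain gets "<= N", matching the slide.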
Example of 2-Way Task Assignments
‣ You have 128 available tasks: 112 compute and 16 write.
‣ Five domains; 3 generations; the 3rd is the most expensive.
                  Compute   Write
gen #1  Dom #1:   5x8       1x2
gen #2  Dom #2:   6x6       1x3
        Dom #3:   6x6       1x3
gen #3  Dom #4:   7x8       1x4
        Dom #5:   7x8       1x4
Totals: compute = 112 (gen #3), write = 16, all tasks = 128
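The totals in this example can be checked with a few lines of arithmetic. This sketch assumes each "AxB" entry means an A-by-B task layout, that is, A x B tasks. It confirms that only the most expensive generation uses all 112 compute tasks and that the 16 write tasks are separate, per-domain tasks.

```python
# (compute layout, write layout) per domain, reading "AxB" as A x B tasks.
layout = {
    1: {"Dom #1": ((5, 8), (1, 2))},
    2: {"Dom #2": ((6, 6), (1, 3)), "Dom #3": ((6, 6), (1, 3))},
    3: {"Dom #4": ((7, 8), (1, 4)), "Dom #5": ((7, 8), (1, 4))},
}

# Compute tasks in use in each generation (tasks are reused across generations).
compute_per_gen = {
    gen: sum(cx * cy for (cx, cy), _ in doms.values())
    for gen, doms in layout.items()
}
# Write (quilt) tasks are unique per domain, so they simply add up.
write_total = sum(wx * wy for doms in layout.values()
                  for _, (wx, wy) in doms.values())

print(compute_per_gen)                              # {1: 40, 2: 72, 3: 112}
print(write_total)                                  # 16
print(max(compute_per_gen.values()) + write_total)  # 128
```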
One-Way Communication Between a Parent and Child
‣ MPI intercommunicators are very convenient for this.
‣ The lead tasks on both domains have rank 0.
‣ MPI sends/recvs use simple target and sender task ranks.
Example of an Intercommunicator
The global task ranks (unique task assignments to domains):
  Parent: 25, 26, 27
  Child: 52, 53, 54, 55
The intercommunicator task ranks:
  Parent: 0, 1, 2
  Child: 0, 1, 2, 3
Parent and Child Communications w/ Generations
‣ MPI intercommunicators cannot be used because parent and child may share some of the same tasks.
‣ MPI does not allow global task ranks to be repeated in intercommunicators.
‣ Parent/child task ranks may repeat, but they will lie in a single non-repeating sequence in the communicator.
Therefore we use MPI intracommunicators.
Example of an Intracommunicator
The global task ranks (tasks can be in more than 1 generation):
  Parent: 3, 4, 5, 6
  Child: 1, 2, 3, 4, 5, 6, 7
The intracommunicator task ranks (parent first):
  Union: 3, 4, 5, 6, 1, 2, 7 -> 0, 1, 2, 3, 4, 5, 6
  Parent: 0, 1, 2, 3
  Child: 4, 5, 0, 1, 2, 3, 6
This means more bookkeeping in the Init step and variable sources/targets in MPI sends/recvs.
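The parent-first rank mapping in this example can be reproduced in a few lines. This is a hypothetical sketch of the bookkeeping, not the NMM-B routine. It lists parent tasks first, appends child tasks not already present, and numbers the union from 0.

```python
def intracomm_ranks(parent, child):
    """Parent-first rank assignment in a parent-child intracommunicator.

    parent, child: global task ranks of each domain (tasks may be shared).
    Returns (union of global ranks in communicator order,
             parent's communicator ranks, child's communicator ranks).
    """
    parent_set = set(parent)
    # Parent tasks first, then child tasks not already present, in order.
    union = list(parent) + [t for t in child if t not in parent_set]
    local = {g: r for r, g in enumerate(union)}
    return union, [local[t] for t in parent], [local[t] for t in child]


print(intracomm_ranks([3, 4, 5, 6], [1, 2, 3, 4, 5, 6, 7]))
# ([3, 4, 5, 6, 1, 2, 7], [0, 1, 2, 3], [4, 5, 0, 1, 2, 3, 6])
```

MPI_Group_union orders its result the same way (all members of the first group, then members of the second group not in the first). One way to build such a communicator is therefore MPI_Group_union followed by MPI_Comm_create; the slides do not say whether NMM-B does exactly this.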
B-grid vs. E-grid
The B-grid is just a rotated E-grid.
[Figure: E-grid and B-grid layouts of H and v points, with the B-grid dx and dy and the E-grid dx and dy marked.]
Parent-Oriented Nests
[Figure: portion of the parent domain showing parent task subdomains (x) and nest task subdomains (◦).]
The southwest H point of the nest domain coincides with a parent H point.
Summary of Parent-Child Gridpoint Relationships
[Figure: parent H and V points with child h and v points for odd and even space ratios.]
ODD space ratio:
● Child h points lie on parent H points.
● Child v points lie on parent V points.
EVEN space ratio:
● Child h points lie on parent H and V points.
● Child v points do not coincide with parent points.
Parent and Child H Gridpoints for 3:1 Ratio
[Figure: a row of parent H points and nest h points, with three nest intervals per parent interval; a gap is marked.]
● Parent point locations: ITS_PARENT=1 (1st point on the parent task), I_PARENT_SW=3 (SW corner of nest), ITE_PARENT=5 (last point on the parent task).
● Nest point locations: ITS_PARENT_ON_CHILD=-5, I=IDS=1 (SW corner of nest), ITE_PARENT_ON_CHILD=9.
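The index values in the 3:1 figure can be reproduced with simple arithmetic. This is a hedged sketch with hypothetical function names. The start rule follows directly from the SW corner alignment. The end rule is inferred from the figure's value ITE_PARENT_ON_CHILD=9 rather than from NMM-B source: the span runs through the gap and stops one nest point short of the point that coincides with the next parent point.

```python
def parent_to_child_i(i_parent, i_parent_sw, ratio):
    """Nest I index of the nest point coinciding with parent H point i_parent.

    The nest's SW corner (I = IDS = 1) coincides with parent point i_parent_sw.
    """
    return (i_parent - i_parent_sw) * ratio + 1


def parent_task_span_on_child(its_parent, ite_parent, i_parent_sw, ratio):
    """Span of a parent task's I range expressed in nest I indices.

    Assumption: the end index extends through the gap, stopping one nest
    point short of the point coinciding with parent point ite_parent + 1.
    """
    its_on_child = parent_to_child_i(its_parent, i_parent_sw, ratio)
    ite_on_child = parent_to_child_i(ite_parent + 1, i_parent_sw, ratio) - 1
    return its_on_child, ite_on_child


print(parent_task_span_on_child(its_parent=1, ite_parent=5, i_parent_sw=3, ratio=3))
# (-5, 9)
```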
Parent and Nest V Gridpoints for 3:1 Ratio
[Figure: the same parent and nest indices as for H points (ITS_PARENT=1, I_PARENT_SW=3, ITE_PARENT=5; ITS_PARENT_ON_CHILD=-5, I=IDS=1, ITE_PARENT_ON_CHILD=9), now showing nest v points and parent V points between rows of H points; each index is labeled as lying on an H or a V point, and a gap is marked.]
NMM-B Moving Nests
● 1-way or 2-way interactive.
● Forecast can contain multiple nests.
● Telescoping domains.
Three Types of Data Motion Needed to Satisfy a Nest's Shift
[Figure: the nest domain before the shift occupies the pre-move 'footprint'; after the shift, its points are filled by an intra-task update, an inter-task update, or parent updates.]
Shift onto a Corner
[Figure: the nest domain before the shift occupies the pre-move 'footprint'; after shifting onto a corner, points are filled by an intra-task update or by parent updates.]
Simplest Parent Update Over the SW Corner
[Figure: a nest task subdomain overlapping the SW corner of the pre-move footprint.]
Here one parent task updates the entire parent update region of this nest task subdomain.
Four Parent Tasks Update Over the SW Corner
[Figure: a nest task subdomain overlapping the SW corner of the pre-move footprint, with its parent update region split among the 1st parent task's update region, the 2nd parent task's 1st and 2nd update regions, the 3rd parent task's update region, and the 4th parent task's update region.]
Child's Bookkeeping for Relative Motion
‣ Intra-task updating is the simplest (a shift in memory).
‣ Inter-task updating is more complex.
‣ Updates from the parent are the most complicated.
The child tasks determine which of their points are updated by each of the three processes:
● Child tasks determine which of their subdomain points inside the pre-move footprint will be updated by which other child tasks, and vice versa.
● Child tasks determine which of their subdomain points outside the pre-move footprint will be updated by which parent tasks.
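The three-way classification above can be sketched for one child task. This is a simplified, hypothetical model: it assumes the shift is a whole number of nest points in index space and that the child task decomposition stays fixed in index space while the domain moves. A point whose pre-shift location falls outside the domain needs a parent update. Otherwise the point is an intra-task update if the same task held that location, or an inter-task update if another child task did.

```python
from collections import Counter


def owning_task(i, j, i_bounds, j_bounds):
    """(ti, tj) of the child task whose subdomain contains point (i, j)."""
    ti = next(k for k, (lo, hi) in enumerate(i_bounds) if lo <= i <= hi)
    tj = next(k for k, (lo, hi) in enumerate(j_bounds) if lo <= j <= hi)
    return ti, tj


def classify_after_shift(task, i_bounds, j_bounds, si, sj):
    """Count how each point of one child task's subdomain is refilled after a shift.

    task:     (ti, tj) task coordinates
    i_bounds: inclusive (start, end) I ranges of the task columns, e.g. [(1, 4), (5, 8)]
    j_bounds: same for the task rows
    si, sj:   nest shift in nest grid points (positive = east / north)
    """
    im, jm = i_bounds[-1][1], j_bounds[-1][1]
    ti, tj = task
    counts = Counter()
    for j in range(j_bounds[tj][0], j_bounds[tj][1] + 1):
        for i in range(i_bounds[ti][0], i_bounds[ti][1] + 1):
            # Before the shift, this location was nest point (i + si, j + sj).
            oi, oj = i + si, j + sj
            if not (1 <= oi <= im and 1 <= oj <= jm):
                counts["parent"] += 1      # outside the pre-move footprint
            elif owning_task(oi, oj, i_bounds, j_bounds) == task:
                counts["intra-task"] += 1  # a shift in this task's memory
            else:
                counts["inter-task"] += 1  # data held by another child task
    return counts


bounds = [(1, 4), (5, 8)]  # an 8x8 nest split over 2x2 tasks (hypothetical)
print(classify_after_shift((0, 0), bounds, bounds, si=3, sj=0))
print(classify_after_shift((1, 0), bounds, bounds, si=3, sj=0))
```

For a 3:1 nest moving one parent point east, the western task receives mostly inter-task data. The eastern task needs mostly parent updates.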
Parent's Bookkeeping for Relative Child Motion
The parent tasks perform bookkeeping to determine which nest points outside the pre-move footprint are updated by the parent. Because of the complexity involved, both the parent and child tasks perform this bookkeeping from their own perspectives, so that each serves as a check on the other and some additional communication is eliminated.
The Parent Stores Its Bookkeeping Results
Child task subdomains, and the points on them that a given parent task updates, change with each shift of the nest. Arrays of linked lists handle this continual change.
[Figure: parent array of moving nest update specifications; elements 1, 2 and 3 correspond to moving children #1, #2 and #3, each pointing to a linked list of the nest tasks to be updated.]
Each link holds the parent task's update specifications for one relevant task of a moving child following a shift.
The Child Stores Its Bookkeeping Results
There is no need for linked-list arrays to store the bookkeeping results from the child's perspective, since the number of parent tasks providing update data is always between 0 and 4.
=> Allocate a derived datatype array (1:4) and store the results in it.
This assumes the geographical area of parent task subdomains is always larger than that of child task subdomains, so a child task subdomain can straddle at most two parent task subdomains in each direction.
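The "at most 4" bound follows from rectangle overlap. In this hypothetical sketch, a child task's parent update region is already expressed as an inclusive parent-grid index range. The function finds every parent task subdomain that overlaps it, together with the overlapping index ranges. The short result list plays the role of the fixed (1:4) array.

```python
def parent_tasks_for_region(i_range, j_range, parent_i_bounds, parent_j_bounds):
    """Parent tasks (and the overlapping index ranges) covering one region.

    i_range, j_range: inclusive parent-grid index ranges of the child task's
                      parent update region, e.g. (8, 13)
    parent_*_bounds:  inclusive (start, end) ranges of the parent task columns/rows
    """
    hits = []
    for pj, (jlo, jhi) in enumerate(parent_j_bounds):
        for pi, (ilo, ihi) in enumerate(parent_i_bounds):
            lo_i, hi_i = max(i_range[0], ilo), min(i_range[1], ihi)
            lo_j, hi_j = max(j_range[0], jlo), min(j_range[1], jhi)
            if lo_i <= hi_i and lo_j <= hi_j:
                hits.append(((pi, pj), (lo_i, hi_i), (lo_j, hi_j)))
    if len(hits) > 4:
        # Only possible if the region is longer than a parent subdomain in some direction.
        raise ValueError("more than 4 parent tasks: size assumption violated")
    return hits


parent_bounds = [(1, 10), (11, 20), (21, 30)]  # hypothetical 3x3 parent layout
for hit in parent_tasks_for_region((8, 13), (18, 22), parent_bounds, parent_bounds):
    print(hit)
```

A region straddling two parent task boundaries, as here, is the worst case of four parent tasks, like the SW-corner example above.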
Surface Data
‣ Eight invariant surface fields from NPS cover the uppermost parent domain at each different resolution of all moving nests.
‣ Among these are topography, land/sea mask, soil type, vegetation type, and vegetation fraction.
‣ Each nest task with a parent update region reads the external files to update those variables, rather than receiving them from the parent, so as not to lose the higher-resolution information.
‣ For sfc variables NOT among those eight:
  (a) Generate a search list of I,J increments ordered from near to far.
  (b) If the parent update sfc data comes from a different surface type, the nest searches for its own nearest point with the same sfc type (e.g. soil T or SST).
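Steps (a) and (b) can be sketched as follows. The names and integer surface-type codes are hypothetical. The search list covers a square of half-width `max_radius` and is sorted by squared distance, with ties broken by J then I so the order is deterministic. The search then returns the first point along that list with the wanted surface type.

```python
def search_list(max_radius):
    """I,J increments in a square of half-width max_radius, near to far."""
    offsets = [(di, dj)
               for dj in range(-max_radius, max_radius + 1)
               for di in range(-max_radius, max_radius + 1)]
    offsets.sort(key=lambda d: (d[0] ** 2 + d[1] ** 2, d[1], d[0]))
    return offsets


def nearest_same_type(i, j, wanted, sfc_type, offsets):
    """First point (ii, jj) along the search list whose type equals `wanted`.

    sfc_type is a 2-D list indexed [j][i] (0-based); returns None if not found.
    """
    jm, im = len(sfc_type), len(sfc_type[0])
    for di, dj in offsets:
        ii, jj = i + di, j + dj
        if 0 <= ii < im and 0 <= jj < jm and sfc_type[jj][ii] == wanted:
            return ii, jj
    return None


WATER, LAND = 0, 1  # hypothetical surface-type codes
sfc = [[WATER, WATER, WATER],
       [WATER, WATER, LAND],
       [LAND,  LAND,  LAND]]
print(search_list(1)[:5])                                   # [(0, 0), (0, -1), (-1, 0), (1, 0), (0, 1)]
print(nearest_same_type(0, 0, LAND, sfc, search_list(2)))   # (0, 2)
```

Building the search list once and reusing it for every point avoids recomputing distances inside the search.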
2-Way Exchange
As for motion, both the child and the parent compute which parent tasks will receive which upscale data from which child tasks. This eliminates some communication and serves as a check.
2-Way Exchange - Child
(1) Is the child at the end of a parent timestep?
(2) If so, determine which points on which parent tasks it will update.
(3) Loop through the appropriate parent tasks:
  - Loop through the specified 2-way variables.
    - Generate upscale values using the mean of the child values within the stencil region.
  - Send the upscale data for all variables to the given parent task.
Generate Upscale Values - Odd Space Ratio
[Figure: stencils of child h points around a parent H point (H-pt variables) and of child v points around a parent V point (V-pt variables); average over these stencils.]
Generate Upscale Values - Even Space Ratio
[Figure: stencils of child h points around a parent H point (H-pt variables) and of child v points around a parent V point (V-pt variables); average over these stencils.]
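The upscale step is a mean of child values over a stencil. This simplified sketch assumes an odd space ratio and a plain square ratio x ratio box of child points centred on the child point that coincides with the parent point. The actual NMM-B stencils on the staggered E-grid, and all even-ratio stencils, may differ and are not modelled. Function names are hypothetical.

```python
def box_offsets(ratio):
    """Offsets of a ratio x ratio box centred on the coincident child point.

    Assumption: an odd space ratio, so the box is symmetric about that point.
    """
    if ratio % 2 == 0:
        raise ValueError("this sketch handles odd space ratios only")
    h = ratio // 2
    return [(di, dj) for dj in range(-h, h + 1) for di in range(-h, h + 1)]


def upscale_mean(child, ic, jc, offsets):
    """Mean of child values child[j][i] over the stencil centred at (ic, jc)."""
    vals = [child[jc + dj][ic + di] for di, dj in offsets]
    return sum(vals) / len(vals)


child_t = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0],
           [7.0, 8.0, 9.0]]
print(upscale_mean(child_t, 1, 1, box_offsets(3)))  # 5.0
```

Passing the stencil as a list of offsets keeps the averaging separate from the stencil shape, so other stencils could be substituted.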
2-Way Exchange - Parent
(1) Determine which of its points are updated by which child tasks. Save each child task's specs as a link in a linked list (since we do not know ahead of time how many child tasks will send data after each shift of moving nests).
(2) Loop through the appropriate child tasks:
  - Incorporate the data if the current timestep does not immediately follow a restart output time (for bit-identical restarts).
  - Recv data for all specified 2-way variables.
  - If the parent's sfc elevation differs from the child's, adjust the data using a spline interpolation.
  - Update the parent values, applying the user-specified child weight from the configure file.
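The last step blends upscaled child data into the parent. The slides do not give the blending formula, so this hypothetical sketch assumes a linear blend new = (1 - w) * parent + w * upscale, where w is the user-specified child weight. It also includes the restart guard and omits the spline elevation adjustment.

```python
def apply_two_way_update(parent_vals, upscale_vals, child_weight,
                         follows_restart_output):
    """Blend upscaled child data into parent values.

    Assumed form: new = (1 - w) * parent + w * upscale, with w the
    user-specified child weight. If the timestep immediately follows a
    restart output time, the parent values are returned unchanged.
    """
    if follows_restart_output:
        return list(parent_vals)
    w = child_weight
    return [(1.0 - w) * p + w * u for p, u in zip(parent_vals, upscale_vals)]


print(apply_two_way_update([10.0, 0.0], [20.0, 4.0], 0.25, False))  # [12.5, 1.0]
```

A weight of 0 leaves the parent untouched, and a weight of 1 replaces parent values with the child's upscale values.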
Specify Update Variables for Motion and 2-Way Exchange
● Use the nests.txt file, which (like solver_state.txt) lists the desired variables from the Solver internal state.
KEY for moving vbls:
  H: mass pt
  V: velocity pt
  L: land sfc
  W: water sfc
  F: read external file in parent update region
  x: parent must update halo when child moves
KEY for 2-way vbls:
  H: mass pt
  V: velocity pt
Example of 'nests.txt' specifications
    ### Moving 2-way
    ### 2-D Integer
    'ISLTYP'  F   -  'Soil type'
    ### 2-D Real
    'FIS'     F   -  'Sfc geopotential (m2 s-2)'
    'CMC'     Lx  -  'Canopy moisture (m)'
    'SST'     Wx  -  'Sea surface temperature (K)'
    ### 3-D Real
    'T'       H   H  'Sensible temperature (K)'
    'U'       V   V  'U component of wind (m s-1)'
    'STC'     Lx  -  'Soil temperature (K)'
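The nests.txt layout shown above is regular enough to parse in a few lines, which can help when checking a file before a run. This parser is hypothetical; the real file is read by NMM-B's own code, whose exact rules the slides do not describe. It assumes the column order name, moving flags, 2-way flags, description (from the "### Moving 2-way" header), treats "###" lines as section headers, and treats "-" as "no flags".

```python
import shlex


def flags(field):
    """'-' means no flags; otherwise each character is one flag (e.g. 'Lx')."""
    return set() if field == "-" else set(field)


def parse_nests_txt(text):
    """Parse nests.txt-style lines: name, moving flags, 2-way flags, description."""
    specs = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("###"):
            section = line.lstrip("#").strip()  # e.g. "2-D Real"
            continue
        name, moving, two_way, desc = shlex.split(line)
        specs[name] = {"section": section, "moving": flags(moving),
                       "two_way": flags(two_way), "description": desc}
    return specs


SAMPLE = """
### Moving 2-way
### 2-D Integer
'ISLTYP'  F   -  'Soil type'
### 2-D Real
'FIS'     F   -  'Sfc geopotential (m2 s-2)'
'CMC'     Lx  -  'Canopy moisture (m)'
'SST'     Wx  -  'Sea surface temperature (K)'
### 3-D Real
'T'       H   H  'Sensible temperature (K)'
'U'       V   V  'U component of wind (m s-1)'
'STC'     Lx  -  'Soil temperature (K)'
"""

specs = parse_nests_txt(SAMPLE)
print(specs["CMC"])
```

Using `shlex.split` keeps quoted descriptions with spaces intact as single fields.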
High Level Order of Execution
Timestepping loop in subroutine NMM_INTEGRATE:
► Children recv BC updates from parents from the end of the current parent timestep.
► Parents recv upscale data from children from the end of the previous parent timestep.
► Domain integrates.
► Parents send BC updates to children who are at the beginning of the current parent timestep.
► Children send upscale data to parents, who recv it at the beginning of the next parent timestep.
NMM_RUN
  DO loop over generations (a single iteration for 1-way interaction)
    DO loop over all (1-way) or some (2-way) forecast timesteps
      CALL phase 1 Parent-Child Coupler Run (check 2-way signals)
      CALL phase 2 Parent-Child Coupler Run (children recv BCs from parents)
      CALL phase 3 Parent-Child Coupler Run (parents recv upscale from children)
      CALL phase 1 Domain Run (integrate the forecast one timestep)
      CALL phase 4 Parent-Child Coupler Run (parents send BCs to children)
      CALL phase 5 Parent-Child Coupler Run (children send upscale to parents)
      CALL phase 2 Domain Run (digital filter)
      Advance the clock
      CALL phase 3 Domain Run (write history/restart)
    ENDDO timestep loop
  ENDDO generations loop
[Figure: example of erratic nest motions due to weak storm(s) interacting with complex terrain.]
High Priority Development Items
● Finish the user selection of nest boundary variables.
● Construct capability for self-oriented (not parent-oriented) nests.