Running CCSM • Tony Craig, CCSM Software Engineering Group, ccsm@ucar.edu
Outline • General review of CCSM • Setting up and running a simple case • Datasets • Production • Modifying source code • Errors • Tools • Performance
Review of CCSM • Five components / Ten models • Atmosphere(3) : atm, datm, latm • Ocean(2) : ocn, docn • Land(2) : lnd, dlnd • Ice(2+) : ice, ice (prescribed mode), ice (mixed layer ocean mode), dice • Coupler(1) : cpl • Components communicate only with the coupler, via MPI • Each component runs on multiple processors via MPI, OpenMP, or hybrid MPI/OpenMP
Component parallelization • atm : MPI, OpenMP, or MPI/OpenMP • lnd : MPI, OpenMP, or MPI/OpenMP • ice : MPI only • ocn : MPI only • cpl : OpenMP only • The data models, datm, docn, dice, dlnd, and latm : serial only, 1 processor
Configurations • A = datm, dlnd, docn, dice, cpl • B = atm, lnd, ocn, ice, cpl • C = datm, dlnd, ocn, dice, cpl • D = datm, dlnd, docn, ice, cpl • F = atm, lnd, docn, ice (prescribed mode), cpl • G = latm, dlnd, ocn, ice, cpl • H = atm, dlnd, docn, dice, cpl • I = datm, lnd, docn, dice, cpl • K = atm, lnd, docn, dice, cpl • M = latm, dlnd, docn, ice (ml ocn mode), cpl
Resolutions • atm/lnd/datm/dlnd = T42, T31 • ocn/ice/docn/dice = gx1v3, gx3, gx3v4 • latm = T62 • Scientifically validated combinations • B, T42_gx1v3 = b20.007 control run (test.a1 case) • B, T31_gx3v4 = paleo control run (test.a2 case)
“Available” configurations • [table of configuration vs. resolution combinations; * = supported (subject to change); the b20.007 control and paleo control combinations are marked]
Platforms • IBM • SGI • Compaq*
Review of scripts • Main script (test.a1.run) • Sets primary ccsm environment variables • Calls $model.setup.csh • Gets input datasets • Builds components • Runs model • Archives • Harvests
Setting up a simple case • Use the GUI !! • The GUI modifies the scripts and creates a new case for you • Input $CASE, $CSMROOT, $CSMDATA, $EXEROOT • Input resolution • Input configuration (A-M) • Sets processor layout based on configuration (first guess) • Sets some batch environment variables • Works well in the NCAR environment; other sites require tuning of the generated scripts
Setting up a simple case, without GUI • Create new case directory under scripts, copy over test.a1 files • Rename file test.a1.run to $CASE.run • Edit $CASE, $CSMROOT, $CSMDATA, $EXEROOT, $ARCROOT • Edit batch environment parameters • Edit $GRID • Edit $SETUPS • Edit $NTASKS, $NTHRDS
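A minimal command-line sketch of these steps (the new case name "test.b1" and the paths are illustrative):

  cd $CSMROOT/scripts
  mkdir test.b1
  cp test.a1/* test.b1/
  cd test.b1
  mv test.a1.run test.b1.run
  # then edit test.b1.run: $CASE, $CSMROOT, $CSMDATA, $EXEROOT, $ARCROOT,
  # the batch parameters, $GRID, $SETUPS, $NTASKS, and $NTHRDS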
$NTASKS, $NTHRDS, batch • $NTASKS is the number of MPI tasks for each component • $NTHRDS is the number of OpenMP threads per MPI task • $NTASKS*$NTHRDS = total number of processors for each component • Tuning is required to get an optimal load balance • Batch parameters should match the processors used; consistency is important; task_geometry (LoadLeveler) is very powerful
Component parallelization • atm : MPI, OpenMP, or MPI/OpenMP • lnd : MPI, OpenMP, or MPI/OpenMP • ice : MPI only, NTHRDS=1 • ocn : MPI only, NTHRDS=1 • cpl : OpenMP only, NTASKS=1 • The data models, datm, docn, dice, dlnd, and latm : serial only, 1 processor, NTASKS=1, NTHRDS=1
Main script configuration summary
• B case:
  MODELS ( atm  lnd  ocn  ice  cpl)
  SETUPS ( atm  lnd  ocn  ice  cpl)
  NTASKS (   8    2   40    8    1)
  NTHRDS (   4    4    1    1    4)
• datm/dlnd/ocn/ice case:
  MODELS ( atm  lnd  ocn  ice  cpl)
  SETUPS (datm dlnd  ocn  ice  cpl)
  NTASKS (   1    1   64   16    1)
  NTHRDS (   1    1    1    1    4)
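In the main script these appear as csh arrays, one entry per component; a sketch matching the B case above (syntax follows the csh main script, values from the slide):

  set MODELS = (atm lnd ocn ice cpl)
  set SETUPS = (atm lnd ocn ice cpl)
  set NTASKS = (8 2 40 8 1)
  set NTHRDS = (4 4 1 1 4)
  # per-component processors = NTASKS * NTHRDS : 32, 8, 40, 8, 4 (92 total)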
$RUNTYPE • Startup - initial startup of the model using arbitrary initialization • set $CASE, $BASEDATE • Continue - continuation of a case, bit-for-bit guaranteed, uses model restart files • set $CASE • Branch - start a new case as a bit-for-bit continuation of another case, uses model restart files, requires a continuous date • set $CASE, $REFCASE, $REFDATE • Hybrid - start a new case, not a bit-for-bit continuation, uses model initial files in atm and lnd, can change the starting date • set $CASE, $BASEDATE, $REFCASE, $REFDATE
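For example, a hybrid start from an existing control run might set the following in the main script (a sketch; the new case name, date format, and values are illustrative):

  setenv RUNTYPE  hybrid
  setenv CASE     mynewcase        # hypothetical new case name
  setenv BASEDATE 0001-01-01       # starting date for the new case
  setenv REFCASE  b20.007          # case supplying the initial files
  setenv REFDATE  0100-01-01       # date of the reference files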
Coupler namelist • stop_option : ndays, nmonths, newmonth, halfyear, newyear, newdecade • stop_n : integer (used with ndays, nmonths) • rest_freq : ndays, monthly, quarterly, halfyear, yearly • rest_n : integer (used with ndays) • diag_freq : daily, weekly, biweekly, monthly, quarterly, yearly, ndays • diag_n : integer (used with ndays) • info_bcheck : integer
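The setup script writes these into the coupler's namelist; a sketch of a one-year run with monthly restarts, as it might appear inside cpl.setup.csh (the namelist group and file names here are illustrative, not the exact release names):

  cat >! cpl.stdin << EOF
   &inparm
   stop_option = 'nmonths'
   stop_n      = 12
   rest_freq   = 'monthly'
   diag_freq   = 'yearly'
   info_bcheck = 0
   /
  EOF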
Data Sets • Types • Grid files, binary • Namelist input, ascii • Initial datasets, binary/netcdf • Restart datasets, binary • History datasets, netcdf • Log files, ascii • inputdata directory • This is usually pointed to by $CSMDATA
Data Flow, Input • [diagram: input files flow from $CSMDATA (the inputdata directory), the setup scripts in scripts/$CASE, and $ARCROOT/restart / Mass Store into $EXEROOT] • Everything is copied to $EXEROOT • Tools and scripts attempt to automate most of the "get input files" work • Main script variables include $CSMDATA, $LFSINP, $LMSINP, $MACINP, $RFSINP, $RMSINP
Data Flow, Output • Output files are moved out of $EXEROOT • Harvesting is a separate process • Writing of restart files is coordinated by the coupler • Writing of history files is not coordinated between components; monthly averages are the default • Main script variables include $LMSOUT, $MACOUT, $RFSOUT • [diagram: scripts move output from $EXEROOT to $ARCROOT (archiving), then to the Mass Store (harvesting)]
Log Files • Each component produces a log file, $model.log.$LID • $LID is a system date stamp • Date stamps are the same on all log files for a run • Log files are written into the $EXEROOT/$model directories during execution • Log files are copied to $SCRIPTS/logs at the end of a run • There are also separate stdout and stderr files that sometimes contain useful information
Archiving, ccsm_archive • Moves model output to a separate area on a local disk • The local disk area is set by $ARCROOT in the main script • Benefits • Allows separation of running and harvesting • Mass storage unavailability does not prevent continued execution of the model • Allows users to run in volatile temporary space • Supports simple harvesting in a clustered machine environment (like nirvana)
Harvesting, $CASE.har • Means copying model output to the local mass store • Separate script in scripts/$CASE, $CASE.har • Typically submitted in batch, can also be run interactively • Submitted by main script after model run, off by default • Sources ccsm_joe for important environment variables • Harvests all files in $ARCROOT/{atm,lnd,ocn,ice,cpl} • Verifies accurate copy on mass store before removing • Can scp files to remote machines
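To run the harvester by hand for the test.a1 case (a sketch; the batch submit command shown is LoadLeveler-style and site dependent):

  cd $CSMROOT/scripts/test.a1
  llsubmit test.a1.har      # submit in batch (use bsub/qsub on other sites)
  ./test.a1.har             # or run it interactively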
Exact Restart • CCSM can stop and restart exactly • The coupler controls the frequency of restart file writes • Restart files guarantee bit-for-bit continuity at a checkpoint boundary • rpointer files are updated in the scripts/$CASE directory after each run
Restart file management (1) • ccsm_archive • In scripts/$CASE • Called from the main script after the model run is complete, commented out by default • $ARCROOT/restart contains the latest full set of restart files • ccsm_archive copies the full set of restart datasets into $ARCROOT/restart after each run • ccsm_archive then tars up that restart set into the $ARCROOT/restart.tars directory • These tar files can be large; regular cleanup is required
Restart file management (2) • ccsm_getrestart • In scripts/tools • Called from the main script before the model run starts, commented out by default • Copies the latest set of restart files from $ARCROOT/restart to the appropriate directories • To "back up" a model run to a previous model date • Assumes both ccsm_archive and ccsm_getrestart have been active in the main script • Delete all files in $ARCROOT/restart • Untar an $ARCROOT/restart.tars file into $ARCROOT/restart • Resubmit
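As shell commands, the back-up procedure looks roughly like this (the case name and the tar file date stamp are illustrative):

  cd $CSMROOT/scripts/test.a1
  source ccsm_joe                  # pick up $ARCROOT and friends
  rm $ARCROOT/restart/*            # clear the current restart set
  cd $ARCROOT/restart
  tar -xf ../restart.tars/restart.0010-01-01.tar   # restore the desired restart set
  # then resubmit the run; ccsm_getrestart stages these files before the model starts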
Auto-Resubmit • RESUBMIT file in scripts/$CASE directory • contains a single integer • If the integer is >0, main script resubmits itself and decrements the integer • Runaway jobs • FIRST! set value in RESUBMIT file to 0 • Attempt to kill running jobs
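The resubmit logic in the main script amounts to something like this csh sketch (the submit command is site dependent):

  set n = `cat RESUBMIT`
  if ($n > 0) then
    @ n = $n - 1
    echo $n >! RESUBMIT       # decrement the counter
    llsubmit $CASE.run        # resubmit this script (llsubmit/bsub/qsub per site)
  endif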
Production • Modify the coupler namelist in cpl.setup.csh: set run length and restart frequency, turn down diagnostic frequency, set info_bcheck to 0 • Run a startup, hybrid, or branch $RUNTYPE case • Transition to a continue $RUNTYPE • Turn on archiving, harvesting, and ccsm_getrestart • Edit the RESUBMIT file to initiate auto-resubmission
Monitoring a run • Monitor the batch jobs using llq, bjobs, or qstat • Verify that runs complete successfully; check for timing information at the end of a log file • tail -f $EXEROOT/cpl/cpl.log* • If runs are not succeeding, • tail each log file • grep for ENDRUN in the atm and lnd log files • Check the stdout and stderr files for component or system messages • Look for core files in $EXEROOT/$model • Look for zero-length files in $EXEROOT/$model • Check email
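The common checks as commands (a sketch; $LID is the date stamp of the current run):

  tail -f $EXEROOT/cpl/cpl.log.$LID      # watch coupler progress
  grep ENDRUN $EXEROOT/atm/atm.log.*     # look for model abort messages
  grep ENDRUN $EXEROOT/lnd/lnd.log.*
  ls -l $EXEROOT/*/core*                 # any core files?
  find $EXEROOT -size 0 -print           # any zero-length files?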
Modifying source code • Modifying files in the ccsm models directory is not recommended • Create directories under scripts/$CASE • src.atm, src.lnd, src.ocn, src.ice, src.cpl • Copy subset of model source code to these directories and modify it • Has highest priority with respect to build • Benefits include • Release source code remains unmodified and available • Allows implementation of case dependent code modifications
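For example, to modify one atmosphere routine (the source file path and name below are hypothetical):

  cd $CSMROOT/scripts/$CASE
  mkdir src.atm
  # copy the routine to modify from the release tree (hypothetical path)
  cp $CSMROOT/models/atm/cam/src/physics/cloud.F90 src.atm/
  # edit src.atm/cloud.F90; at build time this copy overrides the release version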
Multiple Machine Support • Should run on blackforest, babyblue, and ute “out of the box” • “Other” machines include seaborg, nirvana, eagle, falcon, cheetah • Supported platforms are indicated in $OS, $SITE, $MACH, $ARCH environment variables in the main script • See also scripts/tools/test.a1.mods.$MACH for suggested changes to test.a1.run for “other” machines.
Running on a “New” Machine • Main script • Set batch queue commands • Add new $OS, $SITE, $MACH, $ARCH options • Set standard CCSM path names, $CSMROOT, … • Harvester submission issues • Set data movement variables, $LMSINP, … • Harvester script • May require modification • Tools • May need to modify ccsm_msread, ccsm_mswrite • Build • Modify models/bld/Macros.$OS file
ccsm_joe • Created by main script • Updated every time the main script runs • Case dependent • Records important ccsm environment variables • Can be “sourced” by other scripts to inherit ccsm environment variables
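A sketch of an auxiliary csh script that inherits the case environment by sourcing ccsm_joe:

  #!/bin/csh -f
  # run from the scripts/$CASE directory, where ccsm_joe lives
  source ccsm_joe
  echo "case $CASE runs in $EXEROOT and archives to $ARCROOT"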
Interactive/Batch Issues • Can run the main script interactively • Typically used to build and pre-stage initial data • Uncomment the "exit" command in the main script to stop it before CCSM execution starts • Batch environment is highly site dependent • NQS • LoadLeveler • LSF • PBS
Common Errors (1) • Model won't build • Try rebuilding clean • Remove all obj directories; these are $OBJROOT/$model/obj, normally equivalent to $EXEROOT/$model/obj • When rebuilding, make sure $SETBLD is true in the main script • Model won't continue due to a restart problem • Determine the cause of the problem: quota, hardware, script, zero-length files, rpointer problems • Fix if possible • Back up to the latest "good" restart dataset • Rerun
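A sketch of a clean rebuild (component list as in the B case; adjust for data models):

  foreach m (atm lnd ocn ice cpl)
    rm -rf $OBJROOT/$m/obj      # remove each component's obj directory
  end
  # in the main script, make sure the build step is enabled, e.g.
  # setenv SETBLD true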
Common Errors (2) • Ice model stops due to an mp transport error • Double ndte in the ice model namelist in ice.setup.csh • Back up to the latest "good" restart dataset • Run past the previous stop date • Reset the ndte value • Ocean model non-convergence • Add about 10% to the number of model timesteps/hour in ocn.setup.csh, DT_COUNT • Back up to the latest "good" restart dataset • Run past the previous stop date • Reset DT_COUNT • Non-convergence on the first timestep is a special case
Tools • Under scripts/tools • ccsm_getfile : hierarchical search for a file • ccsm_getinput : hierarchical search for an input file • ccsm_msread : copies a file from the local mass store • ccsm_mswrite : copies a file to the local mass store • ccsm_checkenvs : echoes ccsm environment variables, used to create ccsm_joe • ccsm_getrestart : copies restart files from $ARCROOT/restart to the appropriate $EXEROOT and scripts/$CASE directories
Performance • This is complicated! • Issues • Performance of components and system as a function of resolution and configuration • Scalability of individual components, scaling efficiency of individual components • Task/Thread counts • Components sharing nodes, overloading nodes with multiple components, overloading threads, overloading tasks • Load balance of coupled system
CCSM Load Balancing • Example processor layout: 40 ocn, 32 atm, 16 ice, 12 lnd, 4 cpl = 104 total processors • [diagram of per-component timings, in seconds per day, illustrating the load balance]
Component/Hardware layout • Machine, set of nodes • Nodes, group of processors that share memory • Processors, individual computing elements • General rules • Do not oversubscribe processors, place only 1 MPI task or 1 thread on each processor • Minimize the number of nodes used for a given component and processor requirement • Multiple components can share a node as long as there is no oversubscription of processors • Test several decompositions, layouts, task/thread combinations to try to optimize performance
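A small worked example of the node arithmetic, using the B-case counts from the configuration summary slide and assuming 4-processor nodes (the node size is illustrative):

  # atm: 8 tasks x 4 threads = 32 procs -> 8 full 4-way nodes
  # ocn: 40 tasks x 1 thread = 40 procs -> 10 nodes
  # ice: 8 x 1 = 8 procs -> 2 nodes; lnd: 2 x 4 = 8 procs -> 2 nodes
  # cpl: 1 x 4 = 4 procs -> 1 node (components may share nodes,
  #   as long as no processor is oversubscribed)
  @ atm_nodes = ( 8 * 4 ) / 4
  echo "atm nodes = $atm_nodes"    # prints: atm nodes = 8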
Summary • CCSM is a complicated multi-executable climate model, expect there to be “spin-up” time • CCSM is a scientific research code • There are many possible components, configurations, platforms, and resolutions; we are unable to test everything • Users are responsible for validating their science • NCAR can help with software/configuration problems, ccsm@ucar.edu • Please report bugs, fixes, improvements, and ports to new hardware, so we can incorporate those changes! ccsm@ucar.edu