Running CCSM • Tony Craig, CCSM Software Engineering Group, ccsm@ucar.edu
Outline • General review of CCSM • Setting up and running a simple case • Datasets • Production • Modifying source code • Errors • Tools • Performance
Review of CCSM • Five components / Ten models • Atmosphere(3) : atm, datm, latm • Ocean(2) : ocn, docn • Land(2) : lnd, dlnd • Ice(2+) : ice, ice (prescribed mode), ice (mixed layer ocean mode), dice • Coupler(1) : cpl • Components communicate only with the coupler, via MPI • Each component runs on multiple processors via MPI, OpenMP, or hybrid MPI/OpenMP
Component parallelization • atm : MPI, OpenMP, or MPI/OpenMP • lnd : MPI, OpenMP, or MPI/OpenMP • ice : MPI only • ocn : MPI only • cpl : OpenMP only • The data models, datm, docn, dice, dlnd, and latm : serial only, 1 processor
Configurations • A = datm, dlnd, docn, dice, cpl • B = atm, lnd, ocn, ice, cpl • C = datm, dlnd, ocn, dice, cpl • D = datm, dlnd, docn, ice, cpl • F = atm, lnd, docn, ice (prescribed mode), cpl • G = latm, dlnd, ocn, ice, cpl • H = atm, dlnd, docn, dice, cpl • I = datm, lnd, docn, dice, cpl • K = atm, lnd, docn, dice, cpl • M = latm, dlnd, docn, ice (ml ocn mode), cpl
Resolutions • atm/lnd/datm/dlnd = T42, T31 • ocn/ice/docn/dice = gx1v3, gx3, gx3v4 • latm = T62 • Scientifically validated combinations • B, T42_gx1v3 = b20.007 control run (test.a1 case) • B, T31_gx3v4 = paleo control run (test.a2 case)
“Available” configurations • [table of configuration vs. resolution combinations; * = supported (subject to change); the b20.007 control and paleo control combinations are marked]
Platforms • IBM • SGI • Compaq*
Review of scripts • Main script (test.a1.run) • Sets primary ccsm environment variables • Calls $model.setup.csh • Gets input datasets • Builds components • Runs model • Archives • Harvests
Setting up a simple case • Use the GUI !! • The GUI modifies the scripts and creates a new case for you • Input $CASE, $CSMROOT, $CSMDATA, $EXEROOT • Input resolution • Input configuration (A-M) • Sets processor layout based on configuration (first guess) • Sets some batch environment variables • Works well in the NCAR environment; other sites require tuning of the generated scripts
Setting up a simple case, without GUI • Create new case directory under scripts, copy over test.a1 files • Rename file test.a1.run to $CASE.run • Edit $CASE, $CSMROOT, $CSMDATA, $EXEROOT, $ARCROOT • Edit batch environment parameters • Edit $GRID • Edit $SETUPS • Edit $NTASKS, $NTHRDS
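A minimal command-line sketch of these steps (the new case name "test.b1" and the paths are illustrative):

  cd $CSMROOT/scripts
  mkdir test.b1
  cp test.a1/* test.b1/
  cd test.b1
  mv test.a1.run test.b1.run
  # then edit test.b1.run: $CASE, $CSMROOT, $CSMDATA, $EXEROOT, $ARCROOT,
  # the batch parameters, $GRID, $SETUPS, $NTASKS, and $NTHRDS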
$NTASKS, $NTHRDS, batch • $NTASKS is the number of MPI tasks for each component • $NTHRDS is the number of OpenMP threads per MPI task • $NTASKS*$NTHRDS = total number of processors for each component • Tuning is required to get an optimal load balance • Batch parameters should match the processors used; consistency is important; task_geometry (LoadLeveler) is very powerful
Component parallelization • atm : MPI, OpenMP, or MPI/OpenMP • lnd : MPI, OpenMP, or MPI/OpenMP • ice : MPI only, NTHRDS=1 • ocn : MPI only, NTHRDS=1 • cpl : OpenMP only, NTASKS=1 • The data models, datm, docn, dice, dlnd, and latm : serial only, 1 processor, NTASKS=1, NTHRDS=1
Main script configuration summary
• B case:
  MODELS ( atm  lnd  ocn  ice  cpl)
  SETUPS ( atm  lnd  ocn  ice  cpl)
  NTASKS (   8    2   40    8    1)
  NTHRDS (   4    4    1    1    4)
• datm/dlnd/ocn/ice case:
  MODELS ( atm  lnd  ocn  ice  cpl)
  SETUPS (datm dlnd  ocn  ice  cpl)
  NTASKS (   1    1   64   16    1)
  NTHRDS (   1    1    1    1    4)
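In the main script these appear as csh arrays, one entry per component; a sketch matching the B case above (syntax follows the csh main script, values from the slide):

  set MODELS = (atm lnd ocn ice cpl)
  set SETUPS = (atm lnd ocn ice cpl)
  set NTASKS = (8 2 40 8 1)
  set NTHRDS = (4 4 1 1 4)
  # per-component processors = NTASKS * NTHRDS : 32, 8, 40, 8, 4 (92 total)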
$RUNTYPE • Startup - initial startup of the model using arbitrary initialization • set $CASE, $BASEDATE • Continue - continuation of a case, bit-for-bit guaranteed, uses model restart files • set $CASE • Branch - start a new case as a bit-for-bit continuation of another case, uses model restart files, requires a continuous date • set $CASE, $REFCASE, $REFDATE • Hybrid - start a new case, not a bit-for-bit continuation, uses model initial files in atm and lnd, can change the starting date • set $CASE, $BASEDATE, $REFCASE, $REFDATE
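For example, a hybrid start from an existing control run might set the following in the main script (a sketch; the new case name, date format, and values are illustrative):

  setenv RUNTYPE  hybrid
  setenv CASE     mynewcase        # hypothetical new case name
  setenv BASEDATE 0001-01-01       # starting date for the new case
  setenv REFCASE  b20.007          # case supplying the initial files
  setenv REFDATE  0100-01-01       # date of the reference files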
Coupler namelist • stop_option : ndays, nmonths, newmonth, halfyear, newyear, newdecade • stop_n : integer (used with ndays, nmonths) • rest_freq : ndays, monthly, quarterly, halfyear, yearly • rest_n : integer (used with ndays) • diag_freq : daily, weekly, biweekly, monthly, quarterly, yearly, ndays • diag_n : integer (used with ndays) • info_bcheck : integer
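The setup script writes these into the coupler's namelist; a sketch of a one-year run with monthly restarts, as it might appear inside cpl.setup.csh (the namelist group and file names here are illustrative, not the exact release names):

  cat >! cpl.stdin << EOF
   &inparm
   stop_option = 'nmonths'
   stop_n      = 12
   rest_freq   = 'monthly'
   diag_freq   = 'yearly'
   info_bcheck = 0
   /
  EOF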
Data Sets • Types • Grid files, binary • Namelist input, ascii • Initial datasets, binary/netcdf • Restart datasets, binary • History datasets, netcdf • Log files, ascii • inputdata directory • This is usually pointed to by $CSMDATA
Data Flow, Input • [diagram: input files flow from $CSMDATA (the inputdata directory), the setup scripts in scripts/$CASE, and $ARCROOT/restart / Mass Store into $EXEROOT] • Everything is copied to $EXEROOT • Tools and scripts attempt to automate most of the "get input files" work • Main script variables include $CSMDATA, $LFSINP, $LMSINP, $MACINP, $RFSINP, $RMSINP
Data Flow, Output • Output files are moved out of $EXEROOT • Harvesting is a separate process • Writing of restart files is coordinated by the coupler • Writing of history files is not coordinated between components; monthly averages are the default • Main script variables include $LMSOUT, $MACOUT, $RFSOUT • [diagram: scripts move output from $EXEROOT to $ARCROOT (archiving), then to the Mass Store (harvesting)]
Log Files • Each component produces a log file, $model.log.$LID • $LID is a system date stamp • Date stamps are the same on all log files for a run • Log files are written into the $EXEROOT/$model directories during execution • Log files are copied to $SCRIPTS/logs at the end of a run • There are also separate stdout and stderr files that sometimes contain useful information
Archiving, ccsm_archive • Moves model output to a separate area on a local disk • The local disk area is set by $ARCROOT in the main script • Benefits • Allows separation of running and harvesting • Mass storage unavailability does not prevent continued execution of the model • Allows users to run in volatile temporary space • Supports simple harvesting in a clustered machine environment (like nirvana)
Harvesting, $CASE.har • Means copying model output to the local mass store • Separate script in scripts/$CASE, $CASE.har • Typically submitted in batch, can also be run interactively • Submitted by main script after model run, off by default • Sources ccsm_joe for important environment variables • Harvests all files in $ARCROOT/{atm,lnd,ocn,ice,cpl} • Verifies accurate copy on mass store before removing • Can scp files to remote machines
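To run the harvester by hand for the test.a1 case (a sketch; the batch submit command shown is LoadLeveler-style and site dependent):

  cd $CSMROOT/scripts/test.a1
  llsubmit test.a1.har      # submit in batch (use bsub/qsub on other sites)
  ./test.a1.har             # or run it interactively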
Exact Restart • CCSM can stop and restart exactly • The coupler controls the frequency of restart file writes • Restart files guarantee bit-for-bit continuity at a checkpoint boundary • rpointer files are updated in the scripts/$CASE directory after each run
Restart file management (1) • ccsm_archive • In scripts/$CASE • Called from the main script after the model run is complete, commented out by default • $ARCROOT/restart contains the latest full set of restart files • ccsm_archive copies the full set of restart datasets into $ARCROOT/restart after each run • ccsm_archive then tars up that restart set into the $ARCROOT/restart.tars directory • These tar files can be large; regular cleanup is required
Restart file management (2) • ccsm_getrestart • In scripts/tools • Called from the main script before the model run starts, commented out by default • Copies the latest set of restart files from $ARCROOT/restart to the appropriate directories • To "back up" a model run to a previous model date • Assumes both ccsm_archive and ccsm_getrestart have been active in the main script • Delete all files in $ARCROOT/restart • Untar an $ARCROOT/restart.tars file into $ARCROOT/restart • Resubmit
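As shell commands, the back-up procedure looks roughly like this (the case name and the tar file date stamp are illustrative):

  cd $CSMROOT/scripts/test.a1
  source ccsm_joe                  # pick up $ARCROOT and friends
  rm $ARCROOT/restart/*            # clear the current restart set
  cd $ARCROOT/restart
  tar -xf ../restart.tars/restart.0010-01-01.tar   # restore the desired restart set
  # then resubmit the run; ccsm_getrestart stages these files before the model starts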
Auto-Resubmit • RESUBMIT file in scripts/$CASE directory • contains a single integer • If the integer is >0, main script resubmits itself and decrements the integer • Runaway jobs • FIRST! set value in RESUBMIT file to 0 • Attempt to kill running jobs
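The resubmit logic in the main script amounts to something like this csh sketch (the submit command is site dependent):

  set n = `cat RESUBMIT`
  if ($n > 0) then
    @ n = $n - 1
    echo $n >! RESUBMIT       # decrement the counter
    llsubmit $CASE.run        # resubmit this script (llsubmit/bsub/qsub per site)
  endif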
Production • Modify the coupler namelist in cpl.setup.csh: set run length and restart frequency, turn down diagnostic frequency, set info_bcheck to 0 • Run a startup, hybrid, or branch $RUNTYPE case • Transition to a continue $RUNTYPE • Turn on archiving, harvesting, and ccsm_getrestart • Edit the RESUBMIT file to initiate auto-resubmission
Monitoring a run • Monitor the batch jobs using llq, bjobs, or qstat • Verify that runs complete successfully; check for timing information at the end of a log file • tail -f $EXEROOT/cpl/cpl.log* • If runs are not succeeding, • tail each log file • grep for ENDRUN in the atm and lnd log files • Check the stdout and stderr files for component or system messages • Look for core files in $EXEROOT/$model • Look for zero-length files in $EXEROOT/$model • Check email
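The common checks as commands (a sketch; $LID is the date stamp of the current run):

  tail -f $EXEROOT/cpl/cpl.log.$LID      # watch coupler progress
  grep ENDRUN $EXEROOT/atm/atm.log.*     # look for model abort messages
  grep ENDRUN $EXEROOT/lnd/lnd.log.*
  ls -l $EXEROOT/*/core*                 # any core files?
  find $EXEROOT -size 0 -print           # any zero-length files?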
Modifying source code • Modifying files in the ccsm models directory is not recommended • Create directories under scripts/$CASE • src.atm, src.lnd, src.ocn, src.ice, src.cpl • Copy subset of model source code to these directories and modify it • Has highest priority with respect to build • Benefits include • Release source code remains unmodified and available • Allows implementation of case dependent code modifications
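For example, to modify one atmosphere routine (the source file path and name below are hypothetical):

  cd $CSMROOT/scripts/$CASE
  mkdir src.atm
  # copy the routine to modify from the release tree (hypothetical path)
  cp $CSMROOT/models/atm/cam/src/physics/cloud.F90 src.atm/
  # edit src.atm/cloud.F90; at build time this copy overrides the release version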
Multiple Machine Support • Should run on blackforest, babyblue, and ute “out of the box” • “Other” machines include seaborg, nirvana, eagle, falcon, cheetah • Supported platforms are indicated in $OS, $SITE, $MACH, $ARCH environment variables in the main script • See also scripts/tools/test.a1.mods.$MACH for suggested changes to test.a1.run for “other” machines.
Running on a “New” Machine • Main script • Set batch queue commands • Add new $OS, $SITE, $MACH, $ARCH options • Set standard CCSM path names, $CSMROOT, … • Harvester submission issues • Set data movement variables, $LMSINP, … • Harvester script • May require modification • Tools • May need to modify ccsm_msread, ccsm_mswrite • Build • Modify models/bld/Macros.$OS file
ccsm_joe • Created by main script • Updated every time the main script runs • Case dependent • Records important ccsm environment variables • Can be “sourced” by other scripts to inherit ccsm environment variables
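A sketch of an auxiliary csh script that inherits the case environment by sourcing ccsm_joe:

  #!/bin/csh -f
  # run from the scripts/$CASE directory, where ccsm_joe lives
  source ccsm_joe
  echo "case $CASE runs in $EXEROOT and archives to $ARCROOT"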
Interactive/Batch Issues • Can run the main script interactively • Typically used to build and pre-stage initial data • Uncomment the "exit" command in the main script to stop it before CCSM execution starts • Batch environment is highly site dependent • NQS • LoadLeveler • LSF • PBS
Common Errors (1) • Model won't build • Try rebuilding clean • Remove all obj directories; these are $OBJROOT/$model/obj, normally equivalent to $EXEROOT/$model/obj • When rebuilding, make sure $SETBLD is true in the main script • Model won't continue due to a restart problem • Determine the cause of the problem: quota, hardware, script, zero-length files, rpointer problems • Fix if possible • Back up to the latest "good" restart dataset • Rerun
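A sketch of a clean rebuild (component list as in the B case; adjust for data models):

  foreach m (atm lnd ocn ice cpl)
    rm -rf $OBJROOT/$m/obj      # remove each component's obj directory
  end
  # in the main script, make sure the build step is enabled, e.g.
  # setenv SETBLD true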
Common Errors (2) • Ice model stops due to an mp transport error • Double ndte in the ice model namelist in ice.setup.csh • Back up to the latest "good" restart dataset • Run past the previous stop date • Reset the ndte value • Ocean model non-convergence • Add about 10% to the number of model timesteps/hour in ocn.setup.csh, DT_COUNT • Back up to the latest "good" restart dataset • Run past the previous stop date • Reset DT_COUNT • Non-convergence on the first timestep is a special case
Tools • Under scripts/tools • ccsm_getfile : hierarchical search for a file • ccsm_getinput : hierarchical search for an input file • ccsm_msread : copies a file from the local mass store • ccsm_mswrite : copies a file to the local mass store • ccsm_checkenvs : echoes ccsm environment variables, used to create ccsm_joe • ccsm_getrestart : copies restart files from $ARCROOT/restart to the appropriate $EXEROOT and scripts/$CASE directories
Performance • This is complicated! • Issues • Performance of components and system as a function of resolution and configuration • Scalability of individual components, scaling efficiency of individual components • Task/Thread counts • Components sharing nodes, overloading nodes with multiple components, overloading threads, overloading tasks • Load balance of coupled system
CCSM Load Balancing • Example processor layout: 40 ocn, 32 atm, 16 ice, 12 lnd, 4 cpl = 104 total processors • [diagram of per-component timings, in seconds per day, illustrating the load balance]
Component/Hardware layout • Machine, set of nodes • Nodes, group of processors that share memory • Processors, individual computing elements • General rules • Do not oversubscribe processors, place only 1 MPI task or 1 thread on each processor • Minimize the number of nodes used for a given component and processor requirement • Multiple components can share a node as long as there is no oversubscription of processors • Test several decompositions, layouts, task/thread combinations to try to optimize performance
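A small worked example of the node arithmetic, using the B-case counts from the configuration summary slide and assuming 4-processor nodes (the node size is illustrative):

  # atm: 8 tasks x 4 threads = 32 procs -> 8 full 4-way nodes
  # ocn: 40 tasks x 1 thread = 40 procs -> 10 nodes
  # ice: 8 x 1 = 8 procs -> 2 nodes; lnd: 2 x 4 = 8 procs -> 2 nodes
  # cpl: 1 x 4 = 4 procs -> 1 node (components may share nodes,
  #   as long as no processor is oversubscribed)
  @ atm_nodes = ( 8 * 4 ) / 4
  echo "atm nodes = $atm_nodes"    # prints: atm nodes = 8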
Summary • CCSM is a complicated multi-executable climate model, expect there to be “spin-up” time • CCSM is a scientific research code • There are many possible components, configurations, platforms, and resolutions; we are unable to test everything • Users are responsible for validating their science • NCAR can help with software/configuration problems, ccsm@ucar.edu • Please report bugs, fixes, improvements, and ports to new hardware, so we can incorporate those changes! ccsm@ucar.edu