Learn about the p-TOMCAT model, its parallelization, code changes, and more. Understand the basics of a chemical transport model and the benefits of parallel computing. Access the p-TOMCAT website for documentation and performance information.
p-TOMCAT training course
Glenn Carver, Centre for Atmospheric Science, University of Cambridge
What you will hear about on this course
• p-TOMCAT
  • How to run it
  • How it works in parallel
  • How to make code changes
• ASAD
  • How it works
  • How to make changes
p-TOMCAT basics
• In this section you will hear about:
  • What kind of model p-TOMCAT is
Basics: History lesson
• The parallel p-TOMCAT model is derived from TOMCAT, the tropospheric offline chemical transport model (CTM)
• TOMCAT was developed over recent years by a number of people at the Centre for Atmospheric Science at the University of Cambridge
• TOMCAT itself was originally derived from SLIMCAT, the stratospheric offline CTM, developed originally by Martyn Chipperfield (now at Leeds)
Basics: What is a chemical transport model?
• It’s a global three-dimensional model
• A CTM takes wind, temperature and humidity fields as input and transports tracers around the globe
• Wind and temperature input may come from meteorological analyses or from another model
• Global circulation models (e.g. the Unified Model) are quite different: they compute their own wind and temperature fields
• CTMs are very good for comparisons with observations
• They are not so good for coupled chemistry-climate research
Basics: Why parallel TOMCAT?
• High Performance Computing assumes many processors
• We had no choice but to parallelize TOMCAT
• p-TOMCAT is able to use multiple processors to:
  • Reduce execution time for long integrations
  • Run at higher resolutions, horizontally and vertically
• We can also use the increase in compute power to increase the complexity of the model, e.g. by adding more chemistry or better numerical schemes
Basics: What’s in p-TOMCAT
• p-TOMCAT is made up of different modules
• Each module represents a physical process
• p-TOMCAT comprises about 40,000 lines of Fortran
[Figure: module diagram with boxes labelled Clouds, Transport, Chemistry, Aircraft, Lightning, Emissions, PBL, ?, Strat]
Basics: More information
• p-TOMCAT website: http://www.atm.ch.cam.ac.uk/~tomcat/
• Includes:
  • Job scripts
  • Documentation
  • Test results
  • Performance information
p-TOMCAT grid
• In this section you’ll learn more about:
  • The grid the model uses
  • The parallelization of the model
p-TOMCAT resolutions
• Use Gaussian grids (from spectral models)
• Longitudes are regular; latitudes are nearly so
• Typical resolutions are:
  • T21 : 64 lon x 32 lat (5.6 degrees)
  • T42 : 128 lon x 64 lat (2.8 degrees)
  • T106 : 320 lon x 160 lat (1.1 degrees)
• For truncation TM: NLON = 3M + 1 or 3M + 2, NLAT = NLON / 2
• TOMCAT is not a spectral model, so you should not say you are running the model at T21, T42 etc. Quote the grid resolution instead
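The grid-size rule on this slide can be checked with a few lines of Python. This is a minimal sketch, not model code: the function name `gaussian_grid` is made up for illustration, and reading "3M + 1 or 2" as "whichever of the two is even" is an assumption that happens to reproduce the three resolutions listed.

```python
def gaussian_grid(m):
    """Return (nlon, nlat, spacing in degrees) for truncation number m.

    Assumption: "3M + 1 or 2" means "pick whichever value is even",
    so that NLAT = NLON / 2 is a whole number.
    """
    nlon = 3 * m + 1
    if nlon % 2:        # odd, so use 3M + 2 instead
        nlon += 1
    nlat = nlon // 2
    return nlon, nlat, 360.0 / nlon


for m in (21, 42, 106):
    nlon, nlat, spacing = gaussian_grid(m)
    print(f"T{m}: {nlon} lon x {nlat} lat ({spacing:.1f} degrees)")
```

The spacing is simply 360 degrees divided by the number of longitudes, which is the figure to quote when describing a run.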
How does it work in parallel?
• There are several paradigms for parallel computing
• p-TOMCAT uses domain decomposition
• Each cpu runs a copy of the model
• Each cpu works on a lat-lon portion of the globe
• But! More processors do not always mean more performance
Domain decomposition
• Each processor gets the same size “patch”
[Figure: the global grid (longitude: LON, latitude: LAT) divided into patches of nlonmx x nlatmx points]
Domain decomposition
• Total number of grid points: LON x LAT x NIV (mod_paradi.f90)
• Each processor has: NLONMX x NLATMX x NIV
  • Where NLONMX = LON / NPROCI
  • And NLATMX = LAT / NPROCK
• NPROCI is the no. of cpus along longitudes, NPROCK along latitudes (mod_slcons.f90)
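The patch arithmetic above can be sketched in Python as follows. The function name `patch_size` is hypothetical, and the check that the grid divides exactly is an assumption drawn from the statement that every processor gets the same size patch.

```python
def patch_size(lon, lat, niv, nproci, nprock):
    """Return (nlonmx, nlatmx, points per processor) for one decomposition.

    Assumption: LON and LAT must divide exactly by NPROCI and NPROCK,
    since every processor is said to get the same size patch.
    """
    if lon % nproci or lat % nprock:
        raise ValueError("grid does not divide evenly between processors")
    nlonmx = lon // nproci
    nlatmx = lat // nprock
    return nlonmx, nlatmx, nlonmx * nlatmx * niv


# 128 x 64 grid with 31 levels on 4 x 4 = 16 processors
print(patch_size(128, 64, 31, 4, 4))
```

Note that the vertical dimension NIV is not split: each processor holds all levels for its own columns.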
Domain decomposition
• A problem arises when the code requires values from another processor’s gridpoints
• E.g. U(i+1) - U(i-1)
• A processor must communicate with its neighbour
[Figure: two neighbouring patches, CPU 1 and CPU 2]
Halos
• We deal with this by creating ‘halos’ rather than communicating every time a neighbour’s value is needed in the code
• A halo is a copy of data; it is not computed by the processor
[Figure: CPU 1 and CPU 2 side by side, each holding a halo that overlaps the edge of the other’s patch]
Halos
• Halos are exchanged when the code needs a neighbour’s interior points
• Different halos are required by different algorithms
  • Advection code needs N, S, E, W + special treatment at the poles
  • Flight track code needs N, NE, E, SE, S, SW, W, NW
  • Tropopause calc needs N, S, E, W
• However, halos introduce communication costs and should be avoided or minimised as much as possible
Halo array example
• An array with a halo might be declared: ozone(0:nlonmx+1,0:nlatmx+1)
• But! The processor only ever computes ozone(1:nlonmx,1:nlatmx)
• The halos are copies and only for reading: cpu1:ozone(nlonmx+1,:) = cpu2:ozone(1,:)
[Figure: CPU 1 and CPU 2 side by side, each holding a halo that overlaps the edge of the other’s patch]
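To make the halo idea concrete, here is a minimal one-dimensional sketch in plain Python with two patches only. It is not the model's MPI code: the helper names are invented for illustration, the exchange is a direct copy rather than message passing, and the outer halos (which would come from other neighbours) are left unfilled.

```python
def make_patch(interior):
    """Store interior values at indices 1..n with an empty halo at 0 and n+1."""
    return [None] + list(interior) + [None]


def exchange_halos(west, east):
    """Fill the shared halos of two neighbouring patches by copying."""
    n_west = len(west) - 2
    west[n_west + 1] = east[1]   # cpu1:u(nlonmx+1) = cpu2:u(1)
    east[0] = west[n_west]       # cpu2:u(0) = cpu1:u(nlonmx)


def centred_difference(patch):
    """U(i+1) - U(i-1) at interior points where both neighbours are available."""
    n = len(patch) - 2
    return {i: patch[i + 1] - patch[i - 1]
            for i in range(1, n + 1)
            if patch[i - 1] is not None and patch[i + 1] is not None}


cpu1 = make_patch([1.0, 4.0, 9.0])
cpu2 = make_patch([16.0, 25.0, 36.0])
exchange_halos(cpu1, cpu2)
print(centred_difference(cpu1))   # the edge point i = 3 now uses cpu2's value
print(centred_difference(cpu2))   # the edge point i = 1 now uses cpu1's value
```

Each patch writes only to its interior points; the halo cells are read-only copies that are refreshed by the exchange.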
Advection halo arrays
• Advection scheme halos are more complicated
• Arrays that need halos are declared as e.g. SM(nimn:nimx,nkmn:nkmx,niv), where
    nimn = -nhalmx + 1
    nimx = nlonmx + nhalmx
    nkmn = 0
    nkmx = nlatmx + 1
• nhalmx is greater than 1 and depends on the resolution
• Next we’ll see why advection needs nhalmx > 1
Model timestep
• Numerical stability normally requires: ∆t < ∆x / Umax
• This presents a problem approaching the poles. As ∆x decreases, so must ∆t
• This is the well known “pole problem”
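The pole problem can be illustrated numerically. The sketch below assumes a regular longitude spacing on a spherical Earth of radius 6371 km, so that ∆x shrinks with the cosine of latitude; the maximum wind speed of 100 m/s and the sample latitudes are illustrative choices, not values taken from the model.

```python
import math

EARTH_RADIUS_M = 6.371e6   # mean Earth radius


def zonal_spacing_m(lat_deg, nlon):
    """East-west distance between neighbouring gridpoints at a given latitude."""
    circumference = 2.0 * math.pi * EARTH_RADIUS_M * math.cos(math.radians(lat_deg))
    return circumference / nlon


def max_timestep_s(lat_deg, nlon, umax):
    """Largest timestep satisfying dt < dx / Umax at this latitude."""
    return zonal_spacing_m(lat_deg, nlon) / umax


# 64 longitudes, assumed Umax of 100 m/s
for lat in (0.0, 60.0, 85.0):
    print(f"lat {lat:4.1f}: dt < {max_timestep_s(lat, 64, 100.0):.0f} s")
```

At 60 degrees the limit is already half its equatorial value, and it keeps falling towards the pole.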
Polar latitudes
• To ensure stability for the x-direction advection, the model groups grid cells together near the poles
[Figure: globe with grid cells grouped near the N and S poles. What’s the deliberate mistake?]
Polar latitudes
• Grouping depends on: timestep, Umax and resolution
• Grouping implies the model loses zonal resolution approaching the poles (kind of)
• The number of cells grouped increases with resolution
• Careful: if the timestep is not reduced as resolution is increased, the model will start grouping at mid or low latitudes. Check the job log!
Polar latitude halos
• Cell grouping implies bigger halos at these latitudes
• E.g. if 8 gridpoints are grouped into 1, we’ll need 8 gridpoints either side of our processor
• This will increase the cost of communication at the poles
• Hence the need for nhalmx. This variable changes with resolution and timestep, but we want it as small as possible and the timestep as big as possible!
• Optimum values of nhalmx, nproci & nprock have been set in tom.build. Check the calc_slcons program output on the website for other possible configurations based on the choice of ∆t
p-TOMCAT: Parallel computing
• In this section you’ll learn more about:
  • MPI
  • How the model uses MPI
  • When you should use it if adding code
  • Parallel performance
Message passing
• p-TOMCAT uses MPI: Message Passing Interface
• MPI is an international standard and available on all HPC systems
• MPI assumes processors are completely separate apart from a communications link
• In practice, MPI will use shared memory to speed up comms, but this is transparent to the user
[Figure: CPUs and memory]
MPI
• MPI requires the addition of subroutine calls to do the communication
• You will need to use MPI for:
  • Schemes that need neighbouring processors’ gridpoints (e.g. gradients, local searches of gridpoint values)
  • Zonal means
  • Global sums
  • Reading a file and broadcasting it to all processors
• You do not need it for any operation in the vertical. Each processor has all vertical levels for each gridpoint it deals with
Parallel performance
• We rarely get perfect parallel performance, because of communication costs, I/O and differences in time through the code (load imbalances)
• Speedup: S = T1 / Tp
  • Where T1 is the time on a single processor and Tp is the time on p processors
• Efficiency: E = S / p
  • e.g. an efficiency of 50% means that on average we only used half the processors during the run. Ideally we want > 60-70%
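The two definitions above translate directly into code. This is a small sketch with invented timings; the numbers are purely illustrative and are not measured p-TOMCAT results.

```python
def speedup(t1, tp):
    """S = T1 / Tp: how many times faster the run is on p processors."""
    return t1 / tp


def efficiency(t1, tp, p):
    """E = S / p: fraction of the ideal p-fold speedup actually achieved."""
    return speedup(t1, tp) / p


# Hypothetical timings: 1000 s on 1 processor, 125 s on 16 processors
s = speedup(1000.0, 125.0)
e = efficiency(1000.0, 125.0, 16)
print(f"speedup = {s:.1f}, efficiency = {e:.0%}")
```

In this made-up case the run is 8 times faster on 16 processors, so the efficiency is 50%, below the 60-70% target.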
TOMCAT performance
[Figure: parallel efficiency at T21]
Load balancing
• Another reason efficiency falls as the number of processors increases is load balancing
• The time through the code depends on various factors:
  • Day / night affects time through photolysis and chemistry code
  • Polar night / day affects time through photolysis & chemistry code
  • Polar x-advection costs less because cells are grouped
  • Presence of convection alters the cost of the convection code
• The net effect is that some processors might be idle whilst waiting for others to catch up
• This is a difficult problem to solve. It is a particular problem at high resolution, and the grid may well need restructuring for the model to work efficiently at T106 and above on 128+ processors
Reproducibility in parallel
• Getting reproducible results is a big issue for parallel models because the result of a floating-point sum depends on the order of the additions: (A + B) + C /= A + (B + C) on computers
• Computing a zonal mean requires each processor to compute a mean for nlonmx pts and then combine those means across all processors
• Splitting the domain differently, e.g. 4x4 or 2x8, will therefore change the result
• p-TOMCAT results were sensitive to this. V1.0 uses special subroutines for zonal & global means to avoid it, and you must use them
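The effect of summation order can be seen with four numbers. The sketch below adds the same row of values once on a single "processor" and once split into two patches; the helper names are invented for illustration. The order-independent alternative shown, `math.fsum`, is just one standard-library way of getting a result that does not depend on ordering; it is not a description of what the p-TOMCAT subroutines do.

```python
import math


def naive_sum(values):
    """Add values one after another, left to right, as a simple loop would."""
    total = 0.0
    for v in values:
        total = total + v
    return total


def sum_by_patches(row, npatch):
    """Sum each equal-sized patch separately, then add the partial sums."""
    size = len(row) // npatch
    partial = [naive_sum(row[k * size:(k + 1) * size]) for k in range(npatch)]
    return naive_sum(partial)


row = [0.0, 0.1, 0.2, 0.3]
one_patch = sum_by_patches(row, 1)     # ((0.0 + 0.1) + 0.2) + 0.3
two_patches = sum_by_patches(row, 2)   # (0.0 + 0.1) + (0.2 + 0.3)
print(one_patch == two_patches)        # the two decompositions disagree

# math.fsum returns the correctly rounded sum, so ordering does not matter
print(math.fsum(row) == math.fsum(reversed(row)))
```

The difference is in the last bit of the result, but in a long integration such differences can grow, which is why a fixed summation procedure matters.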
Reproducibility across computers
• p-TOMCAT results are also sensitive to the compiler used
• Optimization levels O2 and O3 on green give the same results
• You will get differences between green & newton
[Figure: relative difference in zonal mean ozone between runs on green vs. newton]
How to use p-TOMCAT
• In this section, you’ll find out:
  • How to compile and run the model
  • How to add in new code
  • How to control what the model does
Getting started at Manchester
• We suggest you create an experiment directory in /santmp/user (but remember that files on /santmp are deleted if not used for more than 14 days)
• /hold/year/user is the tape store where results can be saved
• Use tar and gzip to package files into one file for more efficient storage, e.g.
    tar cvf expt.tar myexptdir; gzip expt.tar
  and then copy it to /hold
Compiling p-TOMCAT
• We compile (build) and run p-TOMCAT in separate steps
• Download the tom.build script from the p-TOMCAT website to your experiment directory, e.g. /santmp/emgdc/run1
• First make it executable: chmod u+x tom.build
• Then run ./tom.build on wren in the expt directory. It will create a directory ‘src’ containing all the code and leave an executable ‘tomcat’ in the expt directory
Makefiles
• tom.build will put a ‘Makefile’ in the ‘src’ directory for you
• A ‘Makefile’ specifies which compiler options to use and the order in which the files should be compiled
• If you need to change the Makefile and are not sure how Makefiles work, please ask. You only need to change it when:
  • Altering the compiler options
  • Adding new files of code to the model
• We use a parallel compilation, compiling several files at once
  • On green use the command: pmake tomcat
  • On newton use the command: make -j 4 tomcat
tom.build: changes you might make
• Setting the model resolution may be all you need to do if you just want to run a ‘vanilla’ flavour of the model
• Other changes you might want to make:
  • Model version number
  • Changes to domain decomposition
  • Changes to number of tracers, species in chemistry
• tom.build creates a small file called ‘config’. Do NOT edit it! It’s used to pass the model config to the tom.run script
• These parameters are at the top of the job:
    RES=21
    NIV=31
Editing code: nupdate
• To make changes to the model code we use ‘nupdate’, a preprocessor. It takes input of the form:
    *id run1
    *d mod_paradi.11,12
    integer, parameter :: lon = 320, lat = 160
    *i mod_paradi.14
    ! This is just a comment
    *b mod_paradi.14
    ! This comment comes before the other comment
• To see the line numbers, check the tomcat listing in /ohome/emtomcat/public/lib/p-tomcat.1.0.list
• tom.build runs nupdate twice, for tomcat & ASAD
Editing code: adding your own changes to subroutines
• Put all your nupdate changes in one or more files, rather than including the changes in tom.build directly. They are much easier to manage this way
• Edit tom.build to read:
    cat > tomcat.mods <<EOF
    *read newcode.up
    *read newcode2.up
• To alter the ASAD code, similarly add *read after the line:
    cat > asad.mods << EOF
• Remember to use f90, free source format
Adding new code
• If you have hundreds of lines of new code, nupdate can be cumbersome and error prone (although you can use it if you want to)
• I would suggest you:
  • Run tom.build once to create the src directory & Makefile
  • Edit the code directly (copy the original first!)
  • Edit the Makefile if you need to
  • Run ‘pmake’ (or make -j 4 on newton) by hand
• The disadvantage is that it’s harder to move to new versions and code management requires more effort
Adding new files of code
• If you need to add new .f90 files:
  • Consider using f90 modules. They have a number of advantages
  • You must edit the Makefile:
    • Add your .f90 file to the list of files to be compiled
    • Add the dependency (if any) to the Makefile
• New code or bugfixes should be sent to Glenn for inclusion
  • It will be expected that the code will be properly tested
  • Test results should accompany code
  • For more details email or see Glenn or Nick
Compiling the code
• Normally you would not change the compiler options in the Makefile apart from when testing. Uncomment the second set of options:
    F90FLAGS_IRIX64=-64 -r8 -extend_source -O2 \
     -DEBUG:trap_uninitialized=ON:verbose_runtime=ON:div_check=3 \
     -LIST:all_options=ON -I/opt/mpt/mpt/usr/include
    #
    # use these while developing
    #F90FLAGS_IRIX64= -g -64 -r8 -extend_source \
    # -I/opt/mpt/mpt/usr/include \
    # -DEBUG:trap_uninitialized=ON:verbose_runtime=ON:div_check=3
    # -check_bounds
• -check_bounds should be used for brand-new code but will slow the model down a lot. Just use it once
• After editing the Makefile, do a make clean followed by a pmake tomcat to recompile the model
Running p-TOMCAT
• Download the tom.run script from the website to your experiment directory
• You MUST change the line:
    EXPT=<your experiment directory>
• To run, either submit as a batch job with:
    bsub < tom.run
  (not bsub tom.run!!) or use:
    ./tom.run
  to run interactively on wren (max. of 4 processors only)
Files you need
• To run p-TOMCAT you will need the following files in your experiment directory:
  • chch.d - this lists the chemical species
  • ratb.d, ratt.d, ratj.d - the reactions used
  • depvel.d - deposition velocities
  • henry.d - Henry’s law coefficients (wet dep.)
  • A binary file with initial fields for the model’s tracers
• These are available from the website or on wren in the emtomcat account
• The model reads other files which are held in the emtomcat account. You can supply your own if you wish by changing the namelist variables which set the directory the model looks in for these files. These files will be moved for v1.1
Reaction rate data
• The reaction coefficients used in the rat?.d files are collated from various sources:
  • IUPAC kinetic data (www.iupac-kinetic.ch.cam.ac.uk)
  • JPL kinetic data (jpldataeval.jpl.nasa.gov)
  • Master Chemical Mechanism (MCM) (www.chem.leeds.ac.uk/Atmospheric/MCM/mcmproj.html)
• Note! Some reaction rates do not have simple Arrhenius forms and are dealt with specifically in the ASAD code (see bimol.f & trimol.f)
• p-TOMCAT uses rates current as of 2000
  • TOMCAT runs on the Fujitsu used different rates, so no long runs have yet been done using these
• The rate files will be updated periodically. Due for an update soon
p-TOMCAT initial files
• New initial files can be created from past runs
  • Very early runs had low methane, so use more recent runs by Fiona (ACTO), Nick (POET) or Richard (MOZAIC)
  • Note! Reaction rates have been altered since these runs, so allow time to spin the model up
• Programs exist to manipulate initial files:
  • Change the resolution
  • Change the date
  • Add more tracers
• Default initial file available for 1/3/1997
Analysis files
• p-TOMCAT currently uses modified ECMWF operational analyses, which are stored on wren in /v/lrkd/ECMWF/T42
• The operational analyses changed levels:
  • 31 levels up to March 1999
  • Then 50 levels up to Oct 1999
  • 60 levels thereafter
• At some point we plan to use ERA-40 analyses to avoid inconsistencies in the operational analyses
  • Though convection is known to change dramatically
• It is possible to run the model with higher resolution analyses
Anatomy of the tom.run script
• Check the batch queue options at the top:
    #BSUB -J p-tomcat        # job name
    #BSUB -o tomcat_log.%J   # job log file
    #BSUB -m green           # machine to run on: 'green', 'fermat', 'newt64i'
    #BSUB -N                 # mail me when done
    #BSUB -n 16              # no. of processors
    #BSUB -W 2:00            # run time (wall clock time hrs:mins)
• You can override any of these on the command line:
    bsub -m fermat < tom.run
• The no. of processors must match nproci*nprock in the config file
• And don’t forget to change:
    EXPT=/santmp/emgdc/run1
• Remember we are charged on the no. of cpus used!
Namelist variables
• The model uses namelists (f90) to control its options, unlike old TOMCAT where you had to modify the code
• There are three namelists used by the model:
    &switches /
    &chem_switches /
    &pbl_switches /
• Normally you will only need to change “switches” to control what the model does
How namelists work
• A variable in a namelist has a default value in the model and therefore need not be set at all
• Namelist variables can be followed by comments:
    &switches
      nsteps = 480, ! Number of timesteps
• The model will check that user settings are sensible
• For a full description of the switches, see mod_switch.f90
• For further information about namelists, see an f90 book!
Main model switches
• The main model switches you might want to change are:
    &switches
      initial_file = '$EXPT/init_t${RES}_97030100'
      dt0 = 1800.0, ! Dynamical timestep (secs)
      nsteps = 4, ! Number of steps in the run
      nso1 = 12, ! Frequency of model output in steps
      nfrf = 0, ! Frequency at which restart files are created
      ifrqnc = -14, ! Frequency at which new netcdf files are created
• Then there are switches to turn off components such as the emissions, chemistry etc. See mod_switches.f90 for more details
• Note that the frequency at which the netCDF files are created depends on the resolution and the frequency at which you output
  • At T21 with 6hrly output, create a new netCDF file every 14 days
  • At T42 with 6hrly output, create a new one every day
  • Don’t create netCDF files bigger than 2 Gbytes
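When choosing dt0, nsteps and nso1 it helps to convert steps into hours. The sketch below is a hypothetical helper, not part of the model or its scripts; it assumes output is written every nso1 steps after the start of the run, and uses nsteps = 480 as the illustrative run length.

```python
def run_summary(dt0, nsteps, nso1):
    """Return (run length in hours, output interval in hours, number of outputs).

    Assumption: output is written every nso1 steps, counted from the
    start of the run; output at step 0, if any, is not counted.
    """
    run_hours = nsteps * dt0 / 3600.0
    output_interval_hours = nso1 * dt0 / 3600.0
    n_outputs = nsteps // nso1
    return run_hours, output_interval_hours, n_outputs


# dt0 = 1800 s, 480 steps, output every 12 steps
run_hours, interval, n_outputs = run_summary(1800.0, 480, 12)
print(f"run length: {run_hours / 24:.0f} days, "
      f"output every {interval:.0f} h, {n_outputs} outputs")
```

With a 30-minute timestep, nso1 = 12 corresponds to the 6-hourly output mentioned above, and 480 steps is a 10-day run.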
Types of model output
• Restart file
  • This is a full precision (real*8) dump of the model’s main variables
  • You can use this file to restart the run as if the run had not been stopped
  • You can also use a restart file to create an initial file for the model
• Vertical diffusion scheme (PBL scheme) physical restart file
• NetCDF output
  • Reduced precision (real*4) for plotting / diagnostics (Ferret, GrADS, IDL)
  • Usually contains all the model’s main variables & other diagnostic fields
  • You can add additional fields to this file by altering the model code
• Binary output (PDG)
  • Full precision, Fortran binary output file with the main model variables
  • Old versions of TOMCAT used this as the main output file, but netCDF files are smaller and easier to use
  • Still useful for doing exact comparisons between different model versions or diagnostics that need high precision
  • Can also be used to create initial files for new runs