CGAM: Running the Met Office Unified Model on HPCx
Paul Burton
CGAM, University of Reading
Paul@met.rdg.ac.uk
www.cgam.nerc.ac.uk/~paul
Overview
• CGAM: Who, what, why and how
• The Met Office Unified Model
• Ensemble Climate Models
• High Resolution Climate Models
• Unified Model Performance
• Future Challenges and Directions
Who is CGAM?
• N.E.R.C. Distributed Institute for Atmospheric Composition
• Atmospheric Chemistry Modelling Support Unit
• NERC Centres for Atmospheric Science
• Centre for Global Atmospheric Modelling
• Centre for Ecology and Hydrology
• Centre for Polar Observations and Modelling
• National Institute for Environmental e-Science
• Tyndall Centre for Climate Change Research
• British Antarctic Survey
• Centre for Terrestrial Carbon Dynamics
• Southampton Oceanography Centre
• Proudman Oceanographic Laboratory
• British Geological Survey
• Data Assimilation Research Centre
• Environmental Systems Science Centre
• British Atmospheric Data Centre
• Universities' Weather and Environment Research Network
• Facility for Airborne Atmospheric Measurements
• University Facilities for Atmospheric Measurement
What does CGAM do?
• Climate Science
  • UK centre of expertise for climate science
  • Lead UK research in climate science
  • Understand and simulate the highly non-linear dynamics and feedbacks of the climate system
• Earth System Modelling
  • From seasonal timescales to hundreds of years
  • Close links to the Met Office
• Computational Science
  • Support scientists using the Unified Model
  • Porting and optimisation
  • Development of new tools
Why does CGAM exist?
• Will there be an El Niño this year? How severe will it be?
• Are we seeing increases in extreme weather events in the UK?
  • The autumn 2000 floods
  • Drought?
• Will the milder winters of the last decade continue?
• Can we reproduce and understand past abrupt changes in climate?
How does CGAM answer such questions?
• Models are our laboratory
• Investigate predictability
• Explore forcings and feedbacks
• Test hypotheses
Met Office Unified Model
• Standardise on using a single model
• The Met Office's Hadley Centre is recognised as a world leader in climate research
• Two-way collaboration with the Met Office
• A very flexible model:
  • Forecast or climate
  • Global or limited area
  • Coupled ocean model
• Easy configuration via a GUI
• User-configurable diagnostic output
Unified Model: Technical Details
• The climate configuration uses the "old" vn4.5
  • vn5 has an updated dynamical core
  • The next-generation "HadGEM" climate configuration will use this
• Grid-point model
  • Regular latitude/longitude grid
• Dynamics
  • Split-explicit finite-difference scheme
  • Diffusion and polar filtering
• Physical parameterisation
  • Almost all constrained to a vertical column (see the sketch below)
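Because the physics is (almost entirely) constrained to a vertical column, the physics sweep needs no horizontal communication and parallelises trivially over the decomposed horizontal grid. A minimal sketch of that loop structure; the routine name, array shapes and the placeholder update are illustrative, not the UM's actual interface:

```fortran
! Sketch: column-constrained physics needs no horizontal
! communication, so the sweep parallelises trivially over (i,j).
! Names and shapes are illustrative, not the UM's actual interface.
subroutine physics_sweep(theta, row_length, rows, levels)
  implicit none
  integer, intent(in)    :: row_length, rows, levels
  real,    intent(inout) :: theta(row_length, rows, levels)
  integer :: i, j

  do j = 1, rows
    do i = 1, row_length
      ! A real scheme (convection, radiation, boundary layer, ...)
      ! would update column theta(i,j,:) using only that column.
      theta(i, j, :) = 0.999 * theta(i, j, :)  ! placeholder update
    end do
  end do
end subroutine physics_sweep
```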
Unified Model: Parallelisation
• Domain decomposition
  • Atmosphere: 2D regular decomposition
  • Ocean: 1D (latitude) decomposition
• GCOM library for communications
  • Interface to a selectable communications library: MPI, SHMEM, ???
  • Basic communication primitives
  • Specialised communications for the UM
• Communication patterns
  • Halo update (SWAPBOUNDS): sketched after this list
  • Gather/scatter
  • Global/partial summations
• Designed/optimised for the Cray T3E!
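GCOM's actual interface is not shown in the talk, so here is a minimal plain-MPI sketch of the SWAPBOUNDS-style halo update, reduced to a one-point east-west halo on a 1D ring for brevity (the atmosphere's real decomposition is 2D, with north-south exchanges as well):

```fortran
! Minimal sketch of a SWAPBOUNDS-style halo update in plain MPI.
! GCOM's real interface differs; all names here are illustrative.
program halo_sketch
  use mpi
  implicit none
  integer, parameter :: nx = 10, ny = 8
  real    :: field(0:nx+1, ny)        ! one-point east-west halo
  integer :: ierr, rank, nproc, east, west

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

  ! East-west neighbours on a periodic ring of processors.
  east = mod(rank + 1, nproc)
  west = mod(rank - 1 + nproc, nproc)
  field = real(rank)

  ! Send my easternmost owned column east and receive my west halo;
  ! then the mirror exchange. Two sendrecvs complete the update.
  call MPI_Sendrecv(field(nx,:), ny, MPI_REAL, east, 1, &
                    field(0,:),  ny, MPI_REAL, west, 1, &
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
  call MPI_Sendrecv(field(1,:),    ny, MPI_REAL, west, 2, &
                    field(nx+1,:), ny, MPI_REAL, east, 2, &
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)

  call MPI_Finalize(ierr)
end program halo_sketch
```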
Model Configurations
• Currently
  • HadAM3 / HadCM3
  • Low resolution (270km: 96 x 73 x 19L)
  • Running on ~10-40 CPUs
  • Turing (T3E1200), Green (O3800), Beowulf cluster
• Over the next year
  • More of the same
  • Ensembles
    • Low resolution (HadAM3/HadCM3)
    • 10-100 members
  • High resolution
    • 90km: 288 x 217 x 30L
    • 60km: 432 x 325 x 40L
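For scale (simple arithmetic on the grid dimensions above): the 90km grid has 288 x 217 x 30 ≈ 1.9 million points against 96 x 73 x 19 ≈ 0.13 million at low resolution, roughly a 14-fold increase in work per timestep, before accounting for the shorter timestep that higher horizontal resolution also demands for numerical stability.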
Ensemble Methods in Weather Forecasting
• Have been used operationally for many years (e.g. ECMWF)
  • Perturbed starting conditions
  • Reduced resolution
• Multi-model ensembles
  • Perturbed starting conditions
  • Different models
• Why are they used?
  • Give some indication of predictability
  • Allow objective assessment of weather-related risks
  • More chance of seeing extreme events
Climate Ensembles
• Predictability
  • What confidence do we have in climate change projections?
• What effect do different forcings have?
  • CO2: different scenarios
  • Volcanic eruptions
  • Deforestation
• How sensitive is the model?
  • Twiddle the knobs and see what happens
• How likely are extreme events?
  • Allows governments to take defensive action now
Ensembles: Implementation
• Setup
  • Allow users to specify and design an ensemble experiment
• Runtime
  • Allow the ensemble to run as a single job on the machine, for easy management
• Analysis
  • How to view and process the vast amounts of data produced
Setup: Normal UM Workflow
[Diagram] The UMUI generates a UM job: a shell script (which launches the executable via poe) plus Fortran namelists. The job reads starting data and forcing data, and outputs diagnostics and restart data.
Setup: UM Ensemble Workflow
[Diagram] An ensemble control layer wraps the normal workflow. A top-level control shell script reads an ensemble Config (e.g. N_MEMBERS=3) plus the per-member differences, and poe launches UM_Job once for the whole ensemble. Each member gets its own job directory (Job.1, Job.2, Job.3), each holding a shell script and Fortran namelists; the run script derives $MEMBERid, does cd "Job.$MEMBERid" and runs that member's script. Each member has its own input data (Data.1-3: starting and forcing data) and output (Out.1-3: diagnostics and restart data).
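The Config shown in the diagram (N_MEMBERS=3) suggests a small control file read at setup. A minimal sketch of how such settings might be read as a Fortran namelist; the file name, group name and variables are hypothetical, not the UM's actual namelist:

```fortran
! Sketch: reading a hypothetical ensemble-control namelist like the
! Config above (N_MEMBERS=3). Names are illustrative only.
program read_ensemble_config
  implicit none
  integer :: n_members, ios
  character(len=256) :: run_script
  namelist /ensemble/ n_members, run_script

  n_members  = 1            ! defaults if no config file is found
  run_script = 'UM_Job'

  ! e.g. ensemble.nl containing:  &ensemble n_members=3 /
  open(10, file='ensemble.nl', status='old', iostat=ios)
  if (ios == 0) then
    read(10, nml=ensemble, iostat=ios)
    close(10)
  end if
  print *, 'ensemble members:', n_members
end program read_ensemble_config
```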
UM Ensemble: Runtime (1)
• "poe" is called at the top level, and runs a "top_level_script" on every CPU
• Each copy works out which CPU it is on
  • Hence which member it belongs to
  • Hence which directory/model script to run
• Model scripts run in a separate directory for each member
• Each model script calls the executable
UM Ensemble: Runtime (2)
• Uses "MPH" to change the global communicator
  • http://www.nersc.gov/research/SCG/acpi/MPH/
  • A freely available tool from NERSC
  • MPH is designed for running coupled multi-model experiments
• Each member has a unique MPI communicator replacing the global communicator (see the sketch below)
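MPH's own API is not shown in the talk, but the effect it achieves can be sketched with plain MPI_Comm_split: derive a member id from the global rank, then give each member a private communicator that stands in for MPI_COMM_WORLD inside the model. The block size of 8 ranks per member is illustrative:

```fortran
! Sketch of the effect MPH achieves: each ensemble member gets its
! own communicator in place of MPI_COMM_WORLD. MPH's actual API
! differs; this uses plain MPI_Comm_split.
program member_comm
  use mpi
  implicit none
  integer, parameter :: ranks_per_member = 8  ! e.g. 8 CPUs/member
  integer :: ierr, world_rank, member_id, member_comm_handle
  integer :: member_rank

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)

  ! Which member am I? Contiguous blocks of ranks form one member.
  member_id = world_rank / ranks_per_member

  ! All ranks with the same colour (member_id) share a communicator.
  call MPI_Comm_split(MPI_COMM_WORLD, member_id, world_rank, &
                      member_comm_handle, ierr)
  call MPI_Comm_rank(member_comm_handle, member_rank, ierr)

  ! The model then runs unchanged, with member_comm_handle standing
  ! in for the global communicator inside each member.
  print *, 'world rank', world_rank, '-> member', member_id, &
           ', local rank', member_rank
  call MPI_Finalize(ierr)
end program member_comm
```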
UM Ensemble: Future Work
• Runtime tools
  • Control and monitoring of ensemble members
• Real-time production of diagnostics
  • Currently each member writes its own diagnostics files
    • Lots of disk space
    • I/O performance?
  • Have a dedicated diagnostics process
  • Only output statistical analysis
UK-HiGEM
• National "Grand Challenge" programme for High Resolution Modelling of the Global Environment
• Collaboration between a number of academic groups and the Met Office's Hadley Centre
• Develop a high-resolution version of HadGEM (~1° atmosphere, 1/3° ocean)
• Better understanding and prediction of:
  • Extreme events
  • Predictability
  • Feedbacks and interactions
  • Climate "surprises"
  • Regional impacts of climate change
UK-HiGEM: Status
• The project is only just starting
• Plan to use the Earth Simulator for production runs
• Preliminary runs carried out on the Earth Simulator
  • Very encouraging results
• HPCx is a useful platform
  • For development
  • Possibly for some production runs
UM Performance
• Two configurations
  • Low resolution: 96 x 73 x 19L
  • High resolution: 288 x 217 x 30L
• Built-in comprehensive timer diagnostics
  • Wallclock time
  • Communications
  • Not yet implemented: I/O, memory, hardware counters, ???
• Outputs an XML file (see the sketch below)
  • Analysed using a PHP web page
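A minimal sketch of the idea behind the timer diagnostics: wrap a model section in wallclock timestamps and emit the result as a simple XML record for the analysis page to parse. The tag names, file layout and timed loop are illustrative, not the UM's actual timer schema:

```fortran
! Sketch: a wallclock-timed section written out as a simple XML
! record, in the spirit of the UM's timer diagnostics. The schema
! here is illustrative only.
program timer_sketch
  use mpi
  implicit none
  integer :: ierr, i
  double precision :: t0, t1
  real :: x

  call MPI_Init(ierr)

  t0 = MPI_Wtime()
  x = 0.0
  do i = 1, 1000000         ! stand-in for a timed model section
    x = x + sin(real(i))
  end do
  t1 = MPI_Wtime()
  print *, 'checksum', x    ! keep the compiler from removing the loop

  open(10, file='timers.xml', status='replace')
  write(10, '(a)') '<timers>'
  write(10, '(a,f12.6,a)') '  <section name="QT_POS" wallclock="', &
                           t1 - t0, '"/>'
  write(10, '(a)') '</timers>'
  close(10)

  call MPI_Finalize(ierr)
end program timer_sketch
```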
HiRes: Exclusive Timer
• QT_POS has a large "Collective" time
  • Unexpected!
• Traced to a call to a global_MAX routine in gather/scatter
  • Not needed, so deleted! (An illustrative reconstruction follows.)
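An illustrative reconstruction of the kind of redundancy found here (not the UM's actual code): a collective MAX computing a value every rank could already determine locally, so the call can simply be removed:

```fortran
! Illustrative reconstruction only: a global MAX inside a
! gather/scatter wrapper whose result every rank already knows.
subroutine redundant_collective_example(local_len, comm)
  use mpi
  implicit none
  integer, intent(in) :: local_len, comm
  integer :: max_len, ierr

  ! Before: a collective on every call, expensive at scale.
  call MPI_Allreduce(local_len, max_len, 1, MPI_INTEGER, MPI_MAX, &
                     comm, ierr)

  ! After: with a regular decomposition every rank can compute
  ! max_len itself, so the Allreduce above can simply be deleted.
end subroutine redundant_collective_example
```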
HiRes: After "Optimisation"
• QT_POS reduced from 65s to 35s
• Improved scalability
• And repeat...
Optimisation Strategy
• Low res
  • Aiming for 8-CPU runs as ensemble members (typically ~50 members)
  • Physics optimisation a priority
    • Load imbalance (SW radiation): see the imbalance sketch below
    • Single-processor optimisation
• High res
  • As many CPUs as is feasible
  • Dynamics optimisation a priority
    • Remove/optimise collective operations
    • Increase average message length
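A standard first step before attacking an imbalance like SW radiation (where only daylit columns do work) is to quantify it as the ratio of the slowest rank's time to the mean. A minimal sketch, with illustrative names; this is general MPI practice, not a UM routine:

```fortran
! Sketch: quantify load imbalance as max/mean of per-rank section
! times. 1.0 means perfectly balanced; the day/night split in SW
! radiation typically pushes this well above 1. Names illustrative.
subroutine report_imbalance(t_section, comm)
  use mpi
  implicit none
  double precision, intent(in) :: t_section  ! this rank's time
  integer, intent(in) :: comm
  double precision :: t_max, t_sum
  integer :: nproc, rank, ierr

  call MPI_Comm_size(comm, nproc, ierr)
  call MPI_Comm_rank(comm, rank, ierr)
  call MPI_Reduce(t_section, t_max, 1, MPI_DOUBLE_PRECISION, &
                  MPI_MAX, 0, comm, ierr)
  call MPI_Reduce(t_section, t_sum, 1, MPI_DOUBLE_PRECISION, &
                  MPI_SUM, 0, comm, ierr)
  if (rank == 0) then
    print *, 'imbalance (max/mean):', t_max / (t_sum / nproc)
  end if
end subroutine report_imbalance
```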
Future Challenges
• Diagnostics and I/O
  • The UM does huge amounts of diagnostic I/O in a typical climate run
  • All I/O goes through a single processor (sketched below)
    • Cost of the gather
    • Non-parallel I/O
• Ocean models
  • Only a 1D decomposition, so limited scalability
  • T3E-optimised!
• The next-generation UM5.x
  • Much more expensive
  • Better parallelisation of the dynamics scheme
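The single-processor I/O pattern described above can be sketched as a gather to rank 0 followed by a serial write, which is exactly where both costs appear. Field sizes, file name and format are illustrative, not the UM's actual output path:

```fortran
! Sketch of the single-processor I/O pattern: gather a distributed
! field to rank 0, which then writes it serially. Illustrative only.
subroutine write_diagnostic(local_field, local_n, comm)
  use mpi
  implicit none
  integer, intent(in) :: local_n, comm
  real,    intent(in) :: local_field(local_n)
  real, allocatable   :: global_field(:)
  integer :: rank, nproc, ierr

  call MPI_Comm_rank(comm, rank, ierr)
  call MPI_Comm_size(comm, nproc, ierr)
  if (rank == 0) then
    allocate(global_field(local_n * nproc))
  else
    allocate(global_field(1))   ! recv buffer unused off the root
  end if

  ! The gather serialises on rank 0; its cost grows with both the
  ! field size and the processor count.
  call MPI_Gather(local_field, local_n, MPI_REAL, &
                  global_field, local_n, MPI_REAL, 0, comm, ierr)

  if (rank == 0) then
    open(20, file='diag.dat', form='unformatted', status='replace')
    write(20) global_field      ! serial write: the other bottleneck
    close(20)
  end if
  deallocate(global_field)
end subroutine write_diagnostic
```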