
Porting MM5 and BOLAM codes to the GRID

This presentation describes running the MM5 and BOLAM weather models on the Grid to perform ensemble forecasting, and the development of a generic weather-model execution framework. The framework also supports deterministic forecasting and can be easily adapted to other forecast models. The focus is on adapting existing procedures, simplifying them, improving execution times, and utilizing high-level tools. Job management and monitoring are handled with the GANGA framework.


Presentation Transcript


  1. Porting MM5 and BOLAM codes to the GRID • Earth Science Workshop, 30 January 2009 – Paris, France • The SEE-GRID-SCI initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no. 211338

  2. Goal • Run the MM5 and BOLAM models on the grid to perform ensemble weather forecasting • Develop a generic weather-model execution framework • Support deterministic forecasting • Easily adapt to various other forecast models (e.g. WRF, RAMS, etc.)

  3. Target workflow • Weather models follow a specific workflow of execution: Retrieval of Initial Conditions (from N.O.M.A.D.S / NCEP-GFS, USA, over HTTP) → Pre-Processing → Model Run → Post-Processing • The framework should be able to incorporate different codes for pre/post-processing and model execution • Parametric configuration of initial data retrieval (see the sketch below)
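
As an illustration of the parametric initial-data retrieval step, the sketch below downloads a set of GFS files over HTTP for a given run date and forecast cycle. The NOMADS base URL, file-name pattern and forecast-hour range are placeholders for illustration, not the actual NOA configuration.

```python
# Minimal sketch of parametric initial-data retrieval (illustrative only).
# The base URL and file-name pattern are placeholders; the real
# NOMADS/NCEP-GFS layout used by the NOA scripts may differ.
import os
import urllib.request

BASE_URL = "https://nomads.ncep.noaa.gov/path/to/gfs"   # placeholder
FORECAST_HOURS = range(0, 73, 3)                        # e.g. 0..72 h, 3-hourly

def fetch_initial_conditions(run_date, cycle, dest_dir):
    """Download one GFS file per forecast hour for the given date/cycle."""
    os.makedirs(dest_dir, exist_ok=True)
    for fhour in FORECAST_HOURS:
        fname = "gfs.t%02dz.pgrbf%02d" % (cycle, fhour)   # hypothetical pattern
        url = "%s/gfs.%s/%s" % (BASE_URL, run_date, fname)
        target = os.path.join(dest_dir, fname)
        print("downloading", url)
        urllib.request.urlretrieve(url, target)

if __name__ == "__main__":
    fetch_initial_conditions("20090130", 0, "./initial_data")
```

Making the date, cycle and hour range parameters is what allows the same retrieval code to serve different models and different ensemble members.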

  4. Requirements • Adopt existing NOA procedures for model execution • Hide the Grid as much as possible • Give the feeling of local execution • Simplify existing procedures and improve execution times • Utilize high-level tools that facilitate better quality code and hide low-level interactions with the Grid • Satisfy specific model requirements • Usage of a commercial compiler that is not available on the Grid • Time restrictions for completing application execution

  5. Design Approach • Keep the existing “command-line” look and feel • Re-use and improve the existing code base (shell scripts) • Utilize the Python language to replace various parts of the existing workflow • Exploit the GANGA framework for job management and monitoring

  6. Utilized Grid Services • gLite • WMS – Job management • LFC – Data management • MPICH 1.2.7 on gLite sites • Ganga • Developed at CERN; endorsed by the EGEE RESPECT program • Provides a Python programming library and interpreter for object-oriented job management • Facilitates high-level programming abstractions for job management (see the sketch below) • More information: http://ganga.web.cern.ch/ganga/
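
To illustrate the kind of abstraction Ganga provides, the following minimal sketch submits a single executable to the gLite WMS through Ganga's LCG backend. It is meant to be typed into the Ganga interpreter (Ganga 5-era GPI, where Job, Executable, File and LCG are pre-exported); the wrapper script and sandbox file names are placeholders, not the actual NOA workflow scripts.

```python
# Minimal Ganga sketch (run inside the `ganga` interpreter, where the GPI
# classes Job, Executable, File and LCG are already available).
# Wrapper script and sandbox file names are placeholders.
j = Job(name='bolam-member-01')
j.application = Executable(exe=File('run_model.sh'),   # placeholder wrapper script
                           args=['member01.conf'])
j.inputsandbox = ['member01.conf']                     # shipped with the job
j.outputsandbox = ['model.log']                        # retrieved on completion
j.backend = LCG()                                      # submit via the gLite WMS
j.submit()

# Ganga's monitoring loop updates the status (submitted/running/completed)
print("job %s is %s" % (j.id, j.status))
```

The same job object can later be inspected from the interpreter, which is what makes Ganga useful for post-mortem analysis of failed runs.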

  7. Implementation Details • MM5 and BOLAM codes compiled locally on the UI with PGI Fortran • 3 different binaries produced for MM5, for 2, 6 and 12 CPUs respectively • MPICH also compiled with PGI; the MPICH libraries are used to build the MM5 binaries • Binaries were packed and stored on the LFC • Downloaded to the WNs before execution (see the sketch below) • Packages include terrain data • Models run daily as cron jobs; notifications are sent to users by email • Log files and statistics are kept for post-mortem analysis • Ganga also useful for identifying problems after execution
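
The step that stages the packed binaries from the LFC onto a worker node could look roughly like the sketch below, which wraps the gLite lcg-cp command. The LFN path, VO name and archive name are placeholders for illustration.

```python
# Sketch of staging a packed model binary from the LFC to the local disk
# of the WN and unpacking it. LFN, VO and archive names are placeholders.
import os
import subprocess
import tarfile

VO = "see"                                          # placeholder VO name
LFN = "lfn:/grid/see/noa/mm5/mm5-bin-12cpu.tar.gz"  # placeholder LFN

def stage_binaries(lfn=LFN, local="model-bin.tar.gz"):
    """Copy a packed binary archive from the LFC/SE to the WN and unpack it."""
    dest = "file:" + os.path.abspath(local)
    subprocess.check_call(["lcg-cp", "--vo", VO, lfn, dest])
    with tarfile.open(local) as tar:
        tar.extractall(path=".")   # binaries plus bundled terrain data

if __name__ == "__main__":
    stage_binaries()
```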

  8. Implemented Architecture [Architecture diagram; components listed below] • On the UI: Workflow Orchestrator (Python), LJM (Python), model config file and Ganga, submitting N jobs through the WMS • On the Grid (CE/WNs): Lead-In/Out shell script wrapping the Decode, Pre-process, Model Run (via mpiexec across several WNs) and Post-Process shell scripts • Initial data retrieved from N.O.M.A.D.S / NCEP-GFS (USA) over HTTP • Binaries and results stored on the LFC/SE; results returned to the UI

  9. Ensemble Forecasting • Each member is executed as a separate job • 10 members in total, both for the MM5 and BOLAM models • Each member separately downloads its initial data from the NCEP servers • The whole ensemble execution is handled by a single compound job • Compound job definition, execution and management handled by Ganga constructs (job splitters, see the sketch below) • Final stage of forecast production and graphics preparation performed locally on the UI
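
The compound-job idea maps naturally onto Ganga's splitter mechanism. The sketch below, again for the Ganga interpreter, uses ArgSplitter to turn one master job into ten subjobs, one per ensemble member; the wrapper script and the member-index argument convention are placeholders rather than the actual NOA setup.

```python
# Sketch of a 10-member ensemble as a single compound Ganga job.
# ArgSplitter creates one subjob per argument list; each member receives
# its index so it can fetch its own initial data (placeholder convention).
j = Job(name='mm5-ensemble')
j.application = Executable(exe=File('run_member.sh'))   # placeholder wrapper
j.backend = LCG()
j.splitter = ArgSplitter(args=[[str(member)] for member in range(1, 11)])
j.submit()

# Each subjob can be inspected individually after submission
for sj in j.subjobs:
    print("member %s: %s" % (sj.id, sj.status))
```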

  10. Initial Performance Results • MM5: Typical execution time ~2 hrs (including scheduling overheads) • Different completion times per member depending on the number of processors used • The 12-process version takes ~40 min per member but waits longer to be scheduled at a grid site • BOLAM: Typical execution time for a 10-member ensemble forecast: 60-90 min (including scheduling overheads) • One member takes ~25 minutes to complete on a local cluster with an optimized binary, so the full ensemble would take ~4 hrs locally • Overall, non-uniform completion times due to (un)availability of Grid resources

  11. Adopting the framework for different models • Models that implement similar workflows should be easy to plug in • Ultimately the user should only provide: • The four workflow hooks (a sketch of the hook interface follows this slide) • Decode • Pre-process • Model run • Post-process • Model configuration file(s) • Definition of different initial data sources • Forecast region • Terrain data • Model binaries stored on the LFC
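
The four hooks could be exposed to users as named stages that the orchestrator runs in order, roughly as in the sketch below. The script names and the calling convention are assumptions for illustration; the actual orchestrator interface is not shown in the slides.

```python
# Rough sketch of the four-hook workflow: the user supplies one script per
# stage plus a model configuration file. Script names are placeholders.
import subprocess

HOOKS = ["decode", "pre-process", "model-run", "post-process"]

def run_workflow(scripts, config_file):
    """Run the four user-supplied hook scripts in order.

    scripts     -- dict mapping hook name to an executable script path
    config_file -- model configuration file passed to every hook
    """
    for hook in HOOKS:
        print("running stage:", hook)
        subprocess.check_call([scripts[hook], config_file])

if __name__ == "__main__":
    run_workflow(
        {
            "decode": "./decode.sh",          # placeholder hook scripts
            "pre-process": "./preprocess.sh",
            "model-run": "./run_model.sh",
            "post-process": "./postprocess.sh",
        },
        "bolam.conf",                         # placeholder config file
    )
```

Keeping the hooks as plain executables is what lets the existing NOA shell scripts be reused without rewriting them in Python.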

  12. Problems/Pending Issues • Problems with initial data • NCEP servers are sometimes down or cannot generate the requested files • Grid resource availability impedes timely execution • Not all members manage to complete on time • Some may still be in the scheduled state when time expires • Grid robustness and predictability • Jobs may be rescheduled to different sites while running, for no apparent reason • Central grid services might be unavailable (WMS, LFC) • MM5 is sensitive to the execution environment • Processes die while the model is in its parallel section • MPI is notoriously not well supported by grid sites (some sites “better” than others)

  13. Future Work • Application is still in the pilot phase • “Super-ensemble” runs planned for April • Multi-model, multi-analysis ensemble forecasting combining results from MM5, BOLAM, NCEP/ETA and NCEP/NMM • Presentation to be delivered at UF4 in Catania • Anticipating more resources and better support from existing ones

  14. Questions
