Infrastructure Group Status, Plans, Next Emphasis, Blockers

Infrastructure Group Status, Plans, Next Emphasis, Blockers. Rob Jacob, Chengzhu Zhang. E3SM 2019 Spring Meeting, March 19, 2019. Infrastructure Group: 3 major areas of concern.

Presentation Transcript


  1. Infrastructure Group Status, Plans, Next Emphasis, Blockers. Rob Jacob, Chengzhu Zhang. E3SM 2019 Spring Meeting, March 19, 2019

  2. Infrastructure Group: 3 major areas of concern
     • Develop, maintain and support software that is needed for E3SM but is not part of the main prognostic models. This includes analysis and diagnostic software, the CIME suite of software (coupler and Case Control System) and other scripts and programs used to manage data and simulations.
     • Maintain the data sets themselves: making sure they can be categorized, accessed and transferred. This includes model input and output data and observational datasets used in analysis.
     • Define and document the process and procedures used in software development within the E3SM Project.
     Everything we do should help make the model development, simulation and analysis happen.

  3. Status: Testing
     • System testing continues (with baselines) on a core set of machines.
     • Integration and testing process running (mostly) smoothly.
     • Slowly adding tests for new capabilities.
     • Continue to strike a balance among the expense of testing, the need for overnight results, and the availability of machine time.

  4. Status: Model Development
     • Roughly 10 PRs a week getting merged to master.
     • Accepting changes for 2.0 (and further versions if appropriate)
     • Maint-1.0 branch – stable. Fixed a bug for reading DECK v1 restart sets
       • Created April 20, 2018
     • Maint-1.1 branch – in use for v1 BGC simulations
       • Created Aug 2, 2018
     • Maint-1.2 branch – just started. For v1 cryosphere simulations
       • Created Feb 29, 2018

  5. Status: Communication tools
     • Confluence: Recently re-organized the E3SM Documentation space.
       • Should now be your one-stop for documentation of any process, simulations, how-to, project plans in E3SM
       • Old ACME Documentation space will be for truly out-of-date, historical-interest-only documentation.
       • All Phase 1 spaces will have still-useful content moved to E3SM Documentation and then made read-only (so far: SE/CPL, Land)
       • Next: uniform design for home spaces for Phase 2 groups.
     • JIRA: process is stable. Always looking to improve.
     • Slack: 137 accounts, 2000 messages/week
     • e3sm.org tutorials made by IG members. Uploaded.

  6. IG Code releases (and very brief list of features)
     • NCO 4.7.8, 4.7.9: CMIP6 support including interpolation
     • MPAS-Analysis 1.2.1: surface BGC, transect plots, iceberg concentration
     • E3SM-diags 1.6.0: save more provenance, analysis of time-series files.
     • CIME Case Control System 5.7.5, 5.7.6: more robust xml file handling, better error messages, friendlier “xmlchange”.
     • Processflow 2.2.0: call MPAS-Analysis, debugging.

  7. What is “processflow”
     • A command line utility to automate the process of running post-processing jobs and generating diagnostics from E3SM model output.
     • The tool takes a single configuration file and runs a series of data transfer and processing jobs on any amount of model output, running the jobs on any number of set lengths.
     • Once the diagnostics complete, the tool transfers the plots to a hosting directory and emails links to the completed output to the user.
     • Considered feature complete. Looking for users and more use cases.
     • Discussion: Tuesday evening, 7pm
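The idea of running jobs over "any number of set lengths" can be illustrated with a small sketch. This is not processflow's actual code or API: the function names `split_into_sets` and `plan_jobs` are hypothetical, and the sketch assumes that a "set length" is a number of simulated years and that an incomplete trailing set is skipped.

```python
def split_into_sets(start_year, end_year, set_length):
    """Return inclusive (first, last) year ranges, each `set_length` years long.

    Assumption for this sketch: a trailing range shorter than
    `set_length` is dropped rather than processed.
    """
    if set_length <= 0:
        raise ValueError("set_length must be positive")
    ranges = []
    first = start_year
    while first + set_length - 1 <= end_year:
        ranges.append((first, first + set_length - 1))
        first += set_length
    return ranges


def plan_jobs(start_year, end_year, set_lengths):
    """Map each requested set length to the year ranges it would cover."""
    return {
        length: split_into_sets(start_year, end_year, length)
        for length in set_lengths
    }


if __name__ == "__main__":
    # Hypothetical example: 12 years of output, diagnostics on 5- and 10-year sets.
    for length, ranges in plan_jobs(1, 12, [5, 10]).items():
        print(length, ranges)
```

With 12 years of output and set lengths of 5 and 10, the sketch plans two 5-year sets and one 10-year set; years 11 and 12 wait until enough output exists to fill another set.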

  8. Data update
     • V1 DECK publication should be finished by end of this month
     • 3 large epics are tracking progress
       • Publish PI control for CMIP6. Define formulas, processes.
       • Publish all other DECK data for CMIP6. Using processes determined for PI control
       • V1 non-CMIP6 publication. On ESGF but not in CMIP6 format. More variables.
     • Found some data corruption on data retrieved from tape. Re-running.
     • Discussion: Wednesday evening, 7pm.

  9. Recent code walkthroughs
     • Used 5 of our regularly scheduled telecon times to give an overview of all our Python codes
       • Typical use, documentation, output, code structure, testing, dependencies
       • MPAS-Analysis, E3SM-diags, processflow, LIVVkit, CIME CCS
     • Will inform discussion on common standards for our Python codes and future development plans
     • Recorded presentations and notes are in the IG meeting notes for 2/14 to 3/14.

  10. Machine changes
      • NERSC “Edison” is going away May 13 (pushed back from March 31)
      • NERSC “Cori” will become the “externally supported” machine we point people to at e3sm.org
      • OLCF Summit only used for ECP “E3SM-MMF” project. Titan goes away later this year (?). Rhea updated in 2020
      • New E3SM-only machine: CompyMcNodeFace or just “Compy”.
        • 460 dual-socket Intel Skylake nodes; 40 cores, 192 GB per node
        • Intel OmniPath interconnect, 1 PB Lustre filesystem
        • 50% E3SM, 35% RGMA, 15% ESM other.
        • Staff from Infra and Perf will port model and set optimum pe-layouts
        • Machine POC: Bibi Mathew
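For a sense of scale, the Compy figures quoted above can be multiplied out. The sketch below is only arithmetic on the numbers from the slide; expressing the 50/35/15 split as "node-equivalents" is an assumption made for illustration, since the slide does not say how the shares are enforced.

```python
# Figures taken from the slide.
NODES = 460
CORES_PER_NODE = 40
MEM_GB_PER_NODE = 192
SHARE_PERCENT = {"E3SM": 50, "RGMA": 35, "ESM other": 15}

total_cores = NODES * CORES_PER_NODE      # cores across the whole machine
total_mem_gb = NODES * MEM_GB_PER_NODE    # aggregate memory in GB

# Assumption: treat each percentage share as a fraction of the node count.
node_equivalents = {
    group: NODES * percent // 100 for group, percent in SHARE_PERCENT.items()
}

if __name__ == "__main__":
    print(f"total cores: {total_cores}")
    print(f"total memory: {total_mem_gb} GB")
    for group, nodes in node_equivalents.items():
        print(f"{group}: {nodes} node-equivalents")
```

That works out to 18,400 cores and 88,320 GB of aggregate memory, with the E3SM share equivalent to 230 nodes.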

  11. Plans
      • Python3: need to support it in all our Python tools
      • Container development: all of E3SM-unified, E3SM model development environment.
      • Machine changes:
        • Replace Edison with Cori as “public” supported machine. Edison goes away May 13
        • Port model to CompyMcNodeFace. Set up regular testing
      • Proven: Focus on populating Proven database with correct information from a case. Python tools to add completed and in-progress simulations.
      • Data: consolidate observation data used in analysis.
      • Continue to support DevOps

  12. Plans: V2 development timeline
      • 30 Jun 2019: feature freeze for features to be used in v2 simulations.
        • PRs must be posted and ready to be integrated.
        • E3SM v2-alpha made after PRs are integrated.
        • After PRs integrated, start component-level tuning (F-cases, I-cases, G-cases)
      • 30 Sep 2019: Finish component-level tuning. Start coupled tuning.
        • Coupled cases should pass a smoke test.
        • E3SM v2-Beta made.
        • All initial/BC/mapping files should be finished and in the inputdata server, with use cases updated to use them.
      • 15 Jan 2020: Coupled tuning finished. Start v2 simulation runs
        • May have additional beta tags after this.
        • Development during this time can't change answers for in-progress v2 simulations.
        • But can change answers in other cases.
        • Can accept limited v3, v4 changes subject to above constraint
      • 24 Mar 2021: v2 data and model release. All simulation campaigns.
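The length of each phase in the timeline above follows directly from the milestone dates. A minimal sketch using only the standard library; the phase labels are shorthand for the slide's milestones, and `phase_lengths` is a name introduced here for illustration.

```python
from datetime import date

# Milestone dates as listed on the slide.
MILESTONES = [
    ("feature freeze", date(2019, 6, 30)),
    ("component tuning done, coupled tuning starts", date(2019, 9, 30)),
    ("coupled tuning done, v2 simulations start", date(2020, 1, 15)),
    ("v2 data and model release", date(2021, 3, 24)),
]


def phase_lengths(milestones):
    """Return (start_label, end_label, days) for each consecutive milestone pair."""
    return [
        (start[0], end[0], (end[1] - start[1]).days)
        for start, end in zip(milestones, milestones[1:])
    ]


if __name__ == "__main__":
    for start_label, end_label, days in phase_lengths(MILESTONES):
        print(f"{start_label} -> {end_label}: {days} days")
```

The plan allows 92 days for component-level tuning, 107 days for coupled tuning, and 434 days between the start of v2 simulations and the release.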

  13. IG schedule this week

  14. Final thoughts
      • IG should be working on tools you want/need to use
      • If there are any problems with IG software, always FILE AN ISSUE.
        • Only way to let others experiencing the problem know it's been reported.
        • Group leads can prioritize the work and track progress.
      • See the “IG Software” summary page, which lists Infrastructure software and, after one more click, the link to the GitHub issues page.
      • No real “blockers”, but over-subscribed staff means development is always slower than planned.
