This paper discusses the motivation behind self-describing Earth system models and introduces the Common Information Model (CIM) and Earth System Documentation (ES-DOC) collaboration, which develops tools for automatic model self-documentation. The paper also presents the Earth System Modeling Framework (ESMF) as a useful tool for extracting metadata directly from model code.
Metadata Creation with the Earth System Modeling Framework
Ryan O’Kuinghttons – NESII/CIRES/NOAA
Kathy Saint – NESII/CSG
September 16, 2014 July 22, 2014
Motivation
Why do we want self-describing Earth system models?
• Distinguish the different models in model intercomparison projects
• Reproducible results allow for greater experimentation with model parameters
• Interoperability of components requires understanding of inputs and outputs
Vision: Coupled systems of Earth system modeling components that automatically self-document during simulation runs
• The Common Information Model (CIM) is a schema for describing climate models, experiments, simulations, …
• The Earth System Documentation (ES-DOC) collaboration is developing CIM-based tools to address the issue of automatic model self-documentation
• One aspect is metadata creation
• The Earth System Modeling Framework (ESMF) can be useful in extracting metadata directly from model code
CIM documents: Experiment, Data, Grid, Model, Platform, Simulation, Ensemble, Quality
Previous Study on Self-Describing Models
• Turuncoglu, U., Dalfes N., Murphy S., DeLuca C., Toward self-describing and workflow integrated Earth system models: A coupled atmosphere-ocean modeling system application, Environmental Modelling & Software, 39, (2013), 247-262
• Three sources of metadata: workflow (Kepler), application (coupled system of two Earth system models (ROMS, WRF) run by CESM driver), and computing system
• System: gather system-level info with Perl wrapper script and output to XML
  • Info about operating system (type, kernel version and patches) and processor architecture
• Application: metadata gathered in two steps, Python scripts and ESMF calls
  • Model-specific build information gathered by Python scripts and output to XML
    • Details of the compiler and flags, defined environment variables and their values, and component-specific data such as executable path and name, and time stamp
  • Modeling system uses ESMF to gather component and field level metadata and output to XML
    • Component info: programming language, physical discipline, institution, author
    • Field info: standard name, long name and units of fields transferred among model components
• Workflow: contains provenance-specific components to record information on the evolution of the workflow
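The system-level step above (collect operating-system and processor facts, write them out as XML) can be sketched in a few lines. The study used a Perl wrapper script; the Python version below is only an illustrative stand-in, and the function names and XML element names are assumptions rather than the schema used in the study.

```python
import platform
import xml.etree.ElementTree as ET


def collect_system_metadata():
    """Return a dict of basic operating-system and processor facts."""
    return {
        "os_type": platform.system(),
        "kernel_version": platform.release(),
        "processor_architecture": platform.machine(),
    }


def to_xml(metadata, root_tag="system"):
    """Serialize a flat dict to an XML string, one child element per key.

    Element names are hypothetical; a real tool would follow an agreed schema.
    """
    root = ET.Element(root_tag)
    for key in sorted(metadata):
        child = ET.SubElement(root, key)
        child.text = str(metadata[key])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(collect_system_metadata()))
```

Keeping collection and serialization in separate functions means the same dictionary could later be merged with build-level or component-level metadata before anything is written.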
Metadata Creation
Different types of climate model metadata may be gathered in different ways:
• Questionnaires – forms used to gather metadata from people who define and run model experiments
• Scripting tools – libraries used to harvest and archive metadata from the build environment and model configuration files
• Workflows – problem solving environments used to simplify the task of running model simulations, which contain tools to gather metadata about the sequence of steps necessary to complete the experiment
• Frameworks – software used to build and/or wrap model code, which contains built-in tools to gather metadata directly from the code
• ES-DOC provides both questionnaires and the pyesdoc scripting library
• ESMF can offer access to information contained in the model code
ESMF metadata strategy
• ESMF organizes Earth system modeling components into a hierarchical structure
• A similar hierarchical structure exists within a model component
• Metadata is represented by expandable Attribute objects attached to the objects at each level of the ESMF hierarchy
• The Attribute hierarchy mirrors the ESMF object hierarchy
• Attributes can pull information out of the objects, and provide information about the way that these objects are connected together
• Components can be ESMF Components or virtual metadata-only Components
[Figure: ESMF object hierarchy (Driver, Coupler, Components, State, Grid, FieldBundle, Fields), with an Attribute (A) attached to each object]
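The idea that the Attribute hierarchy mirrors the object hierarchy can be illustrated with a small tree whose nodes each carry their own metadata. This is a conceptual sketch in Python, not the ESMF API: the `Node` class, its methods, and the attribute names and values are all hypothetical.

```python
import xml.etree.ElementTree as ET


class Node:
    """Hypothetical stand-in for a framework object (Driver, Component, State, Field)."""

    def __init__(self, kind, name):
        self.kind = kind
        self.name = name
        self.attributes = {}  # metadata attached to this object
        self.children = []    # objects nested below this one

    def add(self, child):
        """Attach a child object and return it so calls can be chained."""
        self.children.append(child)
        return child

    def to_element(self):
        """Build an XML element whose nesting mirrors the object nesting."""
        elem = ET.Element(self.kind, name=self.name)
        for key, value in self.attributes.items():
            ET.SubElement(elem, "attribute", name=key).text = str(value)
        for child in self.children:
            elem.append(child.to_element())
        return elem


# Illustrative hierarchy; names and values are examples only.
driver = Node("Driver", "CESM")
atm = driver.add(Node("Component", "CAM"))
atm.attributes["discipline"] = "atmosphere"
state = atm.add(Node("State", "export"))
field = state.add(Node("Field", "air_temperature"))
field.attributes["units"] = "K"

xml_text = ET.tostring(driver.to_element(), encoding="unicode")
```

Because each node serializes itself and then its children, a single write at the top of the tree captures both the metadata and the way the objects are connected.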
ESMF Attributes
• Standard ‘packages’ of Attributes are available:
  • Automatically set up metadata for a variety of different standards and formats (e.g. Scientific Properties, Platforms, Responsible Parties…)
  • Make internal Grid information (coordinates, etc.) available as metadata
  • GRIDSPEC (version 02/09/2012) grids created from file can be represented as CIM GridSpec documents
  • Attributes can write CIM documents in XML format, which can be validated against standard CIM schemas from 1.5 – 1.7.1
• There are notable benefits to the framework approach to metadata:
  • Flexibility – encoded metadata moves with the code
  • Maintainability – encoded metadata doubles as code documentation
  • Robustness – a single source of input reduces user errors
  • Limits effort – information drawn from the code doesn’t need to be reentered by the user (e.g. relationships among objects)
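A standard package can be thought of as a predefined set of attribute names selected by a convention and a purpose, so that a misspelled or unknown name is rejected instead of silently recorded. The Python sketch below illustrates that idea only; the `AttributePackage` class is hypothetical, and the attribute names listed are examples, not the actual contents of any CIM package.

```python
# Hypothetical package table: (convention, purpose) -> attribute names it defines.
PACKAGES = {
    ("CIM 1.7.1", "ModelComp"): ("ShortName", "LongName", "Description"),
}


class AttributePackage:
    """A fixed set of named attributes chosen by convention and purpose."""

    def __init__(self, convention, purpose):
        self.convention = convention
        self.purpose = purpose
        self._names = PACKAGES[(convention, purpose)]
        self._values = {}

    def set(self, name, value):
        """Store a value, refusing names the package does not define."""
        if name not in self._names:
            raise KeyError(f"{name!r} is not part of {self.convention}/{self.purpose}")
        self._values[name] = value

    def get(self, name):
        """Return a previously stored value."""
        return self._values[name]

    def missing(self):
        """Names defined by the package that have not been set yet."""
        return [n for n in self._names if n not in self._values]


pkg = AttributePackage("CIM 1.7.1", "ModelComp")
pkg.set("ShortName", "CAM")
```

Here `pkg.missing()` reports which package attributes still need values, which is one way a single, checked source of input can reduce user errors.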
Use Case
• The Community Earth System Model (CESM) (http://www2.cesm.ucar.edu/) has a number of ESMF-enabled model components, such as:
  • The Community Atmosphere Model (CAM)
  • The Community Land Model (CLM)
  • The Parallel Ocean Program (POP)
• The CAM atmosphere component has encoded scientific properties using the ESMF Attribute approach
• Metadata from CAM and other components is written to an XML file using a driver-level call to ESMF_AttributeWrite()

In the model components:
    type(ESMF_Attribute) :: attpack
    attpack = ESMF_AttributeAdd(CAM, convention="CIM 1.7.1", purpose="ModelComp")
    call ESMF_AttributeSet(CAM, name="shortname", value="CAM")
    ...

In the driver:
    call ESMF_AttributeWrite(CESM_Component, convention="CIM 1.7.1", purpose="ModelComp")
Use case (cont.)
[Figure: CESM Driver and CESM Coupler above the Ocean Component (POP), Land Component (CLM), and Atmosphere Component (CAM); each component carries Component Properties, and a Grid is also shown]
This metadata comes directly from the CESM Component code!
A Complete Picture
• Run a model simulation
  • ESMF-enabled model components write out code-based CIM documents
    • Models (including scientific properties) and Grids
  • pyesdoc embedded calls write CIM documents
    • Simulation and Platform
• Fill out questionnaire to record the human perspective
  • Experiment, Ensemble, and Quality
• Archive documents in ES-DOC database, available for follow-on tools
[Figure: three metadata sources feed the archive: ESMF (info from code), pyesdoc (info from configuration), Questionnaire (info from humans). Archived CIM documents: Model, Experiment, Simulation, Grid, Ensemble, Platform, Data, Quality]
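The division of labor on this slide can be written down as a small lookup table, which makes it easy to check which documents are still outstanding for a run. This is a sketch in Python under stated assumptions: the function names are hypothetical, no ES-DOC or pyesdoc API is used, and the Data document is left out because the slide does not assign it to a source.

```python
# Mapping taken from the slide: which source writes which CIM document type.
DOCUMENT_SOURCES = {
    "ESMF": ("Model", "Grid"),
    "pyesdoc": ("Simulation", "Platform"),
    "Questionnaire": ("Experiment", "Ensemble", "Quality"),
}


def source_for(document_type):
    """Return the name of the source that produces a document type, or None."""
    for source, types in DOCUMENT_SOURCES.items():
        if document_type in types:
            return source
    return None


def missing_documents(archived_types):
    """List (source, document type) pairs not yet present in the archive."""
    archived = set(archived_types)
    return [
        (source, doc_type)
        for source, types in DOCUMENT_SOURCES.items()
        for doc_type in types
        if doc_type not in archived
    ]


# Example: documents written automatically during the run are archived,
# while the questionnaire has not been filled out yet.
outstanding = missing_documents(["Model", "Grid", "Simulation", "Platform"])
```

In this example `outstanding` contains only the three questionnaire-based documents, reflecting that the human-supplied part is the one step not produced by the simulation run itself.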
Future Directions
VISION: Coupled systems of Earth system modeling components that automatically self-document during simulation runs
Where do we go from here?
• Create a metric of CIM properties that ESMF metadata capabilities can naturally generate (in progress)
• Set up regression testing of the ESMF metadata capability to ensure compatibility with future versions of the CIM (in progress)
• Create ESMF Atom feed for ES-DOC automatic data archival (in progress)
• Expand the ESMF metadata creation capability
  • Expand ESMF capability to represent a CIM Grid Document created from UGRID files
  • Continue to expand ESMF capability to harvest encoded information from internal objects
  • Add the capability to create Attribute packages automatically from ModelComponent Document Controlled Vocabularies
• Continue working with CESM and other codes to expand their self-documenting capabilities through the use of encoded metadata with ESMF