
Consistent Physical Objects: A Data Structure Concept for Integrated Modelling

This presentation explores the concept of Consistent Physical Objects (CPOs) as a standardized data structure for integrated modelling in plasma physics and tokamak subsystems. It discusses the motivations for CPOs, their content and structure, and their role in workflows. The presentation also highlights the benefits of using CPOs, such as standardized I/O, modular description of physics subsystems, and easy integration of new components.


Presentation Transcript


  1. 09/06/2011. Consistent Physical Objects: A data structure concept for Integrated Modelling. Presented by: D. Coster. TF Leader: G. Falchetto. Deputies: R. Coelho, D. Coster. EFDA CSU Contact Person: D. Kalupin

  2. Outline • Motivations for Consistent Physical Objects (CPOs) • Standardized I/O for experimental and simulation data • Modular description of plasma physics and tokamak subsystems • Define the granularity of data transfer for designing modular physics workflows: workflows display the physics processes and their interactions • CPOs in Workflows • The content and structure of CPOs • CPO Technology and Access Library (UAL) • Experience / Lessons learned

  3. The main properties and motivation for CPOs (1) • Standardized I/O of physics components • Components solving the same type of physics problem are interchangeable with no change to the workflow structure • Straightforward integration of new components → the modelling suite is extendable • Benchmarking physics codes is much easier with standardized I/O definitions • Tokamak generic → process multiple devices with a single framework • Model Validation and Operation Support • Data model encompasses both abstract physics concepts and tokamak subsystems (diagnostics, actuators) • Data model supports synthetic diagnostics and actuators → straightforward usage of experimental data and comparison to experiment (Plasma Reconstruction, Model Validation, Model Improvement) • Realistic Integrated Tokamak Modelling (coupling to PCS)

  4. The main properties and motivation for CPOs (2) • Data model adapted to describe the complexity of physics interactions • Grouping all information relevant to a physics concept or tokamak subsystem into organized and modular data structures (equilibrium CPO ~ 200 signals) → provenance, consistency, user-friendliness • Data model is modular: multiple independent objects • Data model helps in tracking data provenance in complex workflows • The structure of the data model is adapted to the • Decomposition of plasma modelling into elementary physics problems • Logic of the tokamak subsystems (many modular systems) → Workflows using CPOs transparently represent the logic of physics interaction between elementary physics problems / tokamak subsystems

  5. The main properties and motivation for CPOs (3) • Maintenance • Language-agnostic definition of the data model • Automated generation of tools in various programming languages and documentation from the data model • Data model is easily extendable

  6. CPOs in workflows • Data model helps in tracking data provenance in complex workflows • Workflows using CPOs transparently represent the logic of physics interaction between elementary physics problems / tokamak subsystems

  7. CPOs and Workflows • Physics components are turned into physics actors • Actors use exclusively CPOs for I/O • An MHD solver uses the equilibrium CPO as input and provides the MHD CPO as output • Only exceptions: time and similar workflow parameters (loop indices, convergence tests); individual signals can be transferred for specific non-IM applications, e.g. visualisation or feedback control algorithms • The workflow is a suite of physics components which exchange CPOs • Links between actors represent both workflow and dataflow • The physics of the workflow and the dataflow are naturally displayed • NB: the ITM-TF has chosen KEPLER to implement its workflow concepts, but the CPO concept is independent of this choice

  8. ITM-TF workflows (KEPLER + ITM-TF technology on top) • Equilibrium identification constrained by measurements (EFIT-like actor) • MHD analysis workflow (refined equilibrium + linear MHD)

  9. Modular architecture • Physics code: solves an elementary physics problem, without assuming anything about the origin / destination of its input / output • Workflow Engine (WE, Kepler): executes the workflow and manages the dataflows (passes CPO references); the workflow engine knows nothing about the CPO content, CPOs are known only as references (name, occurrence, time) • Universal Access Layer (data access): manages the CPO data exchanges, following the references passed by the WE • The Workflow Engine, the Data Access library, and the physics codes are separated → any of these components can be changed without affecting the others • The physicist creates the desired workflow by assembling actors in the Workflow Engine; this simultaneously designs the physics dataflows as CPOs

  10. Inside a physics actor • Workspace: a temporary database entry (runwork), an instance of the whole data structure; contains the state of all CPOs at all time slices (file or memory) • Framework (workflow engine): calls the wrapper, specifying the present time of the simulation • Wrapper: calls the UAL to GET the CPOin and CPOout at the requested time slices, updates the data management nodes, and calls the UAL to PUT the CPOout • Physics code: receives CPOin and CPOout, performs the physics calculations, updates CPOout • Nested layers: the physics module remains "pure", with almost zero ITM technology inside

  11. Minimal changes to codes are required. Different approaches when converting codes to use CPOs: • (discouraged) use the data from the input CPOs to write out the input files required by the user code, call the standalone code, then copy the information from the output files of the code into the output CPOs • (most cases) copy the needed information into internal data structures, call the main subroutine of the code, then copy the output from the internal data structures to the output CPOs • (for new codes, or if the code is completely refactored) use the CPOs as the main data structures in the code
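The "most cases" approach above can be sketched in Python. All names here (EquilibriumCPO, MhdCPO, mhd_actor, legacy_solver) are hypothetical illustrations, not the actual generated CPO structures:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, heavily simplified stand-ins for real CPO structures
# (the actual CPOs are generated from the ITM XML schemas).
@dataclass
class EquilibriumCPO:
    psi: List[float] = field(default_factory=list)  # poloidal flux profile

@dataclass
class MhdCPO:
    growth_rate: float = 0.0

def legacy_solver(psi_internal):
    # Stands in for the unchanged main subroutine of an existing code.
    return max(psi_internal) - min(psi_internal)

def mhd_actor(eq_in: EquilibriumCPO) -> MhdCPO:
    # 1. Copy the needed information into internal data structures.
    psi_internal = list(eq_in.psi)
    # 2. Call the main subroutine of the code, unchanged.
    result = legacy_solver(psi_internal)
    # 3. Copy the output from the internal structures into the output CPO.
    return MhdCPO(growth_rate=result)
```

The legacy code itself never sees a CPO; only the thin copy-in / copy-out shell does, which is why this conversion route requires minimal changes.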

  12. The structure and content of CPOs

  13. CPO: a generic object for maximal flexibility • CPOs are generic containers which can be used in a number of contexts → full workflow flexibility • CPOs are modular data units providing a complete description of • Abstract physical concepts (equilibrium, wave propagation, turbulence, …) • Tokamak subsystems (diagnostics, heating systems, PF coils, …) • A CPO describing a tokamak subsystem has a unique structure to describe either experimental or simulated data • Synthetic diagnostics • Straightforward comparison to experimental data • Flexibility: a unique workflow for using either experimental or synthetic input

  14. ITM-TF datastructure 4.09a (CPOs) already covers large areas of tokamak physics and technology • Abstract physical concepts • Core physics: core profiles, source terms, transport coefficients, impurities • In addition: equilibrium, linear MHD, neoclassical, sawteeth, wave propagation, fast particle distributions, turbulence, Scrape-Off Layer • Tokamak subsystems (in a tokamak-generic way) • Actuators: PF systems, TF coils, RF antennas (ICRH, LH, ECRH), NBI • Diagnostics: Magnetics, Interferometry, Polarimetry, ECE, Thomson Scattering, Charge Exchange Spectroscopy, Langmuir probes, Neutron diagnostics • Miscellanea • Standardized Dataset for atomic physics • Generic finite element mesh description • Waveforms for Plasma Control

  15. Additional uses of CPOs: EDGE • An electron temperature solution (in eV) in the Scrape-Off Layer computed by SOLPS and visualized in the ASDEX Upgrade device geometry • Generalized treatment developed for complex geometries • SOLPS results • Wall geometry • The raw geometry data was provided by MPI-IPP Garching (T. Lunt), conversion to the general grid description was done at Aalto University Helsinki (T. Kuoskela), and the combined plot was created with VisIt at MPI-IPP Garching (H.-J. Klingshirn), using the VisIt-UAL connector (part of the grid service library).

  16. Additional uses of CPOs: AMNS (diagram: AMNS provider, driver program, AMNS CPOs, AMNS user library)

  17. CPO: a generic object for maximal flexibility • CPOs contain data and metadata (description of the data, its producer, …) → a consistent and self-documented data block • Generic definitions of signals → tokamak independent (same definitions, same workflow whatever the device) • Coded in a language-agnostic way: XML • Structure: often highly hierarchical for clarity (large number of individual signals)

  18. CPO structure • CPOs have a structure with many signals below • Substructures are frequent, for clarity • Complete description of a physics concept or a tokamak subsystem → consistency • Each CPO has its own time array (if time-dependent) • Each CPO has a bookkeeping sub-structure (datainfo) • Code-specific parameters are recorded in codeparam

  19. CPOs are asynchronous, as the tokamak subsystems in a real experiment • CPOs are asynchronous • This mimics the situation of a real experiment: each tokamak subsystem / diagnostic has its own acquisition strategy • The UAL contains interpolation methods and allows extracting the information contained in CPOs for a given time slice → synchronization of the input to a physics module • The workflow commands which time slice to extract
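The time-slice extraction described above amounts to interpolating each signal onto the requested time. A minimal pure-Python sketch of the idea (the real UAL implements this internally; get_slice is a made-up name, not a UAL routine):

```python
from bisect import bisect_left

def get_slice(times, values, t):
    # Linear interpolation of one signal onto the requested time t.
    # Sketch only: the UAL's actual interpolation methods are richer
    # (multiple modes, resampling) and operate on whole CPOs.
    i = bisect_left(times, t)
    if i == 0:
        return values[0]          # clamp before the first sample
    if i == len(times):
        return values[-1]         # clamp after the last sample
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Two signals with independent time bases, as in a real experiment,
# both synchronized by sampling at the same workflow time:
eq_time, eq_psi = [0.0, 1.0, 2.0], [100.0, 200.0, 300.0]
psi_now = get_slice(eq_time, eq_psi, 0.5)
```

Because each CPO carries its own time array, this per-signal interpolation is what lets the workflow request a single consistent time slice from asynchronous inputs.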

  20. Individual signal description • Name • Definition • Units • Dimensionality • Time-dependent or not • All this information is gathered in the unique source of the datastructure (XML)
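Because all of this signal metadata lives in one XML source, tools can read it mechanically. A sketch using Python's standard library, with an illustrative schema fragment (the attribute names below are assumptions for illustration, not the actual ITM schema):

```python
import xml.etree.ElementTree as ET

# Illustrative signal definition in the spirit of the ITM XML source;
# attribute names here are assumptions, not the real schema.
signal_xml = ('<field name="te" documentation="Electron temperature [eV]" '
              'type="vecflt_type" dim="npsi" time_dependent="yes"/>')

elem = ET.fromstring(signal_xml)
name = elem.get("name")                                  # signal name
units = elem.get("documentation").split("[")[-1].rstrip("]")  # units from docstring
time_dependent = elem.get("time_dependent") == "yes"     # time-dependence flag
```

This is the mechanism behind the dynamically generated tools: documentation, storage templates, and access-layer code can all be derived from the same definitions.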

  21. Experimental Data in the ITM-TF data structure • Consistent Physical Objects (CPOs) gather the machine description and the time-dependent data related to a given tokamak subsystem • Consistency is guaranteed between a system description and its related time-dependent signals during GET / PUT operations • The system is open to changing subsystem descriptions (Design & Upgrade studies) • Figure annotations: bookkeeping / self-description, diagnostic description, time-dependent measurements (MSE angles γ), time array

  22. CPO Technology and Access Library (UAL)

  23. CPO technology • Unique source defining all CPO properties: XML schemas • Independent of the programming languages • All CPO-related tools are generated dynamically from this unique source • HTML documentation • Storage (MDS+ model tree or HDF5 template file) • Multi-language CPO communication library (in 5 different languages!) • Experimental import tools, machine description templates • Integrated Simulation Editor (visualisation and editing of CPO data) • The data structure can be expanded without effort: new physics can be added in the definition, and all tools and documentation are adapted dynamically • Mandatory as the data structure becomes quite comprehensive!

  24. Universal Access Layer • The Universal Access Layer is the library used to exchange data units (CPOs) between physics modules in multiple languages • It is generated dynamically from the XML data structure of the CPOs (XSLT) → automatically consistent with the data structure • C++, Fortran, Java, Matlab, and Python versions are running and documented: • GET, PUT, and management of time slices (interpolation, resampling) of all ITM CPOs • Back-end storage: MDS+ and HDF5 • Direct memory access • Remote data access (managed by MDS+)
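The GET/PUT usage pattern can be mimicked with a small in-memory mock. The class and method names below are illustrative stand-ins, not the actual generated UAL bindings:

```python
# Hypothetical mock of the GET/PUT pattern the UAL exposes; the real
# per-language bindings are generated from the XML schemas and also
# handle time slices, storage back-ends, and remote access.
class MockUAL:
    def __init__(self):
        self._store = {}

    def put(self, cpo_name, occurrence, data):
        # Store a CPO under its reference (name, occurrence).
        self._store[(cpo_name, occurrence)] = data

    def get(self, cpo_name, occurrence):
        # Retrieve the CPO for the given reference.
        return self._store[(cpo_name, occurrence)]

ual = MockUAL()
ual.put("equilibrium", 0, {"psi_axis": 0.12})  # an actor writes its output
eq = ual.get("equilibrium", 0)                  # the next actor reads it
```

Note that the workflow engine only ever handles the (name, occurrence) reference; the data itself moves through the access layer, which is what keeps the engine ignorant of CPO content.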

  25. UAL structure • Structured in modular layers: one layer can be changed without modifying the others • High level: • Knows about the CPO structure • Language specific • Dynamically generated • Low level: • Deals with single elements: GET/PUT of scalars, vectors, arrays, … • Can use multiple storage methods • Transport: managed by MDS+ • Storage: MDS+, memory, HDF5 • (Diagram: F90, Java, C++, Matlab, and Python bindings over a C low-level layer, using MDSip / TDI functions, with MDSplus (file or memory) and HDF5 back-ends.)

  26. Feedback from experience • CPOs are really helpful for • Solving the n² problem • Useful in forcing one to think about the inputs and outputs of a code • The design of modular and flexible workflows with a large number of components • Providing data consistency and tracking data provenance • Code-code benchmarking (as part of Verification) • Comparing codes to experiments (Validation) • Providing a wake-up call for FORTRAN-77 programmers • Main issues • Getting used to a new way of doing things • Wanting to change your data structures rapidly while wishing everyone else would stop changing theirs • Impact on code portability • Need to have the infrastructure widely available! • This is true for nearly all systems – a soluble problem!

  27. Feedback from experience • Lessons learned • The ITM-TF data structure is still in rapid expansion (new physics, subsystems) • Platform-wide version control has been organised for reproducibility of old simulations • Backward compatibility (as long as no suppression / change of names is done in a CPO) • The modular data structure helps with extensions and backward compatibility • For the moment, the ITM-TF is missing automated tools for forward translation (to a more recent version)

  28. Additional slides

  29. CPO occurrences • In some workflows, there is a need for multiple occurrences of the same CPO • Multiple source terms ("coresource CPO") • Multiple equilibria with different resolutions (EFIT-like → fixed boundary low resolution → fixed boundary high resolution) • Multiple core profiles (one profile set predicted, another one with fitted experimental profiles) • Do not confuse multiple time slices and multiple occurrences • Time slices: correspond to the same object • Occurrences: independent objects (each one can have its own time base) • All occurrences of a CPO have the same structure: the structure is generic and relevant for any usage in the workflow • The physicist decides the usage of the various occurrences by editing the workflow
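The time-slices-versus-occurrences distinction can be shown with a tiny sketch. The dict layout and values below are purely illustrative, not the real CPO representation:

```python
# Sketch: two occurrences of the same CPO type, each an independent object
# with its own time base (occurrence 0: predicted core profiles;
# occurrence 1: fitted experimental profiles). Layout is illustrative.
coreprof = {
    0: {"time": [0.0, 0.1, 0.2], "te": [1.00, 1.10, 1.20]},
    1: {"time": [0.05, 0.15],    "te": [1.05, 1.15]},
}

# Every occurrence shares the same generic structure (same fields)...
same_structure = set(coreprof[0]) == set(coreprof[1])
# ...but the occurrences are independent objects with independent time bases:
independent_time_bases = coreprof[0]["time"] != coreprof[1]["time"]
```

Multiple time slices, by contrast, would all live inside one of these occurrences, sharing its time array.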

  30. Data entries • A data entry is an instance of the ITM data structure • Contains an instance of all CPOs (and all occurrences) • A data entry can describe either simulation results or experimental data, or a mixture of the two (e.g. "CHAIN2-like" fitted profiles with calculated sources and an EFIT-like equilibrium) • Catalogues (relational DB) help finding: • The data entries and a summary of their content • The CPOs they contain

  31. User's view of experimental data • Machine description and data mapping are originally provided as XML files • Machine data and time-dependent data are made available to users as entries of the ITM database, i.e. CPOs that can be read (GET) using the UAL • (Diagram: the machine description and data mapping XML files feed Exp2ITM (the data import routine), which reads the tokamak database (MDS+) and writes, through the UAL, the machine description into shot 0 of the ITM DB and complete experimental CPOs into shot xxxxxx; KEPLER actors then access both through the UAL.)

  32. Machine description • Example: diagnostic geometry (MSE) • Highlight of the XML machine description file: • A template is provided, only the parts in red are filled by the data provider <msediag type="CPO" documentation="MSE Diagnostic; Time-dependent CPO"> <setup_mse type="structure" documentation="diagnostic setup information"> <rzgamma type="structure" documentation="RZ of intersection between beam and line of sight [m]; Vector (nchords)"> <r type="vecflt_type" documentation="Major radius [m]" path="setup_mse/rzgamma/r" dim="35">0.709000, 0.718000, 0.729000, 0.743000, 0.759000, 0.790000, 0.809000, 0.830000, 0.851000, 0.873000, 0.911000, 0.934000, 0.957000, 0.980000, 1.00300, 1.04300, 1.06600, 1.08900, 1.11200, 1.13400, 1.17300, 1.19500, 1.21700, 1.23800, 1.26000, 1.29600, 1.31700, 1.33800, 1.35800, 1.37900, 1.41300, 1.43300, 1.45200, 1.47100, 1.49100</r> <z type="vecflt_type" documentation="Altitude [m]" path="setup_mse/rzgamma/z" dim="35">0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000 </z> </rzgamma> </setup_mse> </msediag>

  33. Time-dependent data • The exp2ITM tool has been developed to import time-dependent data into the ITM format • the exp2ITM code is generic and machine-independent • it uses an XML description of the mapping of local data to the ITM format; this XML file is machine-dependent • Mapping example (a template is provided, only the parts in red are filled by the data provider) <magdiag> <ip> <value path="magdiag/ip/value"> <name>SIPMES</name> <download> <download>mds+</download> <fixed_value/> </download> <interpolation>3</interpolation> <dimension>1</dimension> <time_dim>1</time_dim> </value> </ip> </magdiag>
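A mapping fragment like the one above can be consumed with Python's standard XML library. This reader is only a sketch of how such a mapping might be parsed, not the actual exp2ITM implementation:

```python
import xml.etree.ElementTree as ET

# A reduced version of the mapping fragment from the slide.
mapping_xml = """
<magdiag>
  <ip>
    <value path="magdiag/ip/value">
      <name>SIPMES</name>
      <interpolation>3</interpolation>
      <dimension>1</dimension>
      <time_dim>1</time_dim>
    </value>
  </ip>
</magdiag>
"""

value = ET.fromstring(mapping_xml).find("ip/value")
local_name = value.findtext("name")          # local (tokamak DB) signal name
itm_path = value.get("path")                 # destination in the ITM structure
interp = int(value.findtext("interpolation"))  # interpolation mode
```

Keeping the mapping in a separate, machine-dependent XML file is what lets the importer itself stay generic across devices.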

  34. Code-specific parameters • The codeparam subtree contains information on the code-specific parameters • To be filled by the physics module for traceability • Codename, codeversion (obvious) • Parameters: list of code-specific parameters (use XML format and parser) • Output_diag: list of code-specific diagnostics/output, in the same XML format • Output_flag: 0 if the run was successful; values < 0 mean the result should not be used • For the moment, there is no management of the code parameters by Kepler: the physics module should read them directly from a file
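A minimal sketch of the codeparam bookkeeping an actor fills in. The field names follow the slide; the dict container, code name, and version are illustrative stand-ins, not the real subtree:

```python
# Simplified stand-in for the codeparam subtree a physics module fills in
# for traceability; field names follow the slide, values are hypothetical.
codeparam = {
    "codename": "my_solver",       # hypothetical code name
    "codeversion": "1.0.0",        # hypothetical version
    "parameters": "<parameters><tolerance>1e-6</tolerance></parameters>",
    "output_flag": 0,              # 0: success; < 0: result should not be used
}

def result_usable(cp):
    # Downstream actors can check the flag before trusting the output.
    return cp["output_flag"] >= 0
```

Recording the code name, version, and parameters alongside the output is what makes a CPO's provenance reconstructible later.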

  35. The advantages of structured data • All data passed in one block • Integrated Modelling is really about transferring a large number of signals between elementary physics problem solvers → extremely difficult to manage an IM dataflow with individual signals! • Avoids mistakes due to multiplexing • Example: the equilibrium CPO gathers over 200 signals! • Easier to maintain: additions to the data structure are fully transparent to physics modules • Object-oriented approach: data is self-described

  36. CPOs are asynchronous, as the tokamak subsystems in a real experiment • CPOs are asynchronous • This mimics the situation of a real experiment: each tokamak subsystem / diagnostic may have its own acquisition strategy • Inside a CPO, all signals have the same time base (avoids errors) • The UAL contains interpolation methods and allows extracting a single CPO time slice → synchronization of the input to a physics module • The workflow commands which time slice to extract • Gathering time-dependent and time-independent data in a CPO • The main motivation is consistency • Physics modules often operate on a single time slice (or a few of them) → all information needs to be in one place • Time-independent data are not duplicated in the UAL back-end → no loss of storage / performance
