Database access and data retrieval (a user's view)
R. Coelho
Associação EURATOM/IST, Instituto de Plasmas e Fusão Nuclear

Outline
1 – General overview of fusion databases
2 – Data storage/retrieval methods and data structures
3 – SDAS at ISTTOK
I - General overview of fusion databases
• Databases play a fundamental role in fusion plasma research:
  • Essential for storage of seminal/standard benchmarking discharges.
  • Assist the construction/deduction of elementary scaling laws and the design phase of fusion devices (what to expect on confinement, MHD, transport, …).
  • Assist the modeling effort by providing a validated set of input experimental data (cross sections, machine-dependent data, …) and experimental plasma data on which to validate the codes.
• Databases offer a clear display of community achievements.
Fusion databases: 3 notable examples
• International Multi-Tokamak Profile Database (ITPA)
• Atomic Data and Analysis Structure (ADAS)
• Experimental Nuclear Reaction Data (EXFOR)
International Multi-Tokamak Profile Database (ITPA)
• Objectives
  • Provide all the information required for transport codes to simulate discharges from a variety of tokamaks.
  • Provide data to be compared against the predicted outputs from the codes.
  • Provide data and modelling results to be used as part of the ITER physics basis.
• Coverage
  • Released publicly in 1998.
  • Built from 201 shots from 21 devices. More recent data has been added to a secondary database, which remains restricted to “working group” access.
International Multi-Tokamak Profile Database (ITPA)
• Storage/access
  • MDS+ server, with data stored as MDS+ trees.
  • A relational database with comments, 0D and 1D/2D metadata assists the database queries.
http://tokamak-profiledb.ukaea.org.uk/
C M Roach, M Walters, R V Budny, F Imbeaux, T W Fredian et al, Nucl. Fusion 48, 125001 (2008)
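The role of the relational metadata can be sketched as follows: scalar (0D) quantities per discharge sit in a table, and a query narrows down the candidate shots before any profile is read from the MDS+ trees. This is only an illustration under stated assumptions; the table name, columns, devices and values are all hypothetical and are not the schema of the ITPA database.

```python
import sqlite3

# Hypothetical 0D metadata table: one row per discharge (all values are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shots_0d (device TEXT, shot INTEGER, ip_ma REAL, bt_t REAL, comment TEXT)"
)
conn.executemany(
    "INSERT INTO shots_0d VALUES (?, ?, ?, ?, ?)",
    [
        ("DEV_A", 1001, 2.5, 3.0, "H-mode reference"),
        ("DEV_A", 1002, 1.0, 2.0, "ohmic"),
        ("DEV_B", 2001, 3.0, 3.4, "H-mode, high current"),
    ],
)

def select_shots(min_ip_ma):
    """Return (device, shot) pairs whose plasma current exceeds min_ip_ma."""
    rows = conn.execute(
        "SELECT device, shot FROM shots_0d WHERE ip_ma > ? ORDER BY device, shot",
        (min_ip_ma,),
    )
    return rows.fetchall()

candidates = select_shots(2.0)
print(candidates)
```

Only the shots returned by the query would then be opened in the data store, which keeps the expensive profile reads to a minimum.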
Atomic Data and Analysis Structure (ADAS)
• Objectives
  • Provide an interconnected set of computer codes and data collections for modelling the radiating properties of ions and atoms in plasmas.
  • Assist in the analysis and interpretation of spectral emission and support detailed plasma models (crucial in the plasma edge).
• Coverage
  • Plasmas ranging from the interstellar medium, through the solar atmosphere and laboratory thermonuclear fusion devices, to technological plasmas.
Atomic Data and Analysis Structure (ADAS)
• Access
  • A key range of routines for accessing the database and delivering data to user codes is included. FORTRAN, C, C++, IDL and MATLAB are supported.
http://open.adas.ac.uk/index.php
Assisting fusion since JET was born… (1983)
Experimental Nuclear Reaction Data (EXFOR)
• Objectives
  • Provide an extensive compilation of experimental nuclear reaction data.
• Coverage
  • Neutron-induced reactions have been compiled systematically since the discovery of the neutron.
  • Charged-particle and photon reactions have been covered less extensively.
  • Data from 17700 experiments, together with their bibliographic information and experimental information about the data. The status (e.g., the source of the data) and history (e.g., date of last update) of each data set are also included.
Experimental Nuclear Reaction Data (EXFOR)
• Repository
  • Stored at the International Network of Nuclear Reaction Data Centres (NRDC).
http://www-nds.iaea.org/exfor/exfor.htm
II - Data storage/retrieval methods
• MDS+
• HDF5
• Universal Access Layer method
• Paradigm for data retrieval methodologies
MDSplus (MDS+)
http://www.mdsplus.org
What is it?
• A set of software tools for data acquisition and storage, and a methodology for the management of complex scientific data.
What does it allow?
• All data from an experiment or simulation code can be stored in a single, self-descriptive, hierarchical structure.
How difficult is it?
• The programming interface contains only a few basic commands, simplifying access even to complicated data structures.
MDSplus (MDS+) SOME CONCEPTS
• The Data Hierarchy - Trees, Nodes, and Models. A self-descriptive hierarchy called a TREE, consisting of large numbers of named NODES which make up the branches (structure) and leaves (data) of each tree.
• MDSplus SHOTS are trees created from a special type of tree called a MODEL, a template which contains all of the structure and setup data for an experiment or code.
• Node Characteristics - Self Description: metadata including the data type, array dimensions, data length, units, independent axes, the parents and children of the node, tag names, the date when the data was stored, the name of the user who wrote the data, and so forth.
MDSplus (MDS+) TREE EXAMPLE
• The node on the far right, "Ip", is an example of a MEMBER, a type of node used to contain data.
• Child and member nodes are analogous to the directories and files of a typical operating system.
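The directory/file analogy can be sketched with a toy hierarchy. This is not the MDSplus API and does not reproduce MDSplus path syntax; the class, the dotted paths and the node names are hypothetical and serve only to show structure nodes holding children and member nodes holding data plus self-describing metadata.

```python
class Node:
    """Toy node: structure nodes hold children, member nodes hold data."""

    def __init__(self, name, data=None, units=None):
        self.name = name
        self.data = data      # None for pure structure nodes
        self.units = units    # part of the self-description
        self.children = {}

    def add(self, node):
        self.children[node.name] = node
        return node

    def find(self, path):
        """Resolve a dotted path such as 'MAGNETICS.IP' below this node."""
        node = self
        for part in path.split("."):
            node = node.children[part]
        return node

    def walk(self, prefix=""):
        """Yield the full path of every node below this one."""
        for child in self.children.values():
            full = f"{prefix}.{child.name}" if prefix else child.name
            yield full
            yield from child.walk(full)

# Hypothetical model tree: the names are illustrative only.
top = Node("TOP")
magnetics = top.add(Node("MAGNETICS"))
magnetics.add(Node("IP", data=[0.0, 1.2, 1.5], units="MA"))
magnetics.add(Node("BT", data=[2.0, 2.0, 2.0], units="T"))

ip = top.find("MAGNETICS.IP")
paths = list(top.walk())
print(paths, ip.units)
```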
MDSplus (MDS+) DETAILS ON THE API
• The basic calls, in the order they would appear in an application, are (in generic syntax):
    mdsconnect,'server_name'
    mdsopen,'tree_name',shot_number
    result = mdsvalue('expression')
    mdsput,'node_name','expression'
    mdsclose,'tree_name',shot_number
    mdsdisconnect
MDSplus (MDS+) ACCESSING JET DATA (a workaround, since JET data is not stored natively on an MDS+ server)
• MATLAB
    >> mdsconnect('mdsplus.jet.efda.org')
    >> [y,status]=mdsvalue('_sig=jet("ppf/magn/ipla",40573)')
    >> [x,status]=mdsvalue('dim_of(_sig)')
    >> mdsdisconnect
• IDL
    IDL> mdsconnect,'mdsplus.jet.efda.org'
    IDL> y=mdsvalue('_sig=jet("ppf/magn/ipla",40573)')
    IDL> x=mdsvalue('dim_of(_sig)')
    IDL> plot,x,y
    IDL> mdsdisconnect
HDF5
http://www.hdfgroup.org/index.html
• HDF5 is a self-describing file format and library for storing scientific data.
• A versatile data model that can represent very complex data objects and a wide variety of metadata (different datatypes on the same tree), with direct access to parts of the file without parsing the entire file.
• A completely portable file format with no limit on the number or size of data objects in the collection.
• A software library that runs on a range of computational platforms, from laptops to massively parallel systems, and implements a high-level API with C, C++, Fortran 90, and Java interfaces.
Universal Access Layer (UAL) MOTIVATION
• HDF5 and MDSplus are successful tools for a common data format and organization, allowing effective data sharing among different applications.
• But will these standards survive the lifespan of ITER? A more generic approach is envisaged and is being implemented within the ITM-TF.
• Consistent Physical Objects (CPO): a generic view, in trees and sub-trees, of the data organization, independent of the actual method used for data storage.
G. Manduchi et al, Fusion Engineering and Design 83, 462-466 (2008)
Universal Access Layer (UAL) DATA STRUCTURE
Universal Access Layer (UAL) DATA STRUCTURE MSE CPO
PARSING THE DATA STRUCTURE
• The tree-like hierarchical structure of a CPO is defined through language-independent XML schemas. These can easily be parsed into the corresponding data structures of each programming language.
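How a schema can be turned into a language-side structure may be sketched as below. The schema fragment and its element names are hypothetical and are not the real CPO definitions; the sketch only assumes that branches are declared as nested complex types and leaves as typed elements.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment; element names are illustrative, not a real CPO definition.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="example_cpo">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="time" type="xs:double"/>
        <xs:element name="profile">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="value" type="xs:double"/>
              <xs:element name="source" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def to_structure(element):
    """Map an xs:element to a nested dict (branches) or a type name (leaves)."""
    children = element.findall(f"{XS}complexType/{XS}sequence/{XS}element")
    if not children:
        return element.get("type")
    return {child.get("name"): to_structure(child) for child in children}

root = ET.fromstring(SCHEMA)
top = root.find(f"{XS}element")
structure = {top.get("name"): to_structure(top)}
print(structure)
```

The same walk over the schema could equally emit type declarations for a compiled language instead of a dictionary.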
Universal Access Layer (UAL) DATA FLOW (D. Coster)
• The multi-level UAL acts as a common data bus: it manages the CPO I/O between codes as well as the retrieval of stored data (MDS+ or HDF5).
Universal Access Layer (UAL) CPO I/O
• euitm_open(name,shot,run)
• euitm_get(path, output_structure)
  • the location of the CPO is specified by the string argument “path”
  • output_structure is language dependent and will hold the output data.
• euitm_put(path, input_structure)
  • the location of the CPO is specified by the string argument “path”
  • input_structure is language dependent and holds the input data.
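The open/put/get pattern can be sketched with an in-memory stand-in. This is not the euitm library: the class and method names are hypothetical, the store is a plain dictionary, and a real access layer would delegate to MDS+ or HDF5 behind the same calls.

```python
import copy

class InMemoryStore:
    """Stand-in backend: a real access layer would delegate to MDS+ or HDF5."""

    def __init__(self):
        self._entries = {}
        self._current = None

    def open(self, name, shot, run):
        # Select (and create if needed) the database entry for this shot/run.
        self._current = self._entries.setdefault((name, shot, run), {})

    def put(self, path, input_structure):
        # Store a copy so later changes by the caller do not leak into the store.
        self._current[path] = copy.deepcopy(input_structure)

    def get(self, path):
        return copy.deepcopy(self._current[path])

store = InMemoryStore()
store.open("test_db", 1234, 1)
store.put("example_cpo", {"time": [0.0, 0.1], "value": [1.0, 1.1]})

# A second "code" reads the same object through the same path.
cpo = store.get("example_cpo")
cpo["value"].append(99.0)  # local change, not visible until put() is called
print(store.get("example_cpo")["value"])
```

Because both codes address the object only by its path, neither needs to know which storage method sits underneath.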
Universal Access Layer (UAL) ACCESSING EXPERIMENTAL DATA
Courtesy of J. Signoret and F. Imbeaux
Methodologies for data retrieval
WHAT IS A SIGNAL?
“any kind of data that describes a particular measurement during a discharge and contains some information about plasma properties”, e.g. time-series data, 2D/3D data, contour maps, images…
OUTPUT PER SHOT?
Diagnostics at JET top 10 Gbytes/shot, still much less than the values expected for ITER!
WHAT IS MEASURED?
Physical properties manifest as patterns, with a direct parallel between the physical behaviour and the structural shapes that are generated, e.g. spikes in Dα emission during edge localised modes (ELMs), or soft X-ray and ECE emission during a sawtooth crash (ST).
Methodologies for data retrieval
Traditional approach
• Query based on shot/signal
• Manual inspection of structural shapes/features
• Very tedious and long process
Methodologies for data retrieval
Pattern recognition approach
• Data is searched under the guidance of technical and scientific criteria.
• “Pattern oriented”, just as people behave when they analyze data.
• Relies on a set of techniques for data retrieval:
  • Feature extraction
    • single entity (temporal segment inside a waveform or a set of pixels within an image)
    • compound entity (more than one segment/signal)
  • Classification system (supervised/unsupervised)
  • Similarity measure (metric/proximity measure)
J. Vega et al, Fusion Eng. and Design 83, 382 (2008)
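A single-entity search with a similarity measure can be sketched as follows. This is not the method of the cited paper: it swaps in a plain sliding window with a normalised Euclidean distance, and the waveform and query are synthetic.

```python
import math

def normalise(segment):
    """Remove the mean and scale to unit standard deviation so that only shape matters."""
    mean = sum(segment) / len(segment)
    std = math.sqrt(sum((x - mean) ** 2 for x in segment) / len(segment))
    if std == 0.0:
        return [0.0] * len(segment)
    return [(x - mean) / std for x in segment]

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(query, signal):
    """Slide a window over the signal; return (start index, distance) of the best match."""
    q = normalise(query)
    best = None
    for start in range(len(signal) - len(query) + 1):
        window = normalise(signal[start:start + len(query)])
        d = distance(q, window)
        if best is None or d < best[1]:
            best = (start, d)
    return best

# Synthetic waveform with one spike, and a spike-shaped query at a different amplitude.
signal = [0.0, 0.0, 0.1, 0.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.1, 0.0]
query = [0.0, 1.0, 2.0, 1.0, 0.0]

start, dist = most_similar(query, signal)
print(start, round(dist, 6))
```

Normalising each window before comparison makes the match depend on the shape of the segment and not on its amplitude or offset, which is the point of a pattern-oriented query.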
III – Shared Data Access System (SDAS)
• Why another data retrieval software?
• The problem
  • Scientists need to access data from different laboratories;
  • Each laboratory has its own way of retrieving data;
  • Scientists have to spend time and effort learning how the different data access schemes work, changing their analysis code for each experiment and managing updated versions of each program and library required;
  • This does not mean that every association must store and retrieve data in the same way.
  • The main data index is changing from shot number to time and events, where the pulse number is just one of the most relevant events against which data is catalogued.
Shared Data Access System (SDAS)
• Why another data retrieval software?
• The solution
  • Hide all complexity from end-users;
  • Scientists only have to learn once how to access data;
  • Users do not request data directly from the association's database but from a software layer;
  • The software layer provides the same data access functions in all associations;
  • Data blocks are tagged against specific events which happen during the life cycle of a discharge.
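Indexing by time and events, as opposed to shot number, can be sketched as follows. The event names, numbers, timestamps and block layout are hypothetical and do not describe how SDAS stores data; the sketch only shows data blocks carrying absolute times and a query expressed relative to an event.

```python
from datetime import datetime, timedelta

# Hypothetical event catalogue: each event has a name, a number and an absolute time.
events = {
    ("PULSE", 1001): datetime(2009, 1, 15, 10, 0, 0),
    ("PULSE", 1002): datetime(2009, 1, 15, 10, 20, 0),
}

# Data blocks carry absolute start times instead of a shot number.
blocks = [
    {"parameter": "ip", "start": datetime(2009, 1, 15, 9, 59, 59), "dt_s": 0.5,
     "samples": [0.0, 0.0, 0.1, 0.4, 0.2]},
    {"parameter": "ip", "start": datetime(2009, 1, 15, 10, 19, 59), "dt_s": 0.5,
     "samples": [0.0, 0.0, 0.2, 0.5, 0.3]},
]

def data_for_event(parameter, event, window_s=5.0):
    """Return (time relative to the event, value) pairs recorded within window_s of it."""
    t0 = events[event]
    result = []
    for block in blocks:
        if block["parameter"] != parameter:
            continue
        for i, value in enumerate(block["samples"]):
            t = block["start"] + timedelta(seconds=i * block["dt_s"])
            rel = (t - t0).total_seconds()
            if abs(rel) <= window_s:
                result.append((rel, value))
    return result

trace = data_for_event("ip", ("PULSE", 1002))
print(trace)
```

With this layout the pulse is just one kind of event: the same query would work for any other event in the catalogue.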
Without SDAS With SDAS
SDAS Technology
• SDAS is based on Remote Procedure Calls (RPC);
• The SDAS server is formed by an XML-RPC server and by a connector to the storage mechanism;
• Data is indexed by time and events;
• SDAS server and libraries available in Python, Java and C++;
• Read and write support (for post-processed data);
• Supported in several data analysis programs: Matlab, IDL, Octave, Mathematica;
• Documentation in wiki: http://cdaq.cfn.ist.utl.pt:8085/
• Currently being used in ISTTOK/PT, Compass/CZ and TJ-II/ES
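What travels over the wire in an XML-RPC call can be shown with Python's standard library. The method name and arguments below are hypothetical, since the actual SDAS call names are not given here; the sketch only illustrates how a call is encoded by the client and decoded by the server.

```python
import xmlrpc.client

# Hypothetical method name and arguments; the real SDAS call names are not given here.
request_xml = xmlrpc.client.dumps(
    ("example.parameter.id", "PULSE", 1001),
    methodname="get_data",
)
print(request_xml)

# The server side decodes the same payload back into a method name and arguments.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

Since the payload is plain XML over HTTP, any language with an XML-RPC client can talk to the same server, which fits the multi-language support listed above.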
Data access
• SDAS libraries are easily integrated in programs such as MATLAB, Mathematica and IDL;
• SDAS provides over 20 functions which allow users to:
  • Search parameters and events;
  • Retrieve single and multiple data