This paper discusses the architecture and functionality of a Distributed Information System, focusing on publishing tools that simplify data acquisition, retrieval, and publication. It explores the challenges of ensuring data validity and describes various types of constraints involved in the publishing process.
Publishing Tools for a Distributed Information System
A.Z. Fazliev, N.A. Lavrentiev, A.I. Privezentsev
V.E. Zuev Institute of Atmospheric Optics SB RAS, Academician Zuev Square 1, Tomsk 634021, Russia
E-mail: faz@iao.ru
HITRAN Conference, Cambridge, 16-18 June 2010
Introduction
• e-Science. Service oriented architecture
• e-Science. Three layers information systems
• The Data, Information and Knowledge Lifecycle
• Architecture of W@DIS
• Model of Quantitative Molecular Spectroscopy
• Data Validity
• Formal constraints
• Selection rules. Primary data sources
• Publication constraints
• Non-formal constraints
• Current State of W@DIS
e-Science. Service oriented architecture
De Roure D., Jennings N., Shadbolt N. A Future e-Science Infrastructure // Report commissioned for EPSRC/DTI Core e-Science Programme. 2001. 78 p.
e-Science. Three layers information systems

The Data-Computation Layer
“As soon as computers are interconnected and communicating we have a distributed system, and the issues in designing, building and deploying distributed computer systems have now been explored over many years. First it positions the Grid within the bigger picture of distributed computing, asking whether it is subsumed by current solutions. Then we look in more detail at the requirements and currently deployed technologies in order to identify issues for the next generation of the infrastructure. Since much of the grid computing development has addressed the data-computation layer, this section particularly draws upon the work of that community.”

The Information Layer
“This layer is focused firstly on the Web. The Web’s information handling capabilities are clearly an important component of the e-Science infrastructure, and the web infrastructure is itself of interest as an example of a distributed system that has achieved global deployment. The second aspect addressed is support for collaboration, something which is key to e-Science. The information layer aspects build on the idea of a ‘collaboratory’, defined as a ‘centre without walls, in which the nation’s researchers can perform their research without regard to geographical location - interacting with colleagues, accessing instrumentation, sharing data and computational resource, and accessing information in digital libraries.’”

The Knowledge Layer
“The aim of the knowledge layer is to act as an infrastructure to support the management and application of scientific knowledge to achieve particular types of goal and objective. In order to achieve this, it builds upon the services offered by the data-computation and information layers. The first thing to reiterate at this layer is the problem of the sheer scale of content we are dealing with. We recognise that the amount of data that the data grid is managing will be huge. By the time that data is equipped with meaning and turned into information we can expect order of magnitude reductions in the amount. However the amount of information remaining will certainly be enough to present us with a problem – a problem recognised as infosmog – the condition of having too much information to be able to take effective action or apply it in an appropriate fashion to a specific problem. Once information is delivered that is destined for a particular purpose, we are in the realm of the knowledge grid that is fundamentally concerned with abstracted and annotated content, with the management of scientific knowledge.”

De Roure D., Jennings N., Shadbolt N. A Future e-Science Infrastructure // Report commissioned for EPSRC/DTI Core e-Science Programme. 2001. 78 p.
The Data, Information and Knowledge Lifecycle
Lifecycle diagram stages: Acquire, Modelling, Retrieve, Publish, Maintain
The challenge of knowledge publishing or disseminating can be described as getting the right data, information and knowledge, in the right form, to the right person or system, at the right time … .
De Roure D., Jennings N., Shadbolt N. A Future e-Science Infrastructure // Report commissioned for EPSRC/DTI Core e-Science Programme. 2001. 78 p.
Getting the right data, …
• What is the hierarchy of the problems?
• Lifecycle. Implement a Distributed Information System that makes it simpler for the investigator to acquire, retrieve and publish data and information
• Publish. Create Publishing Tools
• Key question. Guarantee Data Validity
• Constraint types
1. Restrictions on physical entity values (for instance, selection rules). Verification of the restrictions is identical to verification of a statement (…)
2. Existence (Publication) Restrictions (…)
S – spectroscopy domain, X – physical entity characterized by quantum numbers, Y – published data set, ∃ – existential quantifier, ∀ – universal quantifier
Architecture of W@DIS
Semantic Web approach
Layers shown in the architecture diagram: Knowledge layer, Information layer, Data-computation layer
Web-services shown in the diagram:
• Web-service for the formation of an ontology of molecular spectroscopy tasks’ solutions properties
• Web-service for the formation of a homogeneous set of inverse and direct tasks solutions properties in a distributed system
• Web-service of publications data base synchronization
Other diagram labels:
• Inference engine
• Ontology of spectroscopy tasks’ solutions properties
• Logical consistency check
• Description of non-calculable properties of molecular spectroscopy inverse and direct problems’ solutions
• Molecular spectroscopy tasks’ solutions properties
• Decomposition of problems’ solutions according to publications
• Computation of calculated properties of direct and inverse spectroscopy problems’ solutions
• Primary solutions of inverse spectroscopy tasks
• Primary solutions of direct spectroscopy tasks
• Composite solutions of spectroscopy tasks
• Formation of composite problems’ solutions
• System of direct and inverse spectroscopy problems’ solutions (input)
• Spectral functions calculation
• Publications DB
• Node, Data Node, Applications, Interfaces, Protégé interface, W@DIS, CaD@DIS
Model of Quantitative Molecular Spectroscopy
Two chains of problems are selected for the domain.
Diagram labels: Inverse Problems, Direct Problems, Computations, Measurements
• Isolated molecule physical characteristics (T1)
• Isolated molecules spectral line parameters (T2)
• Spectral line profile parameters (T3)
• Spectral functions calculation (T4)
• Quantum numbers assignment to spectral lines (T5)
• Einstein coefficients (T6)
• Isolated molecule energy levels (T7)
• Spectral functions measurement (E)
• Interacting molecule spectral line parameters (ET)
IUPAC Data group approach (information aspect)
Elementary solution of spectroscopic problem
Elementary source characteristics
• molecule – H2O
• the list of physical quantities – energy levels E (cm⁻¹), quantum numbers (v1 v2 v3 J Ka Kc), …
• publication – Schwenke D.W., New H2O Rovibrational Line Assignments // Journal of Molecular Spectroscopy, 1998, v. 190, no. 2, p. 397-402
• data – …
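The elementary source described on this slide can be pictured as a small record type. The following is only a sketch: the class and field names (`Publication`, `ElementarySource`, `add_row`) are hypothetical and are not taken from W@DIS, and the single data row is a placeholder rather than a value from the cited paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Publication:
    """Bibliographic reference that an elementary source is tied to."""
    authors: str
    title: str
    journal: str
    year: int


@dataclass
class ElementarySource:
    """One molecule, one list of physical quantities, one publication."""
    molecule: str
    quantities: Tuple[str, ...]
    publication: Publication
    rows: List[tuple] = field(default_factory=list)

    def add_row(self, *values) -> None:
        # Every row must supply exactly one value per declared quantity.
        if len(values) != len(self.quantities):
            raise ValueError(
                f"expected {len(self.quantities)} values, got {len(values)}"
            )
        self.rows.append(values)


source = ElementarySource(
    molecule="H2O",
    quantities=("E", "v1", "v2", "v3", "J", "Ka", "Kc"),
    publication=Publication(
        authors="Schwenke D.W.",
        title="New H2O Rovibrational Line Assignments",
        journal="Journal of Molecular Spectroscopy",
        year=1998,
    ),
)
# Placeholder row (zero energy, all quantum numbers zero), not from the paper.
source.add_row(0.0, 0, 0, 0, 0, 0, 0)
```

Tying each data set to exactly one publication is what later makes the publication constraint checkable: a value either traces back to a cited source or it does not.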
Data Validity
Formal constraints
• Data type – quantum numbers are natural numbers, …; intensity, halfwidth, frequency and energy levels are positive real numbers, …
• Variation interval – 0 < wavenumber < 45000 cm⁻¹, 10⁻³⁰ cm/mol < intensity < 10⁻¹⁶ cm/mol
• Selection rules – normal modes: Ka + Kc = J or J + 1, …
Publication constraint
• Whether data are published or not
Other constraints (transitivity and antisymmetry axioms)
• …
Non-formal constraints
• Experts’ opinion
Diagram labels: XML, OWL DL
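The formal constraints listed on this slide lend themselves to a mechanical check. The sketch below is an illustration only, not the W@DIS implementation: the `Line` record and `formal_violations` function are hypothetical names, the two intensity bounds are read as a lower limit of 1e-30 and an upper limit of 1e-16 cm/mol, "natural numbers" is assumed to include zero, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import List

WAVENUMBER_RANGE = (0.0, 45000.0)  # cm^-1, open interval from the slide
INTENSITY_RANGE = (1e-30, 1e-16)   # cm/mol, open interval (assumed ordering)


@dataclass(frozen=True)
class Line:
    """Hypothetical record for one spectral line with rotational labels."""
    wavenumber: float
    intensity: float
    J: int
    Ka: int
    Kc: int


def formal_violations(line: Line) -> List[str]:
    """Return a list of violated formal constraints (empty if none)."""
    problems = []

    # Data type: quantum numbers must be non-negative integers.
    for name in ("J", "Ka", "Kc"):
        value = getattr(line, name)
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{name} must be a non-negative integer")

    # Variation intervals for wavenumber and intensity.
    low, high = WAVENUMBER_RANGE
    if not (low < line.wavenumber < high):
        problems.append("wavenumber outside (0, 45000) cm^-1")
    low, high = INTENSITY_RANGE
    if not (low < line.intensity < high):
        problems.append("intensity outside (1e-30, 1e-16) cm/mol")

    # Selection rule from the slide: Ka + Kc equals J or J + 1.
    if line.Ka + line.Kc not in (line.J, line.J + 1):
        problems.append("Ka + Kc must equal J or J + 1")

    return problems


# Made-up values for illustration only.
good = Line(wavenumber=1500.0, intensity=1e-22, J=2, Ka=1, Kc=2)
bad = Line(wavenumber=50000.0, intensity=1e-22, J=2, Ka=2, Kc=2)
```

Returning the full list of violations rather than stopping at the first one lets a data provider see every problem with a record in a single pass.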
Selection rules. Primary data sources
Table legend: an entry such as 9(2) gives the total number of data sources (number of correct data sources).
Privezentsev A.I., Ontological knowledge base implementation and software for information resources description in molecular spectroscopy, Tomsk State University, PhD Dissertation, 2009, 238 pages
Validity. Publication constraints
Informatics restriction. RFC 2396: A (information) resource can be anything that has identity.
Decomposition. Mathematical basis. Axiom of reflexivity: for each a, a = a.
Physical constraint.
a ---- 1st type of measurements ---> A1
a ---- 2nd type of measurements ---> A2
Criterion of identity: |A1 – A2| < experimental accuracy
The complete validated data sets are published by the IUPAC data group (J. Tennyson, P.F. Bernath, L.R. Brown, et al., IUPAC Critical Evaluation of the Rotational-Vibrational Spectra of Water Vapor. Part I. Energy Levels and Transition Wavenumbers for H₂¹⁷O and H₂¹⁸O, Journal of Quantitative Spectroscopy and Radiative Transfer, July 2009, V.110, no.9-10, P.573-596.)
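The criterion of identity on this slide can be sketched in a few lines, assuming the right-hand side of the inequality is the experimental accuracy. The function names `same_quantity` and `unmatched` are hypothetical, and the numbers in the usage lines are illustrative only.

```python
from typing import Iterable, List


def same_quantity(a1: float, a2: float, accuracy: float) -> bool:
    """Treat two measured values as the same quantity if |A1 - A2| < accuracy."""
    if accuracy <= 0:
        raise ValueError("accuracy must be positive")
    return abs(a1 - a2) < accuracy


def unmatched(published: Iterable[float], candidate: Iterable[float],
              accuracy: float) -> List[float]:
    """Return candidate values that match no published value within accuracy."""
    published = list(published)
    return [
        c for c in candidate
        if not any(same_quantity(c, p, accuracy) for p in published)
    ]


# Illustrative numbers only: one candidate agrees with a published value,
# the other has no published counterpart.
leftover = unmatched([100.0, 200.0], [100.0004, 300.0], accuracy=1e-3)
```

Values left over by such a comparison are the ones a decomposition of a composite data set cannot attribute to any of the cited publications.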
Decomposition. HITRAN-2008 (H₂¹⁸O)
Publication constraints
0. L.S. Rothman, R.R. Gamache, A. Goldman, L.R. Brown, R.A. Toth, H.M. Pickett, R.L. Poynter, J.-M. Flaud, C. Camy-Peyret, A. Barbe, N. Husson, C.P. Rinsland, and M.A.H. Smith, “The HITRAN database: 1986 Edition,” Appl. Opt. 26, 4058-4097 (1987).
26. H. Partridge and D.W. Schwenke, “The determination of an accurate isotope dependent potential energy surface for water from extensive ab initio calculations and experimental data,” J. Chem. Phys. 106, 4618-4639 (1997).
28. J.P. Chevillard, J.-Y. Mandin, J.-M. Flaud, and C. Camy-Peyret, “H₂¹⁸O: line positions and intensities between 9500 and 11 500 cm⁻¹. The (041), (220), (121), (201), (102), and (003) interacting states,” Can. J. Phys. 65, 777-789 (1987).
30. R.A. Toth, “Linelist of water vapor parameters from 500 to 8000 cm⁻¹,” see http://mark4sun.jpl.nasa.gov/data/spec/H2O.
34. Calculation from K.V. Jucks, private communication (2000).
Legend: composite data source; unpublished data source
Published data and data in HITRAN are not the same
Non-formal constraints
J. Tennyson, P.F. Bernath, L.R. Brown, et al., IUPAC Critical Evaluation of the Rotational-Vibrational Spectra of Water Vapor. Part I. Energy Levels and Transition Wavenumbers for H₂¹⁷O and H₂¹⁸O, Journal of Quantitative Spectroscopy and Radiative Transfer, July 2009, V.110, no.9-10, P.573-596.
J. Tennyson, P.F. Bernath, L.R. Brown, et al., IUPAC Critical Evaluation of the Rotational-Vibrational Spectra of Water Vapor. Part II. Energy Levels and Transition Wavenumbers for HDO, HD¹⁷O and HD¹⁸O, Journal of Quantitative Spectroscopy and Radiative Transfer, 2010.
Current State of the W@DIS
Data Base & Knowledge Base of DIS
Diagram labels: Upload Systems, e-Library (Primary Data), Digitized Data
• H2O, H2S, SO2, CO2, N2O, NH3, CH4, C2H2, CO, O2, OCS, HNCO: ~ 2000 articles
• H2O, H2S, CO2, CO, CH4: ~ 1200 data sets
• H2O, H2S, SO2, O3, N2O, OCS, NH3, C2H2, CO, HBrO, CO2, CH4: at the end of August 2010
• CO2, H2O, H2S now, plus NH3, CO, CH4 at the end of 2010
http://wadis.saga.iao.ru
http://saga.molsp.phys.spbu.ru
http://atmos.appl.sci-nnov.ru
Summary
► A node prototype of a Distributed Information System for acquiring, retrieving, publishing and maintaining data, information and knowledge in quantitative molecular spectroscopy has been developed and implemented
► A component of the publishing tools that provides formal validation of data has been implemented
► IS W@DIS – http://wadis.saga.iao.ru, http://atmos.appl.sci-nnov.ru, http://saga.molsp.phys.spbu.ru
Acknowledgements
We thank Prof. J. Tennyson for assistance in creating all data collections and Dr. S. Tashkun for his contribution to the CO2 data collection. A. Fazliev thanks Prof. Vl.G. Tyuterev for fruitful discussion of the publication constraints.
This work has received partial support from RFBR and the 7th Framework Programme.