
Dominic Lowe, BADC Tim Duffy, BGS



Presentation Transcript


  1. Managing Data For Integrated Science. Dominic Lowe, BADC; Tim Duffy, BGS

  2. What does “Integrated Science” mean? • “...to advance knowledge of planet Earth as a complex, interacting system.” (NERC 'Mission') • Our understanding of the “system” is constrained by the quality of data we have on the system (and by other things...). The ability to read and share the data we do have is critical to advancing knowledge.

  3. Data Management & Integration • Measurement devices? Probably not... different environments. • Raw data encoding? Possibly... e.g. Sensor Observation Service. • Final data format? Yes, e.g. CF Conventions, NetCDF. No, specialist file formats and databases still important. • “World view”? Yes. Agree common models for data interoperability. • Tools/Services? Not scalable if data not integrated. Yes, if common view on data exists.

  4. INSPIRE Data Specifications • 34 EU “world views”... for various “themes”. • Candidate data specifications may be submitted to INSPIRE for consideration. • BGS & BADC are INSPIRE “Legally Mandated Organisations” (LMOs). • Example themes: Transport Networks; Land cover; Natural Risk Zones; Orthoimagery; Coordinate Reference Systems; Geographical Grid Systems; Species Distribution; Environmental Monitoring Facilities; Atmospheric Conditions; Land use; Hydrography; Elevation; Habitats and Biotopes; Oceanographic geographical features; Geology; Sea Regions; Meteorological geographical features; Soil; Energy Resources

  5. Climate Science Modelling Language • CSML: A data model encoded in UML and XML. • …developed by BADC & RAL e-Science Centre. • ... for describing and constructing Climate Science datasets in terms of feature types • ... based on GIS standards (ISO, GML) http://csml.badc.rl.ac.uk

  6. Example CSML feature type: “ProfileFeature” • ProfileFeature: + location; + time; + domain (heights, pressure levels); + rangeset (measured values); + phenomena (salinity, temperature) • Domains: Oceanography, Atmospheric Science
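
The ProfileFeature attributes listed on this slide can be sketched as a small data structure. This is only an illustration in Python, not the CSML schema itself: the class layout, the coordinate order, and all sample values are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class ProfileFeature:
    """Illustrative stand-in for a CSML-style profile (not the real CSML schema)."""

    location: Tuple[float, float]     # assumed (longitude, latitude) order
    time: datetime                    # when the profile was measured
    domain: List[float]               # vertical axis: heights, depths or pressure levels
    rangeset: Dict[str, List[float]]  # phenomenon name -> one value per domain point

    def __post_init__(self) -> None:
        # Every phenomenon must supply exactly one value per domain point.
        for name, values in self.rangeset.items():
            if len(values) != len(self.domain):
                raise ValueError(
                    f"{name}: expected {len(self.domain)} values, got {len(values)}"
                )

    @property
    def phenomena(self) -> List[str]:
        # Names of the measured phenomena, in sorted order.
        return sorted(self.rangeset)


# The same feature type describes an ocean cast and an atmospheric sounding.
# All numbers below are made-up sample values.
ocean = ProfileFeature(
    location=(-20.0, 45.0),
    time=datetime(2008, 6, 1, 12, 0),
    domain=[0.0, 10.0, 50.0],
    rangeset={"salinity": [35.1, 35.2, 35.4], "temperature": [15.0, 14.2, 11.8]},
)
atmosphere = ProfileFeature(
    location=(-1.3, 51.6),
    time=datetime(2008, 6, 1, 12, 0),
    domain=[1000.0, 850.0, 500.0],
    rangeset={"temperature": [288.0, 280.5, 252.0]},
)
```

Both objects share one structure, so a tool written against `ProfileFeature` does not need to know which discipline produced the data.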

  7. Diverse tools, shared data model. [Figure labels: CSML Data Model; Time]

  8. Mapping to common model simplifies integrated data services [Figure labels: Dataset X, Dataset Y, Dataset Z; Service A, Service B, Service C; CSML Data Model]
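
A minimal sketch of why mapping to a common model simplifies services: each dataset needs one mapper into the shared model, and every service is written once against that model. The record layout, the two raw dataset shapes, and the service functions are all hypothetical and stand in for a much richer model such as CSML.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommonRecord:
    """Hypothetical shared model; far simpler than CSML."""

    phenomenon: str
    units: str
    values: List[float]


def from_dataset_x(raw: dict) -> CommonRecord:
    # Dataset X (assumed layout): flat keys with a list of numbers.
    return CommonRecord(raw["var"], raw["unit"], list(raw["data"]))


def from_dataset_y(raw: dict) -> CommonRecord:
    # Dataset Y (assumed layout): nested metadata, comma-separated readings.
    values = [float(v) for v in raw["readings"].split(",")]
    return CommonRecord(raw["parameter"]["name"], raw["parameter"]["uom"], values)


def mean_service(record: CommonRecord) -> float:
    # Service written once against the common model.
    return sum(record.values) / len(record.values)


def summary_service(record: CommonRecord) -> str:
    # A second service, also unaware of the source formats.
    return f"{record.phenomenon}: {len(record.values)} values in {record.units}"


records = [
    from_dataset_x({"var": "temperature", "unit": "K", "data": [1.0, 2.0, 3.0]}),
    from_dataset_y(
        {"parameter": {"name": "salinity", "uom": "psu"}, "readings": "10.0,20.0,30.0"}
    ),
]
for record in records:
    print(summary_service(record), "mean:", mean_service(record))
```

With n datasets and m services this needs n mappers plus m services, rather than a separate converter for every dataset and service pair.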

  9. Essential elements for data integration: • Data management: Controlled Vocabularies; URIs; Catalogue servers; GML Dictionaries; Domain models; Metadata profiles; Resolving Services; Governance procedures • Identifiable elements: Measured Phenomena; Locations; Time; Coordinate Reference Systems; Units of Measure; Errors; Statistical methods • Tackle this problem NOW to provide for an integrated future.
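
As an illustration of how controlled vocabularies, URIs and a resolving service fit together, the sketch below maps local variable names onto governed terms that carry a URI and units of measure. The base URI, the local names, and the term and unit choices are assumptions for the example and do not describe any real vocabulary server.

```python
from dataclasses import dataclass

# Hypothetical vocabulary server; not a real service.
VOCAB_BASE = "http://vocab.example.org/phenomenon/"


@dataclass(frozen=True)
class Term:
    uri: str    # globally unique identifier for the measured phenomenon
    units: str  # units of measure agreed for this term


# Governed vocabulary (illustrative terms and units).
VOCABULARY = {
    "sea_water_temperature": Term(VOCAB_BASE + "sea_water_temperature", "K"),
    "sea_water_salinity": Term(VOCAB_BASE + "sea_water_salinity", "1e-3"),
}

# Local names used inside individual datasets (assumed).
LOCAL_NAMES = {
    "temp": "sea_water_temperature",
    "T_sea": "sea_water_temperature",
    "sal": "sea_water_salinity",
}


def resolve(local_name: str) -> Term:
    """Return the controlled term for a local name; unknown names are rejected."""
    try:
        return VOCABULARY[LOCAL_NAMES[local_name]]
    except KeyError:
        raise KeyError(f"'{local_name}' is not mapped to a controlled term") from None


print(resolve("temp").uri)
print(resolve("temp") == resolve("T_sea"))
```

Rejecting unmapped names instead of guessing is a deliberate choice here: it forces new terms through whatever governance procedure maintains the vocabulary.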

  10. Data conditioning processes enable integration: • Standard File Formats • Metadata creation • Identification • Liaison with data providers & end users • Documentation • Error checking • Quality Control • Standardisation • Archive management • Rigorous data management NOW will provide for an integrated future.

  11. Recap • Integrated science needs integrated world views using agreed data models such as CSML, GeoSciML. • Data conditioning and the adoption of shared data models are pre-requisites for scalable technological services for better integrated science. • Data centres must play a key role in conditioning data to enable cross-institutional and cross-discipline interoperability. • Data managers must participate in community governance of standards. • Integrated science depends on good data management.

  12. Integrated Science and external users of NERC data (e.g. INSPIRE) require interoperable data • Users increasingly expect to be able to obtain digital geoscience and other environmental data from different providers in a standard, well-defined and well-understood structural and semantic form. They want interoperability. • “Interoperability” can mean different things to different people. One good practical definition from INSPIRE is ‘the possibility for data services to interact, without repetitive manual [human] intervention’.

  13. What does interoperability mean? Four levels of interoperability can be considered: • semantic: Data content • schematic: Data structure • syntax: Data language • systems: Data systems [Figure labels: Concept standardisation; GeoSciML; OpenGIS community (OGC)]

  14. How the International CGI built GeoSciML (2003-2008) • The first step was to create a logical model based on internationally agreed concepts of geoscience (we call this a conceptual model for short) showing: the objects in the system; what their properties (attributes) are; how they relate to each other. • This has been built using the Unified Modeling Language (UML). It is thus a Model Driven Architecture (MDA), as is increasingly required by law internationally. Model first, then XML schema second. Very important. • The UML model was then used to define the structure of the mark-up language for data transfer. The ‘profile’ of UML modelling used was specially designed so that an XML ‘application of GML’ could be produced from it. • The mark-up language used is XML (W3C) and incorporates pre-existing standard implementations, in particular Geography Markup Language (GML 3.1.1) developed by OGC. • GeoSciML is a GeoScience Mark-up Language based on GML, encoded in XML.

  15. Using International standards to engender interoperability: http://www.geosciml.org

  16. The scope of GeoSciML • The scientific information (not cartography) normally shown on geological maps, including: Geologic Units; Earth materials (lithologies); Geologic Structures. This information is mainly interpretative. • Boreholes • Samples, measurements and further observations

  17. What we can do with GeoSciML • GML-based data can be used in OGC web services: rendered into a queryable map, formatted into a report, or read and used by any WFS/GML-enabled application.
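
To make "read and used by any WFS/GML enabled application" concrete, the sketch below builds a WFS 1.1.0 `GetFeature` request URL using key-value parameters. The endpoint is hypothetical, and the choice of `gsml:MappedFeature` as the feature type is an assumption about what a given GeoSciML service would expose; no request is actually sent.

```python
from urllib.parse import urlencode


def get_feature_url(endpoint: str, type_name: str, max_features: int = 10) -> str:
    """Build a WFS 1.1.0 GetFeature URL using key-value-pair encoding."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,              # feature type to retrieve
        "maxFeatures": str(max_features),   # cap on the number of features returned
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint; a real service would publish its own URL and
# list its feature types in the GetCapabilities response.
url = get_feature_url("http://wfs.example.org/geoscience", "gsml:MappedFeature")
print(url)
```

Any client that can issue this request and parse the GML response can consume the data, which is the practical meaning of the interoperability claimed on the slide.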

  18. It's a walk-through of the draft conceptual design.

  19. Testbed demonstrator

  20. What next for GeoSciML? • EXPRESSING NERC-owned and managed, geographically located data as UML-modelled GML (XML schemas), enhanced with concept standardisation, moves us quite far along the road towards greater interoperability between scientific data of similar domain, and prepares the ground for inter-domain combination of data, e.g. INSPIRE ANNEX II data being used with ANNEX I and III. • HOWEVER, it helps to EXPOSE the fact that we may not have understood our own data as well as we thought, and did not understand the data quality and conditioning required to let ‘others’ ‘use’ ‘our’ data. For example, BGS has learnt in setting up GeoSciML services that we must map (in the data) some of our internal concepts to internationally agreed ones. BGS has also learnt that some of our quality-assured, ISO 9001:2000 registered data may not be fit for purpose for letting ‘others’ in integrated science use it ‘in the real world’… • Data for integrated sharing must be conditioned and made fit for purpose. In 2009 GeoSciML version 3 will be developed with a wider INSPIRE and OneGeology scope.

  21. Conclusion • The drive for integrated science is pushing the development of data models such as CSML and GeoSciML, which in turn highlight the need to put effort into data conditioning for interoperability on as many levels as possible.
