
NERC Metadata Gateway: Connecting Data and Tools for Environmental Research

This article provides an introduction to the NERC Data Centres and the NCAS, and highlights the key components of the NERC DataGrid Project. It discusses the importance of data discovery, access control, and the use of ISO standards in structuring metadata. The article also outlines the NERC Metadata Gateway and its role in facilitating data search and utilization.



Presentation Transcript


  1. The NERC Metadata Gateway: a product of the NERC DataGrid. Bryan Lawrence (on behalf of a big team): BADC, BODC, CCLRC, PML and SOC

  2. Outline. Introduction to NERC, the NERC Data Centres, and NCAS. The NERC DataGrid Project. Key Components: Data Tools, Data Discovery, {Access Control}. NDG Information Environment. Key Standards Structures: the ISO Family. From CSML, {MOLES}, DIF to ISO19139 (NumSim). Distributed Content Search: why we did it this way; our discovery architecture. NDG Discovery Now … and The Future: the “New NERC Metadata Gateway”. ISO19139 Best Practice. Summary.

  3. Some Introductions. NERC: The Natural Environment Research Council. The major player in UK environmental research. It is both a funding agency and a conglomeration of “centres”: internal “research” institutes (the British Oceanographic Data Centre (BODC) is part of one of these) and external “collaborative” centres, which include the Plymouth Marine Laboratory; the National Oceanography Centre, Southampton; and the National Centre for Atmospheric Science (NCAS), which is mostly embedded in universities, but part of which is the British Atmospheric Data Centre (BADC), embedded in the CCLRC (Council for the Central Laboratories of the Research Councils, about to be replaced by a new entity, which might be called the “Large Facilities Research Council”). NERC has seven discipline-based designated data centres (including the BODC and BADC), and requires as much integration of data access as possible: from discovery to utilisation, from genomics to ecology, from oceanography to atmospheric science, from Antarctic science to British geology …

  4. Complexity + Volume + Remote Access = Grid Challenge. Logos: NCAR; British Atmospheric Data Centre; British Oceanographic Data Centre. http://ndg.nerc.ac.uk

  5. If it’s not obvious. Lots of organisations: varying membership, and trust internally and between each other is not consistent. Lots of priorities: not all organisations are “about” data. Different internal storage structures: data is stored in a variety of databases and filesystems. Some things are well documented but not automated; some things are automated, but information content is sparse … Integrating data access is non-trivial. And none of that includes the important relationships with customers and collaborators!

  6. Key Components. Discovery Tools: Discovery Portal; Metadata Search; Direct Links to Data and Services. Data Tools: Slice and Dice; Visualisation; Manipulation. Access Control: systems are resource limited; data access may be restricted by license. Metadata Structures to support all the above.

  7. Standards Landscape. Or two: ISO TC211 Standards, e.g. ISO 19101: Geographic information – Reference model; ISO 19103: Geographic information – Conceptual schema language; ISO 19107: Geographic information – Spatial schema; ISO 19108: Geographic information – Temporal schema; ISO 19109: Geographic information – Rules for application schema; ISO 19111: Geographic information – Spatial referencing by coordinates; ISO 19115: Geographic information – Metadata. Open Geospatial Consortium Specs: Geography Markup Language (GML), a toolkit for building data descriptions; WMS, WCS, WFS, WPS: the Web (Map, Coverage, Feature, and Processing) services.

  8. Standards. ISO 19101: Geographic information – Reference model. A geospatial dataset… …consists of features and related objects… …in a defined logical structure… …delivered through services… …and described by metadata.

  9. Data Description Standards • Geographic ‘features’ • “abstraction of real world phenomena” [ISO 19101] • Type or instance • Encapsulate important semantics in universe of discourse • “Something you can name” • Application schema • Defines semantic content and logical structure • ISO standards provide toolkit: • spatial/temporal referencing • geometry (1-, 2-, 3-D) • topology • dictionaries (phenomena, units, etc.) • GML – canonical encoding [from ISO 19109 “Geographic information – Rules for Application Schema”]

  10. CSML: Climate Science Modelling Language. A fully featured GML application schema, with extensions for: external binary data (GRIB, netCDF etc.); irregular grids; “proper” vertical coordinate systems (both activities now on OGC and ISO standards tracks). V1.0 included seven feature types and provided only “data” modelling. V1.0 CSML tooling includes a scanner (creates CSML from netCDF files) and a parser (instantiates Python objects, based on the XML CSML documents, which can be manipulated scientifically).

  11. MarineXML Testbed. Data from different parts of the marine community conform to a variety of schemas (XSD). For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FTs) defined by CSML. The FTs and XSLTs are maintained in a ‘MarineXML registry’. Features in the source XSD must be present in the data dictionary. Phenomena in the XSD must have an associated portrayal. The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features. The FTs can then be translated to equivalent FTs for display in the ECDIS system; ECDIS acts as an example client for the data. Features described using the S-57 v3.1 Application Schema can be imported and are equivalent to the same features in CSML. [Diagram labels: XSD/XML sources (Biological Species; Chl-a from Satellite; MeasuredHydrodynamics; ModelledHydrodynamics); XML Parser; MarineGML (NDG) Feature Types; XSLT; Data Dictionary; S52 Portrayal Library; S-57v3 GML; SENC; SeeMyDENC.] Slide adapted from Kieran Millard (AUKEGGS, 2005)

  12. The Concept of re-using Features. All this requires agreement on standards. Here structured XML is converted to plain ASCII text in the form required for a numerical model. HTML warning service pages are generated ‘on the fly’. Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts. XML can also be converted to SVG to display data graphically. Slide adapted from Kieran Millard (AUKEGGS, 2005)

  13. CSML Round Tripping - 1: Managing semantics. [Diagram labels: conceptual model; UGAS; produces; GML app schema; New Dataset (101010); Conforms to; GML dataset (XML); parser, V1.0 (Python, Complete); Application instance.] GML fragment shown on the slide (cut off after the opening tupleList tag): <gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList>
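
To make the "parser instantiates Python objects" step concrete, here is a minimal sketch that reads the point feature from a shortened copy of the fragment above. It is not the CSML parser: the `xmlns:gml` declaration, the closing tags, and the function name `read_point_feature` are assumptions added so that the example is well-formed and runnable.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Shortened copy of the slide's fragment. The namespace declaration and the
# closing tags are assumptions; the slide shows neither.
FRAGMENT = """
<gml:featureMember xmlns:gml="http://www.opengis.net/gml">
  <NDGPointFeature gml:id="ICES_100">
    <NDGPointDomain>
      <domainReference>
        <NDGPosition srsName="urn:EPSG:geographicCRS:4979"
                     axisLabels="Lat Long" uomLabels="degree degree">
          <location>55.25 6.5</location>
        </NDGPosition>
      </domainReference>
    </NDGPointDomain>
  </NDGPointFeature>
</gml:featureMember>
"""


def read_point_feature(xml_text):
    """Return the id, CRS name and labelled coordinates of the point feature."""
    root = ET.fromstring(xml_text)
    feature = root.find("NDGPointFeature")
    position = feature.find(".//NDGPosition")
    labels = position.get("axisLabels").split()
    values = [float(v) for v in position.find("location").text.split()]
    return {
        # Namespaced attributes are keyed as "{namespace}name" in ElementTree.
        "id": feature.get(f"{{{GML_NS}}}id"),
        "srs": position.get("srsName"),
        "coords": dict(zip(labels, values)),
    }


point = read_point_feature(FRAGMENT)
print(point)
```

Pairing `axisLabels` with the values in `location` keeps the axis order explicit, which matters because the fragment lists latitude before longitude.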

  14. CSML Round Tripping - 2: Managing data - 1. [Diagram labels: CF Dataset (101010); CF; scanner; produces; GML app schema; GML dataset (XML); parser; Application instance; “V1.0, V2 in development” (shown twice).] The slide repeats the GML fragment shown on slide 13.

  15. CSML2: Structure “Affords” Behaviour. ISO 19123 coverage class. ‘Affordance’ modelled with UML <<type>>. Moving beyond GML, but staying in the ISO Frame!

  16. CSML2: Related to the new OGC Observations and Measurements Spec. An Observation is an Event whose result is an estimate of the value of some Property of the Feature-of-interest, obtained using a specified Procedure.
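
The definition above maps naturally onto a small record type. The sketch below is only an illustration of that sentence, not the O&M schema or CSML2 itself: the class, its field names, and the sample values (other than the feature id `ICES_100`, taken from slide 13) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Observation:
    """An event whose result estimates a property of a feature of interest."""

    feature_of_interest: str   # the feature the observation is about
    observed_property: str     # the property being estimated
    procedure: str             # how the estimate was obtained
    result: Any                # the estimated value
    event_time: str            # when the observation event took place

    def describe(self):
        return (
            f"{self.observed_property} of {self.feature_of_interest} "
            f"= {self.result} (via {self.procedure}, at {self.event_time})"
        )


# Illustrative values only; they do not come from the presentation.
obs = Observation(
    feature_of_interest="ICES_100",
    observed_property="sea_water_temperature",
    procedure="hypothetical-sensor",
    result=11.3,
    event_time="2006-01-01T00:00:00Z",
)
print(obs.describe())
```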

  17. Managing Data 2. [Diagram labels: CF Dataset (101010), shown twice; Define Dataset; DECISION PROCESSES; scanner; GML dataset (XML); Add Information; XSLT; ISO19115; PUBLISH.] The slide repeats the GML fragment shown on slide 13.

  18. The Most Important Decision: What is a dataset? Granularity too coarse: can’t find what you want, because not enough information is exposed. Granularity too fine: can’t find what you want, because it is buried in unordered results.

  19. Distributed Query. Options: Harvest or Crawl. Distribute the query to known targets, versus harvest from known targets and do a local query. Timeliness versus Responsiveness. Decision: NDG Discovery is based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Additional partners include NCAR, MPI-WDCC, TPAC, UK-MDIP.
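
The harvesting option can be sketched with the two pieces a harvester needs: building an OAI-PMH `ListRecords` request and reading record identifiers plus the resumption token from a response. This is a minimal sketch, not the NDG harvester; the base URL, the `dif` metadata prefix, the sample response and both function names are assumptions, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL for a first or a follow-up page."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    path = ".//oai:record/oai:header/oai:identifier"
    identifiers = [el.text for el in root.findall(path, OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written, abbreviated response used only to exercise the parser.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:dataset-1</identifier></header></record>
    <record><header><identifier>oai:example.org:dataset-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

first_url = list_records_url("http://example.org/oai", metadata_prefix="dif")
ids, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
print(first_url, ids, next_url)
```

A harvester would repeat the request with each returned token until none is returned, then run queries against its local copy. That local copy is what buys responsiveness at the cost of timeliness.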

  20. Discovery Metadata Usage. XML metadata store: can support a limited variety of different XML schemas, provided the WS-interface understands them (a unique XQuery is needed for each (method, schema) pair).
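
The "(method, schema) pair" requirement suggests a lookup table keyed on both values. The following sketch shows that shape only: the method name `full_text`, the registry contents and the XQuery strings are illustrative assumptions, not the queries NDG uses.

```python
# Registry of XQuery text keyed by (search method, metadata schema).
# The query strings are illustrative placeholders.
QUERIES = {
    ("full_text", "DIF"): (
        "for $d in collection()/DIF[contains(string(.), $term)] "
        "return $d/Entry_ID"
    ),
    ("full_text", "ISO19139"): (
        "for $d in collection()/gmd:MD_Metadata[contains(string(.), $term)] "
        "return $d/gmd:fileIdentifier"
    ),
}


def select_query(method, schema):
    """Return the XQuery registered for this (method, schema) pair."""
    try:
        return QUERIES[(method, schema)]
    except KeyError:
        raise LookupError(
            f"no XQuery registered for method={method!r}, schema={schema!r}"
        ) from None


def supported_schemas(method):
    """List the schemas a given search method can currently be run against."""
    return sorted(schema for (m, schema) in QUERIES if m == method)


print(supported_schemas("full_text"))
```

Under this layout, supporting a new schema means adding one entry per search method, which is why the variety of schemas that can be supported stays limited.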

  21. Metadata Formats. Currently supporting: NASA Global Change Master Directory Directory Interchange Format (DIF). Experimenting with: vanilla ISO19139; Dublin Core; UK Gemini V1 format. Will support the following ISO profiles for harvest: (eventually) UK Gemini profile, WMO profile, IOC profile; (whenever) US FGDC profile. ALL SIMULTANEOUSLY: XML database plus appropriate XQueries.

  22. NDG Products: NumSim. Simulation in the context of ISO19139.

  23. NumSim Example

  24. Firefox Search Plugin

  25. International Discovery - Climate

  26. NDG “New Interface”

  27. Scrolling Down Within Record

  28. New Interfaces: Simple and Advanced (no CSS as yet). Issues: • Times (forecast, paleo etc.) • BBOX (near poles and dateline) • Semantic vocabulary matching (exploiting a new NDG web service providing thesaurus content and ontology mapping)
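
The dateline part of the BBOX issue can be illustrated with a small intersection test. This is a sketch under stated assumptions, not the gateway's search logic: the `BBox` type and its field names are hypothetical, and the convention that `west > east` means "crosses the dateline" is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    west: float
    east: float
    south: float
    north: float

    def lon_ranges(self):
        """Split a dateline-crossing box into two ordinary longitude ranges."""
        if self.west <= self.east:
            return [(self.west, self.east)]
        return [(self.west, 180.0), (-180.0, self.east)]


def _overlap(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def intersects(a, b):
    """True if the two boxes share any latitude and any longitude."""
    if a.south > b.north or b.south > a.north:
        return False
    return any(
        _overlap(ra, rb) for ra in a.lon_ranges() for rb in b.lon_ranges()
    )


pacific = BBox(west=170.0, east=-170.0, south=-10.0, north=10.0)
near_dateline = BBox(west=175.0, east=179.0, south=-20.0, north=-5.0)
atlantic = BBox(west=-60.0, east=-10.0, south=-10.0, north=10.0)

print(intersects(pacific, near_dateline), intersects(pacific, atlantic))
```

A naive test that assumes `west <= east` would treat the first box as empty or as spanning most of the globe. Boxes that touch a pole need separate handling and are not covered here.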

  29. Metadata extensions and profiles ISO

  30. ISO19139. Background: designed to exploit as much as possible of the XML Schema machinery. Not designed for humans! Advice: use in conjunction with a clear concept of why it’s being used. Decide on dataset granularity, and use other metadata schema to describe how to use content (“A” metadata; e.g. an application schema of GML). Devise a profile with utility, then: restrict, restrict, restrict. Document. Register.

  31. On Restriction • ISO19139 is also about INTEROPERABILITY! • Don’t follow the ISO19139 advice and produce a new schema! • Ensure that your profile instances are valid vanilla ISO19139 • Restrict content out-of-band, e.g. schematron, etc. • Agree on how you’re going to deploy ISO19139
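
"Restrict content out-of-band" can be sketched as a set of profile rules applied to an instance that stays valid vanilla ISO19139. Here plain Python checks stand in for Schematron rules. The rule list, the function name and the abbreviated sample record are hypothetical; only the `gmd` and `gco` namespace URIs and element names come from ISO19139.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical profile: elements that are optional in the base schema but
# that this profile requires. Paths are relative to gmd:MD_Metadata.
RULES = [
    ("fileIdentifier must be present", "gmd:fileIdentifier/gco:CharacterString"),
    ("language must be present", "gmd:language/gco:CharacterString"),
]


def check_profile(xml_text):
    """Return the descriptions of all profile rules the record fails."""
    root = ET.fromstring(xml_text)
    failures = []
    for description, path in RULES:
        node = root.find(path, NS)
        if node is None or not (node.text or "").strip():
            failures.append(description)
    return failures


# Abbreviated record for illustration; a real instance carries more content.
RECORD = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>example-record-1</gco:CharacterString>
  </gmd:fileIdentifier>
</gmd:MD_Metadata>
"""

failures = check_profile(RECORD)
print(failures)
```

Because the rules live outside the schema, any consumer that only knows vanilla ISO19139 can still validate and parse the record.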

  32. On Extension • ISO19139 is also about INTEROPERABILITY! • Do follow the ISO19139 advice and produce a new schema! • Do what you need for your community, but: • Design so that code expecting ISO19139 instances can parse yours! • Make it easy for third party code to ignore your content!
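
From the consumer's side, the two design goals above might look like the sketch below: a reader that expects ISO19139, accepts an extended root element that declares its ISO type through the `gco:isoType` attribute, reads the core content, and skips children it does not recognise. The `ex` namespace, the element names `EX_Metadata` and `extraInfo`, and the function name are hypothetical. Comparing the attribute to the literal string `gmd:MD_Metadata` assumes the conventional `gmd` prefix.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
NS = {"gmd": GMD, "gco": GCO}


def read_iso_core(xml_text):
    """Return (file identifier, tags of ignored non-ISO child elements)."""
    root = ET.fromstring(xml_text)
    iso_type = root.get(f"{{{GCO}}}isoType")
    is_metadata = (
        root.tag == f"{{{GMD}}}MD_Metadata" or iso_type == "gmd:MD_Metadata"
    )
    if not is_metadata:
        raise ValueError("root element is not an ISO19139 metadata record")
    identifier = root.findtext(
        "gmd:fileIdentifier/gco:CharacterString", namespaces=NS
    )
    # Anything outside the gmd namespace is community content; skip it.
    ignored = [c.tag for c in root if not c.tag.startswith(f"{{{GMD}}}")]
    return identifier, ignored


# Hypothetical extended record: a community root element plus one extra child.
EXTENDED = """
<ex:EX_Metadata xmlns:ex="http://example.org/profile"
                xmlns:gmd="http://www.isotc211.org/2005/gmd"
                xmlns:gco="http://www.isotc211.org/2005/gco"
                gco:isoType="gmd:MD_Metadata">
  <gmd:fileIdentifier>
    <gco:CharacterString>example-record-1</gco:CharacterString>
  </gmd:fileIdentifier>
  <ex:extraInfo>community-specific content</ex:extraInfo>
</ex:EX_Metadata>
"""

identifier, ignored = read_iso_core(EXTENDED)
print(identifier, ignored)
```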

  33. Summary • NDG dealing with heterogeneous environment • Successful deployment of OAI with discovery metadata • (There are some issues differentiating between model simulations and ordering response sets) • Directly linking to and exploiting GML application schema • Web Service backends make deployment easier. • Communities need to be very careful how they deploy ISO19139
