1 / 65

BADC, BODC, CCLRC, PML and SOC

Distributed Data, Distributed Governance, Distributed Vocabularies, with a dash of CLADDIER: The NERC DataGrid. Bryan Lawrence (on behalf of a big team, and note also a substantial piece of work with specific authorship included herein). +. +. ]=. +[. +. +. BADC, BODC, CCLRC, PML and SOC.

javier
Download Presentation

BADC, BODC, CCLRC, PML and SOC

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed Data, Distributed Governance, Distributed Vocabularies, with a dash of CLADDIER: The NERC DataGrid Bryan Lawrence (on behalf of a big team, and note also a substantial piece of work with specific authorship included herein) + + ]= +[ + + BADC, BODC, CCLRC, PML and SOC

  2. Introduction to the BADC Motivation Standards Feature Types Taxonomy Overall Architecture NDG Products Discovery Portal Data Extractor MOLES (NumSim relationship with NMM) CSML CSML Description Prototyping in MarineXML Round-Tripping Vocabulary Issues IN NDG (Hughes, Kondapalli, Lowry) NDG Timeline Outline

  3. The BADC role is to assist UK researchers to locate, access and interpret atmospheric data and to ensure the long-term integrity of atmospheric data produced by NERC projects. Facilitation and Curation/Preservation! BADC Role

  4. A BADC dataset is an aggregation of data files, documents and metadata sharing common administrative policies. These policies could be file validation, access control or retention schemes. Datasets vary from TBs in millions of files to a few MBs in a single file. There are presently over 100 datasets. BADC Data Holdings

  5. BADC User examples • Atmospheric chemistry models. • Pollution chemistry measurement campaigns.

  6. User examples • Bird feeding habits.

  7. User examples • Wind power research. • Radio communication modelling. • A & E influenza cases.

  8. User examples • Castle mortar decay. • Discomfort indices.

  9. User Diversity These stats quite old now, but the distributions haven’t changed.

  10. Typically, two-thirds of this data will never see the light of the day: why? March 2006, 2.5 PB Climate in 20010 – A graphic Illustration Figures from Gary Strand, NCAR, ESG website No one can remember what it was, or, if they can remember that, where it is!

  11. http://www.realclimate.org/index.php?p=121 What McIntyre got right: Data as Evidence http://www.uoguelph.ca/~rmckitri/research/trcback.html

  12. Data Retention Policies University of Cambridge Research Division: Data generated in the course of research should be kept securely in paper or electronic format, as appropriate. Back-up records should always be kept for data stored on a computer. The [AMRC] considers a minimum of ten years to be an appropriate period. However, research based on clinical samples or relating to public health may require longer storage to allow for long-term follow-up to occur. [AMRC: Association of Medical Research Charities] University of Oxford Research Services Office: A successful laboratory notebook allows for ready verification of quality and integrity of research data and enables another investigator to reproduce the procedure which has been documented and get the same result. …. A successful laboratory notebook allows for ready verification of quality and integrity of research data and enables another investigator to reproduce the procedure which has been documented and get the same result. Natural Environment Research Council: … Scientists will frequently process the data they have collected selectively, or with specific application packages, in order to prepare material for publication in the scientific literature. But the full value of the data collected may only be realised if the entire dataset is subjected to generic processing (eg to ensure calibration and adequate quality control) and is sufficiently documented to allow others to re-use it at a later date. The original collector may be the only person in a position to undertake such work, and so to unlock the full potential of the data. Those holding data collected under NERC funding will be expected to cooperate in validating and publishing them in their entirety - when this can be justified in terms of their scientific value - rather than merely creaming off a subset for immediate publication in the literature. …

  13. NCAR Complexity + Volume + Remote Access = Grid Challenge British Atmospheric Data Centre http://ndg.nerc.ac.uk British Oceanographic Data Centre

  14. No one would change their data storage systems! Need to support a wide range of “metadata-maturity”! No NDG-wide user management system possible. It is illegal to share user information without each and every user agreeing … implies no way of having one virtual organisation with common user management! With a large enough group it is impossible to agree on common roles that could be associated with access control. … but we want single-sign on … and trust relationships between data providers … NDG Assumptions

  15. Want interdisciplinary semantic access to information, not abstract data getData(potential temperature from ERA-40 dataset in North Atlantic from 1990 to 2000) not: getData(“era40.nc”, ‘PTMP’, 20:50, 300:340, 190:200) or even worse: for j=1990:2000 getData(“era40_”+j+“.nc”, ‘PTMP’, 20:50, 300:340) Lossy is OK! Care less about completeness of representation than semantic unification Integration – semantics

  16. ISO 19101: Geographic information – Reference model …in a defined logical structure… …delivered through services… …and described by metadata. A geospatial dataset… …consists of features and related objects… Standards

  17. Standards • Geographic ‘features’ • “abstraction of real world phenomena” [ISO 19101] • Type or instance • Encapsulate important semantics in universe of discourse • “Something you can name” • Application schema • Defines semantic content and logical structure • ISO standards provide toolkit: • spatial/temporal referencing • geometry (1-, 2-, 3-D) • topology • dictionaries (phenomena, units, etc.) • GML – canonical encoding [from ISO 19109 “Geographic information – Rules for Application Schema”]

  18. CSML NCML+CF MOLES THREDDS CLADDIER DIF -> ISO19115 Architecture: NDG Metadata Taxonomy … not one schema, not one solution!

  19. Vocab Services Users NDG GUI Interface(s) Data Providers NDG Core Services Architecture: Deployment

  20. Vocab Services Users NDG GUI Interface(s) NDG Core Services Architecture: Deployment

  21. Vocab Services Users NDG GUI Interface(s) Architecture: Deployment

  22. Vocab Services Users Architecture: Deployment

  23. Current Status

  24. Core linking concept is the deployment MOLES: implementation of a Data Production Tool at an Observation Station on behalf of an Activity that produces a Data Entity Activity DataProductionTool ObservationStation Links the metadata records into a structure that can be turned into a navigable structure Deployment Each of the main metadata objects has security data attached to it. This means that this can be applied to queries on the metadata Data Entity

  25. Simulators as data production tools: NumSim NDG Products: NumSim

  26. NumSim Example NumSim Example

  27. International Discovery - Climate

  28. NDG “Pseudo-Demo” EXPLOITING DISCOVERY WEB SERVICE (running interface on my laptop last night)

  29. Scrolling Down More Browse

  30. MOLES Navigation Actually, this is where we plan to use NumSim and Numerical Model Metadata

  31. MOLES to Secure Dx

  32. Offering up trusted host list … NDG Authentication

  33. Data Extractor

  34. Aims: provide semantic integration mechanism for NDG data explore new standards-based interoperability framework emphasise content, not container Design principles: offload semantics onto parameter type (‘phenomenon’, observable, measurand) e.g. wind-profiler, balloon temperature sounding offload semantics onto CRS e.g. scanning radar, sounding radar ‘sensible plotting’ as discriminant ‘in-principle’ unsupervised portrayal explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit) NDG-A: Climate Science Modelling Language

  35. CSML feature types defined on basis of geometric and topologic structure Climate Science Modelling Language

  36. CSML feature types examples... ProfileSeriesFeature ProfileFeature GridFeature Climate Science Modelling Language

  37. Numerical array descriptors provides ‘wrapper’ architecture for legacy data files ‘Connected’ to data model numerical content through ‘xlink:href’ Three subtypes: InlineArray ArrayGenerator FileExtract (NASAAmes, NetCDF, GRIB) Composite design pattern for aggregation Climate Science Modelling Language

  38. Inline array Array generator Climate Science Modelling Language <NDGInlineArray> <arraySize>5 2</arraySize> <uom>udunits.xml#degreeC</uom> <numericType>float</numericType> <regExpTransform>s/10/9/ge</regExpTransform> <numericTransform>+5</numericTransform> <values>1 2 3 4 5 6 7 8 9 10</values> </NDGInlineArray> <NDGArrayGenerator> <arraySize>10001</arraySize> <uom>udunits.xml#minute</uom> <numericType>float</numericType> <expression>0:5:50000</expression> </NDGArrayGenerator>

  39. File extract Climate Science Modelling Language <NDGNASAAmesExtract> <arraySize>526</arraySize> <numericType>double</numericType> <fileName>/data/BADC/macehead/mh960606.cf1</fileName> <variableName>CFC-12</variableName> </NDGNASAAmesExtract> <NDGNetCDFExtract gml:id="feat04azimuth"> <arraySize>10000</arraySize> <fileName>radar_data.nc</fileName> <variableName>az</variableName> </NDGNetCDFExtract> <NDGGRIBExtract> <arraySize>320 160</arraySize> <numericType>double</numericType> <fileName>/e40/ggas1992010100rsn.grb</fileName> <parameterCode>203</parameterCode> <recordNumber>5</ recordNumber> <fileOffset>289412</fileOffset> </NDGGRIBExtract>

  40. MarineXML Testbed For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FT) defined by CSML. The FT’s and XSLT are maintained in a ‘MarineXML registry’ Phenomena in the XSD must have an associated portrayal Data from different parts of the marine community conforming to a variety of schema (XSD) The FTs can then be translated to equivalent FTs for display in the ECDIS system XSD XML Biological Species S52 Portrayal Library XSD XML Chl-a from Satellite XML Parser MarineGML(NDG) Feature Types XSLT XML XSLT XSLT SENC SeeMyDENC XSD MeasuredHydrodynamics XML XSLT XML XSLT XSLT ECDIS acts as an example client for the data. XSD Data Dictionary XML ModelledHydrodynamics The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features Features in the source XSD must be present in the data dictionary. XSD Feature described using S-57v3.1Application Schema can be imported and are equivalent to the same features in CSML’ XML S-57v3 GML Slide adapted from Kieran Millard (AUKEGGS, 2005)

  41. MarineXML Testbed Biological sampling station with attributes for the species sampled at each Grid of Chl-a from the MERIS instrument on ENVISAT Predicted and measured wave climate timeseries (height, direction and period) Vectors of currents from instruments Slide adapted from Kieran Millard (AUKEGGS, 2005)

  42. All this requires agreement on standards The Concept of re-using Features Here structured XML is converted to plain ascii text in the form required for a numerical model HTML warning service pages are generated ‘on the fly’ Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts. XML can also be converted to SVG to display data graphically Slide adapted from Kieran Millard (AUKEGGS, 2005)

  43. Other User Example: Norwegian Met Office CSML Present

  44. conceptual model New Dataset Conforms to 101010 UGAS produces <gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList> XML V1.0 will be in NDG Alpha GML app schema GML dataset Application instance parser CSML Round Tripping - 1 Managing semantics

  45. V1.0 in NDG Alpha CF Dataset scanner 101010 CF produces <gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList> XML V1.0 in NDG Alpha GML app schema GML dataset Application instance parser CSML Round Tripping - 2 Managing data - 1

  46. CF Dataset CF Dataset 101010 101010 Define Dataset DECISION PROCESSES <gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList> XML Add Information GML dataset Managing Data 2 scanner XSLT PUBLISH ISO19115

  47. CSML (and friends): software stack

  48. Future of CSML Some new features: (Swath, Ragged ProfileSeries) But more importantly, factor out the storage and introduce the concept of “processing affordance”. (see http://home.badc.rl.ac.uk/lawrence/papers/ExeterCommunique06.pdf

  49. Architecture: Deployment

  50. Vocabulary Management for NERC DataGrid Michael Hughes, V.Siva Kondapalli and Roy Lowry

More Related