
Discovering NERC's data - now and in the future

Linda Ault (BGS), Nic Bertrand (CEH), Nathan Cunningham (BAS), Steve Donegan (NEODC/BADC), Mike Howe (BGS), Rachel Heaven (BGS), Roy Lowry (BODC), Norman Morrison (NEBC).


Presentation Transcript


  1. Discovering NERC's data - now and in the future Linda Ault (BGS), Nic Bertrand (CEH), Nathan Cunningham (BAS), Steve Donegan (NEODC/BADC), Mike Howe (BGS), Rachel Heaven (BGS), Roy Lowry (BODC), Norman Morrison (NEBC)

  2. Introduction • Introduction – a talk in 2 parts! • Data Discovery at NERC Designated Data Centres (DDCs) • The NERC Data Discovery Service • Controlled Vocabularies, Taxonomies, Thesauri, & Ontologies • Conclusion

  3. Making data discoverable • Essential for advancement of science • First step in promoting reuse of data • Reduces duplication of effort in collecting primary data • Better value for taxpayers’ money • All NERC Designated Data Centres create and publish metadata to data portals to enable discovery • Centres: BADC, NEODC, NGDC (BGS), BODC, EIDC (CEH), AEDC (BAS)

  4. Data Flow • Science: produce metadata • National / Thematic: EIDC, BADC, NEODC, AEDC, NGDC, BODC • National / Aggregated: GIGateway, NERC Data Discovery Service, GoGeo! • International: GEOSS, INSPIRE, GMES, SEIS • Discover at multiple points

  5. Portals that access NERC data • NERC data centres provide data to many portals:

  6. Metadata production • Different centres, and therefore different methods of generating and recording metadata • Oracle, Postgres and XML databases, using information entered or transformed • NDG MOLES (“Metadata Objects Links in Environmental Science”), used at BADC, NEODC & BODC • Implementations of GCMD, NDG vocabularies and ontologies • Publication via Z39.50, OAI-PMH as well as provision of Open Layers (OGC WebMapService, WebFeatureService)

  7. Metadata formats/content • DIF 9.4 still used – but limited content and vocabulary content – being phased out • DMAG Metadata Content Subgroup: preparing a profile of ISO19115 as the “NERC profile”. • ISO19115: defines information required for describing geographic information and services • Is applicable to: Cataloguing of datasets, clearinghouse activities… • NERC profile of ISO19115 aims to be consistent with INSPIRE yet cover all DC activities

  8. Standards Interoperability • Levels of Interoperability: Semantic Interoperability, Syntactic Interoperability, Technical Interoperability, No Interoperability • Standards for Interoperability: SERONTO, Vocabularies, GeoSciML, OGC O&M, CSML, EML, ISO19115, TC211

  9. Metadata formats/content • ISO19115/19139 • DIF

  10. INSPIRE • INSPIRE is an EU framework to provide spatial harmonisation of geographic datasets within member states • INSPIRE will require all publicly owned “in-scope” metadata (i.e. NERC’s) to be in, or compatible with, the INSPIRE metadata standard (a profile of ISO19115) • Compliance by 2012 (spatial data services) • CSML/GeoSciML will provide a good basis/the format to address provision for some INSPIRE themes

  11. Harvesting the metadata • OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting): • Providers and Harvesters • A harvester takes full XML metadata and returns a copy to the local environment • Any format – however, Dublin Core must be provided to be OAI-PMH compliant • Support for deleted records, detection of changed records, regular harvesting • Works via HTTP
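
The harvesting cycle on slide 11 can be sketched in a few lines of Python. This is a minimal illustration, not the NERC harvester: the base URL `http://example.org/oai`, the record identifiers and the sample response are made up, and the response is parsed from a string so that the sketch runs without network access. The `verb`, `metadataPrefix`, `from` and `resumptionToken` arguments, the `oai_dc` prefix and the `status="deleted"` header attribute are part of OAI-PMH 2.0.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     resumption_token=None):
    """Build a ListRecords request URL (an HTTP GET, as on the slide)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # 'from' restricts the harvest to records changed since that date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return (record headers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    records = []
    for header in root.iter(OAI_NS + "header"):
        records.append({
            "identifier": header.findtext(OAI_NS + "identifier"),
            "datestamp": header.findtext(OAI_NS + "datestamp"),
            "deleted": header.get("status") == "deleted",
        })
    token = root.findtext(".//" + OAI_NS + "resumptionToken")
    return records, (token or None)

# Hypothetical provider response, shortened to headers only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:ds1</identifier>
      <datestamp>2008-01-15</datestamp></header></record>
    <record><header status="deleted"><identifier>oai:example.org:ds2</identifier>
      <datestamp>2008-02-01</datestamp></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A regular harvest would repeat the request with each returned token until none is left, then store the latest datestamp for the next `from` value.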

  12. NDG MOLES • A data production tool is deployed at an observation station on behalf of an activity to produce a data entity. • The data entity object is used for discovery purposes (currently transformed to DIF) • MOLES v2/v3 soon… • MOLES will soon allow a new method for publishing metadata to portals • ATOM RSS feed will allow instant update of discovery records in the upgraded DDS
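
The relationship stated in the first bullet of slide 12 can be written down as a small data model. The class and field names below simply mirror the wording of the slide and are assumptions for illustration; they are not the actual MOLES schema, and the example values are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Activity:
    name: str

@dataclass(frozen=True)
class DataProductionTool:
    name: str

@dataclass(frozen=True)
class ObservationStation:
    name: str

@dataclass(frozen=True)
class DataEntity:
    title: str

@dataclass(frozen=True)
class Deployment:
    """A tool deployed at a station, on behalf of an activity, producing data."""
    tool: DataProductionTool
    station: ObservationStation
    activity: Activity
    produces: DataEntity

def discovery_summary(dep: Deployment) -> str:
    # The data entity is the object exposed for discovery; the other three
    # objects supply its context.
    return (f"{dep.produces.title}: produced by {dep.tool.name} "
            f"at {dep.station.name} for {dep.activity.name}")

# Invented example values.
deployment = Deployment(
    tool=DataProductionTool("rain gauge"),
    station=ObservationStation("Station A"),
    activity=Activity("Project X"),
    produces=DataEntity("Hourly rainfall"),
)
summary = discovery_summary(deployment)
```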

  13. NERC Data Discovery Service • The NERC DDS uses an OAI harvester to import dataset XML metadata records from providers in DIF format • Makes them searchable via semantic or spatio-temporal searches • Based on NERC DataGrid (NDG) technologies • Underpinning technologies can be used to provide visualisation services (e.g. Environmental Data Portal)
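
A spatio-temporal search of the kind mentioned on slide 13 comes down to two overlap tests per record. The sketch below is an assumption about how such a filter could look, not the DDS implementation: the `Record` fields and the sample records are hypothetical, and bounding boxes that cross the antimeridian are not handled.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Record:
    title: str
    west: float
    south: float
    east: float
    north: float
    start: date
    end: date

def bbox_intersects(r, west, south, east, north):
    # Two boxes overlap when they overlap on both axes.
    return (r.west <= east and r.east >= west
            and r.south <= north and r.north >= south)

def time_overlaps(r, start, end):
    # Two closed intervals overlap when each starts before the other ends.
    return r.start <= end and r.end >= start

def search(records, bbox, period):
    return [r.title for r in records
            if bbox_intersects(r, *bbox) and time_overlaps(r, *period)]

# Hypothetical discovery records.
RECORDS = [
    Record("North Sea CTD casts", -4.0, 51.0, 9.0, 61.0,
           date(1990, 1, 1), date(1995, 12, 31)),
    Record("Antarctic ice cores", -180.0, -90.0, 180.0, -60.0,
           date(1980, 1, 1), date(2000, 12, 31)),
    Record("North Sea trawl survey", -4.0, 51.0, 9.0, 61.0,
           date(2005, 1, 1), date(2006, 12, 31)),
]

hits = search(RECORDS,
              bbox=(0.0, 50.0, 5.0, 55.0),
              period=(date(1994, 1, 1), date(1996, 1, 1)))
```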

  14. NERC DDS: the (near) future • NERC Medium Size Initiative (NDG3) intends to improve upon the existing DDS • Better geospatial searching • Results visualisation (similar to the Environmental Data Portal (EDP)) • Use of ISO19115 • Allow “instant” harvesting via ATOM RSS feeds (i.e. MOLES v2+) & use ISO as main format • Provide more intelligent searching (allow ranking by relevance, proximity etc.) • Improved usage of vocabularies and ontologies to allow increasingly “intelligent” searching

  15. Conclusions for Part I • Data Centres serve diverse communities • Spatial data discovery underpinned by the emergence of spatial standards • Working together to achieve interoperability (saves effort, avoids reinventing the wheel, builds on the strength of a discipline-based approach) • Many entry points for discovery • Consistent with, and contributing to, legal frameworks and major international initiatives • Semantic interoperability critical • Updates for NERC DDS will provide greater functionality for users and providers • Licensing / Access / Use Constraints

  16. Part II: Controlled Vocabularies, Thesauri and Ontologies

  17. Six degrees of separation • 1967 - Psychologist Stanley Milgram, Harvard University • Asked individuals to send a package to a certain person in Boston • Recipient described only by name, some general features and the fact that they lived in Boston… • 64 of the 300 packages made it to the designated recipients! • Example address: “To John Morrison, The Isle of Harris (I think his brother’s name is Donald)”

  18. Linking environmental data - I • What environmental data are available from samples taken from freshwater found in East Africa?

  19. Linking environmental data - I • Find: freshwater found in East Africa • Concept 1 ‘Freshwater’: Pond, Lake, Puddle, Stream, Groundwater, Tapwater • Concept 2 ‘found in’: Found in, Located at, Located in, Placed in, From • Concept 3 ‘East Africa’: Kenya, Tanzania, Lake Albert, Entebbe, Ethiopia, Nairobi, Rwenzori Mountains
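
The decomposition on slide 19 amounts to query expansion: each concept in the question is replaced by the set of terms linked to it in a vocabulary, and every combination becomes a candidate search phrase. The sketch below uses the three term lists from the slide; the dictionary layout and the `expand` function are assumptions made for illustration.

```python
from itertools import product

# Term lists as shown on slide 19, keyed by the concept they expand.
CONCEPTS = {
    "freshwater": ["pond", "lake", "puddle", "stream", "groundwater",
                   "tapwater"],
    "found in": ["found in", "located at", "located in", "placed in", "from"],
    "east africa": ["kenya", "tanzania", "lake albert", "entebbe",
                    "ethiopia", "nairobi", "rwenzori mountains"],
}

def expand(query_concepts):
    """Return every phrase obtained by swapping concepts for linked terms."""
    term_lists = []
    for concept in query_concepts:
        linked = [t for t in CONCEPTS.get(concept, []) if t != concept]
        # Keep the original concept first so the literal query comes first.
        term_lists.append([concept] + linked)
    return [" ".join(combo) for combo in product(*term_lists)]

phrases = expand(["freshwater", "found in", "east africa"])
```

A dataset annotated with "lake located in kenya" is then found by a search for "freshwater found in east africa", although the two phrases share no words.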

  20. Linking environmental data - II • What environmental data are available from samples taken from deep sea thermal vents in the Pacific Ocean? • Let’s suppose this query returns very few hits • Another advantage of anchoring annotation terms to an ontology is that you can automatically ‘relax’ the search space, • enabling retrieval of related datasets from other marine habitats, such as coral reef atoll or oceanic trench.

  21. Linking environmental data - II • Concept 1: deep sea thermal vent, coral reef atoll, oceanic trench, mountain, air stream, … • Concept 2 ‘Located in’: Found in, Located at, Located in, Placed in • Concept 3: marine habitat, terrestrial habitat, aerial habitat, arboreal habitat, …
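
The ‘relaxed’ search described on slides 20 and 21 can be sketched as a fallback from a term to its siblings under the same parent concept. The slides state that coral reef atoll and oceanic trench are marine habitats alongside deep sea thermal vents; the other parent links, the dataset names and the `min_hits` threshold below are assumptions for illustration.

```python
# 'located in' links from a habitat term to its parent concept.
PARENT = {
    "deep sea thermal vent": "marine habitat",
    "coral reef atoll": "marine habitat",
    "oceanic trench": "marine habitat",
    "mountain": "terrestrial habitat",   # assumed link
    "air stream": "aerial habitat",      # assumed link
}

# Hypothetical datasets, each annotated with one ontology term.
DATASETS = [
    ("vent chemistry survey", "deep sea thermal vent"),
    ("reef transect", "coral reef atoll"),
    ("trench sediment cores", "oceanic trench"),
    ("upland soils", "mountain"),
]

def search(term, datasets, min_hits=2):
    """Exact-term search that widens to sibling terms when hits are scarce."""
    hits = [name for name, tag in datasets if tag == term]
    parent = PARENT.get(term)
    if len(hits) >= min_hits or parent is None:
        return hits
    # Relax: accept any term that shares the same parent concept.
    siblings = {t for t, p in PARENT.items() if p == parent}
    return [name for name, tag in datasets if tag in siblings]

relaxed = search("deep sea thermal vent", DATASETS)
```

Without the ontology link, the same query would return only the single dataset tagged with the exact term.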

  22. Common Reference Frameworks • NDG Vocabulary Server • provides access to lists of standardised terms that cover a broad spectrum of disciplines of relevance to the oceanographic and wider community • http://www.bodc.ac.uk/products/web_services/vocab/ • Marine Metadata Interoperability (MMI) • Portal site for promoting the exchange, integration and use of marine data through enhanced data publishing, discovery, documentation and accessibility • http://marinemetadata.org/ • GCMD • NASA’s Global Change Master Directory: Directory based access to more than 25,000 descriptions of earth science data sets and services covering all aspects of earth and environmental sciences. • http://gcmd.nasa.gov/

  23. Common Reference Frameworks continued • The Environment Ontology (EnvO) • An open-source, community-based Environment Ontology (EnvO); and Gazetteer (Gaz). Aims to support the semantically consistent description of environmental information associated with biological data of any organism or biological sample. • http://www.environmentontology.org/ • Socio-Ecological Research and Observation oNTology (SERONTO) • SERONTO is an ontology developed within ALTER-Net, a Long Term Biodiversity, Ecosystem, and Awareness Research Network funded by the European Union. • http://www.tdwg.org/proceedings/article/view/364 • GeoSciML • GeoSciML is a standards-based interchange format that provides a framework for application-neutral encoding of geoscience thematic data and related spatial data and is based on an agreed conceptual data model. • http://www.geosciml.org/

  24. Conclusions • Part II - CVs, Thesauri & Ontologies • The use of common reference frameworks can help us describe the meaning and structure of scientific language and concepts, such that data can be more accurately described and linked together. • There are non-orthogonal semantics projects in active development in NERC. • There are possible benefits for NERC through: • the identification and propagation of best practice. • increasing the sharing of semantic resources and technology. • looking to the international community for examples of best practice and emerging standards.

  25. Appendix slides follow… • (The following slides are included as they may be useful for the discussion session)

  26. Best practice within the international community • OWL • The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. • OBO • Also a format for authoring ontologies. Developed for use in the biological / biomedical domain. • OBO Foundry • An effort with the goal of creating a suite of orthogonal interoperable reference ontologies in the biomedical domain • SKOS • Simple Knowledge Organization System (SKOS) is a family of formal languages designed for representation of thesauri and controlled vocabularies • ‘SKOS Extensions’ are intended to provide ways to declare relationships between concepts with more specific semantics than the simple "broader-narrower", such as class-instance or partitive relationships.
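
The "broader-narrower" relationship mentioned for SKOS can be illustrated with a plain dictionary. This is a sketch under stated assumptions rather than a SKOS implementation: the concept labels are invented, no RDF library is used, and each concept is given a single broader concept although SKOS allows several.

```python
# Maps each concept to its broader concept (the role of skos:broader).
BROADER = {
    "lake": "freshwater body",
    "pond": "freshwater body",
    "freshwater body": "water body",
}

def broader_transitive(concept):
    """Follow broader links upwards and return the chain of ancestors."""
    chain = []
    seen = {concept}
    while concept in BROADER:
        concept = BROADER[concept]
        if concept in seen:
            # Guard against a cycle in a badly formed vocabulary.
            break
        seen.add(concept)
        chain.append(concept)
    return chain

def narrower(concept):
    """Return the direct children of a concept (the inverse of broader)."""
    return sorted(c for c, b in BROADER.items() if b == concept)

ancestors = broader_transitive("lake")
children = narrower("freshwater body")
```

The simple link says only that one concept is more general than another; distinguishing class-instance from part-whole relationships is what the extensions on the slide are meant to add.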
