Discovering NERC's data - now and in the future
Linda Ault (BGS), Nic Bertrand (CEH), Nathan Cunningham (BAS), Steve Donegan (NEODC/BADC), Mike Howe (BGS), Rachel Heaven (BGS), Roy Lowry (BODC), Norman Morrison (NEBC)
Introduction
• Introduction – a talk in 2 parts!
• Data Discovery at NERC Designated Data Centres (DDCs)
• The NERC Data Discovery Service
• Controlled Vocabularies, Taxonomies, Thesauri & Ontologies
• Conclusion
Making data discoverable
• Essential for advancement of science
• First step in promoting reuse of data
• Reduces duplication of effort for collecting primary data
• Better value for taxpayers’ money
• All NERC Designated Data Centres create and publish metadata to data portals to enable discovery
• Centres: BADC, NEODC, NGDC (BGS), BODC, EIDC (CEH), AEDC (BAS)
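Data Flow
[Diagram: metadata flows from the science level up through successive tiers of portals]
• International: GEOSS, INSPIRE, GMES, SEIS
• National / Aggregated: GIGateway, NERC Data Discovery Service, GoGeo!
• National / Thematic: EIDC, BADC, NEODC, AEDC, NGDC, BODC
• Science: produce metadata
• Discover at multiple points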
Portals that access NERC data
• NERC data centres provide data to many portals:
Metadata production
• Different centres use different methods of generating and recording metadata
• Oracle, Postgres and XML databases, populated with information that is entered directly or transformed
• NDG MOLES (“Metadata Objects for Linking Environmental Sciences”): used at BADC, NEODC & BODC
• Implementations of GCMD and NDG vocabularies and ontologies
• Publication via Z39.50 and OAI-PMH, as well as provision of Open Layers (OGC WebMapService, WebFeatureService)
Metadata formats/content
• DIF 9.4 is still used, but its content and vocabulary support are limited; it is being phased out
• DMAG Metadata Content Subgroup is preparing a profile of ISO19115 as the “NERC profile”
• ISO19115 defines the information required for describing geographic information and services
• It is applicable to cataloguing of datasets, clearinghouse activities…
• The NERC profile of ISO19115 aims to be consistent with INSPIRE yet cover all DC activities
Standards Interoperability
[Diagram: “Levels of Interoperability” set against “Standards for Interoperability”]
• Levels shown: semantic interoperability, syntactic interoperability, technical interoperability, no interoperability
• Standards shown: SERONTO, vocabularies, GeoSciML, OGC O&M, CSML, EML, ISO19115, TC211
Metadata formats/content
[Figure panels: ISO19115/19139, DIF]
INSPIRE
• INSPIRE is an EU framework to provide spatial harmonisation of geographic datasets within member states
• INSPIRE will require all publicly owned “in-scope” metadata (including NERC’s) to be in, or compatible with, the INSPIRE metadata standard (a profile of ISO19115)
• Compliance by 2012 (spatial data services)
• CSML/GeoSciML will provide a good basis, or the format itself, for addressing some INSPIRE themes
Harvesting the metadata
• OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting):
  • Providers and harvesters
  • A harvester requests the full XML metadata from a provider and stores a copy in its local environment
  • Any format can be used; however, Dublin Core must also be provided to be OAI-PMH compliant
  • Support for deleted records, detection of changed records, regular harvesting
  • Works via HTTP
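As a rough illustration of the harvesting cycle, the sketch below builds a `ListRecords` request and reads record identifiers, deletion status and the resumption token from a response. The endpoint `http://example.org/oai` and the sample response are made up for illustration; only the verb, the argument names and the XML namespace come from the OAI-PMH protocol. It is not the harvester used by any NERC service.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical response from a provider, trimmed to the headers only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:ds1</identifier></header></record>
    <record><header status="deleted"><identifier>oai:example.org:ds2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, token=None):
    """Build a ListRecords request; a resumption token replaces all other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # only records changed since this date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, is_deleted), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for header in root.findall(".//oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        records.append((identifier, header.get("status") == "deleted"))
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return records, (token or None)
```

A harvester would fetch each URL over HTTP, apply `parse_list_records` to the body, and repeat with the returned token until none is left. Passing `from_date` on later runs is how changed and deleted records are picked up during regular harvesting.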
NDG MOLES
• A data production tool is deployed at an observation station on behalf of an activity to produce a data entity
• The data entity object is used for discovery purposes (currently transformed to DIF)
• MOLES v2/v3 soon…
• MOLES will soon allow a new method for publishing metadata to portals
• An Atom/RSS feed will allow instant update of discovery records in the upgraded DDS
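The relationship between the four MOLES objects can be pictured with a few linked classes. This is only a sketch: the class names mirror the wording of the slide, but the fields, the `discovery_record` method and its keys are hypothetical and do not reproduce the MOLES schema or the DIF transformation.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str


@dataclass
class ObservationStation:
    name: str


@dataclass
class DataProductionTool:
    name: str


@dataclass
class DataEntity:
    """A dataset, linked to the activity, tool and station that produced it."""
    title: str
    activity: Activity
    tool: DataProductionTool
    station: ObservationStation

    def discovery_record(self):
        """Flatten the linked objects into a simple record for discovery."""
        return {
            "title": self.title,
            "activity": self.activity.name,
            "tool": self.tool.name,
            "station": self.station.name,
        }


# Hypothetical example values.
entity = DataEntity(
    title="Hourly air temperature",
    activity=Activity("Example monitoring campaign"),
    tool=DataProductionTool("Example thermometer"),
    station=ObservationStation("Example station"),
)
```

Calling `entity.discovery_record()` gives the flat view that a discovery portal would index, while the linked objects keep the fuller context.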
NERC Data Discovery Service
• The NERC DDS uses an OAI harvester to import dataset XML metadata records in DIF format from providers
• Makes them searchable via semantic or spatio-temporal searches
• Based on NERC DataGrid (NDG) technologies
• Underpinning technologies can be used to provide visualisation services (e.g. the Environmental Data Portal)
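A spatio-temporal search of the kind mentioned above comes down to two overlap tests per record. The sketch below is a minimal in-memory version under stated assumptions: the `Record` fields and sample values are hypothetical, bounding boxes are in decimal degrees, and boxes that cross the 180° meridian are not handled. It does not describe how the DDS itself is implemented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    title: str
    west: float
    south: float
    east: float
    north: float
    start: date
    end: date


def boxes_intersect(rec, west, south, east, north):
    """True if the record's bounding box overlaps the query box."""
    return (rec.west <= east and rec.east >= west
            and rec.south <= north and rec.north >= south)


def periods_overlap(rec, start, end):
    """True if the record's time coverage overlaps the query period."""
    return rec.start <= end and rec.end >= start


def search(records, bbox, period):
    """Return titles of records matching both the box and the period."""
    return [r.title for r in records
            if boxes_intersect(r, *bbox) and periods_overlap(r, *period)]


# Hypothetical catalogue entries.
RECORDS = [
    Record("North Sea cruise", -2.0, 52.0, 6.0, 58.0, date(1998, 5, 1), date(1998, 6, 1)),
    Record("Antarctic ice cores", -70.0, -80.0, -50.0, -65.0, date(2001, 1, 1), date(2003, 1, 1)),
]
```

For instance, `search(RECORDS, (0.0, 50.0, 10.0, 60.0), (date(1998, 1, 1), date(1998, 12, 31)))` keeps only the first record, because the second fails the bounding-box test.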
NERC DDS: the (near) future
• NERC Medium Size Initiative (NDG3) intends to improve upon the existing DDS:
  • Better geospatial searching
  • Results visualisation (similar to the Environmental Data Portal (EDP))
  • Use of ISO19115
  • Allow “instant” harvesting via Atom/RSS feeds (i.e. MOLES v2+) and use ISO as the main format
  • Provide more intelligent searching (allow ranking by relevance, proximity, etc.)
  • Improved usage of vocabularies and ontologies to allow increasingly “intelligent” searching
Conclusions for Part I
• Data Centres serve diverse communities
• Spatial data discovery is underpinned by the emergence of spatial standards
• Working together to achieve interoperability (saves effort, avoids reinventing the wheel, builds on the strength of the discipline-based approach)
• Many entry points for discovery
• Consistent with, and contributing to, legal frameworks and major international initiatives
• Semantic interoperability is critical
• Updates to the NERC DDS will provide greater functionality for users and providers
• Licensing / Access / Use Constraints
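To make the idea of feed-driven harvesting concrete, the sketch below reads an Atom feed and lists the entries updated since the last harvest. The feed content and identifiers are invented for illustration, and nothing here is taken from the NDG3 design; only the Atom namespace and the `entry`, `id` and `updated` elements come from the Atom format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed published by a data centre.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>urn:example:ds1</id><updated>2009-02-01T00:00:00Z</updated></entry>
  <entry><id>urn:example:ds2</id><updated>2009-03-05T09:30:00Z</updated></entry>
</feed>"""


def entries_updated_since(feed_xml, last_harvest):
    """Return ids of feed entries updated after the previous harvest time.

    last_harvest must be a timezone-aware datetime.
    """
    root = ET.fromstring(feed_xml)
    changed = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", namespaces=ATOM_NS)
        # Atom timestamps may end in "Z"; rewrite it so fromisoformat accepts it.
        stamp = datetime.fromisoformat(updated.replace("Z", "+00:00"))
        if stamp > last_harvest:
            changed.append(entry_id)
    return changed
```

With a last harvest time of `datetime(2009, 3, 1, tzinfo=timezone.utc)`, only the second entry is reported, so the discovery service would need to refresh a single record rather than re-harvest everything.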
Six degrees of separation
To John Morrison, The Isle of Harris (I think his brother’s name is Donald)
• 1967: psychologist Stanley Milgram, Harvard University
• Asked individuals to send a package to a certain person in Boston
• The recipient was described only by name, some general features and the fact that they lived in Boston…
• 64 of about 300 packages made it to the designated recipient!
Linking environmental data - I
What environmental data are available from samples taken from freshwater found in East Africa?
Linking environmental data - I
Find: freshwater found in East Africa
• Concept 1 ‘Freshwater’: pond, lake, puddle, stream, groundwater, tapwater
• Concept 2 ‘found in’: found in, located at, located in, placed in, from
• Concept 3 ‘East Africa’: Kenya, Tanzania, Lake Albert, Entebbe, Ethiopia, Nairobi, Rwenzori Mountains
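The expansion shown on this slide can be sketched as a lookup followed by a Cartesian product. The term lists are copied from the slide; the `EXPANSIONS` table and the `expand_query` function are illustrative names, and a real system would draw the terms from a vocabulary or ontology rather than a hard-coded dictionary.

```python
from itertools import product

# Terms taken from the slide; a vocabulary server would supply these in practice.
EXPANSIONS = {
    "freshwater": ["pond", "lake", "puddle", "stream", "groundwater", "tapwater"],
    "found in": ["found in", "located at", "located in", "placed in", "from"],
    "East Africa": ["Kenya", "Tanzania", "Lake Albert", "Entebbe", "Ethiopia",
                    "Nairobi", "Rwenzori Mountains"],
}


def expand_query(concepts):
    """Return every phrase formed by choosing one term per concept."""
    term_lists = [EXPANSIONS.get(c, [c]) for c in concepts]
    return [" ".join(terms) for terms in product(*term_lists)]


phrases = expand_query(["freshwater", "found in", "East Africa"])
```

Three concepts with 6, 5 and 7 terms produce 210 candidate phrases, such as "lake located in Kenya", which is why matching on concepts rather than literal strings matters for discovery.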
Linking environmental data - II
What environmental data are available from samples taken from deep sea thermal vents in the Pacific Ocean?
• Let’s suppose this query returns very few hits
• Another advantage of anchoring annotation terms to an ontology is that you can automatically ‘relax’ the search space
• This enables retrieval of related datasets from other marine habitats, such as coral reef atoll or oceanic trench
Linking environmental data - II
• Concept 1: deep sea thermal vent, coral reef atoll, oceanic trench, mountain, air stream, …
• Concept 2 ‘Located in’: found in, located at, located in, placed in
• Concept 3: marine habitat, terrestrial habitat, aerial habitat, arboreal habitat, …
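Relaxing a search through an ontology can be sketched as: look up the parent concept of the query term, then accept any sibling under that parent. The parent assignments below are one reading of the slide's diagram and the dataset names are invented, so treat both as assumptions; the `min_hits` threshold is also arbitrary.

```python
# Assumed term -> parent concept links, read off the diagram.
PARENT = {
    "deep sea thermal vent": "marine habitat",
    "coral reef atoll": "marine habitat",
    "oceanic trench": "marine habitat",
    "mountain": "terrestrial habitat",
    "air stream": "aerial habitat",
}


def relax(term):
    """Return the term followed by its siblings under the same parent concept."""
    parent = PARENT.get(term)
    if parent is None:
        return [term]
    siblings = [t for t, p in PARENT.items() if p == parent and t != term]
    return [term] + siblings


def search(datasets, term, min_hits=2):
    """Exact search first; widen to sibling habitats if too few hits come back."""
    hits = [name for name, habitat in datasets if habitat == term]
    if len(hits) >= min_hits:
        return hits
    allowed = set(relax(term))
    return [name for name, habitat in datasets if habitat in allowed]


# Hypothetical annotated datasets: (name, habitat term).
DATASETS = [
    ("vent survey", "deep sea thermal vent"),
    ("reef cores", "coral reef atoll"),
    ("summit snow", "mountain"),
]
```

A query for "deep sea thermal vent" finds only one exact match, so the search widens to other marine habitats and also returns "reef cores", while "summit snow" stays excluded because it sits under a different parent.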
Common Reference Frameworks
• NDG Vocabulary Server
  • Provides access to lists of standardised terms that cover a broad spectrum of disciplines of relevance to the oceanographic and wider community
  • http://www.bodc.ac.uk/products/web_services/vocab/
• Marine Metadata Interoperability (MMI)
  • Portal site for promoting the exchange, integration and use of marine data through enhanced data publishing, discovery, documentation and accessibility
  • http://marinemetadata.org/
• GCMD
  • NASA’s Global Change Master Directory: directory-based access to more than 25,000 descriptions of earth science data sets and services covering all aspects of earth and environmental sciences
  • http://gcmd.nasa.gov/
Common Reference Frameworks continued
• The Environment Ontology (EnvO)
  • An open-source, community-based Environment Ontology (EnvO) and Gazetteer (Gaz). Aims to support the semantically consistent description of environmental information associated with biological data of any organism or biological sample
  • http://www.environmentontology.org/
• Socio-Ecological Research and Observation oNTology (SERONTO)
  • SERONTO is an ontology developed within ALTER-Net, a Long Term Biodiversity, Ecosystem, and Awareness Research Network funded by the European Union
  • http://www.tdwg.org/proceedings/article/view/364
• GeoSciML
  • GeoSciML is a standards-based interchange format that provides a framework for application-neutral encoding of geoscience thematic data and related spatial data, and is based on an agreed conceptual data model
  • http://www.geosciml.org/
Conclusions
• Part II: CVs, Thesauri & Ontologies
• The use of common reference frameworks can help us describe the meaning and structure of scientific language and concepts, such that data can be more accurately described and linked together
• There are non-orthogonal (i.e. overlapping) semantics projects in active development in NERC
• There are possible benefits for NERC through:
  • the identification and propagation of best practice
  • increasing the sharing of semantic resources and technology
  • looking to the international community for examples of best practice and emerging standards
Appendix slides follow…
• (The following slides are included as they may be useful for the discussion session)
Best practice within the international community
• OWL
  • The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies
• OBO
  • Also a format for authoring ontologies. Developed for use in the biological / biomedical domain
• OBO Foundry
  • An effort with the goal of creating a suite of orthogonal, interoperable reference ontologies in the biomedical domain
• SKOS
  • Simple Knowledge Organization System (SKOS) is a family of formal languages designed for representation of thesauri and controlled vocabularies
  • ‘SKOS Extensions’ are intended to provide ways to declare relationships between concepts with more specific semantics than the simple "broader-narrower", such as class-instance or partitive relationships
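The simple "broader-narrower" relationship that SKOS provides can be pictured with a small lookup table. The sketch below stores one broader concept per term and walks the links in both directions, in the spirit of `skos:broaderTransitive` and `skos:narrowerTransitive`. The concepts are invented examples, each term is assumed to have at most one broader concept, and the links are assumed to contain no cycles; a real thesaurus would be loaded from a SKOS file rather than a dictionary.

```python
# Hypothetical concept -> broader concept links.
BROADER = {
    "lake": "freshwater body",
    "pond": "freshwater body",
    "freshwater body": "water body",
}


def broader_transitive(concept):
    """Follow broader links upwards and return every ancestor, nearest first."""
    chain = []
    while concept in BROADER:
        concept = BROADER[concept]
        chain.append(concept)
    return chain


def narrower_transitive(concept):
    """Return every concept below the given one, depth first."""
    found = []
    for child, parent in BROADER.items():
        if parent == concept:
            found.append(child)
            found.extend(narrower_transitive(child))
    return found
```

Here `broader_transitive("lake")` walks up through "freshwater body" to "water body", and `narrower_transitive("water body")` collects everything beneath it. More specific relationships, such as class-instance or part-whole, are what the SKOS extensions mentioned above are meant to express.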