WP4 task 4.2 Massimo Argenti Tatiana Tarasova massimo.argenti@gmail.com T.Tarasova@uva.nl
Outline • WP 4 task 4.2: two-step harmonisation approach • Overview of the two harmonisation steps [by Massimo] • Metadata harmonisation [by Massimo] • Data harmonisation [by Tatiana]
WP4 task 4.2: data harmonisation • Data Integration, Harmonisation and Publication Facilities • Data harmonisation is the process of making data from heterogeneous sources compatible, comparable and interoperable, and thus usable for further data integration. Approach: a two-step harmonisation process: • Metadata harmonisation • Data harmonisation through a common ontology
Overview of the two harmonisation steps • Definition of a common harmonised metadata set to describe the data of the different infrastructures, and of a set of transformations to map already available metadata (if any) to the harmonised set • Definition of a first ontology to enrich the description of the data collected in the central catalogue, for data discovery purposes • Definition of a second ontology providing a common language to represent heterogeneous data, used to discover specific data conditions
Metadata harmonisation • Definition of a core set of metadata fields • Final metadata set of concepts able to describe all kinds of infrastructure data • Automatic process to convert available metadata into the ENVRI harmonised set • Manual process to create metadata for data that have none
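The automatic conversion step can be pictured as a field-mapping table applied to each provider's metadata record. The sketch below is a minimal illustration under stated assumptions: the core field names (`title`, `provider`, `platform`, `parameter`, `unit`) and the provider-specific source keys are hypothetical, since the actual ENVRI core set is not listed here. Records that still lack core fields after mapping are flagged for the manual process.

```python
# Hypothetical ENVRI core metadata fields (illustrative, not the real core set).
CORE_FIELDS = ("title", "provider", "platform", "parameter", "unit")

# Hypothetical per-provider mappings: source key -> harmonised core field.
MAPPINGS = {
    "ICOS": {"dataset_name": "title", "station": "platform",
             "species": "parameter", "units": "unit"},
    "Euro-Argo": {"PLATFORM_NUMBER": "platform", "PARAMETER": "parameter",
                  "UNITS": "unit"},
}


def harmonise(provider, record):
    """Map a provider-specific metadata record onto the core field set.

    Returns the harmonised record and the list of core fields still
    missing, which would have to be completed by the manual process.
    """
    mapping = MAPPINGS.get(provider, {})
    harmonised = {"provider": provider}
    for source_key, value in record.items():
        target = mapping.get(source_key)
        if target is not None:  # unmapped source keys are ignored
            harmonised[target] = value
    missing = [f for f in CORE_FIELDS if f not in harmonised]
    return harmonised, missing


if __name__ == "__main__":
    icos = {"dataset_name": "CO2 concentration measured by Mace Head",
            "station": "Mace Head", "species": "CO2", "units": "ppm"}
    print(harmonise("ICOS", icos))
    argo = {"PLATFORM_NUMBER": "4900679", "PARAMETER": "TEMP"}
    print(harmonise("Euro-Argo", argo))  # title and unit left for manual entry
```

Keeping the mappings as data rather than code means a new infrastructure can be added by supplying one more mapping table.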
Ontology Definition (1/2) • Definition of an ENVRI ontology that enriches the available knowledge to facilitate data discovery • Definition of ENVRI-specific domain concepts: Data-Provider, Platform, Instruments, others • Integration of established Earth science ontologies: GEMET, SBA, GCMD
Ontology Definition (2/2) • GEMET (GEneral Multilingual Environmental Thesaurus) provides a user-friendly parameter discovery interface for the European Environment Information and Observation Network (EIONET). It makes use of SKOS (Simple Knowledge Organization System) and of the metadata registries standard ISO 11179. • SBA (Societal Benefit Areas): the Group on Earth Observations (GEO) is constructing GEOSS (Global Earth Observation System of Systems) on the basis of a 10-year implementation plan for the period 2005 to 2015. • GCMD (Global Change Master Directory, NASA) science keywords: the GCMD is a comprehensive directory of information about Earth science data, including the oceans, atmosphere, hydrosphere, solid earth, biosphere and human dimensions of global change.
Data harmonisation • Definition of a common ontology used to convert heterogeneous data into a common language • Definition of a set of transformations to convert the original data into a common final format • Preparation of a set of predefined SPARQL queries to retrieve data matching specific parameter conditions and values, cross-linked across heterogeneous datasets
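As an illustration of the transformation step, the sketch below converts one raw provider row into a record keyed by the common property names used in the ICOS examples later in this deck (`hasMeasuredParameter`, `hasUnitOfMeasure`, `hasObservedValue`, `measuredAtYear`, ...). The raw column names (`time`, `value`, `site`) and the timestamp format are assumptions for illustration, not the actual ICOS file layout.

```python
from datetime import datetime

ENVRI = "http://example.envri.org/"


def to_common_format(row, provider, parameter, unit):
    """Convert one raw row (hypothetical columns 'time', 'value', 'site')
    into a record keyed by common ENVRI property IRIs."""
    # Assumed timestamp format, e.g. "2011-01-01T03:00".
    ts = datetime.strptime(row["time"], "%Y-%m-%dT%H:%M")
    return {
        ENVRI + "hasMeasuredParameter": parameter,
        ENVRI + "hasUnitOfMeasure": unit,
        ENVRI + "providedBy": provider,
        ENVRI + "hasObservedValue": row["value"],
        # Date parts stored as unpadded strings, as in the ICOS example.
        ENVRI + "measuredAtYear": str(ts.year),
        ENVRI + "measuredAtMonth": str(ts.month),
        ENVRI + "measuredAtDay": str(ts.day),
        ENVRI + "measuredAtHour": str(ts.hour),
        ENVRI + "measuredAt": row["site"],
    }


if __name__ == "__main__":
    raw = {"time": "2011-01-01T03:00", "value": "400.474", "site": "Mace Head"}
    for prop, val in to_common_format(raw, "ICOS", "CO2", "ppm").items():
        print(prop, "->", val)
```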
Data harmonisation: Outline • Analysis of the ENVRI data to find commonalities for further integration • Linked Data approach: model data, not documents • The RDF Data Cube vocabulary (ICOS examples)
Data harmonisation • How do we combine these heterogeneous data? • Can we still find commonalities? • Do we know in advance how we want to integrate the data? [Figure: data domains Seismology, Atmosphere (ICOS), Oceanography (Euro-Argo), Volcanology]
Data harmonisation through a common ontology [Figure: ICOS mappings and Euro-Argo mappings linking Atmosphere (ICOS) and Oceanography (Euro-Argo) data to a common ontology; Seismology and Volcanology also shown]
Data analysis: ICOS • Dataset: “CO2 concentration measured by Mace Head” • Observation: hasMeasuredParameter “CO2”, hasUnitOfMeasure “ppm”, measuredInObservatory “Mace Head”, isProvidedBy “ICOS”, hasObservedValue “400.474”, measuredAtYear “2011”, measuredAtMonth “1”, measuredAtDay “1”, measuredAtHour “3” [Figure: CO2 concentration measured by the Mace Head observatory]
Data analysis: ICOS [Figure: Observations, Dataset and Metadata Attributes of the CO2 concentration measured by the Mace Head observatory]
Data analysis: Euro-Argo [Figure: Dataset, Observations and Metadata Attributes of the ocean temperature measured by the Euro-Argo platform 4900679]
Data analysis: ACTRIS [Figure: Dataset, Observations and Metadata Attributes of the 4-dimensional distribution of the volcanic ash plume over Europe by ACTRIS]
Common structural concepts • Common components to represent the structure of observational data: Observation, Dataset, Metadata Attribute
Data Cube: core model • The RDF Data Cube vocabulary [1] is a generic framework to represent multi-dimensional data • In RDF: “<Observation> has dataset <Dataset>”, “<Dataset> has structure <Dataset Structure>”, “<Dataset Structure> has metadata attribute <Metadata Attribute>”
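The three statements above form a chain from an observation to the metadata attributes declared by its dataset. The sketch below encodes that simplified chain as plain (subject, predicate, object) tuples and checks that every property used on an observation is declared as a metadata attribute of its dataset's structure. It follows the slide's simplification: in the full W3C vocabulary, attributes are attached to the structure through `qb:component` and component specifications, which are omitted here. The `ex:` resources and the "has metadata attribute" predicate are hypothetical.

```python
# Simplified Data Cube chain as (subject, predicate, object) tuples.
HAS_DATASET = "qb:dataSet"
HAS_STRUCTURE = "qb:structure"
HAS_ATTRIBUTE = "has metadata attribute"  # simplified, not a qb term

triples = {
    ("ex:obs1", HAS_DATASET, "ex:icos-dataset"),
    ("ex:icos-dataset", HAS_STRUCTURE, "ex:icos-dataset-structure"),
    ("ex:icos-dataset-structure", HAS_ATTRIBUTE, "envri:hasMeasuredParameter"),
    ("ex:icos-dataset-structure", HAS_ATTRIBUTE, "envri:hasUnitOfMeasure"),
    ("ex:icos-dataset-structure", HAS_ATTRIBUTE, "envri:hasObservedValue"),
    ("ex:obs1", "envri:hasMeasuredParameter", "CO2"),
    ("ex:obs1", "envri:hasUnitOfMeasure", "ppm"),
    ("ex:obs1", "envri:hasObservedValue", "400.474"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def undeclared_properties(observation):
    """Properties used on an observation but not declared as metadata
    attributes of its dataset's structure."""
    declared = set()
    for dataset in objects(observation, HAS_DATASET):
        for structure in objects(dataset, HAS_STRUCTURE):
            declared |= objects(structure, HAS_ATTRIBUTE)
    used = {p for s, p, _ in triples if s == observation and p != HAS_DATASET}
    return used - declared


print(undeclared_properties("ex:obs1"))  # set()
```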
Example: ICOS in Data Cube • Dataset: “CO2 concentration measured by Mace Head” • Observation: hasMeasuredParameter “CO2”, hasUnitOfMeasure “ppm”, measuredAt “Mace Head”, providedBy “ICOS”, hasObservedValue “400.474”, measuredAtYear “2011”, measuredAtMonth “1”, measuredAtDay “1”, measuredAtHour “3” [Figure: CO2 concentration measured by the Mace Head observatory]
Example: Structure of ICOS datasets [Figure: ICOS-dataset-structure is a Dataset Structure and has metadata attributes hasMeasuredParameter, hasUnitOfMeasure, hasObservedValue, …]
Example: ICOS observation • Metadata attributes are properties in the RDF model • These properties link a dataset with specific values of the attribute [Figure: ICOS-observation has dataset ICOS-dataset, which has structure ICOS-dataset-structure; attribute values: hasMeasuredParameter “CO2 concentration”, hasUnitOfMeasure “PPM”, hasObservedValue “400.474”, …] 09/09/2014
Example: ICOS observation in RDF triples (Turtle)
@prefix qb: <http://purl.org/linked-data/cube#> .
<http://example.envri.org/icos/observation131526> a qb:Observation ;
    qb:dataSet <http://example.envri.org/icos/co2> ;
    <http://example.envri.org/hasMeasuredParameter> "CO2 concentration" ;
    <http://example.envri.org/hasUnitOfMeasure> "PPM" ;
    <http://example.envri.org/providedBy> "ICOS" ;
    <http://example.envri.org/hasObservedValue> "400.474" ;
    <http://example.envri.org/measuredAtYear> "2011" ;
    <http://example.envri.org/measuredAtMonth> "1" ;
    <http://example.envri.org/measuredAtDay> "1" ;
    <http://example.envri.org/measuredAtHour> "0" ;
    <http://example.envri.org/measuredAt> "mhd" .
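Triples like these need not be written by hand. A minimal sketch, using only the Python standard library, that emits the same kind of observation as N-Triples (one fully expanded triple per line); the helper names `literal` and `observation_to_ntriples` are hypothetical, and a production pipeline would more likely use an RDF library.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ENVRI = "http://example.envri.org/"


def literal(value):
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def observation_to_ntriples(obs_iri, dataset_iri, attributes):
    """Serialise one qb:Observation and its attribute values as N-Triples."""
    lines = [
        f"<{obs_iri}> <{RDF_TYPE}> <{QB}Observation> .",
        f"<{obs_iri}> <{QB}dataSet> <{dataset_iri}> .",
    ]
    for name, value in attributes.items():
        # Each attribute becomes a property in the example.envri.org namespace.
        lines.append(f"<{obs_iri}> <{ENVRI}{name}> {literal(value)} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(observation_to_ntriples(
        "http://example.envri.org/icos/observation131526",
        "http://example.envri.org/icos/co2",
        {"hasMeasuredParameter": "CO2 concentration",
         "hasUnitOfMeasure": "PPM",
         "providedBy": "ICOS",
         "hasObservedValue": "400.474",
         "measuredAtYear": "2011"},
    ))
```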
Work in progress • Idea: we can semantically align ENVRI data using common metadata attributes. • Approach: reuse and extend the GENESI-DEC vocabulary (Platform, Instrument, etc.). • Methods: • Extend the vocabulary with semantic relationships between concepts; consider related work such as the Virtual Solar-Terrestrial Observatory (VSTO) ontology • Extend the vocabulary with lists of existing platforms, observatories, instruments, code lists, thesauri, etc. (via questionnaires for data providers)
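The idea of aligning datasets through shared metadata attributes can be illustrated with a toy example: two providers describe their sources with their own local terms, each term is mapped to a common concept (here `Platform` and `Instrument`, the GENESI-DEC-style concepts named above), and a lookup on the common concept then spans both providers. All local terms, values other than the platform names from the earlier slides, and the alignment table are hypothetical.

```python
# Hypothetical provider-specific descriptions: (provider, local term, value).
records = [
    ("ICOS", "station", "Mace Head"),
    ("ICOS", "analyser", "CO2 analyser"),
    ("Euro-Argo", "float", "4900679"),
    ("Euro-Argo", "sensor", "CTD"),
]

# Hypothetical alignment of local terms to common vocabulary concepts.
ALIGNMENT = {
    ("ICOS", "station"): "Platform",
    ("ICOS", "analyser"): "Instrument",
    ("Euro-Argo", "float"): "Platform",
    ("Euro-Argo", "sensor"): "Instrument",
}


def by_concept(concept):
    """Return (provider, value) pairs whose local term maps to `concept`."""
    return [(provider, value) for provider, term, value in records
            if ALIGNMENT.get((provider, term)) == concept]


print(by_concept("Platform"))
# [('ICOS', 'Mace Head'), ('Euro-Argo', '4900679')]
```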
Conclusion • Two-step approach to harmonisation: metadata harmonisation (data discovery) and data harmonisation (structural and semantic data alignment) • Metadata harmonisation based on the ENVRI metadata ontology (Protégé, Sesame) • Data harmonisation based on the ENVRI data ontology (Protégé, Virtuoso) • The RDF Data Cube vocabulary (generic, extensible) follows the Linked Data approach: model data, not documents! • Combine data with SPARQL queries using common terms from the ENVRI ontology
SPARQL across data sets: data providers
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix envri: <http://example.envri.org/>
SELECT DISTINCT ?providerName
WHERE {
  ?dataset envri:providedBy ?provider .
  ?provider rdfs:label ?providerName .
}
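To make the graph pattern in this query concrete, the sketch below evaluates its two triple patterns by hand over a tiny in-memory graph, joining on the shared `?provider` variable and de-duplicating names as `SELECT DISTINCT` does. The dataset and provider resources are hypothetical; a real deployment would send the query to a SPARQL endpoint such as the Virtuoso store mentioned in the conclusion.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
PROVIDED_BY = "http://example.envri.org/providedBy"

# Hypothetical graph: two datasets from ICOS, one from Euro-Argo.
graph = [
    ("ex:co2-mhd", PROVIDED_BY, "ex:icos"),
    ("ex:ch4-mhd", PROVIDED_BY, "ex:icos"),
    ("ex:argo-4900679", PROVIDED_BY, "ex:euro-argo"),
    ("ex:icos", RDFS_LABEL, "ICOS"),
    ("ex:euro-argo", RDFS_LABEL, "Euro-Argo"),
]


def provider_names(triples):
    """Evaluate ?dataset providedBy ?provider . ?provider rdfs:label ?name
    and return the DISTINCT names, in first-seen order."""
    names = []
    for _dataset, p1, provider in triples:
        if p1 != PROVIDED_BY:
            continue
        # Join: find labels of the provider bound by the first pattern.
        for s, p2, name in triples:
            if s == provider and p2 == RDFS_LABEL and name not in names:
                names.append(name)
    return names


print(provider_names(graph))  # ['ICOS', 'Euro-Argo']
```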
SPARQL across data sets: observations from March 2010
prefix qb: <http://purl.org/linked-data/cube#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix envri: <http://example.envri.org/>
SELECT ?providerName ?paramName ?uofName ?year ?month ?value
WHERE {
  ?observation a qb:Observation .
  ?observation qb:dataSet ?dataset .
  ?dataset envri:providedBy ?provider .
  ?provider rdfs:label ?providerName .
  ?observation envri:measuredAtYear ?year .
  ?observation envri:measuredAtMonth ?month .
  ?dataset envri:hasMeasuredParameter ?param .
  ?param rdfs:label ?paramName .
  ?dataset envri:hasUnitOfMeasure ?unitOfMeasure .
  ?unitOfMeasure rdfs:label ?uofName .
  ?observation envri:hasObservedValue ?value .
  FILTER (?year = "2010")
  FILTER (?month = "3" || ?month = "03")
}
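The month FILTER has to accept both "3" and "03", presumably because month values are stored as plain strings and providers may zero-pad them differently. A hedged alternative, sketched below, is to normalise the strings to integers before comparing, so that any padding matches. The rows are hypothetical stand-ins for the joined query bindings, and the `v1`/`v2` values are placeholders.

```python
# Hypothetical rows standing in for the bindings of the query above.
rows = [
    {"provider": "ICOS", "param": "CO2", "unit": "ppm",
     "year": "2010", "month": "3", "value": "v1"},
    {"provider": "Euro-Argo", "param": "Temperature", "unit": "degC",
     "year": "2010", "month": "03", "value": "v2"},
    {"provider": "ICOS", "param": "CO2", "unit": "ppm",
     "year": "2011", "month": "1", "value": "400.474"},
]


def in_month(row, year, month):
    """True if the row's year/month strings denote the given numbers,
    regardless of zero-padding ("3" and "03" both match month 3)."""
    try:
        return int(row["year"]) == year and int(row["month"]) == month
    except ValueError:
        # Non-numeric values cannot match a numeric month.
        return False


march_2010 = [r for r in rows if in_month(r, 2010, 3)]
for r in march_2010:
    print(r["provider"], r["param"], r["value"], r["unit"])
```

In SPARQL itself, a comparable effect could be obtained by casting `?month` to an integer inside the FILTER, at the cost of failing on non-numeric values.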
References • [1] The RDF Data Cube vocabulary, http://www.w3.org/TR/vocab-data-cube/ • [2] The Virtual Solar-Terrestrial Observatory (VSTO) ontology, http://www.vsto.org/
Backup slides
The Virtual Solar-Terrestrial Observatory (VSTO) Ontology [2] [Figure: classes DataArchive, Parameter, Instrument, Observatory, DataProvider and InstrumentOperatingMode, linked by the relations has measured parameter, data archive for, operated by, observatory, is observatory of, has instrument operating mode]
Standards to consider • OGC GeoSPARQL provides the foundational geospatial vocabulary for Linked Data involving location and defines SPARQL extensions to process geospatial data. • Basic Geo (WGS84 lat/long) Vocabulary http://www.w3.org/2003/01/geo/: represents points by WGS84 latitude, longitude and (optionally) altitude. • The OWL Time ontology: expresses facts about topological relations among instants and intervals, together with information about durations and date-time information.
Related work • The European Environment Agency (EEA) is an agency of the European Union • Bathing water quality structure published as Linked Data: http://www.epimorphics.com/web/wiki/bathing-water-quality-structure-published-linked-data