Common Data Index (CDI) Data Discovery & Access service. D.M.A. Schaap - SeaDataNet Technical Coordinator.
Background • Marine and oceanographic data are collected by hundreds of organisations from government, research and industry across Europe • Many organisations manage their data themselves, while others make use of National Oceanographic Data Centres (NODCs) and other marine data centres • The CDI service was initiated by the NODCs to provide a unified overview of, and access to, all these distributed data resources
Common Data Index (CDI) • CDI service development started in the early 2000s as part of the EU FP5 framework programme in the Sea-Search project (2002 – 2005) • Further development in the EU FP6 SeaDataNet (2006 – 2011) and EU FP7 SeaDataNet 2 (2011 – 2015) projects • Driven by the NODCs of 35 countries around European seas • Uptake by many other EU projects, such as FP6 - Up-Grade Black Sea SCENE, FP6 - CASPINFO, FP7 - EuroFleets, FP7 - EuroFleets II, FP7 - JERICO, FP7 - Geo-Seas, FP7 - CitClops, FP7 - Micro B3
Common Data Index (CDI) • It gives users highly detailed insight into, and unified access to, the large volumes of marine and oceanographic data sets managed by distributed data centres • Each CDI record describes an observation such as a sample, core, time series, profile, trajectory or survey • The metadata format is a marine profile of the ISO 19115 geographic content standard; the XML encoding follows the ISO 19139 schema and is fully INSPIRE compliant
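To illustrate what ISO 19139 encoding means in practice, the sketch below parses a heavily simplified metadata record with Python's standard library. The sample record, its title and its coordinates are invented for illustration. The `gmd` and `gco` namespaces are the standard ISO 19139 ones, but real CDI records follow the SeaDataNet marine profile and carry many more fields, so treat this as a reading aid rather than a CDI parser.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespaces.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Invented, heavily simplified record (NOT a real CDI entry).
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>Example CTD profile</gco:CharacterString>
          </gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:extent>
        <gmd:EX_Extent>
          <gmd:geographicElement>
            <gmd:EX_GeographicBoundingBox>
              <gmd:westBoundLongitude><gco:Decimal>3.5</gco:Decimal></gmd:westBoundLongitude>
              <gmd:eastBoundLongitude><gco:Decimal>3.5</gco:Decimal></gmd:eastBoundLongitude>
              <gmd:southBoundLatitude><gco:Decimal>52.1</gco:Decimal></gmd:southBoundLatitude>
              <gmd:northBoundLatitude><gco:Decimal>52.1</gco:Decimal></gmd:northBoundLatitude>
            </gmd:EX_GeographicBoundingBox>
          </gmd:geographicElement>
        </gmd:EX_Extent>
      </gmd:extent>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""

BOUND_TAGS = (
    "westBoundLongitude",
    "eastBoundLongitude",
    "southBoundLatitude",
    "northBoundLatitude",
)


def summarise(xml_text):
    """Return the title and bounding box of a simplified ISO 19139 record."""
    root = ET.fromstring(xml_text)
    title = root.findtext(".//gmd:title/gco:CharacterString", namespaces=NS)
    box = root.find(".//gmd:EX_GeographicBoundingBox", NS)
    bounds = {
        tag: float(box.findtext(f"gmd:{tag}/gco:Decimal", namespaces=NS))
        for tag in BOUND_TAGS
    }
    return {"title": title, "bounds": bounds}


if __name__ == "__main__":
    print(summarise(SAMPLE))
```

A point observation such as a profile has identical west/east and south/north bounds, which is why the invented sample repeats its coordinates.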
Common Data Index (CDI) • Most information fields are marked up using the SeaDataNet Controlled Vocabularies and other SeaDataNet European directories (EDMO – organisations; EDMERP – projects; CSR – cruises; EDMED – data set collections) • A set of tools and web services is available for use by each data centre: • MIKADO: generator of XML descriptions for SeaDataNet catalogues • NEMO: software for reformatting data to SeaDataNet formats • Download Manager: local software to control downloading • ODV: Ocean Data View, adapted to SeaDataNet needs • Web services for vocabularies and directories • Online services for submission, validation, publication, retrieval and visualisation
SeaDataNet NODCs and Marine data centres are leading partners • The SeaDataNet NODCs and Marine data centres are mostly divisions of major national marine research institutes and are based in 35 countries surrounding the European seas • They are experienced in managing data, including quality control, documentation, long-term stewardship and publishing, for their own institute and for many other institutes in their country • NODCs are members of IOC-IODE and ICES for sharing global practices • They encourage and support other institutes in their countries to become part of the CDI infrastructure, through building national infrastructures and through European projects
Pan-European directory services [Diagram: EDMO, EDMERP, CSR, CDI, EDMED and EDIOS, together giving an overview of organisations in Europe with their involvement in marine projects, data sets, cruises and monitoring]
Pan-European directory services • EDMO: European Directory of Marine Organisations (>2,900 entries) • EDMED: European Directory of Marine Environmental Datasets (>3,900 entries from >700 data holding centres) • EDMERP: European Directory of Marine Environmental Research Projects (>2,800 entries) • CSR: Cruise Summary Reports (>44,000 entries) • EDIOS: European Directory of Ocean Observing Systems (programmes and stations) (360 programmes; >16,000 stations) • Controlled Vocabularies for platforms, parameters, instruments, sea regions, data formats, units, datums, ... (>160,000 terms, under international governance)
Portal with all services, standards and tools for users AND data providers http://www.seadatanet.org
Components of the CDI service [Diagram: user interfaces, CDI database, Central User Register, shopping basket and Request Status Manager, connected to a Download Manager at each data centre, each serving that centre's data]
Request Status Manager • Tracking and tracing of all shopping requests • By users • By data providers • Analysis of transactions • Checking status of orders and downloading from data providers
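Tracking and tracing can be pictured as recording status changes per order line, so that both users and data providers can see where a request stands. The sketch below is hypothetical throughout: the status names, the allowed transitions and the `OrderLine` class are assumptions made for illustration and are not taken from the actual Request Status Manager.

```python
from dataclasses import dataclass, field

# Hypothetical statuses and transitions; the real RSM may use different ones.
TRANSITIONS = {
    "submitted": {"processing", "rejected"},
    "processing": {"ready", "rejected"},
    "ready": {"downloaded"},
    "rejected": set(),
    "downloaded": set(),
}


@dataclass
class OrderLine:
    """One requested data set, held at one data centre (illustrative only)."""
    user: str
    data_centre: str
    status: str = "submitted"
    history: list = field(default_factory=lambda: ["submitted"])

    def advance(self, new_status):
        """Move to new_status if the transition is allowed, and log it."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.status = new_status
        self.history.append(new_status)


def count_by_status(lines):
    """Summarise order lines per status, e.g. for a transaction report."""
    counts = {}
    for line in lines:
        counts[line.status] = counts.get(line.status, 0) + 1
    return counts


if __name__ == "__main__":
    first = OrderLine(user="alice", data_centre="centre-1")
    second = OrderLine(user="bob", data_centre="centre-2")
    first.advance("processing")
    first.advance("ready")
    print(first.history)
    print(count_by_status([first, second]))
```

Keeping the full history per order line, rather than only the current status, is what would make later analysis of transactions possible.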
Request Status Manager - reporting Overseeing all transactions and preparing reports
Monitoring of infrastructure Monitoring of the operational availability of the infrastructure, both for centrally hosted core services (website, catalogue querying) and for local services (downloading from connected data centres)
Monitoring of shopping process Monitoring of the proper functioning of the CDI data request and download process between the central portal and connected data centres, using a ROBOT user, alert mailings and RSM logging of all steps
Interoperability • Support for OGC WMS, WFS and CSW, and for OpenSearch • SeaDataNet CDIs can be retrieved from the IOC-IODE Ocean Data Portal (ODP) • SeaDataNet CDIs can be retrieved from the GEOSS portal
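As a rough illustration of what CSW support offers a client, the sketch below builds a key-value `GetRecordById` request as defined by OGC CSW 2.0.2, asking for the record in the ISO 19139 (`gmd`) output schema. The endpoint URL and the record identifier are placeholders invented for this example; they are not actual SeaDataNet addresses or CDI identifiers.

```python
from urllib.parse import urlencode

# Placeholder endpoint, NOT a real SeaDataNet service address.
CSW_ENDPOINT = "https://example.org/csw"


def csw_get_record_url(record_id, endpoint=CSW_ENDPOINT):
    """Build an OGC CSW 2.0.2 GetRecordById URL using KVP encoding."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecordById",
        "id": record_id,
        "elementSetName": "full",
        # Ask for the record encoded as ISO 19139 metadata.
        "outputSchema": "http://www.isotc211.org/2005/gmd",
    }
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # "example-record-1" is a made-up identifier.
    print(csw_get_record_url("example-record-1"))
```

The function only builds the URL and sends nothing over the network, so it can be tried without access to any catalogue.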
Cooperation with other EU projects and EU initiatives • For refining and enriching the SeaDataNet standards • For making the SeaDataNet approach and standards fit for purpose for many different user communities • For expanding the CDI infrastructure with more data centres and larger volumes of multi-disciplinary data • For developing added-value services on top of the CDI infrastructure • For developing interoperability with other infrastructures
CDI service and EMODNet • The CDI service is engaged in EMODNet Chemistry, Bathymetry, Physics, Geology and Biology • EMODNet stimulates wider adoption of standards, more connected data centres and the development of basin-scale data products and services, fit for purpose for specific user communities such as MSFD implementation and industry users • EMODNet is management and policy driven, which gives the CDI infrastructure an excellent opportunity to become a sustained operational infrastructure
> 1.56 million CDI entries from 34 countries, 103 data centres and >500 originators for physics, chemistry, geology, geophysics, bathymetry and biology; years 1800 – 2014; 85% unrestricted or under SeaDataNet licence
Ingestion • EMODNet seeks ingestion to bring on board data sets from researchers and industry beyond existing data flows (crowd sourcing) • Avoid frustrating existing mechanisms and infrastructures • Make use of the SeaDataNet core network of NODCs and Marine data centres, with a presence in 35 countries riparian to European seas • Set up a coordinated online submission system with national nodes operated by SeaDataNet centres
Ingestion • The SeaDataNet nodes will consider the incoming data and decide about: • Contacting data providers about possibly becoming connected as a distributed data centre within the CDI network • Routing the incoming data to the appropriate national data centre in the CDI network for intake, QA – QC, further documentation in dialogue with the provider, and arrangements for long-term stewardship and wider availability through the CDI service • QA – QC should not be underestimated: it is required to make the new data fit for use and for combining with other data • Budget is needed for these uptake activities because data centres cannot do this from their existing resources
DOI and Data Publishing • CDI is the latest version of a data set • DOIs should be applied to give scientists an extra incentive to make their data sets available • DOI at the level of collections • NODCs follow the IODE Cookbook on DOI • DOI and Data Publishing are also a subject of discussion at the ODIP Workshops between Europe, USA and Australia, to achieve a common approach