Introduction to Oceanographic Data Management Catherine Maillard First Training Session Ostende, February 12-17, 2007
[Figure: data management diagram linking OBSERVING SYSTEMS to ANALYSIS & MODELLING SYSTEMS. Labels: Data compilation; Data formatting; Quality control checks; Safeguarding; CDI - data indexing in local archiving system; Data management; Data discovery; Catalogues; Data sets aggregation; Product generation; User’s Web browser; Analysis program]
1. Data Compilation The data never go directly to the data centres, so the data centre needs to: • Locate the data sets not yet archived • Request and obtain a copy of the missing data sets from the source laboratory/scientist • Check that the data sets are properly documented
COMPILATION 1.1: Locate the data sets which are not yet archived • Search the cruise report (CSR) catalogue • Or the observation system directory (EDIOS) • Or EDMED or EDMERP A data set should be identified either • + Maintain regular direct contacts
COMPILATION 1.2: Get a copy of the missing data sets from the source laboratory/scientist • Request a copy, in any format, of the missing data sets identified as not archived Emphasize the importance of: • Long-term archiving to follow up the environmental changes • Integration in long time series of data of the same type: the availability of global/regional/thematic databases depends on all contributions • Facilitating the use of these databases • Get and safeguard the electronic file • Digitization is sometimes necessary (GODAR)
COMPILATION 1.3: The mandatory meta-data • Check that the data sets are properly documented, with the mandatory fields described • A minimum of meta-data should be included in the data files, e.g. • Reference to cruise or observation system and source laboratory • Sensor type • Parameter names and units etc. • Complete the missing information by asking questions to the originator
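The check described above can be sketched in a few lines of Python. This is only an illustration: the field names (`cruise_reference`, `source_laboratory`, `sensor_type`, `parameters`) and the sample header are hypothetical, since the slide names the kinds of meta-data required but not a schema.

```python
# Hypothetical field names: the slide lists the kinds of meta-data, not a schema.
MANDATORY_FIELDS = ("cruise_reference", "source_laboratory", "sensor_type", "parameters")


def missing_metadata(header):
    """Return the mandatory items that are absent or empty in a meta-data header."""
    missing = [field for field in MANDATORY_FIELDS if not header.get(field)]
    # Every declared parameter must also carry its unit.
    for parameter in header.get("parameters") or []:
        if not parameter.get("units"):
            missing.append("units for " + parameter.get("name", "<unnamed>"))
    return missing


# Invented example header: the laboratory is empty and PSAL has no unit.
header = {
    "cruise_reference": "CRUISE-001",
    "source_laboratory": "",
    "sensor_type": "CTD",
    "parameters": [{"name": "TEMP", "units": "degC"}, {"name": "PSAL"}],
}
questions_for_originator = missing_metadata(header)
```

The returned list is the set of questions to send back to the originator; an empty list means the minimum documentation is present.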
2 - Data Reformatting • In general the original formats of the data files cannot be used in data management • Incomplete/non-standardized meta-data • Incompatibility with the input format of QC and other processing tools • Need for a single archiving format to safeguard the data sets of the same type • The data management format, archiving format and dissemination/exchange format(s) may be, but are not necessarily, the same
2 - Different Data Formats used • Archiving format: can be one of the current exchange formats, or a local format designed according to rules that ensure sustainability • Exchange/Dissemination format(s): joint projects and interoperability require common exchange format(s) • Data Management/processing
2.1 : General rules for sustainability of an archiving format The archiving format should: • be independent from the computer (and libraries): RDBS are not appropriate • ensure that any isolated data item includes enough meta-data to be processed (e.g. location and date) • be compatible with, and include at least the mandatory fields (meta-data) requested for, the agreed exchange format(s) • include additional textual or standardized “history” or “comment” fields to prevent any loss of information • provide a similar structure and meta-data for different data types such as vertical profiles and time series These rules are normally also followed for exchange formats
2.2 - SeaDataNet Data transport Formats • Obligatory formats: • NetCDF (binary) for gridded data and 3D observation data such as ADCP • (Modified) ODV spreadsheet for other data types (vertical profiles and time series) • Optional format: • ASCII Medatlas as standard exchange format for the Mediterranean and Black Sea community • BODC leads the task of modifying the present ODV and NetCDF formats for SeaDataNet use (QC flags, parameter semantics etc., and conformity with the international standards) • Formatting exercises to assess the coherence and compatibility of exchange formats
2.3 – Processing Formats • For data management (QC, cataloguing, selection, extraction, visualisation) the data can be • In the archiving format • In a relational database system (RDBS): the RDBS presently most used in the community are ORACLE and MySQL • Note: an interface is needed between the software input format and the local data management system
3 - Quality Checks • What they do • Detect missing mandatory information • Detect errors made during the transfer or reformatting • Detect remaining outliers • Detect duplicates • Attach a quality flag to each numerical value • What they don’t do • Replace the preliminary data calibration and validation made by the expert scientists • Modify the data points • General rule • The tools for data QC are not unique (e.g. ODV and other local systems), but the procedures are compatible • Any QC of a data set should be reported to the originator to give feedback and ask questions • How they are performed Next presentation by Sissy
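A minimal sketch of the principle "attach a flag, never modify the data point", assuming a simple range check. The flag values and the temperature bounds are illustrative assumptions; the slides do not give the flag scale or the test thresholds.

```python
# Illustrative flag values: the actual flag scale is not given in these slides.
GOOD, BAD, MISSING = 1, 4, 9


def flag_range(values, lower, upper):
    """Pair each value with a quality flag; the values themselves are left untouched."""
    flagged = []
    for value in values:
        if value is None:
            flag = MISSING
        elif lower <= value <= upper:
            flag = GOOD
        else:
            flag = BAD  # outlier: flagged, not corrected and not removed
        flagged.append((value, flag))
    return flagged


# Invented temperature profile with one outlier and one missing value.
temperatures = [13.2, 13.1, 99.9, None, 12.8]
flagged_temperatures = flag_range(temperatures, lower=-2.5, upper=40.0)
```

The outlier 99.9 stays in the record with a "bad" flag, so the originator can still inspect it when the QC report is sent back.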
4 - Safeguarding • The QCed data sets should be safeguarded in a perennial system for further use • 2 copies • Follow up the backups when the system or the technology changes • It is recommended to use the common computer infrastructure of the institutes to make the backups regular and automatic • The original, non-standardized and non-QCed data sets should also be safeguarded, for possible further checks by the data manager or the source scientists, but not disseminated
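One way to illustrate the "2 copies" rule is to copy a file to two backup locations and verify each copy with a checksum. The checksum step, the function names and the directory names are assumptions added for this sketch, not part of the original recommendation; in practice the institute's backup infrastructure would do this job.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safeguard(source, backup_dirs):
    """Copy source into each backup directory and verify every copy by checksum."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for directory in backup_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Backup {target} does not match {source}")
        copies.append(target)
    return copies


# Demonstration in a temporary directory with an invented file name.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "profile_qc.txt"
    original.write_text("13.2 35.1\n")
    copies = safeguard(original, [Path(tmp) / "copy_a", Path(tmp) / "copy_b"])
    copy_count = len(copies)
    copies_match = all(c.read_text() == original.read_text() for c in copies)
```

Re-running the checksum comparison after a system or technology change is one simple way to follow up the backups.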
5 - Data Dissemination and service • National data sets according to the national rules • Aggregated data sets with other data sources • Export the data • In a single exchange format • With the appropriate documentation on: • The format and codes • The QC performed on the data • The source of the data and the conditions of use (license)
5 - Data aggregation • Data Aggregation represents a service and a product • To answer data requests related to a geographical area or other selection criteria independently from the source • Interrogate the local data centre • Complete with other sources • Eliminate the duplicates
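The three aggregation steps (interrogate the local centre, complete with other sources, eliminate duplicates) can be sketched as follows. The record layout, the duplicate key (platform, time, position) and the choice to keep the local copy are assumptions for illustration only.

```python
def bounding_box(south, north, west, east):
    """Build a geographical selection criterion (hypothetical helper)."""
    def in_area(record):
        return south <= record["lat"] <= north and west <= record["lon"] <= east
    return in_area


def aggregate(local, others, in_area):
    """Merge local and external records inside an area, dropping exact duplicates."""
    seen = set()
    merged = []
    # Local records come first, so the local copy is kept when a duplicate exists.
    for record in list(local) + list(others):
        if not in_area(record):
            continue
        key = (record["platform"], record["time"], record["lat"], record["lon"])
        if key in seen:
            continue
        seen.add(key)
        merged.append(record)
    return merged


# Invented records: one duplicate station and one station outside the area.
local = [
    {"source": "local", "platform": "SHIP-A", "time": "2005-06-01T10:00", "lat": 42.0, "lon": 5.0},
]
others = [
    {"source": "external", "platform": "SHIP-A", "time": "2005-06-01T10:00", "lat": 42.0, "lon": 5.0},
    {"source": "external", "platform": "FLOAT-B", "time": "2005-06-02T00:00", "lat": 40.5, "lon": 7.2},
    {"source": "external", "platform": "SHIP-C", "time": "2005-06-03T00:00", "lat": 55.0, "lon": 3.0},
]
merged = aggregate(local, others, bounding_box(38.0, 45.0, 0.0, 10.0))
```

An exact key only catches identical stations; poorly documented duplicates need looser matching and manual inspection.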
Other data sources • The other data centres of the consortium • Regional and project databases: • ICES: North-East Atlantic • Medatlas 2002; Mater 1996-1999 (some of its data are included in Medatlas); MFS/MOON for real-time (RT) data • The World Ocean Atlas: delayed mode data • The Coriolis/Argo Server: real-time data • The satellite data
The consortium data • The Common Data Index (CDI) shows what is presently available in the data centres. It will be continuously updated during the project http://www.sea-search.net/cdi/ (also reachable from the SeaDataNet website) • During the development phase (2006-2007) of the interoperable system by the Technical Task Team, each data centre is interrogated separately to get access to the data • Several data centres provide online tools for data search and access, including geographical selection and web services
Regional Databases • ICES http://www.ices.dk/ocean/ • ICES format • Medatlas 2002 www.ifremer.fr/medar + CD-ROM + ftp site • Developed in the frame of the EU Medar project (a regional DAR) • Data selection tools according to various criteria, including geographical search, available on the CD-ROM • Also available online from several partner data centres • Medatlas format
World Ocean Atlas 2005 http://www.nodc.noaa.gov/OC5/WOD05/pr_wod05.html • Developed by US/NODC – WDC Washington – Ocean Climate Laboratory in the frame of the IOC/GODAR project, with the contribution of the other data centres • Data, mainly delayed mode data, are available through an online selection tool or on DVD (on request) • All the fields can be interrogated for data selection. The possibility to select countries by group (for example, to get all countries but one's own, or all but the SDN consortium) is commonly used.
Data Types in WOA 2005 • Type of observations • Ocean Station Data (OSD) [Bottle, low resolution CTD/XCTD, plankton data] • High Resolution CTD/XCTD (CTD) • Expendable (XBT) and Mechanical (MBT) Bathythermographs • Autonomous Pinniped Bathythermographs (APB) • Profiling Floats (PFL) • Drifting Buoys (DRB) • Moored Buoys (MRB) [TAO, PIRATA, others] • Undulating Oceanographic Recorder (UOR) [Towed CTD] • Glider data (GLD) • Surface-Only (SUR) [Bucket, Thermosalinograph] • Parameters • Pressure, temperature, salinity + 23 bio-geochemical parameters + biological taxa
WOA 2005 export format • US-NODC format • Codes and standards different from SeaDataNet • Tools available to process the data: • US/NODC tools in Fortran, C and Java to read the data • SeaDataNet/Ifremer converter to transcribe from WOA to Medatlas format (presently available for Unix only) • ODV can visualise the data directly in WOA format
Coriolis/Argo Server http://www.coriolis.eu.org/cdc/ • The Coriolis/Argo server is one of the two Argo Global Data Assembly Centres (GDAC) • Synchronized on a daily basis with the US GODAE Data Centre (Monterey) • Serving daily real-time data (plus gridded analyses) from national DACs including the Australian, Canadian, Chinese, French, Indian, Korean, Japanese, UK and US ones; from contributors in Chile, Costa Rica, Germany, Morocco, Mexico, Norway, the Netherlands, Russia and Spain; and from the GTS (sources difficult to establish) • Online selection tools to visualize and download in-situ data
Data Types in Coriolis/Argo • Vertical profiles mainly from: • XBT or XCTD from research or opportunity vessels • Argo profiling floats • Anchored buoys or moorings • Drifting buoys • Trajectory data mainly from: • Drifting buoys • Argo floats • Vessels equipped with a thermosalinograph (GOSUD server) • Many data but few parameters: essentially P, T, S • Under development: integration in the SeaDataNet system
Export Formats from Coriolis/Argo • Argo NetCDF: widely used in operational oceanography, designed for TS profiles • ASCII: (quasi) Medatlas Important: for Medatlas format extraction, do the data selection data type by data type, to avoid having all types grouped in the same file.
Duplicates problem for data dissemination and products preparation • Even if the data are checked for duplicates at the national levels, remaining problems may exist: • Data insufficiently documented and attributed to two different sources • PTS files and same station with other parameters • RT and DM profiles • Data declassified by the Navies with poor meta-data • Data sets from the GTS with decimated and poorly documented profiles
What can be done? • Selection country by country (difficult for the RT data, however) • Visualising ship tracks and trajectories, and superimposing the position maps of cruises made in the same region in the same period • In case of duplicate data sets, evaluate which is the best set of observations: the most complete, the best documented, etc. This can lead to a lot of manual work in the QC
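Part of this screening can be automated before the manual work starts. The sketch below treats two profiles as probable duplicates when they are close in position and time, and keeps the more complete one. The tolerances (0.01 degree, 30 minutes), the completeness measure and the profile records are all assumptions chosen for illustration; they are not SeaDataNet rules.

```python
from datetime import datetime, timedelta


def is_probable_duplicate(a, b, max_deg=0.01, max_dt=timedelta(minutes=30)):
    """Two profiles close in position and time are treated as probable duplicates."""
    return (abs(a["lat"] - b["lat"]) <= max_deg
            and abs(a["lon"] - b["lon"]) <= max_deg
            and abs(a["time"] - b["time"]) <= max_dt)


def completeness(profile):
    """Assumed ranking: more parameters first, then more vertical levels."""
    return (len(profile["parameters"]), profile["levels"])


def resolve_duplicates(profiles):
    """Keep one profile per group of probable duplicates: the most complete one."""
    kept = []
    for profile in profiles:
        for index, candidate in enumerate(kept):
            if is_probable_duplicate(profile, candidate):
                if completeness(profile) > completeness(candidate):
                    kept[index] = profile
                break
        else:
            kept.append(profile)  # no probable duplicate found among kept profiles
    return kept


# Invented example: a decimated RT profile, its DM version, and a distinct station.
profiles = [
    {"id": "RT-1", "lat": 43.500, "lon": 7.300, "time": datetime(2006, 3, 1, 12, 0),
     "parameters": ["TEMP", "PSAL"], "levels": 20},
    {"id": "DM-1", "lat": 43.501, "lon": 7.301, "time": datetime(2006, 3, 1, 12, 10),
     "parameters": ["TEMP", "PSAL", "DOXY"], "levels": 500},
    {"id": "DM-2", "lat": 41.000, "lon": 5.000, "time": datetime(2006, 3, 2, 8, 0),
     "parameters": ["TEMP", "PSAL"], "levels": 300},
]
kept_ids = [p["id"] for p in resolve_duplicates(profiles)]
```

Pairs that fall just outside the tolerances, or that differ in documentation rather than in completeness, still have to be examined by hand.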
Template for TA web page All the images in the directory « Template_images »
Education and Outreach pages • SDN-EDU.html
Concluding remarks • SeaDataNet is developing basic tools for implementing the data management activities in conformity with internationally agreed protocols. • The NODC/DNA of the 40 TAP use either the common tools or the existing local systems, but these should be inter-comparable and compatible. • The present infrastructure is not yet stabilized with regard to standards and available software, but the main functionalities are available to ensure the data circulation from the start of the project. • Any new information, result or software is made immediately available on the website. • It is important to develop a local page to connect, using the ENEA template