

  1. 3rd Training Workshop, 16-19 June, Oostende - IOC Offices. General description of data management procedures – Description of the different steps: "From data collection to the SeaDataNet management system". Sissy Iona, HNODC, Greece

  2. Topics • PART A : Presentation of general data management rules • SeaDataNet Data Policy and Data Licence • Quality Controls • Rules for metadata submission to prevent duplication • PART B : Identification of main stages and available tools • Reformatting from observation format to common data format • Quality Controls • Metadata Concepts and Management (online, offline)

  3. Topics • PART A : Presentation of general data management rules • SeaDataNet Data Policy and Data Licence • Quality Controls • Rules for metadata submission to prevent duplication • PART B : Identification of main stages and available tools • Reformatting from observation format to common data format • Quality Controls • Metadata Concepts and Management (online, offline)

  4. I. SDN Data Policy History • Drafted by Project Office, 02/2007 • Reviewed by the Steering Committee • Validated by the Coordination Group • sdn_po07_Data_policy.doc, 04/2007 • http://www.seadatanet.org/media/seadatanet/files/publications/seadatanet_data_policy

  5. I. SeaDataNet Data Policy It is derived from the INSPIRE directive for spatial information, taking into account the national rules and the SeaDataNet users' needs. • Objectives • to serve the scientific community, public organizations, environmental agencies • to facilitate the data flow through the Transnational Activities by stating clearly the conditions for data submission, access and use

  6. I. Links and Framework SeaDataNet Data Policy is fully compatible with the EU Directives, International Policies, Laws and Data Principles: • Directive 2003/4/EC of the European Parliament and of the Council of 28 January 2003 on public access to environmental information and repealing Council Directive 90/313/EEC (http://ec.europa.eu/environment/aarhus/index.htm). • INSPIRE Directive for spatial information in the Community (http://inspire.jrc.it/home.html) • IOC Data Policy (http://ioc3.unesco.org/iode/contents.php?id=200) • ICES Data Policy 2006 (https://www.ices.dk/Datacentre/Data_Policy_2006.pdf) • WMO Resolution 40 (Cg-XII; see http://www.nws.noaa.gov/im/wmor40.htm) • Implementation plan for the Global Observing System for Climate in support of the UNFCCC, 2004; GCOS – 92, WMO/TD No.1219. • Global Earth Observation System of Systems GEOSS 10-Year Implementation Plan Reference Document (Final Draft) 2005. GEO 204. February 2005. • CLIVAR Initial Implementation Plan, 1998; WCRP No. 103, WMO/TS No. 869, ICPO No. 14. June 1998.

  7. I. Policy for Data Access and Use • Metadata • free and open access, no registration required • each data centre is obliged to provide the metadata in standardized format to populate the catalogue services • Data and products • visualisation freely available • the general case is free and open access (e.g. academic purposes) • however (due to national policies) mandatory user registration is required (using the Single Sign-On (SSO) Service) • a "SeaDataNet role" (partner, academic, commercial etc.) is attributed to each individual user using the Authentication, Authorization and Administration (AAA) Service • each NODC attributes the roles to the users of its own country • outside the partnership, the roles are assigned by the SeaDataNet user-desk • when registering, the user must accept the SDN licence agreement • each data centre node delivers data according to the user's role and its local regulations • each data centre should freely provide the data sets necessary to develop the common products

  8. I. SeaDataNet Users Management

  9. I. User Agreement on SeaDataNet Licence

  10. Topics • PART A : Presentation of general data management rules • SeaDataNet Data Policy and Data Licence • Quality Controls • Rules for metadata submission to prevent duplication • PART B : Identification of main stages and available tools • Reformatting from observation format to common data format • Quality Controls • Metadata Concepts and Management (online, offline)

  11. II. SeaDataNet Services SeaDataNet Quality Control is one of the "off-line" services; it provides methodologies, standards and tools to ensure the reliability, compatibility and coherence of the data: • a common Quality Control Protocol • a tool for visualization and automatic checks (ODV) [Diagram: on-line services vs. off-line services]

  12. II. QC procedures • Overview (IOC, ICES, EU recommendations, MEDAR Protocol) • automatic and visual controls on the data and their metadata • Data measured by the same instrument and coming from the same "cruise" are organized in the same file, reformatted to the same exchange format and then subjected to a series of quality tests: • check of the format • check of the location and time • check of the measurements • The results of the automatic control are attached as QC flags to each data value. • Validation or correction is applied manually to the QC flags and NOT to the data. • The results of the QC are reported to the data originator to give feedback and ask questions.

  13. MEDATLAS Quality Flags values (based on the GTSPP Flag Scale definition) 0: No QC 1: Correct value 2: Out of statistics but not obviously wrong 3: Doubtful value 4: Bad value 5: Modified value (only for the location, date, bottom depth) 9: Missing value

  14. SEADATANET Quality Flags values (L021) (Based on IGOSS/UOT/GTSPP & Argo QC flags) • Quality flags • 0 No quality control • 1 The value appears to be correct • 2 The value appears to be probably good • 3 The value appears probably bad • 4 The value appears erroneous • Information flags • 5 The value has been changed • 6 Below detection limit • 7 In excess of quoted value • 8 Interpolated value • 9 Missing value • A Incomplete information
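
To make the scale concrete, here is a minimal sketch of the L021 codes as a Python enumeration (the member names are mine; the codes and meanings are from the slide):

```python
from enum import Enum

class SDNFlag(Enum):
    """SeaDataNet L021 quality/information flags (codes as listed above)."""
    NO_QC = "0"             # no quality control performed
    GOOD = "1"              # the value appears to be correct
    PROBABLY_GOOD = "2"     # the value appears to be probably good
    PROBABLY_BAD = "3"      # the value appears probably bad
    BAD = "4"               # the value appears erroneous
    CHANGED = "5"           # the value has been changed
    BELOW_DETECTION = "6"   # below detection limit
    IN_EXCESS = "7"         # in excess of quoted value
    INTERPOLATED = "8"      # interpolated value
    MISSING = "9"           # missing value
    INCOMPLETE = "A"        # incomplete information
```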

  15. II. Main QC procedures description • Format Check • Detects anomalies like wrong platform codes or names, parameter names or units, and missing mandatory information like the reference to a cruise or observation system, source laboratory, or sensor type • No further control should be made before the correction and validation of the archive format

  16. II. Main QC procedures description • Check of date and location • For vertical profiles • duplicate entries • date: reasonable date, station date within the begin and end dates of the cruise • ship velocity between two consecutive stations (e.g., speed > 15 knots means a wrong station date or a wrong station location; see the sketch below) • location/shoreline: on-land position • bottom sounding: out of the regional scale, compared with the reference surroundings • For time series of fixed moorings • sensor depth checks: less than the bottom depth • series duration checks: consistency with the start and end dates of the dataset • duplicate mooring checks • land position checks
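
A minimal Python sketch of the ship velocity check (the 15-knot rule of thumb comes from the slide; the haversine formula, the function name and the Earth-radius constant are my own choices):

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles

def speed_knots(lat1, lon1, t1, lat2, lon2, t2):
    """Implied ship speed (knots) between two consecutive stations.

    Latitudes/longitudes in decimal degrees; t1/t2 as datetime objects.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # haversine great-circle distance
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    distance_nm = 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))
    hours = abs((t2 - t1).total_seconds()) / 3600.0
    return distance_nm / hours if hours > 0 else float("inf")

# A result above ~15 knots suggests a wrong station date or station location.
```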

  17. II. Main QC procedures description • Duplicates checks • Conventional techniques • Algorithms: comparison of the location and time of the measurements (5 miles, 15 mins in GTSPP); comparison of the measurements; comparison of extra metadata (platform codes, float ids, … ) • Visualization of ship tracks, transects, … • Advanced techniques • Unique data identifier - CRC tag (GTSPP report 2002) • Keep the most complete data set
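
A hedged sketch of both approaches, assuming stations are held as simple dicts (the field names are illustrative): a GTSPP-style space/time proximity screen and a CRC tag over the measured values.

```python
import math
import zlib
from datetime import timedelta

def looks_like_duplicate(st_a, st_b, max_miles=5.0, max_minutes=15.0):
    """GTSPP-style screen: stations within ~5 nautical miles and 15 minutes
    of each other are candidate duplicates. st_a/st_b are dicts with
    'lat', 'lon' (degrees) and 'time' (datetime) keys."""
    # small-angle approximation: 1 degree of latitude is about 60 nautical miles
    dlat_nm = (st_a["lat"] - st_b["lat"]) * 60.0
    dlon_nm = (st_a["lon"] - st_b["lon"]) * 60.0 * math.cos(math.radians(st_a["lat"]))
    close_in_space = math.hypot(dlat_nm, dlon_nm) <= max_miles
    close_in_time = abs(st_a["time"] - st_b["time"]) <= timedelta(minutes=max_minutes)
    return close_in_space and close_in_time

def crc_tag(values):
    """Unique-data identifier: a CRC over the rounded measurements (cf. the
    GTSPP 2002 report); identical tags flag exact duplicates cheaply."""
    payload = ",".join(f"{v:.4f}" for v in values).encode("ascii")
    return zlib.crc32(payload)
```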

  18. II. Main QC procedures description • Measurements main checks • presence of at least two parameters: vertical/time reference + measurement • pressure/time must be monotonically increasing • the profile/time series must not be constant: sensor jammed • broad range checks: check for extreme regional values compared with the min. and max. values for the region. The broad range check is performed before the narrow range check. • data points below the bottom depth • spike detection: usually requires visual inspection. For time series a filter is applied first to remove the effect of tides and internal waves. • narrow range check: comparison with pre-existing climatological statistics. Time series are compared with internal statistics. • density inversion test (potential density anomaly; Fofonoff and Millard, 1983; Millero and Poisson, 1981) • Redfield ratio for nutrients: ratio of the oxygen, nitrate and alkalinity (carbonates) concentrations over the phosphate (172, 16 and 122 in the Atlantic and Indian Oceans, Takahashi et al.)
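
Two of the simpler checks in this list, sketched in Python (the function names and the tolerance parameter are my own):

```python
def pressure_monotonic(pressures):
    """The vertical reference (pressure) or time must be strictly increasing."""
    return all(p2 > p1 for p1, p2 in zip(pressures, pressures[1:]))

def sensor_jammed(values, tol=0.0):
    """A profile or time series that never varies suggests a jammed sensor."""
    return max(values) - min(values) <= tol
```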

  19. II. Broad Range Check • Regional parameterization in MEDAR/MEDATLAS II • (plus depth parameterization)

  20. II. Main QC procedures description • Narrow range check: after automatic control the flag is set to 2 (probably good data); it is raised to 1 manually • The automatic comparison with reference climatologies is made by linearly interpolating the references at the level of the observation. • Outliers are detected if the data points differ from the references by more than: • 5 x standard deviation over the shelf (depth < 200 m) • 4 x standard deviation in the slope and straits regions (200 m < depth < 400 m) • 3 x standard deviation in the deep sea (depth > 400 m)
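
A minimal sketch of this check, assuming NumPy and a climatology defined at standard levels (the array names are illustrative; the linear interpolation and the depth-dependent multipliers follow the slide):

```python
import numpy as np

def sigma_multiplier(depth_m):
    """Depth-dependent tolerance: 5 sigma on the shelf, 4 sigma on the
    slope and in straits, 3 sigma in the deep sea."""
    if depth_m < 200:
        return 5.0
    if depth_m <= 400:
        return 4.0
    return 3.0

def narrow_range_flags(obs_depth, obs_value, clim_depth, clim_mean, clim_std):
    """Compare observations against a climatology at standard levels.

    The references are linearly interpolated to the observation depths;
    outliers get flag 3 (probably bad), the rest flag 2 (probably good),
    to be raised to 1 after manual/visual validation.
    """
    mean_at_obs = np.interp(obs_depth, clim_depth, clim_mean)
    std_at_obs = np.interp(obs_depth, clim_depth, clim_std)
    k = np.vectorize(sigma_multiplier)(obs_depth)
    outlier = np.abs(np.asarray(obs_value) - mean_at_obs) > k * std_at_obs
    return np.where(outlier, 3, 2)
```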

  21. II. Main QC procedures description • Spikes check • The test is sensitive to the vertical/time resolution. • It requires at least 3 consecutive good/acceptable values. • It requires 2 consecutive values at the surface and the bottom. • The IOC algorithm to detect spikes, taking into account the difference in values (for regularly spaced data like CTD): |V2 - (V3+V1)/2| - |V1-V3|/2 > THRESHOLD VALUE • For irregularly spaced values (like bottle data) a better algorithm, taking into account the difference in gradients instead of the difference in values, is: | |(V2-V1)/(P2-P1) - (V3-V1)/(P3-P1)| - |(V3-V1)/(P3-P1)| | > THRESHOLD VALUE
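
Both tests transcribed directly into Python (a sketch; V and P follow the slide's notation for values and pressures):

```python
def spike_regular(v1, v2, v3, threshold):
    """IOC spike test for regularly spaced data (e.g. CTD):
    |V2 - (V3 + V1)/2| - |V1 - V3|/2 > threshold."""
    return abs(v2 - (v3 + v1) / 2.0) - abs(v1 - v3) / 2.0 > threshold

def spike_irregular(v1, v2, v3, p1, p2, p3, threshold):
    """Gradient-based spike test for irregularly spaced data (e.g. bottles):
    ||(V2-V1)/(P2-P1) - (V3-V1)/(P3-P1)| - |(V3-V1)/(P3-P1)|| > threshold."""
    g12 = (v2 - v1) / (p2 - p1)
    g13 = (v3 - v1) / (p3 - p1)
    return abs(abs(g12 - g13) - abs(g13)) > threshold
```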

  22. II. QC procedures description • Density inversion test: the importance of the visual check. [Figure: example of a density inversion due to temperature increase with depth, between levels z1 and z2. In one case the wrong temperature value is detected automatically; in the other the automatically flagged value is in fact correct and the previous value is corrected manually.] Suggested threshold value = 0.03 for high-resolution data, 0.05 for near-surface and low-resolution data
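
A minimal sketch of the automatic part of the test, assuming the potential density anomalies have already been computed upstream (e.g. with the gsw package, which is my assumption, not named on the slide):

```python
def density_inversion_indices(sigma_theta, threshold=0.03):
    """Indices of levels whose potential density anomaly is lower than the
    level above by more than the threshold (0.03 for high-resolution data,
    0.05 near the surface and for low-resolution data).

    sigma_theta is ordered from top to bottom; flagged levels still need
    the visual check described above before any flag is confirmed.
    """
    return [i + 1 for i in range(len(sigma_theta) - 1)
            if sigma_theta[i] - sigma_theta[i + 1] > threshold]
```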

  23. II. Main QC procedures description • Large temperature inversion and gradient tests (World Ocean Data Centre, NODC Ocean Climate Laboratory) • Rely solely on temperature data to quantify the maximum allowable temperature increase (inversion) and decrease (excessive gradient) with depth (0.3 °C per m and 0.7 °C per m respectively)

  24. II. Main QC procedures description • ARGO Real-Time QC on vertical profiles Based on the Global Temperature and Salinity Profile Project (GTSPP) of IOC/IODE, the automatic QC tests are:
Platform identification: checks whether the float's ID corresponds to the correct WMO number.
Impossible date test: checks whether the observation date and time from the float are sensible.
Impossible location test: checks whether the observation latitude and longitude from the float are sensible.
Position on land test: checks that the observation latitude and longitude from the float are located in an ocean.
Impossible speed test: checks the positions and times of the float.
Global range test: applies a gross filter on observed values for temperature and salinity.
Regional range test: checks for extreme regional values.
Pressure increasing test: checks for monotonically increasing pressure.
Spike test: checks for large differences between adjacent values.
Gradient test: fails when the difference between vertically adjacent measurements is too steep.
Digit rollover test: checks whether the temperature and salinity values exceed the float's storage capacity.
Stuck value test: checks for all measurements of temperature or salinity in a profile being identical.
Density inversion: densities are compared at consecutive levels in a profile, in both directions, i.e. from top to bottom and from bottom to top.
Grey list (7 items): stops the real-time dissemination of measurements from a sensor that is not working correctly.
Gross salinity or temperature sensor drift: detects a sudden and important sensor drift.
Frozen profile test: detects a float that reproduces the same profile (with very small deviations) over and over again.
Deepest pressure test: the profile has pressures not higher than DEEPEST_PRESSURE plus 10%.
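
Two of the simpler Argo tests sketched in Python. The global-range bounds shown are my assumption of typical Argo limits, not values given on the slide; the 10% deepest-pressure margin is from the slide:

```python
def global_range_ok(temperature_c, salinity_psu):
    """Gross filter on observed values (assumed bounds of roughly
    -2.5..40 degC and 2..41 PSU; check the Argo QC manual for the
    authoritative limits)."""
    return -2.5 <= temperature_c <= 40.0 and 2.0 <= salinity_psu <= 41.0

def deepest_pressure_ok(pressures_dbar, deepest_pressure_dbar):
    """Deepest pressure test: no observed pressure may exceed
    DEEPEST_PRESSURE plus 10%."""
    return max(pressures_dbar) <= 1.1 * deepest_pressure_dbar
```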

  25. II. Main QC procedures description • CORIOLIS Real-Time QC on time series Automatic quality control tests:
test 1: Platform Identification
test 2: Impossible Date Test
test 3: Impossible Location Test
test 4: Position on Land Test
test 5: Impossible Speed Test
test 6: Global Range Test
test 7: Regional Global Parameter Test for Red Sea and Mediterranean Sea
test 8: Spike Test
test 10: comparison with climatology

  26. II. Main QC procedures description • CORIOLIS Delayed Mode QC on profiles and time series • Automated and visual QC (already described) • Objective analysis and residual analysis (to correct sensor drift and offsets) • World Ocean Data Centre • Objective analysis • Post-objective-analysis subjective checks (to detect unrealistic "bull's eyes" features in data-sparse areas)

  27. II. References • Argo quality control manual, V2.2, 2006 (http://www.coriolis.eu.org/cdc/argo/argo-quality-control-manual.pdf) • Coriolis Data Centre, In-situ data quality control, V1.3, 2005 (http://www.coriolis.eu.org/cdc/documents/cordo-rap-04-047-quality-control.pdf) • GOSUD Real-time QC, go-um-03-01, V1.0, 2003 (https://www.ifremer.fr/bscw/bscw.cgi/0/53815) • Data Type guidelines - ICES Working Group of Marine Data Management (12 data types) (http://www.ices.dk/Ocean/guidelines.htm) • GTSPP Real-Time Quality Control Manual, 1990 (IOC Manuals and Guides No. 22) (http://www.meds-sdmm.dfo-mpo.gc.ca/ALPHAPRO/gtspp/qcmans/MG22/guide22_e.htm) • UNESCO/IOC/IODE and MAST, Manual of Quality Control Procedures for Validation of Oceanographic Data, 1993 (Manuals and Guides No. 26) (http://www.jodc.go.jp/info/ioc_doc/Manual/mg26.pdf) • "Medar-Medatlas protocol, Part I: Exchange format and quality checks for observed profiles", V3, 2001 (http://www.ifremer.fr/medar/qc_doc/med_manv3.doc) • "Quality Control of Sea Level Observations", ESEAS-RI, V1.0, 2006 (http://www.eseas.org/eseas-ri/deliverables/d1.2/) • Quality Control Processing of Historical Oceanographic Temperature, Salinity, and Oxygen Data. Timothy Boyer and Sydney Levitus, 1994. National Oceanographic Data Centre, Ocean Climate Laboratory • World Ocean Database 2005 Documentation. Ed. Sydney Levitus. NODC Internal Report 18, U.S. Government Printing Office, Washington, D.C., 163 pp. • Quality checks at Ifremer/Sismer (http://www.ifremer.fr/sismer/program/qc_phy/quality_UK.htm) • IGOSS Quality Flags (http://www.nodc.noaa.gov/argo/qc_flags.htm)

  28. Topics • PART A : Presentation of general data management rules • SeaDataNet Data Policy and Data Licence • Quality Controls • Rules for metadata submission to prevent duplication • PART B : Identification of main stages and available tools • Reformatting from observation format to common data format • Quality Controls • Metadata Concepts and Management (online, offline)

  29. III. Causes of the duplicates • RT and DM profiles from operational oceanography • Data sets from the GTS (real-time transmission) with rounded values and poorly documented profiles • International programmes and data exchange/dissemination • Data insufficiently documented and attributed to two different sources • PTS files and the same station with other parameters • Data declassified by the navies with poor metadata • …

  30. III. Why prevent duplications? • they affect the preparation of products (bias the computations) • data are mistakenly reported and disseminated

  31. III. How to handle the duplicates? • There are copies of one data set in several regional databases (ICES), project databases (MEDAR) and global databases (WOD05) • The duplicate data should not reach the aggregation level • The simplest way: the duplicates' descriptions (metadata) must not enter the system • Submit only your national metadata (project coordinator country = collator/data centre country)

  32. Topics • PART A : Presentation of general data management rules • SeaDataNet Data Policy and Data Licence • Quality Controls • Rules for metadata submission to prevent duplication • PART B : Identification of main stages and available tools • Reformatting from observation format to common data format • Quality Controls • Metadata Concepts and Management (online, offline)

  33. IV. Quality Control Procedures within SeaDataNet • Reformatting • Quality Controls • Metadata Management + Information Compilation

  34. IV. Data reformatting • In general the original formats of the data files cannot be used in data management: • incomplete/non-standardized metadata • incompatibility with the input formats of QC and other processing • need for a unique format for safeguarding and exchanging data sets of the same type • The data management format, archiving format and transport (exchange) format are not necessarily the same

  35. IV. Sustainability of an archiving format The archiving format should: • be independent from the computer (and libraries) – RDBMS are not appropriate • ensure that any isolated data set includes enough metadata to be processed (e.g. location and date) • be compatible with, and include at least, the mandatory fields (metadata) requested for the agreed exchange format(s) • include additional textual or standardized "history" or "comment" fields to prevent any loss of information • provide a similar structure and metadata for different data types such as vertical profiles and time series • These rules are normally followed for exchange formats as well.
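
One way to read these rules is as a self-describing record; a sketch using Python dataclasses (the field names are illustrative, not a SeaDataNet specification):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ArchiveRecord:
    """A self-describing archive record: every isolated data set carries
    the mandatory metadata plus free-text history/comment fields so no
    information is lost, and the same structure serves vertical profiles
    and time series alike."""
    latitude: float
    longitude: float
    time_utc: datetime
    cruise_reference: str
    source_laboratory: str
    sensor_type: str
    parameters: dict                               # name -> (values, units, QC flags)
    history: list = field(default_factory=list)    # processing steps appended over time
    comments: str = ""
```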

  36. IV. SeaDataNet adopted transport formats • Obligatory formats • NetCDF (binary) for gridded data and 3D observation data such as ADCP • ODV4 spreadsheet for other data types (vertical profiles and time series) • Optional • ASCII Medatlas

  37. IV. SeaDataNet Tool for data reformatting • a new reformatting tool to convert any ASCII file to the Medatlas and ODV formats • In addition: it interacts with Mikado to produce ISO 19115 XML metadata descriptions • How does it work? See the next presentation by M. Fichaut

  38. Topics • PART A : Presentation of general data management rules • SeaDataNet Data Policy and Data Licence • Quality Controls • Rules for metadata submission to prevent duplication • PART B : Identification of main stages and available tools • Reformatting from observation format to common data format • Quality Controls • Metadata Concepts and Management (online, offline)

  39. V. SeaDataNet Quality Control Standards • SeaDataNet quality control flags (L021) • SeaDataNet Protocol V1 • Ocean Data View V4

  40. SEADATANET Quality Flags values (Based on IGOSS/UOT/GTSPP & Argo QC flags)

  41. V. Tool for quality control • Ocean Data View for automatic checks and visualization • Integration with DIVA (presentation by R. Schlitzer)

  42. Topics • PART A : Presentation of general data management rules • SeaDataNet Data Policy and Data Licence • Quality Controls • Rules for metadata submission to prevent duplication • PART B : Identification of main stages and available tools • Reformatting from observation format to common data format • Quality Controls • Metadata Concepts and Management (online, offline)

  43. VI. Why Metadata? • We need the metadata to discover the data • SeaDataNet built a metadata system to discover the data • It is ISO 19115 compliant for interoperability with other systems • Partners maintain the system by submitting metadata • No metadata = no discovery of partners' data

  44. VI. Metadata Discovery System

  45. VI. Metadata Discovery System • High Level Directories – EDMED, CSR, EDIOS: for describing collected observation datasets (by ships, by laboratories, by continuous observing systems)

  46. VI. Metadata Discovery System • Common Reference Tables – EDMERP, EDMO: hold research project and organization metadata common to the higher directories

  47. VI. Metadata Discovery System • The Common Data Index (CDI) provides access to the data, information and products, by data type and/or any other field, distributed by the TA platforms

  48. VI. System Maintenance and Upgrade • 1. Version 0 – 2006-2007 • Continuation and maintenance of the existing Sea-Search system: • data access needs several different requests to each data centre • and the data sets are delivered in different formats • 2. Version 1 – 2008-2010 • Setup of the integrated online data services to users: • networking of 10 "interoperable" data centres of the Technical Task Team • a unique request to the interconnected data centres • and the data sets are delivered in a unique format • Presently under test, with progressive integration of 10 data centres during 2008

  49. VI. How to submit Metadata? 1. Compile the information • For all types of data, information is required about: • Where the data were collected: location (preferably as latitude and longitude) and depth/height • When the data were collected (date and time in UTC or clearly specified local time zone) • How the data were collected (e.g. sampling methods, instrument types, analytical techniques) • How the data are referenced (e.g. station numbers, cast numbers) • Who collected the data, including name and institution of the data originator(s) and the principal investigator • What has been done to the data (e.g. details of processing and calibrations applied, algorithms used to compute derived parameters) • Comments for other users of the data (e.g. problems encountered and comments on data quality)
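
This checklist maps naturally onto a structured record; a hypothetical Python sketch (all field names and values are illustrative, not a SeaDataNet schema):

```python
# One station's compiled metadata, organised around the slide's questions.
station_metadata = {
    "where": {"latitude": 35.24, "longitude": 25.72, "depth_m": 1440},
    "when": "2008-06-17T09:30:00Z",  # date and time in UTC
    "how_collected": {"instrument_type": "CTD", "sampling_method": "vertical profile"},
    "how_referenced": {"station_number": "ST042", "cast_number": 3},
    "who": {"originator": "HNODC", "principal_investigator": "..."},
    "processing": ["calibrations applied", "derived parameters computed"],
    "comments": "problems encountered and data-quality remarks go here",
}
```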
