Enhanced File Consistency Checking
ADMT #10 – Toulouse, France, 30 September - 2 October 2009
Mark Ignaszewski, FNMOC
ADMT-10 30 September – 3 October 2009
Background
• Two types of format and consistency check failures:
  • Errors: these block distribution of the data on the GDAC until corrected.
  • Warnings: problems we would like to see corrected, but the data are still distributed on the GDAC as is.
• Unless otherwise noted, the tests generate errors and block data distribution.
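The error/warning split can be pictured as a small report object. This is a minimal sketch, not the FNMOC checker: `Severity`, `CheckReport`, and `distributable` are hypothetical names, and the only rule encoded is the one stated above (any error blocks distribution; warnings never do).

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    ERROR = "error"      # blocks GDAC distribution until corrected
    WARNING = "warning"  # reported; the file is still distributed as is


@dataclass
class Finding:
    check: str
    message: str
    severity: Severity


@dataclass
class CheckReport:
    """Collects the findings for one file (hypothetical structure)."""
    findings: list = field(default_factory=list)

    def add(self, check: str, message: str,
            severity: Severity = Severity.ERROR) -> None:
        # Default to ERROR: unless otherwise noted, a failed test is an error.
        self.findings.append(Finding(check, message, severity))

    def distributable(self) -> bool:
        # Distribute only if no finding is an error; warnings never block.
        return all(f.severity is Severity.WARNING for f in self.findings)


report = CheckReport()
report.add("strings", "NUL character in PI_NAME", Severity.WARNING)
print(report.distributable())  # True: only a warning so far
report.add("dates", "DATE_UPDATE is not set")
print(report.distributable())  # False: an error blocks distribution
```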
Basic Format Checking
• Basic format checking has not changed.
• All file types are checked to ensure that the dimensions, variables, and attributes conform to the Argo specification.
  • This includes the “highly desirable parameter” checks.
• All strings are checked for NULL characters (Warning).
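A minimal sketch of the NULL-character warning, assuming each character variable has already been decoded into a Python `str`; the function name and the dict-of-strings input are hypothetical, and reading the netCDF file is left out.

```python
def strings_with_nul(variables: dict) -> list:
    """Return the names of string variables containing a NUL character.

    `variables` maps a variable name to its characters already decoded
    into a Python str (a hypothetical representation, not a netCDF API).
    """
    return sorted(name for name, text in variables.items() if "\x00" in text)


# A NUL anywhere in the string (for example, as padding) triggers the warning.
print(strings_with_nul({"PI_NAME": "SMITH\x00\x00\x00", "DATA_CENTRE": "AO"}))
# prints ['PI_NAME']
```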
Enhanced Profile Checking
• Meta-data checks
• Date checks
• QC code checks
• <PARAM> variable checks
• D-mode specific file checks
Meta-data Checks
• PLATFORM_NUMBER: 5 or 7 numeric digits (second digit must be “9” for 7-digit numbers)
• DATA_STATE_INDICATOR: one of the recommended codes from reference table 6
• DIRECTION: ‘A’ or ‘D’
• DATA_CENTRE: valid for the DAC
• DATA_MODE: ‘A’, ‘D’, or ‘R’
• INST_REFERENCE: set (Warning)
• POSITIONING_SYSTEM: set (Warning)
• WMO_INST_TYPE: valid per reference table 8 (Warning)
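The format-only meta-data rules translate directly into code. In this sketch the input dict and `check_metadata` are hypothetical; the checks that depend on reference table 6, reference table 8, or the per-DAC DATA_CENTRE list are left out because those tables are not reproduced here.

```python
import re

# PLATFORM_NUMBER: 5 digits, or 7 digits whose second digit is "9".
PLATFORM_RE = re.compile(r"[0-9]{5}|[0-9]9[0-9]{5}")


def check_metadata(meta: dict) -> list:
    """Return (severity, message) pairs for a subset of the meta-data checks.

    `meta` maps variable names to string values (hypothetical input).
    Checks needing reference tables 6 and 8 or the DAC list are omitted.
    """
    problems = []
    if not PLATFORM_RE.fullmatch(meta.get("PLATFORM_NUMBER", "").strip()):
        problems.append(("error", "PLATFORM_NUMBER has an invalid format"))
    if meta.get("DIRECTION", "").strip() not in ("A", "D"):
        problems.append(("error", "DIRECTION must be 'A' or 'D'"))
    if meta.get("DATA_MODE", "").strip() not in ("A", "D", "R"):
        problems.append(("error", "DATA_MODE must be 'A', 'D', or 'R'"))
    for name in ("INST_REFERENCE", "POSITIONING_SYSTEM"):
        if not meta.get(name, "").strip():
            problems.append(("warning", f"{name} is not set"))
    return problems


meta = {"PLATFORM_NUMBER": "6900123", "DIRECTION": "A", "DATA_MODE": "R",
        "INST_REFERENCE": "APEX_SBE", "POSITIONING_SYSTEM": ""}
print(check_metadata(meta))  # [('warning', 'POSITIONING_SYSTEM is not set')]
```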
Date Checks
• All dates are checked for validity and consistency.
• String dates (DATE_UPDATE, DATE_CREATION, HISTORY_DATE, CALIBRATION_DATE) are checked for validity:
  • 14-digit strings with valid field values (e.g., seconds must be 0 to 59)
• DATE_UPDATE and DATE_CREATION must be set.
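A sketch of the string-date validation, assuming dates arrive as raw YYYYMMDDHHMISS strings keyed by variable name (a key such as `HISTORY_DATE[2]` for one history record is a hypothetical convention). Calendar validity (month 13, April 31, second 60) is delegated to Python's `datetime`.

```python
import re
from datetime import datetime
from typing import Optional


def parse_argo_date(value: str) -> Optional[datetime]:
    """Parse a YYYYMMDDHHMISS string; return None if it is not a valid date."""
    value = value.strip()
    # strptime alone accepts some shorter fields, so require 14 ASCII digits.
    if not re.fullmatch(r"[0-9]{14}", value):
        return None
    try:
        return datetime.strptime(value, "%Y%m%d%H%M%S")
    except ValueError:  # e.g. month 13, April 31, or seconds = 60
        return None


def check_string_dates(dates: dict) -> list:
    """`dates` maps a name (e.g. "HISTORY_DATE[2]") to its raw string value."""
    errors = [f"{name} must be set"
              for name in ("DATE_CREATION", "DATE_UPDATE")
              if not dates.get(name, "").strip()]
    for name, value in dates.items():
        if value.strip() and parse_argo_date(value) is None:
            errors.append(f"{name} is not a valid date: {value!r}")
    return errors


# A 12-digit date (missing seconds) is rejected.
print(check_string_dates({"DATE_CREATION": "20090930120000",
                          "DATE_UPDATE": "200910011200"}))
# prints ["DATE_UPDATE is not a valid date: '200910011200'"]
```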
Consistency of Dates (timeline, earliest to latest)
• Jan 1, 1995
• JULD (must be within 12 hours of JULD_LOCATION)
• DATE_CREATION
• HISTORY_DATE, CALIBRATION_DATE (no order imposed)
• DATE_UPDATE
• GDAC file time (DATE_UPDATE within 2 days; Warning)
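One way to turn the timeline into pairwise tests, as a sketch: it assumes all dates are already converted to naive UTC `datetime` values, checks JULD ≤ DATE_CREATION ≤ DATE_UPDATE and HISTORY_DATE/CALIBRATION_DATE ≤ DATE_UPDATE, and otherwise leaves HISTORY_DATE and CALIBRATION_DATE unordered, which is one reading of “no order imposed”. Using an absolute difference for the 2-day GDAC comparison is also an assumption. JULD itself is stored as days since 1950-01-01 00:00 UTC, hence the conversion helper.

```python
from datetime import datetime, timedelta

JULD_REFERENCE = datetime(1950, 1, 1)  # Argo JULD: days since 1950-01-01 00:00 UTC
EARLIEST_JULD = datetime(1995, 1, 1)


def juld_to_datetime(juld: float) -> datetime:
    return JULD_REFERENCE + timedelta(days=juld)


def check_date_order(juld, juld_location, date_creation, date_update,
                     other_dates, gdac_file_time):
    """Return (errors, warnings); all arguments are naive UTC datetimes.

    `other_dates` holds every HISTORY_DATE and CALIBRATION_DATE; they are
    only compared with DATE_UPDATE, not with each other (assumed reading).
    """
    errors, warnings = [], []
    if juld < EARLIEST_JULD:
        errors.append("JULD is before Jan 1, 1995")
    if abs(juld - juld_location) > timedelta(hours=12):
        errors.append("JULD and JULD_LOCATION differ by more than 12 hours")
    if juld > date_creation:
        errors.append("JULD is after DATE_CREATION")
    if date_creation > date_update:
        errors.append("DATE_CREATION is after DATE_UPDATE")
    if any(d > date_update for d in other_dates):
        errors.append("HISTORY_DATE or CALIBRATION_DATE is after DATE_UPDATE")
    if abs(gdac_file_time - date_update) > timedelta(days=2):
        warnings.append("DATE_UPDATE is more than 2 days from the GDAC file time")
    return errors, warnings


obs = datetime(2009, 9, 1, 6)
errors, warnings = check_date_order(obs, obs, datetime(2009, 8, 1),
                                    datetime(2009, 9, 5), [datetime(2009, 9, 10)],
                                    datetime(2009, 9, 20))
print(errors)    # ['JULD is after DATE_CREATION', 'HISTORY_DATE or CALIBRATION_DATE is after DATE_UPDATE']
print(warnings)  # ['DATE_UPDATE is more than 2 days from the GDAC file time']
```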
QC Code Checks
• JULD_QC and POSITION_QC: valid values
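A sketch of the valid-value test for JULD_QC and POSITION_QC. The flag set is an assumption based on Argo reference table 2 (0 to 5, 8, and 9), with the blank fill value treated as invalid; confirm it against the current Argo user's manual. `invalid_qc` is a hypothetical name.

```python
# Assumed valid Argo QC flags (reference table 2): 0 no QC, 1 good,
# 2 probably good, 3 probably bad, 4 bad, 5 value changed, 8 estimated,
# 9 missing. 6 and 7 are unused; a blank is the fill value.
VALID_QC = frozenset("01234589")


def invalid_qc(name: str, flags: str) -> list:
    """Report every character of a QC variable that is not a valid flag."""
    return [f"{name}[{i}] = {flag!r} is not a valid QC code"
            for i, flag in enumerate(flags) if flag not in VALID_QC]


print(invalid_qc("JULD_QC", "1"))      # []
print(invalid_qc("POSITION_QC", "7"))  # ["POSITION_QC[0] = '7' is not a valid QC code"]
```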
<PARAM> Checks
• STATION_PARAMETERS:
  • Only valid parameter names
  • No blank entries
  • No duplicate entries
  • PRES, TEMP, and PSAL are required
• Check that a <PARAM> variable exists for every STATION_PARAMETERS entry.
• Check that no other <PARAM> variables (with data) exist in the file.
• If DATA_MODE = ‘A’ or ‘D’: check that all <PARAM>_ADJUSTED variables have data.
  • Subject to the D-mode “QC = 4” rules in the QC manual
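The STATION_PARAMETERS rules can be sketched as set operations. The function name and its inputs are hypothetical: the list of valid names comes from the Argo parameter reference table and is passed in rather than hard-coded, and the caller is assumed to already know which <PARAM> variables exist and which hold non-fill data.

```python
REQUIRED = ("PRES", "TEMP", "PSAL")


def check_station_parameters(station_parameters, valid_names,
                             param_vars, param_vars_with_data):
    """Check STATION_PARAMETERS against the <PARAM> variables in a file.

    valid_names: allowed parameter names (passed in, not reproduced here).
    param_vars / param_vars_with_data: <PARAM> variables present in the
    file, and the subset that holds non-fill data.
    """
    errors = []
    names = [p.strip() for p in station_parameters]
    if "" in names:
        errors.append("blank STATION_PARAMETERS entry")
    listed = {n for n in names if n}
    for n in sorted(listed - set(valid_names)):
        errors.append(f"invalid parameter name {n}")
    for n in sorted(n for n in listed if names.count(n) > 1):
        errors.append(f"duplicate entry {n}")
    for n in REQUIRED:
        if n not in listed:
            errors.append(f"required parameter {n} missing")
    # Every listed parameter needs a variable; no unlisted variable may hold data.
    for n in sorted(listed - set(param_vars)):
        errors.append(f"no <PARAM> variable for {n}")
    for n in sorted(set(param_vars_with_data) - listed):
        errors.append(f"{n} has data but is not in STATION_PARAMETERS")
    return errors
```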
<PARAM> Checks (continued)
• <PARAM>_QC and <PARAM>_ADJUSTED_QC:
  • Only valid QC codes; no fill values
  • Missing data flagged with 0, 4, or 9
  • Real-time profiles: only codes 0 through 4
  • Required parameters (PRES, TEMP, PSAL): cannot be code 0 if the data are not missing
• PROFILE_<PARAM>_QC:
  • Valid value
  • Correct value (consistent with the level QC flags)
• Check that N_PARAM and N_LEVELS are not larger than necessary (Warning)
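A sketch of the per-level rules and of recomputing PROFILE_<PARAM>_QC for comparison with the stored value. The grading scale (A for 100% good levels down to F for none, with flags 1, 2, 5, 8 counted as good and 9 or blank excluded) is an assumption based on the Argo user's manual. Two further choices are this sketch's own interpretation, marked in the comments: the real-time 0-to-4 limit is applied only to levels with data, since missing levels may carry 9, and an all-0 profile is graded blank.

```python
VALID_QC = frozenset("01234589")   # assumed reference table 2 flags
MISSING_OK = frozenset("049")      # flags allowed on missing values
REALTIME_OK = frozenset("01234")   # flags allowed in real-time profiles
GOOD = frozenset("1258")           # flags counted as good for the grade
NOT_COUNTED = frozenset("9 ")      # missing / fill: excluded from the grade


def check_level_qc(values, flags, fill, real_time, required):
    """Apply the per-level <PARAM>_QC rules; `fill` is the _FillValue."""
    errors = []
    for i, (value, flag) in enumerate(zip(values, flags)):
        if flag not in VALID_QC:
            errors.append(f"level {i}: invalid QC code {flag!r}")
            continue
        missing = value == fill
        if missing and flag not in MISSING_OK:
            errors.append(f"level {i}: missing value flagged {flag}")
        # Interpretation: the 0-4 limit is applied to levels with data only.
        if real_time and not missing and flag not in REALTIME_OK:
            errors.append(f"level {i}: code {flag} not allowed in real time")
        if required and not missing and flag == "0":
            errors.append(f"level {i}: required parameter has code 0")
    return errors


def profile_qc_grade(flags):
    """PROFILE_<PARAM>_QC from the percentage N of good levels (assumed rule)."""
    used = [f for f in flags if f not in NOT_COUNTED]
    if not used or all(f == "0" for f in used):
        return " "  # interpretation: treated as "no QC performed"
    n = 100.0 * sum(f in GOOD for f in used) / len(used)
    if n == 100.0:
        return "A"
    for threshold, grade in ((75.0, "B"), (50.0, "C"), (25.0, "D")):
        if n >= threshold:
            return grade
    return "E" if n > 0.0 else "F"


print(profile_qc_grade("1114"))  # B (75% good levels)
```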
D-mode File Checks
• DATA_MODE = “D”
• DATA_STATE_INDICATOR = “2C” or “2C+”
• Same <PARAM> entries in PARAMETER as in STATION_PARAMETERS
• If PRES_ADJUSTED_QC = 4:
  • TEMP_ADJUSTED_QC and PSAL_ADJUSTED_QC = 4
  • *_ADJUSTED = missing
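A sketch of the D-mode header and “PRES_ADJUSTED_QC = 4” checks. All names are hypothetical; PARAMETER entries are assumed flattened across N_CALIB (so repeated names are harmless in the set comparison), and `fill` stands for the variables' _FillValue.

```python
def check_dmode_header(data_mode, data_state_indicator,
                       parameter, station_parameters):
    """D-mode header checks; `parameter` lists the PARAMETER entries."""
    errors = []
    if data_mode.strip() != "D":
        errors.append("DATA_MODE must be 'D'")
    if data_state_indicator.strip() not in ("2C", "2C+"):
        errors.append("DATA_STATE_INDICATOR must be '2C' or '2C+'")
    # Compare the sets of non-blank names, ignoring order and repeats.
    calibrated = {p.strip() for p in parameter if p.strip()}
    measured = {p.strip() for p in station_parameters if p.strip()}
    if calibrated != measured:
        errors.append("PARAMETER and STATION_PARAMETERS list different parameters")
    return errors


def check_pres_qc4(pres_adj_qc, temp_adj_qc, psal_adj_qc, adjusted, fill):
    """Where PRES_ADJUSTED_QC is 4, TEMP/PSAL_ADJUSTED_QC must be 4 and every
    *_ADJUSTED value must be missing. `adjusted` maps names to value lists."""
    errors = []
    for i, qc in enumerate(pres_adj_qc):
        if qc != "4":
            continue
        if temp_adj_qc[i] != "4" or psal_adj_qc[i] != "4":
            errors.append(f"level {i}: PRES_ADJUSTED_QC=4 but TEMP/PSAL not 4")
        for name, values in adjusted.items():
            if values[i] != fill:
                errors.append(f"level {i}: {name} should be missing")
    return errors
```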
D-mode File Checks (continued)
• SCIENTIFIC_CALIB_COMMENT and CALIBRATION_DATE set for every <PARAM> and every N_CALIB
• At least one HISTORY record
• HISTORY_INSTITUTION and HISTORY_DATE set
• Plus the <PARAM> and date checks previously discussed
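A sketch of the calibration and history checks under a hypothetical in-memory layout: one dict per N_CALIB mapping each <PARAM> to its (SCIENTIFIC_CALIB_COMMENT, CALIBRATION_DATE) pair, and one dict per HISTORY record. Treating a file with no calibration records at all as an error is an assumption.

```python
def check_dmode_calibration(station_parameters, calib, history):
    """calib: one dict per N_CALIB, mapping <PARAM> to a
    (SCIENTIFIC_CALIB_COMMENT, CALIBRATION_DATE) pair.
    history: one dict per HISTORY record (hypothetical layout)."""
    errors = []
    if not calib:
        errors.append("no calibration records")  # assumed to be an error
    for k, record in enumerate(calib):
        for param in station_parameters:
            comment, date = record.get(param, ("", ""))
            if not comment.strip():
                errors.append(f"N_CALIB {k}: SCIENTIFIC_CALIB_COMMENT not set for {param}")
            if not date.strip():
                errors.append(f"N_CALIB {k}: CALIBRATION_DATE not set for {param}")
    if not history:
        errors.append("no HISTORY record")
    for h, record in enumerate(history):
        for name in ("HISTORY_INSTITUTION", "HISTORY_DATE"):
            if not record.get(name, "").strip():
                errors.append(f"HISTORY {h}: {name} not set")
    return errors
```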
Results
• Tested every cycle with a 1 or 5 in the tens digit (e.g., 015, 055, 115, 155)
• Warnings:
  • 9 DACs have problems with NULLs in strings
    • KORDI seems to be OK
    • Some only in a few variables, some in many variables
  • A couple of “N_LEVELS too large” cases
    • KORDI often sets N_LEVELS larger than necessary
  • INST_REFERENCE not set
    • 1 JMA file
Results: Date Checks
• JULD after DATE_CREATION
  • Coriolis: only a few files
  • INCOIS: many files, with large time differences
• HISTORY_DATE and/or CALIBRATION_DATE after DATE_UPDATE
  • CSIO: many files, with big time differences
• Invalid dates:
  • AOML, INCOIS: bad values
  • MEDS: too short (missing seconds)
Results: <PARAM> Checks
• ‘A’ or ‘D’ mode: *_ADJUSTED not set
  • Identified some missing *_ADJUSTED data
  • The “QC = 4” rule was not being handled correctly
• PROFILE_<PARAM>_QC: incorrect values
  • AOML, Coriolis, CSIO, JMA, MEDS
• Missing variables
  • CSIRO, INCOIS (DOXY_ADJUSTED_ERROR)
• <PARAM>_QC and *_ADJUSTED_QC
  • Numerous inconsistencies reported
  • Some illegal values
Results: D-file Checks
• DATA_STATE_INDICATOR (Coriolis and MEDS):
  • Question about “2C+”
  • A few MEDS files set to “2B”
Results: D-file Checks (PARAMETER and SCIENTIFIC_CALIB_*)
• PARAMETER or CALIBRATION_DATE not set in many files
  • PRES, TEMP: AOML, CSIO, MEDS
  • TEMP, CNDC: CSIRO
  • PRES, TEMP, PSAL: JMA
• SCIENTIFIC_CALIB_COMMENT not set in many files
  • TEMP, CNDC: CSIRO
  • PRES, TEMP: JMA
• N_CALIB larger than necessary in many files
  • BODC (many files), Coriolis (few files)
• No calibration information in some D-files: Coriolis
Plan
• Implement in routine processing AS ADVISORY: 20 Oct 2009
• Transition to IFREMER: Oct-Nov 2009
• Implement as operational checker: end of Nov 2009
Still Needed
• <PARAM>_ADJUSTED_ERROR: check that it is set
• Cross-file checks:
  • Cycle-to-cycle:
    • Consistent positions and times
    • Duplicates
  • Meta-data file comparisons
• Greylist: should QC be checked?
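The cycle-to-cycle checks are listed as still needed, so this is only one possible shape for them, not the planned implementation. It assumes each cycle is reduced to a hypothetical `CycleFix` (cycle number, JULD, position), flags duplicate cycle numbers and non-increasing JULD, and uses a great-circle distance with an arbitrary `max_km_per_day` threshold (not an Argo rule) to flag implausible drift.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleFix:
    cycle: int
    juld: float  # days since 1950-01-01
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def cross_cycle_issues(fixes, max_km_per_day=100.0):
    """Flag duplicate cycles, non-increasing times, and implausible drift.

    max_km_per_day is a hypothetical threshold chosen for illustration.
    """
    issues = []
    ordered = sorted(fixes, key=lambda f: f.cycle)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.cycle == prev.cycle:
            issues.append(f"duplicate cycle {cur.cycle}")
            continue
        days = cur.juld - prev.juld
        if days <= 0:
            issues.append(f"cycle {cur.cycle}: JULD not after cycle {prev.cycle}")
            continue
        speed = haversine_km(prev.lat, prev.lon, cur.lat, cur.lon) / days
        if speed > max_km_per_day:
            issues.append(f"cycle {cur.cycle}: implied drift {speed:.0f} km/day")
    return issues
```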