ERS-2 LBR data consolidation approach and results • X-PReSS inputs to the second ESA Review meeting of the EO Assets Decision Board • 26.02.2016 • X-PReSS Team
Agenda • X-PReSS ERS-2 consolidation data inputs and approach for: • Altimeter (ERAC) • Microwave (EMWC) • SAR Wave (EWAC) • Wind Scatterometer (EWIC) • ATSR-2 (EATC-2) • GOME (EGOC) • Telemetry (EGH) • X-PReSS L0 consolidation results
Consolidation – Activity summary • Inputs from ESA • L0 data provided by ESA DL (NRT, partially consolidated, etc.) • Previous data consolidation activity results (if available) • Inventory list of provided data (if available) • Mission phases • Acquisition planning inputs including downlink strategy (including SHAQ from CUS and from PDGS) • Stations availability • X-PReSS activities • Data collection of provided data • Support in identifying additional sources of data (if applicable), to be confirmed/approved by ESA • Data cleaning • “Removal” of corrupted files • Alignment and harmonization of the data (e.g. file naming, format, packaging) • Homogenization of data (e.g. data coming from different sources) • Merging of data • Data verification • Gaps identified against the actual consolidated dataset in the service infrastructure • Data availability for data validation (external)
Team composition • Facilities • Ifremer (EWIC, ERAC, EMWC, SAR Wave) - France • Cems (ATSR) - UK • INTA (EGH) - Spain • MEEO (GOME) - Italy • Data Experts: • Jean-Francois Piolle (Ifremer) - EWIC, ERAC, EMWC, SAR Wave • Juan Antonio Rodríguez García & INTA Team (INTA) - EGH • Simone Mantovani (Sistema) – ATSR • Alberto Bigazzi (Serco) – GOME • Project Managers: • M. Meloni (ERAC) • S. Maltese (EWIC, EMWC, SAR Wave, GOME, EGH) • G. Davies (ATSR)
ERS LBR Scat/RA/MWR/SAR Wave/EGH Consolidation Approach (1) • Renaming the files to the same naming convention • Matching the data gaps with instrument, station or platform unavailabilities, using the downlink and SHAQ information • Merging the various sources into a single repository and raising the conflicts (files from alternative sources with the same name and a different checksum) • Detecting redundant files that cover the same time frame or overlap, and in such cases investigating the content and selecting the best one • Detecting the missing orbits and main data gaps, and checking whether they can be explained by some instrument unavailability or whether there is any trace of these orbits in the FPAF products catalogue: this leads to the decision to try to recover additional missing orbits and to iterate the full cycle again
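The merge-and-raise-conflicts step can be illustrated with a minimal sketch. It assumes each input is a hypothetical `(source, filename, content)` tuple and uses SHA-256 as the checksum; the slides do not say which checksum or tooling was actually used.

```python
import hashlib
from collections import defaultdict


def merge_sources(files):
    """Merge files from several sources and report name/checksum conflicts.

    files: iterable of (source, filename, content_bytes) tuples (hypothetical layout).
    Returns (merged, conflicts):
      merged    maps filename -> first source seen, when all copies are identical
      conflicts maps filename -> sorted list of sources, when checksums differ
    """
    by_name = defaultdict(dict)  # filename -> {checksum: [sources]}
    for source, name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        by_name[name].setdefault(digest, []).append(source)

    merged, conflicts = {}, {}
    for name, versions in by_name.items():
        if len(versions) == 1:
            # Same name, same checksum everywhere: a clone, keep one copy.
            merged[name] = next(iter(versions.values()))[0]
        else:
            # Same name, different checksum: raise a conflict for inspection.
            conflicts[name] = sorted(s for srcs in versions.values() for s in srcs)
    return merged, conflicts
```

Files that land in `conflicts` would then be inspected manually, which matches the "investigating the content and selecting the best one" step.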
ERS LBR Scat/RA/MWR/Sar Wave/EGH Consolidation Approach (2)
ERS LBR ATSR-2 Consolidation Approach (1/5) • Step 1: Pre-consolidation • Recovery of duplicates • Metadata generation / re-generation • Products renaming [ad-hoc slide] • Data merging [ad-hoc slide] • Step 2: Consolidation • Product classification [ad-hoc slide] • Categories: Corrupted, Quarantined, Duplicates, Master • Master dataset generation • software_version: the later the better • generation_time: the later the better • Product Confidence Data (PCD): the smaller the value, the higher the quality • Remove overlapping products (i.e. exact duplicates and nested products) • Completeness / Gaps analysis • Missing products identification / management
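The three master-selection criteria can be expressed as a single sort key. The sketch below assumes the criteria are applied in the order listed (software version, then generation time, then PCD); the slide does not state their precedence, and the `Candidate` fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    software_version: int
    generation_time: str  # ISO 8601 string, so plain string comparison sorts by time
    pcd: int              # Product Confidence Data: smaller means higher quality


def select_master(candidates):
    """Pick the master among duplicates of the same product.

    Assumed precedence: latest software_version, then latest generation_time,
    then smallest PCD.
    """
    return max(
        candidates,
        key=lambda c: (c.software_version, c.generation_time, -c.pcd),
    )
```

The remaining candidates stay labelled as duplicates rather than being deleted.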
ERS LBR ATSR-2 Consolidation Approach (2/5)
• Step 1: Products renaming
• Target product filename = Shape #3: WRS Stripline / Full orbits products [1]
• ERXX_OPER_III_III_LP_yyyymmddThhmmss_yyyymmddThhmmss_ooooo_tttt_vvvv.ESA
• where:
  • XX: the satellite number (01 for ERS-1 or 02 for ERS-2)
  • III_III: instrument identifier (i.e. AT2_ATS)
  • L: the product level (in our case, always 0)
  • yyyymmddThhmmss: the product start and stop date/times
  • oooooo: the orbit number (6 digits, 0 left padded)
  • tttt: relative orbit number (4 digits, 0 left padded)
  • vvvv: software version (4 digits, 0 left padded)
[1] ESA PDGS Harmonised File Naming Convention (GMGT-MMAN-EOPG-TN-13-0005), Issue: 2.0, Date: 12/11/2013
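As an illustration of the renaming rule, the sketch below assembles a filename from its fields. Two points are assumptions: the `P` after the product level is treated as a literal character, and the orbit field width defaults to 6 digits as in the field description (the pattern itself shows five `o` characters), so it is left as a parameter. The sample values are invented.

```python
from datetime import datetime


def build_name(satellite, instrument, level, start, stop,
               orbit, rel_orbit, sw_version, orbit_digits=6):
    """Assemble a target filename following the pattern on the slide.

    orbit_digits is configurable because the pattern and the field
    description disagree on the width of the orbit field.
    """
    stamp = "%Y%m%dT%H%M%S"
    return (
        f"ER{satellite:02d}_OPER_{instrument}_{level}P_"
        f"{start.strftime(stamp)}_{stop.strftime(stamp)}_"
        f"{orbit:0{orbit_digits}d}_{rel_orbit:04d}_{sw_version:04d}.ESA"
    )


# Hypothetical ERS-2 ATSR-2 product (all values invented for illustration).
name = build_name(2, "AT2_ATS", 0,
                  datetime(1996, 1, 1, 0, 0, 0), datetime(1996, 1, 1, 1, 40, 0),
                  orbit=3672, rel_orbit=12, sw_version=1)
print(name)
# ER02_OPER_AT2_ATS_0P_19960101T000000_19960101T014000_003672_0012_0001.ESA
```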
ERS LBR ATSR-2 Consolidation Approach (3/5)
• Step 1: Data merging
• Pre-consolidated products (data, .log, .XML) are moved to the target consolidation directory:
<root consolidation directory>
├── sensor = AT1 | AT2
│   ├── YYYY
│   │   ├── MM
│   │   │   └── DD
│   │   │       ├── <target filename>.ESA
│   │   │       ├── <target filename>.log
│   │   │       ├── <target filename>.xml
│   │   │       ├── <target filename>.1.ESA
│   │   │       ├── <target filename>.1.log
│   │   │       ├── <target filename>.1.xml
• Corrupted / quarantined products are moved to the target error directory:
<root error directory>
└── source data directory
• Valid products are identified with an incremental number:
  • .ESA: first valid product (equal to M if no duplicates are identified)
  • .1.ESA: first duplicate
  • …
  • .N.ESA: Nth duplicate
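The directory layout and incremental duplicate suffix can be sketched as follows. The function name, its arguments, and the use of an in-memory set of already used paths are assumptions for illustration; a real run would check the file system instead.

```python
from datetime import date
from pathlib import PurePosixPath


def target_path(root, sensor, day, filename, existing):
    """Return the consolidation path for a product, adding .1, .2, ... for duplicates.

    existing: set of path strings already present in the consolidation directory.
    """
    if not filename.endswith(".ESA"):
        raise ValueError("expected a renamed .ESA product")
    stem = filename[: -len(".ESA")]
    directory = PurePosixPath(root) / sensor / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"

    candidate = directory / f"{stem}.ESA"  # first valid product
    n = 0
    while str(candidate) in existing:      # name taken: this is the n-th duplicate
        n += 1
        candidate = directory / f"{stem}.{n}.ESA"
    return str(candidate)
```

The matching `.log` and `.xml` files would follow the same stem and suffix.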
ERS LBR ATSR-2 Consolidation Approach (4/5): ATSR L0
• Step 2: Master dataset generation
• Labels definition
  • C = corrupted: not readable
  • Q = quarantined: metadata generation issues
  • D = duplicate (“candidate” master): metadata are properly generated (i.e. the Main Product Header is readable for each Data Set Record) and the file is properly renamed
  • M = master: valid products, once duplicates are removed
• [Flow diagram: Read (C), Rename (Q, D), Master generation (M)]
ERS LBR ATSR-2 Consolidation Approach (5/5)
• Step 2: Gap Analysis
• [Diagram: planned products compared with actual products to derive the gaps (+ / -)]
• Planned products: Option A: DMOP + SatAct + Downlink (after OBR failure); Option B: DMOP + SatAct + SHAQ (after OBR failure). Overlaps are removed.
• Actual products: collected ATSR products. Overlaps are removed.
• Gaps (+): matching the planned unavailability and actual products
• Gaps (-): matching the planned products and actual unavailability
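A planned-versus-actual comparison of this kind reduces to interval arithmetic. The sketch below is one possible reading, not the tool used in the project: it merges overlapping intervals on each side, then subtracts the actual coverage from the planned coverage. Intervals are hypothetical `(start, stop)` pairs in any comparable unit (for example seconds since an epoch).

```python
def merge(intervals):
    """Merge overlapping or touching (start, stop) intervals."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(i) for i in merged]


def find_gaps(planned, actual):
    """Return the parts of the planned intervals not covered by actual products."""
    actual = merge(actual)
    gaps = []
    for start, stop in merge(planned):
        cursor = start
        for a_start, a_stop in actual:
            if a_stop <= cursor or a_start >= stop:
                continue  # no intersection with the still uncovered part
            if a_start > cursor:
                gaps.append((cursor, a_start))
            cursor = max(cursor, a_stop)
        if cursor < stop:
            gaps.append((cursor, stop))
    return gaps


print(find_gaps([(0, 10), (20, 30)], [(2, 4), (3, 6), (25, 40)]))
# [(0, 2), (6, 10), (20, 25)]
```

Swapping the arguments gives the opposite comparison (data present where none was planned).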
GOME Consolidation approach • GOME data had already passed a first consolidation stage, producing archive DL029, coming from DLR. This archive has been held as the Reference Archive. • Data corruption has been checked on the entire dataset classified as master, using a Java tool. • Data coming from seven other archives have been used for gap filling. • Selection criterion for overlaps: the largest spanning file for each given orbit. • Gaps are extracted and filtered against station availabilities and instrument unavailabilities (including SHAQ information).
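The overlap selection criterion (largest spanning file per orbit) can be sketched in a few lines. The record layout `(orbit, start, stop, name)` is a hypothetical one chosen for the example, and ties are resolved in favour of the first file seen, which the slide does not specify.

```python
def pick_largest_span(files):
    """For each orbit, keep the file covering the longest time span.

    files: iterable of (orbit, start, stop, name) tuples; start/stop are numbers
    or datetimes, as long as stop - start is comparable.
    """
    best = {}
    for orbit, start, stop, name in files:
        current = best.get(orbit)
        if current is None or (stop - start) > (current[2] - current[1]):
            best[orbit] = (orbit, start, stop, name)
    return {orbit: record[3] for orbit, record in best.items()}
```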
EWIC ESA–DL Data source inputs The table below summarizes the list of sources that have been included in the consolidation.
EWIC – Data consolidation results (1) The figure below shows the respective number of files collected for each source. Only the valid files (in orange) are considered for the master consolidation.
EWIC Consolidation results (2) The content of these sources has been first analysed to detect corrupted files or wrongly named files. The table below summarizes the results of this analysis. Initial number of files considered in the merging: 227723
EWIC Master dataset overview • First step: copy all unique files • Number of clones: 73864 • Total final number of files: 142851
Segregating overlapping data files • Data files are not strict clones • They may have the same dates and different content • Fully overlapping: nested within another file • Partially overlapping • Small overlap • Large overlap • A simple approach (based on version number) was not conclusive and was discarded
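The overlap categories listed above can be made concrete with a small classifier. This is only a sketch: the slides do not define where "small" ends and "large" begins, so the `large_fraction` threshold (share of the shorter file that overlaps) is an invented parameter.

```python
def classify_overlap(a, b, large_fraction=0.5):
    """Classify how two (start, stop) time ranges relate to each other.

    large_fraction is a hypothetical threshold, measured against the shorter file.
    """
    (a0, a1), (b0, b1) = a, b
    overlap = min(a1, b1) - max(a0, b0)
    if overlap <= 0:
        return "disjoint"
    if (a0, a1) == (b0, b1):
        return "same dates"  # content may still differ
    if (a0 <= b0 and b1 <= a1) or (b0 <= a0 and a1 <= b1):
        return "nested"
    shorter = min(a1 - a0, b1 - b0)
    return "large overlap" if overlap / shorter >= large_fraction else "small overlap"
```

Only the "same dates" case needs a content comparison to tell a strict clone from a conflicting file.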
Effect of segregating overlapping data files • The selected approach seems to reject a significant amount of data after the on-board recorder failure (2003): up to 1% • The approach was investigated and revised with ESA support • In any case, the recommendation has always been to include the «rejected» files in the master • 16081 files rejected (out of 142851)
AMI-Wind unavailabilities • Consolidation and merging of various sources of unavailabilities (here in cumulated number of days)
Ground station acquisition plan analysis • The acquisition plan was provided from two different sources, “L2R” and “SHAQ” • Some issues were identified and solved during the analysis: • L2R had no dates; they had to be calculated with an orbit propagator, and the accuracy of the result was cross-checked • The two sources were compared to assess their content and to use them properly when identifying the gaps
X-PReSS AMI-Wind results • Consolidated archive: the yellow bar takes into account both «segregated» files and unavailabilities. • The data intervals are merged with the planned acquisition time from the two available sources («L2R» in green and «SHAQ» in purple), to show the differences when only one of the two sources is applied. • If the archive were complete and the acquisition plan were not based on station best effort after the OBR failure, the yellow bar would match the green and purple bars. • Total coverage (incl. «segregated»): 3142.88 days, 142851 files, 53.20%
AMI-Wind results overall (1) Consolidated archive Total coverage (incl. «segregated») : 3142.88 days, 142851 files, 53.20%
X-PReSS AMI-Wind overall results (2) • Overview of the ERS-2 LBR Scatterometer consolidation project • Big improvement of the EWIC consolidated archives
AMI-Wind overall results (2) The following table summarizes the ratio of completeness:
AMI-Wind consolidation summary • Initial number of files considered in the merging: 227723 • Number of files retained in the master archive: 136359 • Number of files retained in the master archive, including segregated files (*): 142851 • Estimated completeness (data coverage only): 51.58% • Estimated completeness (data coverage vs recorded unavailabilities): 53.20% • Estimated completeness (data coverage vs recorded unavailabilities, SAR acquisition plan and post-OBR LBR acquisition plan): 86.16% • (*) As explained in the consolidation note, we recommend including the segregated files in any further usage of this archive.
ERAC ESA – DL data source inputs The table below summarizes the list of sources that have been included in the consolidation.
ERAC – Data Collection The figure below shows the respective number of files collected for each source. Only the valid files (in orange) are considered for the master consolidation.
ERAC Consolidation results (1) The content of these sources has been first analysed to detect corrupted files or wrongly named files. The table below summarizes the results of this analysis. Initial number of files considered in the merging: 417674
ERAC Master dataset First step : copy all unique files Number of clones : 229881 Total final number of files : 168458
Altimeter unavailabilities • Consolidation and merging of various sources of unavailabilities (here in cumulated number of days)
Altimeter results • The next slide shows, per year, the total data coverage including (in red) or excluding (in blue) the segregated data files (deemed redundant with existing data). The remaining gap up to the 100% bar may be caused: • By unavailabilities of the instrument or telemetry: the yellow bar combines the unavailability time coverage with the data coverage. • By no acquisition by ground stations, particularly after the on-board recorder (OBR) failure in June 2003: two different sources are used (“L2R” in green, “SHAQ” in purple). These two bars combine the above with the effective acquisition time provided by the ground stations. The top of each bar therefore indicates the maximum expected data coverage. This is only valid after the OBR failure. • By missing data, which is the quantity we are trying to estimate here. Before the OBR failure, this is the gap between the top of the yellow bar (found data coverage merged with instrument unavailability) and the 100% bar. After the OBR failure, it is the gap between the top of the yellow bar and the top of the purple bar (merging with acquisition time), considering “SHAQ” the most accurate source.
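The way the yellow bar combines data coverage with unavailability time can be sketched as the union of two interval sets divided by the length of the reference period. The function names and the toy numbers below are illustrative assumptions, not figures from the archive.

```python
def merge(intervals):
    """Merge overlapping or touching (start, stop) intervals."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return merged


def coverage_ratio(intervals, period):
    """Fraction of the reference period covered by the union of the intervals."""
    p0, p1 = period
    covered = 0.0
    for start, stop in merge(intervals):
        start, stop = max(start, p0), min(stop, p1)  # clip to the period
        if stop > start:
            covered += stop - start
    return covered / (p1 - p0)


data = [(0, 4), (3, 6)]   # data file time ranges (toy values, in days)
unavailable = [(5, 8)]    # instrument or telemetry unavailability (toy values)
period = (0, 10)          # reference period, e.g. one year

data_only = coverage_ratio(data, period)                   # blue/red bar
with_unavail = coverage_ratio(data + unavailable, period)  # yellow bar
```

Because the union is taken before summing, time that is both covered by data and flagged as unavailable is counted once.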
Altimeter results (1) Consolidated archive Total coverage (incl. «segregated» and unavailabilities) : 3712.5 days, 168458 files, 62.85%
Altimeter results (2) • Overview of the ERS-2 LBR Altimeter consolidation project • Big improvement of the ERAC consolidated archives
Altimeter overall results (3) The following table summarizes the ratio of completeness:
Altimeter consolidation summary • Initial number of files considered in the merging: 398339 • Number of files retained in the master archive: 159692 • Number of files retained in the master archive, including segregated files (*): 168458 • Estimated completeness (data coverage only): 59.88% • Estimated completeness (data coverage vs recorded unavailabilities): 62.85% • Estimated completeness (data coverage vs recorded unavailabilities and acquisition plan): 96.53% • (*) As explained in the consolidation note, we recommend including the segregated files in any further usage of this archive.
EMWC ESA – DL data source inputs The table below summarizes the list of sources that have been included in the consolidation.
EMWC – Data Consolidation results (1) The figure below shows the respective number of files collected for each source. Only the valid files (in orange) are considered for the master consolidation.
EMWC Consolidation results (2) The content of these sources has been first analysed to detect corrupted files or wrongly named files. The table below summarizes the results of this analysis. Initial number of files considered in the merging: 511982
EMWC Master dataset overview First step : copy all unique files Number of clones : 278471 Total final number of files : 233511
EMWC Consolidation results • The next slide shows, per year, the total data coverage including (in red) or excluding (in blue) the segregated data files (deemed redundant with existing data). The remaining gap up to the 100% bar may be caused: • By unavailabilities of the instrument or telemetry: the yellow bar combines the unavailability time coverage with the data coverage. • By no acquisition by ground stations, particularly after the on-board recorder (OBR) failure in June 2003: two different sources are used (“L2R” in green, “SHAQ” in purple). These two bars combine the above with the effective acquisition time provided by the ground stations. The top of each bar therefore indicates the maximum expected data coverage. This is only valid after the OBR failure. • By missing data, which is the quantity we are trying to estimate here. Before the OBR failure, this is the gap between the top of the yellow bar (found data coverage merged with instrument unavailability) and the 100% bar. After the OBR failure, it is the gap between the top of the yellow bar and the top of the purple bar (merging with acquisition time), considering “SHAQ” the most accurate source.
EMWC Consolidation results (1) Consolidated archive Total coverage (incl. «segregated» and unavailabilities) : 4053.18 days, 233511 files, 68.61%