190 likes | 367 Views
The ESA SCIENTIFIC TESTBED in CASPAR. S. ALBANI (ESA) Sergio.Albani@esa.int. CASPAR Workshop, Rome, 15-16 september 2009. SUMMARY. The importance of Earth Observation data LTDP: ESA benefits, risks and policies The ESA scientific testbed in CASPAR Conclusions. ESA EO Data Handling.
E N D
The ESA SCIENTIFIC TESTBED in CASPAR S. ALBANI (ESA) Sergio.Albani@esa.int CASPAR Workshop, Rome, 15-16 september 2009
SUMMARY • The importance of Earth Observation data • LTDP: ESA benefits, risks and policies • The ESA scientific testbed in CASPAR • Conclusions
ESA EO Data Handling ESA, through worldwide receiving ground stations, acquires data from Earth Observation (EO) satellites and archives/processes them at Processing and Archiving Facilities. ESA-ESRIN is the largest European EOdata provider and operates as the reference European centre for EO Payload Data Exploitation. At present, several thousands ESAworldwide users have online access to EO missions related metadata (10 million references), data (about 4 PB) and derived information.
The EO data avalanche… • ESA EO data archives include data from ESA missions (ERS and ENVISAT) and Third Party missions (Landsat, SPOT, ALOS, TIROS, AQUA&TERRA, etc.). • The plans of new missions indicates 5-10 times more data to be archived in next 10-15 years • Similar trend for other national space agencies ESA
The importance of EO data EO archives ensure a global coverage of the Earth: • multi-sensor data (from optical to active radar sensors); • long time series (time-span from a few years to decades); • variable geographical coverage (local, regional, global); • variable geometrical resolution (from few to few hundreds meters); • variable temporal resolution (from few days up to months).
LTDP: Benefits, risks and policies (1) • The requirements for accessing ESA historical archives are strongly increased in the last ten years and the trend is likely to increase in the future mainly for long term science and environmental monitoring. • Therefore, the prospect of losing the digital records of science (and with the specific unique data, information and publications managed by ESA) is very alarming. • We have to preserve data against changes in: • hardware, • software, • environment, • knowledge base of the scientific community…
LTDP: Benefits, risks and policies (2) • Issues for the near future concern the identification of: • type and amount of data to be preserved; • location of archives and their replication for security reasons; • technical choices (e.g. formats, media); • availability of adequate funds. • Decisions should be taken in coordination with other data owners and with the support/advice of the user community.
LTDP: Benefits, risks and policies (3) ESA Member States LTDP Working Group formed (Jan 2008) within the GSCB to define LTDP Common Guidelines (with the European EO data stakeholders) and to promote them in CEOS/GEO. LTDP Workshop ESA-ESRIN May 2008 Earth Observation Long Term Data Preservation Strategy Preservation of all European EO data for an unlimited time-span ensuring accessibility and usability Cooperative and harmonized collective approach among the EO data owners (European LTDP Framework) Application of European LTDP Common Guidelines • ESA is going to apply the • European LTDP Common • Guidelines to its own missions • LTDP is now included in • the ESA general budget • under basic activities
LTDP: Benefits, risks and policies (4) • Meanwhile ESA is active in developing appropriate techniques and strategies by promoting and participating to projects related to long term data preservation • ESA participates to LTDP related projects to: • evaluate new technical solutions and procedures to adopt when applicable to maintain leadership in using emerging services in EO; • share knowledge with other entities, also outside of the scientific domain; • extend the results/outputs of these cooperative projects in other EO and ESA communities (e.g. Record of Science office).
ESA role in CASPAR • ESA participation to CASPAR is mainly driven by the interest in: • consolidating and extending the validity of the OAIS reference model, already adopted in several internal initiatives (e.g. SAFE); • developing preservation techniques/tools covering not only the data but also the knowledge associated with them. • In CASPAR ESA is playing the role of both user and infrastructure provider for the scientific data testbed.
ESA TESTBED ACTIVITIES The ESA testbed has covered: the setup of the framework in ESA-ESRIN; the definition and collection of a significant sample of a whole processing chain dataset (viewers, converters, processors and data of different level, revision and format); the conversion of the data from the native format to a OAIS compliant format; the generation of appropriate Representation Information, Descriptive Information, Knowledge Modules and Scientific Community profiles; the analysis of ontologies to describe and preserve scientific workflows (e.g. the applicability of CIDOC CRM on scientific data); the ingestion of data in a 100% CASPAR-based system; the coping with some long term data preservation problems by using only CASPAR components, methodology and tools.
Testbed dataset (1) • The ESA selected dataset as a suitable demonstration case in the framework of the scientific testbed of the CASPAR project consists of data from GOME (Global Ozone Monitoring Experiment), a sensor on board the ESA ERS-2 (European Remote Sensing) satellite. • The GOME dataset: • has a big total amount of information distributed with a high level of complexity; • is unique because it provides more than 11 years global worldwide coverage; • is very important for the scientific community and the Principal Investigators (e.g. KNMI and DLR) that on a routine basis receive GOME data for their research projects (e.g. concerning ozone depletion or climate change); • is just a test-case because similar issues involve many other Earth Observation instrument datasets.
Testbed dataset (2) Level 0: raw data Level processors Auxiliary data Documents and methods Level 1: radiances/reflectances Level 1C: fully calibrated L1 data Level 2: geophysical data as trace gas amounts Level 3: a mosaic composed by several level 2 data with interpolation of data values to fill the satellite gaps Data viewers Examples of GOME science applications Format converters
Testbed preservation scenario (1) • Preservation of the ability to process data from one level to another • preservation of GOME data and of all components that enables the operational processing for generating products at higher levels. • DEMO: preservation of the ability to produce GOME Level 1C data starting from Level 1 data • now the ESA testbed is able to demonstrate the preservation of this GOME processing chain at least against changes of operating system or compilers/libraries/drivers affecting the ability to run the GOME Data Processor.
L1->L1C processing dataset We have to preserve the GOME L1 products, processors, reference manuals, etc. GOME L1 products L1L1C processor L1L1C source code • PSD.pdf • license.doc • disclaimer.pdf • readme_1st.doc • readme.doc • release_l01.doc • user_manual.pdf • howtouse_l01.doc • The C Bible • The OS Bible • The Ozone • The ERS-2 satellite • The GOME sensor • ERS-Products.pdf • ProductSpecification.pdf
OAIS compliant dataset SAFE AIP The RepInfo provided are contained in the manifest file and in the schemas. Content Information Descriptive Information Data Object Representation Information The filename itself. Metadata is extracted by the product and contained in the SAFE manifest. Preservation Description Information Packaging Information Empty. Reference Provenance A Cyclical Redundancy Check (CRC) code for a file. Context Fixity The principal investigator who recorded the data and the information concerning its storage, handling and migration. No packaging restrictions.
Testbed: preservation scenario (2) • After the ingestion in the CASPAR system of a complete and OAIS-compliant GOME L1 processing dataset, something (e.g. OS or gLib version) changes and a new L1->L1C processor has to be developed/ingested to preserve the ability to process data from L1 to L1C. • We can cope with these changes related to the processing by managing a correct information flow through the system, the system administrators and the users, using a framework developed using only the CASPAR components. • Moreover we can return to an user asking for L1C data not only the related L1 data plus the processor needed to generate them but also all the information needed to perform this process depending on the user needs and knowledge.
CONCLUSIONS The preservation of EO data is of vital importance for the scientific and operational user communities. ESA is pursuing the objective to ensure the perpetual preservation of these data in coordination with institutions of its member states. ESA data preservation initiatives will benefit of the results of initiatives carried out in collaboration with European data owners/providers, entities and institutions, with the objective to guarantee long term data preservation (and other similar EU sponsored projects) by adopting, when applicable, technical solutions and procedures developed in the framework of these cooperative projects. The scientific testbed implemented in ESA-ESRIN provides a good proof of the effectiveness of the CASPAR preservation framework in the Earth Observation domain focusing on methodologies and techniques to guarantee the preservation of the knowledge associated to data.