Atmospheric Data Management - A Challenge - Anne De Rudder and Sue Latham Rutherford Appleton Laboratory, UK European Space Weather Week 3 Brussels, November 13-17, 2006
In 2 or 3 decades, the universe of data has gone from… ? …to
The BADC
• One of the NERC designated Data Centres and a component of the NCAS
• Documented long-term data archive (currently about 130 catalogued datasets)
• About 8,000 registered users worldwide, of whom 3,000 have applied for access to specific datasets and 2,000 have downloaded data in the past year
• Data management in support of NERC research programmes, grants and facilities and, occasionally, of some international research projects
• Data are distributed via the web
• Assistance to users regarding atmospheric data issues (trajectories, online help desk, visualisation facilities, software, links, …)
http://badc.nerc.ac.uk/
Contents
• Data policies – their purpose and implementation
• Model versus observation
• Metadata
• Citation and publication
• Data access networks (grids)
• Speaking the same language
• A few traps to beware of
Data policies
• Aims
  • Ensuring the swift exchange of knowledge within a research project.
  • Ensuring that the newly acquired knowledge, or at least the material on which it relies, is kept for possible future reference, improvement and use, and is made available to the community.
  • Ensuring that the data is documented in a way that will allow long-term access to it and understanding of it.
  • Ensuring that researchers’ rights are not infringed.
• Data management plans (DMPs)
  • To implement the principles outlined in the data policy
  • To plan how and when data will be generated, shared and stored within a project
  • DMPs also include arrangements for the provision of supporting third-party data (e.g. met data from the UK Met Office, provision of NRT data or forecasts to support field campaigns)
Data policies
• To ease the exchange of knowledge within the project:
  • Submission schedule and deadlines that take into account the synergy between the different groups taking part in the project
  • Common format (often seen as a devilish obstacle in our Excel times…)
  • Provision of a workspace (e.g. BSCW) to be used as
    • a discussion forum
    • a way to work on common documents
    • a way to validate and format preliminary data
• To provide a long-term archive to the community:
  • Regular backups on at least two storage media and in two locations
  • Advertisement of the dataset (dataset catalogue, dataset “publication”)
Data policies
To ensure that this long-term archive can be read, interpreted and used:
• Documentation (metadata) should be as specific, accurate, explicit and complete as possible
• Use of a worldwide metadata standard (CF Convention)
• Use of formats that allow the metadata to be attached inseparably to the data
Metadata
• To associate with a dataset the key terms that will allow its discovery.
• To give all the information needed to read, understand and interpret the data.
Metadata standards
• Integrate a terminology, recommendations on the metadata content and some format considerations.
• The Climate and Forecast (CF) Metadata Convention was developed for NetCDF but is largely applicable to information provided with any atmospheric data, regardless of its format.
Providing (good) metadata and conforming to metadata standards is a habit that still needs to be acquired…
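As an illustration of what "complete" metadata can mean in practice, here is a minimal Python sketch of a metadata completeness check. The attribute names (`Conventions`, `title`, `institution`, `source`, `history`, `units`, `standard_name`, `long_name`) are taken from the CF Convention; which of them are treated as mandatory, the function name `check_metadata` and the example dataset are assumptions made for this sketch, not a BADC rule set.

```python
# Sketch only: the split between "required" and "recommended" attributes
# below is an assumption for illustration, not an official checklist.
RECOMMENDED_GLOBAL_ATTRS = ("Conventions", "title", "institution", "source", "history")
REQUIRED_VARIABLE_ATTRS = ("units",)
NAME_ATTRS = ("standard_name", "long_name")  # at least one should be present


def _is_filled(attrs, name):
    """True if the attribute exists and is not blank."""
    return bool(str(attrs.get(name, "")).strip())


def check_metadata(global_attrs, variables):
    """Return a list of problems found; an empty list means none were found.

    global_attrs: dict of file-level attributes.
    variables: dict mapping variable name -> dict of its attributes.
    """
    problems = []
    for name in RECOMMENDED_GLOBAL_ATTRS:
        if not _is_filled(global_attrs, name):
            problems.append(f"global attribute '{name}' is missing or empty")
    for var_name, attrs in variables.items():
        for name in REQUIRED_VARIABLE_ATTRS:
            if not _is_filled(attrs, name):
                problems.append(f"variable '{var_name}' has no '{name}' attribute")
        if not any(_is_filled(attrs, name) for name in NAME_ATTRS):
            problems.append(
                f"variable '{var_name}' has neither 'standard_name' nor 'long_name'"
            )
    return problems


# Hypothetical dataset description used only to exercise the check.
example_globals = {
    "Conventions": "CF-1.0",
    "title": "Hypothetical ozonesonde profiles",
    "institution": "Example institute",
    "source": "ozonesonde",
    "history": "2006-11-13 created",
}
example_variables = {
    "o3": {"standard_name": "mole_fraction_of_ozone_in_air", "units": "1e-9"},
    "temp": {"long_name": "temperature"},  # units forgotten on purpose
}

problems = check_metadata(example_globals, example_variables)
for line in problems:
    print(line)
```

The check works on plain dictionaries so that it can be applied to metadata read from any file format, in line with the remark that the convention is applicable beyond NetCDF.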
Data policies
Protecting researchers’ work and rights: temporary restriction of access
• In order to allow the researchers to be the first to analyse and publish their data, while at the same time ensuring some synergy between the different groups participating in the project
• During the project or for a certain period of time after its end, access is restricted to the project participants…
  • With exceptions for close collaborators or participants in associated projects
  • This retention period ranges from 1 to …10 years!
  • Password-protected system
  • Modalities of application and of access granting vary (e.g. consultation of PI, list of authorised users, etc.)
• … after which the data is released to the public domain.
Data policies
Access to restricted data – Authorised users
• Public
  • Discovery metadata immediately visible
  • Free access to the data after the retention period (sometimes, Conditions of Use continue to apply)
• External collaborators (during retention period)
  • Must apply for access
  • Applications channelled through Project PI(s)
• Project participants
  • Immediate availability
  • On application
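The access rules for the three user categories can be sketched as a small decision function. This is a simplified Python illustration under stated assumptions: the names `UserCategory`, `retention_end` and `can_access_data` are hypothetical, project participants are assumed to be granted access directly (some projects also ask them to apply), and an external collaborator's application is reduced to a single approved/not approved flag. Discovery metadata is visible to everyone and is therefore not modelled.

```python
from datetime import date
from enum import Enum


class UserCategory(Enum):
    PUBLIC = "public"
    EXTERNAL_COLLABORATOR = "external collaborator"
    PROJECT_PARTICIPANT = "project participant"


def retention_end(project_end, retention_years):
    """Date on which the retention period ends (project end + N years)."""
    try:
        return project_end.replace(year=project_end.year + retention_years)
    except ValueError:
        # project_end is 29 February and the target year is not a leap year
        return project_end.replace(
            year=project_end.year + retention_years, month=2, day=28
        )


def can_access_data(category, today, project_end, retention_years,
                    application_approved=False):
    """Decide whether a user may download the data on a given day.

    Assumption of this sketch: participants always have access; external
    collaborators need an application approved via the project PI(s).
    """
    if today >= retention_end(project_end, retention_years):
        return True  # retention period over: data released to everyone
    if category is UserCategory.PROJECT_PARTICIPANT:
        return True
    if category is UserCategory.EXTERNAL_COLLABORATOR:
        return application_approved
    return False  # public users wait until the retention period has ended


# Hypothetical project ending on 31 December 2006 with a 2-year retention.
project_end = date(2006, 12, 31)
print(can_access_data(UserCategory.PUBLIC, date(2007, 6, 1), project_end, 2))
print(can_access_data(UserCategory.PUBLIC, date(2009, 1, 1), project_end, 2))
```

Conditions of Use that continue to apply after release are outside the scope of this sketch; they constrain what users may do with the data, not whether they can download it.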
Data policies
Protecting researchers’ work and rights: conditions of use and publication
• Applying during the project and sometimes after it has ended
• Sometimes included in the data files, as a stamp
• Committing the user to respect rules such as
  • Restricting the use of the data to the research topic stated at the time of application
  • Not disclosing the data to other parties
  • Contacting the data provider
  • Acknowledging the data provider
  • Offering co-authorship to the data provider
Data policies
[Figure labels: Intercontinental initiative, International project, Research facility, National programme]
Model versus observation
“Nobody believes a modelling paper except the author. Everybody believes an observational paper… except the author.” (Quoted by David Stevenson, University of Edinburgh, at a UTLS Ozone Science Meeting)
Is there such a clear difference between the two things? Is processed or derived data observation or modelling? Is a programme “model data”?
For the purpose of data management, model data =
• any output of model computation (e.g. simulations),
• datasets resulting from some kind of data assimilation technique,
• compilations of observations from different sources (synthesized datasets),
… which have in common that they are more likely to be superseded, or to be superseded more quickly, by newer versions than observations are. They are also usually the end-product of a project, while observations are a starting point for further analyses and studies.
Model versus observation
BADC Guidelines for the Archival of Simulated Data
• Codes archived only as metadata to support model output
• Datasets peer-reviewed at regular intervals (a few years)
• Criteria to select model runs to be archived for the long term
  • Likely future existence of a community of potential users.
  • Historical, legal or scientific importance likely to persist.
  • The results will be used in an intercomparison exercise.
  • Integration of observation data in a way that adds value to the observations.
  • The results have been the basis of a publication.
  • The results have confirmed or led to some outstanding discovery.
Citation and publication
• Some projects bring together the worlds of librarians and data scientists, e.g. CLADDIER
• To investigate how datasets can be (better)
  • versioned
  • catalogued
  • peer-reviewed
  • referenced in papers
  • published
E-grids
• Networks linking several organisations with similar or complementary competences in such a way as to ensure their interoperability.
• E.g. network of data repositories, models and computers allowing the user to search and use these resources simultaneously and transparently.
• Issues:
  • Transfer of information (balance between redundant storage and speed of transfer)
  • Authentication (security and access)
  • Format conversion
  • Vocabulary (metadata standards)
E-grids
• The NERC Data Grid (NDG) Project
  • Infrastructure system to enable the discovery and retrieval of data held at distributed data centres via a single portal
  • Partners: BADC, BODC, PCMDI (LLNL)
  • Security issues tackled through “role mapping”, i.e. definition of equivalent authorisations (sparing the user the need to register with each organisation)
  • A discovery metadatabase already exists, based on MOLES = Metadata Objects for Links in Environmental Science
• Further, we intend to make the connection between data held in managed archives and data held by individual research groups seamless, so that the same tools can be used to compare and manipulate data from both sources.
• What will be completely new will be the ability to compare and contrast data from an extensive range of (US, European, UK, NERC) datasets from within one specific context.
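The idea of "role mapping" can be illustrated with a short Python sketch. It is not the NDG implementation: the organisation names, role names, the `ROLE_MAP` table and the functions `map_role` and `is_authorised` are all hypothetical, and real systems would also have to authenticate the user and verify that the home organisation vouches for the role.

```python
# Hypothetical table of equivalent authorisations agreed between two
# data centres: (home organisation, home role) -> {target organisation: role}.
ROLE_MAP = {
    ("centre_a", "project_x_member"): {"centre_b": "project_x_reader"},
    ("centre_a", "staff"): {"centre_b": "registered_user"},
    ("centre_b", "project_x_reader"): {"centre_a": "project_x_member"},
}


def map_role(home_org, home_role, target_org):
    """Translate a role held at home_org into its equivalent at target_org.

    Returns None when the two organisations have agreed no equivalence.
    """
    if home_org == target_org:
        return home_role  # no translation needed at the home organisation
    return ROLE_MAP.get((home_org, home_role), {}).get(target_org)


def is_authorised(home_org, home_roles, target_org, required_role):
    """True if any of the user's home roles maps onto the required role."""
    return any(
        map_role(home_org, role, target_org) == required_role
        for role in home_roles
    )


# A user registered only at centre_a asks for a dataset held at centre_b
# that requires the role "project_x_reader" there.
print(is_authorised("centre_a", ["project_x_member"], "centre_b", "project_x_reader"))
print(is_authorised("centre_a", ["staff"], "centre_b", "project_x_reader"))
```

The point of the design is that the user registers once, with the home organisation, and the equivalence table is maintained by the data centres rather than by the user.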
Speaking the same language
Standard terminologies
• Sets of terms of reference with, sometimes, unique identifiers (key values), definitions and version numbers
• System of relationships between terms (synonyms, inclusion, related terms)
• Underpin catalogues and search engines
• E.g. GCMD, CF, SeaDataNet
MOLES (Metadata Objects for Links in Environmental Science)
• The metadata scheme underpinning the NDG discovery tool (based on a set of XML records) and the next BADC catalogue (relational metadatabase)
• Developed in-house
• Integrates tentative mappings between GCMD, CF, SeaDataNet
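A standard terminology of the kind described above (terms with identifiers, definitions, versions, synonyms and inclusion relationships, plus tentative mappings to another vocabulary) can be sketched in a few lines of Python. The classes `Term` and `Vocabulary`, the `demo:` identifiers and the definitions are hypothetical and do not reproduce the MOLES schema; the only real external name used is the CF standard name `mole_fraction_of_ozone_in_air`, and its mapping here is an illustration, not an official one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Term:
    identifier: str                  # unique key value
    label: str
    definition: str
    version: int
    synonyms: Tuple[str, ...] = ()
    broader: Optional[str] = None    # identifier of the including term


class Vocabulary:
    def __init__(self):
        self._terms = {}
        self._index = {}             # lower-cased label or synonym -> identifier

    def add(self, term):
        self._terms[term.identifier] = term
        for name in (term.label, *term.synonyms):
            self._index[name.casefold()] = term.identifier

    def find(self, name):
        """Look a term up by label or synonym, ignoring case."""
        identifier = self._index.get(name.casefold())
        return self._terms.get(identifier) if identifier is not None else None

    def expand(self, identifier):
        """Return the identifier plus all narrower terms, transitively."""
        found = {identifier}
        changed = True
        while changed:
            changed = False
            for term in self._terms.values():
                if term.broader in found and term.identifier not in found:
                    found.add(term.identifier)
                    changed = True
        return found


vocab = Vocabulary()
vocab.add(Term("demo:0001", "atmospheric composition",
               "Chemical make-up of the atmosphere.", 1))
vocab.add(Term("demo:0002", "ozone", "Ozone in the atmosphere.", 1,
               synonyms=("O3",), broader="demo:0001"))

# Tentative mapping from the demo vocabulary to CF standard names.
DEMO_TO_CF = {"demo:0002": "mole_fraction_of_ozone_in_air"}

hit = vocab.find("o3")
print(hit.identifier, "->", DEMO_TO_CF.get(hit.identifier))
print(sorted(vocab.expand("demo:0001")))
```

A search engine built on such a structure can match a query on any synonym and widen it to all narrower terms, which is how a terminology underpins a catalogue.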
Lessons learnt and traps to avoid
• Plan the data policy at an early stage of a project proposal, taking into account already running projects that may become associated or involved.
• Design and develop an open standard terminology with direct input from the researchers and carefully thought-out relationships between terms.
• Do not try to build a terminology that covers everything; focus on the vocabulary needed in your community.
• Resist the temptation to replace tools (software, applications, conceptual tools) every time a shiny new one is launched on the market.