Overview

Environmental Data Archival: Practices and Benefits. Graham Parton, graham.parton@stfc.ac.uk. Royal Meteorological Society SIG Meeting, BAS, 5th October 2011: Transmission, presentation and archiving of meteorological data. Overview: What is data archival? Why do it?

Presentation Transcript


  1. Environmental Data Archival: Practices and Benefits • Graham Parton, graham.parton@stfc.ac.uk • Royal Meteorological Society SIG Meeting, BAS, 5th October 2011: Transmission, presentation and archiving of meteorological data

  2. Overview • What is data archival • Why do it? • How do we do it within CEDA?

  3. What do we call “data archival” • Placing data into a repository which is: • Backed up • Robust (identify data corruptions) • Catalogued • Recognised repository
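
The "Robust (identify data corruptions)" point is commonly met with a fixity check: record a checksum when a file is deposited and recompute it later. The sketch below is only an illustration of that idea, assuming SHA-256 and the hypothetical helper names `sha256_of` and `is_intact`; the slides do not say which method CEDA uses.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size blocks so large archive files need not fit in memory.
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def is_intact(path, recorded_checksum):
    """Compare the current checksum with the one recorded at deposit time."""
    return sha256_of(path) == recorded_checksum
```

A repository would store the recorded checksum alongside the catalogue entry and rerun `is_intact` on a schedule; any mismatch flags the file for restoration from backup.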

  4. Why archive data • Making data public - Openness of the results and repeatability are essential for scientific rigor • Place to share data with project participants • Re-purposing data • Additional services (often for free!) • May be required for legal reasons • Secure • Get credit • And because if you don’t….

  5. Why archive data

  6. Scale of CEDA operations • >100,000,000 files holding ~1 PB of data • ~38,000,000 files downloaded since October 2010 • 19,000+ registered users, of which ~3,600 are currently ‘active’ users • 250+ datasets • 26 staff • Responsible for + other services and projects (e.g. UKCIP, CMIP5 partner) • … i.e. we are highly reliant on scripted systems and a well-structured archive

  7. [Diagram. Labels: Data Suppliers, 3rd Party Data providers, Arrivals, Ingest, Archive (three, each with Backup), Catalogue, metadata, Web service, External Users (discovery, view, download), External discovery service]

  8. Data Preparation [Diagram. Labels: Data Suppliers, 3rd Party Data providers, Arrivals, Ingest, Archive (three)]

  9. Data Preparation • Data Management Plans • including delivery schedules • Conditions of Use/Licensing • Support suppliers in data preparation • Capture supporting documentation • (formats, calibration information, flight logs, etc.) • File naming and archive structure • Set up ingest routes

  10. Data Preparation - File structure • Take the bad data challenge…. File “sw010203” • What are these data? Guess surface winds, but on what day? • What are the units? Any convention? • How do we read the file? • Is this spatial or temporal data?... 1440 pairs of data in a file 4.31 155.3 3.92 136.1 5.15 140.2 4.23 137.1 4.75 150.2 4.71 137.9 4.35 146.5 4.52 138.0 4.83 153.7 5.40 145.8 4.63 141.0 4.90 137.3 4.31 143.3 4.58 157.0 4.94 141.7 4.65 143.1 4.63 143.0 4.88 149.5 5.42 148.5 4.92 140.4 4.04 146.7 3.92 151.5 5.02 135.3 5.06 151.6 4.65 152.3 4.31 168.8 3.79 145.3 5.92 152.9 5.02 145.8 4.77 161.6 4.79 144.1 4.60 147.5 5.33 150.1 4.81 141.0 6.02 146.9 4.38 149.0 4.42 142.5 4.58 133.4 4.35 150.5 4.96 149.8 5.56 143.4 5.08 148.5 5.19 141.6 4.40 142.4 4.10 152.6 5.02 134.0 4.94 142.9 5.27 144.4 5.38 141.5 5.88 144.8 6.00 140.1 4.75 158.3 5.08 148.1 5.46 163.5 4.27 150.8 4.69 138.8 5.71 144.0 5.21 138.8 5.00 132.4 5.06 144.4
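
One way to avoid the questions raised by a file like “sw010203” is to keep the description inside the file itself. The sketch below writes and reads a small text file with a commented metadata header and named columns. It is only an illustration: the column names, units and metadata keys are assumptions (the slide itself only guesses that the pairs are surface winds), and the helper names are hypothetical. The three data rows are the first three pairs from the slide.

```python
import csv
import io


def write_self_describing(rows, metadata, columns):
    """Serialise rows with a '# key: value' header and a column-name line."""
    buf = io.StringIO()
    for key, value in metadata.items():
        buf.write(f"# {key}: {value}\n")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def read_self_describing(text):
    """Parse the header back into a dict and the data into named records."""
    metadata, data_lines = {}, []
    for line in text.splitlines():
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()
        elif line.strip():
            data_lines.append(line)
    reader = csv.DictReader(data_lines)
    records = [{name: float(v) for name, v in row.items()} for row in reader]
    return metadata, records


# Assumed interpretation, for illustration only: speed and direction pairs.
meta = {
    "original_file": "sw010203",
    "wind_speed_units": "m s-1 (assumed)",
    "wind_direction_units": "degrees (assumed)",
}
sample = [(4.31, 155.3), (3.92, 136.1), (5.15, 140.2)]
text = write_self_describing(sample, meta, ["wind_speed", "wind_direction"])
meta_back, records = read_self_describing(text)
```

In practice a community format with standard names and structured metadata would do this job more thoroughly; the point of the sketch is only that units, variable names and provenance travel with the numbers.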

  11. Supported Formats Highly structured metadata Standard Names

  12. Data Discovery [Diagram. Labels: Data Suppliers, 3rd Party Data providers, Arrivals, Ingest, Archive (three), Catalogue, metadata, Web service, External Users (discovery), External discovery service]

  13. CEDA Catalogue

  14. NERC Data Discovery Service • data-search.nerc.ac.uk

  15. CEDA Document Repository • cedadocs.badc.rl.ac.uk

  16. Citations for Data Creators: DOIs Citation (and DOI) Data Citation and DOI… but only if in a recognised repository

  17. Data Services [Diagram. Labels: Data Suppliers, 3rd Party Data providers, Arrivals, Ingest, Archive (three), Catalogue, metadata, Web service, External Users (discovery, view, download), External discovery service]

  18. Visualisation Services

  19. Visualisation Services ISIC Video Wall

  20. Visualisation Services

  21. Processing Services • CEDA WPS: ceda-wps2.badc.rl.ac.uk/ui/home • Chain services together • Job either runs straight away or is sent to run on a backend service • Download result

  22. Processing Services • Trajectory Service

  23. OPeNDAP Service • With security layer • Navigable and scriptable interface to archive • CEDA has applied a security shell using “OpenID” technology • Gives a powerful sub-setting service for large datasets
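
The sub-setting mentioned above works, in OPeNDAP's DAP2 protocol, by appending a constraint expression to the dataset URL so that only the requested index ranges are returned. The sketch below builds such a URL. The helper name `subset_url`, the dataset URL and the variable name are hypothetical, and the security layer described on the slide is not handled here.

```python
def subset_url(dataset_url, variable, index_ranges, response="ascii"):
    """Build a DAP2 request URL for a hyperslab of one variable.

    index_ranges holds one (start, stride, stop) tuple per dimension;
    in DAP2 the stop index is inclusive.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in index_ranges
    )
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


# Hypothetical dataset: first ten time steps at a single grid index.
url = subset_url(
    "http://example.org/opendap/winds.nc",
    "wind_speed",
    [(0, 1, 9), (5, 1, 5)],
)
```

Because the request is just a URL, it can be generated in a loop or a script, which is what makes the interface "scriptable" for large datasets.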

  24. What’s on the horizon? • Continue to develop visualisation and data processing services • Increasing data volumes are becoming too large to move around • Hosting services – provide virtual environments for people to work on the data without downloading • From Petascale to Exascale • But all this NEEDS well-structured data that uses standards-driven metadata and formats

  25. Take Home Messages • Plan for data management • Tap into standards when preparing data • Get data catalogued for data discovery • Data in supported repositories leads to recognition for efforts preparing data • A suite of additional services adds value to existing data • Team Digital Preservation video
