Preservation and Long Term Access of Data at the World Data Centre for Climate Frank Toussaint

Frank Toussaint, N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt, H. Ramthun, M. Stockhause, H. Thiemann. World Data Centre for Climate at the German Climate Computing Centre (DKRZ)


Presentation Transcript


  1. Preservation and Long Term Access of Data at the World Data Centre for Climate. Frank Toussaint, N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt, H. Ramthun, M. Stockhause, H. Thiemann. World Data Centre for Climate at the German Climate Computing Centre (DKRZ), Hamburg, Germany

  2. Overview. The WDC for Climate in several collaborations; Data Storage: Technology (Tapes and Disks); Data Storage: LObStER, the Tape Storage Tool; Storage Policy; Long Term Archiving; DOI (Digital Object Identifier)

  3. The World Data Centre for Climate. The German Climate Computing Centre (DKRZ) is held by the Max Planck Society, the University of Hamburg, and others. Mission: provide high-performance computing power and storage for the German Earth Science community

  4. The WDCC as WIS Data Collection & Production Centre. WMO Information System (WIS): National Centres; Global Information System Centres; Data Collection and Production Centres

  5. The WDCC in the ICSU World Data System. International Council for Science (ICSU); World Data System (WDS); World Data Centres (WDC); WDC Cluster Earth System Research: WDC-Mare, WDC-RSAT, WDC-Climate

  6. CMIP5 Data Nodes. Replicated model output, CMIP5/IPCC-AR5. PCMDI, BADC, & WDCC form a data federation. About 1 PB of data are replicated. CMIP5/IPCC Data Federation: UK: BADC, ~1 PByte HD; US: PCMDI, ~1 PByte HD; DE: WDCC, ~1 PByte HD

  7. Evolution of Data Quantities. Climate Model Data: relatively homogeneous, but huge amounts! Needed: tape access (nearline)

  8. CERA DB Layer (When, How, What, Where, Who); Data Flows; Midtier; Storage@DKRZ; TDS; Archive: files; Appl. Server; HPSS 9 PB; LobServer; Container: Blobs

  9. LObStER: Large Object Storage and Efficient Retrieval. Huge amounts of data in each container file; very different sizes of records: 64b .. 2 Gb; efficient administration of all records; irregular access patterns (access latency independent of the record position); transactional behaviour for read/write; fault tolerance for HD, controller, tapes, etc.

  10. LObStER. Application; generic JDBC-driver; Lobster configuration manager; Application; Intranet; Internet; specific JDBC-drivers loaded

  11. LObStER. Oracle RDB (or other); Cache; Lobster object manager; show-container; read-record; fetch-records

  12. LObStER: The Data Containers. Container files with blocked format; 64-bit files and 64-bit internal position referencing; max file size: 16384 PBytes; entries stored in ≥1 blocks; block sizes 2^k, k ∈ { 8, 9, 10, …, 62 }

  13. LObStER: The Data Containers. header-blocks; direct-pointer-blocks; indirect-pointer-block; data-blocks
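
The container figures on slide 12 can be checked with a little arithmetic. The following is a minimal Python sketch, assuming block sizes are counted in bytes and that "PBytes" means binary petabytes (2^50 bytes); the slides state neither. The names `block_size` and `blocks_needed` are hypothetical and are not part of LObStER.

```python
# Assumption: block sizes are 2**k bytes for k in 8..62, as listed on slide 12.
BLOCK_EXPONENTS = range(8, 63)

# Assumption: 64-bit position referencing addresses 2**64 bytes,
# and one "PByte" is 2**50 bytes.
MAX_FILE_BYTES = 1 << 64
MAX_FILE_PBYTES = MAX_FILE_BYTES >> 50


def block_size(k: int) -> int:
    """Return the block size in bytes for exponent k."""
    if k not in BLOCK_EXPONENTS:
        raise ValueError(f"exponent {k} is outside 8..62")
    return 1 << k


def blocks_needed(record_bytes: int, k: int) -> int:
    """Return how many blocks of size 2**k hold a record.

    Every entry occupies at least one block, matching "entries stored
    in >=1 blocks" on the slide.
    """
    if record_bytes < 0:
        raise ValueError("record size must be non-negative")
    size = block_size(k)
    # Ceiling division, with a floor of one block for empty records.
    return max(1, -(-record_bytes // size))
```

Under those assumptions, 2^64 bytes divided by 2^50 bytes per PByte gives the 16384 PBytes quoted on the slide.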

  14. Long Term Archiving. Several steps: specification & concept; filling of metadata & data; quality checks & DOI. LTA for, e.g., EUCLIPSE, MedCLIVAR, combine

  15. LTA Costs depend on complexity and effort at our site (metadata reformatting, etc.)

  16. Long Term Archiving: Quality Checks on three levels. QC L1: conformity to general standards (format, ...); QC L2: coarse automated content checks; QC L3: detailed spot checks: TQA (Technical Quality Assurance), SQA (Scientific Quality Assurance)
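
The three QC levels on slide 16 could be modelled as an ordered pipeline. The sketch below assumes that each level is only attempted once the previous one has passed, which the slides suggest but do not state. `QCLevel` and `highest_level_passed` are hypothetical names for illustration, not WDCC tooling, and the check callables stand in for whatever real checks a project defines.

```python
from enum import IntEnum
from typing import Callable, Dict


class QCLevel(IntEnum):
    """Quality-check levels as named on the slide."""
    L1 = 1  # conformity to general standards (format, ...)
    L2 = 2  # coarse automated content checks
    L3 = 3  # detailed spot checks (TQA and SQA)


def highest_level_passed(checks: Dict[QCLevel, Callable[[], bool]]) -> int:
    """Run checks in level order and return the highest level passed.

    Returns 0 if L1 fails or is missing. Assumption: a level counts
    only if all lower levels passed first.
    """
    passed = 0
    for level in sorted(QCLevel):
        check = checks.get(level)
        if check is None or not check():
            break
        passed = int(level)
    return passed
```

A dataset whose L1 and L2 checks succeed but whose L3 spot check fails would be reported at level 2 in this model.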

  17. LTA: CMIP5 as an Example of a Federated Activity. Distributed QC Level 2 Checks at Multiple Sites; Central QC Repository; Central QC Level 3 Checks; DOI Publication Agency; Long-Term Archive; QC services; QC Service Layer; QC services; QC Service Layer; QC L3 Tools; Project QC Metadata Repository; QC L2 Tool; SQA GUI

  18. LTA: CMIP5 as an Example. Data Nodes IDF Data Catalogue MD Input DOI Catalogue Data Quality Control MD on model & simulation MD on data MD on quality Registration Project MD Repository MD harvest during project Data from nodes MD export DOI Publication Agency with Long Term Archive TQA SQA by Author Data Long Term Archive (LTA) MD LTA DOI Target Page DOI access MD harvest after archiving

  19. WDC-Climate as Publishing Agency of the IDF. International DOI Foundation (doi.org); Registration Agencies: DataCite (DataCite.org); National Organizations: TIB, BL, … (tib-hannover.de); Publisher: WDCC, … (wdc-climate.de)

  20. Visibility of LTA Data in Public Catalogues. DOI is given; catalogue metadata is sent to the Registration Agency via the national organization

  21. The Data Life Cycle Management. Virtual Research Environment; Data Production; Data Dissemination; Data Evaluation; Long Term Archive

  22. Thank you, Questions?
