CERA / WDCC
  Hannes Thiemann
  Max-Planck-Institut für Meteorologie
  Modelle und Daten
  hannes.thiemann@zmaw.de
  NCAR, October 27th – 29th, 2008
Contents
  Statistics
  Requirements + Features
  General architecture
  Implementation (current and new)
  Migration
  Summary
Basic Statistics
  WDCC / CERA: general statistics as of 01-10-2008 00:00:10
  Database size: 370 TByte
  Number of BLOBs: 6663287791 (about 6.7 billion)
  Data access is by fields and not by files.
  Number of experiments: 1146
  Number of datasets: 142062
  Total size divided by the number of BLOBs gives the average size of a data access granule: 50 - 60 kB/BLOB
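The average granule size can be checked from the two totals on this slide. A minimal sketch, assuming decimal units (1 TByte = 10^12 bytes, 1 kB = 1000 bytes); the variable names are only illustrative:

```python
# Back-of-the-envelope check of the average BLOB size.
# Assumption: decimal units (1 TByte = 10**12 bytes, 1 kB = 1000 bytes).
DATABASE_SIZE_BYTES = 370 * 10**12   # 370 TByte, from the slide
NUMBER_OF_BLOBS = 6_663_287_791      # from the slide

avg_blob_kb = DATABASE_SIZE_BYTES / NUMBER_OF_BLOBS / 1000
print(f"average BLOB size: {avg_blob_kb:.1f} kB")
```

With binary units instead (1 TByte = 2^40 bytes) the result is somewhat larger, but it stays within the quoted 50 - 60 kB range.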
Users by continent
  [Chart: active users, 1-Jan-2008 until 14-Oct-2008]
Download destinations
  [Chart: download destinations, 1-Jan-2008 until 14-Oct-2008]
Requirements and constraints
  Access is over WAN.
  Downloads are typically quite small, although some are very large.
  Small downloads imply that users are not willing to wait long …
  We cannot scan through large files for each download.
  Granularity has to be small.
Datatypes
  Model data
  Climatological runs (global and regional) (IPCC, …)
  Weather forecasts (DPHASE, CEOP, …)
  Reanalysis data
  Observational data (COPS, CARIBIC, …)
  Satellite data products
Formats
  CERA provides the ability to store data of any format. These are the formats used:
  GRIB (60%)
  NetCDF (18%)
  Other (22%)
General Architecture
  [Diagram labels: Midtier, Data]
General Architecture
  [Diagram labels: Webserver, Proxy, Appl. Server, Metadata, Data]
  [Metadata blocks: Contact, Coverage, Reference, Entry, Status, Parameter, Spatial Reference, Distribution, Local Adm., Data Org, Data Access]
  [Processing steps: Select timestep + region, Convert format]
Storage within CERA
  Database table: data of a single variable
    Index   Content
    1       Data of timestep i
    2       Data of timestep i+1
    3       Data of timestep i+2
    …
    n       Data of timestep i+n
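The layout on this slide (one table per variable, one row per timestep) can be sketched as follows. This is only an illustration: SQLite stands in for the Oracle database, and the table name `temperature_2m`, the column names, and the helper `read_timesteps` are hypothetical.

```python
import sqlite3

# Assumption: SQLite is used here only as a stand-in for the real database.
conn = sqlite3.connect(":memory:")

# Hypothetical table for a single variable: one BLOB per timestep.
conn.execute(
    "CREATE TABLE temperature_2m ("
    " timestep INTEGER PRIMARY KEY,"
    " data BLOB NOT NULL)"
)

# Store a few fake timesteps (16 bytes each instead of a real field).
for step in range(4):
    payload = bytes([step]) * 16
    conn.execute(
        "INSERT INTO temperature_2m (timestep, data) VALUES (?, ?)",
        (step, payload),
    )


def read_timesteps(connection, first, last):
    """Fetch only the requested timesteps; no other rows are read."""
    cursor = connection.execute(
        "SELECT timestep, data FROM temperature_2m"
        " WHERE timestep BETWEEN ? AND ? ORDER BY timestep",
        (first, last),
    )
    return cursor.fetchall()


selected = read_timesteps(conn, 1, 2)
print([timestep for timestep, _ in selected])
```

Because each row holds one timestep of one variable, a request for a few timesteps touches only those rows rather than a whole file.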
Handicap
  Handicap: not enough disk space available
    Data stored within database: approx. 400 TB
    Disks available: approx. 24 TB
  Database has been coupled transparently to the HSM system
  How do we avoid frequent tape accesses?
    Big cache
    Store data as close as possible according to the needs of users: split into single variables
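How a disk cache in front of the tape system reduces tape requests can be sketched as below. The slide only says "big cache"; the least-recently-used eviction policy, the class `DiskCache`, and the callback `fetch_from_tape` are assumptions made for this illustration, not a description of the actual HSM behaviour.

```python
from collections import OrderedDict


class DiskCache:
    """Hypothetical LRU disk cache in front of a slow tape backend."""

    def __init__(self, capacity_bytes, fetch_from_tape):
        self.capacity_bytes = capacity_bytes
        self.fetch_from_tape = fetch_from_tape  # callable: name -> size in bytes
        self.entries = OrderedDict()            # name -> size, oldest first
        self.used_bytes = 0
        self.tape_requests = 0

    def access(self, name):
        """Return 'disk' on a cache hit and 'tape' on a miss."""
        if name in self.entries:
            self.entries.move_to_end(name)      # mark as most recently used
            return "disk"
        size = self.fetch_from_tape(name)
        self.tape_requests += 1
        # Evict least recently used entries until the new one fits.
        # (An object larger than the whole cache is still admitted here.)
        while self.entries and self.used_bytes + size > self.capacity_bytes:
            _, evicted_size = self.entries.popitem(last=False)
            self.used_bytes -= evicted_size
        self.entries[name] = size
        self.used_bytes += size
        return "tape"


# Toy example: the cache holds two objects of size 1.
cache = DiskCache(capacity_bytes=2, fetch_from_tape=lambda name: 1)
results = [cache.access(name) for name in ["a", "b", "a", "c", "b"]]
print(results, cache.tape_requests)
```

The smaller the cached objects (single variables rather than whole model outputs), the more distinct requests fit into the same disk space.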
Data migration
  [Diagram labels: Migin, Migout, dxdb; TBS - RW, TBS - RW, TBS - RO; Tbl Partition 1, Tbl Partition 2, Tbl Partition 1]
  All tablespaces are moved “at once” to dxdb.
Inside the datafile
  [Diagram labels: Header 128k, Table, Lob Index, Primary Key, Blob data]
Frontend versus Backend
  [Diagram labels: Filesystem Frontend, HSM Backend; Header 128k; Part 1 = 512 MB, Part 2 = 512 MB]
Retrieving data
  [Diagram labels: Header 128k; parts 1 to 5; Tape; Request]
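The idea behind splitting a datafile into parts is that a request only needs the parts that contain the requested bytes. A minimal sketch of that mapping, assuming the 128k header is followed directly by consecutive 512 MB parts (binary units) and that parts are numbered from 1; the exact on-disk layout is not given on the slides, and `parts_for_range` is a hypothetical helper:

```python
# Assumed layout: [128 KiB header][part 1: 512 MiB][part 2: 512 MiB]...
HEADER_BYTES = 128 * 1024
PART_BYTES = 512 * 1024**2


def parts_for_range(offset, length):
    """Return the 1-based part numbers covering bytes [offset, offset + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    if offset < HEADER_BYTES:
        raise ValueError("offset lies inside the header")
    first = (offset - HEADER_BYTES) // PART_BYTES + 1
    last = (offset + length - 1 - HEADER_BYTES) // PART_BYTES + 1
    return list(range(first, last + 1))


# A 55 kB BLOB at the start of the data area needs only part 1.
print(parts_for_range(HEADER_BYTES, 55_000))
# A BLOB straddling a part boundary needs two parts.
print(parts_for_range(HEADER_BYTES + PART_BYTES - 10, 20))
```

Under these assumptions a typical small request triggers the retrieval of one 512 MB part from tape rather than the whole datafile.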
Warehouse features
  Compression: nothing special used within the server
  Partitioning: allows parts of the data to be moved to HSM
  Backup
    Nologging: beware of crash …
    Read only: two copies on tape
New implementation
  The metadata database will stay as is.
  The Oracle databases holding data will be replaced by a new in-house development.
  Why?
    There is a certain risk that a future version of Oracle may not work with a / any HSM system.
    In the long run, some license costs will be saved.
General Architecture - new
  [Diagram labels: Webserver, Appl. Server, Metadata, Data, Oracle-DB, Blobserver]
CERA-Container
  Instead of keeping data within BLOBs in Oracle databases, data records will be kept within so-called CERA Container Files.
  They can hold a huge number of records.
  They provide fast access independent of the position within the file (granular access).
  They provide fault tolerance against tape damage by keeping checksums within the files.
  Read/write operations against container files are enclosed in transactions.
  Well-known format
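Two of the container properties listed here, granular access and checksum protection, can be illustrated with a toy record file. This is not the CERA container format, which the slides do not specify; the record header (length plus CRC32), the in-memory offset index, and the class `ContainerSketch` are all assumptions, and transactions are not modelled.

```python
import io
import struct
import zlib

# Hypothetical record header: payload length and CRC32, both unsigned 32-bit.
RECORD_HEADER = struct.Struct("<II")


class ContainerSketch:
    """Toy append-only record container with per-record checksums."""

    def __init__(self, stream):
        self.stream = stream   # any seekable binary stream
        self.offsets = []      # record number -> byte offset of its header

    def append(self, payload):
        """Write one record at the end and return its record number."""
        self.stream.seek(0, io.SEEK_END)
        offset = self.stream.tell()
        self.stream.write(RECORD_HEADER.pack(len(payload), zlib.crc32(payload)))
        self.stream.write(payload)
        self.offsets.append(offset)
        return len(self.offsets) - 1

    def read(self, record_number):
        """Seek directly to one record and verify its checksum."""
        self.stream.seek(self.offsets[record_number])
        length, checksum = RECORD_HEADER.unpack(
            self.stream.read(RECORD_HEADER.size)
        )
        payload = self.stream.read(length)
        if zlib.crc32(payload) != checksum:
            raise ValueError(f"checksum mismatch in record {record_number}")
        return payload


container = ContainerSketch(io.BytesIO())
for text in (b"timestep-0", b"timestep-1", b"timestep-2"):
    container.append(text)
print(container.read(1))
```

Reading a record costs one seek and one short read regardless of where the record sits in the file, and a damaged record is detected when it is read instead of being returned silently.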
Migration
  Concept / Team (namely Peter Drakenberg, DKRZ)
    Not yet really finished
  Software
    First software is ready, in order to migrate data
  Convert old data
    Started last week, but will take at least a year
Dataflow: outbound
  [Diagram labels: Webserver, Appl. Server, Processing, Metadata, Data; steps numbered 1 to 8]
Dataflow: inbound
  [Diagram labels: Model run, GFS, Postprocessing, Metadata, Dataserver]
Summary
  CERA allows for the storage of data of different kinds.
    Format independent
    Metadata enables addressing of internal and external data.
  Users typically fetch only small amounts of data.
  The system allows for efficient access to small data granules
    by using warehousing functions like partitioning
    by using small Oracle database BLOBs or, in future, CERA Container Files.