TIGGE phase 1: Experience with exchanging large amounts of NWP data in near real-time
Baudouin Raoult, Data and Services Section, ECMWF
The TIGGE core dataset
• THORPEX Interactive Grand Global Ensemble
• Global ensemble forecasts to around 14 days, generated routinely at different centres around the world
• Outputs collected in near real time and stored in a common format for access by the research community
• Easy access to long series of data is necessary for applications such as bias correction and the optimal combination of ensembles from different sources
Building the TIGGE database
• Three archive centres: CMA, NCAR and ECMWF
• Ten data providers:
  - Already sending data routinely: ECMWF, JMA (Japan), UK Met Office (UK), CMA (China), NCEP (USA), MSC (Canada), Météo-France (France), BoM (Australia), KMA (Korea)
  - Coming soon: CPTEC (Brazil)
• Exchanges using UNIDATA LDM, HTTP and FTP
• Operational since 1st of October 2006
• 88 TB, growing by ~1 TB/week
• 1.5 million fields/day
TIGGE Archive Centres and Data Providers
[World map: archive centres ECMWF, NCAR and CMA; current data providers UKMO, CMC, Météo-France, NCEP, JMA, KMA and BoM; future data provider CPTEC]
Strong governance
• Precise definition of:
  - Which products: list of parameters, levels, steps, units, …
  - Which format: GRIB2
  - Which transport protocol: UNIDATA's LDM
  - Which naming convention: WMO file name convention
• Only exception: the grid and resolution
  - Choice of the data provider; the data provider supplies interpolation to a regular lat/lon grid
  - Best possible model output
• Many tools and examples:
  - Sample dataset available
  - Various GRIB2 tools, the "tigge_check" validator, … (a validation sketch follows this slide)
  - Scripts that implement the exchange protocol
  - Web site with documentation, sample data set, tools, news, …
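A minimal sketch, in the spirit of the "tigge_check" validator rather than the tool itself, of how incoming GRIB2 messages could be checked against the agreed product definition. It assumes the ecCodes Python bindings; the key names are standard ecCodes keys, but the accepted-value sets shown are an illustrative subset, not the official TIGGE lists.

```python
import eccodes

# Illustrative subset of the agreed product definition (not the official lists)
ALLOWED_PARAMS = {"t", "u", "v", "gh", "msl"}
ALLOWED_LEVEL_TYPES = {"isobaricInhPa", "surface"}

def check_file(path):
    """Return a list of (message number, problem) pairs for one GRIB file."""
    problems = []
    with open(path, "rb") as f:
        n = 0
        while True:
            gid = eccodes.codes_grib_new_from_file(f)
            if gid is None:          # end of file
                break
            n += 1
            # TIGGE mandates GRIB edition 2
            if eccodes.codes_get(gid, "edition") != 2:
                problems.append((n, "not GRIB2"))
            if eccodes.codes_get(gid, "shortName") not in ALLOWED_PARAMS:
                problems.append((n, "unexpected parameter"))
            if eccodes.codes_get(gid, "typeOfLevel") not in ALLOWED_LEVEL_TYPES:
                problems.append((n, "unexpected level type"))
            eccodes.codes_release(gid)
    return problems
```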
Quality assurance: homogeneity
• Homogeneity is paramount for TIGGE to succeed
• The more consistent the archive, the easier it will be to develop applications
• There are three aspects to homogeneity:
  - Common terminology (parameter names, file names, …)
  - Common data format (format, units, …)
  - Definition of an agreed list of products (parameters, steps, levels, …)
• What is not homogeneous:
  - Resolution
  - Base time (although most providers have a run at 12 UTC)
  - Forecast length
  - Number of ensemble members
QA: Checking for homogeneity
• E.g. cloud cover: instantaneous or six-hourly? (see the sketch below)
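One way such a mismatch can be caught automatically: the GRIB2 "stepType" key distinguishes instantaneous fields from accumulations and time averages. A minimal sketch, again assuming the ecCodes Python bindings; the file name is hypothetical.

```python
import eccodes
from collections import defaultdict

# Group cloud-cover messages by originating centre and stepType;
# a homogeneous archive should see one stepType per parameter.
seen = defaultdict(set)
with open("tcc_all_centres.grib2", "rb") as f:   # hypothetical input file
    while True:
        gid = eccodes.codes_grib_new_from_file(f)
        if gid is None:
            break
        if eccodes.codes_get(gid, "shortName") == "tcc":   # total cloud cover
            centre = eccodes.codes_get(gid, "centre")
            seen[centre].add(eccodes.codes_get(gid, "stepType"))
        eccodes.codes_release(gid)

for centre, types in sorted(seen.items()):
    print(centre, sorted(types))   # e.g. one centre 'instant', another 'avg'
```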
QA: Completeness
• The objective is to have 100% complete datasets at the Archive Centres
• Completeness may not be achieved for two reasons:
  - The transfer of the data to the Archive Centre fails
  - Operational activities at a data provider are interrupted and backfilling past runs is impractical
• Incomplete datasets are often very difficult to use
  - Most of the current tools (e.g. EPSgrams) used for ensemble forecasts assume a fixed number of members from day to day
  - These tools will have to be adapted (a completeness check is sketched below)
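A completeness check reduces to set arithmetic: the agreed product list fixes the expected (parameter, level, step, member) combinations per run, and anything not received is missing. A minimal sketch, with made-up parameter, level and step lists:

```python
from itertools import product

# Illustrative product definition for one provider's run (not the official list)
PARAMS = ["t", "u", "v"]
LEVELS = [850, 500, 250]
STEPS = range(0, 360 + 1, 6)        # 6-hourly out to 15 days
MEMBERS = range(0, 51)              # control + 50 perturbed members

expected = set(product(PARAMS, LEVELS, STEPS, MEMBERS))

def report(received):
    """received: set of (param, level, step, member) tuples seen in the archive."""
    missing = expected - received
    pct = 100.0 * (1 - len(missing) / len(expected))
    print(f"{pct:.2f}% complete, {len(missing)} fields missing")
    return missing
```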
GRIB to NetCDF Conversion
[Diagram: a GRIB file holding records such as t and d from centres ECMF and EGRR, where (1,2,3,4) represents the ensemble member id (realization), converted in three steps]
• Gather metadata and message locations from the GRIB file
• Create the NetCDF file structure
• Populate the NetCDF parameter arrays, e.g. t(1,2,3,4) and d(1,2,3,4)
(a conversion sketch follows)
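A minimal sketch of those three steps, assuming the pygrib and netCDF4 Python packages and an ensemble GRIB2 input (so the perturbationNumber key is present); the file names are hypothetical, and a real converter would also handle the reftime, level and step dimensions.

```python
import pygrib
from netCDF4 import Dataset

grbs = pygrib.open("tigge_input.grib2")          # hypothetical input

# Step 1: gather metadata and message locations (one pass over the records)
index = {}                                       # (param, member) -> message number
for i, msg in enumerate(grbs, start=1):
    index[(msg.shortName, msg.perturbationNumber)] = i

params = sorted({p for p, _ in index})
members = sorted({m for _, m in index})
ny, nx = grbs.message(1).values.shape

# Step 2: create the NetCDF file structure
nc = Dataset("tigge_output.nc", "w")
nc.createDimension("realization", len(members))
nc.createDimension("latitude", ny)
nc.createDimension("longitude", nx)
ncvars = {p: nc.createVariable(p, "f4", ("realization", "latitude", "longitude"))
          for p in params}

# Step 3: populate the parameter arrays, one member at a time
for (p, m), i in index.items():
    ncvars[p][members.index(m), :, :] = grbs.message(i).values
nc.close()
```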
Ensemble NetCDF File Structure
• NetCDF file format
  - Based on available CF conventions
  - File organization follows the NetCDF file structure proposed by Doblas-Reyes (ENSEMBLES project)
• Provides grid/ensemble-specific metadata for each member
  - Data provider
  - Forecast type (perturbed, control, deterministic)
• Allows for multiple combinations of initialization times and forecast periods within one file
  - Pairs of initialization time and forecast step
Ensemble NetCDF File Structure
• NetCDF parameter structure (5 dimensions):
  - Reftime
  - Realization (ensemble member id)
  - Level
  - Latitude
  - Longitude
• "Coordinate" variables are used to describe:
  - Realization: provides the metadata associated with each ensemble grid
  - Reftime: allows multiple initialization times and forecast periods to be contained within one file
(a sketch of this layout follows)
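A minimal sketch of that five-dimensional layout using the netCDF4 Python package; the dimension sizes and attribute values are made up for illustration.

```python
import numpy as np
from netCDF4 import Dataset

nc = Dataset("ensemble_structure.nc", "w")

# The five dimensions of each parameter; reftime is unlimited so that
# further initialization times can be appended to the same file.
nc.createDimension("reftime", None)
nc.createDimension("realization", 4)
nc.createDimension("level", 3)
nc.createDimension("latitude", 181)
nc.createDimension("longitude", 360)

# Coordinate variables carrying the per-dimension metadata
reftime = nc.createVariable("reftime", "f8", ("reftime",))
reftime.units = "hours since 2006-10-01 00:00:00"    # illustrative
realization = nc.createVariable("realization", "i4", ("realization",))
realization[:] = np.arange(4)                        # ensemble member ids

# One 5-D parameter array, e.g. temperature
t = nc.createVariable(
    "t", "f4", ("reftime", "realization", "level", "latitude", "longitude"))
nc.close()
```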
Tool Performance
• GRIB-2 simple packing to NetCDF 32-bit: ~2x the GRIB-2 size
• GRIB-2 simple packing to NetCDF 16-bit: similar size
• GRIB-2 JPEG 2000 to NetCDF 32-bit: ~8x the GRIB-2 size
• GRIB-2 JPEG 2000 to NetCDF 16-bit: ~4x the GRIB-2 size
• Issue: packing of 4D fields (e.g. 2D + levels + time steps)
  - Packing in NetCDF is similar to simple packing in GRIB2: value = scale_factor * packed_value + add_offset
  - All dimensions share the same scale_factor and add_offset
  - With 16 bits, only 65536 distinct values can be encoded; this is a problem if there is a lot of variation across the 4D matrices (see the sketch below)
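A minimal sketch of that packing scheme and its 16-bit limitation, with made-up numbers:

```python
import numpy as np

def pack16(values):
    """Simple packing: value = scale_factor * packed_value + add_offset,
    with one (scale_factor, add_offset) pair shared by the whole array."""
    add_offset = values.min()
    spread = values.max() - add_offset
    scale_factor = spread / 65535.0 if spread > 0 else 1.0
    packed = np.round((values - add_offset) / scale_factor).astype(np.uint16)
    return packed, scale_factor, add_offset

def unpack(packed, scale_factor, add_offset):
    return packed.astype(np.float64) * scale_factor + add_offset

# A "4D-like" field whose values vary a lot across levels: pressure-sized
# magnitudes on one level, near-zero values on another.
field = np.concatenate([1.0e5 + np.linspace(0, 500, 100),   # large values
                        np.linspace(0, 1, 100)])            # small values
packed, sf, off = pack16(field)
err = np.abs(unpack(packed, sf, off) - field).max()
print(f"quantisation step ~ {sf:.3f}, worst error ~ {err:.3f}")
# With a ~1e5 spread the step is ~1.5, so the 0..1 values are quantised
# to multiples of ~1.5 and mostly collapse to the same packed value.
```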
GRIB2
• WMO standard
• Fine control over the numerical accuracy of grid values
• Good compression (lossless JPEG 2000)
• GRIB is a record format
  - Many GRIB records can be written to a single file (see the sketch below)
• GRIB edition 2 is template-based
  - It can easily be extended
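Because GRIB is a pure record format, a multi-field file is just records laid end to end, so files can be merged by plain byte concatenation; a minimal sketch with hypothetical file names:

```python
# Merging GRIB files needs no container bookkeeping: each record is
# self-describing, so appending bytes yields a valid multi-field file.
with open("combined.grib2", "wb") as out:
    for name in ["t850.grib2", "u850.grib2", "v850.grib2"]:  # hypothetical inputs
        with open(name, "rb") as part:
            out.write(part.read())
```

This is precisely what is non-trivial for NetCDF, as the next slide notes: merging NetCDF files means rebuilding dimensions and variables rather than appending bytes.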
NetCDF
• Work on the converter gave us a good understanding of both formats
• NetCDF is a file format
  - Merging/splitting NetCDF files is non-trivial
• Need to agree on a convention (CF)
  - Only lat/lon and reduced grids (?) so far; work in progress to add other grids to the CF conventions
  - There is no way to support multiple grids in the same file
• A convention must also be chosen for multiple fields per NetCDF file
  - All levels? All variables? All time steps?
• Simple packing is possible, but only as a convention
• 2 to 8 times larger than GRIB2
Conclusion
• True interoperability
  - Data format, units
  - Clear definition of the parameters (semantics)
  - Common tools are required (the only guarantee of true interoperability)
  - Strong governance is needed
• GRIB2 vs NetCDF
  - Different usage patterns
  - NetCDF: file based, little compression, need to agree on a convention
  - GRIB2: record based, easier to manage large volumes, WMO standard