New Resources in the Research Data Archive. Doug Schuster. Topic Outline. New Resources Search/Discovery and Data Delivery TIGGE JRA-25 Routine Updates. Data Search, Discovery and Delivery. Popular Datasets Google Style Search Drill Down Style Search File Level Metadata Example:
New Resources in the Research Data Archive Doug Schuster
Topic Outline • New Resources • Search/Discovery and Data Delivery • TIGGE • JRA-25 • Routine Updates
Data Search, Discovery and Delivery • Popular Datasets • Google Style Search • Drill Down Style Search • File Level Metadata • Example: • Search for model generated tropical cyclone track data using “Drill Down” method.
Background on TIGGE WMO World Weather Research Programme THORPEX • THe Observing system Research and Predictability EXperiment • THORPEX Interactive Global Grand Ensemble (TIGGE) Archive supports research • Grand Ensemble = ensembles from multiple NWP centers are combined (an ensemble of ensembles) • 10 international NWP Centers contributing to TIGGE
Background on TIGGE Three mirrored archive centers • NCAR • ECMWF • CMA {Shared System Development!} • Daily Data Flow Metrics • 245 GB • 1.6 Million gridded fields as separate data packets • 3000+ Files/day
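The daily flow figures above imply some useful averages. The following is a back-of-envelope sketch only: it assumes decimal units (1 GB = 10^9 bytes), which the slide does not state, and treats "3000+" as exactly 3000, so the per-file numbers are upper bounds.

```python
# Averages implied by the daily data flow metrics on the slide.
# Assumption: decimal units (1 GB = 1e9 bytes); the slide does not specify.
DAILY_BYTES = 245e9    # 245 GB per day
DAILY_FIELDS = 1.6e6   # 1.6 million gridded fields per day
DAILY_FILES = 3000     # slide says "3000+", so this is a lower bound


def daily_averages(total_bytes=DAILY_BYTES, fields=DAILY_FIELDS, files=DAILY_FILES):
    """Return average sizes and the volume of a two-week window."""
    return {
        "kb_per_field": total_bytes / fields / 1e3,
        "mb_per_file": total_bytes / files / 1e6,
        "fields_per_file": fields / files,
        "tb_per_14_days": total_bytes * 14 / 1e12,
    }


averages = daily_averages()
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions a field averages about 153 KB, a file about 82 MB, and two weeks of data come to roughly 3.4 TB.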
Data Receipt • [Diagram: centres UKMO, CMC, CMA, ECMWF, MeteoFrance, NCAR, NCEP, JMA, KMA, NCDC, CPTEC, BoM; legend: Archive Centre, Current Data Provider; transfer methods: IDD/LDM, HTTP, FTP] • Unidata IDD/LDM = Internet Data Distribution / Local Data Manager • Commodity internet application to send and receive data
Archive Summary • Online Data • Period, most recent two weeks • ~4 TB, public products • ~2 TB, data preparation, subsetting, DB • Offline Data • Full period of record • ~200 TB, NCAR MSS system
Major Challenges Ensure data receipt, build complete archive • Exchange manifest files as part of IDD/LDM data transmission between Archive centers • Verify send, receive • Automated resend requests for missing fields • Collate data fields into different file types • Harvest and hold metadata in MySQL DBs • Identify location of every field in file set • Updated often • Critical for user interface and background data processing
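The manifest check and automated resend described above can be sketched as a comparison between what the sender listed and what actually arrived. This is a minimal illustration, not the TIGGE implementation: the field identifiers, the function names, and the shape of the resend request are all hypothetical.

```python
def missing_fields(manifest_ids, received_ids):
    """Return manifest entries that never arrived, in manifest order."""
    received = set(received_ids)
    return [field_id for field_id in manifest_ids if field_id not in received]


def build_resend_request(sender, manifest_ids, received_ids):
    """Build a resend request for the sender, or None if nothing is missing.

    The dictionary layout is a hypothetical message format.
    """
    missing = missing_fields(manifest_ids, received_ids)
    if not missing:
        return None
    return {"to": sender, "resend": missing}


# Hypothetical field identifiers: center / initialization / parameter / step.
manifest = [
    "ECMWF/2007081612/gh500/024",
    "ECMWF/2007081612/gh500/048",
    "ECMWF/2007081612/t2m/024",
]
received = ["ECMWF/2007081612/gh500/024", "ECMWF/2007081612/t2m/024"]

request = build_resend_request("ECMWF", manifest, received)
print(request)
```

Using a set for the received identifiers keeps the check linear in the number of fields, which matters at more than a million fields per day.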
Major Challenges • Access system must accurately display what common parameters are available as users make selections • Driven by multi-center research (Grand Ensemble) • Parameters vary between centers.
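One way to show only the parameters common to the selected centers is a set intersection over per-center availability. The sketch below assumes a simple in-memory catalog; the availability lists are invented for illustration and do not describe the real TIGGE holdings.

```python
def common_parameters(available, selected_centers):
    """Return the parameters offered by every selected center."""
    sets = [set(available[center]) for center in selected_centers]
    if not sets:
        return set()
    return set.intersection(*sets)


# Illustrative availability only; NOT the actual holdings of these centers.
catalog = {
    "ECMWF": ["Geopotential Height", "2m Temperature", "Total Precipitation"],
    "NCEP": ["Geopotential Height", "2m Temperature"],
    "CMA": ["Geopotential Height", "Total Precipitation"],
}

print(sorted(common_parameters(catalog, ["ECMWF", "NCEP"])))
print(sorted(common_parameters(catalog, ["ECMWF", "NCEP", "CMA"])))
```

Each additional center can only shrink the common list, so the interface has to recompute it on every selection change.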
Get Forecast Data Two User Interfaces • NCAR online file archive • Selection options (Portal or RDA) • Center(s) • Date • File type (sl, pl, etc.) • Initialization time • Forecast length • User customized files • Selection options (Portal) • Same as for files, plus • Parameter Subsets • Grid Interpolation • Spatial subsets • Formats: GRIB2, NetCDF Real Time Delayed Mode • Download Options • Point and click using browser, one file at a time • Script to run on local machine • User and password encrypted ‘wget’ commands • background process to access all files
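The script-based download option can be sketched as expanding the selections into a list of files and emitting one `wget` command per file. Everything specific here is an assumption: the URL template points at a placeholder host, the path layout is hypothetical, forecast length is omitted, and credential handling is deliberately left out because the slide does not describe how the encrypted user and password are passed.

```python
import shlex
from itertools import product

# Hypothetical URL layout on a placeholder host; not the real archive path.
URL_TEMPLATE = "https://example.org/tigge/{center}/{date}{init}/{ftype}.grib2"


def select_urls(centers, dates, file_types, init_times):
    """Expand the selection options into one URL per matching file."""
    return [
        URL_TEMPLATE.format(center=c, date=d, init=i, ftype=t)
        for c, d, i, t in product(centers, dates, init_times, file_types)
    ]


def wget_commands(urls):
    """Return shell-safe wget command lines; --no-clobber skips existing files."""
    return [shlex.join(["wget", "--no-clobber", url]) for url in urls]


urls = select_urls(["ECMWF", "NCEP"], ["20070816"], ["sl", "pl"], ["12"])
commands = wget_commands(urls)
print("\n".join(commands))
```

Writing the commands to a script file lets the user run them unattended on a local machine instead of clicking through files one at a time.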
User access selection demonstration Animation, what you will see • Multiple centers • (ECMWF, UKMO, NCEP, CMA, CMC, KMA) • Fields/Parameters • (Geopotential Height, 2m Temperature) • Levels • (500 hPa, Single Level) • Spatial and temporal ranges • (Global, 3-days, 12Z initializations, 48 hour forecasts) • Regridding to common spatial resolution • (1.5°) • Output format • (netCDF)
Features of JRA-25/JCDAS at NCAR All data available through web/RDA portal and NCAR MSS, 11 TB • Available dates, 1979 through 2007 • 23 different data products • 4 x daily, GRIB1 format • Monthly mean, netCDF (NCAR derived from binary) format • All data users are registered and must agree to JMA’s ‘Condition of Use’
Typhoon Sepat, 16 August 2007 Images courtesy Dave Stepaniak
Routine Updates • NCEP • FNL Global Tropospheric Analysis (Daily) • BUFR/PREPBUFR obs. data (Weekly) • Unidata IDD data (Daily) • NetCDF format obs collected from GTS • IDD model data (GRIB-2) • GFS • NAM • RUC
Routine Updates • SST • NCEP OI Global SST 1x1 Deg (weekly) • NOAA OI Global 0.25 x 0.25 SST (monthly) • Hadley Centre Global Sea Ice and SST (monthly) • Reanalysis • NNR Yearly updates • NARR Yearly updates • JRA-25
Lessons Learned • Manifest files and automated resend are critical for a complete archive • The impact of different contributions from the NWP centers across the archive should not be underestimated • There are important design considerations to ensure prompt browser interactions • Caching data from the DB
Lessons Learned • Computational resource requirements ramp up quickly with multi-dimensional problems • Dimensions: center, ensemble member, parameter, forecast length, etc. • Archive file structure choices greatly impact subsetting ability • TIGGE currently based on synoptic order • Time-series by parameter could be better?
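Both lessons can be made concrete with a little counting. The sketch below is a simplified model, not a description of the actual TIGGE file layout: it assumes a synoptic archive stores one file per initialization time and a time-series archive stores one file per parameter, and every dimension size except the 10 centers is an invented example value.

```python
from math import prod


def field_combinations(dim_sizes):
    """Number of distinct fields implied by the size of each dimension."""
    return prod(dim_sizes.values())


def files_touched(layout, n_init_times, n_parameters):
    """Files opened to extract n_parameters over n_init_times (simplified model)."""
    if layout == "synoptic":
        # Assumed: one file per initialization time, holding all parameters.
        return n_init_times
    if layout == "timeseries":
        # Assumed: one file per parameter, holding all times.
        return n_parameters
    raise ValueError(f"unknown layout: {layout}")


# 10 centers is from the slides; the other sizes are hypothetical.
dims = {"center": 10, "member": 20, "parameter": 30, "forecast_step": 40}
print(field_combinations(dims))

# One parameter over a year of daily initializations:
print(files_touched("synoptic", 365, 1), files_touched("timeseries", 365, 1))
```

In this model a one-parameter, one-year request opens 365 files in synoptic order but a single file in a time-series layout, while the opposite request (all parameters for one initialization time) favors synoptic order.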
Major Challenges • Limited online storage – 4 TB, ≅ 2 weeks temporal coverage • Full archive on NCAR Mass Storage System • User registration and metrics required • Accept data policy; for research and education only • 48 hour delay from forecast initialization time