SCD Research Data Archives: Availability Through the CDP
• About 500 distinct datasets, 12 TB in total
• Diverse in type, size, and format
• Serving 900 different investigators per year
Enhanced Service through the CDP
• What data is best suited to the CDP?
  – Datasets needed by the largest group of scientists
  – Datasets that are typically large (tens of gigabytes) and from which users normally prefer spatial, temporal, and parameter subsets
  – Other relevant datasets that are often required to support research using the datasets above
• Example: Global Atmospheric Reanalyses
CDP project, NCEP Reanalysis-2
• About Reanalysis-2
  – Proper full name: NCEP/DOE AMIP-II Reanalysis
  – Experimental follow-on to the popular NCEP/NCAR Global Atmospheric Reanalysis
• For the CDP we have chosen one popular product
  – “Pressure stack”: global 2.5° grid, 7 variables on 17 pressure levels, 4x daily, plus a few surface-only grids
  – Other products exist, e.g. surface flux fields and climatologies
• Using a one-year sample for the CDP study
  – 1460 files, 2.2 Gbytes
• We have data for 1979-1999, and the record is continuing
  – Total pressure-stack data is 45 Gbytes and growing
Data provided by M. Kanamitsu, NCEP
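The sample sizes on this slide fit together with simple arithmetic: 4 analyses per day over a 365-day year gives 1460 files, and 21 years of roughly 2.2 Gbytes each lands close to the quoted 45 Gbytes. The sketch below makes those checks explicit. It assumes a non-leap year, decimal units (1 GB = 1000 MB), and the standard 144 × 73 layout for a global 2.5° grid with both poles; the grid shape is not stated on the slide.

```python
# Back-of-envelope sizes for the one-year Reanalysis-2 "pressure stack" sample.
# Inputs come from the slide; a 365-day year and decimal units (1 GB = 1000 MB)
# are assumptions of this sketch.

ANALYSES_PER_DAY = 4      # "4x daily"
DAYS_PER_YEAR = 365       # leap years ignored
SAMPLE_YEAR_GB = 2.2      # size of the one-year sample
FIRST_YEAR, LAST_YEAR = 1979, 1999


def files_per_year(days: int = DAYS_PER_YEAR, per_day: int = ANALYSES_PER_DAY) -> int:
    """One file per analysis time."""
    return days * per_day


def mb_per_file(year_gb: float = SAMPLE_YEAR_GB) -> float:
    """Average file size in MB, assuming the sample is spread evenly over the files."""
    return year_gb * 1000 / files_per_year()


def archive_gb(first: int = FIRST_YEAR, last: int = LAST_YEAR,
               year_gb: float = SAMPLE_YEAR_GB) -> float:
    """Naive total: every year is assumed to be the same size as the sample year."""
    return (last - first + 1) * year_gb


def grid_shape(res_deg: float = 2.5) -> tuple[int, int]:
    """(nlon, nlat) for a global regular lat-lon grid that includes both poles."""
    return round(360 / res_deg), round(180 / res_deg) + 1


if __name__ == "__main__":
    print(files_per_year())         # 1460
    print(round(mb_per_file(), 2))  # 1.51
    print(round(archive_gb(), 1))   # 46.2
    print(grid_shape())             # (144, 73)
```

The naive 21-year total of about 46 GB is in the same range as the 45 Gbytes quoted above; the small gap is expected, since the estimate simply multiplies one sample year.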
Successes and outlook
• It works, we can do it!
• Access is based on LAS (Live Access Server), NCL (NCAR Command Language), and a local file system
• The key component was NCL
  – NCL can read many file formats (netCDF, GRIB, HDF)
  – The native format produced at the weather centers (NCEP and ECMWF) is GRIB, a WMO standard
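Reading several formats means a tool first has to work out which format a file is in. The sketch below shows one common approach, checking the file's leading signature bytes; it is an illustration of the idea, not how NCL itself does it. It only checks offset 0, whereas real GRIB and HDF5 readers may also search further into the file. The function names `sniff_format` and `sniff_file` are hypothetical.

```python
# Sketch: guess a file's format from its leading bytes, the kind of dispatch a
# multi-format reader performs before decoding. Simplification: signatures are
# only checked at offset 0 (real GRIB and HDF5 readers may also search further in).

SIGNATURES = [
    (b"CDF\x01", "netCDF classic"),
    (b"CDF\x02", "netCDF 64-bit offset"),
    (b"\x89HDF\r\n\x1a\n", "HDF5 (also used by netCDF-4)"),
    (b"\x0e\x03\x13\x01", "HDF4"),
]


def sniff_format(header: bytes) -> str:
    """Return a format name for the first bytes of a file."""
    if header.startswith(b"GRIB") and len(header) >= 8:
        # In both GRIB1 and GRIB2 the edition number is octet 8 of the indicator section.
        return f"GRIB edition {header[7]}"
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of a file to identify it."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))


if __name__ == "__main__":
    print(sniff_format(b"GRIB\x00\x00\x00\x02"))     # GRIB edition 2
    print(sniff_format(b"CDF\x01\x00\x00\x00\x00"))  # netCDF classic
```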
Outlook
• NCL can do much more!
  – It is a powerful analysis tool
  – 50+ computational math functions
  – 10+ routines for scalar and vector regridding
  – Many functions specific to atmospheric modeling, e.g. Spherepack
• We control the development of NCL, so important functionality can be added
• Through NCL we could offer more analysis capability as part of the CDP
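Scalar regridding is one of the analysis services that could be offered. As an illustration of the simplest such method, the sketch below performs bilinear interpolation between regular latitude-longitude grids. It is a generic textbook technique, not a description of NCL's regridding routines, and it assumes ascending coordinates, target points inside the source domain, no longitude wrap-around, and no missing values.

```python
# Sketch of bilinear regridding between regular latitude-longitude grids.
# Assumptions: coordinates ascending, targets inside the source domain,
# no longitude wrap-around at 0/360, no missing values.
from bisect import bisect_right


def _bracket(coords: list[float], x: float) -> int:
    """Index i such that coords[i] <= x <= coords[i + 1] (clamped at the edges)."""
    i = bisect_right(coords, x) - 1
    return min(max(i, 0), len(coords) - 2)


def bilinear(lats, lons, field, lat, lon):
    """Interpolate field, where field[j][i] is the value at (lats[j], lons[i])."""
    j, i = _bracket(lats, lat), _bracket(lons, lon)
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    # Interpolate along longitude on the two bracketing latitude rows...
    lower = (1 - tx) * field[j][i] + tx * field[j][i + 1]
    upper = (1 - tx) * field[j + 1][i] + tx * field[j + 1][i + 1]
    # ...then along latitude between them.
    return (1 - ty) * lower + ty * upper


def regrid(lats, lons, field, new_lats, new_lons):
    """Evaluate the source field at every (new_lat, new_lon) target point."""
    return [[bilinear(lats, lons, field, la, lo) for lo in new_lons] for la in new_lats]
```

Bilinear interpolation reproduces any field that is linear in latitude and longitude, which makes a quick sanity check: for `field[j][i] = 2*lats[j] + 3*lons[i]`, regridding to (1.25, 1.0) gives 5.5 up to floating-point rounding.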
Outlook
• Challenges
  – How can we sensibly scale this system up to handle 100-gigabyte datasets and multiple users?
  – A certainty: users will request large subsets, and some requests will be orthogonal to whatever file structure is chosen (e.g. a long time series at one grid point when files are organized by time)
  – Result: long computational run times and large output data files
  – The requester may not know this in advance
  – Such unexpected results mean unsatisfactory service
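Why an "orthogonal" request is expensive can be shown with a small cost model. The sketch assumes the archive holds one file per analysis time (as the 1460-file one-year sample suggests) and that every requested horizontal field is decoded in full even when only a few points are kept; both are assumptions, and the `Request` type and helper names are hypothetical.

```python
# Hypothetical cost model for a request against an archive stored as one file per
# analysis time. It assumes every requested horizontal field is decoded in full,
# even if only a few points are kept.
from dataclasses import dataclass

NLON, NLAT = 144, 73   # global 2.5-degree grid
BYTES_PER_VALUE = 4    # assumed 32-bit floats after decoding


@dataclass
class Request:
    n_times: int    # analysis times requested
    n_points: int   # horizontal grid points requested
    n_levels: int
    n_vars: int


def files_opened(req: Request) -> int:
    """With one file per time, every requested time means another file."""
    return req.n_times


def bytes_returned(req: Request) -> int:
    return req.n_times * req.n_points * req.n_levels * req.n_vars * BYTES_PER_VALUE


def bytes_decoded(req: Request) -> int:
    """Whole 2-D fields are decoded, whatever the horizontal subset."""
    return req.n_times * NLON * NLAT * req.n_levels * req.n_vars * BYTES_PER_VALUE


if __name__ == "__main__":
    snapshot = Request(n_times=1, n_points=NLON * NLAT, n_levels=17, n_vars=7)
    point_series = Request(n_times=1460, n_points=1, n_levels=17, n_vars=1)
    for name, req in [("global snapshot", snapshot), ("point time series", point_series)]:
        ratio = bytes_decoded(req) / bytes_returned(req)
        print(f"{name}: {files_opened(req)} files, decode/return ratio {ratio:.0f}")
```

Under these assumptions a global snapshot touches one file and decodes only what it returns, while a one-year time series at a single point touches all 1460 files and decodes 10512 values for every value it returns.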
Outlook
• Enhancements to avoid unexpected results
  – Construct algorithms to estimate the run time and output data volume
  – For large output files or long-running requests, offer delayed service through standard FTP procedures, e.g. write the data to an FTP server and notify the user when it is ready
  – Some requests will be too large for convenient FTP transfer; in this case the requester should be referred to the SCD/DSS staff for assistance
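The triage described above can be sketched as: estimate the output volume from the subset dimensions, derive a rough run time, and pick one of three service modes. The thresholds and throughput below are hypothetical placeholders (the slides give no numbers), and the linear run-time model is deliberately crude.

```python
# Sketch of pre-request triage: estimate output volume and run time, then pick a
# service mode. The thresholds and throughput below are hypothetical placeholders.
from dataclasses import dataclass

INTERACTIVE_LIMIT = 100 * 1024**2    # bytes; hypothetical
FTP_LIMIT = 20 * 1024**3             # bytes; hypothetical
ASSUMED_THROUGHPUT = 20 * 1024**2    # bytes processed per second; hypothetical


@dataclass
class Subset:
    n_lon: int
    n_lat: int
    n_levels: int
    n_vars: int
    n_times: int
    bytes_per_value: int = 4


def output_bytes(s: Subset) -> int:
    return s.n_lon * s.n_lat * s.n_levels * s.n_vars * s.n_times * s.bytes_per_value


def estimated_seconds(s: Subset, throughput: float = ASSUMED_THROUGHPUT) -> float:
    """Crude linear model: run time proportional to the volume produced."""
    return output_bytes(s) / throughput


def service_mode(s: Subset) -> str:
    n = output_bytes(s)
    if n <= INTERACTIVE_LIMIT:
        return "interactive"    # return the result directly
    if n <= FTP_LIMIT:
        return "delayed-ftp"    # stage on an FTP server, notify the user
    return "refer-to-dss"       # too large for convenient FTP transfer


if __name__ == "__main__":
    for label, s in [("one analysis time", Subset(144, 73, 17, 7, 1)),
                     ("one year", Subset(144, 73, 17, 7, 1460)),
                     ("21 years", Subset(144, 73, 17, 7, 1460 * 21))]:
        print(f"{label}: {output_bytes(s) / 1024**2:.0f} MiB, "
              f"~{estimated_seconds(s):.0f} s, {service_mode(s)}")
```

A more faithful run-time estimate would be based on the volume read and decoded rather than the volume returned, since orthogonal requests can decode far more than they deliver.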
Outlook
• Need to enhance the interface to ensure complete metadata access
  – A wealth of critical metadata: model descriptions, input data sources, publications, associated studies and derived datasets, and many related URLs
  – Clear links throughout the CDP so users can find the metadata and get assistance, e.g. the SCD/DSS information server
• Need mechanisms to get user feedback
Outlook
• May need restriction and authentication procedures for some datasets
  – Redistribution of some data is restricted, e.g. ECMWF analyses
  – With simple registration we are able to provide these data to UCAR members in North America
  – All others are excluded
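The access rule on this slide reduces to a simple predicate. The sketch below is illustrative only: the dataset identifier, the `User` fields, and the region string are hypothetical and do not describe an actual CDP schema. Unrestricted datasets are simply not checked in this sketch.

```python
# Hypothetical access rule for redistribution-restricted datasets: registered
# UCAR members in North America are allowed; all others are excluded.
# Dataset identifiers and user fields are illustrative, not an actual CDP schema.
from dataclasses import dataclass

RESTRICTED_DATASETS = {"ecmwf-analyses"}  # hypothetical identifier


@dataclass(frozen=True)
class User:
    registered: bool
    ucar_member: bool
    region: str


def may_access(user: User, dataset: str) -> bool:
    if dataset not in RESTRICTED_DATASETS:
        return True  # no restriction check in this sketch for other datasets
    return user.registered and user.ucar_member and user.region == "North America"
```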
Wrap-up
• We have encouraging results so far and will continue the development
• Measure of success: user satisfaction!
• Public availability at the CDP will be announced on the SCD web site, scd.ucar.edu
• Reanalysis-2 is available now from the MSS or through SCD/DSS; see dss.ucar.edu/datasets/ds091.0
• Details about the model runs are at wesley.wwb.noaa.gov/reanalysis2