Gridded Data Sub-setting Services through the RDA at NCAR Doug Schuster, Steve Worley, Bob Dattore, Dave Stepaniak
Gridded Data Sub-setting Services Through the RDA at NCAR • Research Data Archive (RDA) Overview • Problem Background • Required Infrastructure • Current Services • Future Directions
RDA Overview • Total archive volume over 1.3 PB • 8000+ unique users annually • Operational and Reanalysis model outputs • Meteorological and Oceanographic Observations • Remote Sensing Observations • Topography/Bathymetry, Vegetation, Land Use
Problem Background Data Volume
Problem Background • Large computational/storage resources needed • Store data • Extract desired data from large grids/files • Convert data to desirable format(s) • Scientific data centers have these resources • Individual researchers generally don’t
Problem Background • Goals • Make data more accessible and easier to use for individual researchers • Reasonable access volumes • Desired data formats • User-defined parameters/grids • Researchers stay focused on research
Required Infrastructure • Command Line Interface • Web Interface • Powerful Computing: NCAR HPC/DAV • Large Disk Storage (500 TB) • Generalized Software Tools: control system (RDAMS), sub-setting, format conversion • Rich and Detailed Metadata Databases (RDADB)
Required Infrastructure • Rich Metadata Databases (key ingredient) • Metadata DB: Support Efficient Backend Processing, Provide Scalability, Drive Interfaces • File attribute metadata: Name, Dataset, Location, Format • File content metadata: T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L)
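To illustrate how file-level metadata can support efficient backend processing, here is a minimal sketch of an in-memory index that finds which files hold a requested parameter, so that only those files need to be read. It does not reflect the actual RDADB schema: `FileRecord`, `files_with_parameter`, the dataset names, and the paths are all hypothetical, and the content metadata is reduced to a set of parameter names.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    # File attribute metadata (the four fields named on the slide)
    name: str
    dataset: str
    location: str
    fmt: str
    # File content metadata, simplified here to the parameter names in the file
    parameters: set = field(default_factory=set)


def files_with_parameter(records, dataset, parameter):
    """Return the names of files in `dataset` that contain `parameter`."""
    return [
        r.name
        for r in records
        if r.dataset == dataset and parameter in r.parameters
    ]


# Hypothetical records, for illustration only
records = [
    FileRecord("a.grb", "ds_example", "/data/a.grb", "GRIB", {"T", "RH"}),
    FileRecord("b.grb", "ds_example", "/data/b.grb", "GRIB", {"Vort"}),
    FileRecord("c.nc", "ds_other", "/data/c.nc", "NetCDF", {"T"}),
]

print(files_with_parameter(records, "ds_example", "T"))
```

A production system would presumably keep this in a relational database and query it, but the lookup idea is the same: consult metadata first, touch data files second.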
Current Services • Sub-setting available on 13 datasets • ERA-I, CFSR, Operational Model, EaSM • Also available on select observation sets • Sub-setting options • Parameter selection • Spatial region selection (limited availability) • Available output formats • Native GRIB formats • NetCDF format
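The two sub-setting options above, parameter selection and spatial region selection, can be sketched on a toy latitude/longitude grid. This is an illustration under simplifying assumptions, not the RDA implementation: the `subset` function, the dictionary-of-lists grid layout, and the sample values are hypothetical, and real GRIB or NetCDF data would be read with a dedicated library.

```python
def subset(fields, lats, lons, params, south, north, west, east):
    """Keep only `params`, cropped to the given latitude/longitude box.

    `fields` maps a parameter name to a 2-D list indexed [lat][lon].
    """
    rows = [i for i, lat in enumerate(lats) if south <= lat <= north]
    cols = [j for j, lon in enumerate(lons) if west <= lon <= east]
    return {
        p: [[fields[p][i][j] for j in cols] for i in rows]
        for p in params
    }


# Toy 3x3 grid with two parameters (values are made up)
lats = [30, 40, 50]
lons = [250, 260, 270]
fields = {
    "T": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    "RH": [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
}

# Select temperature only, for latitudes 35-55 and longitudes 255-275
result = subset(fields, lats, lons, ["T"], 35, 55, 255, 275)
print(result)
```

Dropping unneeded parameters and grid points before delivery is what keeps the download volume reasonable for an individual researcher.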
Current Services • Sub-set requests • Processed in delayed mode • User notified by email when request is ready • Download data via server-provided wget scripts
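The delayed-mode workflow above (queue the request, process it later, notify the user, hand back a wget script) can be sketched as follows. Everything here is a hypothetical stand-in: `RequestQueue`, `make_wget_script`, the `example.org` URL, and the in-memory notification list that takes the place of real email delivery. The actual RDA control system (RDAMS) is not described in enough detail on the slides to reproduce.

```python
from collections import deque


def make_wget_script(urls):
    """Build a small shell script that downloads each URL with wget."""
    lines = ["#!/bin/sh"] + [f"wget {u}" for u in urls]
    return "\n".join(lines) + "\n"


class RequestQueue:
    def __init__(self):
        self.pending = deque()     # requests waiting to be processed
        self.notifications = []    # stand-in for emails sent to users

    def submit(self, email, files):
        """Accept a request and return immediately (delayed mode)."""
        self.pending.append((email, files))

    def process_next(self, base_url):
        """Process the oldest request and record a notification for its user."""
        email, files = self.pending.popleft()
        script = make_wget_script([f"{base_url}/{f}" for f in files])
        self.notifications.append((email, script))
        return script


queue = RequestQueue()
queue.submit("user@example.org", ["a.grb", "b.grb"])
script = queue.process_next("https://example.org/request1")
print(script)
```

Separating submission from processing lets the server schedule heavy extraction work on its own resources while the user waits only for the notification.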
Future Directions • Spatial Interpolation • Faster Request Processing (NWSC) • Include More RDA Datasets • Improved Access Portals • Additional Output Formats • Web Service Access
Summary • Data Analysis Research Challenges • Large and Growing Data Volumes • Numerous Formats • RDA – Supply “User Friendly” Data • Parameter and Spatial Sub-Setting • Format Conversion • Improved and Additional Services • http://dss.ucar.edu • schuster@ucar.edu