The Research Data Archive at NCAR
Doug Schuster and Steve Worley, NCAR
Topic Outline
• Introduction/History
• Core Data Categories/Featured Datasets
• Archive Management/Tools
• New Supporting IT Infrastructure
• Future Possibilities
AMS 2011
Introduction/History
• Data Support Section (Founded 1965)
• Paper -> Punch Cards -> Tapes -> CDs/DVDs -> Hard Drives -> Network-Based Storage and Transfer
• KB of observations -> Terabytes of Model-Generated Data (total archive volume over 600 TB)
• Weeks or months for a user to get data -> Users want data access now (over 7000 registered users)
• Pay for Data -> Free and open access to all datasets that aren’t subject to source restrictions
Introduction/History
• How do we evolve to support the growing needs of data users and generators?
• Stay aware of current research uses
• Strengthen datasets supporting core research data categories
• Update archive management tools
• Rebuild/Augment IT infrastructure
• Educate supporting staff
Core Data Categories
• Content to support atmospheric and geosciences research
• Some research examples:
  • Climate
  • Oceanographic
  • Hydrologic
  • Weather Prediction
  • Renewable Energy (Wind/Solar)
Core Data Categories
• Operational and Reanalysis Model Outputs
• Meteorological and Oceanographic Observations
• Remote Sensing Observations
• Topography/Bathymetry, Vegetation, Land Use
Featured Datasets
• Global Platform Observations (timeline: 1662 to 2011)
Featured Datasets
• Analysis and Forecast Model Data (timeline: 1850 to 2011)
Featured Datasets
• High Resolution Re-Analysis (timeline: 1870 to 2011)
Archive Management
How can we support an archive that continuously grows in volume and complexity with a fixed number of supporting staff?
Archive Management
• Common Data Management Tools
• Functionality Requirements
  • Scalable
  • Integrated: one call does all
  • Automatable
Archive Management
• Common Data Management Tools
• Task Completion Requirements
  • Data Acquisition
    • Get Data (daily or irregularly)
  • Data Archival
    • Archive to disk and tape
  • Metadata Collection
    • Collect Metadata
    • Update Metadata Databases
  • Metadata Publishing
    • Update Web Server Pages
    • Update Internal Metadata Access Points
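The four task groups above (acquire, archive, collect metadata, publish) can be read as one chained pipeline, which is one way to picture the "one call does all" requirement. The sketch below is only an illustration under that reading: the step functions, the `state` dictionary, and `run_pipeline` are hypothetical names, not the RDA tools themselves, and each step just records what a real step would do.

```python
from typing import Callable, Dict, List

# Hypothetical step signature: every step receives and returns a shared state dict.
Step = Callable[[Dict], Dict]


def acquire(state: Dict) -> Dict:
    # Stand-in for data acquisition: accept whatever files arrived.
    state["files"] = list(state["incoming"])
    return state


def archive(state: Dict) -> Dict:
    # Stand-in for archival: note that each file goes to both disk and tape.
    state["archived"] = {name: ["disk", "tape"] for name in state["files"]}
    return state


def collect_metadata(state: Dict) -> Dict:
    # Stand-in for metadata collection: derive a format tag from the extension.
    state["metadata"] = {
        name: {"format": name.rsplit(".", 1)[-1]} for name in state["archived"]
    }
    return state


def publish(state: Dict) -> Dict:
    # Stand-in for publishing: expose a sorted file list.
    state["published"] = sorted(state["metadata"])
    return state


PIPELINE: List[Step] = [acquire, archive, collect_metadata, publish]


def run_pipeline(incoming: List[str], steps: List[Step] = PIPELINE) -> Dict:
    """Run every step in order with a single call."""
    state: Dict = {"incoming": incoming}
    for step in steps:
        state = step(state)
    return state
```

Because the steps share one signature, a new task (for example an integrity check) could be added to the list without changing the caller, which is what makes this shape easy to automate.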
Step 1: Get Data (Integrated Archival Tools)
Data sources and formats:
• Model Generated Data: GRIB, NetCDF
• Obs Data: BUFR, ASCII, etc.
• Remote Sensing Data: Binary
• Topography: Vector Image, Binary, etc.
Transfer to RDA/CISL Servers:
• Automated: dsupdt
• Manual: Tape, FTP, etc.
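An automated acquisition step has to decide which remote files are worth fetching on each run. The sketch below shows one simple rule, assuming both the remote listing and the local archive can be summarized as a mapping from file name to modification time. It is not a description of how dsupdt works; `files_to_fetch` and its arguments are hypothetical.

```python
from typing import Dict, List


def files_to_fetch(remote: Dict[str, float], archived: Dict[str, float]) -> List[str]:
    """Return names that are missing locally or newer on the remote side.

    Both arguments map file name -> modification time (seconds since epoch).
    """
    return sorted(
        name
        for name, mtime in remote.items()
        if name not in archived or mtime > archived[name]
    )
```

Running this daily against a fresh listing handles both regular updates and irregular ones, since files that have not changed are never selected.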
Step 2: Archive Data (Integrated Archival Tools)
• Input on RDA/CISL Servers:
  • Model Generated Data: GRIB, NetCDF
  • Obs Data: BUFR, ASCII, etc.
  • Remote Sensing Data: Binary
  • Topography: Vector Image, Binary, etc.
• dsarch archives each file (example: Model Generated Data Files, GRIB-2) to HPSS and DISK
• RDA Database stores file attribute metadata: Name, Dataset, Location, Format
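The file attribute metadata named on this slide (Name, Dataset, Location, Format) maps naturally onto a single database table. The sketch below uses an in-memory SQLite database purely for illustration. The slides do not say which database engine or schema the RDA uses, so the table name, column names, and helper functions are all assumptions.

```python
import sqlite3
from typing import List, Tuple


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with one row per archived file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_attrs ("
        "name TEXT PRIMARY KEY, "
        "dataset TEXT NOT NULL, "
        "location TEXT NOT NULL, "   # e.g. 'HPSS' or 'DISK'
        "format TEXT NOT NULL)"
    )
    return conn


def record_file(conn: sqlite3.Connection, name: str, dataset: str,
                location: str, fmt: str) -> None:
    """Insert or update the attribute row for one file."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO file_attrs VALUES (?, ?, ?, ?)",
            (name, dataset, location, fmt),
        )


def files_in_dataset(conn: sqlite3.Connection, dataset: str) -> List[Tuple[str, str, str]]:
    """List (name, location, format) for every file in a dataset, by name."""
    rows = conn.execute(
        "SELECT name, location, format FROM file_attrs "
        "WHERE dataset = ? ORDER BY name",
        (dataset,),
    )
    return rows.fetchall()
```

A query like `files_in_dataset` is the kind of lookup that could back a dynamically generated file list, since the listing is computed from the database on request.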
Step 3: Collect File Content Metadata/Check Integrity (Integrated Archival Tools)
• Input on RDA/CISL Servers: Model Generated File, GRIB-2 Format, containing:
  • Temperature (Center, Date, Time, Level, Location)
  • Humidity (Center, Date, Time, Level, Location)
  • Vorticity (Center, Date, Time, Level, Location)
  • Visibility (Center, Date, Time, Level, Location)
  • Precip Rate (Center, Date, Time, Level, Location)
• Gather Metadata -> RDA DB
• RDA DB contents:
  • File attribute metadata: Name, Dataset, Location, Format
  • File content metadata: T(C,D,T,L,L) RH(C,D,T,L,L) Vort(C,D,T,L,L) Vis(C,D,T,L,L) PcpR(C,D,T,L,L)
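The content metadata on this slide is keyed by parameter, with each entry carrying Center, Date, Time, Level, and Location. The sketch below shows one way such per-parameter summaries could be built, assuming a decoder has already turned each message in a file into a simple record. The `Record` type, the `summarize` function, and the sample values in the test are hypothetical and do not reflect the RDA's actual metadata layout.

```python
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    """One decoded message: the five attributes listed on the slide."""
    parameter: str   # e.g. "T", "RH"
    center: str      # originating center
    date: str        # ISO date, YYYY-MM-DD
    time: str        # HH:MM
    level: str       # e.g. "500 hPa"
    location: str    # e.g. a grid or region label


def summarize(records: Iterable[Record]) -> Dict[str, dict]:
    """Build a per-parameter inventory from decoded records."""
    summary: Dict[str, dict] = {}
    for rec in records:
        # ISO-formatted stamps sort correctly as plain strings.
        stamp = f"{rec.date}T{rec.time}"
        entry = summary.setdefault(rec.parameter, {
            "centers": set(), "levels": set(), "locations": set(),
            "first": stamp, "last": stamp, "count": 0,
        })
        entry["centers"].add(rec.center)
        entry["levels"].add(rec.level)
        entry["locations"].add(rec.location)
        entry["first"] = min(entry["first"], stamp)
        entry["last"] = max(entry["last"], stamp)
        entry["count"] += 1
    return summary
```

A summary like this is small relative to the data file, so it can live in the database and answer "which parameters, levels, and dates does this file contain?" without reopening the file.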
Step 4: Publish Metadata and Data (Integrated Archival Tools)
• RDA DB (RDA/CISL Servers):
  • File attribute metadata: Name, Dataset, Location, Format
  • File content metadata: T(C,D,T,L,L) RH(C,D,T,L,L) Vort(C,D,T,L,L) Vis(C,D,T,L,L) PcpR(C,D,T,L,L)
• RDA Web Server:
  • Dynamic File Lists
  • Data Search Tools
  • Detailed Content Metadata
  • Data Subsetting Interfaces
• CISL Computational Node:
  • Detailed Metadata for files on disk
  • Data Subsetting
New Supporting IT/Infrastructure
• Online Disk Upgrades
  • Larger Disk (450 TB)
  • Common Disk Interfaces (webserver and compute nodes)
• Tape Archive Upgrades
  • High Performance Storage System (HPSS)
• Computing Power Upgrades
  • Additional and more powerful servers
New Supporting IT/Infrastructure
NCAR User Community
Pros:
- Access to full RDA.
- Fast computing.
Cons:
- No access to online data.
- Forced to use MSS as a file server; access is too slow.
- No direct access to RDA metadata.
Complete User Community
Pros:
- Fast access to online data.
- Access to all RDA metadata.
- Access to RDA data processing services.
Cons:
- Small fraction of RDA online.
- Slow access to offline data.
- Data processing requests take a long time to finish.
New Supporting IT/Infrastructure
Complete User Community Improvements:
- Faster access to full RDA.
- Expanded data processing services available.
- Faster turnaround on data processing requests.
NCAR User Community Improvements:
- Faster access to full RDA.
- Direct access to all RDA metadata.
Future Possibilities
• Leverage New IT Infrastructure
• Server-side parameter and spatial subsetting across multiple datasets
• Model or In-Situ observations
• Data provided in multiple output formats
• Web-services-based requests (REST, etc.)
• Addition of large and diverse datasets to the RDA
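Server-side parameter and spatial subsetting, driven by a web-style request, can be sketched in a few lines. The example below assumes a regular latitude/longitude grid held in plain Python lists and a made-up query format (`params=...&bbox=south,north,west,east`). Neither the query format nor the function names come from the RDA, and the sketch does not handle boxes that cross the date line.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs

BBox = Tuple[float, float, float, float]  # south, north, west, east


def parse_request(query: str) -> Tuple[List[str], BBox]:
    """Parse a hypothetical query string into parameter names and a bounding box."""
    fields = parse_qs(query)
    params = fields["params"][0].split(",")
    south, north, west, east = (float(v) for v in fields["bbox"][0].split(","))
    return params, (south, north, west, east)


def subset(grid: Dict, params: List[str], bbox: BBox) -> Dict:
    """Keep only the requested parameters and the grid points inside bbox."""
    south, north, west, east = bbox
    lat_idx = [i for i, lat in enumerate(grid["lats"]) if south <= lat <= north]
    lon_idx = [j for j, lon in enumerate(grid["lons"]) if west <= lon <= east]
    return {
        "lats": [grid["lats"][i] for i in lat_idx],
        "lons": [grid["lons"][j] for j in lon_idx],
        "fields": {
            p: [[grid["fields"][p][i][j] for j in lon_idx] for i in lat_idx]
            for p in params
        },
    }
```

Doing the selection on the server means the user downloads only the requested parameters and region rather than whole files, which matters most for the large model-generated datasets.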
http://dss.ucar.edu