Scientific Investigations; Support from Research Data Archives for Computing in Atmospheric Sciences 2001. 29 October, 2001 Steven Worley National Center for Atmospheric Research Scientific Computing Division. Key Steps of Scientific Investigations.
Scientific Investigations; Support from Research Data Archives for Computing in Atmospheric Sciences 2001 29 October, 2001 Steven Worley National Center for Atmospheric Research Scientific Computing Division
Key Steps of Scientific Investigations • Formulate the questions and review the state of understanding • Search and discover data • Access data • Analyze data • Community sharing and archive • Document new understandings
Search and Discover Data • How? Web based Information Server • Salient Features • 2.5K + html pages (metadata) • All datasets are described (500+) • Location of all data files in MSS • Higher level information • Catalogs • Project specific descriptions Always current dataset descriptions
Features • Organization Navigation • Archive Navigation • Pull down menus • Search • Project Links
Dataset Page • Title and Brief description • Systematic Navigation • Metadata highlights • Period of Record • Usage • Variables • Related Sites (NOAA) • Contact Person • Related Datasets
Brief Archive History and Specifications • Started in the mid-1960s (35 years) • Managed by nine people • 211K data files • 17 TB in an MSS • 530 datasets of all sizes
Global Observations • Usages: • Input for global atmospheric reanalysis • Basic long term climate assessment and case studies
Operational and Composite Analyses • Daily SLP is a small but very popular dataset, e.g. NAO evaluations • Two main operational centers provide the best current analyses
Concerns • Restricted distribution • U.S. non-profits and UCAR members only • Need online authentication and authorization for easy access • Key Aspects • Medium size archive: 170 Gigabytes • Complex: multiple products, temporal resolutions, and spatial resolutions
Highlights • Frequent updates to FNL, 1°, daily via FTP • High resolution N. America product, Eta at 40 km • No distribution restrictions or cost
Reanalyses • Notes: • ERA-15 is finished, ERA-40 is running now • NCEP II, primarily experimental run
Outstanding Features • Three different coordinate surfaces • Very long analysis, 2+ Terabytes size • Unrestricted distribution • CD-ROMS are very popular
Countries Receiving Reanalysis CDROMs • Highlights • Over 8900 CDROMs, 1997 to 09/2001 • Recipients: U.S. 46%, Japan 11%, (Canada, UK) 4%, (Germany, India) 3%, (Australia, S. Korea, Spain, Mexico, Norway, Russia, France) 2%
Reanalysis Users for 2001 (4th qtr estimated) • 209 From the MSS [157 Jan.-Sep.] • 47 On CDROM [35] • 48 Custom data orders on FTP or tape [36] • 540 From the online server [406] • 844 Total served
Reanalysis Data Distributed for 2001 (4th qtr estimated) • 9616 GB From the MSS [7230 GB Jan.-Sep.] • 808 GB On CD-ROM [935 @ 650 MB/CDROM] • 1383 GB Custom orders, FTP and tape [1040 GB] • 88 GB From the online server [66 GB] • 11895 GB (11.9 TB) Total
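The totals on the two 2001 reanalysis slides can be cross-checked with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the slides, while the variable names and the `share` helper are assumptions introduced here.

```python
# Figures copied from the two 2001 slides (4th quarter estimated).
users_by_channel = {
    "MSS": 209,
    "CDROM": 47,
    "Custom orders (FTP or tape)": 48,
    "Online server": 540,
}
gigabytes_by_channel = {
    "MSS": 9616,
    "CD-ROM": 808,
    "Custom orders (FTP and tape)": 1383,
    "Online server": 88,
}

# Sum each tally to compare against the totals quoted on the slides.
total_users = sum(users_by_channel.values())
total_gb = sum(gigabytes_by_channel.values())


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Hypothetical derived quantities: the online server's share of each tally.
online_user_share = share(users_by_channel["Online server"], total_users)
online_volume_share = share(gigabytes_by_channel["Online server"], total_gb)

print(total_users, total_gb)
print(online_user_share, online_volume_share)
```

Both sums match the slide totals (844 users, 11895 GB). On these figures the online server reaches most users while carrying under 1% of the volume, and the MSS carries most of the bytes.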
GCIP Model Data Center Collection • High resolution atmospheric models focused on energy and hydrology cycles • Critical data for N. American mesoscale studies • Complete archive is about 1 Terabyte • GCIP: GEWEX Continental-Scale International Project • GEWEX: Global Energy and Water Cycle Experiment
University of Miami Ocean Model Data • 6-yr Mean T at 5 meters • MICOM: Miami Isopycnic Coordinate Ocean Model, 1/12th degree, 70N to 28S, 16-20 layers
Dataset Sizes and Scales • Today • ~800 unique users • ~12 Terabytes data transferred • 2 Terabyte dataset size • Example: NCEP/NCAR Reanalysis • Near Future (excludes TB-PB Level 0 and 1 satellite data and the super-scale experimental models) • Number of users, ~ same • Data transferred, 5x to 10x more? • Dataset size, 2-20 TB • Examples: • Ocean and Atmosphere models • ECMWF Reanalysis (ERA-40)
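As a rough worked example of what the "5x to 10x" estimate implies, the sketch below scales today's figures. It assumes decimal units (1 TB = 1000 GB) and a constant user count, as the slide suggests. The variable names are hypothetical, and the results are back-of-envelope values, not figures from the talk.

```python
# Today's approximate figures, from the slide.
current_transfer_tb = 12          # "~12 Terabytes data transferred"
current_users = 800               # "~800 unique users"
growth_low, growth_high = 5, 10   # "5x to 10x more?"

# Projected annual transfer volume under the slide's growth range.
projected_low_tb = current_transfer_tb * growth_low
projected_high_tb = current_transfer_tb * growth_high

# Average volume per user, assuming 1 TB = 1000 GB and an unchanged user count.
GB_PER_TB = 1000
per_user_today_gb = current_transfer_tb * GB_PER_TB / current_users
per_user_low_gb = projected_low_tb * GB_PER_TB / current_users
per_user_high_gb = projected_high_tb * GB_PER_TB / current_users

print(projected_low_tb, projected_high_tb)
print(per_user_today_gb, per_user_low_gb, per_user_high_gb)
```

Under these assumptions, transfers would grow to roughly 60-120 TB per year, and the average per user would rise from about 15 GB to 75-150 GB.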
Access to Data • Methods • NCAR computers • From the local MSS • Web data server • Custom data packages by request (FTP, tape, CDROM) • Users • World-class programmers • Research scientists • Graduate students • Undergraduate students
Data Access in the future • Do we continue doing what we are doing? “Absolutely” • Why? It works • Over 1000 users annually • Very diverse skills • The archive is a heterogeneous collection • Many formats (ASCII, Binary, GRIB, BUFR, netCDF, HDF) • Many sizes (1 MB to 2 TB) • Capable of serving large and small projects • Maintain a variety of flexible methods
Data Access in the future • Keys to handling future larger collections • Plan to create useful data products • Condensed datasets from high resolution output • Group the most popular variables and products together • Serve many, e.g. CDROMs and WWW • Continue to develop emerging online data systems • User-driven subset selection with graphics and data download options • Server-side elementary analysis • Multi-dataset comparisons • Statistical summaries and basic meteorological calculations • Our development is the “Community Data Portal”
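To make three of the ideas above concrete (condensed datasets, user-driven subsetting, and statistical summaries), here is a minimal pure-Python sketch. It is not the Community Data Portal's implementation. The function names, the block-averaging approach, and the synthetic 4 x 4 field are all assumptions chosen for illustration.

```python
from statistics import mean


def block_mean(grid, factor):
    """Condense a 2-D grid by averaging non-overlapping factor x factor blocks."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    condensed = []
    for r in range(0, n_rows, factor):
        row = []
        for c in range(0, n_cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            row.append(mean(block))
        condensed.append(row)
    return condensed


def subset(grid, lats, lons, lat_range, lon_range):
    """Keep only grid points whose coordinates fall inside the requested box."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    rows = [i for i, lat in enumerate(lats) if lat_min <= lat <= lat_max]
    cols = [j for j, lon in enumerate(lons) if lon_min <= lon <= lon_max]
    return [[grid[i][j] for j in cols] for i in rows]


def summarize(grid):
    """Return basic statistics over every point in a 2-D grid."""
    values = [v for row in grid for v in row]
    return {"min": min(values), "max": max(values), "mean": mean(values)}


# Synthetic 4 x 4 field on a made-up 1-degree grid (illustration only).
lats = [40.0, 41.0, 42.0, 43.0]
lons = [250.0, 251.0, 252.0, 253.0]
field = [[float(4 * i + j) for j in range(4)] for i in range(4)]

coarse = block_mean(field, 2)                                  # condensed product
box = subset(field, lats, lons, (41.0, 42.0), (251.0, 253.0))  # user-selected region
stats = summarize(box)                                         # statistical summary
```

Running these steps on the server would mean only the condensed grid, the selected box, or the summary is downloaded, not the full-resolution field.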
Data Analysis • Tools • NCAR Command Language (NCL) software • Features in brief • I/O for many ‘standard’ data formats • Easy adaptations to read any format • Hundreds of meteorological functions • “Publication quality” graphics • The CDP is capable of analysis • NCL is one of several middleware packages
Community Sharing • Support for the scientist • A place to distribute new data results • Possibly with authentication and authorization control • E.g. model outputs • Spin off benefit • New data resources for the archive • Many users can then use new product
NCEP Operational Analyses blended with QSCAT Satellite data • Wind Stress Curl, 01/24/2000 1800 UTC • [Figure: three panels labeled a, b, c] • NCEP Operational ONLY • NCEP + QSCAT swaths • OI blend of NCEP + QSCAT • Blending by Colorado Research Associates • We archive all three products.
Key Steps of Scientific Investigations • Formulate the questions and review the state of understanding • Search and discover data • Access data • Analyze data • Community sharing and archive • Document new understandings