ICOADS Archive Practices at NCAR. JCOMM ETMC-III 9-12 February 2010 Steven Worley. Topics. Environment setting Data management tools and principles ICOADS NCAR Release 2.5 contributions Background Collections Future Challenges. Environment Setting.
ICOADS Archive Practices at NCAR JCOMM ETMC-III 9-12 February 2010 Steven Worley
Topics • Environment setting • Data management tools and principles • ICOADS NCAR Release 2.5 contributions • Background Collections • Future Challenges
Environment Setting • ICOADS is part of a larger collection called the Research Data Archive (RDA) • RDA – briefly • 600+ datasets (atmosphere, ocean, geosciences) • 4.3M files, 462 TB (primary data) • 6000+ unique users annually, including ICOADS users • Staff: 7 scientific programmers (M.S. degrees), me, and an administrative assistant
Data management principles • Always archive 2 copies of observational data • 3rd copy at a partner center (disaster recovery) • Free and open data access world-wide • Internet • Past – other media: CD-ROMs, tapes, etc. • Share what we have to build archives • E.g. digitization of Maury data in China in exchange for global land surface data
Data Management Tools • New System: Common RDA tools that homogenize data management. • Efficient • Scalable • Old System: Specialized software to manage each data input. • Inefficient • Difficult to scale
[Diagram labels: NWP Server, University Server, Unidata Server; Specialized Software Packages 1, 2, 3; RDA Data Management Common Tool Set; RDA Metadata Database; GCMD Metadata Server; RDA Data Server; Online Disk; Tape Storage]
Data Management tools – a few details • Common scripting structure to do routine dataset updates (dsupdt) • Very tunable • Frequency, multiple server priority list, validation • Fully integrated with RDADB • User view is automatically updated and therefore always current • Common single archiving function (dsarch) • Location and copy control (MSS/HPSS storage, and online disk) • Fills all DB entries (e.g. file and dataset relationships)
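The "multiple server priority list, validation" idea can be illustrated with a minimal Python sketch. This is not the dsupdt implementation; the function name `fetch_with_fallback`, the server labels, and the `fetch`/`validate` callables are all hypothetical, chosen only to show the pattern of trying sources in priority order and accepting the first payload that passes validation.

```python
from typing import Callable, Optional, Sequence, Tuple


def fetch_with_fallback(
    servers: Sequence[str],
    fetch: Callable[[str], bytes],
    validate: Callable[[bytes], bool],
) -> Optional[Tuple[str, bytes]]:
    """Try each server in priority order; return the first validated payload.

    Hypothetical helper: `fetch` and `validate` stand in for whatever
    transfer and quality checks a real update script would use.
    """
    for server in servers:
        try:
            payload = fetch(server)
        except OSError:
            continue  # server unreachable: fall through to the next one
        if validate(payload):
            return server, payload
    return None  # no server delivered a valid payload


def demo_fetch(server: str) -> bytes:
    """Stand-in transfer function with one dead server and one empty file."""
    available = {"primary": b"", "mirror": b"IMMA records"}
    if server not in available:
        raise OSError(f"cannot reach {server}")
    return available[server]


result = fetch_with_fallback(
    ["offline", "primary", "mirror"],  # priority order (illustrative names)
    demo_fetch,
    lambda payload: len(payload) > 0,  # trivial validation: non-empty file
)
```

Here the first server is unreachable and the second returns an empty file, so the update falls through to the third. A production script would also log each failure and honour the configured update frequency.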
Data management tools • Harvest file level metadata (gatherxml) • Handle various formats (GRIB1, GRIB2, netCDF, BUFR, IMMA, ON29, etc.) • Save as <xml> and populate DB • Benefits • Problem detection • Versioning, replacement, extension • Inventory information • Drive better data service for users
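As a rough illustration of harvesting file-level metadata and saving it as XML, the sketch below builds a small record with Python's standard `xml.etree.ElementTree`. The element names (`file`, `name`, `format`, `size_bytes`, `period`) are assumptions for illustration and do not reflect the actual gatherxml schema; the time range is passed in by the caller, whereas a real harvester would read it from the data itself.

```python
import os
import xml.etree.ElementTree as ET


def file_metadata_xml(path: str, data_format: str, start: str, end: str) -> str:
    """Return an XML string describing one archived file.

    Hypothetical schema: the tag and attribute names are illustrative only.
    """
    root = ET.Element("file")
    ET.SubElement(root, "name").text = os.path.basename(path)
    ET.SubElement(root, "format").text = data_format
    # File size is read from the filesystem, so the file must exist.
    ET.SubElement(root, "size_bytes").text = str(os.path.getsize(path))
    period = ET.SubElement(root, "period")
    period.set("start", start)
    period.set("end", end)
    return ET.tostring(root, encoding="unicode")
```

A record like this, stored per file, is what makes the listed benefits possible: sizes and periods can be compared across versions to detect problems, and inventories can be built without reopening the data files.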
Data management tools • Provide access to data in tape storage archive (dsrqst) • Relatively new, not universally available across RDA - yet • Delayed mode – with DB control (many details) • Why – RDA holds 462 TB • 40 TB online – most popular small scale products • Access to more products for greater community
ICOADS Release 2.5 contributions @ NCAR • Data Preparation – format evaluations, translate native formats to IMMA format • Moored research buoy delayed mode archives • TOA, PIRATA (PMEL, JAMSTEC) • World Ocean Database 2005 • Multiple ocean profile types (NODC) • Receive/archive ICOADS data processing results • NOAA/ESRL does processing - source merging, duplicate elimination, preconditioning deletion and fixes, etc.
ICOADS Release 2.5 contributions @ NCAR • Create and maintain user data access interfaces • File access • IMMA and binary (observations, monthly summary statistics) • Sub-selection (time, space, parameter) • Example coming. • Output is ASCII tabular format • Runs automatically – nearly all requests completed in 10 minutes • Keep user metrics
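The sub-selection by time, space, and parameter can be sketched in a few lines of Python. This is an illustration of the concept, not the NCAR interface: the record layout (dicts with `date`, `lat`, `lon` keys), the parameter names `sst` and `slp`, and the tab-separated output are all assumptions. Longitude boxes that wrap across the dateline are not handled.

```python
from datetime import date
from typing import Iterable, Mapping, Sequence, Tuple


def subset(
    records: Iterable[Mapping],
    start: date,
    end: date,
    lat_range: Tuple[float, float],
    lon_range: Tuple[float, float],
    params: Sequence[str],
) -> str:
    """Filter records by time window and bounding box; emit an ASCII table."""
    lines = ["\t".join(["date", "lat", "lon", *params])]  # header row
    for rec in records:
        if not (start <= rec["date"] <= end):
            continue
        if not (lat_range[0] <= rec["lat"] <= lat_range[1]):
            continue
        if not (lon_range[0] <= rec["lon"] <= lon_range[1]):
            continue
        row = [rec["date"].isoformat(), f"{rec['lat']:.2f}", f"{rec['lon']:.2f}"]
        # Missing parameters are written as empty fields.
        row += [str(rec.get(p, "")) for p in params]
        lines.append("\t".join(row))
    return "\n".join(lines)


# Made-up sample observations, for illustration only.
sample = [
    {"date": date(2008, 1, 15), "lat": 10.0, "lon": 140.0, "sst": 28.4, "slp": 1010.2},
    {"date": date(2008, 2, 3), "lat": -45.5, "lon": 20.0, "sst": 9.1, "slp": 998.7},
    {"date": date(2009, 6, 1), "lat": 12.0, "lon": 150.0, "sst": 29.0, "slp": 1008.0},
]
table = subset(sample, date(2008, 1, 1), date(2008, 12, 31), (0, 30), (120, 160), ["sst"])
```

Only the first sample record falls inside both the 2008 time window and the 0-30N, 120-160E box, so the table holds the header plus one row.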
ICOADS Release 2.5 contributions @ NCAR • Near-term preliminary extensions to R2.5 • Beginning with data in 2008 and forward • Based on NCEP GTS compilation/merge • Runs on day 2 of each month – processes previous month. • Create IMMA observations and binary monthly summary statistics • Harvest file level metadata • Do all archiving of original and processed files • Automatically update user interfaces
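The "runs on day 2, processes previous month" schedule comes down to a small date calculation, sketched below in Python. The function names are hypothetical and the sketch only shows how a monthly job could derive the period it should process; it says nothing about how the actual NCAR job is scheduled.

```python
from datetime import date, timedelta
from typing import Tuple


def should_run(today: date) -> bool:
    """The monthly job is due on day 2 of each month."""
    return today.day == 2


def previous_month_range(run_date: date) -> Tuple[date, date]:
    """Return the first and last day of the month before `run_date`."""
    first_of_this_month = run_date.replace(day=1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous
```

Stepping back one day from the first of the current month handles year boundaries and leap years without any special cases.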
Who uses the sub-setting interfaces? • 2005-2009 • 58 countries
Background Collections • Historical • Most complete set of ALL source data used to create ALL ICOADS Releases • Beginning in mid-1980s • Copies of ALL ICOADS Releases • We do not delete any files
Background Collections • Ongoing / Routine data receipts • Format conversions are done at NCDC
Future Challenges • Eliminate user interface dependency on Java applets – deploy JavaScript instead. • Support “advanced” ICOADS initiative • Bias adjusted / corrected observations • Serve as a central DB / handle data ingest • Build a user interface • Continue as a full U.S. partner.