Best Practices in Ingestion and Data Access at the NASA/IPAC Infrared Science Archive
http://irsa.ipac.caltech.edu/
G. Bruce Berriman (IPAC, Caltech)
Science Archives in the 21st Century

The NASA/IPAC Infrared Science Archive
• The archive node for NASA’s infrared astronomy data sets
• Housed at the Infrared Processing and Analysis Center (IPAC) in Pasadena
• Multi-wavelength archive that curates data from IRAS, 2MASS, SWAS, MSX, IRTS and the Spitzer Legacy projects
• 200 source catalogs, 10 million images, 30,000 spectra
• Interoperable with the Spitzer archive, ISO, NED and VizieR

Best Practices in Data Standards & Ingestion
• Data standards designed to enable astronomers to use the data
• Source catalogs - attributes of all columns must be fully specified
  • Preferably delivered in column delimited ASCII format
• Images must comply with the FITS standard and include WCS footprint
• Spectra must comply with the FITS standard or be delivered as a table
  • Include slit center coordinate and position angle
• Data should be self-describing & include provenance
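
The catalog requirement above (fully specified columns, delimited ASCII delivery) can be illustrated with a minimal sketch. It is not IRSA's ingestion code: the `ColumnSpec` record, the `validate_table` function, the pipe delimiter and the column names in the usage note are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical per-column specification a data provider would supply."""
    name: str
    dtype: type        # int, float, or str
    unit: str
    description: str


def validate_table(text, specs, delimiter="|"):
    """Return a list of problems found; an empty list means the table passed."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["table is empty"]

    # The first non-blank line is assumed to hold the column names.
    header = [cell.strip() for cell in lines[0].split(delimiter)]
    expected = [spec.name for spec in specs]
    if header != expected:
        return [f"header {header} does not match specification {expected}"]

    problems = []
    for row_no, line in enumerate(lines[1:], start=1):
        cells = [cell.strip() for cell in line.split(delimiter)]
        if len(cells) != len(specs):
            problems.append(
                f"row {row_no}: expected {len(specs)} columns, found {len(cells)}"
            )
            continue
        for spec, cell in zip(specs, cells):
            try:
                spec.dtype(cell)  # conversion fails if the value has the wrong type
            except ValueError:
                problems.append(
                    f"row {row_no}: column '{spec.name}' value {cell!r} "
                    f"is not {spec.dtype.__name__}"
                )
    return problems
```

A provider could run such a check on a small sample before delivery, for example with three `float` columns named `ra`, `dec` and `j_mag`; any row with a missing cell or a non-numeric value is then reported by row number.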

IRSA Best Practices
• Become a resource for data providers
• One: Archive provides data management support for an active mission
  • IRSA provided this function for 2MASS; will provide it for WISE
  • Data products are already incorporated into the archive infrastructure
  • Work day to day with the processing and science teams, so problems are inexpensive to solve
• Two: Archive staff serve as members of science or data processing teams

IRSA Best Practices
• Three: Schedule and budget pressures complicate delivery of well-structured data sets
• Reprocessing is expensive
  • E.g. Spitzer Legacy team re-deliveries cost IRSA $200K over the past two years
• Encourage early delivery of sample products
• IRSA provides on-line and downloadable tools that aid QA (http://irsa.ipac.caltech.edu/irsa-dataQA.html)
• Tools developed in response to common problems. Examples:
  • Document attributes of a source catalog
  • Validate structure and format of a source table
  • Validate syntax, WCS information and astrometry of an image

Image Validation Tool
• FITS keywords comply with FITS syntax (fverify)
• Edit FITS headers
• WCS information is complete
• Provides simple check of positional accuracy - overlay positions of 2MASS sources
• Control image display
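
The "WCS information is complete" check can be sketched as a test for missing header keywords. This is an illustration only: the keyword set treated as required is an assumed archive policy (the FITS standard itself defines defaults for these keywords), and `missing_wcs_keywords` is a hypothetical function, not part of the IRSA tool or of fverify.

```python
# Keywords this sketch assumes an archive would insist on for a 2-D celestial image.
REQUIRED_WCS = ("CTYPE1", "CTYPE2", "CRVAL1", "CRVAL2", "CRPIX1", "CRPIX2")

# The pixel scale may be given either as CDELT values or as a full CD matrix.
SCALE_ALTERNATIVES = (
    ("CDELT1", "CDELT2"),
    ("CD1_1", "CD1_2", "CD2_1", "CD2_2"),
)


def missing_wcs_keywords(header):
    """Return the WCS keywords absent from a header given as a dict-like mapping."""
    missing = [key for key in REQUIRED_WCS if key not in header]
    has_scale = any(
        all(key in header for key in group) for group in SCALE_ALTERNATIVES
    )
    if not has_scale:
        missing.append("CDELT1/CDELT2 or CD1_1..CD2_2")
    return missing
```

The header is passed as a plain mapping so the sketch stays independent of any particular FITS reader; an empty result means no keyword in the assumed set is missing.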

Best Practice in Data Access
• Best practice: use a common software architecture
• All IRSA services are integrated into the Infrared Science Information System
• Component based architecture
  • Modules are stand-alone, portable ANSI C tools that are plugged together
  • Supports extensive software re-use
  • Controls maintenance costs

Anatomy of User Application
• Application is usually a CGI program
• Components plugged together & controlled by an executive library
• Executive starts components as child services & parses return values
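
The executive pattern described above (start each component as a child process, then parse what it returns) can be sketched as follows. The real executive is a library driving ANSI C modules; this Python version is only an illustration, and the `key=value` reply format, `run_component` and `run_pipeline` are assumptions, not the actual IRSA interface.

```python
import subprocess
import sys


def run_component(argv):
    """Start one stand-alone component as a child process and parse its reply.

    The reply is assumed to be whitespace-separated key=value tokens on stdout.
    """
    done = subprocess.run(argv, capture_output=True, text=True)
    reply = {}
    for token in done.stdout.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            reply[key] = value
    if done.returncode != 0:
        reply["stat"] = "ERROR"      # a non-zero exit code always counts as failure
    else:
        reply.setdefault("stat", "OK")
    return reply


def run_pipeline(components):
    """Run components in order, stopping at the first one that reports failure."""
    results = []
    for argv in components:
        reply = run_component(argv)
        results.append(reply)
        if reply["stat"] != "OK":
            break
    return results


if __name__ == "__main__":
    # Stand-in components: small Python one-liners instead of compiled modules.
    steps = [
        [sys.executable, "-c", "print('stat=OK count=3')"],
        [sys.executable, "-c", "print('stat=OK')"],
    ]
    for reply in run_pipeline(steps):
        print(reply)
```

Because each component is an ordinary executable with a textual reply, the same module can be reused by any application that knows how to launch it, which is the re-use benefit the slide points to.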

Archive Software Infrastructure - Benefits to Data Providers
• Efficient deployment of new end-user services
• IRSA has used this infrastructure to build archives for customers
  • Michelson Science Center
  • W. M. Keck Observatory Archive
  • Transit Data Set archives
  • Interferometry archives (KI, PTI)
  • Cosmic Evolution Survey (COSMOS)
  • NASA Stellar and Exoplanet Database (NStED)
• Estimated savings from re-use in MSC, COSMOS and NStED: $3M