Data Processing and Archive Data Systems Team www.stsci.edu/software/OPUS/ www.dpt.stsci.edu/ www.at.stsci.edu/
General Requirements for OPUS Science Data Processing • OPUS will develop the WFC3 pipeline based on the existing pipeline model for ACS and NICMOS, including OTFR. • Level 0 data are packetized and Reed-Solomon corrected by PACOR at GSFC • Receive science telemetry (level 1a data) at STScI as science-instrument-specific ‘pod files’ • Engineering snapshot included • No on-board compression for WFC3 data • STScI processing on Compaq Alpha/Tru64 UNIX platform
General OPUS Requirements for Science Data Processing (cont.) • OPUS must account for all scheduled exposures. • Convert telemetry to FITS format • Structure tables or data arrays • Populate header keywords • Keywords provide metadata for the archive catalog • Associate groups of exposures that must be processed further as a single unit • Execute calibration tasks in pipeline mode • Pass level 1b science data (pod files and uncalibrated science datasets) and jitter files to the Hubble Data Archive
OPUS Requirements for Thermal Vac • OPUS will develop a WFC3 science data processing pipeline capable of supporting Thermal Vac testing • No support schedule available in PMDB format • Necessary support schedule information to be read in from ASCII file • No associations • No calibration • Data will be archived to HDA • Processing on Sun / Solaris UNIX platform
OPUS Requirements for Thermal Vac (cont.) Current Thermal Vac OPUS pipeline delivery schedule • Earliest possible WFC3 Thermal Vac currently scheduled for October 7, 2002. • OPUS Thermal Vac pipeline due about two months prior to Thermal Vac, August 5, 2002. • Beta version of OPUS pipeline due about five months prior to Thermal Vac, May 6, 2002.
OPUS Processes • Data Partitioning • segments the telemetry stream into standard EDT datasets • fill data inserted where telemetry drop-outs exist • constructs a data quality image so that subsequent science processing does not interpret fill data as valid science data (see the sketch below) • Support Schedule • gathers proposal information from PMDB • test proposals required for development • test version of PMDB must be populated by TRANS • Thermal Vac support schedule to be input from ASCII file
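A minimal sketch (in Python, not the operational OPUS code) of the data-partitioning idea above: pixels replaced by fill data are flagged in a data quality image so later processing can ignore them. The fill flag value and array dimensions are invented for illustration.

```python
import numpy as np

FILL_FLAG = 1    # hypothetical DQ bit meaning "fill data, not valid science"

def build_dq_image(science, dropout_mask):
    """Flag every pixel that was replaced by fill data after a telemetry drop-out."""
    dq = np.zeros(science.shape, dtype=np.int16)
    dq[dropout_mask] = FILL_FLAG
    return dq

science = np.zeros((1024, 1024), dtype=np.uint16)      # illustrative dimensions
dropouts = np.zeros(science.shape, dtype=bool)
dropouts[100:120, :] = True                             # pretend these rows were lost in telemetry
dq = build_dq_image(science, dropouts)
```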
OPUS Processes (cont.) • Data Validation • decodes the exposure and engineering parameters in the telemetry and compares them to the planned values (see the sketch below) • internal header specification (from Ball) • PDB (EUDL.DAT, TDFD.DAT) must be fully populated and defined in DM-06 • flags and indicators need to be verified by the Instrument Scientists, but will likely be the same as ACS for the UVIS channel and NICMOS for the IR channel • World Coordinate System • implements a translation from telescope coordinates through the instrument light path to an astronomically valid pointing • aperture positions must be defined
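A minimal sketch of the comparison Data Validation performs, with invented parameter names and tolerances; the real process works from the decoded telemetry and the support schedule.

```python
def validate(decoded, planned, tolerances=None):
    """Return (parameter, planned, actual) tuples for every mismatch."""
    tolerances = tolerances or {}
    problems = []
    for key, expected in planned.items():
        actual = decoded.get(key)
        if isinstance(expected, (int, float)) and actual is not None:
            ok = abs(actual - expected) <= tolerances.get(key, 0)
        else:
            ok = actual == expected          # non-numeric parameters must match exactly
        if not ok:
            problems.append((key, expected, actual))
    return problems

# Hypothetical exposure parameters: EXPTIME checked within a tolerance, FILTER exactly.
problems = validate(
    decoded={"EXPTIME": 349.9, "FILTER": "F606W"},
    planned={"EXPTIME": 350.0, "FILTER": "F606W"},
    tolerances={"EXPTIME": 0.5},
)
```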
OPUS Processes (cont.) • Generic Conversion • Generic Conversion outputs uncalibrated data • data will be output in standard FITS format with image or table extensions • primary header will contain keywords inherited by all extensions and a null data array • Separate headers and data formats for UVIS and IR channel data • UVIS channel keywords based on ACS • IR channel keywords based on NICMOS
OPUS Processes (cont.) • Generic Conversion (cont.) • Data formats • UVIS channel follows the ACS data format • each file contains two imsets, one for each chip • imset contains science, error, and data quality arrays • IR channel follows the NICMOS data format • each file contains an imset for each readout • imset contains science, error, data quality, data samples, and effective integration time arrays • data quality array will be null if there are no telemetry dropouts • calibration generates the full data quality array with all other DQ flags
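A sketch, using astropy purely for illustration, of the UVIS uncalibrated file layout described above: a primary HDU with a null data array whose keywords are inherited by the extensions, plus one imset (SCI, ERR, DQ) per chip. The array dimensions and EXTVER numbering are assumptions, not the WFC3 specification.

```python
from astropy.io import fits
import numpy as np

hdus = [fits.PrimaryHDU()]                    # null data array; keywords inherited by extensions
for chip in (1, 2):                           # UVIS: one imset per CCD chip
    sci = np.zeros((2048, 4096), dtype=np.uint16)       # illustrative dimensions
    hdus.append(fits.ImageHDU(sci, name="SCI", ver=chip))
    hdus.append(fits.ImageHDU(name="ERR", ver=chip))    # filled later by calibration
    hdus.append(fits.ImageHDU(name="DQ",  ver=chip))    # null unless telemetry drop-outs
fits.HDUList(hdus).writeto("uvis_raw_example.fits", overwrite=True)
```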
OPUS Processes (cont.) • Generic Conversion (cont.) • Required for development • DM-06 to develop algorithms for data formatting • keyword definitions (ICD-19) must be provided by the Instrument Scientists • world coordinate definitions • exposure time calculations • calibration switches and selection criteria • calibration file name keywords
Keyword specification • The following information must be provided by the STScI Science Instrument team for all WFC3-specific keywords, using a standard form for keyword database input: • keyword name • default value • possible values • units • datatype • short comment for header • long description • header position • DADS table • keyword source
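For illustration only, one completed keyword-specification record might look like the following; the field values are hypothetical and the real submission uses the standard keyword database input form.

```python
keyword_spec = {
    "keyword name":             "FILTER",
    "default value":            "N/A",
    "possible values":          "any WFC3 filter element name",
    "units":                    "none",
    "datatype":                 "character",
    "short comment for header": "filter element selected",
    "long description":         "Name of the filter element in the light path for this exposure.",
    "header position":          "primary header",
    "DADS table":               "wfc3_science (hypothetical)",
    "keyword source":           "support schedule (hypothetical)",
}
```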
OPUS Processes (cont.) • Data Collector • OPUS will ensure all necessary component exposures are present before processing further (see the sketch below) • association table contains all information about the product dataset • dataset is self-documenting • only associations required for data processing will be constructed in the OPUS pipeline • association created from a single proposal logsheet line • Error condition actions to be defined by Instrument Scientists • rules for processing incomplete associations • association time-out rules
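A sketch, with invented rootnames, of the completeness check the Data Collector applies: an association is processed only when every member exposure listed in its association table has arrived, or when a (still to-be-defined) time-out rule says to proceed anyway.

```python
def association_ready(expected_members, received_members, timed_out=False):
    """True when all component exposures are present, or when the time-out rule fires."""
    missing = set(expected_members) - set(received_members)
    if not missing:
        return True
    return timed_out            # incomplete-association handling to be defined

ready = association_ready(
    expected_members=["i0ex01a1q", "i0ex01a2q", "i0ex01a3q"],   # hypothetical rootnames
    received_members=["i0ex01a1q", "i0ex01a3q"],
)
```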
Calibration • OPUS will use STSDAS calibration software • run on Alpha/Tru64 UNIX platform in operations • expands the size of the dataset • converts integer raw data to real • populates the error and data quality arrays • Need calibration reference files for testing (at least dummies)
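A purely numerical sketch of why calibration expands the dataset: 16-bit integer counts become 32-bit floats, and error and data quality arrays are added. The gain and read-noise values are placeholders, not WFC3 numbers.

```python
import numpy as np

def calibrate(raw_dn, gain=1.5, readnoise_e=3.0):
    sci = raw_dn.astype(np.float32) * gain                        # integer DN -> real electrons
    err = np.sqrt(np.maximum(sci, 0) + readnoise_e ** 2).astype(np.float32)
    dq = np.zeros(raw_dn.shape, dtype=np.int16)                   # full flags set by later steps
    return sci, err, dq

raw = np.zeros((2048, 4096), dtype=np.uint16)                     # illustrative dimensions
sci, err, dq = calibrate(raw)
# raw is 2 bytes/pixel; sci + err alone are 8 bytes/pixel, so the dataset grows several-fold.
```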
Other Science Data Modes • requirements for data content of each of these other science data modes must be defined by Instrument Scientists • microprocessor memory dump • engineering diagnostic data • No target acquisition observations for WFC3
Engineering data processing • Receive engineering telemetry data from CCS at GSFC • Process Engineering data through FGS Data Pipeline • Generate data products to characterize jitter and pointing control information in support of science observations • WFC3 jitter file association packaging will mimic science data associations • No other WFC3 specific requirements
OPUS / Archive Interface • OPUS will present to the archive: • Original data received from PACOR (binary pod files) • Pod file data packaged by observation in FITS format • Output of Generic Conversion (uncalibrated science dataset) in FITS format • Output of STSDAS calibration (calibrated science dataset) in FITS format • Jitter files from the engineering telemetry in FITS format • Data from other science modes (target acquisition, memory dump, engineering diagnostic data) in FITS format
Archive Ingest – Catalog Population • Archive catalog populated from FITS calibrated and uncalibrated science datasets • Header keyword values used to populate archive catalog database fields (see the sketch below) • Since the keywords are based on the ACS and NICMOS designs, the archive catalog tables will also correspond to the ACS and NICMOS designs • engineering snapshot keywords from the spt file used to populate instrument tables for trending analyses • Associations will also follow the ACS/NICMOS design • Current database schema will work for WFC3 associations
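An illustrative sketch of catalog population from header keywords; the keyword-to-column mapping and column names here are invented, not the actual DADS catalog schema.

```python
from astropy.io import fits

KEYWORD_TO_COLUMN = {            # hypothetical mapping
    "ROOTNAME": "dataset_name",
    "FILTER":   "filter_name",
    "EXPTIME":  "exposure_time",
}

def catalog_row(fits_path):
    """Build one catalog row (column -> value) from the primary header of a dataset."""
    header = fits.getheader(fits_path, 0)
    return {column: header.get(keyword) for keyword, column in KEYWORD_TO_COLUMN.items()}
```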
Archive Ingest – Data Store • Both binary and FITS versions of pod files written to archive media • FITS pod files planned to be input for OTFR • Currently Generic Conversion output (uncalibrated FITS dataset) written to archive media • May cease to write this dataset to archive media if FITS pod files and OTFR prove to be sufficient for archive • gzip file compression performed on all files prior to writing to archive media
OTFR • In OTFR, data retrieved from the archive are reprocessed from the pod file • Provides HST data user with optimal product at time of retrieval • Calibration updates, bug fixes, and new software features and algorithms available to archive users • OTFR pipeline uses the exact same code as current pre-archive science data processing • Reduces software development and maintenance costs • No science instrument specific code developed for OTFR beyond what is necessary for pre-archive data processing • Adds negligible time for retrievals
Data Repair • Problems from bad telemetry must be repaired in order for the data to process automatically through the OTFR system. • There are two methods for handling data with bad telemetry values. • Use a binary editor to ‘fix’ the pod file. • OTFR has a built-in mechanism to process from the EDT set, a partially processed version of the data that contains ASCII files that can be edited. • EDT set can be archived for problematic exposures
Data Distribution • Gzip compression will reduce outbound network load • Alternative media if Internet becomes the bottleneck • Tape (current) • CD (future) • DVD (future)
Code Reuse • Data Processing and Archive systems are designed for multi-mission/multi-instrument use. • OPUS has a core system that consists of blackboard and pipeline control. • Instrument-specific applications plug in to the core system. • DADS is being redesigned to be less HST-specific. • Archive catalog contains both general and instrument-specific tables.
OPUS Code Reuse • Core OPUS system (OPUS 12.1) • ~236,000 lines of code • 100% reuse • WFC3 specific processes • Based on FUSE study (Rose et al. 1998, “OPUS: The FUSE Data Pipeline”, www.stsci.edu/software/OPUS/kona2.html) • 5076 lines of code • 71% reuse of existing OPUS modules • Expect > 99% reuse of existing data processing software for WFC3, based on lines of code. • All SI complexity contained in relatively few lines of code. • Efficient use of existing system!
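One way to reproduce the "> 99%" figure from the numbers above (treating the 71% module reuse as reused lines within the WFC3-specific code):

```python
core_lines = 236_000            # OPUS 12.1 core, 100% reuse
wfc3_lines = 5_076              # WFC3-specific processes (FUSE-based estimate)
wfc3_reused = 0.71 * wfc3_lines # 71% reuse of existing OPUS modules

reuse_fraction = (core_lines + wfc3_reused) / (core_lines + wfc3_lines)
print(f"{reuse_fraction:.1%}")  # ~99.4%, consistent with the "> 99%" expectation
```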
Archive Systems Code Reuse • Archive changes: • Add WFC3 specific tables to archive catalog • Add WFC3 data to PI paper products • Add WFC3 to HST remote archive site distribution • Define default file types for CAL, UNCAL, and DQ flags on StarView retrieves • Add WFC3 specific screens to StarView • Estimate ~98% reuse
Operational Considerations • Data processing and archive operational system sizing based on SSR and TDRSS capacity • Current downlink limit is ~16 Gbits/day (20 minutes TDRSS contact per orbit)* • January 2001 downlink average was ~4.2 Gbits/day • Post SM3b downlink limit expected to be 29 Gbits/day (18 – 35 minute TDRSS contacts per day)* • Average daily science data downlink expected to be ~16 Gbits/day** • Under consideration: post SM4 downlink limit of 48 Gbits/day (2 – 35 minute TDRSS contacts per orbit)* * WFC3 ISR 2001-xxx “Data Volume Estimates for WFC3 Spacecraft and Ground System Operations,” C.M. Lisse and R. Henry, 18-Jan-2001 version ** ACS ISR-97-01 “HST Reference Mission for Cycle 9 and Ground System Requirements,” M. Stiavelli, R. Kutina, and M. Clampin, July 1997.
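A quick arithmetic check using only the figures quoted above, comparing current and expected usage to the downlink limits:

```python
pre_sm3b_limit     = 16.0   # Gbits/day, current downlink limit
jan_2001_average   = 4.2    # Gbits/day, January 2001 average
post_sm3b_limit    = 29.0   # Gbits/day, expected post-SM3b limit
post_sm3b_expected = 16.0   # Gbits/day, expected average science downlink

print(f"Jan 2001 usage:     {jan_2001_average / pre_sm3b_limit:.0%} of the current limit")
print(f"Post-SM3b expected: {post_sm3b_expected / post_sm3b_limit:.0%} of the post-SM3b limit")
# -> roughly 26% and 55%: the expected WFC3-era average fits within post-SM3b capacity.
```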
Operational Considerations (cont.) • Processing power – no problem • ACS IPT results show processing of 33 Gbits downlink data in 150 exposures in under 2 hours on ODOcluster1 (no dither) • Processor memory • ODOcluster1 memory of 2 GB per ES-40 is sufficient for ACS WFC calibration, which requires about 200 MB per dataset at any one time • Disk space • Needs to be re-evaluated for WFC3 CDR • Consider pre-archive and OTFR pipelines processing work space as well as calibration reference file space requirements
Major Science Data Processing Requirements Summary • Internal header specification (from Ball) • DM-06 to document content and format of science internal header • PDB (EUDL.DAT, TDFD.DAT) defined and fully populated • Keyword definitions (from STScI Science Instrument team) • Flags and indicators for Data Validation (from STScI Science Instrument team) • Aperture definitions (from STScI Science Instrument team)
Test Data Requirements • Test data from detectors on optical bench expected on March 4, 2002 and from integrated instrument on August 5, 2002 • Test data to be provided by IPT/Instrument Scientists and Engineers should include all science modes • Test data must include • PMDB population and PDB definition • list of possible error conditions to simulate • data that simulate error conditions • enough data for throughput test • engineering data to test jitter file production