
Data Processing and Archive



  1. Data Processing and Archive Data Systems Team www.stsci.edu/software/OPUS/ www.dpt.stsci.edu/ www.at.stsci.edu/

  2. General Requirements for OPUS Science Data Processing • OPUS will develop the WFC3 pipeline based on the existing pipeline model for ACS and NICMOS including OTFR. • Level 0 data is packetized and Reed-Solomon corrected by PACOR at GSFC • Receive science telemetry (level 1a data) at STScI as science instrument specific ‘pod files’ • Engineering snapshot included • No on-board compression for WFC3 data • STScI processing on Compaq ALPHA/Tru64 UNIX platform

  3. General OPUS Requirements for Science Data Processing (cont.) • OPUS must account for all scheduled exposures. • Convert telemetry to FITS format • Structure tables or data arrays • Populate header keywords • Keywords provide metadata for the archive catalog • Associate groups of exposures that must be processed further as a single unit • Execute calibration tasks in pipeline mode • Pass level 1b science data (pod files and uncalibrated science datasets) and jitter files to the Hubble Data Archive
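As a sketch of the header-population step above, the following shows the pattern of a primary header whose keywords are inherited by every extension. The keyword values and rootname are hypothetical, not the actual ICD-19 keyword set:

```python
# Hypothetical primary header for an uncalibrated WFC3 dataset.
PRIMARY_HEADER = {
    "TELESCOP": "HST",
    "INSTRUME": "WFC3",
    "ROOTNAME": "iabc01xyq",   # illustrative exposure rootname
    "FILETYPE": "SCI",
}

def extension_header(primary, **extension_keywords):
    """Return an extension header that inherits every primary keyword,
    then adds extension-specific keywords on top."""
    header = dict(primary)             # inherited keywords
    header.update(extension_keywords)  # extension-specific keywords
    return header

# First science extension of the dataset.
sci1 = extension_header(PRIMARY_HEADER, EXTNAME="SCI", EXTVER=1)
```

The same inheritance pattern then yields the metadata that Archive Ingest later reads into the catalog.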

  4. OPUS Requirements for Thermal Vac • OPUS will develop a WFC3 science data processing pipeline capable of supporting Thermal Vac testing • No support schedule available in PMDB format • Necessary support schedule information to be read in from ASCII file • No associations • No calibration • Data will be archived to HDA • Processing on Sun / Solaris UNIX platform

  5. OPUS Requirements for Thermal Vac (cont.) Current Thermal Vac OPUS pipeline delivery schedule • Earliest possible WFC3 Thermal Vac currently scheduled for October 7, 2002. • OPUS Thermal Vac pipeline due about two months prior to Thermal Vac, August 5, 2002. • Beta version of OPUS pipeline due about five months prior to Thermal Vac, May 6, 2002.

  6. OPUS science data processing pipeline for WFC3

  7. OPUS Processes • Data Partitioning • segments the telemetry stream into standard EDT datasets • fill data inserted if telemetry drop-outs exist • constructs a data quality image to ensure that subsequent science processing does not interpret fill data as valid science data • Support Schedule • gathers proposal information from PMDB • test proposals required for development • test version of PMDB must be populated by TRANS • Thermal Vac support schedule to be input from ASCII file
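The fill-data handling in Data Partitioning can be sketched as below. The fill value and data-quality flag are placeholders, not the operational values:

```python
FILL_VALUE = 0      # placeholder fill pattern for dropped packets
DQ_FILL_FLAG = 1    # placeholder data-quality flag marking fill data

def partition(packets, expected_count):
    """Segment a telemetry stream into a fixed-size dataset, inserting
    fill data where packets were dropped and flagging those samples in a
    data-quality image so downstream science processing never treats
    fill as valid science data."""
    data, dq = [], []
    for i in range(expected_count):
        if i < len(packets) and packets[i] is not None:
            data.append(packets[i])
            dq.append(0)                 # good data
        else:
            data.append(FILL_VALUE)      # drop-out: insert fill
            dq.append(DQ_FILL_FLAG)      # ...and flag it
    return data, dq
```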

  8. OPUS Processes (cont.) • Data Validation • decodes the exposure and engineering parameters in the telemetry and compares them to the planned values • internal header specification (from Ball) • PDB (EUDL.DAT, TDFD.DAT) must be fully populated and defined in DM-06 • flags and indicators need to be verified by the Instrument Scientists, but will likely be the same as ACS for UVIS channel and NICMOS for IR channel • World Coordinate System • implements a translation from telescope coordinates through the instrument light-path to an astronomically valid pointing • aperture positions must be defined

  9. OPUS Processes (cont.) • Generic Conversion • Generic Conversion outputs uncalibrated data • data will be output in standard FITS format with image or table extensions • primary header will contain keywords inherited by all extensions and a null data array • Separate headers and data formats for UVIS and IR channel data • UVIS channel keywords based on ACS • IR channel keywords based on NICMOS

  10. OPUS Processes (cont.) • Generic Conversion (cont.) • Data formats • UVIS channel follows ACS data format • each file contains two imsets, one for each chip • imset contains science, error, and data quality array • IR channel follows NICMOS data format • each file contains an imset for each readout • imset contains science, error, data quality, data samples, and effective integration time array • data quality array will be null if no telemetry dropouts • calibration generates full data quality array with all other DQ flags
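A minimal sketch of the two file layouts described above; array names follow the slide, while the actual extension ordering and contents are defined by the data format documents:

```python
# UVIS imsets follow the ACS layout; IR imsets follow NICMOS.
UVIS_ARRAYS = ("SCI", "ERR", "DQ")
IR_ARRAYS = ("SCI", "ERR", "DQ", "SAMP", "TIME")

def build_imsets(channel, count):
    """Return the skeleton of a dataset: UVIS files carry one imset per
    chip (count=2), IR files one imset per readout, each imset holding
    the arrays listed above (data arrays left empty here)."""
    arrays = UVIS_ARRAYS if channel == "UVIS" else IR_ARRAYS
    return [{name: None for name in arrays} for _ in range(count)]

uvis_file = build_imsets("UVIS", 2)   # two chips
ir_file = build_imsets("IR", 16)      # e.g. 16 readouts
```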

  11. OPUS Processes (cont.) • Generic Conversion (cont.) • Required for development • DM-06 to develop algorithms for data formatting • keyword definitions (ICD-19) must be provided by the Instrument Scientists • world coordinate definitions • exposure time calculations • calibration switches and selection criteria • calibration file name keywords

  12. Keyword Specification • The following information must be provided by the STScI Science Instrument team for all WFC3-specific keywords, using a standard form for keyword database input: • keyword name • default value • possible values • units • datatype • short comment for header • long description • header position • DADS table • keyword source

  13. OPUS Processes (cont.) • Data Collector • OPUS will ensure all necessary component exposures are present before processing further • association table contains all information about the product dataset • dataset is self-documenting • only associations required for data processing will be constructed in the OPUS pipeline • association created from a single proposal logsheet line • Error condition actions to be defined by Instrument Scientists • rules for processing incomplete associations • association time-out rules
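The Data Collector's completeness check might look like this sketch; the MEMNAME column name is illustrative, in the style of HST association tables:

```python
def association_complete(association_table, received_exposures):
    """Data Collector check: every member exposure named in the
    association table must be present before the association is
    processed further as a single unit."""
    members = {row["MEMNAME"] for row in association_table}
    return members <= set(received_exposures)

# Hypothetical two-member association.
asn = [{"MEMNAME": "iabc01a1q"}, {"MEMNAME": "iabc01a2q"}]
```

Rules for incomplete associations (time-outs, partial processing) sit on top of this check and, per the slide, remain to be defined by the Instrument Scientists.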

  14. Calibration • OPUS will use STSDAS calibration software • run on ALPHA / Tru64 UNIX platform in operations • expands the size of the dataset • converts integer raw data to real • populates the error and data quality arrays • Need calibration reference files for testing (at least dummies)
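Why calibration expands the dataset can be sketched as follows: integer raw counts become reals, and error and data-quality arrays are populated alongside the science array. The gain value and square-root noise model here are placeholders, not the STSDAS algorithms:

```python
def calibrate_pixel_arrays(raw_counts, gain=1.5):
    """Illustrative calibration step: raw integer counts (2 bytes/pixel)
    become floating-point science values (4 bytes/pixel), and error and
    data-quality arrays are generated, roughly tripling the raw size."""
    sci = [float(v) * gain for v in raw_counts]   # int -> real
    err = [abs(v) ** 0.5 for v in sci]            # placeholder noise model
    dq = [0] * len(sci)                           # full DQ array
    return sci, err, dq
```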

  15. Other Science Data Modes • requirements for data content of each of these other science data modes must be defined by Instrument Scientists • microprocessor memory dump • engineering diagnostic data • No target acquisition observations for WFC3

  16. Engineering data processing • Receive engineering telemetry data from CCS at GSFC • Process Engineering data through FGS Data Pipeline • Generate data products to characterize jitter and pointing control information in support of science observations • WFC3 jitter file association packaging will mimic science data associations • No other WFC3 specific requirements

  17. OPUS / Archive Interface • OPUS will present to the archive: • Original data received from PACOR (binary pod files) • Pod file data packaged by observation in FITS format • Output of Generic Conversion (uncalibrated science dataset) in FITS format • Output of STSDAS calibration (calibrated science dataset) in FITS format • Jitter files from the engineering telemetry in FITS format • Data from other science modes (target acquisition, memory dump, engineering diagnostic data) in FITS format

  18. Archive Ingest – Catalog Population • Archive catalog populated from FITS calibrated and uncalibrated science datasets • Header keyword values used to populate archive catalog database fields • Since keywords based on ACS and NICMOS design, archive catalog tables will also correspond to ACS and NICMOS design • engineering snapshot keywords from spt file used to populate instrument tables for trending analyses • Associations will also follow ACS/NICMOS design • Current database schema will work for WFC3 associations

  19. Archive Ingest – Data Store • Both binary and FITS versions of pod files written to archive media • FITS pod files planned to be input for OTFR • Currently Generic Conversion output (uncalibrated FITS dataset) written to archive media • May cease to write this dataset to archive media if FITS pod files and OTFR prove to be sufficient for archive • gzip file compression performed on all files prior to writing to archive media
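The compression step can be illustrated with a standard-library round trip (in-memory only; the operational pipeline compresses files on disk before writing them to archive media):

```python
import gzip

def compress_for_archive(payload: bytes) -> bytes:
    """gzip-compress a file's contents before it is written to archive
    media, as the slide describes for all archived files."""
    return gzip.compress(payload)

# FITS-like header text is highly repetitive and compresses well.
data = b"SIMPLE  =                    T / conforms to FITS standard\n" * 100
packed = compress_for_archive(data)
```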

  20. StarView 6

  21. OTFR • In OTFR, data retrieved from the archive are reprocessed from the pod file • Provides HST data user with optimal product at time of retrieval • Calibration updates, bug fixes, and new software features and algorithms available to archive users • OTFR pipeline uses the exact same code as current pre-archive science data processing • Reduces software development and maintenance costs • No science instrument specific code developed for OTFR beyond what is necessary for pre-archive data processing • Adds negligible time for retrievals

  22. OTFR

  23. Data Repair • Problems from bad telemetry must be repaired in order for the data to process automatically through the OTFR system. • There are two methods for handling data with bad telemetry values. • Use a binary editor to ‘fix’ the pod file. • OTFR has a built-in mechanism to process from the EDT set, a partially processed version of the data that contains ASCII files that can be edited. • EDT set can be archived for problematic exposures

  24. Data Distribution • Gzip compression will reduce outbound network load • Alternative media if Internet becomes the bottleneck • Tape (current) • CD (future) • DVD (future)

  25. Code Reuse • Data Processing and Archive systems are designed for multi-mission/multi-instrument use. • OPUS has a core system that consists of blackboard and pipeline control. • Instrument-specific applications plug in to the core system. • DADS is being redesigned to be less HST-specific. • Archive catalog contains both general and instrument-specific tables.

  26. OPUS Code Reuse • Core OPUS system (OPUS 12.1) • ~236,000 lines of code • 100% reuse • WFC3 specific processes • Based on FUSE study (Rose et al. 1998, “OPUS: The FUSE Data Pipeline”, www.stsci.edu/software/OPUS/kona2.html) • 5076 lines of code • 71% reuse of existing OPUS modules • Expect > 99% reuse of existing data processing software for WFC3, based on lines of code. • All SI complexity contained in relatively few lines of code. • Efficient use of existing system!
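The >99% overall reuse figure can be checked directly from the numbers on the slide:

```python
core_loc = 236_000    # core OPUS system (OPUS 12.1), 100% reused
wfc3_loc = 5_076      # WFC3-specific processes (FUSE-study estimate)
wfc3_reuse = 0.71     # fraction of WFC3 modules reused from OPUS

total_loc = core_loc + wfc3_loc
reused_loc = core_loc + wfc3_loc * wfc3_reuse
print(f"overall reuse by lines of code: {reused_loc / total_loc:.1%}")
```

The ratio comes out just under 99.4%, consistent with the "> 99% reuse" claim: all SI complexity is confined to a few thousand lines on top of the shared core.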

  27. Archive Systems Code Reuse • Archive changes: • Add WFC3 specific tables to archive catalog • Add WFC3 data to PI paper products • Add WFC3 to HST remote archive site distribution • Define default file types for CAL, UNCAL, and DQ flags on StarView retrieves • Add WFC3 specific screens to StarView • Estimate ~98% reuse

  28. Operational Considerations • Data processing and archive operational system sizing based on SSR and TDRSS capacity • Current downlink limit is ~16 Gbits/day (20 minutes TDRSS contact per orbit)* • January 2001 downlink average was ~4.2 Gbits/day • Post SM3b downlink limit expected to be 29 Gbits/day (18 – 35 minute TDRSS contacts per day)* • Average daily science data downlink expected to be ~16 Gbits/day** • Under consideration: post SM4 downlink limit of 48 Gbits/day (2 – 35 minute TDRSS contacts per orbit)* * WFC3 ISR 2001-xxx “Data Volume Estimates for WFC3 Spacecraft and Ground System Operations,” C.M. Lisse and R. Henry, 18-Jan-2001 version ** ACS ISR-97-01 “HST Reference Mission for Cycle 9 and Ground System Requirements,” M. Stiavelli, R. Kutina, and M. Clampin, July 1997.
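Using only the figures quoted above, the downlink headroom works out as:

```python
current_limit = 16.0    # Gbits/day, current downlink limit
observed_avg = 4.2      # Gbits/day, January 2001 average
post_sm3b_limit = 29.0  # Gbits/day, expected post-SM3b limit
expected_avg = 16.0     # Gbits/day, expected post-SM3b science average

print(f"current utilization:   {observed_avg / current_limit:.0%}")
print(f"post-SM3b utilization: {expected_avg / post_sm3b_limit:.0%}")
```

So the system sizing rests on roughly a quarter of the current limit being used today, rising to a bit over half of the post-SM3b limit.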

  29. Operational Considerations (cont.) • Processing power – no problem • ACS IPT results show processing of 33 Gbits downlink data in 150 exposures in under 2 hours on ODOcluster1 (no dither) • Processor memory • ODOcluster1 memory of 2 GB per ES-40 is sufficient for ACS WFC calibration, which requires about 200 MB per dataset at any one time • Disk space • Needs to be re-evaluated for WFC3 CDR • Consider pre-archive and OTFR pipelines processing work space as well as calibration reference file space requirements

  30. Major Science Data Processing Requirements Summary • Internal header specification (from Ball) • DM-06 to document content and format of science internal header • PDB (EUDL.DAT, TDFD.DAT) defined and fully populated • Keyword definitions (from STScI Science Instrument team) • Flags and indicators for Data Validation (from STScI Science Instrument team) • Aperture definitions (from STScI Science Instrument team)

  31. Test Data Requirements • Test data from detectors on optical bench expected on March 4, 2002 and from integrated instrument on August 5, 2002 • Test data to be provided by IPT/Instrument Scientists and Engineers should include all science modes • Test data must include • PMDB population and PDB definition • list of possible error conditions to simulate • data that simulate error conditions • enough data for throughput test • engineering data to test jitter file production
