Data Management Center Critical Design Review February 16, 2006
Agenda • Overview and Context (30 min, 8:00 – 8:30) – Swade • Element Architecture (60 min, 8:30 – 9:30) – Swade • Element Design • External Interfaces (30 min, 9:30 – 10:00) – Swade • Data Product Design (30 min, 10:00 – 10:30) – Swade • Data Processing Software Applications (30 min, 10:30 – 11:00) – Swam and Sontag • Calibration Software (30 min, 11:00 – 11:30) – Grumm and Hodge • Archive System Software (30 min, 11:30 – 12:00) – Miller • Lunch (60 min, 12:00 – 1:00) • Hardware (60 min, 1:00 – 2:00) – Singer • Prototype Results (30 min, 2:00 – 2:30) – Sontag and Swam • Functional and Performance Requirements Compliance (30 min, 2:30 – 3:00) – Swade • Development Plans (30 min, 3:00 – 3:30) – Swade • Integration, Test, and Verification Plans (30 min, 3:30 – 4:00) – Goldstein • Operations Support Plans (30 min, 4:00 – 4:30) – Hamilton • Programmatics (30 min, 4:30 – 5:00) – Taylor
Overview and Context Daryl Swade
DMC Context in Ground Segment YOU ARE HERE
DMC Background • Located at the Space Telescope Science Institute (STScI) in Baltimore, MD. • Kepler software systems development at the DMC will leverage existing systems for HST, FUSE, GALEX, and other STScI-supported missions. • Kepler operations at the DMC will be integrated with HST operations at STScI. • Kepler data will become part of the Multi-Mission Archive at Space Telescope (MAST).
DMC High-Level Functions • Science data processing • Unpack telemetry • Decompress data • Format into FITS • Incorporate ancillary engineering • Incorporate s/c ephemeris • Science data archive • Data ingest • Archive catalog • Data distribution • Archive user interface • Calibration • Remove cosmic rays from black level • Subtract black level from target and collateral pixels • Calculate dark current • Remove smear
Results of Prior Reviews Daryl Swade
DMC Peer Review I Pipeline and Calibration • Topics discussed • Pipeline infrastructure • MOC-DMC interface • Science data processing applications • Ancillary Engineering Data Processing • Calibration • Data formats • Data processing hardware • Data processing verification and test plans • Review team • Rick Thompson (NASA/Ames) • David Mayer (NASA/Ames) • Tim Conrow (JPL) • Peg Stanley (STScI) • Melissa Russ (STScI)
DMC Peer Review II Archive • Topics discussed • SOC – DMC data products • Archive hardware • Ingest • Archive catalog • DADS/NSA • Data distribution • Archive user interface • Demonstration queries • Archive verification and test plans • Review team • Rick Thompson (NASA/Ames) • David Mayer (NASA/Ames) • Pam Marcum (NASA/HQ) • Jim Etchison (STScI) • Melissa Russ (STScI)
Driving Requirements Daryl Swade Reference: DMC Requirements Document, KDMC-10001-001A, February 13, 2006
Requirement Management and Flowdown • DMC uses the DOORS database at Ball for requirement management. • Allows links with Ground Segment requirements • All Ground Segment requirements allocated to DMC have been traced to DMC requirements within DOORS. • See GSRD to DMC Requirements Traceability Matrix • DMC Requirement Documents generated with DOORS. • DMC Requirements Document, KDMC-10001 • DMC Verification and Test Matrix, KDMC-10020 • DMC Traceability Matrix, KDMC-10021 • Links requirements to high level design in DMC Architecture Document, KDMC-10002
DMC Architecture Daryl Swade Reference: DMC Architecture Document, KDMC-10002-002a, February 13, 2006
DMC Implementation Strategy • Kepler DMC will be implemented by adapting existing science software systems at STScI. • Systems will be tailored for Kepler. • Software has been designed, as much as possible, to isolate the instrument/mission specific code. • The OPUS platform will be used to construct pipelines for data processing and ingest systems. • Science telemetry will be converted into standard astronomical FITS format files. • DADS will be used for the Data Archive and Distribution System. • MAST web archive interface will be used for data retrieval.
Science Data Processing (1 of 3) • Science Data Receipt • Receive and store science telemetry • Identify data type as long cadence, short cadence, utility target, or FFI • Unpack telemetry • Requires target and aperture definition • Verify data completeness at the pixel level for photometer data • Uncompress • Level of compression determined from telemetry packet header • Partition data • Data sorted by pixel type: target, collateral, and background • S/c clock to UTC conversion (sketched below) • Based on s/c clock time coefficients supplied by the MOC
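A minimal sketch of the s/c clock to UTC conversion step, assuming the MOC-supplied clock coefficients define a piecewise-linear model (a UTC anchor and drift rate per segment). The ClockCoefficient record layout is hypothetical, not the actual MOC product format.

```python
from dataclasses import dataclass

@dataclass
class ClockCoefficient:
    vtc_start: float   # spacecraft clock reading at the start of the segment (s)
    utc_start: float   # corresponding UTC, seconds past a chosen reference epoch
    rate: float        # UTC seconds elapsed per spacecraft-clock second

def sc_clock_to_utc(vtc: float, coeffs: list) -> float:
    """Convert a spacecraft clock reading to UTC using the latest applicable segment."""
    applicable = [c for c in coeffs if c.vtc_start <= vtc]
    if not applicable:
        raise ValueError("no clock coefficient segment covers this time")
    seg = max(applicable, key=lambda c: c.vtc_start)
    return seg.utc_start + (vtc - seg.vtc_start) * seg.rate
```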
Science Data Processing (2 of 3) • Incorporate ancillary engineering • Ancillary data extracted from engineering telemetry will be incorporated into the science data set • Incorporate s/c ephemeris information • Update header keywords • Identify target as PI or GO • Populate photometer operating parameters for calibration • Determine barycentric time correction • Determined for each CCD channel and stored in the FITS science table extensions • Uses s/c ephemeris data • Determine World Coordinate System parameters • Convert pixel coordinates to RA and Dec for center pixel of each channel (sketched below) • Determined for each CCD channel and stored in the FITS science table extensions
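A minimal sketch of the center-pixel RA/Dec computation, assuming the channel header already carries standard WCS keywords (CRVAL, CRPIX, CD matrix, CTYPE). In the actual products these values live in the FITS science table extensions; the channel dimensions are passed in rather than assumed.

```python
from astropy.io import fits
from astropy.wcs import WCS

def center_pixel_radec(header: fits.Header, nx: int, ny: int) -> tuple:
    """Return (RA, Dec) in degrees for the center pixel of an nx-by-ny channel."""
    w = WCS(header)                                   # build WCS from header keywords
    ra, dec = w.wcs_pix2world(nx / 2.0, ny / 2.0, 1)  # origin=1: FITS 1-based pixels
    return float(ra), float(dec)
```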
Science Data Processing (3 of 3) • Calibrate • Remove detector specific signatures • Transfer data to the SOC • Make original and calibrated data available to SOC within 12 hours after receipt from the MOC
Calibration • Calibrate data to remove pixel level systematic errors • Remove cosmic rays from black level • Subtract black level from target and collateral pixels • Calculate dark current • Remove smear • Collateral data associated with each observational cadence will be used to remove black level (bias), smear, and dark-current • Preserve temporal and spatial resolution • Comply with FITS format
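A minimal sketch of the per-cadence pixel calibration listed above, assuming NumPy arrays for the target pixels and the associated collateral data (trailing black columns and smear rows). The sigma-clipping threshold and median estimators are illustrative choices, not the flight algorithm.

```python
import numpy as np

def calibrate_cadence(target, black_cols, smear_rows, exposure_s, dark_e_per_s):
    """Return target pixels after black, dark, and smear removal (illustrative only)."""
    # crude cosmic-ray rejection in the black columns: replace 5-sigma outliers
    med, sig = np.median(black_cols), np.std(black_cols)
    clean_black = np.where(np.abs(black_cols - med) > 5 * sig, med, black_cols)

    # black (bias) level: per-row median of the cleaned black columns
    black_level = np.median(clean_black, axis=1, keepdims=True)
    corrected = target - black_level

    # dark current scales with exposure time
    corrected = corrected - dark_e_per_s * exposure_s

    # smear: per-column signal from shutterless readout, estimated from the smear rows
    smear = np.median(smear_rows, axis=0, keepdims=True) - np.median(black_level)
    corrected = corrected - smear

    return corrected
```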
Archive (1 of 2) • Data ingest • Archive and catalog all original data • Archive and catalog all calibrated data • Receive and archive relative light curves from the SOC • Preserve Kepler data archive for at least 10 years past EOM • Archive Catalog • Provide on-line science catalog of Kepler metadata for searches, data mining, and data retrieval • Metadata from cadence processing incorporated into archive database • Stage Kepler Input Catalog, Characteristics Table, Kepler Target Catalog, and Results Catalog from SOC for archival use
Archive (2 of 2) • Data Distribution • Make Kepler data available to the astronomical community through the MAST interface at STScI • Ensure proper proprietary access to Kepler data in the archive based on the Science Office defined Kepler Data Release Policy • Make relevant cadence data available to the Guest Observer Program • Archive Interface • Provide a Kepler science data archive that is accessible at three access levels: • PI, Co-Is, SOC, and PSP participants • GOs and DAP • general users • Create and maintain software tools required to search the Kepler archive catalog • This includes user access to the target list through the Kepler data archive • Create and maintain software tools required to access original data, calibrated data, and light curves
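A minimal sketch of a proprietary-access check at the three access levels listed above. The level names and the is_public/is_go_target helpers are hypothetical stand-ins for lookups against the Science Office's Kepler Data Release Policy.

```python
from enum import Enum

class AccessLevel(Enum):
    TEAM = 1       # PI, Co-Is, SOC, and PSP participants
    GO_DAP = 2     # Guest Observers and Data Analysis Program
    GENERAL = 3    # general users

def may_retrieve(level, target_id, is_public, is_go_target) -> bool:
    """Decide whether a user at `level` may retrieve data for `target_id`."""
    if level is AccessLevel.TEAM:
        return True
    if is_public(target_id):
        return True
    # GOs may retrieve their own still-proprietary targets (cf. GO distribution slide)
    return level is AccessLevel.GO_DAP and is_go_target(target_id)
```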
DMC Pipeline Data Volume Estimates • Assumes 170,000 targets • FFI: 389 MBytes each • Other data types volume very small in comparison: utility targets, ancillary engineering, s/c ephemeris, pixel mapping reference files, gap reports, … • See DMC Architecture Document, KDMC-10002, for details
Archive Data Volume • Assumes 170,000 targets for entire mission • FFI: 389 MB each, total archive size < 50 GB for ~100 FFIs over mission • Other data types archive volume very small in comparison: utility targets, ancillary engineering, s/c ephemeris, pixel mapping reference files, gap reports, … • Factor of 2 gzip compression anticipated when writing to storage media (not included in the volumes above) • See DMC Architecture Document, KDMC-10002, for details.
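A quick check of the FFI archive volume quoted above, assuming 1 GByte = 1024 MBytes and the anticipated factor-of-2 gzip compression.

```python
ffi_mb, n_ffi = 389, 100                              # ~100 FFIs at 389 MBytes each
uncompressed_gb = ffi_mb * n_ffi / 1024
print(f"uncompressed: {uncompressed_gb:.0f} GB")      # ~38 GB, under the < 50 GB bound
print(f"with 2x gzip: {uncompressed_gb / 2:.0f} GB")  # ~19 GB on storage media
```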
System Performance Estimate • Assumption: Kepler data processing at the DMC will be similar to HST processing with regard to compute cycles and I/O • The estimates derived from comparison to HST are probably worst case, since HST processing times are dominated by calibration and Kepler calibration is relatively less complex • An average throughput of 55 MBytes/minute can be assumed for original data processing on the system • Estimate 5 GBytes per day of Kepler original and calibrated data • 93 minutes to process 1 day of Kepler cadence data, or about 6 hours to process 4 days (roughly a 16:1 processing ratio; see the worked numbers below) • 15 minutes to process one FFI
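The worked numbers behind the processing-time estimates above, assuming 1 GByte = 1024 MBytes and the 55 MBytes/minute average throughput.

```python
daily_volume_mb = 5 * 1024                 # ~5 GBytes/day of original + calibrated data
throughput_mb_per_min = 55                 # assumed average pipeline throughput
minutes_per_day = daily_volume_mb / throughput_mb_per_min

print(f"{minutes_per_day:.0f} min to process 1 day")           # ~93 minutes
print(f"{4 * minutes_per_day / 60:.1f} h to process 4 days")   # ~6.2 hours
print(f"processing ratio ~{24 * 60 / minutes_per_day:.0f}:1")  # ~15-16:1
```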
Proprietary Rights • Data release timeline (from Kepler Data Release and Scientific Publications Policy, KKPO-16001-001): • First 3 months of data – end of year 1 • Second 3 months of data – end of year 2 • Data to year 1 – end of year 3 • Data to year 1.5 – end of year 4 • Data to year 2 – end of year 5 • Data to year 2.5 – end of year 6 • Etc. • Except that all data are released one year after the end of the Kepler operational lifetime • Stars dropped from the target list are released to the public within two months of the drop decision
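A minimal sketch of the release schedule above, assuming time is measured in months from the start of science data collection. It returns the mission year at whose end data collected through a given month become public, and ignores the end-of-mission override and the dropped-target rule.

```python
import math

def release_year(data_month: float) -> int:
    """Mission year at whose end data collected through `data_month` become public."""
    if data_month <= 3:
        return 1                     # first 3 months -> end of year 1
    if data_month <= 6:
        return 2                     # second 3 months -> end of year 2
    if data_month <= 12:
        return 3                     # data to year 1 -> end of year 3
    # beyond year 1, each additional 6 months of data slips the release by one year
    return 3 + math.ceil((data_month - 12) / 6)
```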
Cumulative Volume of Public Data – No Down Select *Time measured in months from end of commissioning / beginning of science data collection
Cumulative Volume of Public Data – with Down Select Assumes a down select to 70,000 targets at the end of year one.
Accounting and Information Management • Data Processing reports • MOC -> DMC Data Receipt History • Compression Performance Statistics • Target Missing & Unusable Pixels History • Configuration History • Calibration History • Data processing throughput statistics • DMC -> SOC Data Delivery History • Information posted to web site accessible by GS elements and mission management • Statistics generated and posted as data processed and archived • Driving requirements: • Archive usage statistics • Data Archiving Status • Archive ingest rates • Data completeness in the archive • GO Data Delivery Status • Data distribution rates
Significant Design Changes Daryl Swade
Changes from Concept Study Review • Photometric analysis of all targets using Difference Image Analysis will now be performed at the SOC • Independent photometric analysis on a subset of planetary target stars will be performed at STScI • Data analysis function tracked within Science Working Group, not DMC • P-mode analysis no longer supported under mission baseline
Design Changes Since Preliminary Design Review (PDR) • Pixel mapping reference files • Cadence-level target data • Manual reprocessing in lieu of On-the-Fly Reprocessing • Black level cosmic ray rejection at DMC
Pixel Mapping Reference Files (1 of 2) • For cadence data, each pixel must be tagged with its x and y position within the channel • Channel identified by FITS binary table extension number • Each pixel also needs to be tagged with a target id • Tagging each pixel with its associated aperture id is also helpful • Pixel location information adds 10 bytes to each pixel • Target id – 4 bytes • Aperture id – 2 bytes • X location – 2 bytes • Y location – 2 bytes • This information is the same for any given pixel for the duration of a target definition (typically 90 days for long cadence data) • Header keywords point to (reference) the appropriate PMRF • Extracting pixel location information into a separate reference file saves about 7 TB of file space per year for cadence data
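A minimal sketch of the pixel mapping reference file idea: the per-pixel tags are constant over a target-definition epoch, so they are stored once in a PMRF and joined back to each cadence's pixel values on demand. The NumPy dtype mirrors the 10-byte layout above; the join-by-order assumption and field names are illustrative.

```python
import numpy as np

pmrf_dtype = np.dtype([
    ("target_id",   np.uint32),   # 4 bytes
    ("aperture_id", np.uint16),   # 2 bytes
    ("x",           np.uint16),   # 2 bytes
    ("y",           np.uint16),   # 2 bytes
])                                # = 10 bytes per pixel, matching the estimate above

def pixels_for_target(pmrf: np.ndarray, cadence_flux: np.ndarray, target_id: int):
    """Select one target's pixels from a cadence by joining against the PMRF.

    Both arrays are assumed to store the channel's pixels in the same order.
    """
    mask = pmrf["target_id"] == target_id
    return pmrf["x"][mask], pmrf["y"][mask], cadence_flux[mask]
```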
Cadence-level Target Data - Ingest • It is anticipated that archive users will retrieve Kepler cadence data on a target basis • However, original and calibrated pixel data are processed in files containing data for each long and short cadence • Plan to sort pixels by target during Ingest • The ingest pipeline will read the target pixels from the cadence files and append them to the appropriate target file • On average, 64 pixels will be appended to each target file, 32 original pixel values and 32 calibrated flux values • When a target/aperture definition change is implemented, the target files will be permanently closed for writing and archived • If a target/aperture definition covers a 90-day period, the typical target file size will be 2.14 MB • Each cadence-level target file will reference a single pixel mapping reference file • For each cadence, collateral and background pixels will be stored in separate files • Collateral and background pixels are non-proprietary • Archive users will have an option to retrieve the relevant collateral and background pixels on a channel basis for a given target
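A minimal sketch of the ingest-time sort described above: group one cadence's pixel values by target via the hypothetical PMRF structure from the previous sketch, then append each group to that target's accumulating file. The file naming and flat binary layout are illustrative; the real ingest pipeline runs within the OPUS/DADS systems.

```python
import numpy as np
from collections import defaultdict

def append_cadence_to_target_files(pmrf: np.ndarray, orig: np.ndarray,
                                   calib: np.ndarray, out_dir: str) -> None:
    """Append one cadence's original and calibrated pixels to per-target files."""
    by_target = defaultdict(list)
    for i, rec in enumerate(pmrf):
        by_target[int(rec["target_id"])].append(i)

    for target_id, idx in by_target.items():
        # on average ~32 original + ~32 calibrated values per target per cadence
        row = np.concatenate([orig[idx], calib[idx]])
        with open(f"{out_dir}/target_{target_id:09d}.dat", "ab") as f:
            f.write(row.astype(np.float32).tobytes())
```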
Cadence-level Target Data - Distribution • Data retrieval options with cadence-level target data include: • All non-proprietary files for that target over the time range specified in request • Default over all time that target observed • All associated pixel mapping reference files • Collateral pixels • All collateral pixels for the target’s channel • Collateral pixels from a projection of target aperture, completely analogous to short cadence collateral pixels • Background pixels for the target’s channel within a given requested radius or for the entire channel • Other non-proprietary targets within that channel within a given radius • Once all targets within a channel are non-proprietary, an option will be provided to distribute all data in the entire channel • Providing original and calibrated pixel data sorted by target in addition to sorted by cadence essentially triples the cadence-level data in the Kepler Data Archive from 1.8 TB/year to 5.4 TB/year (uncompressed, unmirrored)
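A minimal sketch of one of the retrieval options above: selecting background pixels on the target's channel within a requested radius (in pixels) of the target's aperture center. Array names are illustrative.

```python
import numpy as np

def background_within_radius(bkg_x, bkg_y, bkg_flux, center_x, center_y, radius_pix):
    """Return background pixel values within `radius_pix` of the target center."""
    d2 = (np.asarray(bkg_x) - center_x) ** 2 + (np.asarray(bkg_y) - center_y) ** 2
    mask = d2 <= radius_pix ** 2
    return np.asarray(bkg_flux)[mask]
```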
Distribution of GO Cadence Data • Cadence-level target data files simplify the requirement to distribute data to GOs on a target basis • GO targets must be extracted from the cadence data and distributed to a GO without including data from the primary mission or other GO targets • Once data for an individual target are released to the public, the cadence-level data for that target must be made available separately from the proprietary targets • General archive users can request data for just a few individual targets • GOs would most likely want pixel list data from other nearby targets and background pixels for calibration purposes • Such a request can now be satisfied by allowing a GO to select additional non-proprietary data based on the above distribution scheme