NOAO Mosaic Pipeline Technical Presentation NOAO Mosaic Pipeline CoDR

Outline of Technical Presentation
• Introduction
• Contexts
• Capabilities
• Architecture
• Implementation

Presentation Goals
Convince you that:
• we understand the
  • problem
  • requirements
  • resources
  • components
• and that the project
  • is feasible
  • has a solution for the primary application
  • has a flexible design for expansion and wider application

Guiding Principles
• Modest project
• Part of Data Products Program
• (NOAO) Mosaic Imaging Data
• Dedicated pipeline

Principles: Modest Project
• Reuse as much software as possible
• Keep the software simple

Principles: DPP
• MDHS: Mosaic Data Handling System
• IRAF: Image Reduction and Analysis Facility
• NSA: NOAO Science Archive
• DTS: Data Transport System
• OPUS: AURA sister institution (STScI)
• GONG: AURA sister institution (NSO)

Principles: (NOAO) Mosaic Data
• Use experience of Mosaic Survey Teams
• Need to deal with specific peculiarities
  • Crosstalk, pupil reflections
• Allow for high performance per exposure (for real-time telescope context) by capitalizing on the inherent data parallel nature of mosaic imaging data

Principles: Dedicated Pipeline
• Network of similar computers
• No competition with general users

What does this project encompass?
Algorithms, interfaces, and software for:
• Pipeline infrastructure
• CCD mosaic data reduction
• Data quality assessment
• Image differencing
• Catalog production
• Database entry and querying
• Source merging/classification
• Archive ingest and retrieval
• Alerts
• Monitoring
• Data transport
• High performance computing
• Parallel computing
• More …

Contexts
In what contexts will the pipeline run?
Can we design a pipeline to satisfy multiple contexts?

Contexts
• NOAO
  • Telescope/operational context
  • Archive/NVO context
• Community
  • NOAO Mosaic surveys and observers
  • Other mosaic instruments

Priorities
• NOAO Archive
• NOAO Mosaic observers
  • telescope
  • downtown
  • home institution
• NOAO Mosaic observers at home
• Community

NOAO Contexts
• Downtown center fed from telescope
• Mountain at telescope
• Archive on-the-fly reprocessing

Pipeline Locations
[Diagram: Tucson Archive with Pipeline; La Serena Archive with Pipeline; Kitt Peak Pipeline; Cerro Tololo Pipeline]

Context: Downtown Pipeline
[Diagram: Observer; DCA; DSC; Data Spool and Transport; DTS; Pipeline; Archive; User @ telescope, downtown, home]

Context: Mountain Pipeline
[Diagram: DCA; Data Spool and Transport; DTS; Pipeline; Archive; User @ telescope]

Context: Archive Pipeline
[Diagram: Pipeline; Archive; DTS; User @ home]

Context: User Pipeline
[Diagram: User @ home; Pipeline @ home]

Proposed Context
• Downtown pipeline for NOAO archive
• Observer may subscribe to data products
  • At telescope, downtown, home
  • Images, catalogs, alerts, …
• Observer may connect to DQ monitors
• Pipeline software available at telescope with minimal support
• DQ task/monitors may run at telescope

Data Requirements
The pipeline design depends on the information available about the input data.
• Basically we require data in the current NOAO Mosaic readout format, which includes:
  • identification of exposure type (object, etc.)
  • description of regions (data, overscan)
  • an approximate world coordinate system

Data Requirements
There may be additional information that the pipeline will use if present.
• Associations: type, ID, total and index
  SEQUENCE = 'zero2002-12-16T043244.20.3'
  SEQUENCE = 'dither2002-12-16T043244.5.2'
If not present, heuristics will be used, based on the requirement that data enter in time order.
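
The SEQUENCE values above appear to pack the association type, ID, total, and index into one string. The following is a minimal Python sketch of how such a value might be unpacked. It assumes the layout `<type><ID>.<total>.<index>` with an alphabetic type, which is inferred only from the two examples and the order "type, ID, total and index"; the names `Association` and `parse_sequence` are hypothetical and not part of the pipeline.

```python
import re
from typing import NamedTuple, Optional


class Association(NamedTuple):
    kind: str    # association type, e.g. "zero" or "dither"
    ident: str   # association ID (a timestamp-like string in the examples)
    total: int   # number of exposures in the sequence (assumed field order)
    index: int   # position of this exposure in the sequence (assumed)


# Assumed layout: <type><ID>.<total>.<index>, where the type is alphabetic.
_SEQUENCE_RE = re.compile(r"^([A-Za-z]+)(.+)\.(\d+)\.(\d+)$")


def parse_sequence(value: str) -> Optional[Association]:
    """Split a SEQUENCE string into its parts, or return None if it does not fit."""
    match = _SEQUENCE_RE.match(value.strip().strip("'"))
    if match is None:
        return None
    kind, ident, total, index = match.groups()
    return Association(kind, ident, int(total), int(index))
```

A value that does not match returns `None`, which is the point where the time-order heuristics mentioned above would have to take over.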

Capabilities
• Capabilities
• Major Features and Goals
• Data Products
  • Basic
  • Advanced
• Data Quality Assessment
• Instrumental Calibration

Capabilities
• Calibrate mosaic exposures
• Update instrumental calibrations
• Identify potential bad data (data quality assessment)
• Monitor trends and maintain database
• Stack dither sets
• Catalog and classify objects and artifacts
• Get and subtract reference image and detect sources
• Identify interesting sources
• Automatically provide data products to subscribers
• Keep up with observing given sufficient CPU resources

Major Features and Goals
• Data products for NOAO archive and NVO node
• Data products for observers (by subscription)
• Pipeline for NOAO and mosaic community
• Basic CCD mosaic calibrations
• Advanced time-domain data products
• Real-time data quality assessment and monitoring
• High performance, data parallel system
• LSST testbed
• Fairly generic pipeline infrastructure (NEWFIRM, …)
• Automated operation
• Thorough processing history and data documentation

Data Products: Basic
• Instrument calibrated mosaic exposures
• Rough photometric zero point
• Astrometric calibrations
• Data quality evaluations
• Updated calibrations
• Bad pixel, saturated, bleed trail masks
• Object catalogs
• Object masks
• Observing logs
• Processing information
  • logs
  • graphs

Data Products: Advanced
• Dither stacks
• Exposure masks
• Field Catalogs
• Difference image detections
  • Relative to dither stack
  • Relative to archive or catalog reference
• Light curves
• Variable object detections
• Unusual object alerts
• Moving object trajectories

Data Quality Assessment
Data quality measures are monitored against preset and user limits as well as adaptive time series limits. Some quantities include mean, sigma, and spatial variations.
• Instrument
  • Telemetry
  • Crosstalk
  • Overscan
  • Bias, flat
  • Noise
  • Focus / Distortions
• Sky
  • Seeing (PSF)
  • Sky brightness
  • Approx. zero point
  • Twilight
  • Moon up / distance
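
The two kinds of limits named above can be illustrated with a short Python sketch. It is only one possible reading: the adaptive limit is taken here to be the mean of recent measurements plus or minus a few standard deviations, and the function name `check_quantity` and the parameters `nsigma` and `min_history` are assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def check_quantity(
    value: float,
    preset_limits: Tuple[float, float],
    history: Sequence[float],
    nsigma: float = 3.0,
    min_history: int = 5,
) -> List[str]:
    """Return the reasons a measurement looks suspect (empty list if it passes)."""
    problems = []

    # Preset (or user-supplied) limits: a fixed acceptable range.
    low, high = preset_limits
    if not (low <= value <= high):
        problems.append("outside preset limits")

    # Adaptive limits (assumed form): compare against the recent time series.
    if len(history) >= min_history:
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(value - mu) > nsigma * sigma:
            problems.append("outside adaptive limits")

    return problems
```

A seeing or sky-brightness value can therefore pass the fixed range yet still be flagged because it departs sharply from the night's trend.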

Instrumental Calibrations
• Crosstalk [1]
• CCD defects [2,4,5]
• Saturated pixels [2,4,5]
• Bleed trails [2,4,5]
• Cosmic rays [2,4,5]
• WCS update [3]
• Overscan [2]
• Bias [2]
• Flat field [2]
• Pupil pattern [3]
• Fringing [3]
• Approx. zero point [3]
Key:
[1] Requires image data from full mosaic (non-parallel)
[2] Each image element independent of others (parallel)
[3] Global calculation on measurements from images (parallel and non-parallel)
[4] Interpolate in data
[5] Flag in mask

Instrumental Calibrations
Two-pass calibration for telescope context:
• Nighttime pass for immediate and nearly complete calibrated exposures
• Daytime pass for calibration update from the full night’s data set

Nighttime Pass
• Perform standard CCD calibrations:
  • Use afternoon master bias
  • Use most recent flat field
• Apply pupil and fringe correction
  • Use most recent pupil and fringe templates
• Apply global coordinate calibration
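
"Use most recent" implies a lookup in the calibration library. A minimal Python sketch of such a lookup follows. The record layout (dictionaries with `time`, `filter`, and `path` keys), the matching on filter, and the function name `most_recent` are all assumptions for illustration; the slides do not describe the library's actual interface.

```python
from datetime import datetime
from typing import Dict, List, Optional


def most_recent(
    calibrations: List[Dict],
    exposure_time: datetime,
    filter_name: str,
) -> Optional[Dict]:
    """Pick the latest calibration taken at or before the exposure, for one filter."""
    candidates = [
        cal
        for cal in calibrations
        if cal["filter"] == filter_name and cal["time"] <= exposure_time
    ]
    # max() with default=None covers the case of an empty library.
    return max(candidates, key=lambda cal: cal["time"], default=None)
```

A `None` result corresponds to the case where no library calibration exists yet and one has to be built from the night's own data.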

Daytime Pass
• Determine if night’s data is suitable for deriving updates to library calibrations
• Derive new pupil, fringe, and sky flat calibrations
• Evaluate changes and significance of new calibrations
• Update library calibrations for next night
• Update night’s exposures with new calibrations
• Combine afternoon biases into new master bias
• Combine afternoon dome flats if no library flat

Other Contexts
• Archive data will either already have the best calibration from the library, or the calibration will be derived by requesting the raw data for the night
• At home or in the community, raw data will be queued as at the telescope
• Documentation and support (data ingest applications) will be provided

Data Products Subscription
• Capability of the DPP system
• Not necessarily specific to the pipeline but requires interfacing with DTS
• Allows external software to request notification of new data products
• Allows flexibility and broader access
• Has implications for the pipeline context

Architecture
• What is a pipeline?
• Mosaic Pipeline Architecture Concept
• Pipeline Components
  • Controls and Monitors
  • Modules
  • Calibrations and Database (Rafael Hiriart)
  • Archive (Robyn Allsman)

What is a Pipeline?
System to transform input data to output data
• Automated
• Composed of processing steps (modules)
• Steps connected by rules (triggers)
• Provides monitoring and alerts
• Error tolerant (continue with next input data)
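
The "error tolerant" property can be shown in a few lines of Python. This is a deliberately simplified sketch: modules are plain functions run in a fixed order rather than steps connected by triggers, and the names `run_pipeline` and `Module` are hypothetical.

```python
import logging
from typing import Callable, Iterable, List, Sequence, Tuple

# A module is modelled as a function from one data item to the next (assumption).
Module = Callable[[str], str]


def run_pipeline(
    inputs: Iterable[str],
    modules: Sequence[Module],
) -> Tuple[List[str], List[str]]:
    """Run all modules on each input; on failure, log it and continue with the next."""
    done, failed = [], []
    for item in inputs:
        data = item
        try:
            for module in modules:
                data = module(data)
        except Exception as exc:
            # Error tolerance: one bad input must not stop the pipeline.
            logging.warning("input %r failed: %s", item, exc)
            failed.append(item)
            continue
        done.append(data)
    return done, failed
```

The list of failed inputs is what a monitor or alert component would report to the operator.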

Mosaic Pipeline Architecture Concept
• Multiple CPUs but no dependency on N
• Multiple types of sub-pipelines by function
  • One for operations over all mosaic elements
  • One for operations on individual elements
  • One for cataloging
  • One for image differencing
• All types on all CPUs: no master!
• Sub-pipelines triggered by files

Mosaic Pipeline Architecture Concept
• All CPUs with identical pipeline software, possibly on common NFS disk
• Assign work by minimum data backlog
• Transfer data to local CPU disk: not NFS!
• Optimize by modules writing to next trigger directory
• Controls connected to operator console
• Monitors viewed via network by multiple parties

Network of Sub-pipelines and CPUs
[Diagram: a pipeline spanning six CPUs, each running both an MEF and a SIF sub-pipeline]
• MEF: pipeline for operations over all mosaic extensions; e.g. crosstalk, global WCS correction
• SIF: pipeline for single CCD images; e.g. ccdproc, masking

Data Flow Concept
The last module in one pipeline writes its output directly to the data directories of the host for the next pipeline, with the host selected by having the minimum number of waiting data files.

Data Flow Algorithm
• Search list of potential hosts:
  • Check if host is up
  • Check number of trigger files
• Assign output filename to data directory of host with least number of data files
  • Network filenames are used (e.g. host!directory/filename)
• Module runs and writes output files
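
The host selection step above can be sketched in Python as follows. The sketch makes several assumptions that the slides do not state: host status is supplied as a dictionary instead of being queried over the network, a host's count is incremented after each assignment, ties go to the first host listed, and the names `assign_outputs`, `waiting`, and the `data` directory are hypothetical.

```python
from typing import Dict, List, Optional


def assign_outputs(
    root: str,
    n_extensions: int,
    waiting: Dict[str, Optional[int]],
    directory: str = "data",
) -> List[str]:
    """Build host!directory/filename output names, least-loaded host first.

    `waiting` maps each host to its number of waiting trigger files,
    or to None if the host is down.
    """
    # Keep only hosts that are up.
    counts = {host: n for host, n in waiting.items() if n is not None}
    if not counts:
        raise RuntimeError("no pipeline host is up")

    names = []
    for ext in range(1, n_extensions + 1):
        # Host with the fewest waiting files; ties go to the first one listed.
        host = min(counts, key=counts.get)
        names.append(f"{host}!{directory}/{root}.{ext}")
        counts[host] += 1  # assumed: account for the file just assigned
    return names
```

With made-up counts `{"hostA": 2, "hostB": 1, "hostC": None, "hostD": 0}` and two extensions of `Obj123`, the sketch returns `hostD!data/Obj123.1` and `hostB!data/Obj123.2`. Which extension lands on which host depends on the tie-breaking and ordering choices assumed here.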

Data Flow Networking
• Use a daemon automatically spawned the first time data is transferred to a host
• Daemon provides portability across platforms; e.g. Unix and VMS

Data Flow Networking: Example
• Crosstalk input is Obj123.fits with 2 extensions
• Output names are generated from Host.dat:
  • Host1 has two waiting files, Host2 has one, Host3 is down, Host4 has none
  • Host2!Obj123.1, Host4!Obj123.2
• Crosstalk module runs and writes output files directly to the hosts
• There are no extra network copy or splitting steps

Data Flow Networking: Example
[Diagram: Host0 runs Crosstalk on Obj123 and writes Host2!Obj123.2 and Host3!Obj123.1]
• Host1: Obj456.1, Obj321.2
• Host2: Obj567.2, Obj123.2
• Host3: Obj123.1
• Host4: DOWN

Pipeline Components
[Diagram: Data Source (DTS, user); raw data; Pipeline containing Modules; data products; Data Sink (DTS, user); Controls & Monitors; Calibrations & Databases]

Pipeline Modules
[Diagram: Pipeline containing three Modules; labels CLSH, CSH, API]

Data Parallel Modules
Some algorithms may need to be (re-)implemented specifically for a data parallel pipeline. One type is where measurements are made across the mosaic for a global calibration. Rather than requiring all pieces to be in one pipeline, arrange for measurements made in parallel to be collected for the global calibration, and then apply the global calibration to the pieces in parallel.

Data Parallel Modules: WCS Example
• Catalog objects in each CCD in parallel
• Bring catalogs (not images) together
  • Only need x/y coordinates of brighter stars
• Match sources to reference catalog (e.g. USNO)
• Compute global correction (shift, scale, etc.)
• Return correction coefficients to parallel pipelines to be applied to each CCD
• Cataloging and correction stages can be separated and run asynchronously with other stages
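
The global step of the WCS example can be sketched in Python. This is an illustration under stated assumptions: the correction is restricted to a single scale plus a shift (no rotation or distortion), the per-CCD catalogs are already matched to reference positions and expressed in one common coordinate frame, and the function names `fit_shift_scale` and `apply_correction` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fit_shift_scale(
    measured: Sequence[Point],
    reference: Sequence[Point],
) -> Tuple[float, float, float]:
    """Least-squares fit of reference ~ scale * measured + (dx, dy)."""
    n = len(measured)
    if n < 2 or n != len(reference):
        raise ValueError("need at least two matched pairs")
    mx = sum(x for x, _ in measured) / n
    my = sum(y for _, y in measured) / n
    rx = sum(u for u, _ in reference) / n
    ry = sum(v for _, v in reference) / n
    num = sum(
        (x - mx) * (u - rx) + (y - my) * (v - ry)
        for (x, y), (u, v) in zip(measured, reference)
    )
    den = sum((x - mx) ** 2 + (y - my) ** 2 for x, y in measured)
    if den == 0:
        raise ValueError("measured points are all identical")
    scale = num / den
    return scale, rx - scale * mx, ry - scale * my


def apply_correction(points: Sequence[Point], scale: float, dx: float, dy: float) -> List[Point]:
    """Apply the global coefficients to one CCD's coordinates (runs per CCD)."""
    return [(scale * x + dx, scale * y + dy) for x, y in points]
```

Only the short coordinate lists from each CCD are concatenated for `fit_shift_scale`; the three returned coefficients are then sent back so each CCD can call `apply_correction` independently.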

Data Parallel Modules: Fringe/Pupil Example
Determine best global scaling of pupil and fringe templates to each exposure and then subtract scaled template
• Compute statistics over each CCD in parallel
• Combine statistics to get global scale factor
• Subtract template with global scale from each CCD in parallel
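
The three steps above can be sketched in Python. The slides do not say which statistics are used, so this sketch assumes a simple least-squares scale: each CCD contributes the partial sums of data times template and template squared, and the global scale is the ratio of the totals. It also assumes background-subtracted pixels held in flat lists; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def ccd_statistics(data: Sequence[float], template: Sequence[float]) -> Tuple[float, float]:
    """Per-CCD partial sums; each CCD can compute these independently."""
    sum_dt = sum(d * t for d, t in zip(data, template))
    sum_tt = sum(t * t for t in template)
    return sum_dt, sum_tt


def global_scale(stats: Sequence[Tuple[float, float]]) -> float:
    """Combine the per-CCD sums into one least-squares scale factor."""
    total_dt = sum(s[0] for s in stats)
    total_tt = sum(s[1] for s in stats)
    if total_tt == 0:
        raise ValueError("template has no signal")
    return total_dt / total_tt


def subtract_template(data: Sequence[float], template: Sequence[float], scale: float) -> List[float]:
    """Remove the scaled template from one CCD (again run per CCD)."""
    return [d - scale * t for d, t in zip(data, template)]
```

Only two numbers per CCD travel to the combining step, so no image data needs to be gathered in one place.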

Pipeline Triggers
• Files: trigger on appearance of files
• Flags: trigger on particular set of flags
• Timers: trigger at times or intervals
• File contents: trigger on keywords, etc.
• Messages: trigger on messages
• Resources: trigger on resources
There may be more, but one type can mimic others

Pipeline Triggers
• File triggers useful for initiating a pipeline
• Flag triggers useful within a pipeline to communicate success of previous steps
• Flag triggers also useful for waiting for completion of parallel steps
• Timer triggers useful in telescope pipeline for performing different daytime/nighttime steps
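
Two of the trigger types above, file triggers and flag triggers, can be sketched in Python. This is a polling-based illustration only: the function names `file_trigger` and `flags_complete`, the `.fits` suffix filter, and the `"done"` flag value are assumptions, not the pipeline's actual mechanism.

```python
import os
from typing import Dict, Iterable, List, Set


def file_trigger(directory: str, seen: Set[str], suffix: str = ".fits") -> List[str]:
    """File trigger: return files that appeared in the directory since the last call."""
    current = {name for name in os.listdir(directory) if name.endswith(suffix)}
    new = sorted(current - seen)
    seen.update(new)  # remember them so each file fires only once
    return new


def flags_complete(flags: Dict[str, str], required: Iterable[str]) -> bool:
    """Flag trigger: fire only when every required parallel step reports success."""
    return all(flags.get(step) == "done" for step in required)
```

A pipeline would call `file_trigger` periodically on its trigger directory to start work, and `flags_complete` before a step that has to wait for all parallel pieces of an exposure.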