MWA Data Capture and Archiving

MWA Data Capture and Archiving. Dave Pallot, MWA Conference, Melbourne, Australia, 7th December 2011. Talking points: Data Capture and Archive System; systems overview; correlator data capture; RTS data capture; on-site data operations; the Next Generation Archiving System (NGAS).

Presentation Transcript


  1. MWA Data Capture and Archiving Dave Pallot MWA Conference Melbourne Australia 7th December 2011

  2. Talking Points • Data Capture and Archive System • Systems overview • Correlator data capture • RTS data capture • On-site data operations • The Next Generation Archiving System (NGAS). • Archive Details

  3. Data Capture and Archive System Capture the data products from the MWA (MRO) and transport them to the petabyte storage facility at the Pawsey Centre (Perth) for later retrieval, processing and analysis.

  4. Data Capture and Archive System • Correlator: • 24x GPU-X • ~32 MB/s (0.5 sec, 40 kHz, 32-bit) • On-site Storage: • ~48 TB of transportable storage • Pawsey: • 15 PB reserved for MWA • 96 GPU nodes for data processing

  5. System Flow • Monitor & Control tells correlator to capture data. • Correlator dumps visibility data to configurable storage location. • Monitor & Control tells correlator to stop data capture. • Visibility files are produced, collected and transported to Pawsey for archiving (NGAS). • Observations, with their visibilities, are accessed and images are produced.
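
The flow on this slide can be read as a small state machine. The sketch below is only an illustration of that ordering: the state names, command strings and functions are hypothetical and are not taken from the MWA Monitor & Control software.

```python
from enum import Enum, auto


class CaptureState(Enum):
    """Hypothetical states for one observation, in slide order."""
    IDLE = auto()
    CAPTURING = auto()
    FILES_READY = auto()
    ARCHIVED = auto()


# Allowed transitions, following the order of the steps on the slide.
# The command names are assumptions made for this sketch.
TRANSITIONS = {
    (CaptureState.IDLE, "start_capture"): CaptureState.CAPTURING,
    (CaptureState.CAPTURING, "stop_capture"): CaptureState.FILES_READY,
    (CaptureState.FILES_READY, "transport_and_archive"): CaptureState.ARCHIVED,
}


def step(state, command):
    """Return the next state, or raise ValueError if the command is out of order."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"{command!r} is not valid in state {state.name}") from None


def run_observation():
    """Walk one observation through capture, stop, and archive."""
    state = CaptureState.IDLE
    for command in ("start_capture", "stop_capture", "transport_and_archive"):
        state = step(state, command)
    return state
```

Calling `run_observation()` walks the three commands in order; issuing `stop_capture` while idle raises `ValueError`, which mirrors the point that capture is driven explicitly by Monitor & Control.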

  6. Correlator Data Capture • Data capture modes: • Save all. • Save all on trigger. Save All Mode: • Dump all visibility data to a single data file per machine for the fixed duration of a single observation. • Size of each visibility file is dependent on the output block size and the duration of the observation.
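
As a rough illustration of how file size scales with block size and duration, the sketch below assumes one output block is written per integration period. That assumption, the function name and the example numbers are hypothetical; the slides do not specify the block layout.

```python
import math


def estimate_file_size_bytes(block_size_bytes, duration_s, integration_s):
    """Estimate one machine's visibility file size.

    Assumes (hypothetically) one output block per integration period.
    """
    blocks = math.ceil(duration_s / integration_s)
    return blocks * block_size_bytes
```

For example, with an illustrative 16 MB block, a 120 s observation and a 0.5 s integration, `estimate_file_size_bytes(16_000_000, 120, 0.5)` gives 240 blocks, or 3.84 GB.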

  7. Correlator Data Capture cont. Save All on Trigger Mode • Stream data to a circular disk buffer and only produce a visibility data file (flush the buffers) when triggered, i.e. when something interesting happens. • Telescope continuously on. • Trigger is activated by an expert who is external to the capture process. • Architecture allows automatic detection and triggering via various pipelines. • If there is no trigger then no visibility data is flushed to file. • Once triggered, the observation has ended.
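
The trigger mode described on this slide can be sketched with an in-memory ring buffer standing in for the circular disk buffer. The class and method names are hypothetical, and a real implementation would write to preallocated disk space rather than a Python `deque`.

```python
from collections import deque


class TriggeredCapture:
    """In-memory stand-in for the circular disk buffer (hypothetical class)."""

    def __init__(self, capacity_blocks):
        # A deque with maxlen silently drops the oldest block once it is full.
        self._buffer = deque(maxlen=capacity_blocks)
        self.ended = False

    def write(self, block):
        """Stream one block into the buffer; nothing is flushed to file here."""
        if self.ended:
            raise RuntimeError("observation has ended")
        self._buffer.append(block)

    def trigger(self):
        """Flush the buffered blocks, oldest first, and end the observation."""
        flushed = list(self._buffer)
        self._buffer.clear()
        self.ended = True
        return flushed
```

With a capacity of three blocks, writing blocks 0 to 4 and then calling `trigger()` returns only the three most recent blocks. If `trigger()` is never called, nothing is ever returned, which matches "no trigger, no visibility file".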

  8. Correlator Data Capture cont. • Circular buffer size of possibly hundreds of GB on disk. • Example: 100 GB / 32 MB/s ≅ 52 mins • Circular buffer size can be configured. • Must be defragmented into a contiguous block to get maximum I/O performance.
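
The 52-minute figure follows from dividing buffer size by data rate. A minimal check, assuming decimal units (1 GB = 1000 MB); the function name is hypothetical:

```python
def buffer_duration_minutes(buffer_gb, rate_mb_per_s):
    """Minutes of data a circular buffer holds at a constant write rate.

    Assumes decimal units: 1 GB = 1000 MB.
    """
    seconds = buffer_gb * 1000 / rate_mb_per_s
    return seconds / 60
```

`buffer_duration_minutes(100, 32)` works out to 3125 s, or about 52.1 minutes, in line with the slide.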

  9. Correlator Data Capture cont. • In both modes: • One visibility file per machine per observation is produced. • Total of 24 files per observation • Same data format and filenames. • No special treatment of data files once they are produced. • Special treatment of data buffers but that is hidden. • Files will have unique identifiers in the file name to link them to the meta-data in our databases.
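
One way to picture "one file per machine, linked to meta-data by a unique identifier" is shown below. The naming pattern and the `.vis` extension are purely hypothetical; the slides do not give the actual file name format.

```python
NUM_MACHINES = 24  # from the slide: 24 files per observation, one per machine


def visibility_filenames(observation_id, num_machines=NUM_MACHINES):
    """Build one file name per machine for a single observation.

    The pattern "<observation_id>_machine<NN>.vis" is an illustrative
    assumption, not the real MWA naming scheme.
    """
    return [
        f"{observation_id}_machine{machine:02d}.vis"
        for machine in range(1, num_machines + 1)
    ]
```

Because every name carries the observation identifier, a database lookup on that identifier would be enough to find all 24 files for an observation, regardless of which capture mode produced them.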

  10. RTS Data Capture • Accumulate and generate images on the GPU? • Avoid accessing visibilities from disk storage • Performance reasons (concurrent disk access) • Images dumped to a separate location from the visibilities. • Visibilities can be purged if the RTS images are bad. • Will not be transported as they can be reconstructed. • Requires more discussion • Mitch (RTS Group), M&C group, Curtin.

  11. On-site Data Operations • Facility to process images from the archiving node on-site • Tools to access visibilities from local storage. • Images/processing will be done outside of the MWA data pipeline. • Ability to “flag” bad data • Can be purged before transportation. • Who makes that decision?

  12. Data Transport • Data transport from MRO to Perth? • Transportable disk array • 48 TB of storage • Interim measure • 10 Gb NBN • Fiber link from MWA to Pawsey • Termination location and timeframe are uncertain • Transportation and archive coordination • NGAS

  13. Next Generation Archiving System (NGAS) • Distributed storage software solution. • Operates transparently across physically and logically separated locations • Reliable communications (HTTP interface) • Supports archive replication and mirroring. • Access to data on-site and through the archive. • Scalable as it can co-ordinate multiple petabytes of storage. • Lots of tools. • Proven architecture for archiving large data sets. • National Radio Astronomy Observatory (NRAO) • Atacama Large Millimeter/submillimeter Array (ALMA)

  14. Archive • Standard features you would expect from an archive. • Performance/usage trends, retrieval, store, etc. • Features specific to MWA • Sky maps, temperature plots, etc. • Will evolve over time • Comprehensive meta-data search tool • RA/DEC, source, gains, frequency, date/time, temperatures, etc. • Pawsey supercomputer node. • Generate images from a composite set of visibilities. • Fully configurable pipeline plug-in architecture to archive. • Reduce I/O, storage & processing constraints for single users.

  15. Current state of play • Raised a PO for 48 TB transportable storage array and controllers. • Arrive in the new year. • Data capture modes ready for first “Quarter T” roll-out. • May-June 2012 • First cut of archive subsystems (NGAS) • Implementation, benchmarking, commissioning, interfaces. • April 2012

  16. Thank You
