MWA Data Capture and Archiving. Dave Pallot, MWA Conference, Melbourne, Australia, 7th December 2011.
Talking Points • Data Capture and Archive System • Systems overview • Correlator data capture • RTS data capture • On-site data operations • The Next Generation Archiving System (NGAS). • Archive Details
Data Capture and Archive System Capture the data products from the MWA (MRO) and transport them to the petabyte storage facility at the Pawsey Centre (Perth) for later retrieval, processing and analysis.
Data Capture and Archive System • Correlator: • 24x GPU-X • ~32 MB/s (0.5 s integrations, 40 kHz channels, 32-bit) • On-site Storage: • ~48 TB of transportable storage • Pawsey: • 15 PB reserved for the MWA • 96 GPU nodes for data processing
System Flow • Monitor & Control tells correlator to capture data. • Correlator dumps visibility data to configurable storage location. • Monitor & Control tells correlator to stop data capture. • Visibility files are produced, collected and transported to Pawsey for archiving (NGAS). • Observations, with their visibilities, are accessed and images are produced.
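A minimal sketch of that flow, purely illustrative; the CorrelatorClient class and its method names are assumptions, not the actual Monitor & Control API.

```python
# Illustrative sketch of the capture flow only; the CorrelatorClient class and
# its method names are assumptions, not the real Monitor & Control interface.
import time


class CorrelatorClient:
    """Hypothetical stand-in for the M&C interface to the correlator."""

    def start_capture(self, obs_id, out_dir):
        print(f"capture started for {obs_id} -> {out_dir}")

    def stop_capture(self, obs_id):
        print(f"capture stopped for {obs_id}")


def run_observation(mc, obs_id, duration_s, out_dir):
    mc.start_capture(obs_id, out_dir)   # 1. M&C starts the correlator dump
    time.sleep(duration_s)              # 2. visibilities stream to out_dir
    mc.stop_capture(obs_id)             # 3. M&C stops the capture
    # 4. The visibility files are then collected and shipped to Pawsey for
    #    archiving via NGAS (not shown here).


run_observation(CorrelatorClient(), "obs_0001", 2.0, "/data/visibilities")
```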
Correlator Data Capture • Data capture modes: • Save all. • Save all on trigger. Save All Mode • Dump all visibility data to a single data file per machine for the fixed duration of a single observation. • The size of each visibility file is dependent on the output block size and the duration of the observation.
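For concreteness, a rough per-file size in "save all" mode, assuming (hypothetically) that the ~32 MB/s rate from the systems overview applies per capture machine:

```python
# Rough visibility file size for "save all" mode; the per-machine rate is an
# assumption based on the ~32 MB/s figure quoted in the systems overview.
RATE_MB_S = 32          # assumed visibility output rate per machine (MB/s)
obs_duration_s = 300    # e.g. a 5 minute observation

file_size_gb = RATE_MB_S * obs_duration_s / 1024
print(f"~{file_size_gb:.1f} GB per machine, "
      f"~{file_size_gb * 24:.0f} GB across all 24 files")
# -> ~9.4 GB per machine, ~225 GB across all 24 files
```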
Correlator Data Capture cont. Save All on Trigger Mode • Stream data to a circular disk buffer and only produce a visibility data file (flush the buffers) when triggered, i.e. when something interesting happens. • The telescope is continuously observing. • The trigger is activated by an expert external to the capture process. • The architecture allows automatic detection and triggering via various pipelines. • If there is no trigger then no visibility data is flushed to file. • Once triggered, the observation has ended.
Correlator Data Capture cont. • Circular buffer size of possibly hundreds of GB on disk. • Example: 100 GB / 32 MB/s ≈ 52 min. • The circular buffer size can be configured. • The buffer must be defragmented into a contiguous block to get maximum I/O performance.
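A minimal sketch of the circular disk buffer idea, assuming a simple single-file layout; the block handling and file naming are illustrative only, not the real capture code.

```python
# Minimal sketch of "save all on trigger": blocks stream into a fixed-size
# circular disk buffer and are only flushed to a visibility file when a
# trigger arrives. Layout and naming here are assumptions.
class CircularDiskBuffer:
    def __init__(self, path, capacity_bytes):
        self.path = path
        self.capacity = capacity_bytes
        self.write_pos = 0
        with open(self.path, "wb") as f:
            f.truncate(self.capacity)   # pre-allocate for contiguous I/O

    def write_block(self, block):
        # Assumes the block size divides the capacity evenly (no straddling).
        with open(self.path, "r+b") as f:
            f.seek(self.write_pos)
            f.write(block)
        self.write_pos = (self.write_pos + len(block)) % self.capacity

    def flush_on_trigger(self, out_file):
        # Defragment the wrapped buffer into one contiguous visibility file:
        # oldest data (write_pos..end) first, newest data (0..write_pos) last.
        with open(self.path, "rb") as src, open(out_file, "wb") as dst:
            src.seek(self.write_pos)
            dst.write(src.read())
            src.seek(0)
            dst.write(src.read(self.write_pos))
```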
Correlator Data Capture cont. • In both modes: • One visibility file per machine per observation is produced. • Total of 24 files per observation. • Same data format and filenames. • No special treatment of data files once they are produced. • Data buffers receive special treatment, but that is hidden. • Files will have unique identifiers in the file name to link them to the meta-data in our databases.
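A purely hypothetical example of embedding unique identifiers in a file name so each file can be joined back to observation metadata; the real MWA naming convention is defined by M&C and may differ from this sketch.

```python
# Hypothetical filename scheme linking a visibility file to its observation
# and capture machine; illustrative only, not the actual MWA convention.
def visibility_filename(obs_id, machine, start_unix):
    return f"{obs_id}_{start_unix}_vis{machine:02d}.dat"

print(visibility_filename(10001, 7, 1323216000))
# -> 10001_1323216000_vis07.dat
```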
RTS Data Capture • Accumulate and generate images on the GPU? • Avoid accessing visibilities from disk storage • Performance reasons (concurrent disk access) • Images are dumped to a separate location from the visibilities. • Visibilities can be purged if the RTS images are bad. • Will not be transported as they can be reconstructed. • Requires more discussion • Mitch (RTS Group), M&C group, Curtin.
On-site Data Operations • Facility to process images from the archiving node on-site • Tools to access visibilities from local storage. • Images/processing will be done outside of the MWA data pipeline. • Ability to “flag” bad data • Flagged data can be purged before transportation. • Who makes that decision?
Data Transport • How is data transported from the MRO to Perth? • Transportable disk array • 48 TB of storage • Interim measure • 10 Gb NBN • Fibre link from the MWA to Pawsey • Termination location and timeframe are uncertain • Transportation and archive coordination • NGAS
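As a rough sanity check on the interim transportable array (treating the ~32 MB/s figure as the aggregate correlator output and assuming a 100% duty cycle, both of which are assumptions):

```python
# Back-of-envelope: how long the 48 TB transportable array lasts, assuming
# (hypothetically) ~32 MB/s aggregate output and continuous observing.
RATE_MB_S = 32
CAPACITY_TB = 48

seconds_to_fill = CAPACITY_TB * 1024 ** 2 / RATE_MB_S   # TB -> MB (binary)
print(f"~{seconds_to_fill / 86400:.0f} days to fill")   # -> ~18 days to fill
```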
Next Generation Archiving System (NGAS) • Distributed storage software solution. • Operates transparently across physically and logically separated locations • Reliable communications (HTTP interface) • Supports archive replication and mirroring. • Access to data on-site and through the archive. • Scalable: it can coordinate multiple petabytes of storage. • Lots of tools. • Proven architecture for archiving large data sets. • National Radio Astronomy Observatory (NRAO) • Atacama Large Millimeter/submillimeter Array (ALMA)
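As a rough illustration of how a client might talk to an NGAS node over its HTTP interface; the host, port and the exact command and parameter names below are assumptions for the sketch.

```python
# Illustrative NGAS-style client calls over HTTP; host, port and the command
# and parameter names here are assumptions, not a definitive NGAS reference.
import urllib.request

NGAS_HOST = "http://ngas.example.org:7777"   # hypothetical archive node


def retrieve(file_id, out_path):
    # Ask the archive for a file by its unique identifier.
    url = f"{NGAS_HOST}/RETRIEVE?file_id={file_id}"
    with urllib.request.urlopen(url) as resp, open(out_path, "wb") as out:
        out.write(resp.read())


def status():
    # Query the state of the archive node.
    with urllib.request.urlopen(f"{NGAS_HOST}/STATUS") as resp:
        return resp.read().decode()
```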
Archive • Standard features you would expect from an archive. • Performance/usage trends, retrieval, storage, etc. • Features specific to the MWA • Sky maps, temperature plots, etc. • Will evolve over time • Comprehensive meta-data search tool • RA/Dec, source, gains, frequency, date/time, temperatures, etc. • Pawsey supercomputer node. • Generate images from a composite set of visibilities. • Fully configurable pipeline plug-in architecture for the archive. • Reduce I/O, storage and processing constraints for single users.
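A purely hypothetical shape for the metadata search, shown as an SQL query issued from Python; the database, table and column names are illustrative only.

```python
# Hypothetical metadata search against the archive database; schema names
# are illustrative, not the real MWA metadata model.
import sqlite3

conn = sqlite3.connect("mwa_metadata.db")   # stand-in for the real database
rows = conn.execute(
    """
    SELECT obs_id, ra_deg, dec_deg, freq_mhz, start_time
    FROM observations
    WHERE ra_deg BETWEEN ? AND ?
      AND dec_deg BETWEEN ? AND ?
      AND start_time >= ?
    """,
    (200.0, 210.0, -30.0, -20.0, "2012-05-01"),
).fetchall()
```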
Current state of play • Raised a purchase order for the 48 TB transportable storage array and controllers. • Arriving in the new year. • Data capture modes ready for the first “Quarter T” roll-out. • May-June 2012 • First cut of archive subsystems (NGAS) • Implementation, benchmarking, commissioning, interfaces. • April 2012