

  1. Distributed Processing of Future Radio Astronomical Observations • Ger van Diepen • ASTRON, Dwingeloo / ATNF, Sydney

  2. Contents • Introduction • Data Distribution • Architecture • Performance issues • Current status and future work

  3. Data Volume in future telescopes • LOFAR • 37 stations (666 baselines), growing to 63 stations (1953 baselines) • 128 subbands of 256 channels (32768 channels) • 666*32768*4*8 bytes/sec = 700 MByte/sec • A 5 hour observation gives 12 TBytes • ASKAP (spectral line observation) • 45 stations (990 baselines) • 32 beams, 16384 channels each • 990*32*16384*4*8 bytes/10 sec = 1.6 GByte/sec • A 12 hour observation gives 72 TBytes • One day of observing > the entire world radio archive • ASKAP continuum: 280 GBytes (64 channels) • MeerKAT similar to ASKAP
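
As a quick check of the numbers above, here is a minimal sketch (not part of the presentation) that reproduces the back-of-the-envelope arithmetic; it assumes the 4*8 factor on the slide means 4 polarisations times 8 bytes per complex visibility sample.

```cpp
// Back-of-the-envelope data rates for LOFAR and ASKAP, reproducing the
// slide's arithmetic.  Assumption: 4 polarisations per baseline and
// 8 bytes per complex visibility sample (the 4*8 factor on the slide).
#include <cstdio>

static double rate_bytes_per_s(long baselines, long channels,
                               double dump_interval_s) {
    const double polarisations = 4.0;
    const double bytes_per_sample = 8.0;   // single-precision complex
    return baselines * channels * polarisations * bytes_per_sample
           / dump_interval_s;
}

int main() {
    // LOFAR: 666 baselines, 128 subbands * 256 channels, 1 s dumps.
    double lofar = rate_bytes_per_s(666, 128 * 256, 1.0);
    std::printf("LOFAR : %.0f MByte/s, %.1f TByte in 5 h\n",
                lofar / 1e6, lofar * 5 * 3600 / 1e12);

    // ASKAP spectral line: 990 baselines, 32 beams * 16384 channels, 10 s dumps.
    double askap = rate_bytes_per_s(990, 32 * 16384, 10.0);
    std::printf("ASKAP : %.1f GByte/s, %.1f TByte in 12 h\n",
                askap / 1e9, askap * 12 * 3600 / 1e12);
    return 0;
}
```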

  4. Key Issues

  5. Data Distribution • Visibility data need to be stored in a distributed way • Parallel IO is of limited use: there are too many data to move across the network • Bring the processes to the data, NOT the data to the processes

  6. Data Distribution • The distribution must be efficient for all purposes (flagging, calibration, imaging, deconvolution) • Process locally where possible and exchange as little data as possible • Loss of a data partition should not be too painful • Spectral partitioning seems the best candidate (see the sketch below)
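
A minimal sketch of what spectral partitioning could look like (an illustration, not the actual LOFAR/ASKAP code): the channel axis is cut into contiguous ranges, one per worker, so each worker flags, calibrates and images only the channels stored on its own node.

```cpp
// Sketch: spectral partitioning of an observation.  Channels are split
// into contiguous ranges, one per worker, so that each range can be
// processed locally on the node that stores it.
#include <cstdio>
#include <vector>

struct ChannelRange { int first, count; };

// Divide nChannels as evenly as possible over nWorkers.
std::vector<ChannelRange> partition(int nChannels, int nWorkers) {
    std::vector<ChannelRange> parts;
    int start = 0;
    for (int w = 0; w < nWorkers; ++w) {
        int count = nChannels / nWorkers + (w < nChannels % nWorkers ? 1 : 0);
        parts.push_back({start, count});
        start += count;
    }
    return parts;
}

int main() {
    // Example: 32768 LOFAR channels over 64 worker nodes.
    auto parts = partition(32768, 64);
    for (int w = 0; w < 3; ++w)
        std::printf("worker %d: channels %d..%d\n",
                    w, parts[w].first, parts[w].first + parts[w].count - 1);
    return 0;
}
```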

  7. Architecture Connection types: • Socket • MPI • Memory • DB
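
A possible shape for such a connection layer (an assumption, not the actual classes): master and workers talk through one small interface, and the socket, MPI, memory or DB transport is chosen at deployment time. The MemoryConnection below is a stand-in for the other transports.

```cpp
// Sketch: a minimal connection abstraction so that master and workers
// exchange messages through interchangeable transports.
#include <cstdio>
#include <queue>
#include <string>

class Connection {                    // common interface for all transports
public:
    virtual ~Connection() = default;
    virtual void send(const std::string& msg) = 0;
    virtual std::string receive() = 0;
};

// In-process "memory" transport; a socket, MPI or DB transport would
// implement the same two calls.
class MemoryConnection : public Connection {
public:
    void send(const std::string& msg) override { queue_.push(msg); }
    std::string receive() override {
        std::string msg = queue_.front();
        queue_.pop();
        return msg;
    }
private:
    std::queue<std::string> queue_;
};

int main() {
    MemoryConnection conn;
    conn.send("solve: work domain 0");                         // master side
    std::printf("worker got: %s\n", conn.receive().c_str());   // worker side
    return 0;
}
```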

  8. Data Processing • A series of steps has to be performed on the data (solve, subtract, correct, image, ...) • The master gets the steps from a control process (e.g. Python) • If possible, a step is sent directly to the appropriate workers • Some steps (e.g. solve) need iteration • Substeps are sent to the workers • Replies are received and forwarded to other workers
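
A minimal sketch of this master-worker step dispatch, assuming an MPI deployment (the step names follow the slide; everything else is illustrative):

```cpp
// Sketch: the master broadcasts processing steps to the workers with MPI;
// each worker executes every step on its local data partition until a
// "quit" step arrives.  Compile with an MPI compiler, e.g. mpic++.
#include <mpi.h>
#include <cstdio>
#include <cstring>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const char* steps[] = {"solve", "subtract", "correct", "image", "quit"};
    char step[16];

    for (int i = 0; ; ++i) {
        if (rank == 0)                          // master: pick the next step
            std::strncpy(step, steps[i], sizeof(step));
        MPI_Bcast(step, sizeof(step), MPI_CHAR, 0, MPI_COMM_WORLD);
        if (std::strcmp(step, "quit") == 0) break;
        if (rank != 0)                          // worker: apply step locally
            std::printf("worker %d: executing '%s' on local partition\n",
                        rank, step);
    }
    MPI_Finalize();
    return 0;
}
```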

  9. Calibration Processing • Solving non-linearly:
  do {
    1: get normal equations
    2: send eq to solver
    3: get solution
    4: send solution
  } while (!converged)
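
The loop above, written out as a runnable MPI sketch for a toy one-parameter least-squares problem (the real solver is non-linear and the normal equations are larger, but the message pattern is the same): each worker forms normal equations from its local partition, the small equations are summed onto the solver, and the solution is broadcast back until convergence.

```cpp
// Sketch: the slide-9 solve loop with MPI, on a toy linear fit standing in
// for the non-linear calibration solve.
#include <mpi.h>
#include <cmath>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Toy local data: each worker fits y = p*x to its own samples.
    double x = rank + 1.0, y = 3.0 * x;       // "true" parameter p = 3
    double p = 0.0, prev = 1e30;

    do {
        // 1: get normal equations (here a 1x1 system: [sum x^2] p = [sum x*y])
        double local[2] = { x * x, x * y }, global[2] = { 0.0, 0.0 };
        // 2: send eq to solver (summed onto rank 0; the equations are tiny)
        MPI_Reduce(local, global, 2, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        // 3: get solution on the solver (rank 0)
        if (rank == 0) { prev = p; p = global[1] / global[0]; }
        // 4: send solution back to all workers
        MPI_Bcast(&p, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Bcast(&prev, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    } while (std::fabs(p - prev) > 1e-8);     // !converged

    if (rank == 0) std::printf("solution p = %g\n", p);
    MPI_Finalize();
    return 0;
}
```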

  10. Performance: IO • Even with distributed IO it still takes 24 minutes to read 72 TByte once • IO should be asynchronous to avoid an idle CPU • What storage to use is a deployment decision: local disks (RAID), SAN or NAS • Sufficient IO bandwidth to all machines is needed • Calibration and imaging are applied repeatedly, so the data will be accessed multiple times • BUT operate on chunks of data (a work domain) to keep the data in memory while performing many steps on them • Possibly store the data in multiple resolutions • Tiling gives efficient IO for different access patterns
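
A minimal sketch of asynchronous IO on work domains (illustrative; readWorkDomain is a stand-in for reading a chunk of visibilities from local storage): the next chunk is prefetched in the background while the current chunk is processed, so the CPU does not sit idle.

```cpp
// Sketch: double-buffered, asynchronous reading of work domains.
#include <future>
#include <vector>
#include <cstdio>

using WorkDomain = std::vector<float>;

// Stand-in for reading one chunk of visibilities from local storage.
WorkDomain readWorkDomain(int index) {
    return WorkDomain(1 << 20, static_cast<float>(index));
}

void process(const WorkDomain& wd) {          // flag/solve/image the chunk
    std::printf("processing %zu samples (first=%g)\n", wd.size(), wd[0]);
}

int main() {
    const int nDomains = 8;
    // Start the first read, then always prefetch the next chunk while the
    // current one is being processed.
    std::future<WorkDomain> next = std::async(std::launch::async,
                                              readWorkDomain, 0);
    for (int i = 0; i < nDomains; ++i) {
        WorkDomain current = next.get();
        if (i + 1 < nDomains)
            next = std::async(std::launch::async, readWorkDomain, i + 1);
        process(current);
    }
    return 0;
}
```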

  11. Performance: Network • Process locally where possible • Send as little data as possible (the normal equations are small matrices) • Overlap operations, e.g. form the normal equations for the next work domain while the solver solves the current work domain
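
The same double-buffering idea applied to the solve step (a sketch, under the assumption that forming the normal equations and solving them can run concurrently): the equations for the next work domain are formed while the solver handles the current one.

```cpp
// Sketch: overlap forming normal equations with solving them.
#include <future>
#include <array>
#include <cstdio>

using NormalEq = std::array<double, 4>;       // small 2x2 system: cheap to send

NormalEq formNormalEquations(int domain) {    // done by the workers
    return {1.0 + domain, 0.0, 0.0, 1.0 + domain};
}

void solve(const NormalEq& eq, int domain) {  // done by the solver
    std::printf("solved domain %d (N[0][0]=%g)\n", domain, eq[0]);
}

int main() {
    const int nDomains = 4;
    std::future<NormalEq> next = std::async(std::launch::async,
                                            formNormalEquations, 0);
    for (int d = 0; d < nDomains; ++d) {
        NormalEq eq = next.get();
        if (d + 1 < nDomains)                 // form next while solving current
            next = std::async(std::launch::async, formNormalEquations, d + 1);
        solve(eq, d);
    }
    return 0;
}
```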

  12. Performance: CPU • Parallelisation (OpenMP, ...) • Vectorisation (SSE instructions) • Keep data in the CPU cache as much as possible, so use smallish data arrays • Optimal layout of the data structures • Keep intermediate results if they do not change • Reduce the number of operations by reducing the resolution
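
A small illustration of these points (not from the talk): one contiguous array, a simple loop the compiler can vectorise with SSE, and an OpenMP pragma to spread the channels over the cores.

```cpp
// Sketch: CPU-level parallelism on a worker.  Compile e.g. with
// g++ -O3 -fopenmp; without OpenMP the pragma is simply ignored.
#include <cstdio>
#include <vector>

int main() {
    const int nChannels = 1 << 20;
    std::vector<float> vis(nChannels, 1.0f), gain(nChannels, 0.5f);

    // Apply a per-channel gain; contiguous data keeps the working set
    // cache-friendly and the loop trivially vectorisable.
    #pragma omp parallel for
    for (int ch = 0; ch < nChannels; ++ch)
        vis[ch] *= gain[ch];

    std::printf("vis[0] = %g\n", vis[0]);
    return 0;
}
```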

  13. Current status • The basic framework has been implemented and is used in LOFAR and CONRAD calibration and imaging • Can be deployed on a cluster or supercomputer (or a desktop) • Tested on a SUN cluster, Cray XT3, IBM PC cluster, MacBook • A resource DB describes the cluster layout and the data partitioning, so the master can derive which processor should process which part of the data (see the sketch below)
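
A toy illustration of such a resource DB (the schema and host names are hypothetical): it maps every data partition to the node that stores it, which is what the master needs in order to send each step to the right processor.

```cpp
// Sketch: a tiny in-memory resource description mapping each data
// partition to the host that stores it.  Host names are made up.
#include <cstdio>
#include <map>
#include <string>

struct Partition { std::string host; int firstChannel, nChannels; };

int main() {
    // Cluster layout and data partitioning as the resource DB would list it.
    std::map<int, Partition> resourceDB = {
        {0, {"node01", 0,     8192}},
        {1, {"node02", 8192,  8192}},
        {2, {"node03", 16384, 8192}},
        {3, {"node04", 24576, 8192}},
    };

    // Master: look up which processor holds which part of the data.
    for (const auto& [id, p] : resourceDB)
        std::printf("partition %d -> %s (channels %d..%d)\n",
                    id, p.host.c_str(), p.firstChannel,
                    p.firstChannel + p.nChannels - 1);
    return 0;
}
```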

  14. Parallel-processed image (Tim Cornwell) • Runs on ATNF’s Sun cluster “minicp” (8 nodes) • Each node = 2 dual-core Opterons, 1 TB, 12 GB • Also on the Cray XT3 at WASP (Perth, WA) • Data simulated using AIPS++ • Imaged using the CONRAD synthesis software • New software using casacore • Running under OpenMPI • Long-integration continuum image • 8 hours integration • 128 channels over 300 MHz • Single beam • Used 1, 2, 4, 8, 16 processing nodes for the calculation of the residual images • Scales well • Must scale up a hundred fold • Or more…

  15. Future work • More work is needed on robustness • Discard a partition when a processor or disk fails • Move to another processor if possible (e.g. if the data are replicated) • Store the data in multiple resolutions? • Use master-worker in flagging and deconvolution • A worker can use accelerators like GPGPU, FPGA, Cell (maybe through RapidMind) • A worker can be a master itself to make use of a BG/L in a PC cluster

  16. Future work • Extend to image processing (few TBytes) • Source finding • Analysis • Display • VO access?

  17. Thank you • Joint work with people at ASTRON, ATNF, and KAT • More detail in the next talk about LOFAR calibration • See the poster about the CONRAD software • Ger van Diepen, diepen@astron.nl
