A real-time transient detection pipeline using the GSB
Jayanta Roy, NCRA-TIFR, Pune, India
CASPER 2011, 12th October 2011

Collaborators
• Matthew Bailes (Swinburne Univ. of Technology)
• Ramesh Bhat (Swinburne Univ. of Technology)
• Sarah Burke-Spolaor (ATNF)
• Jayaram N. Chengalur (NCRA-TIFR)
• Peter Cox (Swinburne Univ. of Technology)
• Yashwant Gupta (NCRA-TIFR)
• Jayanti Prasad (IUCAA)
• Willem van Straten (Swinburne Univ. of Technology)

Transient Radio Universe
• A major discovery frontier of modern radio astronomy
  • Astrophysical phenomena on a wide range of timescales
  • Potential for uncovering new physics and astrophysics
• X-ray and γ-ray astronomy are the front runners, thanks to multiple wide field-of-view instruments
  • Great success in finding gamma-ray bursts, accreting sources and bursting pulsars
• Technological requirements
  • A = (Sensitivity × Field of View) needs to be very high
  • Upcoming instruments, e.g. ASKAP, LOFAR, MWA, MeerKAT
• Computational requirements are severe, especially for fast (T << seconds) transients
  • Data need to be sampled at intervals of ~ 10s of µs with high frequency resolution
• Time scales are too short for follow-up of non-repeating events
  • e.g. the millisecond burst reported near the SMC (Lorimer et al. 2007, Burke-Spolaor et al. 2010)

Slow vs Fast Radio Transients
[Figure: a "slow" transient, the radio light curve of SN1987A (Turtle et al. 1987); a "fast" transient, a giant pulse from the Crab pulsar (Cordes et al. 2004)]
• Slow transients:
  • Imaging over a wide range of integration times, from snapshot to daily
• Fast transients:
  • Timescales of seconds or shorter, from microseconds to seconds
  • Time-domain processing at high time and frequency resolution
  • Affected by plasma propagation effects: dispersion, multi-path scattering and scintillation

Transient Exploration with GMRT
• A multi-element radio interferometer (30 × 45 m dishes) with an effective collecting area of ~ 3% of the SKA
• Long baselines and sub-array capability provide spatial filtering against impulsive RFI
• Event localization using imaging at a resolution of 5”
• Availability of a new HPC backend (GSB) with enormous scientific potential
  • Ability to run a multi-subarray beamformer at high time and frequency resolution
  • Ability to capture raw voltage samples from individual antennas at Nyquist resolution
  • Multibeaming across the FoV
  • Interfaced with GPU resources to serve real-time compute requirements
• GMRT makes an excellent test-bed for developing the techniques and strategies applicable to next-generation (array type) instruments

GSB Schematic
• Input data rate:
  • 32 antennas × 2 pols base-band analog inputs @ 32 MHz of bandwidth
  • 2 GSamples/s (using 16 ADC cards with 4 analog inputs on each card)
• 232 cores of Intel Xeon CPUs
• Each node with 2 GB RAM
• Dual GbE network interface for input data streaming
• Dual add-on GbE network interface for high time resolution output data streaming
• 3.9 Tflops @ 15 kW
• Max output streaming ~ 3.5 TB/hour
• Storage: 128 TB
Roy, J. et al., Exp Astron, 2010

GSB Specs
• Simultaneous operation as
  • FX correlator, as an imaging instrument
  • Beamformer, as a pulsar receiver
• Input data rate: 2 GSamples/s
• Required compute load: 487 Gflops
  • FFT (181 Gflops)
  • Fringe rotator (8.25 Gflops)
  • MAC (280 Gflops)
  • Beamformer (17 Gflops)
• Output data rate:
  • MAC output: 4 MB/s
  • Beamformer output: 128 MB/s

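The compute budget above can be checked with a few lines of arithmetic. This is only a sketch: the per-stage figures are the ones quoted on the slide, and the dictionary name `stage_gflops` is illustrative. The stages add up to 486.25 Gflops, which the slide rounds up to 487 Gflops.

```python
# Per-stage compute loads quoted on the slide, in Gflops.
stage_gflops = {
    "FFT": 181.0,
    "Fringe rotator": 8.25,
    "MAC": 280.0,
    "Beamformer": 17.0,
}

total_gflops = sum(stage_gflops.values())

# Print each stage with its share of the total load.
for stage, load in stage_gflops.items():
    share = 100.0 * load / total_gflops
    print(f"{stage:15s} {load:7.2f} Gflops  ({share:4.1f}%)")
print(f"{'Total':15s} {total_gflops:7.2f} Gflops")
```

The MAC and FFT stages together account for roughly 95% of the load, so the beamformer is a small addition to the correlator.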
24 × 7 compute status
• Vectorized code (using 128-bit SSE) with cache optimization and multi-threaded load balancing ensures operation on multiple data elements in parallel on a given physical/logical core
• Optimization status:
  • Required compute load: 490 Gflops
  • Achieved compute power: 1.5 Tflops
  • Compute to power ratio: 260 Mflops/watt
  • Compute to cost ratio: 45 Mflops/USD
• The real-time GSB is a highly optimized, multi-threaded, vectorized parallel pipeline, working at ~ 90% of the theoretical peak flops

Detecting transients with the GMRT
• Multi sub-array: trade-off between detection sensitivity and the efficiency with which false triggers due to RFI and noise statistics are reduced
• Searching for transients:
  • Dedispersion
  • Event detection by matched filtering followed by thresholding
  • Event identification/association in time and DM
  • Coincidence filtering and candidate selection
  • Event indexer
  • Sending a trigger to the GSB raw data capture system
  • Event localization using snapshot imaging
  • Detailed time-domain study using multi-pixel phased array beams
• Strategy: commensal, 24/7 observing mode
• Full raw recording results in 172 TB per day!

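The 172 TB per day figure follows from the 2 GSamples/s input rate if each sample occupies one byte. That sample size is an assumption here, since the slides do not state it. A minimal check, which also shows how quickly the 128 TB of storage would fill:

```python
SAMPLE_RATE = 2e9        # samples/s, aggregate GSB input rate from the slides
BYTES_PER_SAMPLE = 1     # assumption: 8-bit samples (not stated on the slides)
SECONDS_PER_DAY = 86_400
STORAGE_TB = 128         # storage attached to the cluster, from the slides

raw_tb_per_day = SAMPLE_RATE * BYTES_PER_SAMPLE * SECONDS_PER_DAY / 1e12
hours_to_fill = 24 * STORAGE_TB / raw_tb_per_day

print(f"raw data volume : {raw_tb_per_day:.1f} TB/day")
print(f"storage fills in: {hours_to_fill:.1f} hours")
```

Under that assumption continuous recording would fill the disks in under a day, which is why the pipeline captures raw voltages only around triggered events.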
Detection sensitivity vs efficiency of coincidence filtering
[Figure: sample data with all spurious events logged]
• For 4 sub-arrays with a 5 sigma threshold:
  • < 1 event per 20 s
  • ~ 2000 events per day → 3 TB of raw data
  • Quasi-simultaneous offline processing + snapshot imaging @ 4× real-time
  • 100 mJy detectable flux for 100 ms effective time resolution
• A 10% increase in sensitivity comes with a 2× increase in event rate

Discrimination of RFI by coincidence filtering with 4 incoherent (sub-array) beams
• A DM = 0 filter discriminates fast radio transients from RFI
• A multiple-beam coincidence filter reduces the false triggers due to direction-dependent RFI

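The two filters can be sketched as follows. This is not the pipeline's actual code: the `Event` record, the function name `coincident_events` and the tolerances `dm_min`, `time_tol` and `dm_tol` are all hypothetical, and the sketch assumes that a genuine sky event shows up in every sub-array beam at a consistent time and DM.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    beam: int     # sub-array beam index
    time: float   # arrival time, seconds
    dm: float     # best dispersion measure, pc cm^-3
    snr: float

def coincident_events(events, n_beams=4, dm_min=1.0, time_tol=0.1, dm_tol=5.0):
    """Group events seen in all n_beams at a consistent time and DM."""
    # DM = 0 filter: undispersed signals are treated as terrestrial RFI.
    dispersed = sorted((e for e in events if e.dm >= dm_min), key=lambda e: e.time)
    groups, used = [], set()
    for i, ref in enumerate(dispersed):
        if i in used:
            continue
        members = {ref.beam: i}
        for j in range(i + 1, len(dispersed)):
            other = dispersed[j]
            if other.time - ref.time > time_tol:
                break                      # sorted by time, nothing later can match
            if j in used or other.beam in members:
                continue
            if abs(other.dm - ref.dm) <= dm_tol:
                members[other.beam] = j
        # Coincidence filter: keep the group only if every beam saw it.
        if len(members) == n_beams:
            used.update(members.values())
            groups.append([dispersed[k] for k in sorted(members.values())])
    return groups

# Toy input (made-up values): one dispersed event in all four beams,
# one event in only two beams, and one strong DM = 0 burst in all four beams.
events = (
    [Event(b, 10.0 + 0.01 * b, 56.8, 8.0) for b in range(4)]
    + [Event(0, 20.0, 30.0, 9.0), Event(1, 20.0, 30.0, 7.0)]
    + [Event(b, 30.0, 0.0, 15.0) for b in range(4)]
)
candidates = coincident_events(events)
print(len(candidates), "candidate(s)")
```

Only the first group survives: the two-beam event fails the coincidence test and the undispersed burst fails the DM filter despite its high S/N.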
Real-time processing
• Proposed scheme: GSB cluster resources for raw data capture + beam generation; additional computing resources for transient processing
[Block diagram: GMRT array, GSB cluster, Transient Detector, Trigger Generator]
• Real-time compute and I/O requirements:
  • Beamformer output @ 512 MB/s for 4 beams with a 60 µs, 512-channel filterbank
  • Dedispersion and transient search on a single 8-core CPU takes 15× real time (300 Gflops per beam)!

Real-time implementation
• Simultaneous 4 incoherent beamformers
  • 10% increase in main GSB compute load
  • 1.5 times increase in output network I/O → 650 MB/s total output I/O; newly added 3 GB/s separate network paths handle all output I/O
• Dedispersion
  • Processing involves searching over a large range of dispersion measure (DM); around 1000 trial DMs are needed
  • GPU-based dedispersion (for 1000 DMs @ real-time) running on a Tesla (Ack: Ben Barsdell @ Swinburne)
  • 1 Tesla per beam → 4 CPU host machines work as the transient cluster
• Transient detector
  • Event detection from each of the dedispersed time-series is 20% of the processing load
  • Each beam is processed on a 150 Gflops (vector power) i7 CPU host, each equipped with 12 GB memory to hold multiple intermediate data blocks
• Raw data capture
  • 128 TB storage attached to the compute cluster, capable of flushing data @ 5 GB/s
  • 2 s of data buffering in order to accommodate a scattered, dispersed pulse at GMRT frequencies with moderate DMs

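The dedispersion and detection steps can be illustrated with a small CPU-only sketch in NumPy. It is not the GPU code credited above: it is a brute-force incoherent dedispersion followed by a boxcar matched filter, run on toy-sized synthetic data (64 channels, 1 ms sampling, 11 trial DMs) instead of the 512-channel, 60 µs, ~1000-DM case from the slides. All function names, the boxcar widths and the injected pulse are assumptions made for illustration.

```python
import numpy as np

K_DM = 4.148808e3  # dispersion constant, s MHz^2 pc^-1 cm^3

def channel_shifts(dm, freqs_mhz, tsamp):
    """Delay of each channel relative to the highest frequency, in samples."""
    delays = K_DM * dm * (freqs_mhz ** -2.0 - freqs_mhz.max() ** -2.0)
    return np.rint(delays / tsamp).astype(int)

def dedisperse(filterbank, freqs_mhz, tsamp, dm):
    """Undo the per-channel dispersive delay and sum over frequency."""
    series = np.zeros(filterbank.shape[1])
    for channel, shift in zip(filterbank, channel_shifts(dm, freqs_mhz, tsamp)):
        series += np.roll(channel, -shift)  # circular shift keeps the sketch short
    return series

def boxcar_snr(series, width):
    """Boxcar matched filter, normalised with a robust median/MAD noise estimate."""
    smoothed = np.convolve(series, np.ones(width), mode="same")
    median = np.median(smoothed)
    sigma = 1.4826 * np.median(np.abs(smoothed - median))
    return (smoothed - median) / sigma

def search(filterbank, freqs_mhz, tsamp, trial_dms, widths=(1, 2, 4, 8), threshold=5.0):
    """Return the strongest event above threshold over all trial DMs and widths."""
    best = None
    for dm in trial_dms:
        series = dedisperse(filterbank, freqs_mhz, tsamp, dm)
        for width in widths:
            snr = boxcar_snr(series, width)
            peak = int(np.argmax(snr))
            if snr[peak] >= threshold and (best is None or snr[peak] > best["snr"]):
                best = {"dm": float(dm), "width": width,
                        "sample": peak, "snr": float(snr[peak])}
    return best

# Synthetic filterbank: Gaussian noise plus one dispersed pulse at DM = 50.
rng = np.random.default_rng(1)
freqs = np.linspace(594.0, 626.0, 64, endpoint=False)  # 32 MHz band near 610 MHz
tsamp, true_dm, t0 = 1e-3, 50.0, 1000
data = rng.standard_normal((64, 4096))
data[np.arange(64), t0 + channel_shifts(true_dm, freqs, tsamp)] += 3.0

best = search(data, freqs, tsamp, trial_dms=np.arange(0.0, 101.0, 10.0))
print(best)
```

The pulse is only 3 sigma in any single channel, so it stands out only once the channels are aligned at the correct trial DM. The cost grows with the number of trial DMs times channels times samples, which is why the real-time system offloads this step to one GPU per beam.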
Event scrutiny (off-line)
• Detailed off-line analysis can provide event localization within 5” using snapshot imaging, and a 4× boost in sensitivity for time-domain signal characterization

Case study for PSR B1748-28
• The in-beam source PSR B1748-28 is detected in the image plane at a large offset from the phase centre
• SNR improvement of 2.5× by forming a phased array beam towards the pulsar

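Why a phased array beam steered at the source helps can be seen in a toy simulation. This is a sketch under simplifying assumptions that are not from the slides: 16 identical antennas, a narrow-band signal, made-up per-antenna phases towards the source, and perfect phase calibration. In this idealised case, coherently adding N voltages beats adding N powers by about √N in S/N. The gains quoted in the slides depend on how many antennas are phased and on which beam they are compared against, which the slides do not spell out.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ant, n_samp, n_on = 16, 50_000, 10_000
amp = np.sqrt(0.1)  # signal amplitude relative to unit-variance noise per antenna

def cnoise(shape):
    """Unit-variance complex Gaussian noise."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Noise-like sky signal, present only in the first n_on samples.
signal = np.zeros(n_samp, dtype=complex)
signal[:n_on] = amp * cnoise(n_on)

# Hypothetical per-antenna phases towards the offset source.
phases = rng.uniform(0.0, 2.0 * np.pi, n_ant)
voltages = signal * np.exp(1j * phases)[:, None] + cnoise((n_ant, n_samp))

def pulse_snr(power):
    """S/N of the mean on-pulse power excess over the off-pulse baseline."""
    on, off = power[:n_on], power[n_on:]
    return (on.mean() - off.mean()) / (off.std() / np.sqrt(n_on))

# Incoherent array: detect each antenna, then add the powers.
incoherent = (np.abs(voltages) ** 2).sum(axis=0)
# Phased array: remove the per-antenna phase, add the voltages, then detect.
phased = np.abs((voltages * np.exp(-1j * phases)[:, None]).sum(axis=0)) ** 2

snr_ia, snr_pa = pulse_snr(incoherent), pulse_snr(phased)
gain = snr_pa / snr_ia
print(f"incoherent S/N {snr_ia:.1f}, phased S/N {snr_pa:.1f}, gain {gain:.2f}")
```

The price of the extra sensitivity is a much narrower beam, so the source position has to be known first, here from the image plane.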
Time-domain event scrutiny
[Figures: Crab giant pulse at 4 different frequencies; coherently dedispersed giant pulse intensity @ 8 µs time resolution]
• Observed maximum dispersion delay of 9.038 s across the full range 610 MHz to 156 MHz
• Flux calibration of giant pulses
• Study of the giant pulse intensity, energy and scattering time distributions

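The quoted delay is consistent with the standard cold-plasma dispersion law, Δt = K · DM · (ν_lo⁻² − ν_hi⁻²). A minimal check, assuming a Crab DM of about 56.8 pc cm⁻³ (an approximate catalogue value that is not given on the slide):

```python
K_DM = 4.148808e3  # dispersion constant, s MHz^2 pc^-1 cm^3

def dispersion_delay(dm, f_lo_mhz, f_hi_mhz):
    """Arrival-time delay (s) of f_lo relative to f_hi for a given DM."""
    return K_DM * dm * (f_lo_mhz ** -2 - f_hi_mhz ** -2)

CRAB_DM = 56.8  # pc cm^-3; assumed value, not taken from the slides

delay = dispersion_delay(CRAB_DM, 156.0, 610.0)
print(f"delay between 610 and 156 MHz: {delay:.2f} s")
```

This gives roughly 9 s, close to the observed 9.038 s. The small difference depends on the exact DM at the observing epoch and on the precise band edges used.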
Multi-pixelization of the FoV for search of pulsed sources
• Simultaneous search over 400 beams using the GSB
  • 4 times improvement in sensitivity
  • 400 times wider search
• Compute requirements:
  • Pixelization ~ 5 Tflops
  • FFT cost ~ 170 Gflops
  • Phase centre correction cost ~ 3 Tflops
  • Beam-forming cost ~ 1.75 Tflops

Multi-pixelization: a science case
Localization of a GMRT-discovered Fermi MSP (PI: Bhaswati Bhattacharyya @ IUCAA)
• Current implementation gives 5 phased array beams @ real-time
• Total no. of beams: 16
• The in-beam source is localized in the image plane at an offset of 3.75’ × 4’ from the phase centre, using imaging followed by multiple beamformers
• SNR improvement of 3× as the beamwidth reduces from 80’ to 1.4’