by Arjun Radhakrishnan supervised by Prof. Michael Inggs. Accelerating Coherent Pulsar De-dispersion on Graphics Processing Units. Outline. Graphics Processing Units (GPUs) Pulsars Pulsar De-dispersion Motivation Implementation Results Conclusion & Future Work.
Accelerating Coherent Pulsar De-dispersion on Graphics Processing Units, by Arjun Radhakrishnan, supervised by Prof. Michael Inggs
Outline • Graphics Processing Units (GPUs) • Pulsars • Pulsar De-dispersion • Motivation • Implementation • Results • Conclusion & Future Work
Graphics Processing Units • GPUs are massively parallel processors found on consumer graphics cards • Generally used to render 3D objects on screen and calculate the colour of each pixel to display (image source: [7]) • Are mass-market products, driven by the video game industry • Performance has kept pace with Moore's Law because most of the on-chip area is devoted to compute units, whereas CPUs devote much of theirs to cache
Why Use GPUs? Figure 1: Peak floating point performance of NVIDIA GPUs vs Intel CPUs [2]
Pulsars • Highly magnetised, rapidly rotating neutron stars formed after a supernova • Pulsars emit beams of electromagnetic radiation from their magnetic poles • As the star rotates, the beams sweep around in a circle (the “lighthouse effect”) • An observer sees a periodic pulse each time a beam sweeps past Earth Figure 2: Pulsar Model [3]
Pulsar Dispersion • Pulsar emissions are distorted upon passing through the ionised Interstellar Medium (ISM) • Lower frequency components of the pulse are delayed more than higher frequencies
Pulsar De-dispersion • Correct for the dispersion by shifting each frequency component of the received signal in time by its dispersion delay, so that all components line up again Figure 3: Pulsar De-dispersion [4]
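The size of the shift can be estimated from the standard cold-plasma dispersion relation, in which the delay scales with the DM and with the inverse square of the observing frequency. The sketch below is illustrative rather than the author's code: the function name and the constant 4.148808e3 s MHz² pc⁻¹ cm³ (the commonly quoted dispersion constant) are assumptions, and the example values are the test-data parameters quoted later in the deck (DM of 50 pc/cm³, 100 MHz centred on 1450 MHz).

```python
# Dispersion constant in s MHz^2 pc^-1 cm^3 (commonly quoted value; an
# assumption here, the slides do not state it).
D_CONST = 4.148808e3

def dispersion_delay_s(dm, f_low_mhz, f_high_mhz):
    """Extra arrival delay (seconds) of f_low relative to f_high.

    dm is the dispersion measure in pc/cm^3; frequencies are in MHz.
    """
    return D_CONST * dm * (f_low_mhz ** -2 - f_high_mhz ** -2)

# Smearing across the 1400-1500 MHz band for DM = 50 pc/cm^3.
smear = dispersion_delay_s(50.0, 1400.0, 1500.0)
print(f"smearing across the band: {smear * 1e3:.2f} ms")  # about 13.64 ms
```

Under these assumptions the bottom of the band arrives roughly 13.6 ms after the top, which is the smearing that de-dispersion has to undo.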
Coherent De-dispersion • Coherent de-dispersion is the most accurate method of removing the dispersion effects of the Interstellar Medium (ISM) • Preserves the amplitude and phase information of the received signal • Convolve the voltage signal with the inverse transfer function of the ISM • This transfer function depends on the Dispersion Measure (DM) of the signal, obtained from models of the galactic electron density • In practice, the Fast Fourier Transform (FFT) turns the convolution into a multiplication in the frequency domain, followed by an inverse FFT
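The FFT, multiply, inverse-FFT recipe can be checked on a toy signal. The sketch below is a minimal CPU illustration, not the GPU implementation described in this talk: it disperses an impulse with a phase-only transfer function H and then recovers it by multiplying with H⁻¹, which for a unit-modulus H is simply its complex conjugate. The chirp form follows the expression commonly given in the pulsar literature (for example [1]); its sign convention, the helper names and the tiny 64-point transform are assumptions made for illustration. A realistic chirp for these parameters spans far more than 64 samples, so here it simply wraps around circularly.

```python
import cmath
import math

D_CONST = 4.148808e3  # dispersion constant, s MHz^2 pc^-1 cm^3 (assumed value)

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spec):
    """Inverse FFT with the usual 1/n normalisation."""
    n = len(spec)
    return [v / n for v in fft(spec, inverse=True)]

def ism_transfer_function(n, f0_mhz, bw_mhz, dm):
    """Phase-only ISM transfer function H sampled at the n FFT bins.

    Bin k maps to a baseband offset f in [-bw/2, bw/2) MHz around f0.
    The sign of the phase is a convention and differs between references.
    """
    h = []
    for k in range(n):
        kk = k if k < n // 2 else k - n
        f = kk * bw_mhz / n
        phase = (2.0 * math.pi * 1e6 * D_CONST * dm * f * f
                 / (f0_mhz ** 2 * (f0_mhz + f)))
        h.append(cmath.exp(1j * phase))
    return h

def apply_filter(signal, filt):
    """Circular convolution via FFT -> per-bin multiply -> inverse FFT."""
    spec = fft(signal)
    return ifft([s * h for s, h in zip(spec, filt)])

n = 64
impulse = [0j] * n
impulse[5] = 1 + 0j

h = ism_transfer_function(n, f0_mhz=1450.0, bw_mhz=100.0, dm=50.0)
h_inv = [v.conjugate() for v in h]       # |H| = 1, so H^-1 = conj(H)

dispersed = apply_filter(impulse, h)     # what the ISM does to the pulse
recovered = apply_filter(dispersed, h_inv)  # coherent de-dispersion
```

Because H only rotates phases, the dispersed signal keeps the same total energy as the impulse, and `recovered` matches the original impulse to within floating-point rounding.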
Motivation • Why study pulsars? • A major SKA science driver: detection of gravitational waves, tests of strong-field relativity and analysis of black holes • GPU acceleration for MeerKAT • Large frequency range (Low: 0.5 – 2.5 GHz, High: 8 – 14.5 GHz) • High bandwidth per polarisation (4 GHz final) • Large number of channels (16384) • >10 GB of data per second • Even more important for SKA, since precision will be a high priority and storing the data is not feasible
Implementation Considerations • Both CPU and GPU were tested with single-precision floating point • A bottleneck for GPU computing is the time taken to send data to the GPU from main memory, so transfers should be minimised • Use asynchronous data transfers to hide the latency • Re-calculate values on the GPU rather than copying them across • Use shared memory on the GPU for calculations and store to global memory at the end • The source data file is simulated dual-polarisation data generated with a DM of 50 pc/cm³ and a 100 MHz bandwidth centred on 1450 MHz
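These parameters also set how large each transform has to be, which matters when memory transfers are the bottleneck. As a rough sizing sketch under stated assumptions: the cold-plasma dispersion relation gives about 13.64 ms of smearing across 1400 to 1500 MHz at a DM of 50 pc/cm³, the transform is taken to be a power of two at least twice the smearing length (a common rule of thumb, not something stated in the slides), and each single-precision complex sample occupies 8 bytes. The function names are hypothetical.

```python
import math

def min_fft_length(smear_s, bandwidth_hz, margin=2):
    """Smallest power-of-two transform length covering the smearing time.

    Assumes complex sampling at the full bandwidth, so the smearing spans
    smear_s * bandwidth_hz samples; margin=2 is a rule-of-thumb assumption.
    """
    needed = margin * math.ceil(smear_s * bandwidth_hz)
    n = 1
    while n < needed:
        n *= 2
    return n

def buffer_bytes(n_samples, bytes_per_sample=8):
    """Size of one single-precision complex buffer (2 x 4-byte floats)."""
    return n_samples * bytes_per_sample

n = min_fft_length(smear_s=13.64e-3, bandwidth_hz=100e6)
print(n, buffer_bytes(n) / 2 ** 20, "MiB per buffer per polarisation")
# 4194304 32.0 MiB per buffer per polarisation
```

The transfer function needs a buffer of the same length, which is one reason to compute it on the GPU instead of copying it across.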
Basic Program Flow
HOST: Read in data → Allocate memory on GPU → Copy to GPU memory → Initiate GPU kernel → Receive de-dispersed signal → Free memory
DEVICE: Begin de-dispersion → Parallel FFTs → Multiply each frequency bin by the inverse transfer function: V(f0)·H⁻¹(f0), V(f1)·H⁻¹(f1), ..., V(fn)·H⁻¹(fn) → Inverse FFTs → Combine (+) into the output array → Send data back to host
Figure 4: Program flow
Results Figure 5: Left: Overall speedup (5x) Right: Kernel Speedup (12x)
Results • A single GPU was able to coherently de-disperse 50 MHz of bandwidth • Two GPUs were used for the full 100 MHz • Scaling across multiple GPUs was linear • Larger transfer functions were found to increase performance, since memory access overhead was proportionally smaller
Conclusion • GPUs are significantly faster than CPUs for de-dispersion • Enabled real-time coherent de-dispersion for the dataset used • Coherent de-dispersion of a 100MHz bandwidth signal requires multiple GPUs at present • Faster memory access would greatly improve overall speedup • Currently testing with real undetected pulsar data
Questions? Thank You!
References
[1] D. R. Lorimer and M. Kramer, Handbook of Pulsar Astronomy, Cambridge University Press, 2005.
[2] NVIDIA, CUDA Programming Guide.
[3] D. Manchester, “CSIRO ATNF Pulsar Education Page”.
[4] J. Cordes, “The SKA as a Radio Synoptic Survey Telescope: Widefield Surveys for Transients, Pulsars and ETI”, SKA Memo 97.
[5] John Rowe Animation/Australia Telescope National Facility, CSIRO [Online]. http://www.atnf.csiro.au/research/pulsar/array/gallery.html
[6] Cornell University Dept. of Astronomy, “Legacy Pulsars: Homepage” [Online]. http://arecibo.tc.cornell.edu/legacypulsardata/Default.aspx
[7] VR-Zone, “The NVIDIA GeForce GTX 280 1GB bare” [Online]. http://vr-zone.com/articles/nvidia-geforce-gtx-280-preview/5872.html?doc=5872