Accelerating Coherent Pulsar De-dispersion on Graphics Processing Units

by Arjun Radhakrishnan, supervised by Prof. Michael Inggs

Outline • Graphics Processing Units (GPUs) • Pulsars • Pulsar De-dispersion • Motivation • Implementation • Results • Conclusion & Future Work

Graphics Processing Units • GPUs are massively parallel processors found on consumer graphics cards • Generally used to render 3D objects on screen and calculate the colour of each pixel to display (image source: [7]) • Are mass-market products, driven by the video game industry • Performance tracks Moore's Law because most of the on-chip area is devoted to compute units, whereas CPUs devote much of theirs to cache

Why Use GPUs? Figure 1: Peak floating point performance of NVIDIA GPUs vs Intel CPUs [2]

Pulsars • Highly magnetised, rapidly rotating neutron stars formed after a supernova • Pulsars emit beams of electromagnetic radiation from their magnetic poles • The beams sweep in a circular path as the star rotates, known as the “lighthouse effect” • A periodic pulse is observed each time a beam sweeps past Earth Figure 2: Pulsar Model [3]

Pulsar Dispersion • Pulsar emissions are distorted upon passing through the ionised Interstellar Medium (ISM) • Lower frequency components of the pulse are delayed more than higher frequencies

Pulsar De-dispersion • Correct for the dispersion by shifting the received signal in time by a frequency-dependent amount, so that all frequency components of a pulse line up Figure 3: Pulsar De-dispersion [4]
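The size of the shift can be made concrete with the standard cold-plasma dispersion relation. The expression below is a sketch: it is not taken from the slides, and the value of the dispersion constant is the commonly quoted one. The worked example assumes a DM of 50 pc/cm^3 across a 1400 to 1500 MHz band, matching the test data described later in the talk.

```latex
% Delay of the lower band edge relative to the upper band edge
\Delta t \simeq \mathcal{D} \,\mathrm{DM}\,
  \left( f_{\mathrm{lo}}^{-2} - f_{\mathrm{hi}}^{-2} \right),
\qquad
\mathcal{D} \approx 4.1488 \times 10^{3}\ \mathrm{MHz^{2}\,pc^{-1}\,cm^{3}\,s}

% Example: DM = 50 pc cm^-3, f_lo = 1400 MHz, f_hi = 1500 MHz
\Delta t \approx 4.1488 \times 10^{3} \times 50 \times
  \left( 1400^{-2} - 1500^{-2} \right)\ \mathrm{s}
  \approx 13.6\ \mathrm{ms}
```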

Coherent De-dispersion • Coherent de-dispersion is the most accurate method of removing the dispersion effects of the Interstellar Medium (ISM) • Preserves the amplitude and phase information of the received signal • Convolve the voltage signal with the inverse of the ISM transfer function • This transfer function depends on the Dispersion Measure (DM) of the signal, which is obtained from models of the galactic electron density • In practice, the Fast Fourier Transform (FFT) turns the convolution into a multiplication in the frequency domain, after which an inverse FFT returns the signal to the time domain
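The FFT, multiply, inverse-FFT procedure can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the code used in the talk. The function names, the small recursive FFT, and the demo parameters are all assumptions. The transfer function follows the usual textbook chirp form; the sign of its exponent depends on the FFT and sideband conventions in use. The demo uses a deliberately small, hypothetical DM of 0.01 pc/cm^3 so that the smearing fits inside a 1024-point FFT.

```python
import cmath
import math

# Commonly quoted dispersion constant in MHz^2 pc^-1 cm^3 s (not from the slides).
DISPERSION_CONSTANT = 4.148808e3


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT with the 1/n normalisation applied."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def ism_transfer_function(n, dm, f_centre_mhz, bandwidth_mhz):
    """Sample the ISM chirp H(f0 + f) at the n FFT bins of a complex baseband signal."""
    h = []
    for k in range(n):
        # FFT bin order: the first half holds non-negative offsets, the rest negative.
        kk = k if k < n // 2 else k - n
        f = kk * bandwidth_mhz / n  # offset from the centre frequency, in MHz
        # The factor 1e6 converts the MHz * s product into cycles.
        phase = 2.0 * math.pi * 1e6 * DISPERSION_CONSTANT * dm * f * f / (
            (f_centre_mhz + f) * f_centre_mhz ** 2)
        h.append(cmath.exp(1j * phase))
    return h


def dedisperse(voltages, dm, f_centre_mhz, bandwidth_mhz):
    """Multiply the spectrum by the inverse transfer function and transform back."""
    h = ism_transfer_function(len(voltages), dm, f_centre_mhz, bandwidth_mhz)
    spectrum = fft(voltages)
    # |H| = 1, so the inverse transfer function is simply the complex conjugate.
    corrected = [v * hk.conjugate() for v, hk in zip(spectrum, h)]
    return ifft(corrected)


if __name__ == "__main__":
    n, dm, f0, bw = 1024, 0.01, 1450.0, 100.0  # hypothetical demo values
    impulse = [0j] * n
    impulse[n // 2] = 1 + 0j
    # Simulate the ISM by applying H, then undo it with dedisperse().
    h = ism_transfer_function(n, dm, f0, bw)
    dispersed = ifft([v * hk for v, hk in zip(fft(impulse), h)])
    recovered = dedisperse(dispersed, dm, f0, bw)
    print("peak after dispersion:   ", max(abs(v) for v in dispersed))
    print("peak after de-dispersion:", max(abs(v) for v in recovered))
```

Because the transfer function has unit magnitude, de-dispersion changes only the phase of each frequency bin. This is why the method preserves the amplitude and phase information of the signal. On a GPU, the per-bin multiplication maps naturally onto one thread per frequency bin.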

Motivation • Why study pulsars? • A major SKA science driver: detection of gravitational waves, tests of strong-field relativity, and analysis of black holes • GPU acceleration for MeerKAT • Large frequency range (Low: 0.5 to 2.5 GHz, High: 8 to 14.5 GHz) • High bandwidth per polarisation (4 GHz final) • Large number of channels (16384) • More than 10 GB of data per second • Even more important for the SKA, since precision will be a high priority and storing the raw data is not feasible

Implementation Considerations • Both the CPU and GPU versions were tested with single-precision floating point • A bottleneck for GPU computing is the time taken to send data to the GPU from main memory, so transfers are minimised as much as possible • Use asynchronous data transfers to hide the latency • Re-calculate values on the GPU rather than copying them across • Use shared memory on the GPU for calculations and store to global memory only at the end • The source data file is simulated dual-polarisation data generated with a DM of 50 pc/cm³ and a 100 MHz bandwidth centred on 1450 MHz
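The test-data parameters above give a rough idea of the FFT sizes and buffer sizes involved. The following is a back-of-the-envelope sketch, not a description of the actual implementation. It assumes complex baseband sampling at the bandwidth, an FFT that must be longer than the dispersive smearing, and 8 bytes per single-precision complex sample. The function names are hypothetical.

```python
import math

# Commonly quoted dispersion constant in MHz^2 pc^-1 cm^3 s (not from the slides).
DISPERSION_CONSTANT = 4.148808e3


def smearing_samples(dm, f_centre_mhz, bandwidth_mhz):
    """Number of samples spanned by the dispersive delay across the band."""
    f_lo = f_centre_mhz - bandwidth_mhz / 2
    f_hi = f_centre_mhz + bandwidth_mhz / 2
    delay_s = DISPERSION_CONSTANT * dm * (f_lo ** -2 - f_hi ** -2)
    # Assumption: complex sampling, one sample every 1 / bandwidth seconds.
    return math.ceil(delay_s * bandwidth_mhz * 1e6)


def min_fft_length(n_smear):
    """Smallest power of two strictly greater than the smearing length."""
    return 1 << n_smear.bit_length()


def buffer_bytes(n_fft, n_pol=2, bytes_per_sample=8):
    """Size of one FFT-length buffer per polarisation (single-precision complex)."""
    return n_fft * n_pol * bytes_per_sample


if __name__ == "__main__":
    n_smear = smearing_samples(dm=50.0, f_centre_mhz=1450.0, bandwidth_mhz=100.0)
    n_fft = min_fft_length(n_smear)
    print(f"smearing: {n_smear} samples")
    print(f"minimum power-of-two FFT length: {n_fft}")
    print(f"buffer for both polarisations: {buffer_bytes(n_fft) / 2**20:.0f} MiB")
```

Under these assumptions the smearing is on the order of a million samples. Each host-to-device transfer is therefore tens of megabytes, which is consistent with the emphasis on hiding transfer latency.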

Basic Program Flow • HOST: Read in data → Allocate memory on GPU → Copy to GPU memory → Initiate GPU kernel • DEVICE: Begin de-dispersion → Parallel FFTs → Multiply V(f0)·H⁻¹(f0), V(f1)·H⁻¹(f1), ..., V(fn)·H⁻¹(fn) → Inverse FFTs → Combine (+) into output array → Send data back to host • HOST: Receive de-dispersed signal → Free memory Figure 4: Program flow

Results Figure 5: Left: overall speedup (5x). Right: kernel speedup (12x).

Results • A single GPU was able to coherently de-disperse 50 MHz of bandwidth • Two GPUs were used for the full 100 MHz • Scaling across multiple GPUs was linear • Using larger transfer functions was found to increase performance, since the memory access overhead was lower

Conclusion • GPUs are significantly faster than CPUs for de-dispersion (12x kernel speedup, 5x overall) • Enabled real-time coherent de-dispersion for the dataset used • Coherent de-dispersion of a 100 MHz bandwidth signal requires multiple GPUs at present • Faster memory access would greatly improve the overall speedup • Currently testing with real undetected (pre-detection voltage) pulsar data

Questions? Thank You!

References [1] D. R. Lorimer and M. Kramer, Handbook of Pulsar Astronomy, Cambridge University Press, 2005. [2] NVIDIA, CUDA Programming Guide. [3] D. Manchester, “CSIRO ATNF Pulsar Education Page”. [4] J. Cordes, “The SKA as a Radio Synoptic Survey Telescope: Widefield Surveys for Transients, Pulsars and ETI”, SKA Memo 97. [5] John Rowe Animation/Australia Telescope National Facility, CSIRO [Online]. http://www.atnf.csiro.au/research/pulsar/array/gallery.html [6] Cornell University Dept. of Astronomy, “Legacy Pulsars: Homepage” [Online]. http://arecibo.tc.cornell.edu/legacypulsardata/Default.aspx [7] VR-Zone, “The NVIDIA GeForce GTX 280 1GB bare” [Online]. http://vr-zone.com/articles/nvidia-geforce-gtx-280-preview/5872.html?doc=5872
