Accelerating a Software Radio Astronomy Correlator

By Andrew Woods. Supervisors: Prof. Inggs & Dr Langman

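Correlator
• Radio telescopes have many separate antennas.
• A correlator combines the antenna signals so that high-resolution images can be produced.
• It does this by cross-correlating the signal from each antenna with that of every other antenna.
• For large inputs, correlating in the frequency domain is more efficient.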

FPGA
• Used two Nallatech H101 boards.
• Each board has a Virtex-4 LX100 (V4LX100) FPGA, a PCI-X interface, 16 MB SRAM and 512 MB DDR2.
• Programmed with the Dime-C tools; Dime-C is a C-like language aimed at software acceleration.
• Minus: the FPGA design achieved clock rates of only around 100 MHz.
• Plus: custom hardware can be created for the application, with parallel execution and pipelining.
(Figure: HPRC card)

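GPUs
• Processing monsters.
• Achieved by spending little chip area on cache and control logic, leaving more for arithmetic units.
• Used to be fixed-function; recently became programmable.
• People started using pixel shaders for general-purpose processing (GPP).
• Nvidia has released CUDA, a language specifically for general-purpose (GP) computing on GPUs.
• Used an Nvidia 8800 GT: 112 pixel shaders (stream processors) @ 1.5 GHz.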

FX Correlator
• Three steps: FFT the signal from each antenna (F), multiply each antenna's spectrum with that of every other antenna (X), then integrate the products over time.
• The multiplication dominates the computation, so it was the function implemented on the FPGA and the GPU.
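The three steps above can be sketched in plain Python. This is a minimal illustration, not the implementation used on the FPGA or GPU: the function names `dft` and `fx_correlate` are hypothetical, a naive DFT stands in for a real FFT, and including autocorrelations (pairs with i == j) is an assumption.

```python
import cmath

def dft(samples):
    """Naive O(M^2) discrete Fourier transform; stands in for a real FFT."""
    m = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / m) for t in range(m))
        for k in range(m)
    ]

def fx_correlate(antennas, fft_len):
    """antennas: one equal-length list of samples per antenna.

    Returns {(i, j): integrated spectrum} for every antenna pair with i <= j.
    """
    n = len(antennas)
    n_blocks = len(antennas[0]) // fft_len
    vis = {(i, j): [0j] * fft_len for i in range(n) for j in range(i, n)}
    for b in range(n_blocks):
        # F step: transform one block of samples per antenna.
        spectra = [dft(a[b * fft_len:(b + 1) * fft_len]) for a in antennas]
        # X step plus integration: multiply each pair and accumulate over blocks.
        for (i, j), acc in vis.items():
            for k in range(fft_len):
                acc[k] += spectra[i][k] * spectra[j][k].conjugate()
    return vis
```

The inner loops over pairs and frequency channels are the multiplication stage that the slides identify as the dominant cost.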

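Correlation Graphically [1]
(Figure: correlation products for frequency channels Freq 0 to Freq M. About N^2/2 products per channel per time step, N^2/2 × integration length per channel, and N^2/2 × integration length × number of frequencies in total.)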
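The scaling in the figure can be turned into a small operation count. This is a sketch under assumptions: N is taken to be the number of antennas, the function name `correlation_macs` is hypothetical, and the `exact` option (N(N+1)/2 pairs, which counts autocorrelations) is an assumption, since the slide only gives the N^2/2 approximation.

```python
def correlation_macs(n_antennas, integration_length, n_freqs, exact=False):
    """Count complex multiply-accumulates in the multiplication stage."""
    if exact:
        # Cross pairs plus autocorrelations (assumed; not stated on the slide).
        pairs = n_antennas * (n_antennas + 1) // 2
    else:
        # The slide's N^2/2 approximation.
        pairs = n_antennas ** 2 / 2
    return pairs * integration_length * n_freqs

if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the presentation.
    print(correlation_macs(64, 1000, 256))
    print(correlation_macs(64, 1000, 256, exact=True))
```

The count grows quadratically with the number of antennas but only linearly with integration length and channel count, which is why the pair-wise multiplication is the stage worth accelerating.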

FPGA Design
• We were able to implement 96 floating-point units.
• Created a pipelined engine that computes a single output for three time steps and integrates the result.
• Four of these engines fit on the FPGA, so four frequencies can be computed at a time.
• Speedup of ~3x vs. a 3 GHz Xeon (SSE).
• Reaching ~85% of theoretical peak (excluding transfers).
(Figure: engines for Freq 0 to Freq 3 running in parallel over clock cycles 0, 1, ... N^2/2.)
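A rough budget shows how these figures could fit together. The slides do not say how the 96 floating-point units are divided, so the breakdown below is an assumption: 8 floating-point operations per complex multiply-accumulate, one such multiply-accumulate per time step per engine, and one operation per unit per clock cycle at the ~100 MHz mentioned earlier. Under those assumptions, 4 engines × 3 time steps × 8 operations gives exactly 96 units. The name `fpga_budget` is hypothetical.

```python
# Assumed cost of one complex multiply-accumulate:
# 4 multiplies + 2 adds for the complex product, 2 adds to accumulate.
FLOPS_PER_CMAC = 8

def fpga_budget(engines=4, time_steps=3, clock_hz=100e6, efficiency=0.85):
    """Return (floating-point units, peak FLOP/s, sustained FLOP/s)."""
    units = engines * time_steps * FLOPS_PER_CMAC
    peak = units * clock_hz            # assumes one operation per unit per cycle
    return units, peak, peak * efficiency

if __name__ == "__main__":
    units, peak, sustained = fpga_budget()
    print(units, peak / 1e9, sustained / 1e9)   # units, then GFLOP/s figures
```

With these inputs the peak works out to about 9.6 GFLOP/s and the sustained rate to roughly 8.2 GFLOP/s, excluding transfers. These are estimates derived from the assumptions above, not measurements from the presentation.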

GPU Design [1]
• Works on thread parallelism; each thread executes on a pixel shader.
• CUDA uses lightweight threads.
• Created a thread for each output (plus some redundant ones), then integrated.
• Speedup of ~5x vs. a 3 GHz Xeon (SSE).
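The thread-per-output idea can be emulated sequentially in Python. This is a sketch, not the CUDA kernel from the presentation or from [1]: it assumes the threads are laid out as a square N × N grid, so that threads below the diagonal are the redundant ones and simply exit. The names `run_thread` and `launch` are hypothetical.

```python
def run_thread(tid, n, spectra, integration_length):
    """Work of one emulated thread; returns None for a redundant thread."""
    i, j = divmod(tid, n)          # map the flat thread index to an antenna pair
    if j < i:
        return None                # redundant: pair (j, i) is handled elsewhere
    acc = 0j
    for t in range(integration_length):
        # Multiply one pair for one time step and integrate.
        acc += spectra[t][i] * spectra[t][j].conjugate()
    return (i, j), acc

def launch(spectra):
    """spectra[t][a]: complex value of antenna a at time step t, one frequency."""
    n = len(spectra[0])
    results = {}
    for tid in range(n * n):       # sequential here; concurrent threads on a GPU
        out = run_thread(tid, n, spectra, len(spectra))
        if out is not None:
            results[out[0]] = out[1]
    return results
```

Launching a full square grid keeps the index arithmetic trivial at the cost of idle threads, which is cheap when threads are lightweight.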

Findings
GPU vs. Nallatech FPGA:
• The GPU required considerably less effort.
• The GPU performed better.
• The GPU was much cheaper (~20x).
• There are still a lot of areas where more performance can be squeezed out (Chris Harris).
In defense of FPGAs:
• The Virtex-5 can achieve higher clock rates (up to 500 MHz).
• 96 multipliers on the V4LX100 is not enough; the V5SX240 has 1,056.
• About 25% of the time was spent on transfers over the older PCI-X bus.
• FPGAs are more power efficient.

References
[1] Chris Harris et al., The University of Western Australia (UWA), "GPU Accelerated Radio Astronomy Signal Convolution", Experimental Astronomy, 2008.

Questions
