
Accelerating a Software Radio Astronomy Correlator

By Andrew Woods. Supervisors: Prof. Inggs and Dr Langman.





Presentation Transcript


  1. Accelerating a Software Radio Astronomy Correlator. By Andrew Woods. Supervisors: Prof. Inggs and Dr Langman.

  2. Correlator
     • Radio telescopes have many separate antennas.
     • A correlator combines their signals to produce high-resolution images.
     • It does this by correlating the antenna signals with one another.
     • Frequency-domain correlation is better for large numbers of inputs.

  3. FPGA
     • Used two Nallatech H101 boards.
     • Each has a Virtex-4 LX100 (V4LX100) FPGA, a PCI-X interface, 16 MB SRAM and 512 MB DDR2.
     • Programmed with the Dime-C tools, a C-like language aimed at software acceleration.
     • Con: the FPGA achieved clock rates of only around 100 MHz.
     • Pro: custom hardware can be created for the application, with parallel execution and pipelining.
     [Figure: HPRC card]

  4. GPUs
     • Processing monsters.
     • Achieved by spending little of the chip on cache and control.
     • Used to be fixed-function; recently became programmable.
     • People started using pixel shaders for general-purpose processing (GPP).
     • Nvidia has released CUDA, a language specifically for general-purpose (GP) computing.
     • Used an Nvidia 8800 GT: 112 pixel shaders at 1.5 GHz.

  5. FX Correlator
     • Three steps for each antenna: an FFT, then multiplication with every other antenna, then integration.
     • The multiplication, being the dominant area of computation, was the function implemented on the FPGA and GPU.
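
The three steps can be sketched in a few lines of Python. This is only an illustration, not the code from the project: the function names `dft` and `fx_correlate`, the data layout, and the use of a naive DFT in place of a real FFT are all assumptions made here to keep the example self-contained.

```python
import cmath

def dft(samples):
    """Naive O(M^2) discrete Fourier transform, standing in for an FFT."""
    m = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / m)
                for t in range(m))
            for k in range(m)]

def fx_correlate(antennas, fft_len):
    """antennas: one equal-length list of samples per antenna (hypothetical layout).

    Returns {(i, j): [integrated product per frequency channel]} for i <= j.
    """
    n = len(antennas)
    n_blocks = len(antennas[0]) // fft_len
    vis = {(i, j): [0j] * fft_len for i in range(n) for j in range(i, n)}
    for b in range(n_blocks):
        # F step: transform one block of samples from every antenna.
        spectra = [dft(a[b * fft_len:(b + 1) * fft_len]) for a in antennas]
        # X step: multiply each antenna pair per channel, then integrate
        # by accumulating the products over successive blocks.
        for (i, j), acc in vis.items():
            for f in range(fft_len):
                acc[f] += spectra[i][f] * spectra[j][f].conjugate()
    return vis
```

Feeding it two antennas that see the same tone, one shifted by 90 degrees, gives a cross product whose phase reflects that shift, while the channels without signal stay at zero.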

  6. Correlation Graphically [1]
     [Figure: correlation workload. Labels: Freq 0 … Freq M; N^2/2; N^2/2 x int length; N^2/2 x int length x Freq]
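
The labels on this slide describe how the multiplication count grows. A rough sketch of that arithmetic follows; the function names and the `include_autos` switch are assumptions made for illustration, and N^2/2 is treated as an approximation of the exact number of antenna pairs.

```python
def baselines(n_antennas, include_autos=True):
    """Exact number of antenna pairs; N^2/2 approximates both variants."""
    n = n_antennas
    if include_autos:
        return n * (n + 1) // 2  # every pair plus each antenna with itself
    return n * (n - 1) // 2      # distinct pairs only

def multiplies(n_antennas, int_length, n_freq, include_autos=True):
    """Complex multiplies for one integration: pairs x int length x channels."""
    return baselines(n_antennas, include_autos) * int_length * n_freq
```

For 64 antennas, N^2/2 is 2048, which sits between the two exact counts of 2016 (distinct pairs) and 2080 (with autocorrelations).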

  7. FPGA Design
     • We were able to implement 96 floating-point units.
     • Created a pipelined engine that computes a single output for three time steps and integrates them.
     • Four of these engines fit on the device, so four frequencies are computed at a time.
     • Speedup of ~3x vs. a 3 GHz Xeon (SSE).
     • Reaching ~85% of theoretical peak (excluding transfers).
     [Figure: engines for Freq 0, Freq 1, Freq 2 and Freq 3 running over clock cycle 0, clock cycle 1, … clock cycle N^2/2]
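
The engine arrangement can be emulated in software to make the schedule concrete. This is a sketch under assumptions that the slides do not state: that each engine emits one antenna pair per clock cycle, and that a complex multiply-accumulate costs 8 floating-point operations (4 multiplies and 2 adds for the product, 2 adds to accumulate). Under that second assumption, 3 time steps on 4 engines would account for the 96 units, but the slides do not confirm this breakdown. All names below are hypothetical.

```python
FLOPS_PER_CMAC = 8  # assumed: 4 mul + 2 add for the product, 2 add to accumulate
TIME_STEPS = 3      # from the slide: each output covers three time steps
ENGINES = 4         # from the slide: four engines, one frequency each

def fp_units():
    """Floating-point units needed under the assumed breakdown."""
    return FLOPS_PER_CMAC * TIME_STEPS * ENGINES

def engine_output(xi, xj):
    """One engine result: the products for all time steps, integrated."""
    return sum(a * b.conjugate() for a, b in zip(xi, xj))

def run(spectra):
    """spectra[f][antenna][t] for ENGINES frequencies (hypothetical layout).

    Each loop iteration models one clock cycle: every engine works on the
    same antenna pair, each for its own frequency.
    """
    n = len(spectra[0])
    pairs = [(i, j) for i in range(n) for j in range(i, n)]
    out = []
    for i, j in pairs:
        out.append([engine_output(spectra[f][i], spectra[f][j])
                    for f in range(ENGINES)])
    return out
```

In this model, `run` takes one cycle per antenna pair, which lines up with the figure running to clock cycle N^2/2.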

  8. GPU Design [1]
     • Works on thread parallelism; each thread executes on a pixel shader.
     • CUDA uses lightweight threads.
     • Created a thread for each output (plus redundant ones), then integrated.
     • Speedup of ~5x vs. a 3 GHz Xeon (SSE).
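
The thread-per-output idea can be emulated in plain Python, with a function call standing in for each GPU thread. This is a sketch, not the CUDA kernel from the project. It assumes one possible reading of "redundant ones": a square grid of N x N threads is launched and the threads below the diagonal exit without doing work, because their result is the conjugate of one already computed. The names `correlate_thread` and `launch` are hypothetical.

```python
def correlate_thread(i, j, spectra, out):
    """Work done by the emulated thread with indices (i, j)."""
    if i > j:
        return  # redundant thread: pair (j, i) already covers this output
    acc = 0j
    for t in range(len(spectra[i])):
        # Multiply the two antennas' samples and integrate over time steps.
        acc += spectra[i][t] * spectra[j][t].conjugate()
    out[(i, j)] = acc

def launch(spectra):
    """spectra[antenna][t] for one frequency channel (hypothetical layout).

    The nested loops stand in for the N x N grid that a GPU would run in
    parallel; every (i, j) is independent of the others.
    """
    n = len(spectra)
    out = {}
    for i in range(n):
        for j in range(n):
            correlate_thread(i, j, spectra, out)
    return out
```

Launching a full square grid keeps the index arithmetic trivial at the cost of some idle threads, which is cheap when threads are lightweight.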

  9. Findings
     • The GPU vs. the Nallatech FPGA:
       - The GPU required considerably less effort.
       - It performed better.
       - It was much cheaper (~20x).
       - There is still plenty of room to squeeze out more performance (Chris Harris).
     • In defense of FPGAs:
       - The Virtex-5 can achieve higher clock rates (up to 500 MHz).
       - The 96 multipliers on the V4LX100 are not enough; the V5SX240 has 1,056.
       - About 25% of the time was spent on transfers over the older PCI-X bus.
       - FPGAs are more power efficient.

  10. References
     • [1] Chris Harris et al., The University of Western Australia (UWA), "GPU Accelerated Radio Astronomy Signal Convolution", Experimental Astronomy, 2008.

  11. Questions
