Explore GPU replacement in ATCA correlator for improved flexibility, reliability, and support. Utilize ASKAP components for filterbank and cross-correlation. Upgrade samplers and switch to enhance data processing efficiency.
ATCA GPU Correlator Strawman Design
Chris Phillips | LBA Lead Scientist
17 November 2015
Astronomy and Space Science
ATCA
• Current CABB backend
• 6 antennas
• 8 GHz analog IF, dual pol
• 2 x 2 GHz IFs, dual pol
• 11-bit samplers, FPGA backend
• 2048 channel / 1 MHz continuum mode
• Various spectral line modes with "Zoom Bands"
  • E.g. 4 MHz/2048 channels, 64 MHz/2048 channels
• Many designed modes never implemented
• Hardware unreliable and difficult to change modes
• No full-bandwidth tied-array mode
GPU replacement?
• Could we improve flexibility, reliability and long-term support with a GPU backend?
• New samplers, which talk directly to optical fibre
• Use ASKAP "redback" boards for coarse filterbank (128 MHz) and Ethernet packetisation (10 GbE)
• Fine filterbank and cross-correlation in GPU – Ethernet cross-connect (see the sketch below)
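To make the "fine filterbank and cross-correlation in GPU" stage concrete, here is a minimal NumPy sketch of an FX-style correlator operating on one coarse channel. It is illustrative only: the array shapes, fine-channel count, plain FFT (rather than a polyphase filterbank) and the absence of delay tracking/fringe rotation are simplifying assumptions, and a real implementation would run in CUDA on the GPUs rather than NumPy.

```python
import numpy as np

def fx_correlate(voltages, n_fine=1024):
    """Toy FX correlator for one 128 MHz coarse channel.

    voltages : complex array (n_ant, n_samples) of requantised samples
               from the redback coarse filterbank.
    Returns  : time-averaged visibilities of shape (n_baseline, n_fine).
    """
    n_ant, n_samp = voltages.shape
    n_spectra = n_samp // n_fine

    # F stage: fine filterbank (plain FFT here; a real design would use a PFB)
    spectra = np.fft.fft(
        voltages[:, :n_spectra * n_fine].reshape(n_ant, n_spectra, n_fine),
        axis=-1)

    # X stage: cross-multiply every antenna pair and accumulate in time
    baselines = [(i, j) for i in range(n_ant) for j in range(i, n_ant)]
    vis = np.zeros((len(baselines), n_fine), dtype=np.complex64)
    for b, (i, j) in enumerate(baselines):
        vis[b] = (spectra[i] * np.conj(spectra[j])).mean(axis=0)
    return vis

# Example: 6 antennas, random data standing in for one coarse channel
rng = np.random.default_rng(42)
v = rng.standard_normal((6, 1 << 16)) + 1j * rng.standard_normal((6, 1 << 16))
vis = fx_correlate(v.astype(np.complex64))
print(vis.shape)   # (21, 1024): 15 cross + 6 auto correlations, 1024 fine channels
```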
Samplers
• 12-bit Texas Instruments ADC12J4000
• Transport: JESD204B
• No FPGA required at sampler, optical interface
• 8 lanes @ 8 Gb/s
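A back-of-envelope check of the link budget, assuming the ADC runs at 4 GS/s (enough to Nyquist-sample a 2 GHz band) and that JESD204B's 8b/10b encoding is the dominant overhead:

```python
# Hedged sanity check of the JESD204B link budget (assumptions noted inline).
sample_rate = 4e9        # samples/s, assuming full-rate operation for a 2 GHz band
bits_per_sample = 12
payload = sample_rate * bits_per_sample      # 48 Gb/s of raw samples
line_rate = payload * 10 / 8                 # ~60 Gb/s after 8b/10b encoding
capacity = 8 * 8e9                           # 8 lanes @ 8 Gb/s = 64 Gb/s
print(f"payload {payload/1e9:.0f} Gb/s, line rate {line_rate/1e9:.0f} Gb/s, "
      f"lane capacity {capacity/1e9:.0f} Gb/s")
# -> 48, 60, 64: the quoted 8 x 8 Gb/s lanes comfortably carry one 12-bit stream,
#    leaving margin for JESD204B framing/control characters.
```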
Redback
• ASKAP beamformer/correlator board
• 6 Xilinx Kintex-7 (420) FPGAs
• 1U chassis
• 36 x 10 Gbps Ethernet outputs
• 2 x 2 GHz IF per redback
• 2 redbacks / telescope
• Coarse filterbank – 128 MHz
• Need to divide data 16 ways
• Re-quantise to 8 bits to reduce I/O load (see the rate estimate below)
  • 16 bits would be better, but doubles backend cost
  • 12 bits may have minimal cost overhead
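A rough estimate of the output rate per redback after the coarse filterbank and 8-bit requantisation. The assumptions are mine, not stated on the slide: a critically sampled filterbank (16 complex channels of 128 MHz covering a 2 GHz band), 8-bit real and imaginary parts, and one dual-polarisation 2 GHz IF handled per board.

```python
# Rough per-redback output estimate (assumed mapping of one dual-pol 2 GHz IF
# per board, critically sampled 16 x 128 MHz coarse filterbank, 8+8-bit samples).
coarse_channels = 16                 # 2048 MHz / 128 MHz
chan_rate = 128e6                    # complex samples/s per coarse channel
bits_per_complex = 2 * 8             # 8-bit requantisation of real and imaginary parts
pols = 2

per_board = coarse_channels * chan_rate * bits_per_complex * pols   # bits/s
print(f"{per_board/1e9:.1f} Gb/s per redback")   # ~65.5 Gb/s
print(f"needs at least {per_board/10e9:.1f} of the 36 x 10 GbE outputs "
      "(before packet overheads)")
```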
Switch
• Commodity 40 Gbps Ethernet switch
• 64-port 40 GbE ~$30K
• Can run 4 x 10 GbE per 40 GbE port
• 8-bit system requires 56 x 40 GbE ports (one possible breakdown is sketched below)
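One allocation consistent with the quoted 56 ports; the exact breakdown is my reconstruction, not stated on the slide. The 12 redbacks feed the switch through 10 GbE links broken out 4-to-1 from 40 GbE ports, and each of the 32 GPU nodes (16 per IF, see the next slide) gets a dedicated 40 GbE port.

```python
import math

# One possible accounting for the 56 x 40 GbE ports (assumed, not from the slide).
redbacks = 6 * 2                                  # 2 boards per antenna
gbe10_per_redback = math.ceil(65.5 / 10) + 1      # ~7 links needed, +1 for headroom
redback_40g_ports = math.ceil(redbacks * gbe10_per_redback / 4)  # 4 x 10 GbE breakout
gpu_40g_ports = 16 * 2                            # one dedicated 40 GbE per GPU, 16 per IF
print(redback_40g_ports + gpu_40g_ports)          # 24 + 32 = 56
```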
GPU backend
• Need to frequency-slice data at least 12 ways to avoid a bottleneck on ingest (6 antennas, 2 pols)
• Use a factor of 16
• 16 GPUs per IF, dedicated 40 GbE Ethernet per GPU
• 2 GPUs/host plus 2 x 40 GbE NICs
• Don't implement zoom bands – filter data to the highest required spectral resolution, then frequency-average as appropriate
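Under the same data-rate assumptions as the redback estimate above, the per-GPU ingest rate with a 16-way frequency split sits comfortably below a dedicated 40 GbE link:

```python
# Per-GPU ingest with a 16-way frequency split (same assumptions as above).
antennas, pols = 6, 2
per_coarse_channel = 128e6 * 16                  # bits/s: 128 Msamp/s complex x (8+8) bits
per_gpu = antennas * pols * per_coarse_channel   # one coarse channel from every antenna/pol
print(f"{per_gpu/1e9:.1f} Gb/s per GPU")         # ~24.6 Gb/s, fits a dedicated 40 GbE link
```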
Costs: Assumptions ($A)
Costs: Total
Supported Modes
• Assuming the GPUs have enough computational power, basic interferometry modes are relatively easy to implement
• High spectral resolution, short integration times
• Tied array – multiple simultaneous beams (a minimal beamforming sketch follows)
  • I/O issues (but new GPUs have 2 copy engines and the NIC Tx side is relatively empty)
• Pulsar binning
• Need to also extract the noise cal
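As an illustration of the tied-array mode, a minimal sketch of forming multiple simultaneous beams from channelised antenna voltages. The phase model and array shapes are placeholders, not the actual ATCA delay/phase solution:

```python
import numpy as np

def tied_array_beams(spectra, phases):
    """Form phased-array (tied-array) beams from channelised voltages.

    spectra : complex array (n_ant, n_time, n_chan) of fine-filterbank data
    phases  : array (n_beam, n_ant, n_chan) of per-antenna phase corrections
              (placeholder for the real delay/phase model towards each beam)
    Returns : complex beams of shape (n_beam, n_time, n_chan)
    """
    weights = np.exp(1j * phases)                 # (n_beam, n_ant, n_chan)
    # Phase-rotate each antenna then sum coherently, for every beam at once
    return np.einsum('bac,atc->btc', weights, spectra)

# Example with random data: 6 antennas, 4 simultaneous beams
rng = np.random.default_rng(0)
spec = rng.standard_normal((6, 128, 1024)) + 1j * rng.standard_normal((6, 128, 1024))
ph = rng.uniform(0, 2 * np.pi, (4, 6, 1024))
beams = tied_array_beams(spec, ph)
print(beams.shape)    # (4, 128, 1024)
```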
Exotic Modes
• Pulsar coherent de-dispersion or binning
  • Just requires extra compute
• Fast radio bursts – serendipitous
  • Requires ~500 kHz spectral resolution, 64 µs time resolution
  • 246 Gbps visibility output (8 Gbps/node) – see the estimate below
  • Not really viable; need to detect on the 128 MHz data
  • Longer integration, lower spectral resolution?
• Transient buffer mode, external triggers
  • 3 GB/s incoming rate
  • Only buffer a few seconds – not viable without major cost (RAM)
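One set of assumptions that reproduces the quoted 246 Gbps figure: 15 cross-correlation baselines, 4 polarisation products, the full 2 x 2 GHz split into 500 kHz channels, 16+16-bit complex visibilities, dumped every 64 µs. The exact parameters are my reconstruction, not stated on the slide.

```python
# Reconstruction of the quoted FRB-mode visibility rate (assumed parameters).
baselines = 6 * 5 // 2                 # 15 cross-correlations (autos excluded)
pol_products = 4                       # XX, YY, XY, YX
channels = int(2 * 2048e6 / 500e3)     # 8192 channels across both 2 GHz IFs
bits_per_vis = 2 * 16                  # 16-bit real + 16-bit imaginary
dump_time = 64e-6                      # seconds

rate = baselines * pol_products * channels * bits_per_vis / dump_time
print(f"{rate/1e9:.0f} Gb/s total, {rate/1e9/32:.1f} Gb/s per GPU node")
# -> ~246 Gb/s total, ~7.7 Gb/s on each of the 32 GPU nodes
```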
Exotic Modes (cont)
• Nanosecond pulse detection (LUNASKA)
  • 2 GHz bandwidth gives 0.25 ns sampling time
  • But need the full bandwidth in one location
• Could receive 128 MHz channels on servers, then recirculate in round-robin fashion (see the corner-turn sketch below)
  • Not enough bandwidth to receive a second copy of the data without a second NIC and more switching capacity (dedicated InfiniBand?)
• Change the redback mode to buffer data, then round-robin the full 2 GHz data to GPUs – needs special redback FPGA firmware
• De-disperse antenna data, form tied-array beams and look for pulses
• Can dump voltage data if a candidate is found
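A schematic of the round-robin recirculation (corner turn) this would need: each GPU node initially holds one 128 MHz coarse channel for all times, and after the exchange each node holds the full band for its share of time slices, ready for full-bandwidth pulse searching. The node count and buffer shapes are illustrative assumptions.

```python
import numpy as np

# Corner-turn sketch: nodes swap (channel, time-slice) ownership so that each
# node ends up with the full band for a subset of time. Shapes are illustrative.
n_nodes = 16                      # one per 128 MHz coarse channel of a 2 GHz IF
n_slices = 16                     # time slices exchanged per buffer
samples_per_slice = 4096

# Before: node i holds coarse channel i for every time slice
before = np.arange(n_nodes * n_slices * samples_per_slice).reshape(
    n_nodes, n_slices, samples_per_slice)          # (channel, slice, samples)

# Round-robin exchange: node j receives slice j of every channel
after = before.transpose(1, 0, 2)                  # (slice, channel, samples)

# Node j now stitches its slices into full-band data and searches for pulses
assert after[3, 7, 0] == before[7, 3, 0]
print(after.shape)   # (16, 16, 4096): each node sees all 16 coarse channels
```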
Thank you
Astronomy and Space Science
Dr Chris Phillips, LBA Lead Scientist
t +61 2 9372 4608
e Chris.Phillips@csiro.au
w www.atnf.csiro.au