Low-Frequency Pulsar Surveys and Supercomputing. Matthew Bailes. Outline:. Baseband Instrumentation MultiBOB MWA survey vs PKSMB survey Data rates CPU times Low-Frequency Pulsar Monitoring The Future Supercomputers. Pulsar “Dedispersion”. Incoherent. Coherent Dedispersion.
Low-Frequency Pulsar Surveys and Supercomputing Matthew Bailes
Outline: • Baseband Instrumentation • MultiBOB • MWA survey vs PKSMB survey • Data rates • CPU times • Low-Frequency Pulsar Monitoring • The Future Supercomputers
Pulsar “Dedispersion” • Incoherent
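As an illustration of the incoherent case, here is a minimal Python sketch: each filterbank channel is shifted by its dispersion delay, rounded to a whole sample, and the channels are summed. The dispersion constant is the standard 4.148808 ms GHz^2 cm^3/pc. The function names, the toy band (170-230 MHz), the DM and the sampling time are illustrative assumptions, not values from the talk.

```python
K_DM_MS = 4.148808  # dispersion constant, ms GHz^2 cm^3 / pc

def dispersion_delay_ms(dm, f_ghz, f_ref_ghz):
    """Arrival delay (ms) at f_ghz relative to f_ref_ghz for a given DM (pc/cc)."""
    return K_DM_MS * dm * (f_ghz ** -2 - f_ref_ghz ** -2)

def incoherent_dedisperse(filterbank, freqs_ghz, dm, tsamp_ms):
    """Shift every channel by its whole-sample delay, then sum over channels.

    filterbank is a list of channels, each a list of detected power samples.
    Indexing is circular, which is only acceptable for a toy example.
    """
    f_ref = max(freqs_ghz)
    nsamp = len(filterbank[0])
    out = [0.0] * nsamp
    for chan, f in zip(filterbank, freqs_ghz):
        shift = round(dispersion_delay_ms(dm, f, f_ref) / tsamp_ms)
        for t in range(nsamp):
            out[t] += chan[(t + shift) % nsamp]
    return out

# Toy data (assumed values): 8 channels across 170-230 MHz, 1 ms sampling, DM = 2.
nchan, nsamp, tsamp_ms, dm, t0 = 8, 512, 1.0, 2.0, 100
freqs_ghz = [0.17 + 0.06 * i / (nchan - 1) for i in range(nchan)]
filterbank = [[0.0] * nsamp for _ in range(nchan)]
for chan, f in zip(filterbank, freqs_ghz):
    shift = round(dispersion_delay_ms(dm, f, max(freqs_ghz)) / tsamp_ms)
    chan[(t0 + shift) % nsamp] = 1.0  # dispersed pulse: later at lower frequency

dedispersed = incoherent_dedisperse(filterbank, freqs_ghz, dm, tsamp_ms)
print(dedispersed.index(max(dedispersed)), max(dedispersed))
```

The residual smearing inside each channel is not removed by this method; that is the part coherent dedispersion addresses by working on the raw voltages.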
Coherent Dedispersion • Unresolved on us timescales • From young or millisecond pulsars • Power-law distribution of energies • (Figure: PSR J0218+4232)
Swinburne Baseband Recorders etc • 1998: Canadian S2 to computer (16 MHz x 2) • 100K system + video tapes • 2000: CPSR • 20 MHz x 2 + DLT7000 drives x 4 • 2002: CPSR2 • 128 MHz x 2 + real-time supercomputer (60 cores) • 2006: DiFX (Deller, Tingay, Bailes & West) • Software Correlator (ATNF adopted) • 2007: APSR • 1024 MHz x 2 + real-time supercomputer (160 cores) • 2008: MultiBOB • 13 x 1024 ch x 64us + fibre + 1600-core supercomputer
dspsr software • Mature • Delivers < 100 ns timing on selected pulsars • Total power estimation every 8us with RFI excision • Write a “loader” • Can do: • Giant pulse work • Pulsar searching (coherent filterbanks) • Pulsar timing/polarimetry • Interferometry with pulsar gating
PSRDADA (van Straten) • psrdada.sourceforge.net • Generic UDP data capture system (APSR/MultiBOB) • Ring Buffer(s) • Can attach threads to fold/dedisperse etc • Hierarchical buffers • Shares available CPU resources/disk • Web-based control/monitoring • Free! + hooks to dspsr & psrchive.
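The ring-buffer idea on this slide can be sketched in a few lines of Python. This is a generic illustration only: the class and method names are hypothetical and are not the PSRDADA API, and a real capture system would use shared memory with blocking readers and writers rather than return values.

```python
class RingBuffer:
    """Fixed number of block slots reused in a circle (hypothetical sketch)."""

    def __init__(self, nblocks):
        self.blocks = [None] * nblocks
        self.write_count = 0  # total blocks written so far
        self.read_count = 0   # total blocks read so far

    def write(self, block):
        """Store a block; return False if the reader has fallen a full lap behind."""
        if self.write_count - self.read_count == len(self.blocks):
            return False  # buffer full: a real system would wait or drop data
        self.blocks[self.write_count % len(self.blocks)] = block
        self.write_count += 1
        return True

    def read(self):
        """Return the oldest unread block, or None if nothing is waiting."""
        if self.read_count == self.write_count:
            return None
        block = self.blocks[self.read_count % len(self.blocks)]
        self.read_count += 1
        return block

# Capture side writes packets; a processing thread (fold, dedisperse) would read.
buf = RingBuffer(nblocks=4)
for packet in (b"pkt0", b"pkt1", b"pkt2"):
    buf.write(packet)
print(buf.read(), buf.read())
```

Decoupling capture from processing this way lets a slow processing step lag by up to the buffer depth without losing data.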
APSR • Takes 8 Gb/s voltages • Forms: • 16 x 128 channels (with coherent dedispersion) • 4 Stokes, umpteen pulsars • Real-time fold to DM=250 pc/cc. • O(100) Ops/sample • Sustaining >>100 Gflops • ~100K computers. • June 2008 • 192 MHz working @ 4bits • 768 MHz working @ 2bits
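The data-rate figures on this slide can be checked with simple arithmetic. The sketch below assumes Nyquist sampling of real voltages (2 samples per Hz of bandwidth), two polarisations, and that the full 1024 MHz mode runs at 2 bits; the first two are assumptions, and the function name is illustrative.

```python
def voltage_rate_gbps(bandwidth_mhz, nbits, npol=2):
    """Raw voltage data rate in Gb/s, assuming real Nyquist-sampled data."""
    samples_per_s = 2 * bandwidth_mhz * 1e6 * npol
    return samples_per_s * nbits / 1e9

full_band = voltage_rate_gbps(1024, 2)   # full 1024 MHz x 2 pols at 2 bits
mode_4bit = voltage_rate_gbps(192, 4)    # June 2008: 192 MHz at 4 bits
mode_2bit = voltage_rate_gbps(768, 2)    # June 2008: 768 MHz at 2 bits

# O(100) operations per sample, applied to the 768 MHz mode.
ops_per_sample = 100
gflops_768 = 2 * 768e6 * 2 * ops_per_sample / 1e9

print(full_band, mode_4bit, mode_2bit, gflops_768)
```

Under these assumptions the full band comes to about 8.2 Gb/s, in line with the quoted 8 Gb/s, and the 768 MHz mode needs roughly 300 Gflops, consistent with ">>100 Gflops".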
Coherent Dedispersion: BW vs time • [Plot: coherently dedispersed bandwidth (BW) against year, 1998 to 2008] • Labelled points: 16, 20, 128 (300K), 1024 (100K)
Coherent Dedispersion • Now “trivial” • FFT ease ~ B^(-2/3)
MultiBOB • High Resolution Universe Survey (PALFA of the South) • Werthimer’s iBOB boards • 1024 channels, down to 10us sampling • Two pols • FPGA coding hard… • Use software gain equalizer/summer • ~5 MB/s beam • 1 Gb/s Fibre to Swinburne (>1000 km fibre) • Real time searching!
New PKS MB Survey: • Kramer: 13 beams, 70 minutes/pointing, 1024 channels, 300 MHz BW, 64 us sampling, +/- 3.5 deg • Bailes: 13 beams, 9 minutes/pointing, 1024 channels, 300 MHz BW, 64 us sampling, +/- 15 deg • Johnston: 13 beams, 4.5 minutes/pointing, 1024 channels, 300 MHz BW, 32 us sampling, the rest
MWA • Samples • Takes (24x1.3MHz=32 MHz) x 2 x 512 • “Just” 32 GB/s (64 Gsamples/s) • FFTs it • (5 N log2 ops/pt = 2.2 Tflops) • XMultiplies & adds • (512)*256*B*4 = 16 TMACs
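The MWA numbers can be reproduced roughly as follows. The sketch assumes real Nyquist sampling, 4-bit samples (implied by 32 GB/s for 64 Gsamples/s) and an FFT length of N = 128 points, which is a guess chosen because 5 log2(128) = 35 operations per sample lands near the quoted 2.2 Tflops. The variable names are illustrative.

```python
import math

N_TILES = 512
N_POL = 2
BANDWIDTH_HZ = 32e6

# Sampling: 2 real samples per Hz, per polarisation, per tile.
samples_per_s = 2 * BANDWIDTH_HZ * N_POL * N_TILES
bytes_per_s = samples_per_s * 4 / 8          # assumes 4 bits per sample

# F stage: about 5 N log2(N) operations per N-point FFT, i.e. 5 log2(N) per sample.
FFT_LENGTH = 128                             # assumed, not stated on the slide
fft_flops = 5 * math.log2(FFT_LENGTH) * samples_per_s

# X stage: the slide's 512 * 256 baselines (close to N(N-1)/2), 4 polarisation products.
baselines = N_TILES * 256
xmacs_per_s = baselines * BANDWIDTH_HZ * 4

print(samples_per_s / 1e9, bytes_per_s / 1e9, fft_flops / 1e12, xmacs_per_s / 1e12)
```

With these inputs the cross-multiply stage dominates at about 16.8 TMAC/s, against roughly 2.3 Tflops for the FFTs.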
Sensitivity (folded factor), MWA vs PKS: • Bandwidth: 32 vs 288 MHz • Gain: ~3-5x PKS • Field of view: 700 vs 0.6 deg^2 • Tsys: 350 vs 25 K
~ Parity PKS vs MWA • G ~ 3-5 x better • Tsys ~ 14 x worse ? • B^1/2 ~ 3 x worse • Flux ~ 25 x better (1400 vs 200 MHz) • t^1/2 ~ 32 x better • Single pulse work ~ comparable • Coherent search ~ 32x improvement! • But: there is a limit to the time you can observe a pulsar! • 4m vs 144m -> 5x deeper.
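The parity argument can be laid out as a product of radiometer-equation factors. In the sketch below the gain is taken as 4 (the midpoint of 3-5x), and the t^1/2 term is read as coming from the field-of-view ratio (700 vs 0.6 deg^2), so that each sky position gets about 1000x more integration time; both choices are my interpretation of the slides rather than something they state outright.

```python
import math

# Each factor is MWA relative to Parkes (PKS); values above 1 favour the MWA.
gain_factor = 4.0                       # G ~ 3-5x better (midpoint assumed)
tsys_factor = 25.0 / 350.0              # Tsys 350 K vs 25 K, ~14x worse
bandwidth_factor = math.sqrt(32 / 288)  # B^1/2, 32 vs 288 MHz, 3x worse
flux_factor = 25.0                      # pulsars brighter at 200 MHz than 1400 MHz

per_pointing = gain_factor * tsys_factor * bandwidth_factor * flux_factor

# Time factor: larger field of view means more time on each sky position.
time_factor = math.sqrt(700 / 0.6)

# Spectral index implied by a 25x flux gain between 1400 and 200 MHz.
implied_index = math.log(flux_factor) / math.log(1400 / 200)

print(round(per_pointing, 2), round(time_factor, 1), round(implied_index, 2))
```

The per-pointing product comes out at order unity, which is the "parity" claim, and the time factor of about 34 matches the quoted ~32x for a coherent search.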
Scattering b=0 • 1,10,100,1000ms
Scattering b=5d • 1,10,50,100ms
Scattering b=30d • 0.5,1ms
Search instrumentation? • [Diagram: signal chain comparing the correlator path with the pulsar search path (“Us”)] • Data products: Volts, Spectra, Visibilities, uv, FBanks • Operations: F, X, Grid, 2D FFT^-1, Dedisp, Fold, FFT • Labels in diagram order: 32 MHz; x 512, x 512, x 256, x 192^2, x 512; 36 GB/s, 36 GB/s, 1024 GB/s, 32 bits, 600 GB/s, 30 GB/s, 5 bits, 200 GB/s, 32 bits • Outputs: Spectra; Pulsars at <1 bit/s
Search Timings • 36,000 “coherent beams”: (768 m / 4 m = 192)^2 • 36 gigapixels/s • Dedisperse/CPU core • Gigapixel/120s • 36 x 120 = 4320 cores = 500 machines = 250 kW • NFFT = 36,000 * 1024 (DMs)/8192 = 4608 FFTs/sec • Seek (3s / (8192 x 1024) pt FFT) • 14,000 cores ~ 1800 machines = MW. (M$/yr)
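The core-count estimate above can be written out as a small calculation. This sketch takes the slide's benchmark figures as inputs (120 core-seconds to dedisperse a gigapixel, 3 core-seconds per long FFT search, one FFT per DM trial per beam every 8192 s) and assumes 8 cores per machine, as in the dual quad-core nodes described for the Swinburne cluster, and about 500 W per machine, inferred from 250 kW for 500 machines. The function and parameter names are illustrative.

```python
def search_budget(n_beams, gigapixels_per_s, n_dm=1024, seconds_per_fft=8192,
                  dedisp_core_s_per_gpix=120, seek_core_s_per_fft=3,
                  cores_per_machine=8, watts_per_machine=500):
    """Cores, machines and power needed to keep up with the data in real time."""
    dedisp_cores = gigapixels_per_s * dedisp_core_s_per_gpix
    ffts_per_s = n_beams * n_dm / seconds_per_fft
    seek_cores = ffts_per_s * seek_core_s_per_fft
    budget = {"ffts_per_s": ffts_per_s}
    for name, cores in (("dedisp", dedisp_cores), ("seek", seek_cores)):
        machines = cores / cores_per_machine
        budget[name + "_cores"] = cores
        budget[name + "_machines"] = machines
        budget[name + "_kw"] = machines * watts_per_machine / 1000
    return budget

full_array = search_budget(n_beams=192 ** 2, gigapixels_per_s=36)
for key, value in full_array.items():
    print(key, value)
```

With exactly 192^2 = 36,864 beams this gives 4608 FFTs/s, 4320 dedispersion cores and 13,824 search cores, close to the rounded figures on the slide. Calling `search_budget(n_beams=1024, gigapixels_per_s=1)` covers the 32x32 tile case, where this arithmetic gives 384 search cores against the 378 quoted later in the talk.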
Supercomputing @ Swinburne: The Green Machine • installed May/June 2007 • 185 Dell PowerEdge 1950 nodes • 2 quad-core processors (Clovertown: Intel Xeon 64-bit 2.33 GHz) • 16GB RAM • 1TB disk -> 300 TB total • 1640 cores/14 Tflops • dual channel gigabit ethernet • CentOS Linux OS • job queue submission • 20 Gb infiniband (Q1 2008) • 83 kW vs. 130 kW cooling • Machines: ~1.2M • Fuel: ~100K/yr
Search Times: • Depend only upon: • Npixels x Nchans x Tsamp^-1 • Requires: • No acceleration trials • PSR J0437-4715 • In 8192s, small width from acceleration
Search Timings (32x32 tiles) • 36000->1024 “coherent beams” • 36->1 gigapixels/s • Dedisperse/core • Gigapixel/120s • 120 = 120 cores = 15 machines = 7 kW • NFFT = 1024 * 1024 (DMs)/8192(s/FFT) = 128 FFTs/sec • Seek (3s / (8192 x 1024) pt FFT) • 378 cores ~ 50 machines = 25 kW.
RRATs • Log N - Log S (helps with long pointings…) • 1000 x integration time. • Maybe good RRAT finder.
Monitoring: Monitoring?
Build Your Own Telescope? • May be cheaper to build dedicated PSR telescope than attempt to process everything from existing telescopes! • 32x32 tile: (2D FFT - 1D FFT - dedisperse - FFT) • ~2M telescopes • ~2M “beamformer/receivers” • ~1M correlator • ~1M Supercomputer • ~1M construction • ~7-8M
Next-Gen Supercomputers (IO or Tflops?) • Infiniband 20 Gb (40Gb) • 288 port switch • ~10 Tb/s IO Capacity (1-2K/node) • Teraflop CPU capacities/node (140 Gflops now) • Teraflop Server or Tflop GPU? • 10 GB/s vs 76 GB/s • Power (0.1W/$) • 2M = 200 kW
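A quick check of the IO and power figures on this slide, as a sketch. It assumes the IO capacity is simply ports times link speed for a single 288-port switch, and that "2M" is a dollar budget to which the 0.1 W/$ rule of thumb applies. The variable names are illustrative.

```python
PORTS = 288

# Aggregate switch IO in Tb/s for the two Infiniband link speeds mentioned.
io_tbps_20g = PORTS * 20 / 1000
io_tbps_40g = PORTS * 40 / 1000

# Power rule of thumb: 0.1 W of draw per dollar of hardware.
WATTS_PER_DOLLAR = 0.1
budget_dollars = 2_000_000
power_kw = WATTS_PER_DOLLAR * budget_dollars / 1000

# Node count needed to reach 1 Tflop per port-worth of IO at today's 140 Gflops/node.
nodes_per_tflop_now = 1000 / 140

print(io_tbps_20g, io_tbps_40g, power_kw, round(nodes_per_tflop_now, 1))
```

At 40 Gb/s the switch carries about 11.5 Tb/s, matching the "~10 Tb/s" figure, and a 2M machine draws about 200 kW under the stated rule.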
Architecture (2011??): • [Diagram: “FX” architecture with two blocks] • Each block labelled: 288 Ports, 40 Gb/s, 144 Tflops • Cost labels (appear twice): 300K, ~1M
Summary: • Strong motivation for multiple (~100) tied array beams • PSRs/deg^2 • Surveys only possible with compact configurations • At present • Future Supercomputers may allow search even with MWA-like telescopes