Big Bang, Big Iron: High Performance Computing and the Cosmic Microwave Background
Julian Borrill, Computational Cosmology Center, LBL & Space Sciences Laboratory, UCB
with Chris Cantalupo, Ted Kisner, Radek Stompor, Rajesh Sudarsan and the BOOMERanG, MAXIMA, Planck, EBEX, PolarBear & other experimental collaborations
The Cosmic Microwave Background About 400,000 years after the Big Bang, the expanding Universe cools through the ionization temperature of hydrogen: p+ + e- => H. Without free electrons to scatter off, CMB photons free-stream to us today. • COSMIC - filling all of space. • MICROWAVE - redshifted by the expansion of the Universe from 3000K to 3K. • BACKGROUND - primordial photons coming from “behind” all astrophysical sources.
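As a quick check of the quoted temperatures, the redshift of last scattering follows directly from the ratio of the recombination temperature to today's CMB temperature (2.725 K):

```latex
1 + z_{\rm rec} \;=\; \frac{T_{\rm rec}}{T_{0}}
\;\approx\; \frac{3000\,\mathrm{K}}{2.725\,\mathrm{K}}
\;\approx\; 1100
```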
CMB Physics Drivers • It is the earliest possible photon image of the Universe. • Its existence supports a Big Bang over a Steady State cosmology (NP1). • Tiny fluctuations in the CMB temperature (NP2) and polarization encode details of • cosmology • geometry • topology • composition • history • ultra-high energy physics • fundamental forces • beyond the standard model • Inflation & the dark sector (NP3)
The Concordance Cosmology Supernova Cosmology Project (1998): Cosmic Dynamics (ΩΛ − Ωm) BOOMERanG & MAXIMA (2000): Cosmic Geometry (ΩΛ + Ωm) 70% Dark Energy + 25% Dark Matter + 5% Baryons 95% Ignorance What (and why) is the Dark Universe?
Observing The CMB About 1% of the static on an (untuned) TV is the CMB.
The Planck Satellite • The primary driver for HPC CMB work for the last decade. • An ESA satellite mission (with NASA participation) performing a 2-year+ all-sky survey from L2. • 9 microwave frequencies from 30 to 857 GHz. • The biggest CMB data set to date: • O(10^12) observations • O(10^8) sky pixels • O(10^4) spectral multipoles
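For a rough sense of scale, a back-of-the-envelope sizing of those numbers; the 4-byte samples and 8-byte, three-Stokes pixel values below are assumptions for illustration, not the actual Planck data formats.

```python
# Rough sizing from the O(10^12) samples / O(10^8) pixels quoted above.
# 4-byte samples and 8-byte I,Q,U pixel values are illustrative assumptions.
samples = 1e12
pixels  = 1e8
stokes  = 3                              # I, Q, U per pixel

tod_bytes = samples * 4                  # ~4 TB of time-ordered data
map_bytes = pixels * stokes * 8          # ~2.4 GB per all-sky IQU map set
print(f"TOD ~ {tod_bytes/1e12:.1f} TB, one IQU map set ~ {map_bytes/1e9:.1f} GB")
```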
Beyond Planck • EBEX (1x Planck) - Antarctic long-duration balloon flight in 2012. • PolarBear (10x Planck) - Atacama desert ground-based 2010-13. • QUIET-II (100x Planck) - Atacama desert ground-based 2012-15. • CMBpol (1000x Planck) - L2 satellite 2020-ish?
CMB Data Analysis • In principle very simple • Assume Gaussianity and maximize the likelihood • of maps given the data and its noise statistics (analytic). • of power spectra given maps and their noise statistics (iterative). • In practice very complex • Foregrounds, asymmetric beams, non-Gaussian noise, etc. • Algorithm & implementation scaling with evolution of • Data volume • HPC architecture
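In the usual notation (a sketch of the standard formulation, not any particular pipeline): writing the time-ordered data as d = A s + n, with pointing matrix A and time-domain noise covariance N, the two likelihood steps are

```latex
\hat{m} \;=\; \left(A^{T} N^{-1} A\right)^{-1} A^{T} N^{-1}\, d
\qquad \text{(maximum-likelihood map; analytic)}
```

```latex
-2\ln\mathcal{L}\left(C_{\ell}\,\middle|\,\hat{m}\right) \;=\;
\hat{m}^{T}\left[S(C_{\ell}) + N_{\rm pix}\right]^{-1}\hat{m}
\;+\; \ln\det\left[S(C_{\ell}) + N_{\rm pix}\right] \;+\; \mathrm{const}
```

where S(C_ℓ) is the signal covariance and N_pix = (A^T N^{-1} A)^{-1} is the pixel-domain noise covariance; the second expression is maximized iteratively (e.g. by Newton-Raphson) over the angular power spectrum.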
The CMB Data Challenge • Extracting fainter signals (polarization modes, finer angular resolution) from the data requires: • larger data volumes to provide higher signal-to-noise. • more complex analyses to remove fainter systematic effects. • ~1000x data increase over the next 15 years • need to continue to scale on the bleeding edge through the next 10 doublings of HPC capability (2^10 ≈ 1000)!
CMB Data Analysis Evolution Data volume & computational capability dictate analysis approach.
Scaling In Practice 2000: BOOMERanG-98 temperature map (10^8 samples, 10^5 pixels) calculated on 128 Cray T3E processors; 2005: A single-frequency Planck temperature map (10^10 samples, 10^8 pixels) calculated on 6,000 IBM SP3 processors; 2008: EBEX temperature and polarization maps (10^11 samples, 10^6 pixels) calculated on 15,360 Cray XT4 cores.
Aside: HPC System Evaluation • Scientific applications provide realistic benchmarks • Exercise all components of a system both individually and collectively. • Performance evaluation can be fed back into application codes. • MADbench2 • Based on MADspec CMB power spectrum estimation code. • Full computational complexity (calculation, communication & I/O). • Scientific complexity removed • reduces lines of code by 90%. • runs on self-generated pseudo-data. • Used for NERSC-5 & -6 procurements. • First friendly-user Franklin system crash (90 minutes after access).
MADbench2 I/O Evaluation • I/O performance comparison • 6 HPC systems • Read & write • Unique & shared files • Asynchronous I/O experiment • N bytes asynchronous read/write • N flops simultaneous work • Measure time spent waiting on I/O
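A minimal toy sketch of the asynchronous-I/O experiment's structure, with assumed sizes; MADbench2 itself does this with parallel I/O on HPC file systems, so this only illustrates the measurement idea, not the benchmark's implementation.

```python
# Start an N-byte write in the background, do a block of floating-point work
# in the foreground, then measure how long we still wait for the I/O to finish.
# Sizes are illustrative, not MADbench2 parameters.
import os, time, threading
import numpy as np

N = 1 << 26                                  # ~64 MB of "data"
buf = np.random.bytes(N)

def background_write(path, data):
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())                 # make sure it really reaches the disk

t0 = time.perf_counter()
io = threading.Thread(target=background_write, args=("scratch.dat", buf))
io.start()

x = np.random.rand(2048, 2048)
y = x @ x                                    # the "simultaneous work"

t_work = time.perf_counter() - t0
io.join()                                    # time spent here is time waiting on I/O
t_total = time.perf_counter() - t0
print(f"work: {t_work:.2f}s, waited on I/O: {t_total - t_work:.2f}s")
```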
MADmap for Planck Map Making A massively parallel, highly optimized PCG solver for maximum-likelihood map(s) given a time-stream of observations and their noise statistics • 2005: First Planck-scale map • 75 billion observations mapped to 150 million pixels • First science code to use all 6,000 CPUs of Seaborg • 2007: First full Planck map-set (FFP) • 750 billion observations mapped to 150 million pixels • Using 16,000 cores of Franklin • I/O doesn't scale • write-dominated simulations • read-dominated mappings • May 14th 2009: Planck launches!
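For concreteness, a minimal serial sketch of the linear system such a map-maker solves, (A^T N^{-1} A) m = A^T N^{-1} d, here with a toy white-noise model and random pointing; MADmap itself is massively parallel and handles correlated noise, so this is only the bare PCG skeleton.

```python
# Toy maximum-likelihood map-making by PCG (not the MADmap code).
# Model: d = P m + n; solve (P^T N^{-1} P) m = P^T N^{-1} d.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import LinearOperator, cg

nsamp, npix = 100_000, 1_000
pix = np.random.randint(0, npix, nsamp)                 # pixel hit by each sample
P = csr_matrix((np.ones(nsamp), (np.arange(nsamp), pix)), shape=(nsamp, npix))

truth = np.random.randn(npix)                           # a fake sky map
sigma = 0.5
d = P @ truth + sigma * np.random.randn(nsamp)          # time-ordered data
Ninv = 1.0 / sigma**2                                   # white-noise inverse covariance

def lhs(m):                                             # apply P^T N^{-1} P
    return P.T @ (Ninv * (P @ m))

A = LinearOperator((npix, npix), matvec=lhs)
b = P.T @ (Ninv * d)                                    # P^T N^{-1} d

hits = P.T @ (Ninv * np.ones(nsamp))                    # = diag(P^T N^{-1} P) here
M = LinearOperator((npix, npix), matvec=lambda v: v / hits)   # Jacobi preconditioner

m_hat, info = cg(A, b, M=M)
print("converged" if info == 0 else f"cg info={info}",
      "| rms map error:", np.sqrt(np.mean((m_hat - truth) ** 2)))
```

With white noise the hit-count preconditioner makes the system essentially diagonal, so PCG converges almost immediately; correlated noise is what makes the real problem genuinely iterative and communication-heavy.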
Planck Sim/Map Target • By the end of the Planck mission in 2013, we need to be able to simulate and map • O(10^4) realizations of the entire mission • 74 detectors x 2.5 years ~ O(10^16) samples • On O(10^5) cores • In O(10) wall-clock hours WAIT ~ 1 day : COST ~ 10^6 CPU-hrs
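One way to see where numbers of this order come from; the ~100 Hz mean sampling rate below is an assumed round figure for illustration, not an official Planck specification.

```python
# Back-of-the-envelope sample count; the mean sampling rate is an assumption.
detectors    = 74
years        = 2.5
seconds      = years * 3.15e7                # seconds of mission
sample_rate  = 100.0                         # assumed mean samples/s per detector
realizations = 1e4

per_realization = detectors * seconds * sample_rate    # ~6e11 samples (cf. the FFP above)
total           = per_realization * realizations       # ~6e15, i.e. O(10^16)
print(f"{per_realization:.1e} samples per realization, {total:.1e} overall")
```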
[Scaling roadmap chart: milestones CTP3, FFP1, M3/GCP, OTFS, 12x217 and peta-scaling, building towards the target of 10^4 maps at 9 frequencies over 2.5 years on 10^5 cores in 10 hours.]
On-The-Fly Simulation • Remove redundant & non-scaling I/O from the traditional simulate/write then read/map cycle. • The M3-enabled map-maker's TOD read requests are translated into runtime (on-the-fly) simulation. • Trades cycles & memory for disk & I/O. • Currently supports • Piecewise stationary noise with arbitrary spectra • Truly independent random numbers • Symmetric & asymmetric beam-smoothed sky • Can be combined with explicit TOD (e.g. systematics).
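A minimal sketch of the on-the-fly noise idea for a single stationary piece, assuming an illustrative 1/f-plus-white spectrum; this is not the M3 implementation, and the parameter names and normalization conventions here are my own.

```python
# Synthesize one stationary noise stretch at request time from a PSD and a
# seeded random stream, instead of reading a stored TOD from disk.
import numpy as np

def simulate_noise(nsamp, fsample, fknee=0.1, alpha=1.0, sigma=1.0, seed=0):
    """One noise piece with one-sided PSD sigma^2 * (1 + (fknee/f)^alpha)."""
    rng  = np.random.default_rng(seed)            # independent stream per (detector, piece)
    freq = np.fft.rfftfreq(nsamp, d=1.0 / fsample)
    psd  = sigma**2 * (1.0 + (fknee / np.maximum(freq, freq[1]))**alpha)
    # Gaussian draw in the frequency domain, shaped by sqrt(PSD); scaled so the
    # periodogram of the output matches psd under numpy's FFT conventions.
    g = rng.standard_normal(freq.size) + 1j * rng.standard_normal(freq.size)
    fourier = np.sqrt(psd * fsample * nsamp / 4.0) * g
    fourier[0] = 0.0                              # no DC offset
    return np.fft.irfft(fourier, n=nsamp)

tod = simulate_noise(nsamp=2**20, fsample=180.0, seed=42)
print("rms:", tod.std())
```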
Current Planck State-Of-The-Art : CTP3 • 1000 realizations each of Planck 1-year, 1-frequency noise & signal. • Noise sim/map runs • O(10^14) samples, 2 TB disk (maps) • 2 hours on 20,000 cores
Next Generation HPC Systems • Large, correlated, CMB data sets exercise all components of HPC systems: • Data volume => disk space, I/O, floating point operations. • Data correlation => memory, communication. • Different components scale differently over time and across HPC systems • Develop trade-offs and tune to system/concurrency. • The I/O bottleneck has been ubiquitous • now largely solved by replacing I/O with (re-)calculation. • For all-sky surveys, communication is the current bottleneck • Even in a perfect world, all-to-all communication won't scale • Challenges & opportunities of next-generation systems: • Multi- & many-core (shrinking memory per core) • GPUs/accelerators (heterogeneous systems)
Ongoing Research • Address emerging bottlenecks at very high concurrency in the age of many-core & accelerators: • Hopper • Blue Waters • NERSC-7 • Communication: replace inter-core with inter-node communication. • Calculation: re-write/re-tool/re-compile for GPUs and other accelerators. • Auto-tuning: system- & analysis-specific run-time configuration.
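As an illustration of the inter-core to inter-node idea, a hedged mpi4py sketch of a two-level reduction (not the production Planck/M3 code): sum within each node first, let only one rank per node touch the network, then broadcast back on-node.

```python
# Two-level allreduce sketch: on-node reduce, leaders-only network allreduce,
# then on-node broadcast. Array size is illustrative (e.g. a partial map).
from mpi4py import MPI
import numpy as np

world = MPI.COMM_WORLD
node  = world.Split_type(MPI.COMM_TYPE_SHARED)                # ranks sharing a node
leaders = world.Split(color=0 if node.rank == 0 else MPI.UNDEFINED,
                      key=world.rank)                         # one rank per node

local   = np.random.rand(1_000_000)
partial = np.empty_like(local)
total   = np.empty_like(local)

node.Reduce(local, partial, op=MPI.SUM, root=0)               # on-node sum
if node.rank == 0:
    leaders.Allreduce(partial, total, op=MPI.SUM)             # only node leaders use the network
node.Bcast(total, root=0)                                     # every rank now holds the full sum
```

The on-node reduction typically goes through shared memory, so only one message per node crosses the interconnect instead of one per core; run in the usual way, e.g. mpiexec -n <ranks> python <script>.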
Conclusions • Roughly 95% of the Universe is known to be unknown • the CMB provides a unique window onto the early years. • The CMB data sets we gather and the HPC systems we analyze them on are both evolving. • CMB data analysis is a long-term computationally-challenging problem requiring state-of-the-art HPC capabilities. • The quality of the science derived from present and future CMB data sets will be determined by the limits on • our computational capability • our ability to exploit it • CMB analysis codes can be powerful full-system evaluation tools. We’re always very interested in hiring good computational scientists to do very interesting computational science!