Radhika S. Saksena (1), Bruce Boghosian (2), Luis Fazendeiro (1), Owain A. Kenway, Steven Manos (1), Marco Mazzeo (1), S. Kashif Sadiq (1), James L. Suter (1), David Wright (1) and Peter V. Coveney (1)
(1) Centre for Computational Science, UCL, UK
(2) Tufts University, Boston, USA
Contents
• New era of petascale resources
• Scientific applications at petascale:
  - Unstable periodic orbits in turbulence
  - Liquid crystalline rheology
  - Clay-polymer nanocomposites
  - HIV drug resistance
  - Patient specific haemodynamics
• Conclusions
New era of petascale machines
• Ranger (TACC) - NSF funded Sun cluster
  - 0.58 petaflops (theoretical) peak: ~10 times HECToR (59 Tflops); “bigger” than all other TeraGrid resources combined
  - Linpack speed 0.31 petaflops, 123 TB memory
  - Architecture: 82 racks; 1 rack = 4 chassis; 1 chassis = 12 nodes
  - 1 node = Sun blade x6420 (four 64-bit AMD Opteron quad-core processors)
  - 3,936 nodes = 62,976 cores
• Intrepid (ALCF) - DOE funded BlueGene/P
  - 0.56 petaflops (theoretical) peak
  - 163,840 cores; 80 TB memory
  - Linpack speed 0.45 petaflops
  - “Fastest” machine available for open science and third in general [1]
[1] http://www.top500.org/lists/2008/06
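The core count and the gap between Linpack and peak speed follow directly from the figures on this slide. A minimal sketch of that arithmetic (variable and function names are illustrative, not from the talk):

```python
# Figures quoted on the slide for Ranger.
RACKS = 82
CHASSIS_PER_RACK = 4
NODES_PER_CHASSIS = 12
SOCKETS_PER_NODE = 4   # four Opteron processors per blade
CORES_PER_SOCKET = 4   # quad-core

nodes = RACKS * CHASSIS_PER_RACK * NODES_PER_CHASSIS
cores = nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET


def linpack_efficiency(linpack_pflops, peak_pflops):
    """Fraction of theoretical peak achieved on Linpack."""
    return linpack_pflops / peak_pflops


ranger_eff = linpack_efficiency(0.31, 0.58)
intrepid_eff = linpack_efficiency(0.45, 0.56)

print(f"Ranger: {nodes} nodes, {cores} cores")
print(f"Linpack/peak: Ranger {ranger_eff:.0%}, Intrepid {intrepid_eff:.0%}")
```

With the quoted numbers, Ranger reaches roughly 53% of its theoretical peak on Linpack and Intrepid roughly 80%.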
New era of petascale machines
• US firmly committed to path to petascale (and beyond)
• NSF: Ranger (5 years, $59 million award)
• University of Tennessee to build a system with just under 1 PF peak performance ($65 million, 5-year project) [1]
• “Blue Waters” will come online in 2011 at NCSA ($208 million grant), using IBM technology, to deliver 10 Pflops peak performance (~200K cores, 10 PB of disk)
[1] http://www.nsf.gov/news/news_summ.jsp?cntn_id=109850
New era of petascale machines
• We wish to do new science at this scale, not just incremental advances
• Applications that scale linearly up to tens of thousands of cores (large system sizes, many time steps): capability computing at petascale
• High throughput for “intermediate scale” applications (in the 128 - 512 core range)
Intercontinental HPC grid environment
[Diagram: UK NGS (Leeds, Manchester, Oxford, RAL), HPCx and HECToR; US TeraGrid (NCSA, SDSC, PSC, TACC (Ranger), ANL (Intrepid)); DEISA; AHE; Lightpaths]
• Massive data transfers
• Advanced reservation/co-scheduling
• Emergency/pre-emptive access
Lightpaths - Dedicated 1 Gb/s UK/US network
• JANET Lightpath is a centrally managed service which supports large research projects on the JANET network by providing end-to-end connectivity, from hundreds of Mb/s up to whole fibre wavelengths (10 Gb/s).
• Typical usage
  - Dedicated 1 Gb/s network to connect to national and international HPC infrastructure
  - Shifting TB datasets between the UK/US
  - Real-time visualisation
  - Interactive computational steering
  - Cross-site MPI runs (e.g. between NGS2 Manchester and NGS2 Oxford)
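To give a feel for what "shifting TB datasets" means on a dedicated 1 Gb/s link, here is a rough estimate of transfer times. It is a sketch under stated assumptions: decimal terabytes, and an `efficiency` parameter (hypothetical, not from the talk) for the fraction of line rate actually achieved.

```python
def transfer_hours(terabytes, link_gbit_per_s=1.0, efficiency=1.0):
    """Hours needed to move `terabytes` (decimal TB) over the link.

    `efficiency` is an assumed fraction of the line rate that the
    transfer tool sustains; 1.0 means the full nominal rate.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600


# 1 TB, and the ~9 TB quoted for HYPO4D later in the talk.
for size_tb in (1, 9):
    print(f"{size_tb} TB at 1 Gb/s: {transfer_hours(size_tb):.1f} h")
```

At the full nominal rate, 1 TB takes a little over two hours and 9 TB about twenty; real transfers would take longer by whatever factor the achieved efficiency falls short of 1.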
Advanced reservations
• Plan in advance to have access to the resources
• Process of reserving multiple resources for use by a single application
  - HARC [1] - Highly Available Resource Co-Allocator
  - GUR [2] - Grid Universal Remote
• Can reserve the resources:
  - For the same time:
    - Distributed MPIg/MPICH-G2 jobs
    - Distributed visualization
    - Booking equipment (e.g. visualization facilities)
  - Or some coordinated set of times:
    - Computational workflows
• Urgent computing and pre-emptive access (SPRUCE)
[1] http://www.realitygrid.org/middleware.shtml#HARC
[2] http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/TGIA64LinuxCluster/Doc/coschedule.html
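The "same time" case amounts to finding a slot in which every required resource is free at once. The sketch below illustrates only that idea; it is not the HARC or GUR API, and the function name, resource names and time windows are all hypothetical.

```python
def earliest_common_start(free_windows, duration):
    """Earliest start time at which every resource is free for `duration`.

    `free_windows` maps a resource name to a list of (start, end) free
    intervals, in arbitrary time units. Returns None if no slot exists.
    The earliest feasible start always coincides with the start of some
    free window, so only those times need to be tried.
    """
    candidates = sorted(
        {start for windows in free_windows.values() for start, _ in windows}
    )
    for s in candidates:
        if all(
            any(a <= s and s + duration <= b for a, b in windows)
            for windows in free_windows.values()
        ):
            return s
    return None


# Hypothetical availability, in hours from now.
availability = {
    "ranger": [(0, 4), (6, 12)],
    "intrepid": [(2, 5), (7, 10)],
    "viz_wall": [(8, 14)],
}
slot = earliest_common_start(availability, duration=2)
print("co-scheduled start:", slot)
```

A real co-allocator also has to commit the reservation atomically across sites, so that either all resources are booked or none are; that step is not modelled here.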
Advanced reservations
• Also available via the HARC API - can be easily built into Java applications.
• Deployed on a number of systems
  - LONI (ducky, bluedawg, zeke, neptune IBM p5 clusters)
  - TeraGrid (NCSA, SDSC IA64 clusters, Lonestar, Ranger(?))
  - HPCx
  - North West Grid (UK)
  - National Grid Service - UK NGS - Manchester, Oxford, Leeds
Application Hosting Environment
• Middleware which simplifies access to distributed resources and manages workflows
• Wrestling with middleware can't be a limiting step for scientists - hiding complexities of the ‘grid’ from the end user
• Applications are stateful Web services
• Application can consist of a coupled model, parameter sweep, steerable application, or a single executable
HYPO4D [1] (Hydrodynamic periodic orbits in 4D)
• Scientific goal: to identify and characterize periodic orbits in turbulent fluid flow (from which time averages can be computed exactly)
• Uses lattice-Boltzmann method: highly scalable (linear scaling up to at least 33K cores on Intrepid and close to linear up to 65K)
[Figure panels: a) Ranger; b) Intrepid + Surveyor (Blue Gene/P)]
[1] L. Fazendeiro et al. “A novel computational approach to turbulence”, AHM08
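For readers unfamiliar with the lattice-Boltzmann method, the following is a minimal one-dimensional sketch (a D1Q3 lattice with single-relaxation-time BGK collisions and periodic boundaries). It is not the HYPO4D code, which is three-dimensional and parallel; the lattice size, relaxation time and initial density are assumptions chosen only for illustration.

```python
# D1Q3 lattice: a rest population plus one moving right and one moving left.
VELOCITIES = (0, 1, -1)
WEIGHTS = (2 / 3, 1 / 6, 1 / 6)


def equilibrium(rho, u):
    """Second-order equilibrium populations for density rho, velocity u."""
    return [
        w * rho * (1 + 3 * c * u + 4.5 * (c * u) ** 2 - 1.5 * u * u)
        for w, c in zip(WEIGHTS, VELOCITIES)
    ]


def lb_step(f, tau=1.0):
    """One collide-and-stream update; f[i][x] is population i at site x."""
    n = len(f[0])
    post = [[0.0] * n for _ in VELOCITIES]
    for x in range(n):
        # Collision is purely local: it uses only the data at site x.
        rho = sum(f[i][x] for i in range(3))
        u = sum(c * f[i][x] for i, c in enumerate(VELOCITIES)) / rho
        feq = equilibrium(rho, u)
        for i in range(3):
            post[i][x] = f[i][x] - (f[i][x] - feq[i]) / tau
    # Streaming moves each population one site along its velocity.
    return [
        [post[i][(x - c) % n] for x in range(n)]
        for i, c in enumerate(VELOCITIES)
    ]


def total_mass(f):
    return sum(sum(row) for row in f)


n_sites = 32
density = [1.0 + (0.1 if 12 <= x < 20 else 0.0) for x in range(n_sites)]
per_site = [equilibrium(rho, 0.0) for rho in density]
f = [[per_site[x][i] for x in range(n_sites)] for i in range(3)]

mass_before = total_mass(f)
for _ in range(100):
    f = lb_step(f)
mass_after = total_mass(f)
print(f"mass before {mass_before:.6f}, after {mass_after:.6f}")
```

Each site needs data only from itself and its nearest neighbours, which is the usual explanation for why lattice-Boltzmann codes decompose well across many cores.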
HYPO4D [1] (Hydrodynamic periodic orbits in 4D)
• Novel approach to turbulence studies: efficiently parallelizes time and space
• Algorithm is extremely memory-intensive: full spacetime trajectories are numerically relaxed to a nearby minimum (unstable periodic orbit)
• Ranger is the ideal resource for this work (123 TB of RAM)
• During the early-user period, millions of time steps were simulated for different systems and then compared for similarities
• ~9 TB of data
[1] L. Fazendeiro et al. “A novel computational approach to turbulence”, AHM08
LB3D [1]
• LB3D - three-dimensional lattice-Boltzmann solver for multi-component fluid dynamics, in particular amphiphilic systems
• Mature code - 9 years in development. It has been extensively used on the US TeraGrid, UK NGS, HECToR and HPCx machines
• Largest model simulated to date is 2048^3 (needs Ranger)
[1] R. S. Saksena et al. “Petascale lattice-Boltzmann simulations of amphiphilic liquid crystals”, AHM08
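A back-of-the-envelope estimate suggests why a 2048^3 lattice is demanding. The sketch below counts only the particle distribution arrays, and its defaults are assumptions rather than LB3D's documented layout: 19 discrete velocities per site, 3 fluid components and 8-byte floating-point values.

```python
def lattice_memory_bytes(side, velocities=19, components=3, bytes_per_value=8):
    """Bytes needed to hold the distribution functions of a cubic lattice.

    All defaults are illustrative assumptions; working copies, halo
    regions and any other per-site fields are ignored.
    """
    sites = side ** 3
    return sites * velocities * components * bytes_per_value


# Lattice sizes mentioned in the talk.
for side in (256, 1024, 2048):
    tb = lattice_memory_bytes(side) / 1e12
    print(f"{side}^3 lattice: about {tb:.3f} TB for the distributions")
```

Under these assumptions the 2048^3 case needs about 3.9 TB before any working copies or output buffers are counted, eight times the 1024^3 case.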
Cubic Phase Rheology Results [1]
• Recent results include the tracking of large time-scale defect dynamics on 1024^3 lattice-sites systems; only possible on Ranger, due to sustained core count and disk storage requirements
• Regions of high stress magnitude are localized in the vicinity of defects
[Figure: 256^3 lattice-sites gyroidal system with multiple domains]
[1] R. S. Saksena et al. “Petascale lattice-Boltzmann simulations of amphiphilic liquid crystals”, AHM08
LAMMPS [1]
• Fully-atomistic simulations of clay-polymer nanocomposites on Ranger
• More than 85 million atoms simulated
• Clay mineral studies, with ~3 million atoms, 2-3 orders of magnitude greater than any previous study
• Prospects: to include the edges of the clay (not periodic boundary) and do realistic-sized models - at least 100 million atoms (~2 weeks wall clock, using 4096 cores)
[1] J. Suter et al. “Grid-Enabled Large-Scale Molecular Dynamics of Clay Nano-materials”, AHM08
HIV-1 drug resistance [1]
• Goal: to study the effect of antiretroviral inhibitors (targeting proteins in the HIV lifecycle, such as viral protease and reverse transcriptase enzymes)
• High end computational power to confer clinical decision support
• On Ranger, up to 100 replicas (configurations) simulated, for the first time, in some cases going to 100 ns
• 3.5 TB of trajectory and free energy analysis data
• 6 microseconds in four weeks
• AHE orchestrated workflows
[Figure: Energy differences of binding compared with experimental results for wildtype and MDR proteases with inhibitors LPV and RTV using 10 ns trajectory.]
[1] K. Sadiq et al., “Rapid, Accurate and Automated Binding Free Energy Calculations of Ligand-Bound HIV Enzymes for Clinical Decision Support using HPC and Grid Resources”, AHM08
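The "6 microseconds in four weeks" figure can be read as an aggregate throughput across all replicas. A small sketch of that conversion (the function name is illustrative; how the total was split between replicas is not stated in the talk):

```python
def aggregate_ns_per_day(total_microseconds, weeks):
    """Total simulated nanoseconds per wall-clock day, summed over replicas."""
    return total_microseconds * 1000 / (weeks * 7)


rate = aggregate_ns_per_day(6, 4)
print(f"aggregate throughput: about {rate:.0f} ns of trajectory per day")
```

This comes to a little over 200 ns of simulated time per day in aggregate.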
GENIUS project [1]
• Grid Enabled Neurosurgical Imaging Using Simulation (GENIUS)
• Scientific goal: to perform real time patient specific medical simulation
• Combines blood flow simulation with clinical data
• Fitting the computational time scale to the clinical time scale:
  - Capture the clinical workflow
  - Get results which will influence clinical decisions: 1 day? 1 week?
  - GENIUS - 15 to 30 minutes
[1] S. Manos et al., “Surgical Treatment for Neurovascular Pathologies Using Patient-specific Whole Cerebral Blood Flow Simulation”, AHM08
GENIUS project [1]
• Blood flow is simulated using lattice-Boltzmann method (HemeLB)
• Parallel ray tracer doing real time in situ visualization
• Sub-frames rendered on each MPI processor/rank and composited before being sent over the network to a (lightweight) viewing client
• Addition of volume rendering cuts down scalability of fluid solver due to required global communications
• Even so, datasets rendered at more than 30 frames per second (1024^2 pixel resolution)
[1] S. Manos et al., “Surgical Treatment for Neurovascular Pathologies Using Patient-specific Whole Cerebral Blood Flow Simulation”, AHM08
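The quoted frame rate and resolution imply a network load that can be compared with the dedicated 1 Gb/s lightpath described earlier. The sketch assumes uncompressed 24-bit RGB frames, which the talk does not state; any compression in the real pipeline would lower the figure.

```python
def stream_gbit_per_s(width, height, fps, bytes_per_pixel=3):
    """Bandwidth in Gb/s to ship uncompressed frames to the viewing client.

    `bytes_per_pixel=3` assumes 24-bit RGB with no compression.
    """
    return width * height * bytes_per_pixel * 8 * fps / 1e9


rate = stream_gbit_per_s(1024, 1024, 30)
print(f"1024x1024 at 30 fps: about {rate:.2f} Gb/s uncompressed")
```

Under these assumptions the stream needs roughly three quarters of a 1 Gb/s link, so it would fit on a dedicated lightpath with little headroom.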
CONCLUSIONS
• A wide range of scientific research activities was presented that make effective use of the new range of petascale resources available in the USA
• These demonstrate the emergence of new science not possible without access to this scale of resources
• Some existing techniques, such as MPI, still hold, however: some of these applications scale linearly up to at least tens of thousands of cores
• Future prospects: we are well placed to move onto the next machines coming online in the US and Japan
Acknowledgements
• JANET/David Salmon
• NGS staff
• TeraGrid staff
• Simon Clifford (CCS)
• Jay Boisseau (TACC)
• Lucas Wilson (TACC)
• Pete Beckman (ANL)
• Ramesh Balakrishnan (ANL)
• Brian Toonen (ANL)
• Prof. Nicholas Karonis (ANL)