
TeraGyroid


Presentation Transcript


  1. TeraGyroid: HPC Applications ready for UKLight. Stephen Pickles <stephen.pickles@man.ac.uk>. http://www.realitygrid.org | http://www.realitygrid.org/TeraGyroid.html. UKLight Town Meeting, NeSC, Edinburgh, 9/9/2004

  2. The TeraGyroid Project
  • Funded by EPSRC (UK) & NSF (USA) to join the UK e-Science Grid and the US TeraGrid
  • application from RealityGrid, a UK e-Science Pilot Project
  • 3-month project including work exhibited at SC'03 and SC Global, Nov 2003
  • thumbs up from TeraGrid mid-September; funding from EPSRC approved later
  • Main objective was to deliver high-impact science that would not be possible without the combined resources of the US and UK grids
  • Study of defect dynamics in liquid crystalline surfactant systems using lattice-Boltzmann methods
  • featured the world's largest lattice-Boltzmann simulation
  • a 1024^3-cell simulation of the gyroid phase demands terascale computing, hence "TeraGyroid"

  3. Networking
  [Diagram components: HPC engines (x2), checkpoint files, steering (control and status), visualization data, compressed video, visualization engine, storage.]

  4. LB3D: 3-dimensional lattice-Boltzmann simulations
  • LB3D is written in Fortran90 and parallelized using MPI (an illustrative sketch follows below)
  • Scales linearly on all available resources (Lemieux, HPCx, CSAR, Linux/Itanium II clusters)
  • Data produced during a single run can range from hundreds of gigabytes to terabytes
  • Simulations require supercomputers
  • High-end visualization hardware (e.g. SGI Onyx, dedicated viz clusters) and parallel rendering software (e.g. VTK) needed for data analysis
  [Figure: 3D datasets showing snapshots from a simulation of spinodal decomposition: a binary mixture of water and oil phase separates. Blue areas denote high water densities; red marks the interface between the two fluids.]
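LB3D itself is Fortran90/MPI and is not reproduced here. As a rough illustration of why such a code can scale linearly, the sketch below (Python with numpy and mpi4py, hypothetical sizes) shows the slab decomposition and periodic halo exchange that keep per-step communication proportional to the slab surface rather than its volume.

```python
# Minimal sketch (not LB3D): 1-D slab decomposition of a 3-D lattice with
# periodic halo exchange -- the communication pattern behind linear scaling.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 128                              # global lattice edge (assumes size divides N)
nz = N // size                       # z-planes owned by this rank
f = np.zeros((N, N, nz + 2))         # local slab plus one ghost plane on each side

up, down = (rank + 1) % size, (rank - 1) % size   # periodic neighbours in z

def halo_exchange(field):
    """Swap boundary planes with the neighbouring ranks (periodic in z)."""
    recv = np.empty((N, N))
    # send the top owned plane up; receive the bottom ghost plane from below
    comm.Sendrecv(np.ascontiguousarray(field[:, :, -2]), dest=up,
                  recvbuf=recv, source=down)
    field[:, :, 0] = recv
    # send the bottom owned plane down; receive the top ghost plane from above
    comm.Sendrecv(np.ascontiguousarray(field[:, :, 1]), dest=down,
                  recvbuf=recv, source=up)
    field[:, :, -1] = recv

for step in range(10):               # each time step: collide, exchange halos, stream
    # ... collision update on field[:, :, 1:-1] would go here ...
    halo_exchange(f)
    # ... streaming/propagation using the refreshed ghost planes ...
```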

  5. Computational Steering of Lattice-Boltzmann Simulations
  • LB3D is instrumented for steering using the RealityGrid steering library (an illustrative steering loop follows below).
  • Malleable checkpoint/restart functionality allows 'rewinding' of simulations and run-time job migration across architectures.
  • Steering reduces storage requirements because the user can adapt data-dumping frequencies.
  • CPU time is saved because users can stop jobs early once they can see that nothing relevant is happening.
  • Instead of "task farming", parameter searches are accelerated by "steering" through parameter space.
  • Analysis time is significantly reduced because less irrelevant data is produced.
  Applied to the study of the gyroid mesophase of amphiphilic liquid crystals at unprecedented space and time scales.
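The RealityGrid steering library's actual API is not shown on the slide; the toy loop below (plain Python, hypothetical names) only illustrates the pattern described above: the simulation polls for steering commands every step and adapts, for example, its data-dump frequency.

```python
# Illustrative steering loop (hypothetical names, not the RealityGrid API):
# the simulation polls a command queue every step and adapts its behaviour.
import queue

class ToySimulation:
    """Stand-in for an LB3D-like code: one state variable and a checkpoint."""
    def __init__(self):
        self.step, self.density = 0, 1.0
    def advance(self):
        self.step += 1
        self.density *= 0.999            # pretend physics
    def checkpoint(self):
        return {"step": self.step, "density": self.density}

def steered_run(sim, commands, max_steps=5000):
    dump_every = 1000                    # steerable parameter
    for _ in range(max_steps):
        sim.advance()
        try:                             # non-blocking poll for steering commands
            name, value = commands.get_nowait()
        except queue.Empty:
            name, value = None, None
        if name == "set_dump_frequency":
            dump_every = int(value)      # user adapts the output rate at run time
        elif name == "stop":
            print("steered stop at step", sim.step, sim.checkpoint())
            return
        if sim.step % dump_every == 0:
            print("status/dump at step", sim.step, "density", round(sim.density, 4))

cmds = queue.Queue()
cmds.put(("set_dump_frequency", 500))    # a user 'steers' the run
steered_run(ToySimulation(), cmds)
```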

  6. Parameter space exploration
  [Figure sequence: initial condition – random water/surfactant mixture; self-assembly starts; lamellar phase (surfactant bilayers between water layers); rewind and restart from checkpoint; cubic micellar phase, high surfactant density gradient; cubic micellar phase, low surfactant density gradient. A small rewind sketch follows below.]
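As a sketch of the 'rewind and restart from checkpoint' idea (file names and parameter keys are hypothetical; LB3D's real malleable checkpoint format is not reproduced), one might branch a run at an earlier checkpoint with modified parameters:

```python
# Hypothetical checkpoint layout, illustrating rewind-and-restart with new parameters.
import pickle, pathlib

CKPT_DIR = pathlib.Path("checkpoints")
CKPT_DIR.mkdir(exist_ok=True)

def save_checkpoint(step, state, params):
    """Dump enough information to resume (or branch) the run later."""
    with open(CKPT_DIR / f"chk_{step:08d}.pkl", "wb") as fh:
        pickle.dump({"step": step, "state": state, "params": params}, fh)

def rewind(step, new_params):
    """Reload an earlier checkpoint but continue with different parameters,
    e.g. a different surfactant coupling, to explore another branch."""
    with open(CKPT_DIR / f"chk_{step:08d}.pkl", "rb") as fh:
        chk = pickle.load(fh)
    chk["params"].update(new_params)
    return chk

save_checkpoint(20000, state=[0.0] * 8, params={"surfactant_coupling": 0.006})
branch = rewind(20000, {"surfactant_coupling": 0.002})   # branch the run here
print("restarting from step", branch["step"], "with", branch["params"])
```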

  7. Strategy
  • Aim: use the federated resources of the US TeraGrid and UK e-Science Grid to accelerate the scientific process
  • Rapidly map out parameter space using a large number of independent "small" (128^3) simulations
  • use job cloning and migration to exploit available resources and save equilibration time
  • Monitor their behaviour using on-line visualization
  • Hence identify parameters for high-resolution simulations on HPCx and Lemieux
  • 1024^3 on Lemieux (PSC) – takes 0.5 TB to checkpoint!
  • create initial conditions by stacking smaller simulations with periodic boundary conditions (see the sketch after this list)
  • Selected 128^3 simulations were used for long-time studies
  • All simulations monitored and steered by a geographically distributed team of computational scientists
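A minimal sketch of the stacking step, assuming the small box is periodic (toy sizes here; the project tiled equilibrated 128^3 boxes up to 1024^3):

```python
# Build a large initial condition by periodically tiling a smaller equilibrated box.
import numpy as np

def stack_periodic(small_box, copies_per_axis):
    """Tile a periodic 3-D field to seed a larger simulation."""
    return np.tile(small_box, (copies_per_axis,) * 3)

small = np.random.rand(16, 16, 16)       # toy stand-in for an equilibrated 128^3 field
large = stack_periodic(small, 8)         # 8 copies per axis: 16^3 -> 128^3 here,
assert large.shape == (128, 128, 128)    # 128^3 -> 1024^3 in the real runs
```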

  8. The Architecture of Steering
  [Diagram: Simulation and Visualization components, each using the Steering library, publish Steering Grid Services (GS) to a Registry; a Steering client finds services in the Registry and binds to them; data transfer between Simulation and Visualization uses Globus-IO; Visualization feeds multiple Displays; components start independently and attach/detach dynamically.]
  • OGSI middle tier
  • multiple clients: Qt/C++, .NET on PocketPC, GridSphere Portlet (Java)
  • remote visualization through SGI VizServer, Chromium, and/or streamed to Access Grid
  • Computations run at HPCx, CSAR, SDSC, PSC and NCSA
  • Visualizations run at Manchester, UCL, Argonne, NCSA, Phoenix
  • Scientists at 4 sites steer calculations, collaborating via Access Grid
  • Visualizations viewed remotely
  • Grid services run anywhere
  (A toy publish/find/bind sketch follows below.)
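Not OGSI or the actual RealityGrid middle tier, but a toy Python sketch (hypothetical names and endpoint URL) of the publish/find/bind pattern the diagram describes:

```python
# Toy illustration of publish/find/bind (not OGSI, Globus, or the real registry):
# components register endpoints, steering clients look them up and bind.

class Registry:
    def __init__(self):
        self._entries = {}                       # service name -> endpoint
    def publish(self, name, endpoint):
        self._entries[name] = endpoint           # simulation/visualization publish here
    def find(self, name):
        return self._entries.get(name)           # steering clients find, then bind

registry = Registry()
# hypothetical service name and endpoint, for illustration only
registry.publish("lb3d-sim-042", "https://hpcx.example.org/ogsi/steering/42")
endpoint = registry.find("lb3d-sim-042")
print("client would now bind to", endpoint)      # entries can appear and disappear
                                                 # at run time: attach/detach is dynamic
```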

  9. SC Global ’03 Demonstration

  10. TeraGyroid Testbed
  [Network map: transatlantic connectivity via Starlight (Chicago) and Netherlight (Amsterdam) at 10 Gbps, plus a BT-provisioned 2 x 1 Gbps link; UK sites Manchester, Daresbury and UCL connected over the production network, MB-NG and SJ4; US sites ANL, PSC, NCSA, SDSC, Caltech and Phoenix. Legend: Visualization, Computation, Service Registry, Access Grid node, Network PoP, Dual-homed system.]

  11. Trans-Atlantic Network
  Collaborators:
  • Manchester Computing
  • Daresbury Laboratory Networking Group
  • MB-NG and UKERNA
  • UCL Computing Service
  • BT
  • SURFnet (NL)
  • Starlight (US)
  • Internet2 (US)

  12. TeraGyroid: Hardware Infrastructure
  Computation (using more than 6000 processors), including:
  • HPCx (Daresbury), 1280 procs IBM Power4 Regatta, 6.6 Tflops peak, 1.024 TB
  • Lemieux (PSC), 3000 procs HP/Compaq, 3 TB memory, 6 Tflops peak
  • TeraGrid Itanium2 cluster (NCSA), 256 procs, 1.3 Tflops peak
  • TeraGrid Itanium2 cluster (SDSC), 256 procs, 1.3 Tflops peak
  • Green (CSAR), SGI Origin 3800, 512 procs, 0.512 TB memory (shared)
  • Newton (CSAR), SGI Altix 3700, 256 Itanium2 procs, 384 GB memory (shared)
  Visualization:
  • Bezier (Manchester), SGI Onyx 300, 6x IR3, 32 procs
  • Dirac (UCL), SGI Onyx 2, 2x IR3, 16 procs
  • SGI loan machine (Phoenix), SGI Onyx, 1x IR4, 1x IR3, commissioned on site
  • TeraGrid Visualization Cluster (ANL), Intel Xeon
  • SGI Onyx (NCSA)
  Service Registry:
  • Frik (Manchester), Sony PlayStation2
  Storage:
  • 20 TB of science data generated in the project
  • 2 TB moved to long-term storage for on-going analysis – Atlas Petabyte Storage System (RAL)
  Access Grid nodes at Boston University, UCL, Manchester, Martlesham, Phoenix (4)

  13. Network lessons
  • Less than three weeks to debug the networks
  • applications people and network people nodded wisely but didn't understand each other
  • middleware such as GridFTP is infrastructure to applications folk, but an application to network folk
  • rapprochement necessary for success
  • Grid middleware not designed with dual-homed systems in mind
  • HPCx, CSAR (Green) and Bezier are busy production systems
  • they had to be dual-homed on SJ4 and MB-NG, requiring great care with routing (see the sketch below)
  • complication: we needed to drive everything from laptops that couldn't see the MB-NG network
  • Many other problems encountered
  • but nothing that can't be fixed once and for all, given persistent infrastructure
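Illustration only, not what the project actually configured: on a dual-homed host, an application can at least pin a connection to a chosen local address, leaving the routing tables to decide which network that address reaches. The addresses below are placeholders, not the SJ4/MB-NG ones.

```python
# Pin an outgoing TCP connection to a specific local (source) address on a
# dual-homed host. Placeholder addresses; routing must still be set up correctly.
import socket

def connect_via(local_addr, remote_host, remote_port):
    """Open a TCP connection bound to a chosen local address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((local_addr, 0))                  # 0 = any ephemeral source port
    s.connect((remote_host, remote_port))
    return s

# e.g. force a bulk transfer out of the research-network interface (hypothetical):
# conn = connect_via("10.0.0.5", "viz.example.org", 5000)
```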

  14. Measured Transatlantic Bandwidths during SC’03

  15. TeraGyroid: Summary
  • Real computational science...
  • gyroid mesophase of amphiphilic liquid crystals
  • unprecedented space and time scales
  • investigating phenomena previously out of reach
  • ...on real Grids...
  • enabled by high-bandwidth networks
  • ...to reduce time to insight
  [Images: dislocations; interfacial surfactant density]

  16. TeraGyroid: Collaborating Organisations
  Our thanks to hundreds of individuals at: Argonne National Laboratory (ANL), Boston University, BT, BT Exact, Caltech, CSC, Computing Services for Academic Research (CSAR), CCLRC Daresbury Laboratory, Department of Trade and Industry (DTI), Edinburgh Parallel Computing Centre, Engineering and Physical Sciences Research Council (EPSRC), Forschungszentrum Jülich, HLRS (Stuttgart), HPCx, IBM, Imperial College London, National Center for Supercomputing Applications (NCSA), Pittsburgh Supercomputing Center, San Diego Supercomputer Center, SCinet, SGI, SURFnet, TeraGrid, Tufts University (Boston), UKERNA, UK Grid Support Centre, University College London, University of Edinburgh, University of Manchester

  17. The TeraGyroid Experiment
  S. M. Pickles (1), R. J. Blake (2), B. M. Boghosian (3), J. M. Brooke (1), J. Chin (4), P. E. L. Clarke (5), P. V. Coveney (4), N. González-Segredo (4), R. Haines (1), J. Harting (4), M. Harvey (4), M. A. S. Jones (1), M. Mc Keown (1), R. L. Pinning (1), A. R. Porter (1), K. Roy (1), and M. Riding (1)
  (1) Manchester Computing, University of Manchester; (2) CLRC Daresbury Laboratory, Daresbury; (3) Tufts University, Massachusetts; (4) Centre for Computational Science, University College London; (5) Department of Physics & Astronomy, University College London
  http://www.realitygrid.org http://www.realitygrid.org/TeraGyroid.html

  18. New Application at AHM2004: “Exact” calculation of peptide-protein binding energies by steered thermodynamic integration using high-performance computing grids. Philip Fowler, Peter Coveney, Shantenu Jha and Shunzhou Wan. UK e-Science All Hands Meeting, 31 August – 3 September 2004

  19. Why are we studying this system?
  • Measuring binding energies is vital, e.g. for designing new drugs.
  • Calculating a peptide-protein binding energy can take weeks to months.
  • We have developed a grid-based method to accelerate this process: to compute ΔGbind during the AHM 2004 conference, i.e. in less than 48 hours, using the federated resources of the UK National Grid Service and the US TeraGrid.

  20. Thermodynamic Integration on Computational Grids
  [Schematic (lambda vs. time): a starting conformation seeds successive simulations at λ = 0.1, 0.2, 0.3, ... 0.9 (10 sims, each 2 ns); use steering to launch, spawn and terminate λ-jobs; run each independent job on the Grid; check for convergence; combine and calculate the integral (a minimal quadrature sketch follows below).]
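The slide does not give the quadrature details; as a minimal sketch, once each λ-window has converged the per-window averages ⟨dU/dλ⟩ are combined by numerical integration over λ. The numbers below are placeholders, not project data.

```python
# Combine converged per-window averages <dU/dlambda> into Delta G by
# trapezoidal integration over lambda. Placeholder values, for illustration only.
import numpy as np

lambdas = np.linspace(0.0, 1.0, 11)            # one simulation per lambda window
du_dlambda = np.array([-9.8, -9.1, -8.0, -6.4, -4.9,
                       -3.5, -2.2, -1.4, -0.8, -0.4, -0.1])   # placeholder averages

# Delta G = integral_0^1 <dU/dlambda> dlambda  (trapezoidal rule)
delta_G = float(np.sum(0.5 * (du_dlambda[1:] + du_dlambda[:-1]) * np.diff(lambdas)))
print(f"Delta G ~ {delta_G:.1f} (illustrative units and numbers)")
```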

  21. [Diagram labels only: checkpointing; steering and control; monitoring.]

  22. We successfully ran many simulations...
  • This is the first time we have completed an entire calculation.
  • Insight gained will help us improve the throughput.
  • The simulations were started at 5pm on Tuesday and the data was collated at 10am Thursday.
  • 26 simulations were run.
  • At 4.30pm on Wednesday, we had nine simulations in progress (140 processors): 1x TG-SDSC, 3x TG-NCSA, 3x NGS-Oxford, 1x NGS-Leeds, 1x NGS-RAL.
  • We simulated over 6.8 ns of classical molecular dynamics in this time.

  23. Very preliminary results
  ΔG (kcal/mol):
  • Experiment: -1.0 ± 0.3
  • "Quick and dirty" analysis (as at 41 hours): -9 to -12
  We expect our value to improve with further analysis around the endpoints.

  24. Conclusions
  • We can harness today's grids to accelerate high-end computational science
  • On-line visualization and job migration require high-bandwidth networks
  • Need persistent network infrastructure
  • else set-up costs are too high
  • QoS: would like the ability to reserve bandwidth
  • and processors, graphics pipes, AG rooms, virtual venues, nodops... (but that's another story)
  • Hence our interest in UKLight
