The TeraGyroid Project - Aims and Achievements
Richard Blake, Computational Science and Engineering Department, CCLRC Daresbury Laboratory
This ambitious project was the result of an international collaboration linking the USA's TeraGrid and the UK's e-Science Grid, jointly funded by NSF and EPSRC. Trans-Atlantic optical bandwidth is supported by British Telecommunications.
Overview • Project Objectives • The TeraGyroid scientific experiment • Testbed and Partners • Applications Porting and RealityGrid Environment • Grid Software Infrastructure • Visualisation • Networking • What was done • Project Objectives - How well did we do? • Lessons Learned
UK-TeraGrid HPC Project Objectives Joint experiment combining high-end computational facilities in the UK e-Science Grid (HPCx and CSAR) and the TeraGrid sites: • world-class computational science experiment • enhanced expertise/experience to benefit the UK and USA • inform construction/operation of national/international grids • stimulate long-term strategic technical collaboration • support long-term scientific collaborations • experiments with clear scientific deliverables • choice of applications to be based on community codes • inform a future programme of complementary experiments
The TeraGyroid Scientific Experiment High-density isosurface of the late-time configuration in a ternary amphiphilic fluid as simulated on a 64³ lattice by LB3D. Gyroid ordering coexists with defect-rich, sponge-like regions. The dynamical behaviour of such defect-rich systems can only be studied with very large scale simulations, in conjunction with high-performance visualisation and computational steering.
The RealityGrid project Mission: “Using Grid technology to closely couple high performance computing, high throughput experiment and visualization, RealityGrid will move the bottleneck out of the hardware and back into the human mind.” • to predict the realistic behaviour of matter using diverse simulation methods • LB3D - highly scalable grid-based code to model the dynamics and hydrodynamics of complex multiphase fluids • mesoscale simulation enables access to larger physical length scales and longer timescales • the RealityGrid environment enables multiple steered and spawned simulations, with the visualised output streamed to a distributed set of collaborators located at Access Grid (AG) nodes across the USA and UK.
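For illustration, a minimal sketch of the kind of steered simulation loop described above; the interface names below are hypothetical and are not the actual RealityGrid steering API:

```python
# Minimal sketch of a steered simulation loop (hypothetical interface;
# the real RealityGrid steering library is not represented here).

def run_steered_simulation(sim, steerer, viz_stream, checkpoint_every=1000):
    """Advance the simulation, honouring steering commands as they arrive."""
    step = 0
    while not steerer.stop_requested():
        sim.advance_one_timestep()
        step += 1

        # Apply any parameter changes sent by the scientists (e.g. fluid
        # coupling strengths) before the next timestep.
        for name, value in steerer.pending_parameter_changes():
            sim.set_parameter(name, value)

        # Stream the current fields to the remote visualisation engine.
        if steerer.emit_requested():
            viz_stream.send(sim.current_fields())

        # Periodic checkpoints support rewind, restart and job migration.
        if step % checkpoint_every == 0:
            sim.write_checkpoint(f"checkpoint_{step:07d}.dat")
```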
Testbed and Project Partners RealityGrid partners: • University College London (Application, Visualisation, Networking) • University of Manchester (Application, Visualisation, Networking) • Edinburgh Parallel Computing Centre (Application) • Tufts University (Application) TeraGrid sites at: • Argonne National Laboratory (Visualisation, Networking) • National Center for Supercomputing Applications (Compute) • Pittsburgh Supercomputing Center (Compute, Visualisation) • San Diego Supercomputer Center (Compute) UK High-End Computing Services - HPCx run by the University of Edinburgh and CCLRC Daresbury Laboratory (Compute, Networking, Coordination) - CSAR run by the University of Manchester and CSC (Compute and Visualisation)
Computer Servers The TeraGyroid project has access to a substantial fraction of the world's largest supercomputing resources, including the whole of the UK's national supercomputing facilities and the USA's TeraGrid machines. The largest simulations involve in excess of one billion lattice sites. The integrated resource provides ~7 TB of memory and ~5,000 processors.
Networking map: UK e-Science Grid sites (Edinburgh, Glasgow, Newcastle, Belfast, Manchester, DL, Cambridge, Oxford, RAL, Cardiff, London, Southampton) connected by BT-provisioned links in the UK and via NetherLight (Amsterdam) to the TeraGrid.
Applications Porting • LB3D is written in Fortran90 • Of order 128 variables per grid point, so a 1 Gpoint lattice gives ~1 TB checkpoints • Various compiler issues to be overcome at different sites • Site configuration issues important, e.g. I/O access to high-speed global file systems for checkpoint files • Connectivity of high-speed file systems to the network • Multi-homing required on several systems to separate the control network from the data network • Port forwarding required for compute nodes on private networks
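As a back-of-envelope check of the "1 Gpoint ~ 1 TB" figure, assuming the ~128 variables per site are stored in double precision:

```python
# Checkpoint-size estimate, assuming ~128 double-precision (8-byte)
# variables per lattice site as quoted above.
variables_per_site = 128
bytes_per_variable = 8              # double precision (assumption)
sites = 1024 ** 3                   # ~1 Gpoint lattice

checkpoint_bytes = sites * variables_per_site * bytes_per_variable
print(checkpoint_bytes / 2 ** 40, "TiB")   # -> 1.0, i.e. 1 Gpoint ~ 1 TB
```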
Exploring parameter space through computational steering. Snapshot captions from the steered run: cubic micellar phase, high surfactant density gradient; cubic micellar phase, low surfactant density gradient; initial condition - random water/surfactant mixture; self-assembly starts; lamellar phase - surfactant bilayers between water layers; rewind and restart from checkpoint.
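A minimal sketch of the rewind-and-restart step, using the same hypothetical interface as the steering sketch above; the checkpoint file name and parameter key are illustrative, not actual LB3D inputs:

```python
# Rewind to an earlier checkpoint and branch into a different region of
# parameter space (hypothetical interface and parameter names).

def rewind_and_branch(sim, checkpoint_file, new_params):
    """Restore a saved state, change steered parameters, then resume."""
    sim.read_checkpoint(checkpoint_file)        # rewind to the saved state
    for name, value in new_params.items():      # e.g. a lower surfactant
        sim.set_parameter(name, value)          # density gradient
    return sim                                  # caller resumes the run

# Example (illustrative values only):
# sim = rewind_and_branch(sim, "checkpoint_0100000.dat",
#                         {"surfactant_density_gradient": 0.05})
```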
RealityGrid - Environment Computations run at HPCx, CSAR, SDSC, PSC and NCSA Visualisation run at Manchester, UCL, Argonne, NCSA and Phoenix Scientists steering calculations from UCL and Boston over the Access Grid Visualisation output and collaborations multicast to Phoenix and visualised on the show floor in the University of Manchester booth
Visualisation servers • Amphiphilic fluids produce exotic mesophases with a range of complex morphologies - need visualisation • The complexity of these data sets (128 variables) makes visualisation a challenge • Using the VTK library, with patches refreshed each time new data become available • Video stream multicast to the Access Grid using the FLXmitter library • SGI OpenGL Vizserver used to allow remote control of visualisation • Visualisation of billion-node models requires 64-bit hardware and multiple rendering units • Achieved visualisation of a 1024³ lattice using a ray-tracing algorithm developed at the University of Utah on a 100-processor Altix on the show floor at SC'03
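A minimal VTK isosurface pipeline of the kind described above might look as follows; the file name, data format and isovalue are placeholders rather than the actual TeraGyroid visualisation code:

```python
# Sketch of an isosurface pipeline with VTK; input file and isovalue are
# placeholders, and frame streaming to the Access Grid is not shown.
import vtk

reader = vtk.vtkStructuredPointsReader()
reader.SetFileName("lb3d_surfactant_density.vtk")  # assumed per-field output file

contour = vtk.vtkContourFilter()                   # extract the isosurface
contour.SetInputConnection(reader.GetOutputPort())
contour.SetValue(0, 0.5)                           # illustrative isovalue

mapper = vtk.vtkPolyDataMapper()
mapper.SetInputConnection(contour.GetOutputPort())

actor = vtk.vtkActor()
actor.SetMapper(mapper)

renderer = vtk.vtkRenderer()
renderer.AddActor(actor)

window = vtk.vtkRenderWindow()
window.AddRenderer(renderer)
window.Render()                                    # each refresh re-renders with new data
```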
Grid Software Infrastructure • Various versions of the Globus Toolkit: 2.2.3, 2.2.4, 2.4.3 and 3.1 (including GT2 compatibility bundles) • Used GRAM, GridFTP and Globus-I/O - no incompatibilities • Did not use MDS - concerns over the robustness/utility of its data • A 64-bit version of GT2 was required for the AIX (HPCx) system - some grief due to its tendency to require custom-patched versions of third-party libraries • A lot of system management effort was required to work with/around the toolkit • Need a more scalable CA system that avoids every system administrator having to study everyone else's certificates
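As an example of moving checkpoints between sites with GridFTP, a transfer could be scripted roughly as below; the host names and paths are placeholders, and a valid grid proxy (from grid-proxy-init) is assumed:

```python
# Sketch of staging a checkpoint between sites using the Globus Toolkit's
# globus-url-copy GridFTP client; hosts and paths are placeholders.
import subprocess

src = "gsiftp://hpcx.example.ac.uk/scratch/teragyroid/checkpoint_0100000.dat"
dst = "gsiftp://lemieux.example.edu/scratch/teragyroid/checkpoint_0100000.dat"

subprocess.run(
    ["globus-url-copy",
     "-p", "8",          # parallel TCP streams to improve WAN throughput
     src, dst],
    check=True,          # raise if the transfer fails
)
```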
Networking diagram: simulation engines (SimEng1, UK; SimEng2, PSC) feeding the visualisation engine (VizEng2, Phoenix) and disk storage (Disk1, UK).
Networking • On-line visualisation requires O(1 Gbps) bandwidth for the larger problem sizes • Steering requires 100% reliable, near-real-time data transport across the Grid to the visualisation engines • Reliable transfer is achieved using TCP/IP, which acknowledges every packet transferred (to detect and repair loss); this slows down transport, limits data transfer rates and therefore limits LB3D steering of larger systems • Point-to-n-point transport for visualisation, storage and job migration uses n times more bandwidth, since unicast is used
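A rough estimate of why on-line visualisation needs O(1 Gbps), assuming a single double-precision scalar field from a 256³ steering-sized volume is streamed about once per second (field choice, precision and frame rate are assumptions):

```python
# Illustrative data-rate estimate for streaming one scalar field.
sites = 256 ** 3                  # 256^3 volume (assumption)
bytes_per_site = 8                # one double-precision scalar per site
frames_per_second = 1             # one visualisation update per second

bits_per_second = sites * bytes_per_site * 8 * frames_per_second
print(bits_per_second / 1e9, "Gbps")   # ~1.07, i.e. O(1 Gbps)
```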
What Was Done? The TeraGyroid experiment represents the first use of collaborative, steerable, spawned and migrated processes based on capability computing. • generated 2 TB of data • exploration of the multi-dimensional fluid coupling parameter space with 64³ simulations, accelerated through steering • study of finite-size periodic boundary condition effects, exploring the stability of the density of defects in the 64³ simulations as they are scaled up to 128³, 256³, 512³ and 1024³ • 100K to 1,000K time steps • exploring the stability of the crystalline phases to perturbations and variations in the effective surfactant temperature • 128³ and 256³ simulations - clear of finite-size effects • Perfect crystal not formed in 128³ systems after 600K steps • Statistics of defect numbers, velocities and lifetimes require large systems, as only these contain sufficient defects
World's Largest Lattice Boltzmann Simulation? • 1024³ lattice sites • scaled-up 128³ simulations, with periodic tiling and perturbations for the initial state • finite-size-effect-free dynamics • 2048 processors • 1.5 TB of memory • 1 minute per time step on 2048 processors • 3000 time steps • 1.2 TB of visualisation data Run on LeMieux at the Pittsburgh Supercomputing Center
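A sketch of the periodic-tiling construction of the initial state, assuming the converged 128³ field is available as a NumPy array; the file name and noise amplitude are assumptions, and in practice this is done per field inside the distributed Fortran90 code rather than on a single node:

```python
# Build a 1024^3 initial condition by periodically tiling a converged
# 128^3 state and adding a small perturbation (illustrative only; a
# single-node array of this size needs ~8.6 GB of memory).
import numpy as np

small = np.load("state_128.npy")            # placeholder file name
assert small.shape == (128, 128, 128)

big = np.tile(small, (8, 8, 8))             # 8 copies per axis -> 1024^3
big += 1e-3 * np.random.standard_normal(big.shape)  # break exact periodicity
```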
Access Grid Screen at SC ‘03 during SC Global Session on Application Steering
Demonstrations/ Presentations Demonstrations of the TeraGyroid experiment at SC'03: • TeraGyroid on the PSC booth: Tue 18, 10:00-11:00; Thu 20, 10:00-11:00 • RealityGrid and TeraGyroid on the UK e-Science booth: Tue 18, 16:00-16:30; Wed 19, 15:30-16:00 • RealityGrid during the SC'03 poster session: Tue 18, 17:00-19:00 • HPC Challenge presentations: Wed 19, 10:30-12:00 • SC Global session on steering: Thu 20, 10:30-12:00 Demonstrations and real-time output at the University of Manchester and HPCx booths.
Project Objectives - How Well Did We Do? - 1 • world-class computational science experiment • science analysis is ongoing - leading to new insights into the properties of complex fluids at unprecedented scales • SC'03 award - 'Most Innovative Data-Intensive Application' • enhanced expertise/experience to benefit the UK and USA • first transatlantic federation of major HEC facilities • applications need to be adaptable to different architectures • inform construction/operation of national/international grids • most insight gained into end-to-end network integration, performance and dual-homed systems • remote visualisation, steering and checkpointing require high bandwidth which is dedicated and reservable • results fed directly into the ESLEA proposal to exploit the UKLight optical switched network infrastructure • stimulate long-term strategic technical collaboration • strengthened relationships between the Globus, networking and visualisation groups
Project Objectives - How Well Did We Do? - 2 • support long-term scientific collaborations • built on strong and fruitful existing scientific collaborations between researchers in the UK and USA • experiments with clear scientific deliverables - an explicit science plan was published, approved and then executed; data analysis is ongoing • choice of applications to be based on community codes • experiences will be of benefit to other grid-based applications, in particular in the computational engineering community • inform a future programme of complementary experiments • report to be made available on the RealityGrid website • EPSRC initiating another Call for Proposals - not targeting SC'04
Lessons Learned • How to support such projects - full peer review? • Timescales were very tight - September to November • Resource estimates need to be flexible • Need complementary experiments for the US and UK to reciprocate benefits • HPC centres, e-Science and networking groups can work very effectively together on challenging common goals • Site configuration issues are very important - network access • Visualisation capabilities in the UK need upgrading • Scalable CA and dual-homed (dual-address) systems needed • Network QoS is very important for checkpointing, remote steering and visualisation • Do it again?