RealityGrid Software Infrastructure: Achievements and Prospects Stephen Pickles, Andrew Porter, Robin Pinning & Rob Haines <stephen.pickles@man.ac.uk> http://www.realitygrid.org Royal Society, Tuesday 15 June, 2004
Outline • Review • How we got here • Status • Where we are today • Prospects • Where we’re going RealityGrid Annual Workshop, 15/6/2004
Review How we got here
The pieces Fast track • Computational Steering Library and tools (MC) • On-line Visualization (MC) • Web portal (EPCC) • Human-Computer Interfaces (HCI) Deep track • Performance Control (CNC) • Resource management, component frameworks (IC) • Instruments: LUSI, XMT (not this talk) This talk will emphasise fast track work.
Design philosophies • Grid-enabled • Component-based and service-oriented • plug in and compose new components and services, from partners and third parties • Independence and modularity • to minimize dependencies on third-party software • Should be able to steer locally without any Grid middleware • to facilitate parallel development within project • Integration and/or interoperability • Things should work together • Respect autonomy of application owners • Prefer light-weight instrumentation of application codes to wholesale re-factoring • Same source (or binary) should work with or without steering • Dynamism and adaptability • Attach/detach steering client from running application • Adapt to prevailing conditions • Intuitive and appropriate user interfaces
Historical Context – Messages from above • In 2002, we were told “use Globus, SRB or Condor”. • Then we were told “Web services are OK too”. • Then the Open Grid Services Architecture (OGSA) effort was announced. • OGSA would be based on the Open Grid Services Infrastructure (OGSI), and specifications began in earnest with (it seemed) overwhelming industrial support. • “You must be on an OGSA-convergence track. You must use e-Science certificates.” • GT3 appears in 2003. Some people build GT3 services. No-one builds production grids based on GT3. • Early in 2004, we hear “OGSI was a great success. OGSI is dead. Long live WS-RF. GT3 is obsolescent.”
2002 - Enter Grid Services OGSI brought the hope of convergence between Web services (technology of choice for business process integration) and Grid computing. It offered state, 2-level naming (GSH, GSR), lifetime management, and infrastructure support for common patterns (factories, registries, notification)… With Dave Snelling, we experimented with a UNICORE-based OGSI prototype (pre-dating the GT3 preview).
First “Fast Track” Demonstration Jens Harting at UK e-Science All Hands Meeting, September 2002
“Fast Track” Steering Demo, UK e-Science AHM 2002 • VizServer client • Steering GUI • The Mind Electric GLUE web service hosting environment with OGSA extensions • Single sign-on using UK e-Science digital certificates [Diagram labels: Bezier, SGI Onyx @ Manchester (VTK + VizServer); Dirac, SGI Onyx @ QMUL (LB3D with RealityGrid Steering API); Laptop, SHU Conference Centre; UNICORE Gateway and NJS, Manchester; UNICORE Gateway and NJS, QMUL; Firewall; SGI OpenGL VizServer; Simulation Data; Steering (XML)]
Steering architecture in 2002 Communication modes: • Shared file system • Files moved by UNICORE daemon • Globus-IO [Diagram labels: Simulation; Client; Visualization (x2); Steering library (x3); data transfer]
Dilemma • Wanted to separate steering from job management • Architecture was brittle and firewall unfriendly • Client needed to know too much about application deployment • Direct connection between client and simulation is problematic when client is mobile • OGSI’s lifetime management, registries, language neutrality and notification seemed ideal for steering • (ended up not using OGSI notification for firewall reasons) • But all “production” grids were based on Globus Toolkit version 2 (GT2)
Serendipity – OGSI::Lite Mark McKeown’s OGSI::Lite started life as a spare-time exercise to understand Web services, then OGSI. It soon became a near-complete OGSI implementation. Minimal prerequisites (Perl and SOAP::Lite) meant we could deploy it trivially in user space when the job is run. We only need permission to listen on a port. (This would be highly non-trivial using the deep stack of GT3.) So we could have our OGSI cake and eat it on a GT2 grid. Our steering architecture quickly got a middle tier implemented in OGSI::Lite.
The Architecture of Steering • components start independently and attach/detach dynamically • OGSI middle tier • multiple clients: Qt/C++, .NET on PocketPC, GridSphere Portlet (Java) • remote visualization through SGI VizServer, Chromium, and/or streamed to Access Grid [Diagram labels: Simulation; Visualization (x2); Display (x3); Steering library (x4); Steering GS (x2); Registry; Steering client; Client; operations: publish, find, bind, connect; data transfer (Globus-IO)]
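The publish/find/bind pattern named in the architecture diagram can be sketched in a few lines. This is a minimal in-memory illustration only: the class names `Registry` and `SteeringService`, the handles `sgs-1` and `sgs-2`, and the metadata keys are all hypothetical and are not taken from OGSI::Lite or the RealityGrid services.

```python
class Registry:
    """Toy in-memory registry (hypothetical; not the RealityGrid registry service)."""

    def __init__(self):
        self._entries = {}  # handle -> (service, metadata)

    def publish(self, handle, service, **metadata):
        """A service advertises itself under a handle with descriptive metadata."""
        self._entries[handle] = (service, metadata)

    def find(self, **criteria):
        """Return the handles whose metadata matches every criterion."""
        return sorted(
            handle
            for handle, (_, meta) in self._entries.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )

    def bind(self, handle):
        """Resolve a handle to the service it names."""
        return self._entries[handle][0]


class SteeringService:
    """Toy middle-tier service that clients attach to and detach from."""

    def __init__(self, application):
        self.application = application
        self.clients = set()

    def attach(self, client_id):
        self.clients.add(client_id)

    def detach(self, client_id):
        self.clients.discard(client_id)


# Components start independently and publish themselves.
registry = Registry()
sgs = SteeringService("LB3D")
registry.publish("sgs-1", sgs, application="LB3D", kind="simulation")
registry.publish("sgs-2", SteeringService("viz"), application="viz", kind="visualization")

# A client finds a simulation, binds to its service, then attaches and detaches.
found = registry.find(kind="simulation")
service = registry.bind(found[0])
service.attach("qt-client")
attached_while_connected = set(service.clients)
service.detach("qt-client")
```

The point of the indirection is that the client needs only the registry's address, not the details of where or how the simulation was deployed.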
The TeraGyroid Project • Funding from EPSRC (UK) & NSF (USA) • Ran LB3D across UK e-Science Grid and US TeraGrid • Study of defect dynamics in liquid crystalline surfactant systems using lattice-Boltzmann methods • Featured world’s largest Lattice Boltzmann simulation • TRICEPS was the HPC-Challenge aspect of this work • Transcontinental RealityGrids for Interactive Collaborative Exploration of Parameter Space • “most innovative data-intensive application” at SC’03 • Later picked up ISC 2004 award in the “Integrated Data and Information Management” category • More in Richard Blake’s talk
New for TeraGyroid • Access Grid integration • use of Chromium to complement VizServer • job migration based on malleable checkpoints • user friendly “wizard” to drive job launching and migration • support for parameter space exploration through checkpoint trees • also implemented in OGSI::Lite • services thrown together for TeraGyroid have been upgraded in flight • still running 8 months later • file transfer service • to get around issues with systems homed on two networks • port forwarding (Stephen Booth, EPCC) • to work around lack of public IP address on compute nodes (e.g. HPCx)
Checkpoint trees and parameter space exploration Cubic micellar phase, high surfactant density gradient. Cubic micellar phase, low surfactant density gradient. Initial condition: Random water/surfactant mixture. Self-assembly starts. Lamellar phase: surfactant bilayers between water layers. Rewind and restart from checkpoint.
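A checkpoint tree can be pictured as a tree whose nodes are saved states and whose branches are restarts with changed parameters. The sketch below is an assumption-laden illustration: the `Checkpoint` class, the parameter name `surfactant_gradient`, its values, and the choice of which node to rewind to are all hypothetical; only the phase labels come from the slide.

```python
class Checkpoint:
    """One node in a checkpoint tree (hypothetical structure, for illustration)."""

    def __init__(self, label, params, parent=None):
        self.label = label
        self.params = dict(params)  # parameters in force when the checkpoint was taken
        self.parent = parent
        self.children = []

    def branch(self, label, **changed):
        """Restart from this checkpoint, optionally with changed parameters."""
        child = Checkpoint(label, {**self.params, **changed}, parent=self)
        self.children.append(child)
        return child

    def lineage(self):
        """Labels from the root down to this checkpoint."""
        node, path = self, []
        while node is not None:
            path.append(node.label)
            node = node.parent
        return path[::-1]


root = Checkpoint("random water/surfactant mixture", {"surfactant_gradient": 0.0})
assembly = root.branch("self-assembly starts")
lamellar = assembly.branch("lamellar phase")

# Rewind to an earlier checkpoint and explore two other parameter choices.
high = assembly.branch("cubic micellar phase (high gradient)", surfactant_gradient=0.8)
low = assembly.branch("cubic micellar phase (low gradient)", surfactant_gradient=0.2)
```

Because each branch copies its parent's parameters before applying changes, the earlier checkpoints remain valid restart points however many branches are grown from them.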
Access Grid integration - SC Global
TeraGyroid Testbed [Network diagram labels: Starlight (Chicago); Netherlight (Amsterdam); ANL; PSC; Manchester; Caltech; NCSA; Daresbury; SDSC; Phoenix; UCL; 10 Gbps; BT provision; 2 x 1 Gbps; production network; MB-NG; SJ4. Legend: Visualization; Computation; Access Grid node; Service Registry; Network PoP; Dual-homed system]
EPSRC e-Science Meeting 2004 • Multiple steering clients driving same simulation • Qt client on laptop • .NET client on PDA • Simon Nee (Loughborough) • Web client • GridSphere Portlet • Access through web browser • Matthew Egbert (EPCC) • not all at same time • significant achievement in terms of OGSI interoperability • Collaborative steering prototype • using ICENI and client proxy • Java bindings to client side of steering library (JNI) • Gary Kong (LeSC)
Public Release – April 2004 Steering Library released as version 1.1 • version 1.0 was project internal • very liberal open source license (FreeBSD) • API specification version 1.1 • Library (C and Fortran90 bindings) • Tools, including Qt steerer • User Manual • Examples Available for download at: http://www.sve.man.ac.uk/Research/AtoZ/RealityGrid/ Globus-IO replaced by vanilla sockets • major simplification to build process • only way to complete integration of NAMD and VMD into RealityGrid
Status Where we are today
Steering library • We instrument (add "knobs" and "dials" to) simulation codes through a steering library, written in C • Bindings in Fortran90, C/C++ (complete) and Java (partial) • Library features: • Pause/resume • Checkpoint and restart • Set values of steerable parameters (parameter steer) • Report values of monitored (read-only) parameters (parameter watch) • Emit "samples" to remote systems for e.g. on-line visualization • Consume "samples" from remote systems for e.g. resetting boundary conditions • Automatic emit/consume with steerable frequency • No restrictions on parallelisation paradigm • You only implement what you need
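What "light-weight instrumentation" looks like in practice is a single control call inside the main loop of the simulation. The sketch below imitates that pattern with a toy, single-process stand-in. It is not the RealityGrid steering API: the `Steerer` class, its method names, the command strings, and the `heating_rate` and `temperature` parameters are all invented for illustration, and client commands are simulated by a scripted dictionary rather than arriving over a network.

```python
class Steerer:
    """Toy stand-in for a steering library (hypothetical; not the RealityGrid API)."""

    def __init__(self, steerable):
        self.steerable = dict(steerable)  # parameters a client may change
        self.monitored = {}               # read-only values reported to clients
        self.pending = []                 # queued (command, argument) pairs
        self.paused = False
        self.stopped = False

    def post(self, command, argument=None):
        """Queue a command, as a remote steering client would."""
        self.pending.append((command, argument))

    def control(self, step, state):
        """Called once per time step; applies queued commands.

        Returns the checkpoints requested during this call."""
        checkpoints = []
        while self.pending:
            command, argument = self.pending.pop(0)
            if command == "pause":
                self.paused = True
            elif command == "resume":
                self.paused = False
            elif command == "stop":
                self.stopped = True
            elif command == "set":
                name, value = argument
                if name in self.steerable:  # ignore unknown parameters
                    self.steerable[name] = value
            elif command == "checkpoint":
                checkpoints.append((step, dict(state)))
        return checkpoints


def run(steerer, n_steps, script):
    """Toy simulation loop; `script` maps a step number to client commands."""
    state = {"temperature": 0.0}
    saved, computed = [], 0
    for step in range(n_steps):
        for command in script.get(step, []):
            steerer.post(*command)
        saved += steerer.control(step, state)  # the only steering call in the loop
        if steerer.stopped:
            break
        if steerer.paused:
            continue  # skip the physics while paused
        state["temperature"] += steerer.steerable["heating_rate"]
        steerer.monitored["temperature"] = state["temperature"]
        computed += 1
    return state, saved, computed


steerer = Steerer({"heating_rate": 1.0})
script = {
    2: [("set", ("heating_rate", 2.0))],
    3: [("checkpoint", None)],
    4: [("pause", None)],
    6: [("resume", None)],
}
final_state, saved, computed = run(steerer, 8, script)
```

If no client ever posts a command, the loop behaves exactly as an uninstrumented one would, which is the property behind "same source should work with or without steering".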
Qt Steering client Built using C++ and Qt Attaches to any steerable RealityGrid application Discovers what commands are supported Discovers steerable & monitored parameters Constructs appropriate widgets on the fly
On-line visualisation • Fast track uses open source VTK for on-line visualisation • Simple GUI built with Tk/Tcl, polls for new data to refresh image • Some in-built parallelism • extended to use the steering library • AVS-format data supported • XDR-format data for sample transfer between platforms • Volume render (parallel) • Isosurface • Hedgehog • Cut-plane • New work on atom-centric meshes for Steve Kenny
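XDR makes sample transfer platform-independent by fixing byte order and field width on the wire. As a rough illustration of the encoding itself, the sketch below packs a variable-length array of single-precision floats the way the XDR standard describes: a big-endian 32-bit count followed by big-endian IEEE 754 values. The function names are hypothetical, and the slides do not say how the steering library actually lays out its samples.

```python
import struct


def xdr_pack_floats(values):
    """Encode floats as an XDR variable-length array of 32-bit floats."""
    count = len(values)
    # ">" selects big-endian with no padding: "I" is a 4-byte unsigned count,
    # "f" a 4-byte IEEE 754 single.
    return struct.pack(">I", count) + struct.pack(f">{count}f", *values)


def xdr_unpack_floats(data):
    """Decode the array produced by xdr_pack_floats."""
    (count,) = struct.unpack_from(">I", data, 0)
    return list(struct.unpack_from(f">{count}f", data, 4))


sample = [1.0, -2.5, 0.25]  # values exactly representable in single precision
packed = xdr_pack_floats(sample)
restored = xdr_unpack_floats(packed)
```

Sender and receiver agree on this layout regardless of their native byte order, so a little-endian visualization host can consume samples emitted by a big-endian compute host.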
OGSI is dead. Long live WS-RF! • WS-ResourceFramework preserves most OGSI ideas in a way which is friendlier (less abusive) to Web services. • Open Middleware Infrastructure Institute (OMII) has a conservative roadmap based on Web services. • WS-I plus as little else as possible • UK National Grid Service is aligned with EGEE. • This means Globus Toolkit version 2 for at least 12 months. • WS-RF (and WS-Notification) are moving targets. • What does this mean for us?
Our response to WS-RF • We must be able to exploit the grids that exist • GT4 is unlikely to be stable and widely deployed in lifetime of RealityGrid • OGSI::Lite works fine for us, so continue to use it for now. • In time, WS-RF may be appropriate. • seems indicated for the Steering Grid Service, which is a very dynamic thing • optional for persistent services such as Checkpoint Metadata Tree and Registry. These could be implemented in plain Web services. • WSRF::Lite is already an option • prototype released within a few weeks of first publication of WS-RF drafts • featured in WS-RF interop fest in April, and interop demo at GGF 11 last week
Standards, generally • Very slow progress on Advance Reservation • RealityGrid requires co-allocation of compute, viz, AG resources at a time to suit the humans • LSF, PBS(Pro), SGE now support it, but not accessible through middleware • GRAAP-WG at GGF is bogged down in WS-Agreement and has yet to address protocols and apply them to the Advance Reservation problem • Practical WS-RF interoperability will require a coherent, global security strategy for Web services, and a delegation model • not clear that GT4 interoperability is the driver. • GT3 and GT4 security has never been on the standards table • what is GSI-SecureConversation anyway? • OGSA itself is a massive undertaking and will not settle in RealityGrid’s lifetime • RealityGrid is a provider of use case drivers for GRAAP, GridCPR, OGSA, SAGA (and other) groups in GGF
Prospects Where we’re going
Steering Plans • Tabbed steerer (work in progress) • single client tabs between multiple steerable simulations • required for thermodynamic integration work using NAMD • Steering of multi-component simulations (coupled models) • requires metadata about component interactions and schedule • Quantitative study of the overhead of steering and on-line visualization • Support use of steering within project • Final release of steering library, toolkit and documentation Significant Gap - Security!!! • contingent on additional funding for WSRF::Lite • and coherent global security strategy for Web services
Steering - Wishlist • Port of steering services to WS-RF • probably in a follow-on project • Provenance of steering and parameter space exploration • Collaborative steering • i.e. support simultaneous connection of multiple clients • Scripted steering • Breakpoints ( IF (temperature > TOO_HOT) THEN … ) • Replay of previous steering actions • Integration of steering into selected MVEs • entirely feasible, but can’t do them all
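The breakpoint idea on the wishlist, a condition on a monitored parameter that triggers a steering action, might look something like the sketch below. Since this feature is only a wish in the talk, everything here is speculative: the `Breakpoint` class, the `run_script` helper, the action name `pause`, and the threshold value are hypothetical.

```python
import operator

# Comparison operators a breakpoint condition may use.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


class Breakpoint:
    """A condition on one monitored parameter, paired with a steering action."""

    def __init__(self, param, op, threshold, action):
        self.param = param
        self.op = op
        self.threshold = threshold
        self.action = action

    def check(self, monitored):
        """Return the action if the condition holds for this snapshot, else None."""
        value = monitored.get(self.param)
        if value is not None and OPS[self.op](value, self.threshold):
            return self.action
        return None


def run_script(breakpoints, history):
    """Evaluate breakpoints against a series of monitored-parameter snapshots.

    Returns the (step, action) pairs that fired."""
    fired = []
    for step, monitored in enumerate(history):
        for breakpoint_ in breakpoints:
            action = breakpoint_.check(monitored)
            if action is not None:
                fired.append((step, action))
    return fired


TOO_HOT = 350.0  # illustrative threshold
history = [{"temperature": t} for t in (300.0, 340.0, 360.0, 345.0)]
fired = run_script([Breakpoint("temperature", ">", TOO_HOT, "pause")], history)
```

Recording the fired pairs also gives a log that could, in principle, be fed back in for replay of previous steering actions.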
Standardisation of Steering Opportunities: • Standardise an API for computational steering • Standardise the WSDL of the Steering Grid Service These could be input to the GGF research group “Simple APIs for Grid Applications” (SAGA-RG) Is there critical mass?
Visualization Plans • Finish atom-centric meshes • High-performance visualization • re-evaluate AVS with Parallel Support Toolkit • “Thin visualization” • delivered to PDA or Web browser • thumbnails in checkpoint tree Possibilities • Use of *-ray from Utah • AVS module for streaming to Access Grid • VizServer integration: • Put GSI authentication into VizServer PAM when released • Liaison with Platform and SGI regarding use of VizServer API for Advance Reservation of graphics pipes
Launching and packaging Plans • Continue to improve usability • Reduce deployment overhead • wizard can now work with Java CoG kit • easier to deploy than Globus client bundles Possibilities • Integrate RLS or SRB into checkpoint tree • Pick up Web service approaches to job submission
HCI Plans • Update of HCI Audit report in light of experiences • Journal paper on the HCI of TeraGyroid • .NET client • deployable demonstrator with renderings on PDA and Windows laptop Identified activities, off critical path, for PhD student • VizServer QoS experiments with MB-NG or UK-Light • Thin visualization for PDAs and Web portals
Portal • Currently provides Web client for steering • GridSphere portlet communicates with Steering Grid Service via SOAP • Prototype portlet for checkpoint tree browsing Little resource (2-3 PM) remains for second phase of portal work. Plans • Finish checkpoint tree browsing • Incorporate use of registry for simulation discovery • Hope to inherit JSR168 portlets for job launching and monitoring • limited visualization capability • slice of scalar field • subject to resources
Resource Management – Deep Track • Advance Reservation • proof of concept using SGE 6.0 • Implemented within Job Submission Web Service separated from ICENI • using Job Definition Markup Language (JDML) • which is evolving into Job Submission Definition Language (JSDL) through Global Grid Forum JSDL working group • designed to support plug-in of other job submission systems • e.g. Globus, gsi-ssh, UNICORE, LSF, ...
ICENI integration – Deep Track • Technical report on feasibility of integrating fast-track steerable binary (with associated SGS) as an ICENI component • If practical, do it. [Diagram labels: Steering GS; Steering library; Application; Control; Status; Data in / Data out]
Performance Control – Deep Track • Performance Control of coupled models • working with HybridMD code and Bespoke Framework Generator (BFG) • outcomes: technology demonstrator & research papers • deployment in production is unlikely • Performance prediction of same • Steering of BFG-coupled models • Integration of PERCO and ICENI is not likely • Generalised malleable-checkpoint library is unlikely • major undertaking, re-inventing SRS from UTK • application specific alternatives always possible for those that need it • Proven to be possible to support steering or PERCO through a common API • which simplifies instrumentation of application codes • but doing both at the same time leads to frighteningly complex interactions
Conclusions • We will not solve everything during the lifetime of RealityGrid • We must be ruthless about what we do and do not undertake
Partners Industrial: • Schlumberger • Edward Jenner Institute for Vaccine Research • Silicon Graphics Inc • Computation for Science Consortium • Advanced Visual Systems • Fujitsu • BT Exact Academic: • University College London • Queen Mary, University of London • Imperial College • University of Manchester • University of Edinburgh • University of Oxford • University of Loughborough
SVE @ Manchester Computing Bringing Science and Supercomputers Together http://www.sve.man.ac.uk