
Establishing an inter-organisational OGSA Grid: Lessons Learned



  1. Establishing an inter-organisational OGSA Grid: Lessons Learned
  Wolfgang Emmerich
  London Software Systems, Dept. of Computer Science
  University College London, Gower St, London WC1E 6BT, U.K.
  http://www.sse.ucl.ac.uk/UK-OGSA

  2. An Experimental UK OGSA Testbed
  • Established 12/03-12/04
  • Four nodes:
    • UCL (coordinator)
    • NeSC
    • NEReSC
    • LeSC
  • Deployed Globus Toolkit 3.2 throughout onto heterogeneous HW/OS:
    • Linux
    • Solaris
    • Windows XP

  3. Experience with GT3.2 Installation
  • Different levels of experience within the team
  • Heterogeneity:
    • HW (Intel/SPARC)
    • Operating system (Windows/Solaris/Linux)
    • Servlet container (Tomcat/GT3 container)
  • Interaction with previous GT versions
  • Departures from web service standards prevented the use of standard tools:
    • JMeter
    • Development environments (Eclipse)
    • Exception management tools (AmberPoint)
  • Interaction with system administration
  • Platform dependencies

  4. Performance and Scalability
  • Developed GTMark:
    • Server-side load model: SciMark 2.0 (http://math.nist.gov/SciMark)
    • Client-side load model, configuration and metrics collection based on the J2EE benchmark StockOnline
  • Configurable benchmark (a hypothetical load-driver sketch follows below):
    • Static vs dynamic discovery of nodes
    • Load applied for a fixed period of time or until a steady state is reached
    • Constant or varying number of concurrent requests
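
GTMark's source is not reproduced in this deck; the sketch below is a minimal, hypothetical Java load driver illustrating the fixed-duration, fixed-concurrency configuration described above. All class and method names are invented for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical load driver in the spirit of GTMark: a fixed number of
 * concurrent clients repeatedly invoke a compute-bound service operation
 * for a fixed period, and aggregate throughput is reported.
 */
public class LoadDriver {

    /** Stand-in for the generated client stub of a grid service. */
    public interface ComputeService {
        void runKernel() throws Exception; // e.g. a SciMark-style numeric kernel
    }

    public static double run(ComputeService stub, int clients, long durationMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        AtomicLong completed = new AtomicLong();
        long deadline = System.currentTimeMillis() + durationMs;

        for (int i = 0; i < clients; i++) {
            pool.execute(() -> {
                while (System.currentTimeMillis() < deadline) {
                    try {
                        stub.runKernel();
                        completed.incrementAndGet();
                    } catch (Exception e) {
                        // a real benchmark would record this as a failed invocation
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(durationMs + 60_000, TimeUnit.MILLISECONDS);
        return completed.get() * 1000.0 / durationMs; // invocations per second
    }
}
```

A varying-concurrency or steady-state run would wrap this in an outer loop that adjusts the client count or keeps sampling until the measured throughput stabilises.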

  5. Performance Results

  6. Scalability Results

  7. Performance Results
  • Performance and scalability of GT3.2 with Tomcat/Axis surprisingly good
  • Performance overhead of security is negligible
  • Good scalability: reached 96% of the theoretical maximum (efficiency arithmetic sketched below)
  • Tomcat performs better than the GT3.2 container on slow machines
  • Surprising results on raw CPU performance
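
The 96% figure reads as a scaling efficiency: aggregate throughput divided by the number of nodes times the single-node throughput. The sketch below shows that arithmetic with placeholder throughput values, not the measured numbers from the benchmark runs.

```java
/** Scaling efficiency = aggregate throughput / (nodes * single-node throughput). */
public class ScalingEfficiency {

    static double efficiency(double aggregate, int nodes, double singleNode) {
        return aggregate / (nodes * singleNode);
    }

    public static void main(String[] args) {
        // Placeholder values: 4 nodes, each sustaining 10 req/s in isolation;
        // an aggregate of 38.4 req/s corresponds to the 96% quoted on the slide.
        System.out.printf("efficiency = %.0f%%%n",
                100 * efficiency(38.4, 4, 10.0));
    }
}
```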

  8. Reliability
  • Tomcat more reliable than the GT3.2 container:
    • The Tomcat container sustained 100% reliability under load
    • The GT3.2 container failed once every 300 invocations (99.67% reliability)
  • Denial-of-service attack possible by:
    • Concurrently invoking operations on the same service instance (instances are not thread safe; a mitigation sketch follows below)
    • Fully exhausting resources
  • Problem of hosting more than one service in one container
  • Trade-off between reliability and reuse of containers across multiple users/services
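
Because service instances are not thread safe, one possible mitigation is to serialise calls to a shared instance. The sketch below is a generic Java guard under that assumption, not GT3.2's own dispatch code, and the service interface is hypothetical.

```java
/**
 * Illustrative guard around a non-thread-safe service instance: calls are
 * serialised so concurrent clients cannot corrupt shared state, at the cost
 * of throughput on that one instance.
 */
public class SerialisedService {

    /** Hypothetical stand-in for a stateful grid service instance. */
    public interface GridServiceInstance {
        String invoke(String request) throws Exception;
    }

    private final GridServiceInstance delegate;

    public SerialisedService(GridServiceInstance delegate) {
        this.delegate = delegate;
    }

    public synchronized String invoke(String request) throws Exception {
        // one caller at a time per instance
        return delegate.invoke(request);
    }
}
```

Serialising access trades throughput for integrity; it does not by itself address the second attack vector (resource exhaustion), which would need something like per-container quotas or bounded request queues.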

  9. Security
  • Interesting effect of firewalls on testing and debugging
  • Accountability and audit trails demand that users be given individual accounts on each node
  • Overhead of node and user certificates (they always expire at the wrong time)
  • The current security model does not scale (cost arithmetic sketched below):
    • Assuming a cost of £18 per admin hour
    • 10 users per node (site)
    • It will cost approx. £300,000 to set up a 100-node grid with 1,000 users
    • It will be prohibitively expensive to scale up to 1,000 nodes (with admin costs in excess of £6M)
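
The cost figures follow from a simple per-account administration model: each user needs an individual account on each node, and each account costs some amount of admin time at £18 per hour. The per-account setup time in the sketch below is an assumption (about 10 minutes), chosen only because it reproduces the slide's £300,000 figure for 100 nodes and 1,000 users; it is illustrative, not the authors' actual parameter.

```java
/**
 * Illustrative administration-cost model for per-node user accounts.
 * The £18/hour rate comes from the slide; the per-account setup time is an
 * assumed value, so the output is indicative rather than the slide's exact figure.
 */
public class AdminCostModel {

    static double cost(int nodes, int users, double hoursPerAccount, double ratePerHour) {
        long accounts = (long) nodes * users; // every user gets an account on every node
        return accounts * hoursPerAccount * ratePerHour;
    }

    public static void main(String[] args) {
        double rate = 18.0;            // £ per admin hour (from the slide)
        double setupHours = 1.0 / 6.0; // assumed: about 10 minutes per account
        System.out.printf("100 nodes, 1,000 users: £%,.0f%n",
                cost(100, 1_000, setupHours, rate));
    }
}
```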

  10. Deployment
  • How do admins get grid middleware deployed systematically onto grid nodes?
  • How can users get their services onto remote hosts?
  • We tried SmartFrog (http://www.smartfrog.org):
    • Worked very well inside a node
    • Impossible across organisations:
      • The SmartFrog daemon would need to execute actions with root privileges, which some site admins simply did not agree to
      • Security is paramount (SmartFrog would be the perfect virus distribution engine)
      • SmartFrog's security infrastructure is incompatible with the GT3.2 infrastructure

  11. Looking Ahead
  • Installation effort needs to be reduced significantly:
    • Binary distributions for a few selected HW/OS platforms
  • Standards compliance:
    • Track standards by all means; otherwise there are no economies of scale
  • Management console:
    • Add/remove grid hosts
    • Monitor the status of grid resources across organisational boundaries
  • A more lightweight security model is needed:
    • Role-based access control
    • Trust delegation
  • Deployment is a first-class citizen:
    • Avoid adding it as an afterthought
    • It needs to be built into the middleware stack

  12. Conclusions
  • A very interesting experience
  • Building a distributed system across organisational boundaries is different from building a system over a LAN
  • Insights that might prove useful for:
    • OMII
    • Globus
    • ETF
  • There is a lot more work to do before we realise the vision of the Grid!

  13. Acknowledgements
  • A large number of people have helped with this project, including:
    • Dave Berry (NeSC)
    • Paul Brebner (UCL, now CSIRO)
    • Tom Jones (UCL, now Symantec)
    • Oliver Malham (NeSC)
    • David McBride (LeSC)
    • Savas Parastatidis (NEReSC)
    • Steven Newhouse (OMII)
    • Jake Wu (NEReSC)
  • For further details (including the IGR) check out http://sse.cs.ucl.ac.uk/UK-OGSA
