All Hands Meeting, September 10th-13th 2007, Nottingham
Experiences with different middleware solutions on the NW-GRID
Jens Thomas, STFC Daresbury Laboratory
j.m.h.thomas@dl.ac.uk
Overview
• General background
• The NW-GRID
• Middleware solutions and our experiences with them
  • Globus
  • NorduGrid
• Tools that build on the middleware
  • Application Hosting Environment (AHE)
  • GROWL scripts
• Integrating the solutions into the CCP1GUI
• Conclusions
Where it's at
• There is an abundance of heterogeneous computing resources out there
• Using the larger machines usually requires detailed knowledge of the system, and often lots of red tape to gain access
• The user must manually log on to run jobs, and is responsible for handling data transfer and management by hand
• As well as being inefficient, this requires the user to be extremely computer literate
Where middleware fits in
• The middleware abstracts the interface to each computer, so there is a uniform way of accessing the different resources
• Authentication is mediated by a Grid Certificate, giving a single sign-on process (illustrated below): no more worrying about remembering passwords on dozens of machines
• The user can sit at their (generally Unix) desktop and access a range of resources via a single interface
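In practice, single sign-on with the Globus command-line tools looks like this (a minimal sketch; proxy lifetimes and certificate locations depend on the installation):

  # Create a short-lived proxy credential from the user's Grid Certificate;
  # all subsequent grid commands authenticate with this proxy (single sign-on)
  grid-proxy-init -valid 12:00

  # Check the remaining lifetime of the current proxy
  grid-proxy-info -timeleft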
NW-GRID Aims and Partners
• Aims:
  • Establish, for the North West region, a world-class activity in the deployment and exploitation of Grid middleware
  • Realise the capabilities of the Grid in leading-edge academic, industrial and business computing applications
  • Leverage 100 posts plus £15M of additional investment
• Project Partners:
  • Daresbury Laboratory: CSED and e-Science Centre
  • Lancaster University: Management School, Physics, e-Science and Computer Science
  • University of Liverpool: Physics and Computer Services
  • University of Manchester: Computing, Computer Science, Chemistry, Bioinformatics and Systems Biology
  • Proudman Oceanographic Laboratory, Liverpool
Hardware – 2006 procurement
• From Sun / Streamline Computing
• Dual-core, dual-processor AMD Opteron nodes (with at least 8 GB of memory per node):
  • 96 nodes – Daresbury
  • 48 nodes – Lancaster
  • 44 nodes – Liverpool
  • 25 nodes – Manchester
• 8 TB Panasas file servers at Daresbury, Lancaster and Liverpool
• 2.8 TB RAID array at Manchester
• Separate data and communications GigE interconnects
Globus
• A powerful middleware solution for linking disparate computing resources together
• Open source, so freely available to academia and industry
• The most popular solution at the moment, widely used in many countries
• Provides a range of tools for managing and moving data, submitting jobs, discovering resources, security etc.
• High performance and standards-based
• Under active development
• Used on both the NGS and the NW-GRID
http://www.globus.org
Globus in practice
• Server and client install only on *nix machines, and the installation is extremely awkward, although this has improved
• Different versions in use: 2.4, 4.0 (3?)
• Needs a bunch of ports open on both client and server
• No resource discovery in the command-line tools
• The user needs to manually stage data to and from the machine (see the staging sketch below)
• The command-line interface is not very practical and needs extensive knowledge of the resource (e.g. absolute paths, environment variables)
• Error reporting is pretty poor

  myhost> globus-job-submit dl1.nw-grid.ac.uk/jobmanager-sge -x '(count="4")(directory="/panfs/dl/home/jmht/")(stdout="/panfs/dl/home/jmht/unnamed.out")(stdin="/panfs/dl/home/jmht/unnamed.in")(jobtype="mpi")' /panfs/dl/home/jmht/gamess
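By way of illustration, the manual staging complained about above went something like this with the stock Globus tools (a sketch based on the paths in the example above; the job-contact URL is whatever globus-job-submit returns):

  # Stage the input file to the remote machine by hand (GridFTP)
  globus-url-copy file:///home/jmht/unnamed.in gsiftp://dl1.nw-grid.ac.uk/panfs/dl/home/jmht/unnamed.in

  # Submit the job (prints a job-contact URL that must be kept)
  globus-job-submit dl1.nw-grid.ac.uk/jobmanager-sge ...

  # Poll the job status using the returned contact URL
  globus-job-status <job-contact-URL>

  # When done, stage the output back by hand
  globus-url-copy gsiftp://dl1.nw-grid.ac.uk/panfs/dl/home/jmht/unnamed.out file:///home/jmht/unnamed.out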
The NorduGrid Collaboration
From ... to:
• EDG → ARC
• Testbed → >50 sites
• HEP → + Bio, Chem., ...
• 4 Nordic countries → >13 countries
• 20 CPUs → >5000 CPUs
• 2001 → 2003
... from a research project to a research collaboration
... from a Grid testbed to a major middleware provider
NOT an infrastructure: it does not operate or control resources
(NB: slides from the NorduGrid website)
Features
• Builds on top of Globus 2.4, but extends its functionality to provide a powerful working solution (but only on *nix)
• Job monitoring and management
• Seamless input/output data movement
• Complete, up-to-date information on the available resources
• Serial batch job submission to the best resources available
  • Matchmaking, brokering
• Basic data management
  • Indexing, movement
• Easy to install, both server and client
NorduGrid in practice
• Firewall friendly: just join a VO that provides the resources you need
• Lightweight client, easily installed via rpm/dpkg
• Integrated resource discovery
• All application providers must provide an agreed runtime environment, so you know how your job will run
• Data transfer is integrated into the job: no manual staging needed
• Relatively good error reporting, and you can request that the whole run directory be returned
• A command-line client, so more typing at the shell (see the workflow sketch below)
• Extends RSL, giving a powerful but complex job file:

  &(executable=hellogrid.sh)
   (stdout=hello.out)
   (stderr=hello.err)
   (gmlog=gridlog)
   (cputime=10)
   (memory=200)
   (disk=1)
   (runTimeEnvironment="APPS/CHEM/GAMESS-UK-7.0-1.0")
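The client-side workflow with the ARC command-line tools then looks roughly like this (a sketch; we assume the xRSL above is saved as hellogrid.xrsl, and exact options may differ between ARC versions):

  # Submit the job description; the broker picks the best matching resource
  ngsub -f hellogrid.xrsl

  # Query the status of the job using the job ID that ngsub prints
  ngstat <job-id>

  # Fetch the results (and, if requested, the whole run directory)
  ngget <job-id>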
The problem
• Globus and NorduGrid are powerful technologies that, used correctly, can greatly aid working scientists, but:
  • they are developed by computer scientists who are happy talking to computers and dealing with the problems they throw up (firewalls, dependencies, arcane error messages...)
  • the intended users are scientists who generally aren't interested in becoming computer scientists or bonding with their machines
  • work is still required to make the tools easily usable for a "normal" working scientist with little interest in what happens under the bonnet
GROWL Scripts
• Part of the larger GROWL project: www.growl.org.uk
• A set of command-line scripts that wrap the Globus tools and make them more user friendly (ports, job-strings, paths etc.; a flavour is sketched below)
• Alleviates some of the problems with firewalls, but needs gsi-ssh access to the resource
• Automatically downloads and builds the required libraries on all VDT-supported platforms
• A useful tool, but as it builds on Globus it is currently only available on *nix
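As a purely hypothetical sketch of the kind of simplification these wrappers aim for (this is not GROWL's actual interface; the real script names and options are documented at www.growl.org.uk):

  # Hypothetical wrapper command: the ports, RSL string and absolute
  # remote paths from the raw globus-job-submit example are hidden away
  grid-run --host dl1.nw-grid.ac.uk --np 4 --input unnamed.in gamess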
Application Hosting Environment
(Thanks to Stefan Zasada for slides/pictures)
• "Community model": an expert user installs and configures an application to be shared via the AHE server (which now installs as part of the OMII stack)
• The application (e.g. NAMD, LB3D, LAMMPS, DL_POLY) is a web service, so you can submit from a client on one machine and monitor from another (e.g. a PDA)
• Provides support for building quite complex workflows across a range of resources/codes
• Hosts all knowledge about the supported applications, which do not need to be modified in any way
• Supports Globus 2.4 (4.0?), SGE, Condor and Unicore
• Builds on WSRF::Lite: applications are exposed as web services, so potentially available to other WSRF clients
• Uses WebDAV to stage files
AHE Client
• Written in Java, so easy to install; it even runs on Windows
• Very firewall-friendly
• Maintains no information about the job, and so is mobile
• Doesn't need to know anything about the applications
• Isolated from changes to the underlying grid
• A GUI, so no command-line typing required (although scripting tools are provided)
AHE Summary
• A very useful addition to the grid toolkit
• Will work with a variety of middleware
• Users can submit jobs from any machine and then monitor them from a variety of different platforms
• Focuses on the application, so more geared towards the scientist
• Runs on most platforms, with no firewall issues
• Handles file staging
• Scriptable, so workflows can be created (see the sketch below)
• The server is a serious pain to install
• No resource discovery
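For example, the AHE client's scripting tools support a prepare/start/monitor/fetch cycle along these lines (command names as we recall them from the AHE distribution; treat the exact names and arguments as assumptions that may vary between versions):

  # Prepare a job for one of the hosted applications (returns an
  # endpoint reference identifying the new job resource)
  ahe-prepare

  # Stage the input files via WebDAV and start the job
  ahe-start

  # Poll the job state, then retrieve the outputs once it has finished
  ahe-monitor
  ahe-getoutput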
Good, but could do better…
• The AHE, the GROWL scripts and the NorduGrid ARC client all focus on overcoming the deficiencies of the middleware: they try to make the process of running the job simpler
• However, they still require the user to become intimately involved in running the job
• Experience shows that even this is enough to put many potential users off taking advantage of the computing power that is out there
• In many cases where they have been used successfully, the target scientists are closely linked to grid-savvy developers
• We thought we would try to integrate remote job submission into our CCP1GUI, to insulate the user from the grid/job handling as far as possible
The CCP1GUI
• An extensible Graphical User Interface for computational chemistry packages
• Aims to provide a uniform environment that enables users to run a variety of different codes on a range of different resources:
  • GAMESS-UK, Molpro, Dalton, ChemShell
• Provides powerful visualisation capabilities for interpreting the outputs of calculations (builds on VTK)
• A freely available code hosted on SourceForge: http://sourceforge.net/projects/ccp1gui
• Has the potential to run on all the major operating system platforms
• The use of Python and an object-oriented design enables rapid development, and lets users script the code for themselves
Why was it developed?
• Many of the codes used within CCP1 did not have a Graphical User Interface
• Long-standing need for a graphical interface to GAMESS-UK
• Needed something to help students and new users of the codes get up and running more quickly
• Requirement for a simplified environment for constructing and viewing molecules
• Need to be able to visualise the complex results of quantum mechanical calculations
• The program should be free, so there are no barriers to its widespread use
• Need for a single tool that can be made to run on a variety of hardware/operating system platforms
Complex visualisations
• Electric field visualisations: TNT and water
The CCP1GUI
[Screenshots: calculation interface, visualisation/builder window, job submit window, job manager window]
The Job Editor (on OS X)
• The bare minimum a user needs to know:
  • where to run
  • what to run
  • how many processors
• For NorduGrid, you don't even need to worry about where the job is going
• For some machines you need to know paths and the jobmanager, but the GUI will remember the details for each application/machine combination
Features
• A (relatively) uniform interface to several different underlying job submission technologies, integrated into a familiar environment
• Handles all aspects of data movement
• Tries to trap as many as possible of the minefield of potential errors, and responds with a helpful message ("I can't find your executable on that machine" vs "JOB DONE")
• Remembers submitted jobs, so it can be shut down and restarted
• Has been used successfully by acknowledged computer-phobes to run parallel jobs on the NW-GRID
• Is largely a proof of concept, but has enabled some real science to be carried out by users who would otherwise be unlikely to access the resources
Summary
• Middleware is a powerful and evolving technology that presents many opportunities to do good work
• Middleware is "muddleware" as far as most scientists are concerned
• Several projects are engaged in un-muddling things using a variety of approaches, but these are all job-centric
• We've demonstrated one way to abstract things a stage further from the underlying technologies
• There's still much to be done, but there's lots of potential to help scientists extend what computers can do for them
Acknowledgments
• The NW-GRID (Cliff Addison, Tim Franks)
• The NGS
• The NorduGrid collaboration
• Stefan Zasada, Peter Coveney and the AHE team at UCL
• John Kewley, Rob Allan and the GROWL team at STFC Daresbury Laboratory
• Rik Tyer and the eMinerals team at STFC Daresbury Laboratory
• Abbie Trewin and the guinea-pigs at Liverpool University