The UK National Grid Service and the University of Oxford campus grid: examples of production distributed grid e-infrastructures Dr. David Wallom Technical Manager, OeRC Technical Director, NGS
Outline • The National Grid Service • Resources • Services • Users and Projects • OxGrid, the University of Oxford campus grid • Resources • Software • Users
NGS Mission and Goal To enable coherent electronic access for UK researchers to all computational and data-based resources and facilities required to carry out their research, independent of resource or researcher location. Goals: • To enable a production-quality e-infrastructure • Expand towards all Higher Education Institutions and UK-based research institutions • Continue to support cutting-edge research • To deliver core services and support • Support research computing groups within universities and research organisations to help them support users • Highlight collaborative research in key communities • Integrate with international infrastructures following user community demand
UK e-Infrastructure (overview diagram): users get common access, tools and information through nationally supported services via the NGS; national HPC services (HPCx + HECToR); HEI/FEI regional and campus grids; community grids (e.g. LHC, ISIS TS2); VRE, VLE, IE; integrated internationally
How do we do this? • Moving towards a standards-based infrastructure • Independent of any single external middleware • Community-specific interfaces deployed where required • Demonstration of the benefits of using the same interface locally, regionally, nationally and internationally • Policies and best-practice framework for the operation, inter-operation and sharing of research computing services across the UK • Repository of relevant user and system documentation • Quality review of member services • National help desk and support centre • Operation of the underlying core services • Targeted training and outreach
What does the NGS offer its user communities? • Compute services • Access to more and different resources • Access through a number of different community driven interfaces • Different ways of running jobs e.g. multi site MPI • Data services • Access to data storage • New ways of accessing data • Support and advice • Access Services • User facing services providing methods to access available resources • Support for community portals and gateways • Central Support services • Individual services needed to support user access, control and management
Current status of NGS • Over 900 registered users, not including EGEE/wLCG users • 25 member institutions • 11 partner resources • 18 affiliate resources • Currently in discussion with 8 further institutions to add both compute and data resources
Benefits to the researcher • Access by local researchers to local resources through standard interfaces • Application-level integration: the command line is no longer suitable as the only option; connect to resources using MATLAB, R, etc. • Support development of compliant access methods, either available locally or through a central service provider • Access to resources of a type that may not be locally available • Software unlicensed at your university? • Your data is generated by an international facility and you need it on your local resources? • Usage of resources independent of location, insulating users from local downtime and periods of resource instability
Benefits to the researcher • Creation of community practitioners • Pioneers within a subject are able to lead first their collaborators and finally the whole field • Push capability and advance methods of research • Computation is just the first part of the problem • Data management is becoming key • Using defined standard data interfaces will allow easy access for collaborators and enable casual reuse
Benefits to the institution • Access to expertise • Systems administration • User training materials • Access to top-up resources • Out-of-band demands: needs beyond normal requirements call for resources beyond what is normally available
Benefits to the institution • Links into the national e-infrastructure framework • Provision of value-added services that a single institution wouldn't want to run! • How large is the nationally collaborative research body? • How many times do you want researchers asking you to create custom agreements with different collaborators? • Allows creation of an inter-institutional economy • Smoothing usage across resources, increasing utilisation and efficiency • External accounting & quality assurance frameworks • Standard interfaces, so users see no change
Value added services • Common standard interfaces • Distributed virtualised data storage and access • Monitoring and sysadmin support • Resource brokering • Application Hosting • Traditional • Web service • Application Portal • International gateway • Internationally recognised electronic identity • Training, outreach and support
Organisational Membership • Personnel • Appointment of an institutional Campus Champion • Liaison between HEI/research organisation and NGS • Resource Exchanging • Regularly tested installation of NGS-defined interfaces as described in the NGS Site Level Services Document • Affiliate • Maintains control over permitted users • Partner • Supports access by a significant body of NGS users • Publishes a service level description (SLD) detailing the services offered
Membership Process • Contact by the Technical Director with the research computing centre or a known academic • Liaison Theme • Organise roadshow event • Identify Campus Champion • Identify resource for exchange • Operations Theme • Nomination of a support buddy to work with the site • Partnership team works with the site on its website and, if necessary, an SLD
Membership Process • Buddy works with the site to install software according to a compliant profile • Use manually initiated INCA tests to determine installation success • Add to the automated compliance testing framework • After 7 days of successful conformance testing • The site is recommended to the board
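The conformance step above can be sketched as a simple gating rule: a site becomes eligible for a board recommendation only after its daily compliance tests have passed for 7 consecutive days. This is a hypothetical sketch; the function and data layout are illustrative, not the NGS's actual tooling.

```python
from datetime import date, timedelta

REQUIRED_DAYS = 7  # consecutive days of passing INCA-style compliance tests

def ready_for_board(daily_results):
    """daily_results: list of (date, passed) tuples, most recent last.
    Returns True once the most recent REQUIRED_DAYS runs all passed."""
    if len(daily_results) < REQUIRED_DAYS:
        return False
    return all(passed for _, passed in daily_results[-REQUIRED_DAYS:])

# Example: a site that failed on day 3, then passed for 7 straight days
start = date(2009, 1, 1)
results = [(start + timedelta(days=i), i != 2) for i in range(10)]
print(ready_for_board(results))  # True: the last 7 runs all passed
```

The point of the sliding window is that one failed test resets the clock: the site must rebuild a full unbroken run before being recommended.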
NGS Site Level Services • Description of the different solutions to service requirement challenges • A set of modules, each describing a particular function, each with a number of solutions which are grouped together as profiles • Supported interfaces are community driven; we will not/cannot dictate what we make available • For example, we currently support the following computational interfaces: VDT, gLite, Globus 4, GridSAM
NGS Recommended Profile • User Authentication and Authorisation: x509 (VDT) • Information System: GLUE (VDT) • Compute resource Service: Pre-WS GRAM (VDT) • Data Transfer Tools: GridFTP (VDT) • Storage Management, Database Hosting & User Data access Services: Clients (SRB, SRM, OGSA-DAI) • User Interface Service: GSISSH (VDT)
EGEE/GridPP Node • User Authentication and Authorisation: x509 (gLite) • Information System: GLUE (gLite) • Compute resource Services: Pre-WS GRAM (gLite) • Data Transfer Tools: GridFTP (gLite) • Storage Management, Database Hosting & User Data access Services: Client & Server (SRM)
Data services • Database (Oracle & MySQL) • Supports both user communities and NGS core services • STFC and University of Manchester • Storage Resource Broker • Share, replicate and access data with colleagues more efficiently • Distributed metadata capable storage system • Filesystem access to disk storage • SCP, GSI-SCP, AFS (on the way) • Service based methods to access structured data using OMII-UK OGSA-DAI • All integrated with the NGS user authentication and authorisation mechanisms
Specialist services • University of Westminster • Operates and supports the P-GRADE portal and GEMLCA legacy application support services • Belfast e-Science Centre • Web Service Hosting Container Service • Web service containers into which projects or VOs can deploy their own Web or Grid services, using automatic deployment technology • Oxford e-Research Centre • OMII-UK GridSAM • OGF HPC Basic Profile compliant job submission • Promoting interoperability with international grids • Eucalyptus cloud system • Exposing AWS-compatible interfaces and functionality to NGS users • Edinburgh • Eucalyptus cloud system • STFC Rutherford Appleton Laboratory • Visualisation using a specialised cluster from within the STFC e-Science Viz group
Process for a New Service • User/Research community propose new service • Research and Development Team • Deploy test service, working with community • Develop appropriate service documentation • Final test service deployed on further NGS sites • Service ready for production deployment • Operations Team • Deploy operational production service • Liaison Team • Publicize the availability of the new service • Training materials developed with the community and training events held
HERMES data client • Drag and drop file movement between local filesystem, SCP, GridFTP, and SRB • Java based desktop application • Authentication and authorisation using UK e-Science certificates
Simulation performed on the NGS of a drug permeating through a membrane Name: Dr Brian Cheney Institution: University of Southampton Research: Membrane permeation Drs Brian Cheney and Jonathan Essex research membrane permeation of small molecules at the University of Southampton. They are interested in learning what physical and chemical features make a molecule a good or bad permeant, and in developing ways to quantify and estimate a molecule's permeability. "We are enabled by submission onto parallel resources not available locally."
Integrative Biology Project Thushka Maharaj is part of an international collaboration studying the effects of applying an electrical shock to both healthy and diseased hearts, in an attempt to understand exactly how defibrillation works. "We use parallel code with around a million nodes," explains Thushka. "But we can get 20 ms of animation in 20 minutes using 32 CPUs on the NGS. And the benefits of services such as the Storage Resource Broker are immense - it's fantastic to be able to share data with colleagues all over the world so easily." Name: Thushka Maharaj Institution: University of Oxford Research: The effects of defibrillation on the heart
Using the NGS to access geographically distributed astronomy databases Name: Helen Xiang Institution: University of Portsmouth Research: Astronomy databases Helen Xiang and Professor Robert Nicol at the University of Portsmouth have been working on astronomy databases. They use software called OGSA-DAI to link astronomy data that is stored on the NGS and at Portsmouth. This way they can retrieve data from two places with one command.
GENIUS Yields patient-specific information which helps plan embolisation of arterio-venous malformations, aneurysms, etc. from inside the operating theatre. Uses MPICH-G2 for concurrent submission to NGS, TeraGrid, DEISA and LONI resources from the AHE tool. Name: Peter Coveney Institution: UCL Research: Modelling large-scale patient-specific cerebral blood flow in clinically relevant time frames
Background • Oxford University has a complex structure, which makes rolling out a campus grid tricky… • There are 38 colleges which are legally independent of the University • The colleges provide the bulk of PCs accessible to students • Each has its own independent IT support staff • There are at least 63 academic departments which are semi-autonomous • Each has its own independent IT support staff • They tend to provide teaching laboratories • OeRC is just one of them
OxGrid, a University Campus Grid • Single entry point for users to shared and dedicated resources • Seamless access to NGS and OSC for registered users
OxGrid architecture (diagram): the Oxford e-Research Centre hosts the core services (resource broker/login node running Condor, user management server, information server, SRB storage server, SSO server), which connect departmental and college Condor pools and clusters, the OxGrid cluster (including a Microsoft cluster and an NGS cluster), AFS, OSC resources, National Grid Service resources, and resources at other universities and institutions.
Authorisation and Authentication • Users requiring access to external systems must use standard UK e-Science Digital Certificates • Users that only access university resources use a Kerberos CA system connected to the central University authentication system • Closely observing developments in Shibboleth and SAML
OxGrid Resource Broker (diagram): job submit scripts feed condor_schedd on the resource broker. An RB advertise script queries the OxGrid and NGS BDIIs via GLUE and publishes machine adverts, including dynamic information such as FreeCPUs, to condor_collector. condor_negotiator matches jobs to resources, and condor_gridmanager submits them via Globus to Oxford Condor pool headnodes, OeRC resources, National Grid Service resources, and other Oxford resources.
Some Machine ClassAd Attributes
Name = "condor.oerc.ox.ac.uk"
SupportContact = "mailto:oxgrid-admin@maillist.ox.ac.uk"
resource_name = "condor.oerc.ox.ac.uk/jobmanager-condor"
JobManager = "condor"
Location = "Oxford, UK"
LastGLUEQueryDelay = "323.0ms"
Latitude = "51.759792"
Longitude = "-1.259823"
DistanceToOxGridRB = "0.0km"
OpSys = "LINUX"
Arch = "INTEL"
Memory = 2048
Application = "GCC3"
Usage = "0.6666666666666666"
TotalCPUs = 3
FreeCPUs = 1
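The broker's matchmaking over adverts like these can be sketched in a few lines: filter machine ads on static attributes (Arch, OpSys) and rank on the dynamic FreeCPUs value. This is a minimal illustration of the idea, not the Condor negotiator itself; the hostnames other than condor.oerc.ox.ac.uk are hypothetical.

```python
def pick_resource(ads, arch="INTEL", opsys="LINUX"):
    """Return the matching machine ad with the most free CPUs, or None."""
    candidates = [ad for ad in ads
                  if ad["Arch"] == arch and ad["OpSys"] == opsys
                  and ad["FreeCPUs"] > 0]
    return max(candidates, key=lambda ad: ad["FreeCPUs"], default=None)

ads = [
    {"Name": "condor.oerc.ox.ac.uk", "Arch": "INTEL", "OpSys": "LINUX",   "FreeCPUs": 1},
    {"Name": "ngs.example.ac.uk",    "Arch": "INTEL", "OpSys": "LINUX",   "FreeCPUs": 12},
    {"Name": "win.example.ox.ac.uk", "Arch": "INTEL", "OpSys": "WINDOWS", "FreeCPUs": 40},
]
print(pick_resource(ads)["Name"])  # ngs.example.ac.uk
```

Because FreeCPUs is advertised dynamically, the same job may land on different resources from one negotiation cycle to the next, which is exactly what lets the broker smooth load across the campus grid and the NGS.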
User Interface • job-submission-script • Tests accessibility of resources • Matchmaking based on free CPUs • Submits a single job • Part of the supported wrappers • submit-job • Flexible parameter/file/filename sweeps • Various waiting primitives for implementing workflows • Submits to different resource types (campus grid, NGS...) • New: we hope users will develop their own scripting • getcert • Obtains a local, low-assurance certificate • Uses the University SSO architecture and MyProxy
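The parameter-sweep support mentioned above amounts to expanding lists of parameter values into one job description per combination. The sketch below shows that expansion; the function name, the "./simulate" executable and the job-description layout are illustrative assumptions, not the actual submit-job tool.

```python
import itertools

def expand_sweep(executable, parameters):
    """Expand a dict of parameter lists into one job description per
    combination, roughly what a sweep-capable submit tool must do."""
    names = sorted(parameters)
    jobs = []
    for values in itertools.product(*(parameters[n] for n in names)):
        jobs.append({"executable": executable,
                     "arguments": dict(zip(names, values))})
    return jobs

# Hypothetical sweep: 2 temperatures x 3 random seeds = 6 jobs
jobs = expand_sweep("./simulate", {"temp": [300, 310], "seed": [1, 2, 3]})
print(len(jobs))  # 6
```

Each generated description can then be handed to the broker independently, so the whole sweep runs wherever free CPUs are advertised.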
Other submission mechanisms • Nimrod-G • Developed by Monash University • Graphical interaction through a C++/CGI-based interface • Primarily for parameter sweeps • All of the currently NGS-supported mechanisms for entry
Core Resources • Available to all users of the campus grid • Individual departmental clusters (PBS, SGE) • Grid software interfaces installed • Management of users through pool accounts or manual account creation • Clusters of PCs • Running Condor/SGE • A single master running up to ~500 nodes • Masters run either by owners or by OeRC • Execution environment on a second OS (Linux), Windows or a virtual machine
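For the Condor-managed PC pools described above, a user-side submit description might look like the following. This is a minimal illustrative fragment; the executable name and requirements expression are assumptions, not taken from OxGrid's configuration.

```
# Minimal Condor submit description (illustrative)
universe     = vanilla
executable   = simulate.sh
requirements = (OpSys == "LINUX") && (Arch == "INTEL")
output       = simulate.$(Process).out
error        = simulate.$(Process).err
log          = simulate.log
queue 10
```

The requirements expression matches against the machine ClassAd attributes advertised by each pool node, so the same description works whether the job lands on a dedicated cluster node or an idle teaching-lab PC running the Linux execution environment.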
External Resources • Only accessible to users that have registered with them • National Grid Service • Peered access with individual systems, including core, partner and affiliate sites that support the NGS VO • OSC • Gatekeeper system • User management done through standard account-issuing procedures and manual DN mapping • Controlled grid submission to the Oxford Supercomputing Centre • Some departmental resources • Used as a method to bring new resources online initially • Shows the benefits of joining the grid • Limited access to resources donated by other departments, to maintain the incentive to become full participants
Joining OxGrid • Departments/colleges have under-used desktop machines • Set up a Condor pool to make use of this capacity • Can still integrate with Wake-on-LAN work (see the OUCS/OeRC/OUCE Low Carbon ICT project) • Process • OeRC provides a departmental pool headnode • This means firewall exceptions can be kept machine-to-machine • ITSS in the department install one of our MSIs • One-click installation and configuration of CoLinux • Can be used with all Windows remote management systems
Windows Client • Simple MSI • Silent install • Self-configuring • Finds the pool master • Runs with "Low" priority • Network shares the host's IP • Two setups: • Background service + Windows Condor • Monitors keyboard & mouse • Can run Windows tasks • CoLinux-based client • Runs a Linux kernel as a Windows driver • Appears as a service in Windows • ~98% of native speed • 3 GB Debian 4.0 image • Data: read-only access to AFS applications and scratch space • Software: MATLAB Compiler Runtime
Data Management • Engage data users as well as computational users • Provide a remote store for those groups that cannot resource their own • Distribute the client software as widely as possible to departments that are not currently engaged in e-Research • Supporting clients for command-line, Java, WS and web-based access
User Code 'Porting' • A user/OeRC relationship is formed to bring code onto the grid • The code currently operates either on a single node or on a cluster • Design a wrapper to control child process submission, data management and archiving • Hand the code back to the user: a worked example of a real computational task they wanted to do, and a possible basis for further code porting by themselves
OxGrid Users 2009 • Malaria Map • Phylogenetic Analysis of DNA • Analysis of Clinical ECG data • Biochemistry • Astronomy • Chemistry • Satellite image analysis • Heart modelling http://www.maps.ox.ac.uk/
The Future • Design and construct user training courses • Package central server modules for public distribution • Already running on systems in Porto and Barcelona universities as well as Monash University • Continue contacting users to expand the user base • Link with other campus grid instances using NGS interfaces • Fully integrate institutional capabilities to provide a coherent research computing resource
Conclusions • NGS is there to support communities through their hosting institutions • We must be flexible to support a number of different communities that stretch across institutional boundaries and want different interfaces to research IT resources • NGS is able to provide knowledge, best practice and methodologies to connect communities and resource providers • We are successfully supporting a number of different software profiles for different communities
Conclusions • Users are using the single point of entry to schedule tasks onto NGS, OU and other campus grid systems • Working to engage more users and systems • Providing coherent electronic access to university research computing facilities • Working with further departments to increase the size of their compute contribution and their buy-in
Contact david.wallom@oerc.ox.ac.uk support@grid-support.ac.uk