
Presentation Transcript


1. The UK National Grid Service and the University of Oxford campus grid: examples of production distributed grid e-infrastructures. Dr. David Wallom, Technical Manager, OeRC; Technical Director, NGS

  2. Outline • The National Grid Service • Resources • Services • Users and Projects • OxGrid, the University of Oxford campus grid • Resources • Software • Users

3. NGS Mission and Goal Mission: to enable coherent electronic access for UK researchers to all computational and data-based resources and facilities required to carry out their research, independent of resource or researcher location. Goal: • To enable a production-quality e-infrastructure • To expand towards all Higher Education Institutions and UK-based research institutions • To continue to support cutting-edge research • To deliver core services and support, including support for research computing groups within universities and research organisations to help them support users • To highlight collaborative research in key communities • To integrate with international infrastructures following user community demand

4. UK e-Infrastructure [Diagram: users get common access, tools and information through nationally supported services via the NGS (including HPCx + HECToR), HEI/FEI regional and campus grids, and community grids, integrated internationally; examples include the LHC, VREs/VLEs/IEs and ISIS TS2]

5. How do we do this? • Moving towards a standards-based infrastructure • Independent of any single external middleware • Community-specific interfaces deployed where required • Demonstration of the benefits of using the same interface locally, regionally, nationally and internationally • Policies and best practice framework for the operation, inter-operation and sharing of research computing services across the UK • Repository of relevant user and system documentation • Quality review of member services • National help desk and support centre • Operation of the underlying core services • Targeted training and outreach

6. What does the NGS offer its user communities? • Compute services • Access to more and different resources • Access through a number of different community-driven interfaces • Different ways of running jobs, e.g. multi-site MPI • Data services • Access to data storage • New ways of accessing data • Support and advice • Access services • User-facing services providing methods to access available resources • Support for community portals and gateways • Central support services • Individual services needed to support user access, control and management

  7. Current status of NGS • Over 900 registered users, not including EGEE/wLCG users • 25 member institutions • 11 partner resources • 18 affiliate resources • Currently in discussion with 8 further institutions to add both compute and data resources

  8. The NGS Member Institutions, Summer 2009

9. Benefits to the researcher • Access by local researchers to local resources through standard interfaces • Application-level integration is possible: the command line is no longer the only suitable option, so users can connect to resources from MATLAB, R, etc. • Support development of compliant access methods, either available locally or through a central service provider • Access to resources of a type that may not be locally available • Software unlicensed in your university? • Your data is generated by an international facility and you need it on your local resources? • Usage of resources independent of location, insulating users from local downtime and periods of resource instability

10. Benefits to the researcher • Creation of community practitioners • Pioneers within a subject are able to lead first their collaborators and then, finally, the wider field • Push capability and advance methods of research • Computation is just the first part of the problem • Data management is becoming key • Using defined standard data interfaces will allow easy access by collaborators and enable casual reuse

11. Benefits to the institution • Access to expertise • Systems administration • User training materials • Access to top-up resources • Out-of-band resources: needs beyond normal requirements call for resources beyond the normal provision

12. Benefits to the institution • Links into the national e-infrastructure framework • Provision of value-added services that a single institution wouldn't want to run! • How large is the nationally collaborative research body? • How many times do you want researchers asking you to create custom agreements with different collaborators? • Allows for creation of an inter-institutional economy • Smooths usage of resources, allowing increased utilisation and efficiency • External accounting & quality assurance frameworks • Standard interfaces allowing users to see no change

  13. Value added services • Common standard interfaces • Distributed virtualised data storage and access • Monitoring and sysadmin support • Resource brokering • Application Hosting • Traditional • Web service • Application Portal • International gateway • Internationally recognised electronic identity • Training, outreach and support

14. Organisational Membership • Personnel • Appointment of an institutional Campus Champion • Liaison between the HEI/research organisation and the NGS • Resource exchanging • Regularly tested installation of NGS-defined interfaces as described in the NGS Site Level Services Document • Affiliate • Maintains control over permitted users • Partner • Supporting access by a significant body of NGS users • Publish a service level description (SLD) detailing the services offered

15. Membership Process • Contact by the Technical Director with the research computing centre or a known academic • Liaison Theme • Organise a roadshow event • Identify a Campus Champion • Identify resources for exchange • Operations Theme • Nomination of a support buddy to work with the site • Partnership team works with the site on its website entry and, if necessary, an SLD

16. Membership Process • The buddy works with the site to install software as per a compliant profile • Manually initiated INCA tests are used to determine installation success • The site is added to the automated compliance testing framework • After 7 days of successful conformance testing • The site is then recommended to the board

17. NGS Site Level Services • Description of the different solutions to service requirements challenges • A set of different modules, each describing a particular function, each with a number of solutions which are grouped together as profiles • Supported interfaces are community-driven; we will not/cannot dictate what we make available • For example, we are currently supporting the following computational interfaces: VDT, gLite, Globus 4, GridSAM

  18. NGS Recommended Profile • User Authentication and Authorisation: x509 (VDT) • Information System: GLUE (VDT) • Compute resource Service: Pre-WS GRAM (VDT) • Data Transfer Tools: GridFTP (VDT) • Storage Management, Database Hosting & User Data access Services: Clients (SRB, SRM, OGSA-DAI) • User Interface Service: GSISSH (VDT)
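
As a rough illustration of how a user might exercise this profile from the command line, the sketch below assumes the standard VDT client tools are installed and uses ngs.example.ac.uk as a placeholder host name:

    # Create a short-lived proxy from the user's UK e-Science (x509) certificate
    grid-proxy-init

    # Run a trivial test job through the pre-WS GRAM gatekeeper
    globus-job-run ngs.example.ac.uk/jobmanager-pbs /bin/hostname

    # Stage an input file to the site over GridFTP
    globus-url-copy file:///home/user/input.dat \
        gsiftp://ngs.example.ac.uk/home/user/input.dat

    # Interactive login via the GSISSH user interface service
    gsissh ngs.example.ac.uk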

  19. EGEE/GridPP Node • User Authentication and Authorisation: x509 (gLite) • Information System: GLUE (gLite) • Compute resource Services: Pre-WS GRAM (gLite) • Data Transfer Tools: GridFTP (gLite) • Storage Management, Database Hosting & User Data access Services: Client & Server (SRM)

20. Data services • Database (Oracle & MySQL) • Supports both user communities and NGS core services • STFC and the University of Manchester • Storage Resource Broker • Share, replicate and access data with colleagues more efficiently • Distributed, metadata-capable storage system • Filesystem access to disk storage • SCP, GSI-SCP, AFS (on the way) • Service-based methods to access structured data using OMII-UK OGSA-DAI • All integrated with the NGS user authentication and authorisation mechanisms
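
For illustration only, a typical Storage Resource Broker session with the standard S-command clients might look like the following sketch; the collection paths are placeholders, and Sinit reads connection settings from the user's ~/.srb/.MdasEnv file:

    # Open a connection to the SRB vault
    Sinit

    # Upload a results file into a collection (illustrative paths)
    Sput results.tar.gz /ngs/home/user.vo/results/

    # Browse and retrieve data shared by a collaborator
    Sls /ngs/home/collaborator.vo/shared/
    Sget /ngs/home/collaborator.vo/shared/dataset.csv .

    # Close the session
    Sexit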

21. Specialist services • Westminster • Operates and supports the P-GRADE portal and GEMLCA legacy application support services • Belfast e-Science Centre • Web Service Hosting Container Service • Web service containers into which projects or VOs can deploy their own Web or Grid services, using automatic deployment technology • Oxford e-Research Centre • OMII-UK GridSAM • OGF HPC Basic Profile compliant job submission • Promoting interoperability with international grids • Eucalyptus cloud system • Exposing AWS-compatible interfaces and functionality to NGS users • Edinburgh • Eucalyptus cloud system • STFC Rutherford Appleton Laboratory • Visualisation using a specialised cluster from within the STFC e-Science Viz group

  22. Process for a New Service • User/Research community propose new service • Research and Development Team • Deploy test service, working with community • Develop appropriate service documentation • Final test service deployed on further NGS sites • Service ready for production deployment • Operations Team • Deploy operational production service • Liaison Team • Publicize the availability of the new service • Training materials developed with the community and training events held

  23. HERMES data client • Drag and drop file movement between local filesystem, SCP, GridFTP, and SRB • Java based desktop application • Authentication and authorisation using UK e-Science certificates

24. Simulation performed on the NGS of a drug permeating through a membrane Name: Dr Brian Cheney Institution: University of Southampton Research: Membrane permeation Drs Brian Cheney and Jonathan Essex research membrane permeation of small molecules at the University of Southampton. They are interested in learning what physical and chemical features make a molecule a good or bad permeant, and in developing ways to quantify and estimate a molecule's permeability. The work is enabled by submission to parallel resources not available locally.

25. Integrative Biology Project Thushka Maharaj is part of an international collaboration studying the effects of applying an electrical shock to both healthy and diseased hearts, in an attempt to understand exactly how defibrillation works. “We use parallel code with around a million nodes,” explains Thushka. “But we can get 20ms of animation in 20 minutes using 32 CPUs on the NGS. And the benefits of services such as the Storage Resource Broker are immense - it’s fantastic to be able to share data with colleagues all over the world so easily.” Name: Thushka Maharaj Institution: University of Oxford Research: The effects of defibrillation on the heart

26. Using the NGS to access geographically distributed astronomy databases Name: Helen Xiang Institution: University of Portsmouth Research: Astronomy databases Helen Xiang and Professor Robert Nicol at the University of Portsmouth have been working on astronomy databases. They use software called OGSA-DAI to link astronomy data that is stored on the NGS and at Portsmouth. This way they can retrieve the data from two places with one command.

27. GENIUS Yields patient-specific information which helps plan embolisation of arterio-venous malformations, aneurysms, etc. from inside the operating theatre. Uses MPICH-G2 for concurrent submission onto NGS, TeraGrid, DEISA and LONI resources from the AHE tool. Name: Peter Coveney Institution: UCL Research: Modelling large-scale, patient-specific cerebral blood flow in clinically relevant time frames

  28. Creating a Campus Grid

29. Background • Oxford University has a complex structure, which makes rolling out a campus grid tricky… • There are 38 colleges which are legally independent of the University • The colleges provide the bulk of the PCs accessible to students • Each has its own independent IT support staff • There are at least 63 academic departments which are semi-autonomous • Each has its own independent IT support staff • They tend to provide teaching laboratories • OeRC is just one of them

  30. OxGrid, a University Campus Grid • Single entry point for users to shared and dedicated resources • Seamless access to NGS and OSC for registered users

31. OxGrid [Diagram: the Oxford e-Research Centre hosts the core services (a Condor-based resource broker/login node, an SRB storage server, an SSO server, a user management server and an information server), connecting departmental and college Condor pools, departmental clusters, the OxGrid cluster, a Microsoft cluster, AFS storage, OSC resources and National Grid Service resources at other universities and institutions]

32. Authorisation and Authentication • Users requiring external system access are required to use standard UK e-Science digital certificates • Users that only access university resources use a Kerberos CA system connected to the central University authentication system • Closely observing developments around Shibboleth and SAML

33. OxGrid Resource Broker [Diagram: job submit scripts feed the broker's condor_schedd; an advertise script converts GLUE information from the OxGrid and NGS BDIIs into machine ClassAds in the condor_collector, including dynamic information such as FreeCPUs; the condor_negotiator matches jobs to resources, and condor_gridmanager submits them via Globus to Oxford Condor pool head nodes, OeRC resources, other Oxford resources and National Grid Service resources]
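
The condor_gridmanager/Globus path in the diagram is effectively a Condor-G submission. A minimal sketch of such a submit description, assuming a pre-WS GRAM (gt2) gatekeeper and using a placeholder host name and executable:

    # remote.sub : minimal Condor-G description (gatekeeper name is a placeholder)
    universe            = grid
    grid_resource       = gt2 ngs.example.ac.uk/jobmanager-pbs
    executable          = analyse.sh
    transfer_executable = True
    output              = remote.out
    error               = remote.err
    log                 = remote.log
    queue
    # Submit with:  condor_submit remote.sub
    # condor_gridmanager then forwards the job to the remote gatekeeper via Globus.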

34. Some Machine ClassAd Attributes

    Name               = "condor.oerc.ox.ac.uk"
    SupportContact     = "mailto:oxgrid-admin@maillist.ox.ac.uk"
    resource_name      = "condor.oerc.ox.ac.uk/jobmanager-condor"
    JobManager         = "condor"
    Location           = "Oxford, UK"
    LastGLUEQueryDelay = "323.0ms"
    Latitude           = "51.759792"
    Longitude          = "-1.259823"
    DistanceToOxGridRB = "0.0km"
    OpSys              = "LINUX"
    Arch               = "INTEL"
    Memory             = 2048
    Application        = "GCC3"
    Usage              = "0.6666666666666666"
    TotalCPUs          = 3
    FreeCPUs           = 1
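
Jobs submitted to the broker can matchmake against these attributes through their requirements and rank expressions. The sketch below is illustrative only; FreeCPUs and Application are OxGrid-specific attributes published by the advertise script rather than standard Condor machine attributes, and the executable name is a placeholder:

    # sweep.sub : prefer resources with the most free CPUs (illustrative values)
    universe     = vanilla
    executable   = run_model.sh
    arguments    = $(Process)
    requirements = (OpSys == "LINUX") && (Arch == "INTEL") && (FreeCPUs > 0)
    rank         = FreeCPUs
    output       = out.$(Process)
    error        = err.$(Process)
    log          = sweep.log
    queue 10
    # Submit with:  condor_submit sweep.sub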

35. User Interface • job-submission-script • Tests accessibility of resources • Matchmaking based on free CPUs • One job • Part of supported wrappers • submit-job • Flexible parameter/file/filename sweeps • Various waiting primitives for implementing workflows • Submit to different resource types (campus grid, NGS...) • New: hoping users will develop their own scripting • getcert • Obtain a local, low-assurance certificate • Uses the University SSO architecture and MyProxy
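
The credential step behind getcert can be pictured with the standard MyProxy client; this is a hypothetical invocation, with the server name and username as placeholders rather than the real OxGrid endpoints:

    # Retrieve a short-lived, low-assurance credential from a MyProxy CA
    # sitting behind the University single sign-on (names are placeholders)
    myproxy-logon -s myproxy.oerc.ox.ac.uk -l ouss1234 -t 12

    # Confirm that a valid proxy is now in place before submitting work
    grid-proxy-info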

36. Other submission mechanisms • Nimrod-G • Developed by Monash University • Graphical interaction using a C++/CGI-based interface • Primarily for parameter sweeps • All of the currently supported NGS entry mechanisms

37. Core Resources • Available to all users of the campus grid • Individual departmental clusters (PBS, SGE) • Grid software interfaces installed • Management of users through pool accounts or manual account creation • Clusters of PCs • Running Condor/SGE • A single master running up to ~500 nodes • Masters run either by owners or by OeRC • Execution environment on a second OS (Linux), Windows or a virtual machine
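
On a pool master of this kind, the state of the execute nodes can be inspected with the standard Condor query tool; the commands below are illustrative only:

    # Summarise all slots the master currently manages
    condor_status -total

    # Break the pool down by operating system, e.g. native Windows vs CoLinux/Linux
    condor_status -constraint 'OpSys == "WINDOWS"' -total
    condor_status -constraint 'OpSys == "LINUX"'   -total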

38. External Resources • Only accessible to users that have registered with them • National Grid Service • Peered access with individual systems, including core, partner and affiliate sites that support the NGS VO • OSC • Gatekeeper system • User management done through standard account-issuing procedures and manual DN mapping • Controlled grid submission to the Oxford Supercomputing Centre • Some departmental resources • Used as a method to bring new resources online initially • Show the benefits of joining the grid • Limited access to resources donated by other departments, to maintain the incentive to become full participants

39. Joining OxGrid • Departments/colleges have under-used desktop machines • Set up a Condor pool to make use of this capacity • Can still integrate with Wake-on-LAN work; see the OUCS/OeRC/OUCE Low Carbon ICT project • Process • OeRC provides a departmental pool head node • This means firewall exceptions can be kept machine-to-machine • ITSS in the department install one of our MSIs • One-click installation and configuration of CoLinux • Can be used with all Windows remote management systems

40. Windows Client • Simple MSI • Silent install • Self-configuring: finds the pool master • Runs with "Low" priority • Network shares the host's IP • Two setups: • Background service + Windows Condor • Monitors keyboard & mouse • Can run Windows tasks • CoLinux-based client • Runs a Linux kernel as a Windows driver • Appears as a service in Windows • 98% of native speed • 3 GB Debian 4.0 image • Data: read-only access to AFS applications and scratch space • Software: Matlab Compiler Runtime
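
A silent rollout of such an MSI through a remote management system typically reduces to a single standard msiexec call; the package file name below is a placeholder:

    REM Unattended install of the OxGrid client package (no UI, no prompts)
    msiexec /i oxgrid-condor-client.msi /qn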

  41. Data Management • Engagement of data as well as computational users • Provide a remote store for those groups that cannot resource their own • Distribute the client software as widely as possible to departments that are not currently engaged in e-Research • Supporting clients for command line, Java, WS and web based access

42. User Code ‘Porting’ • A user/OeRC relationship is formed to bring code onto the grid • The code currently operates either on a single node or on a cluster • Design a wrapper to control child process submission, data management and archiving • Hand the code back to the user as an example of a real computational task they want to do, and as a possible basis for further code porting by themselves
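
A wrapper of this kind often amounts to a short script around the user's unmodified executable; the sketch below is purely illustrative, with hypothetical file names and SRB collection paths:

    #!/bin/bash
    # Illustrative porting wrapper: stage input, run the user's existing
    # single-node code as a child process, then archive the results.
    set -e
    WORKDIR=$(mktemp -d)
    cd "$WORKDIR"

    # Stage input from the campus data store (paths are placeholders)
    Sget /oxgrid/home/user/inputs/params.dat .

    # Run the unmodified user code
    ./user_model params.dat > model.out

    # Archive and return the results for the user
    tar czf results.tar.gz model.out
    Sput results.tar.gz /oxgrid/home/user/results/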

  43. OxGrid Users 2009 • Malaria Map • Phylogenetic Analysis of DNA • Analysis of Clinical ECG data • Biochemistry • Astronomy • Chemistry • Satellite image analysis • Heart modelling http://www.maps.ox.ac.uk/

  44. The Future • Design and construct user training courses • Package central server modules for public distribution • Already running on systems in Porto and Barcelona universities as well as Monash University • Continue contacting users to expand the user base • Link with other campus grid instances using NGS interfaces • Fully integrate institutional capabilities to provide a coherent research computing resource

45. Conclusions • The NGS is there to support communities through their hosting institutions • We must be flexible to support a number of different communities that stretch across institutional boundaries and want different interfaces to research IT resources • The NGS is able to provide knowledge, best practice and methodologies to connect communities and resource providers • We are successfully supporting a number of different software profiles for different communities

46. Conclusions • Users are using the single point of entry to schedule tasks onto the NGS, OU and other campus grid systems • Working to engage more users and systems • Providing coherent electronic access to university research computing facilities • Working with further departments to increase the size of the compute contribution and buy-in from them

  47. Contact david.wallom@oerc.ox.ac.uk support@grid-support.ac.uk
