
Accessing European Computing Platforms


Presentation Transcript


  1. Accessing European Computing Platforms Stefan Zasada University College London

  2. Contents • Introduction to grid computing • What resources are available to VPH? • How to gain access to computational resources • Deploying applications on EU resources • Making things easier with VPH ToolKit components • Coupling multiscale simulations • AHE demonstration • Going further

  3. What is a Grid? • Allows resource sharing between multiple, distinct organisations • Crosses administrative domains • Security is important – only authorized users are able to use resources • Differs from cluster computing • “Computing as a utility” – clouds too?

  4. Resources Available to VPH Researchers • DEISA • A heterogeneous HPC infrastructure currently formed by eleven European national supercomputing centres, interconnected by a dedicated high performance network. • EGI/EGEE • Consists of 41,000 CPUs available to users 24 hours a day, 7 days a week, in addition to about 5 PB of disk (5 million gigabytes) plus tape MSS storage, and sustains 100,000 concurrent jobs. • EGI includes national grid initiatives • Clouds – Amazon EC2 etc.

  5. Compute Allocations for VPH • We conducted a survey of all VPH-I projects and Seed EPs to assess their computational requirements. • This was used to prepare an application to the DEISA Virtual Communities programme for an allocation of resources to support the work of all VPH-I projects, managed by the NoE. • We have set up these VPH VC allocations and have run them for the past two years – now extended until the end of DEISA in 2011

  6. EGEE/EGI • EGEE has transformed into EGI, which will start to integrate national grid infrastructures • Consists of 41,000 CPUs available to users 24 hours a day, 7 days a week, in addition to about 5 PB of disk (5 million gigabytes) plus tape MSS storage, and sustains 100,000 concurrent jobs. • EGEE/EGI have agreed to provide access to VPH researchers through their BioMed VO. Ask for more details if you would like to use this. • If you would like access, contact vph-allocations@ercim.org
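  For orientation, gLite access is authenticated with a short-lived VOMS proxy derived from your X.509 certificate. A minimal sketch, assuming you are already registered with the BioMed VO (the 12-hour lifetime is an illustrative choice):

    # Create a VOMS proxy carrying biomed VO attributes (here valid for 12 hours)
    voms-proxy-init --voms biomed --valid 12:00

    # Inspect the proxy and its VO attributes
    voms-proxy-info --all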

  7. DEISA Project Allocations • VPH was awarded 2 million standard DEISA core hours for 2009, renewed for 2010 and 2011 • Resources include HECToR (Cray, UK) and SARA (IBM Power 6, Netherlands) • euHeart joined in the second wave, along with other non-VPH EU projects

  8. Choosing the right resources for your application • Parallel/MPI/HPC applications: use DEISA (some DEISA sites have policies to promote large-scale computing) • Embarrassingly parallel/task farming applications: use EGI/your National Grid Initiative • Multiscale applications: split between the appropriate resources

  9. Uniform access to resources at all scales • Middleware provides the underlying computational infrastructure • Allows the different resources to be used seamlessly (grid, cloud etc.) • Automates machine to machine communication with minimal user input • Virtualises resources – the user need know nothing of the underlying operating systems etc. • Requires an X.509 certificate!

  10. Accessing DEISA/EGEE • DEISA • Provides access through Unicore/SSH (some sites have Globus) • Applications are compiled at the terminal, then submitted to the batch queue or via Unicore • EGI/EGEE • Accessed via gLite; jobs are often submitted to compile the application • National Grid Initiatives • Should run gLite if part of EGI • May run Globus/Unicore
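  To make the gLite route concrete, here is a minimal sketch of a job description and the standard WMS command line clients used to submit, monitor and collect it. The file name and executable are assumptions for illustration:

    # job.jdl – a minimal gLite job description
    [
      Type = "Job";
      Executable = "/bin/hostname";
      StdOutput = "std.out";
      StdError = "std.err";
      OutputSandbox = { "std.out", "std.err" };
    ]

    # Submit (prints a job ID), poll, then retrieve the output sandbox
    glite-wms-job-submit -a job.jdl
    glite-wms-job-status <jobID>
    glite-wms-job-output <jobID>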

  11. The Application Hosting Environment • Based on the idea of applications as web services • Lightweight hosting environment for running unmodified applications on grid resources (DEISA, TeraGrid) and on local resources (departmental clusters) • Community model: expert user installs and configures an application and uses the AHE to share it with others • Simple clients with very limited dependencies • Non-invasive – no need for extra software to be installed on resources • http://toolkit.vph-noe.eu/home/tools/compute-resources/tools-for-accessing-compute-resources/ahe.html

  12. Motivation for the AHE Problems with current middleware solutions: • Difficult for an end user to configure and/or install • Dependent on lots of supporting software also being installed • Require modified versions of common libraries • Require non-standard ports to be opened on firewall We have access to many international grids running different middleware stacks, and need to be able to seamlessly interoperate between them

  13. AHE Functionality • Launch simulations on multiple grid resources • Single interface to monitor and manipulate all simulations launched on the various grid resources • Run simulations without having to manually stage files or GSISSH in • Retrieve files to the local machine when a simulation is done • Can use a combination of different clients – PDA, desktop GUI, command line

  14. AHE Design Constraints • Client does not require any other middleware installed locally • Client may be NAT'd and firewalled • Client does not have to be a single machine • Client needs to be able to upload and download files but doesn't have a local installation of GridFTP • Client doesn't maintain information on how to run the application • Client doesn't care about changes to the backend resources

  15. AHE 2.0 Provides a Single Platform to: • Launch applications on Unicore and Globus 4 grids by acting as an OGSA-BES and GT4 client • Create advance reservations using HARC and launch applications into the reservation • Steer applications using the RealityGrid steering API or the GENIUS project steering system • Launch cross-site MPIg applications AHE 3 plans to support: • SPRUCE urgent job submission • Lightweight certificate sharing mechanisms

  16. Virtualizing Applications • The Application Instance/Simulation is the central entity, represented by a stateful WS-Resource. State properties include: • simulation owner • target grid resource • job ID • simulation input files and URLs • simulation output files and URLs • job status

  17. AHE Reach [diagram: the AHE reaches the UK NGS (HPCx plus the Leeds, Manchester, Oxford and RAL nodes, via Globus and GridSAM), DEISA (via UNICORE), TeraGrid (via Globus) and local resources]

  18. Cross site access [architecture diagram: a scientific application (e.g. the bloodflow simulation HemeLB) is launched via the AHE acting as science-specific client technology; it drives the Globus GRAM 4 interface of the TeraGrid/NGS Globus 4 stack on one side and the OGSA-BES interface of DEISA's UNICORE middleware on the other, with both infrastructures providing GridFTP access to HPC and storage resources (e.g. tape archives, robots), GLUE-based information, security policies (gridmap files, XACML) and a common environment (DEISA Modules) for massively parallel HemeLB jobs] Stefan Zasada, Steven Manos, Morris Riedel, Johannes Reetz, Michael Rambadt et al., Preparation for the Virtual Physiological Human (VPH) project that requires interoperability of numerous Grids

  19. AHE Deployment AHE Server • Released as a VirtualBox VM image – download image file and import to VirtualBox • All required services, containers etc pre-configured • User just needs to configure services for their target resources/applications AHE Client • User’s machine must have Java installed • User downloads and untars client package • Imports X.509 certificate into Java keystore using provided script • Configures client with endpoints of AHE services supplied by expert user
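  The provided script wraps what is essentially a standard Java keytool conversion of your X.509 credential into a Java keystore. A rough sketch, with file names as assumptions (use the script shipped with the client in practice):

    # Convert a PKCS#12 certificate bundle into a Java keystore for the AHE client
    keytool -importkeystore \
            -srckeystore mycert.p12 -srcstoretype PKCS12 \
            -destkeystore ahe-keystore.jks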

  20. Hosting a New Application The expert user must: • Install and configure the application on all resources on which it is being shared • Create an XSLT template for the application (easily cloned from an existing template) • Add the application to the RMInfo.xml file • Run a script to reread the configuration Documentation covers the whole process of deploying the AHE & applications on NGS and TeraGrid

  21. Authentication via local credentials [diagram: the user authenticates to a local authentication service; a "gateway" service (Audited Credential Delegation) with a credential repository, maintained by a PI or sysadmin, sits between a modified AHE client/server and the grid middleware] • A designated individual puts a single certificate into a credential repository controlled by our new "gateway" service. • The user uses a local authentication service to authenticate to our gateway service. • Our gateway service provides a session key (not shown) to our modified AHE client and our modified AHE Server to enable the AHE client to authenticate to the AHE Server. • Our gateway service obtains a proxy certificate from its credential repository as necessary and gives it to our modified AHE Server to interact with the grid. • The user now has no certificate interaction. • The private key of the certificate is never exposed to the user.

  22. ACD Architecture [diagram: the gateway service combines a credential repository (trusted and revoked CAs, per-project and per-resource certificates with keys and proxies) with a local database of user IDs and passwords, an authentication server, and an authorization module implementing parameterized role-based access control mapping users to roles and roles to permitted tasks] • A recent AHE + ACD usability study found the combined solution to be much more usable than other interfaces

  23. Deploying multiscale applications • Loosely coupled: • Codes launched sequentially – the output of one app is the input to the next • Could be distributed across multiple resources • Perl scripts • GSEngine • Tightly coupled: • Codes launched simultaneously • Could be distributed across multiple resources • RealityGrid Steering API • MPIg

  24. Workflow Tools • Many tools exist to allow users to connect applications running on a grid into workflows: • Taverna, OMII-BPEL, Triana, GSEngine, scripts • Most workflow tools interface to and orchestrate web services • Workflows can also include data access, data processing and analysis services • Can assist with connecting different VPH models together

  25. Constructing workflows with the AHE • By calling the command line clients from a Perl script, complex workflows can be achieved • Easily create chained or ensemble simulations • For example, jobs can be chained together (see the Perl sketch below): • ahe-prepare → prepare a new simulation for the first step • ahe-start → start the step • ahe-monitor → poll until the step completes • ahe-getoutput → download output files • repeat for the next step

  26. Constructing a workflow [diagram: ahe-prepare → ahe-start → ahe-monitor → ahe-getoutput, repeated for each step]
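  A minimal Perl sketch of the chained pattern above. The arguments passed to the ahe-* clients, and the convention that ahe-monitor exits non-zero while a job is still running, are illustrative assumptions – consult the AHE client documentation for the exact interfaces:

    #!/usr/bin/perl
    # Chain simulation steps: the output of each step feeds the next.
    use strict;
    use warnings;

    my @steps = ("step1.conf", "step2.conf", "step3.conf");  # hypothetical configs

    foreach my $conf (@steps) {
        system("ahe-prepare") == 0      or die "ahe-prepare failed: $?";
        system("ahe-start", $conf) == 0 or die "ahe-start failed: $?";

        # Poll until the step completes (assumed exit-code convention)
        while (system("ahe-monitor") != 0) {
            sleep 60;
        }

        # Fetch output files; they become the input to the next step
        system("ahe-getoutput") == 0    or die "ahe-getoutput failed: $?";
    }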

  27. Computational Techniques • Ensemble MD is suited to the HPC grid • Simulate each system many times from the same starting position • Each run has randomized atomic energies fitting a certain temperature • Allows conformational sampling [diagram: a start conformation passes through equilibration protocols (eq1–eq8) into a series of 60 simultaneous runs of 1.5 ns each, yielding end conformations C1, C2, C3, C4 … Cx]
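  An ensemble like this maps onto the same AHE clients, but launches all members up front instead of chaining them. A brief sketch under the same assumptions as the previous script:

    #!/usr/bin/perl
    # Launch 60 independent ensemble members (e.g. 1.5 ns MD runs each).
    use strict;
    use warnings;

    for my $i (1 .. 60) {
        system("ahe-prepare") == 0                then_die($i, "prepare") if 0;
        system("ahe-prepare") == 0                or die "prepare failed: member $i";
        system("ahe-start", "member$i.conf") == 0 or die "start failed: member $i";
    }
    # Poll with ahe-monitor and collect with ahe-getoutput as members finish.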

  28. GSEngine • GSEngine is a workflow orchestration engine developed by the ViroLab project • Can be used to orchestrate applications launched by the AHE • Allows services to be orchestrated using both point-and-click and scripting interfaces • Workflows are stored in a repository and shared between users • Many of the aims of ViroLab are similar to those of the VPH-I projects, so GSEngine will be useful here • Included in the VPH ToolKit: http://toolkit.vph-noe.eu/home/tools/compute-resources/workflow/gsengine.html

  29. Computational Steering [diagram: a control → simulate → visualize → render loop] • Closing the loop between simulation and user • Monitoring a running simulation: pause, resume or stop it if required • Altering parameters in a running simulation • Receiving feedback via on-line visualization • Restarting a simulation from a known checkpoint • Reproducibility, provenance

  30. AHE/ReG Steering Integration [diagram: an extended AHE client launches the simulation and starts the visualization, each instrumented with the steering library and exposed as a steering web service; both register with a registry, the client binds to them, and data is transferred between simulation and visualization via sockets or files] Slide adapted from Andrew Porter

  31. Cross-site Runs with MPI-g • MPI-g has been designed to allow a single application to run across multiple machines • Some problems won't fit on a single machine, and require the RAM/processors of multiple machines on the grid • MPI-g allows jobs to be turned around faster by using small numbers of processors on several machines – essential for clinicians

  32. MPI-g Requires Co-Allocation • We can reserve multiple resources for specified time periods • Co-allocation is useful for meta-computing jobs using MPIg, visualisation, and workflow applications • We use HARC – the Highly Available Robust Co-scheduler (developed by Jon Maclaren at LSU). Slide courtesy of Jon Maclaren

  33. HARC • HARC provides a secure co-allocation service • Multiple acceptors are used • Works well provided a majority of acceptors stay alive • Paxos Commit keeps everything in sync • Gives the (distributed) service high availability • A deployment of 7 acceptors gives a mean time to failure of ~years • Transport-level security using X.509 certificates • HARC is a good platform on which to build portals/other services • XML over HTTPS – simpler than SOAP services • Easy to interoperate with • Very easy to use with the Java Client API

  34. Multiscale Applications on European e-Infrastructures (MAPPER) • Addresses distributed multiscale computing needs across VPH, fusion, engineering, computational biology and material science • €2.5M, starting 1st October for 3 years • Partners: UvA, UCL, UU, PSC, Cyfronet, LMU, UNIGE, Chalmers, MPG

  35. MAPPER Infrastructure

  36. Going further • 1. Download and install AHE Server: • Download: http://www.realitygrid.org/AHE • Quick start guide: http://wiki.realitygrid.org/wiki/AHE • 2. Host a new application (e.g. /bin/date) • Generic app tutorial: http://wiki.realitygrid.org/wiki/AHE • 3. Modify these example workflow scripts to run your application: • http://www.realitygrid.org/AHE/training/ahe-workflow.tgz
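  For step 3, fetching and unpacking the example scripts is a standard download and untar, e.g.:

    wget http://www.realitygrid.org/AHE/training/ahe-workflow.tgz
    tar xzf ahe-workflow.tgz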
