
Virtualised Worker Nodes Where are we? What next? Tony Cass GDB 2012 12/12/12


Presentation Transcript


  1. Virtualised Worker Nodes: Where are we? What next? Tony Cass, GDB 2012, 12/12/12

  2. The HEPiX virtualisation working group • The HEPiX virtualisation working group was formed to facilitate the instantiation of user-generated virtual machine images at HEPiX (and WLCG) sites. • Users were expressing such a wish in 2008/9, but sites were worried about issues such as uncontrolled root access and the maintenance of the traceability logs required by Grid security policies. This, at least, is still an issue.

  3. Image endorsement • The HEPiX VWG developed a policy that introduced the concept of image endorsers: people who would guarantee that generated images could be used safely at sites. • Amongst other things, such images would • have no embedded user credentials, and • enable sites to contextualise the images to enable the required logging and make other necessary customisations. • Sites agree, however, not to modify the software environment of the image. • Sites are free to trust (or not) specific image endorsers but, if they do trust someone in this role, it is expected that any images endorsed by this person can be used at that site without the need for inspection or manual approval. • The HEPiX VWG policy became the basis of an approved JSPG policy document, “Policy Trusted Virtual Machines”.
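
The trust model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ImageRecord`, `endorser_may_publish` and `site_accepts` are hypothetical names, not part of the HEPiX VWG software or the JSPG policy text. The endorser checks the image properties once; the site then decides purely on who endorsed the image, with no per-image inspection.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical metadata describing one published image."""
    image_id: str
    endorser: str                      # identity of the endorser (assumed to be a string)
    has_embedded_credentials: bool     # the policy requires this to be False
    supports_contextualisation: bool   # the policy requires this to be True


def endorser_may_publish(image: ImageRecord) -> bool:
    """Endorser-side check: the two image properties named on the slide."""
    return (not image.has_embedded_credentials) and image.supports_contextualisation


def site_accepts(image: ImageRecord, trusted_endorsers: set[str]) -> bool:
    """Site-side check: trust the endorser, not the individual image."""
    return image.endorser in trusted_endorsers
```

A site that trusts `"endorser-a"` would call `site_accepts(image, {"endorser-a"})` and run any image that passes, which is the "no inspection or manual approval" behaviour the policy expects.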

  4. Current Status • The endorsement policy is agreed. • Technical arrangements have been defined for • image contextualisation • these are compatible with EC2/OpenNebula/OpenStack • exchange of information between the site infrastructure and a running virtual machine • e.g. remaining lifetime, that the virtual machine can be terminated, … • A framework for image endorsers to publish and distribute images has been developed. • This has been integrated with StratusLab’s marketplace at LAL and is being integrated with OpenStack Glance at CERN. • CERNVM images are compatible with the HEPiX VWG policies • and there has been a security review of the underlying technology. • The HEPiX VWG model (and software) has been endorsed by the EGI federated cloud task force. • Many thanks to Owen, Michel, Belmiro & Ulrich.

  5. Job done then. What now?

  6. How this could be used [Diagram; labels: User, Image maintainer, VO service, Shared Image Repository (VMIC), Central Task Queue, Site A, Site B, Site C, Commercial cloud, Payload pull, Instance requests, Cloud bursting. Slide courtesy of Ulrich Schwickerath.]

  7. A Vision for Virtualisation in WLCG. Tony Cass, WLCG GDB, 9/9/9

  8. Goals • Enable experiments/users to choose environment for job execution. • Ensure sites have control/traceability over resource usage. Virtualisation Vision- 8

  9. Approach • Step-by-step: Build on • established successes • established trust • But end goal in view. Prepare for this now with • technical agreements/developments • user behaviour (especially explicit statement of resource requirements)

  10. Approach • Five steps • Steps 1-3 • realistic • relatively uncontroversial(?) • achievable by end-2010? • Steps 4 & 5 • kite-flying • probably controversial • interesting

  11. Step 1 [Annotation: Not done. Sites may be using virtual machines but this is transparent to users. And I’m not sure we’re any nearer a negotiation on core needs. Let’s just forget this step now.] • Users can choose between virtual images created at sites. • Not really any different from now; could be rephrased “sites provide virtual machines for job execution, not real hardware”. • Key issue is (full) understanding of resource requirements • OS type, memory, (range of) #cores, ...
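
As an illustration of what an explicit statement of resource requirements could look like, here is a minimal Python sketch. The field names and the matching rule are assumptions made for this example; the slide names only OS type, memory and a range of cores, and does not define a format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRequirements:
    """Hypothetical explicit requirements stated by a user or experiment."""
    os_type: str      # e.g. "SL5" or "SL6"
    memory_mb: int    # minimum memory the job needs
    min_cores: int    # lower end of the acceptable core range
    max_cores: int    # upper end of the acceptable core range


@dataclass(frozen=True)
class VMOffer:
    """Hypothetical description of a virtual machine a site can provide."""
    os_type: str
    memory_mb: int
    cores: int


def cores_granted(req: JobRequirements, offer: VMOffer) -> int:
    """Return how many cores the job would use on this VM, or 0 if it does not fit."""
    if offer.os_type != req.os_type:
        return 0
    if offer.memory_mb < req.memory_mb:
        return 0
    if offer.cores < req.min_cores:
        return 0
    # The job can use at most max_cores, even on a larger VM.
    return min(offer.cores, req.max_cores)
```

Stating a core range rather than a single number leaves the site free to pick whichever VM size suits its hardware, as long as it falls inside the range.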

  12. Step 2 [Annotation: Not done but could be. “HEPiX” is marked three times on the slide.] • Distribution of virtual machine images between sites (or from CERN...). • Image limited to minimalist operating system (SL4/5/6...) • Requires • transparent process for image generation guaranteeing content • mechanism for sites to hook into local monitoring and batch scheduling. • trusted and verifiable method of image distribution
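
The "verifiable" half of the distribution requirement can be sketched as a digest check on the downloaded image. This is a minimal Python example under stated assumptions: the expected SHA-512 value is assumed to come from a list published by a trusted party (how that list is signed and trusted is not shown), and the function names are hypothetical rather than taken from the HEPiX tooling.

```python
from __future__ import annotations

import hashlib
from typing import Iterable, Iterator


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file in fixed-size chunks so large images need not fit in memory."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return
            yield chunk


def sha512_hex(chunks: Iterable[bytes]) -> str:
    """Compute the hexadecimal SHA-512 digest of a stream of byte chunks."""
    digest = hashlib.sha512()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def image_matches(chunks: Iterable[bytes], expected_hex: str) -> bool:
    """Compare the computed digest with the published one (assumed trustworthy)."""
    return sha512_hex(chunks) == expected_hex.strip().lower()
```

A site would call `image_matches(read_chunks(path), expected_hex)` after download and refuse to instantiate an image that fails the check.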

  13. Step 3 [Annotation: CVMFS delivers this.] • Distributed virtual image includes experiment software environment • So users can choose ATLAS version X on OS Y. • Requires “transparent process for image generation” to be extended to include experiment software. • Snapshot of experiment build servers at CERN? • Removes need for pilot jobs to verify (or create) correct environment.

  14. What about CernVM? [Annotation: This works… We took too long (not) testing static images!] • Instantiation of CernVM machines being discussed between IT and PH teams; could be an option at CERN. • But scalability and verifiability of CernVM distribution for widespread use as remote batch image is far from evident. • Not excluded, but more likely after successful experience with static images.

  15. Step 4 [Annotations: from cvmfs. Let’s work on this together now.] • Distributed virtual image includes client to connect directly to experiment pilot job framework (Dirac, PanDA). • Initially with virtual machine images instantiated according to jobs arriving at sites. • Later, sites instantiate virtual machines according to observed load and local policy • Lots of busy ATLAS machines? Start more... • Requires some way for pilot job frameworks to know (remaining) lifetime of virtual machine. • VM unlikely to be updated (security patches...), so lifetime will be limited.
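
Why a pilot framework needs the remaining lifetime can be shown with a small Python simulation. It is a sketch only: how the lifetime reaches the virtual machine is not covered here, so it is passed in as a plain number, and the function names and the 300-second safety margin are assumptions for this example.

```python
from __future__ import annotations


def can_start_payload(remaining_lifetime_s: float,
                      estimated_payload_s: float,
                      safety_margin_s: float = 300.0) -> bool:
    """Start a payload only if it should finish before the VM is terminated."""
    return remaining_lifetime_s >= estimated_payload_s + safety_margin_s


def payloads_accepted(remaining_lifetime_s: float,
                      payload_estimates_s: list[float],
                      safety_margin_s: float = 300.0) -> list[float]:
    """Simulate a pilot pulling payloads in order until the next one no longer fits."""
    accepted: list[float] = []
    for estimate in payload_estimates_s:
        if not can_start_payload(remaining_lifetime_s, estimate, safety_margin_s):
            break  # stop pulling work and let the VM drain
        accepted.append(estimate)
        remaining_lifetime_s -= estimate  # assume the payload ran for its estimate
    return accepted
```

With one hour of lifetime left and a queue of 1000-second payloads, the pilot takes three and declines the fourth, because only 600 seconds would remain against a requirement of 1300.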

  16. Step 4 issues • Moving credentials into VM images • What role for pilot factories? • Can we avoid queues of virtual machine instantiation requests at sites? • How to streamline (minimise…) communications between sites and experiments? • …

  17. Let the discussion begin!

  18. Step 5 [Annotation: SLURM today?] • Experiment pilot job frameworks replaced by commercial/public domain schedulers. • Virtual LSF cluster for ATLAS • Virtual SGE cluster for CMS • ...
