
  1. VMs at a Tier-1 site EGEE’09, 21-09-2009 Sander Klous, Nikhef

  2. Contents
  • Introduction: who are we?
  • Motivation: why are we interested in VMs? What are we going to do with VMs?
  • Status: how do we approach this issue? Where do we stand?
  • Challenges

  3. Introduction
  • Collaboration between:
  • NCF: national computing facilities
  • Nikhef: national institute for subatomic physics
  • NBIC: national bioinformatics centre
  • Participation from Philips, SARA, etc.
  • Goal: “Enabling access to grid infrastructures for scientific research in the Netherlands”

  4. Motivation: Why Virtual Machines?
  • Site perspective:
  • Resource flexibility (e.g. SL4 / SL5)
  • Resource management: scheduling / multi-core / sandboxing
  • User perspective:
  • Isolation from the environment
  • Identical environment on multiple sites
  • Identical environment on the local machine

  5. Different VM classes
  • Class 1: Site generated Virtual Machines
  • No additional trust issues
  • Benefits for system administration
  • Class 2: Certified Virtual Machines
  • Inspection and certification to establish trust
  • Requirements for monitoring / integration
  • Class 3: User generated Virtual Machines
  • No trust relation
  • Requires appropriate security measures

  6. Typical use case: Class 1 VM
  [Diagram: the resource management layer (Torque/PBS plus a Virtual Machine Manager) holds a job queue and a VM queue, dispatching onto the site infrastructure: Box 1 “Normal WN”, Box 2 “8 Virtual SL4 WNs”, Box 3 “8 Virtual SL5 WNs”.]
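  As a rough illustration of the queue split in the diagram, a site could route VM requests to a dedicated Torque/PBS queue next to the regular job queue. A minimal sketch, assuming a stock Torque server; the queue name and node property (vm, vmcapable) are hypothetical:

      # Create a dedicated execution queue for virtual machine jobs
      # (queue and node-property names are hypothetical).
      qmgr -c "create queue vm queue_type=execution"
      qmgr -c "set queue vm resources_default.neednodes = vmcapable"
      qmgr -c "set queue vm enabled = true"
      qmgr -c "set queue vm started = true"

      # The VM manager (or a user) then submits into that queue:
      qsub -q vm start_vm_job.sh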

  7. Typical use case: Class 2 VM
  Analysis on Virtual Machines:
  • Run minimal analysis on a desktop/laptop
  • Access to grid services
  • Run full analysis on the grid
  • Identical environment
  • Identical access to grid services
  • No interest in becoming a system administrator
  • Standard experiment software is sufficient

  8. Typical use case: Class 3 VM
  Identification and classification of GPCRs:
  • Requires a very specific software set
  • Blast 2.2.16
  • HMMER 2.3.2
  • BioPython 1.50
  • Even non-x86 (binary) applications!
  • Specific software for this user
  • No common experiment software

  9. Project status
  • Working group: virtualization of worker nodes
  https://wiki.nbic.nl/index.php/BigGrid_virtualisatie
  • Kick-off meeting July 6th, 2009
  • System administrators, user support, management
  • Phase 1 (3 months):
  • Collect site and user requirements
  • Identify other ongoing efforts in Europe
  • First design
  • Phase 2 (3 months):
  • Design and implement a proof of concept

  10. Active working group topics
  • Policies/security issues for Class 2/3 VMs
  • Technology study:
  • Managing Virtual Machines
  • Distributing VM images
  • Interfacing the VM infrastructure with ‘the grid’
  • Identify missing functionality and alternatives:
  • Accounting and fair share, image management, authentication/authorization, etc.

  11. The Amazon identity crisis
  The three most confronting questions:
  • What is the difference between a job and a VM?
  • Why can I do it at Amazon, but not on the grid?
  • What is the added value of grids over clouds?
  “We don’t want to compete with Amazon!”

  12. Policy and security issues
  E-science services and functionality:
  • Data integrity, confidentiality and privacy
  • Non-repudiation of user actions
  System administrator point of view:
  • Trust user intentions, not their implementations
  • Incident response is more costly than certification
  • Forensics is time-consuming

  13. Security 101 = attack surface
  A compromised user space is often already enough trouble

  14. Available policies
  • Grid Security Policy, version 5.7a
  • VO Portal Policy, version 1.0 (draft)
  • Big Grid Security Policy, version 2009-025
  • Grid Acceptable Use Policy, version 3.1
  • Grid Site Operations Policy, version 1.4a
  • LCG/EGEE Incident Handling and Response Guide, version 2.1
  • Grid Security Traceability and Logging Policy, version 2.0
  • VO-Box Security Recommendations and Questionnaire, version 0.6 (draft, not ratified)

  15. Relevant policy statements
  • Network security is covered by site-local security policies and practices.
  • A VO Box is part of the trusted network fabric; privileged access is limited to resource administrators.
  • Software deployed on the grid must include sufficient and relevant site-central logging.

  16. First compromise
  • Certified package repository: base templates, certified packages
  • Separate user disk: user-specific stuff, permanent storage
  • At run time: no privileged access, comparable to a VO box
  • Licenses?
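  To make the “separate user disk” concrete: user data could live on its own volume, created apart from the certified base image. A minimal sketch, assuming qemu/KVM-style raw images; all paths are hypothetical:

      # Create a small persistent disk for user-specific data, kept
      # separate from the certified base image (paths are hypothetical).
      qemu-img create -f raw /srv/users/jdoe-data.img 10G
      mkfs.ext3 -F /srv/users/jdoe-data.img   # -F: allow mkfs on a regular file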

  17. Second compromise
  • Make a separate grid DMZ for Class 3 VMs
  • Comparable to “guest networks”
  • Only outbound connectivity (see the firewall sketch below)
  • Detection of compromised guests:
  • Extended security monitoring
  • Packet inspection, netflows (SNORT, nfsen)
  • Honeypots, etc.
  • Simple policy: one warning, you’re out.
  • Needs approval (network policy) from the OST (Operations Steering Team)
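  A minimal sketch of the “only outbound connectivity” rule, as iptables forwarding rules on the gateway of the guest network; the DMZ subnet is a hypothetical placeholder:

      # Outbound-only policy for the Class 3 VM DMZ (subnet is hypothetical).
      iptables -P FORWARD DROP
      # Let guests open connections to the outside world...
      iptables -A FORWARD -s 10.30.0.0/16 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
      # ...but only allow replies back in, never new inbound connections.
      iptables -A FORWARD -d 10.30.0.0/16 -m state --state ESTABLISHED,RELATED -j ACCEPT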

  18. Technology study

  19. Managing VMs
  [Diagram: resource management now pairs Torque/PBS (job queue) with OpenNebula and its Haizea lease scheduler (VM queue); the site runs Box 1 “Normal WN”, Box 2 “8 Virtual WNs” and Box 3 “8 Class 2/3 VMs”.]
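  For flavor, a minimal sketch of how a virtual worker node might be described for OpenNebula, in its template format; the file name, image path, network name and target device are hypothetical:

      # sl5-wn.one: hypothetical OpenNebula template for a virtual SL5 worker node
      NAME   = sl5-worker
      CPU    = 1
      MEMORY = 2048
      DISK   = [ source = "/srv/images/sl5-wn.img", target = "hda" ]
      NIC    = [ network = "site-lan" ]

  The template would then be submitted with onevm create sl5-wn.one, leaving placement and lease scheduling to OpenNebula/Haizea.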

  20. Distributing VM images
  [Diagram: a SAN-based repository holds the images and serves them over iSCSI/LVM to Box 1 “Normal WN”, Box 2 “8 Virtual WNs” and Box 3 “8 Class 2/3 VMs”; Class 2/3 images reach the repository through an upload solution.]
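  One way to hand a repository image to a worker box without copying the whole file is an LVM snapshot of the SAN volume. A minimal sketch; the volume group and logical volume names are hypothetical:

      # Export a copy-on-write view of the master image to a hypervisor
      # (volume group and names are hypothetical).
      lvcreate --snapshot --size 2G --name sl5-wn-box2 /dev/vg_images/sl5-wn-master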

  21. Repository: cached copy-on-write
  [Diagram: each box (Box 1, Box 2) keeps a locally cached copy of the repository image; every VM on the box runs from its own COW overlay on top of that cache.]
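  The per-VM overlay in the diagram maps naturally onto qcow2 backing files. A minimal sketch, assuming qemu/KVM; the paths are hypothetical:

      # Give each VM a thin copy-on-write overlay on top of the locally
      # cached base image (paths are hypothetical).
      qemu-img create -f qcow2 -b /var/cache/images/sl5-wn.img /var/lib/vms/vm01.qcow2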

  22. Interfacing VMs with ‘the grid’
  • Grid middleware: globus-job-run, globus-gatekeeper, globus-job-manager
  • Contact string: jm-pbs-long vs. jm-opennebula
  • qsub / opennebula
  [Diagram: Nimbus/OCCI in front of the SAN image repository with the Class 2/3 upload solution; resource management through Torque/PBS (Class 2) and OpenNebula (Class 3, under discussion).]
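  In the GRAM model above, the job manager is selected through the contact string, so a VM-aware endpoint could sit next to the usual PBS one. A minimal sketch; the CE hostname is a placeholder and the OpenNebula job manager is hypothetical, mirroring the jm-opennebula name on the slide:

      # Today: submit through the PBS job manager on the gatekeeper.
      globus-job-run ce.example.org/jobmanager-pbs-long /bin/hostname
      # Idea: an OpenNebula-backed job manager, selected the same way
      # (hypothetical endpoint).
      globus-job-run ce.example.org/jobmanager-opennebula /bin/hostname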

  23. Coffee table discussion
  • VM contact string: parameter passing issue
  • User management mapping:
  • Mapping to OpenNebula users
  • Authentication / authorization
  • Access to different VM images
  • Grid middleware components involved: CREAM-CE, BLAHp, glexec
  • Execution Environment Service: https://edms.cern.ch/document/1018216/1
  • Authorization Service Design: https://edms.cern.ch/document/944192/1

  24. Monitoring/Performance testing

  25. Performance
  • Small cluster:
  • 4 dual-CPU quad-core machines
  • Image server with 2 TB storage
  • Integration with the experimental testbed:
  • Existing CREAM-CE / Torque
  • Testing:
  • Network I/O: is NAT feasible?
  • File I/O: what is the COW overhead?
  • Realistic jobs
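  The two micro-benchmarks named above could start as simply as this; a minimal sketch, with the server name and scratch path hypothetical:

      # Network I/O through the VM's NAT interface (server is hypothetical).
      iperf -c imageserver.example.org -t 60
      # File I/O on the COW overlay, forcing data to disk at the end.
      dd if=/dev/zero of=/scratch/cowtest bs=1M count=1024 conv=fdatasync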

  26. Other challenges
  • Accounting, scheduling based on fair share
  • Scalability!
  • Rapidly changing landscape:
  • New projects every week
  • New versions every month
  • So many alternatives:
  • VMware, SGE, Eucalyptus, Enomaly
  • iSCSI, NFS, GFS, Hadoop
  • Monitoring and security tools

  27. Conclusions
  • Maintainability: no home-grown scripting
  • Each solution should be part of a product
  • Validation procedure with each upgrade
  • Deployment:
  • Gradually move VM functionality into production
  • Introduce VM worker nodes
  • Virtual machine endpoint in grid middleware
  • Test with a few specific Class 2/3 VMs
  • Scaling and performance tuning
