VMs at a Tier-1 site
EGEE’09, 21-09-2009
Sander Klous, Nikhef
Contents
• Introduction
  – Who are we?
• Motivation
  – Why are we interested in VMs?
  – What are we going to do with VMs?
• Status
  – How do we approach this issue?
  – Where do we stand?
• Challenges
Introduction
• Collaboration between:
  – NCF: national computing facilities
  – Nikhef: national institute for subatomic physics
  – NBIC: national bioinformatics center
• Participation from Philips, SARA, etc.
• Goal: “Enables access to grid infrastructures for scientific research in the Netherlands”
Motivation: Why virtual machines?
• Site perspective
  – Resource flexibility (e.g. SL4 / SL5)
  – Resource management
  – Scheduling / multi-core / sandboxing
• User perspective
  – Isolation from the environment
  – Identical environment on multiple sites
  – Identical environment on the local machine
Different VM classes
• Class 1: site-generated virtual machines
  – No additional trust issues
  – Benefits for system administration
• Class 2: certified virtual machines
  – Inspection and certification to establish trust
  – Requirements for monitoring / integration
• Class 3: user-generated virtual machines
  – No trust relation
  – Requires appropriate security measures
Typical use case: Class 1 VM
[Diagram: Torque/PBS resource management with a job queue and a VM queue handled by a Virtual Machine Manager; site infrastructure with Box 1 “Normal WN”, Box 2 “8 Virtual SL4 WNs”, Box 3 “8 Virtual SL5 WNs”]
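A minimal sketch of how a site could express this split in Torque/PBS: one execution queue per virtual worker-node flavour, routed by node properties. The queue and property names here are our own illustration, not taken from the slides.

```bash
# Illustrative Torque/PBS setup: route jobs to virtual SL4 worker nodes
# via a dedicated queue and a matching node property (names hypothetical).
qmgr -c "create queue sl4vm"
qmgr -c "set queue sl4vm queue_type = Execution"
qmgr -c "set queue sl4vm resources_default.neednodes = sl4vm"
qmgr -c "set queue sl4vm enabled = True"
qmgr -c "set queue sl4vm started = True"

# In the server's nodes file, tag the box hosting the 8 virtual SL4 WNs:
#   vmbox02 np=8 sl4vm
```

A user would then reach the virtual SL4 nodes simply with `qsub -q sl4vm job.sh`, while the normal queue keeps feeding Box 1.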
Typical use case: Class 2 VM
Analysis on virtual machines
• Run minimal analysis on desktop/laptop
  – Access to grid services
• Run full analysis on the grid
  – Identical environment
  – Identical access to grid services
• No interest in becoming a system administrator
  – Standard experiment software is sufficient
Typical use case: Class 3 VM
Identification and classification of GPCRs
• Requires a very specific software set
  – Blast 2.2.16
  – HMMER 2.3.2
  – BioPython 1.50
• Even non-x86 (binary) applications!
• Specific software for this user
  – No common experiment software
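For flavour, a hypothetical build script for such a user image; the versions come from the slide, but the tarball names, download locations and install prefixes are assumptions.

```bash
# Illustrative provisioning of the GPCR analysis image (paths assumed).
tar xzf blast-2.2.16-x64-linux.tar.gz -C /opt            # pre-built BLAST binaries
tar xzf hmmer-2.3.2.tar.gz
(cd hmmer-2.3.2 && ./configure && make && make install)  # HMMER from source
tar xzf biopython-1.50.tar.gz
(cd biopython-1.50 && python setup.py install)           # BioPython 1.50
```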
Project status
• Working group: virtualization of worker nodes
  https://wiki.nbic.nl/index.php/BigGrid_virtualisatie
• Kick-off meeting July 6th, 2009
  – System administrators, user support, management
• Phase 1 (3 months)
  – Collect site and user requirements
  – Identify other ongoing efforts in Europe
  – First design
• Phase 2 (3 months)
  – Design and implement proof of concept
Active working group topics
• Policies/security issues for Class 2/3 VMs
• Technology study
  – Managing virtual machines
  – Distributing VM images
  – Interfacing the VM infrastructure with ‘the grid’
• Identify missing functionality and alternatives
  – Accounting and fair share, image management, authentication/authorization, etc.
The Amazon identity crisis
• The three most confronting questions:
  – What is the difference between a job and a VM?
  – Why can I do it at Amazon, but not on the grid?
  – What is the added value of grids over clouds?
• “We don’t want to compete with Amazon!”
Policy and security issues
• E-science services and functionality
  – Data integrity, confidentiality and privacy
  – Non-repudiation of user actions
• System administrator point of view
  – Trust user intentions, not their implementations
  – Incident response is more costly than certification
  – Forensics is time-consuming
Security 101 = attack surface
A compromised user space is often already enough trouble.
Available policies
• Grid Security Policy, version 5.7a
• VO Portal Policy, version 1.0 (draft)
• Big Grid Security Policy, version 2009-025
• Grid Acceptable Use Policy, version 3.1
• Grid Site Operations Policy, version 1.4a
• LCG/EGEE Incident Handling and Response Guide, version 2.1
• Grid Security Traceability and Logging Policy, version 2.0
• VO-Box Security Recommendations and Questionnaire, version 0.6 (draft, not ratified)
Relevant policy statements
• Network security is covered by site-local security policies and practices
• A VO Box is part of the trusted network fabric; privileged access is limited to resource administrators
• Software deployed on the grid must include sufficient and relevant site-central logging
First compromise
• Certified package repository
  – Base templates
  – Certified packages
• Separate user disk
  – User-specific stuff
  – Permanent storage
• At run time
  – No privileged access
  – Comparable to a VO box
• Licenses?
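One way this compromise could look on a hypervisor, as a hedged sketch: the certified base template is booted through a transient snapshot (so writes to it are discarded), while the user's permanent data lives on a separately attached disk. Paths and sizes are illustrative, not from the slides.

```bash
# Boot a certified, effectively read-only base template plus a separate
# persistent user disk (all paths are made up for illustration).
qemu-kvm -m 2048 \
  -drive file=/repo/certified/sl5-base.img,snapshot=on \
  -drive file=/srv/userdisks/user123.img
```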
Second compromise
• Make a separate grid DMZ for Class 3 VMs
  – Comparable to “guest networks”
  – Only outbound connectivity
• Detection of compromised guests
  – Extended security monitoring
  – Packet inspection, netflows (SNORT, nfsen)
  – Honeypots, etc.
• Simple policy: one warning, you’re out.
• Needs approval (network policy) from the OST (Operations Steering Team)
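The outbound-only rule is the kind of thing a plain netfilter policy can express; a minimal sketch, assuming the Class 3 guests live in a 10.10.0.0/24 subnet of our own invention:

```bash
# Outbound-only connectivity for the Class 3 guest DMZ (subnet assumed).
iptables -P FORWARD DROP
# Guests may open new connections to the outside world...
iptables -A FORWARD -s 10.10.0.0/24 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
# ...but only reply traffic may come back in.
iptables -A FORWARD -d 10.10.0.0/24 -m state --state ESTABLISHED,RELATED -j ACCEPT
```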
Technology study
Managing VMs
[Diagram: Torque/PBS with a job queue next to OpenNebula/Haizea with a VM queue, scheduling onto Box 1 “Normal WN”, Box 2 “8 Virtual WNs”, and Box 3 “8 Class 2/3 VMs”]
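To make the OpenNebula side concrete, a minimal VM template of the kind under study; the name, image path and network are illustrative, and the CLI call assumes a recent OpenNebula release.

```bash
# Hand a virtual SL5 worker node to OpenNebula (all names hypothetical).
cat > sl5-wn.one <<'EOF'
NAME   = sl5-wn
CPU    = 1
MEMORY = 2048
DISK   = [ source = "/srv/images/sl5-wn.img", target = "hda", readonly = "no" ]
NIC    = [ network = "wn-net" ]
EOF
onevm create sl5-wn.one   # the scheduler (e.g. Haizea) picks a box
```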
Distributing VM images
[Diagram: a SAN image repository exported over iSCSI/LVM to Box 1 “Normal WN”, Box 2 “8 Virtual WNs” and Box 3 “8 Class 2/3 VMs”, plus an upload solution for Class 2/3 images]
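A hedged sketch of the iSCSI/LVM path: a box discovers and attaches the repository, then carves out per-VM volumes as LVM snapshots. Portal, target and volume names are made up.

```bash
# Attach the SAN image repository over iSCSI (names illustrative).
iscsiadm -m discovery -t sendtargets -p san.example.nl
iscsiadm -m node -T iqn.2009-09.nl.example:vmimages -p san.example.nl --login

# Give one VM a writable snapshot of the shared base volume.
lvcreate -s -L 4G -n wn01-disk /dev/vg_images/sl5-base
```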
Repository: cached copy-on-write
[Diagram: each box keeps a local cache of repository images; every VM runs from a copy-on-write (COW) overlay on top of the cached image]
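One standard way to realize the COW layer, assuming qcow2 images and an invented cache path: each VM gets a thin overlay whose backing file is the locally cached base image, so the cache is shared and only per-VM deltas consume disk.

```bash
# Thin per-VM overlay on top of the locally cached base image.
qemu-img create -f qcow2 -b /var/cache/images/sl5-base.img vm01.qcow2
qemu-img info vm01.qcow2   # shows the backing file relationship
```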
Interfacing VMs with ‘the grid’
• Grid middleware chain: globus-job-run → globus-gatekeeper → globus-job-manager
• The contact string selects the back end:
  – jm-pbs-long → qsub
  – jm-opennebula → OpenNebula
[Diagram: the gatekeeper feeding Torque/PBS and OpenNebula resource management, with the SAN image repository and the Class 2/3 upload solution (Nimbus/OCCI) alongside; the Class 2 vs. Class 3 upload path is still under discussion]
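In practice the two back ends would differ only in the contact string; the CE host name below is made up for illustration.

```bash
# Same gatekeeper, different job managers (host name hypothetical).
globus-job-run ce.example.nl/jobmanager-pbs-long /bin/hostname     # -> qsub
globus-job-run ce.example.nl/jobmanager-opennebula /bin/hostname   # -> OpenNebula
```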
Coffee table discussion
• VM contact-string: parameter passing issue
• User management mapping
  – Mapping to OpenNebula users
  – Authentication / authorization
  – Access to different VM images
• Grid middleware components involved:
  – Cream-CE, BLAHp, glexec
  – Execution Environment Service: https://edms.cern.ch/document/1018216/1
  – Authorization Service Design: https://edms.cern.ch/document/944192/1
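How grid identities map onto OpenNebula users is exactly the open point; one conceivable first step is an ordinary grid-mapfile pool mapping, with the OpenNebula account created to match. The DN and pool name below are invented.

```bash
# /etc/grid-security/grid-mapfile (DN and pool account are illustrative):
#   "/DC=nl/O=example/CN=Jane Doe" .vmuser
# A matching local/OpenNebula account would then carry her VM privileges.
grep vmuser /etc/grid-security/grid-mapfile
```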
Monitoring/Performance testing
Performance
• Small cluster
  – 4 dual-CPU quad-core machines
  – Image server with 2 TB storage
• Integration with the experimental testbed
  – Existing Cream-CE / Torque
• Testing
  – Network I/O: is NAT feasible?
  – File I/O: what is the COW overhead?
  – Realistic jobs
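Simple probes of the kind listed (the tool choice is ours, not the slides'): network throughput from inside a NATed guest, and sustained write speed on a COW overlay. The server name and scratch path are assumptions.

```bash
# Network I/O through NAT: guest-side iperf against a site server.
iperf -c filesrv.example.nl -t 60

# File I/O on the COW disk: 1 GB sequential write, flushed to disk.
dd if=/dev/zero of=/scratch/cowtest bs=1M count=1024 conv=fdatasync
```

Comparing the same numbers on a physical worker node gives the NAT and COW overheads directly.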
Other challenges
• Accounting, scheduling based on fair share
• Scalability!
• Rapidly changing landscape
  – New projects every week
  – New versions every month
• So many alternatives
  – VMware, SGE, Eucalyptus, Enomaly
  – iSCSI, NFS, GFS, Hadoop
  – Monitoring and security tools
Conclusions
• Maintainability: no home-grown scripting
  – Each solution should be part of a product
  – Validation procedure with each upgrade
• Deployment
  – Gradually move VM functionality into production
  – Introduce VM worker nodes
  – Virtual machine endpoint in grid middleware
  – Test with a few specific Class 2/3 VMs
  – Scaling and performance tuning