Globus Virtual Workspaces HEPiX Fall 2007, St Louis Kate Keahey Argonne National Laboratory University of Chicago keahey@mcs.anl.gov
Why Virtual Workspaces? • Quality of Service • We get: batch-style provisioning • One size fits all • Side-effect of job scheduling • We need: advance reservations, urgent computing, periodic, best-effort, and others • Separation of job scheduling and resource management • E.g. workflow-based apps and batch apps have different needs • Quality of Life • We have: “I have 100 nodes I cannot use” • Complex applications • Hard to install • Require validation • Separation of environment preparation and resource leasing Virtual Workspaces: http://workspace.globus.org
What are Virtual Workspaces? • A dynamically provisioned environment • Environment definition: we get exactly the (software) environment we need, on demand • Resource allocation: provision the resources the workspace needs (CPUs, memory, disk, bandwidth, availability), allowing for dynamic renegotiation to reflect changing requirements and conditions • Implementation • Traditional means: publishing, automated configuration, coarse-grained enforcement • Virtual Machines: encapsulated configuration and fine-grained enforcement Paper: “Virtual Workspaces: Achieving Quality of Service and Quality of Life in the Grid”
Virtual Machines (Xen) • Open source • Paravirtualization • The Good: high-performance • The Bad: difficult to run proprietary OSs, and to mix 32-bit and 64-bit kernels (VT needed) • Xen terminology: • Domain0 (the host) • DomainU (user domain, the guest) [Figure: benchmark suite (SPEC INT2000 score, Linux build time in s, OSDB-OLTP in tup/s, SPEC WEB99 score) running on Linux (L), Xen (X), VMware Workstation (V), and UML (U); relative scores plotted on a 0.0 to 1.1 axis]
Deploying Workspaces Remotely • Workspace metadata • Pointer to the image • Logistics information • Deployment request • CPU, memory, node count, etc. [Figure: a workspace, described by its metadata and a deployment request, is submitted to the VWS Service, which deploys it onto a pool of physical nodes]
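The slide separates *what* is deployed (workspace metadata: a pointer to the image plus logistics) from *how much* is requested (CPU, memory, node count). The sketch below models that split as two plain data structures. All class and field names are hypothetical. It is not the Workspace Service's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WorkspaceMetadata:
    """Describes *what* to deploy (hypothetical field names)."""
    name: str
    image_url: str  # a pointer to the VM image, not the image bytes
    logistics: Dict[str, str] = field(default_factory=dict)  # e.g. networking needs


@dataclass
class DeploymentRequest:
    """Describes *how much* to deploy (hypothetical field names)."""
    cpus: int
    memory_mb: int
    node_count: int = 1

    def __post_init__(self) -> None:
        # Reject requests that no resource manager could satisfy.
        if min(self.cpus, self.memory_mb, self.node_count) < 1:
            raise ValueError("cpus, memory_mb and node_count must be positive")

    def total_memory_mb(self) -> int:
        """Memory the pool must find across all requested nodes."""
        return self.memory_mb * self.node_count
```

Keeping the two apart means the same image metadata can be redeployed with a different resource request, for example 4 nodes of 512 MB each (2048 MB in total).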
Interacting with Workspaces • The workspace service publishes information on each workspace as standard WSRF Resource Properties • Users can query those properties to find out information about their workspace (e.g. what IP the workspace was bound to) • Users can interact directly with their workspaces the same way they would with a physical machine [Figure: the VWS Service and its pool of nodes; label: Trusted Computing Base (TCB)]
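WSRF Resource Properties are exposed as an XML document, so "finding out what IP the workspace was bound to" amounts to reading one element from that document. The sketch below parses a made-up property document with Python's standard `xml.etree.ElementTree`. The namespace, the element names (`State`, `IPAddress`) and the sample values are assumptions, not the Workspace Service's real schema, and the sketch skips the WSRF query call itself.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical namespace and element names, for illustration only.
NS = {"ws": "http://example.org/workspace"}

SAMPLE_RP_DOC = """<ws:WorkspaceProperties xmlns:ws="http://example.org/workspace">
  <ws:State>Running</ws:State>
  <ws:IPAddress>192.0.2.17</ws:IPAddress>
</ws:WorkspaceProperties>"""


def get_property(rp_document: str, name: str) -> Optional[str]:
    """Return the text of a top-level resource property, or None if absent."""
    root = ET.fromstring(rp_document)
    elem = root.find(f"ws:{name}", NS)
    return elem.text if elem is not None else None
```

With the sample document, `get_property(SAMPLE_RP_DOC, "IPAddress")` gives the address the user would then ssh into, exactly as with a physical machine.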
Workspace Service Components • Workspace WSRF front-end that allows clients to deploy and manage virtual workspaces • Workspace back-end: • Resource manager for a pool of physical nodes • Deploys and manages workspaces on the nodes • Each node must have a VMM (Xen) installed, as well as the workspace control program that manages individual nodes • Contextualization creates a common context for a virtual cluster [Figure: the VWS Service front-end above a pool of physical nodes; label: Trusted Computing Base (TCB)]
Workspace Service Components • GT4 WSRF front-end • Leverages GT core and services, notifications, security, etc. • Follows the OGF WS-Agreement provisioning model • Publishes available lease terms • Provides lease descriptions • Workspace Resource Manager (back-end) • Currently focused on Xen • Works with multiple Resource Managers • Workspace Control • Contextualization • Put the virtual appliance in its deployment context • Current release 1.3, available at: • http://workspace.globus.org
Virtual Workspaces for STAR • STAR image configuration • A virtual cluster composed of an OSG headnode and STAR worker nodes • Using the workspace service over EC2 to provision resources • Allocations of up to 100 nodes • Dynamically contextualized into an out-of-the-box cluster
Workspace Resource Managers • Default resource manager (basic slot fitting) • Commercial datacenter technology would also fit • Amazon Elastic Compute Cloud (EC2) • EC2: Selling cycles as Xen VMs • Software similar to Workspace Service • No virtual clusters, contextualization, fine-grain allocations, etc. • Grid credential admission -> EC2 charging model • STAR: 100 node VM run • Workspace Pilot • Integrating VMs into current provisioning models • Long-term solutions • Interleaving soft and hard leases • Providing better articulated leasing models • Developed in the context of existing schedulers
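The default resource manager is described only as "basic slot fitting." One simple policy that matches that description is first-fit: place each requested VM on the first node that still has enough free memory. The sketch below implements that policy under that assumption. It is not the Workspace Service's actual algorithm. It tracks memory only, and node names and sizes are hypothetical.

```python
from typing import Dict, List, Optional


def first_fit(free_mb: Dict[str, int], request_mb: int, count: int) -> Optional[List[str]]:
    """Place `count` VMs of `request_mb` each, first-fit by node name.

    Returns the node chosen for each VM, or None if the request cannot be
    satisfied. `free_mb` is updated only when every VM fits, so a failed
    request leaves the pool untouched.
    """
    trial = dict(free_mb)  # work on a copy so failures don't leak
    placement: List[str] = []
    for _ in range(count):
        for node in sorted(trial):  # deterministic scan order
            if trial[node] >= request_mb:
                trial[node] -= request_mb
                placement.append(node)
                break
        else:  # inner loop found no node with room for this VM
            return None
    free_mb.update(trial)
    return placement
```

All-or-nothing placement matters for virtual clusters: a partially deployed cluster is usually useless, so the request either fits entirely or is rejected.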
Providing Resources: The Workspace Pilot • Challenge: find the simplest way to integrate VMs into current provisioning models • Glide-ins (Condor): poor man’s resource leasing • Best-effort semantics: submit a job “pilot” that claims resources but does not run a job • The Workspace Pilot • Resources booted to dom0 • Pilot adjusts memory • VWS leases “slots” to VMs • Kill-all facility
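The pilot lifecycle on one node is: the node runs dom0 with all memory, the pilot shrinks dom0, VWS leases the freed memory as slots to guest VMs, and a kill-all reclaims everything when the pilot's batch allocation ends. The sketch below models that bookkeeping. `PilotNode` and its methods are hypothetical names. The code does not call Xen; it only tracks memory.

```python
from typing import Dict, List


class PilotNode:
    """Hypothetical model of one node claimed by a workspace pilot job."""

    def __init__(self, total_mb: int, dom0_min_mb: int) -> None:
        self.total_mb = total_mb
        self.dom0_mb = total_mb  # before the pilot acts, dom0 holds all memory
        self.dom0_min_mb = dom0_min_mb
        self.vms: Dict[str, int] = {}  # vm_id -> leased memory (MB)

    def adjust_memory(self) -> None:
        """Pilot shrinks dom0 so the remainder can be leased to guests."""
        self.dom0_mb = self.dom0_min_mb

    def free_mb(self) -> int:
        return self.total_mb - self.dom0_mb - sum(self.vms.values())

    def lease_slot(self, vm_id: str, mb: int) -> bool:
        """Lease a memory slot to a VM if enough memory is free."""
        if mb > self.free_mb():
            return False
        self.vms[vm_id] = mb
        return True

    def kill_all(self) -> List[str]:
        """Evict every guest and give memory back to dom0; return killed ids."""
        killed = sorted(self.vms)
        self.vms.clear()
        self.dom0_mb = self.total_mb
        return killed
```

Because the pilot is an ordinary batch job, the site scheduler keeps control. When the job's wall-clock limit arrives, `kill_all` returns the node to the state the scheduler expects.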
Workspace Control • VM control • Starting, stopping etc. • To be replaced by Xen API • Integrating into the network • Assigning MAC addresses and IP addresses • DHCP Delivery tool • Building up a trusted networking layer • VM image propagation • Image management and reconstruction • creating blank partitions • Talks to the workspace service via ssh
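Workspace control assigns each VM a MAC address and an IP address, and the DHCP delivery tool then hands the IP to the guest. The sketch below shows one way to build that MAC-to-IP table. It derives MACs from the `00:16:3e` prefix conventionally used for Xen guests and draws IPs from a subnet with Python's `ipaddress` module. The function names and the sequential numbering scheme are assumptions, not workspace-control's actual logic.

```python
import ipaddress
from typing import Dict, Iterable, Tuple

XEN_OUI = (0x00, 0x16, 0x3E)  # vendor prefix conventionally used for Xen guests


def mac_for(index: int) -> str:
    """Derive a MAC address from a VM index (24 bits of NIC-specific space)."""
    if not 0 <= index < 2 ** 24:
        raise ValueError("index does not fit in the 24-bit NIC part")
    octets = XEN_OUI + ((index >> 16) & 0xFF, (index >> 8) & 0xFF, index & 0xFF)
    return ":".join(f"{o:02x}" for o in octets)


def assign_leases(vm_names: Iterable[str], subnet: str) -> Dict[str, Tuple[str, str]]:
    """Map each VM name to (MAC, IP), the table a DHCP server would serve."""
    hosts = ipaddress.ip_network(subnet).hosts()
    leases: Dict[str, Tuple[str, str]] = {}
    for i, name in enumerate(vm_names):
        try:
            ip = next(hosts)
        except StopIteration:
            raise RuntimeError(f"subnet {subnet} has no free addresses left") from None
        leases[name] = (mac_for(i), str(ip))
    return leases
```

Fixing the MAC before boot lets the DHCP server recognize each guest, so the IP binding is decided by the service rather than by whatever the guest asks for.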
Security Issues • Secure admission of appliances/workspaces • The appliance vendor configures the appliance, asserts its properties and signs them to the appliance • Security and other updates, configuration and versioning assertions, disallowing offsite root access, etc. • The appliance deployer validates the signature and matches the assertions to policies • SC05 Poster: “Making your workspace secure: establishing trust with VMs in the Grid” • Secure networking • Controlling spoofing • Isolating networks between different VM groups • Traffic monitoring
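The admission flow above has two checks: validate the vendor's signature over the assertions, then match the assertions against local policy. The sketch below shows that flow. It swaps in an HMAC with a shared key (Python's standard `hmac` module) as a stand-in for the public-key signature a real vendor and deployer would use. The assertion keys and values are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def sign_assertions(assertions: Dict[str, Any], key: bytes) -> str:
    """Vendor side: MAC over a canonical JSON encoding of the assertions.

    NOTE: HMAC is a stand-in; a real deployment would use an asymmetric signature.
    """
    payload = json.dumps(assertions, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def admit(assertions: Dict[str, Any], signature: str, key: bytes,
          policy: Dict[str, Any]) -> bool:
    """Deployer side: check integrity first, then match assertions to policy."""
    expected = sign_assertions(assertions, key)
    if not hmac.compare_digest(expected, signature):
        return False  # assertions were altered or signed with another key
    return all(assertions.get(k) == v for k, v in policy.items())
```

Checking the signature before the policy matters: a tampered image could otherwise claim whatever properties the site's policy requires.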
So -- you’ve deployed some VMs… Now what? • Do they have public IP addresses? • Do they actually represent something useful? • I need an OSG cluster: • How do the VMs find out about each other? • Can they share storage? • Do they have host certificates? • And a grid-mapfile? • And all the other things that will integrate them into my VO?
Virtual Clusters • Challenge: what is a virtual cluster? • A more complex virtual machine • Networking, shared storage, etc. that will be portable across sites and implementations • Available at the same time and sharing a common context • Example: • A set of worker nodes with some edge services in front and NFS-based shared storage • Solution: management of ensembles and sharing • Ensemble deployment, EPR management • Flexible, configurable cluster deployment • Networking • Edge Services have public IPs • Worker nodes are on a private network shared with the Edge Services • Exporting and sharing a common context • Configuring and joining context Paper: “Virtual Clusters for Grid Communities”, CCGrid 2006
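The networking rule for a virtual cluster is simple. Edge services get a public IP plus an address on the shared private network. Worker nodes get only a private address. The sketch below turns that rule into an address plan. The function name, the output layout, and the sample addresses are hypothetical.

```python
import ipaddress
from typing import Dict, List


def plan_cluster(edges: List[str], workers: List[str],
                 public_ips: List[str], private_subnet: str) -> Dict[str, Dict[str, str]]:
    """Assign NICs: edge services get public + private, workers private only."""
    if len(public_ips) < len(edges):
        raise ValueError("not enough public IPs for the edge services")
    private = list(ipaddress.ip_network(private_subnet).hosts())
    if len(private) < len(edges) + len(workers):
        raise ValueError("private subnet too small for the cluster")
    addrs = iter(private)
    plan: Dict[str, Dict[str, str]] = {}
    # Edge services: reachable from outside and from the workers.
    for name, pub in zip(edges, public_ips):
        plan[name] = {"public": pub, "private": str(next(addrs))}
    # Worker nodes: reachable only on the private network shared with the edges.
    for name in workers:
        plan[name] = {"private": str(next(addrs))}
    return plan
```

For a headnode plus two workers on `10.1.0.0/24`, the headnode gets the first private address and the workers the next two. Only the headnode consumes one of the site's scarce public IPs.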
Contextualization • Challenge: putting a VM in the deployment context of the Grid, the site, and other VMs • Assigning and sharing IP addresses, name resolution, application-level configuration, etc. • Solution: management of a common context • Configuration-dependent • provides & requires • Common understanding between the image “vendor” and deployer • Mechanisms for securely delivering the required information to images across different implementations [Figure: a contextualization agent in each VM exchanging IP, hostname, and pk through a shared Common Context] Paper: “A Scalable Approach To Deploying And Managing Appliances”, TeraGrid conference 2007
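The "provides & requires" idea can be read as a matching step. Each VM contributes what it provides (its role, IP, hostname, pk) to the common context. Each VM then receives the entries for every role it requires. The sketch below performs that matching in memory. The dictionary layout, role names, and sample values are hypothetical, and the secure delivery to the in-VM agent is left out.

```python
from typing import Dict, List


def build_common_context(vms: Dict[str, dict]) -> Dict[str, Dict[str, List[dict]]]:
    """Match each VM's `requires` roles against every VM's `provides` entry.

    vms: name -> {"provides": {"role", "ip", "hostname", "pk"}, "requires": [roles]}
    Returns name -> {required role: [provider info, ...]}.
    """
    by_role: Dict[str, List[dict]] = {}
    for spec in vms.values():
        p = spec["provides"]
        by_role.setdefault(p["role"], []).append(
            {"ip": p["ip"], "hostname": p["hostname"], "pk": p["pk"]})
    context: Dict[str, Dict[str, List[dict]]] = {}
    for name, spec in vms.items():
        required = spec.get("requires", [])
        missing = [r for r in required if r not in by_role]
        if missing:
            raise ValueError(f"{name} requires unknown role(s): {missing}")
        context[name] = {r: by_role[r] for r in required}
    return context
```

In an OSG-style virtual cluster, workers would require the `headnode` role (to find the gatekeeper and shared storage), and the headnode would require the `worker` role (to populate its host list).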
Where Do VM Images Come From? • Appliance providers • Appliance providers configure, manage, and attest images • Contextualization: collaboration between appliance vendors and appliance deployers • Appliance providers • rPath • Recipe-style configuration (create a project, choose packages, “cook”, build the software appliance) • Freely available online, many appliances • http://www.rpath.com/rbuilder/ • Bcfg2 • Incrementally constructed configuration profiles • Configuration analysis capabilities • http://trac.mcs.anl.gov/projects/bcfg2
Image Management • Image partitions • Efficiency • Security • Flexibility • Partition management on deployment • Partition caching and generation • Partition sharing • Mounting [Figure: an image split into layered partitions: System Layer, VO Layer, Application Layer, Customization Layer]
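Splitting an image into partitions pays off at deployment time. Partitions already cached on the node can be reused. Blank partitions can be generated locally. Only the rest must be transferred. The sketch below classifies partitions that way, assuming each non-blank partition is identified by a content digest. The `Partition` type, its fields, and the digest scheme are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Partition:
    layer: str              # e.g. "system", "vo", "application", or a scratch area
    digest: Optional[str]   # content id; None means a blank partition made on the node
    size_mb: int


def plan_deployment(partitions: List[Partition], node_cache: Set[str]
                    ) -> Tuple[List[Partition], List[Partition], List[Partition]]:
    """Split partitions into (transfer, reuse from cache, generate locally)."""
    transfer: List[Partition] = []
    reuse: List[Partition] = []
    generate: List[Partition] = []
    for p in partitions:
        if p.digest is None:
            generate.append(p)     # blank: create on the node, nothing to copy
        elif p.digest in node_cache:
            reuse.append(p)        # already propagated earlier: share/mount it
        else:
            transfer.append(p)     # must be copied to the node
    return transfer, reuse, generate
```

A large system partition shared by many workspaces is then copied once per node, while small VO or application layers change more often and travel separately.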
Workspace Ecosystem • Appliance Providers: OSFarm, rPath, CohesiveFT, bcfg2, etc.; marketplaces of all kinds • Virtual Organizations: configuration, attestation, maintenance • Resource Providers: local clusters, Grid resource providers (TeraGrid, OSG); commercial providers (EC2, Sun, slicehost); provisioning a resource, not a platform • Middleware (VWS, EC2, In-Vigo): appliances --> resources; manage appliance deployment; combining networks and storage
Parting Thoughts • VMs are the raw materials from which a working system can be built • But we still have to build it! • Technical challenges: taking one step at a time • Social/procedural challenges • Division of labor • Resource providers • Appliance providers • Can we build trust between these two groups? • If you have a specific problem, give us a call: • http://workspace.globus.org • In our copious spare time we also do research • Migration, fine-grained enforcement, resource management, load balancing, migration in time, lots of one-offs… • VTDC07 (co-located with SC07)
Acknowledgements • Workspace team: • Kate Keahey • Tim Freeman • Borja Sotomayor • Funding • NSF SDCI “Missing Links” • NSF CSR “Virtual Playgrounds” • DOE CEDPS Project • With thanks to many collaborators: • Jerome Lauret (STAR, BNL), Doug Olson (STAR, LBNL), Marty Wesley (rPath), Stu Gott (rPath), Ken Van Dine (rPath), Predrag Buncic (Alice, CERN), Haavard Bjerke (CERN), Rick Bradshaw (Bcfg2, ANL), Narayan Desai (Bcfg2, ANL), Duncan Penfold-Brown (Atlas, uvic), Ian Gable (Atlas, uvic), David Grundy (Atlas, uvic), Ti Leggit (University of Chicago), Greg Cross (University of Chicago), Mike Papka (University of Chicago/ANL)
[Figure: running-job counts over time at the STAR production sites VWS/EC2, BNL, WSU, Fermi, and PDSF; panels: Job Completion, File Recovery] With thanks to Jerome Lauret and Doug Olson of the STAR project
[Figure: accelerated display of a workflow's job states at Nersc PDSF, EC2 (via Workspace Service), and WSU; Y = job number, X = job state] With thanks to Jerome Lauret and Doug Olson of the STAR project