This part of the Introduction to Clouds series explores virtual machines and protected contexts. It discusses layered OS platforms and the use of virtual machine monitors (VMMs) to create isolated contexts for running programs. It also introduces virtual appliances (VM images) and the benefits of virtualization over real machines.
Intro to Clouds
Jeff Chase
Dept. of Computer Science, Duke University
Part 1 Virtual machines
The story so far: OS platforms
• OS platforms let us run programs in contexts.
• Contexts are protected/isolated to varying degrees.
• The OS platform TCB offers APIs to create and manipulate protected contexts.
• It enforces isolation of contexts for running programs.
• It governs access to hardware resources.
• Classical example:
• Unix context: process
• Unix TCB: kernel
• Unix kernel API: syscalls
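A minimal sketch of the classical Unix example, assuming a POSIX system (os.fork is not available on Windows): the fork syscall asks the kernel to create a new process context, and a write made by the child lands only in the child's private address space. The function name child_write_is_isolated is illustrative, not from the slides.

```python
import os


def child_write_is_isolated() -> bool:
    """Fork a child that overwrites a value; report whether the parent still sees the old one.

    Sketch assumes a POSIX system where os.fork exists.
    """
    counter = [0]              # lives in this process's private virtual memory
    pid = os.fork()            # syscall: the kernel creates a second process context
    if pid == 0:               # we are the child process
        counter[0] = 42        # modifies only the child's copy
        os._exit(0)            # exit immediately, skipping the parent's code path
    os.waitpid(pid, 0)         # syscall: the parent waits for the child to finish
    return counter[0] == 0     # the parent's copy was never touched


if __name__ == "__main__":
    print(child_write_is_isolated())  # True on a POSIX system
```

The kernel, not the program, enforces the boundary: the parent never has to defend its memory against the child.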
The story so far: layered platforms
• We can layer “new” platforms on “old” ones.
• The outer layer hides the inner layer,
• covering the inner APIs and abstractions, and
• replacing them with the model of the new platform.
• Example: Android over Linux
[Figure: the Android platform (AMS, JVM+lib) layered over Linux.]
Native virtual machines (VMs)
• Slide a hypervisor underneath the kernel.
• New OS/TCB layer: virtual machine monitor (VMM).
• Kernel and processes run in a virtual machine (VM).
• The VM “looks the same” to the OS as a physical machine.
• The VM is a sandboxed/isolated context for an entire OS.
• A VMM can run multiple VMs on a shared computer.
[Figure: three guest (tenant) VM contexts, guest VM1, VM2, and VM3, each running a process (P1A, P2B, P3C) over its own OS kernel (kernels 1, 2, 3), all on a hypervisor/VMM on the host.]
What is a “program” for a VM?
The VMM/hypervisor is a new layer of OS platform, with a new kind of protected context. What is a program for that context?
[Figure: apps running over a guest kernel over the hypervisor/VMM, with “???” marking the program that runs in the VM context.]
What kind of program do we launch into a VM context? It’s called a virtual appliance or VM image. A VM is called an instance of the image.
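One way to picture the image/instance relationship, as a sketch under stated assumptions: the image is an immutable template, and launching an instance gives that VM its own private copy of the file tree, so writes inside one VM never change the image or its sibling instances. Real platforms may avoid a full copy; VMImage, VMInstance, and the file paths are hypothetical names for illustration.

```python
import copy
import itertools
from dataclasses import dataclass, field

_instance_ids = itertools.count(1)  # hypothetical instance-ID allocator


@dataclass(frozen=True)
class VMImage:
    """Hypothetical virtual appliance: a complete OS file tree plus apps (path -> contents)."""
    name: str
    file_tree: dict


@dataclass
class VMInstance:
    """A VM launched from an image; it boots from its own private copy of the disk."""
    image: VMImage
    instance_id: int = field(default_factory=lambda: next(_instance_ids))
    disk: dict = field(init=False)

    def __post_init__(self) -> None:
        # Copy the image so writes inside this VM never reach the image or other VMs.
        self.disk = copy.deepcopy(self.image.file_tree)


image = VMImage("web-appliance", {"/etc/hostname": "appliance", "/srv/index.html": "hello"})
vm_a, vm_b = VMInstance(image), VMInstance(image)
vm_a.disk["/etc/hostname"] = "vm-a"
print(vm_b.disk["/etc/hostname"], image.file_tree["/etc/hostname"])  # appliance appliance
```

This mirrors the program/process distinction: one image, many running instances, each with its own state.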
A virtual appliance contains a complete OS system image, with file tree and apps. [Graphics are from rPath inc. and VMware inc.]
Motivation: support multiple OSes
When virtual is better than real
When virtual is better than real: everyone plays nicely together [image from virtualbox.org]
The story so far: protected CPU mode
Any kind of machine exception transfers control to a registered (trusted) kernel handler running in a protected CPU mode. The kernel handler manipulates the CPU register context to return to a selected user context.
[Figure: user-mode execution enters kernel mode through syscall traps and faults (handled in the kernel “top half”) and through clock and other interrupts (handled in the kernel “bottom half”, the interrupt handlers); the kernel leaves through interrupt return, u-return, or u-start.]
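A toy model of this control flow, not real hardware: handlers are installed while the machine is still in kernel mode at boot, every exception switches to kernel mode and vectors to the registered handler, and the handler returns an edited copy of the saved register context. The two-register UserContext, the event names, and the 4-byte instruction size are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict


@dataclass(frozen=True)
class UserContext:
    """Saved user register context (only two registers, for illustration)."""
    pc: int   # user program counter at the time of the event
    r0: int   # hypothetical convention: syscall number in, return value out


Handler = Callable[[UserContext], UserContext]


class Machine:
    def __init__(self) -> None:
        self.mode = "kernel"                    # the machine boots in kernel mode
        self._handlers: Dict[str, Handler] = {}

    def register(self, event: str, handler: Handler) -> None:
        # Only trusted kernel code (running in kernel mode) may install handlers.
        if self.mode != "kernel":
            raise PermissionError("handlers can be registered only in kernel mode")
        self._handlers[event] = handler

    def start_user(self) -> None:
        self.mode = "user"                      # boot done: drop to user mode

    def exception(self, event: str, ctx: UserContext) -> UserContext:
        self.mode = "kernel"                    # hardware enters the protected mode
        new_ctx = self._handlers[event](ctx)    # vector to the registered handler
        self.mode = "user"                      # resume the (possibly edited) user context
        return new_ctx


def syscall_handler(ctx: UserContext) -> UserContext:
    # Pretend the call succeeded: put 0 in r0 and resume after the trap instruction
    # (assuming 4-byte instructions).
    return replace(ctx, pc=ctx.pc + 4, r0=0)


m = Machine()
m.register("syscall", syscall_handler)
m.start_user()
print(m.exception("syscall", UserContext(pc=100, r0=1)))  # UserContext(pc=104, r0=0)
```

A clock-interrupt handler in the same style could return a different process's saved context, which is how the kernel "returns to a selected user context".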
A closer look
[Figure: user stacks and kernel stacks; at boot the kernel installs a handler dispatch table; syscall traps, faults, and clock interrupts enter handlers on a kernel stack, which then return to user mode (u-return) or start a user context (u-start).]
IA/x86 Protection Rings (CPL)
CPU Privilege Level (CPL)
• Modern CPUs have multiple protected modes.
• History: IA/x86 rings (CPL)
• Has built-in security levels (Rings 0, 1, 2, 3)
• Ring 0: “Kernel mode” (most privileged)
• Ring 3: “User mode”
• Unix uses only two modes:
• user: untrusted execution
• kernel: trusted execution
[Fischbach]
Protection Rings
• New Intel VT and AMD SVM CPUs introduce new protected modes for VMM hypervisors.
• We can think of it as a new inner ring: one ring to bind them all.
• Warning: this is an oversimplification; the actual architecture is more complex for backward compatibility.
[Figure: protection rings labeled user, kernel, and hypervisor, with guest user and kernel rings above the hypervisor ring.]
Protection Rings
• Computer scientists have drawn these rings since the 1960s.
• They represent layering: the outer ring “hides” the interface of the lower ring.
• The machine defines the events (exceptions) that transition to higher privilege (inner ring).
• Inner rings register handlers to intercept selected events.
• But the picture is misleading…
[Fischbach]
Protection Rings
• We might just as well draw it “inside out”.
• Now the ring represents power: what the code at that ring can access or modify.
• Bigger rings have more power.
• Inclusion: bigger rings can see or do anything that the smaller rings can do.
• And they can manipulate the state of the rings they contain.
• But this is still misleading: there are multiple ‘instances’ of the weaker rings.
[Figure: rings drawn inside out, with user innermost, then guest, then hypervisor outermost.]
Maybe a better picture… There are multiple ‘instances’ of the weaker rings. And powers are nested: an outer ring limits the “sandbox” or scope of the rings it contains.
Post-note
• The remaining slides in this section reinforce these concepts.
• We didn’t see them in class.
• There is more detail in the reading…
Kernel Mode
The CPU mode (a field in some status register) indicates whether a CPU core is running a user program or the protected kernel. Some instructions and register accesses are legal only when the core is executing in kernel mode. The CPU mode transitions to kernel mode only on machine exception events (trap, fault, interrupt), which transfer control to a handler registered by the kernel with the machine at boot time. So only the kernel program chooses what code ever runs in kernel mode (or so we hope and intend). A kernel handler can read the user register values at the time of the event and modify them arbitrarily before (optionally) returning to user mode.
[Figure: a CPU core with a U/K mode bit and registers R0…Rn and PC.]
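A sketch of the mode-bit check, with hypothetical opcode names: in user mode, a kernel-only instruction is not executed; instead it raises a fault, which the machine would deliver to the kernel's registered fault handler. Python cannot stop code from setting the bit directly, so the sketch simply flips it to show the kernel-mode case; in hardware only an exception event changes it.

```python
PRIVILEGED = {"set_page_table_root", "disable_interrupts", "halt"}  # hypothetical opcode names


class PrivilegeFault(Exception):
    """Models the fault raised when user mode attempts a kernel-only instruction."""


class Core:
    def __init__(self) -> None:
        self.kernel_mode = False   # the U/K mode bit in a status register
        self.executed = []         # opcodes that actually ran

    def execute(self, opcode: str) -> None:
        if opcode in PRIVILEGED and not self.kernel_mode:
            # Not executed: the fault would transfer control to the kernel's handler.
            raise PrivilegeFault(opcode)
        self.executed.append(opcode)


core = Core()
core.execute("add")                  # ordinary instructions run in user mode
try:
    core.execute("halt")             # kernel-only instruction attempted in user mode
except PrivilegeFault as fault:
    print("fault:", fault)           # fault: halt
core.kernel_mode = True              # in hardware, only an exception event flips this bit
core.execute("halt")
print(core.executed)                 # ['add', 'halt']
```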
Exceptions: trap, fault, interrupt
Events are synchronous (caused by an instruction) or asynchronous (caused by some other event), and intentional (happens every time) or unintentional (contributing factors).
• trap (synchronous, intentional): system call, e.g., open, close, read, write, fork, exec, exit, wait, kill, etc.
• fault (synchronous, unintentional): invalid or protected address or opcode, page fault, overflow, etc.
• “software interrupt” (asynchronous, intentional): software requests an interrupt to be delivered at a later time.
• interrupt (asynchronous, unintentional): caused by an external event: I/O op completed, clock tick, power fail, etc.
Kernel Stacks and Trap/Fault Handling
Processes execute user code on a user stack in the user virtual memory of the process virtual address space. Each process has a second kernel stack in kernel space (VM accessible only to the kernel). System calls and faults run in kernel mode on the process kernel stack. Kernel code running in P’s process context (i.e., on its kstack) has access to P’s virtual memory. The syscall handler makes an indirect call through the system call dispatch table to the handler registered for the specific system call.
[Figure: per-process user stacks and kernel stacks, process data, and the syscall dispatch table.]
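A sketch of the dispatch step, using Python stand-ins for kernel data: the trap entry point looks up the syscall number in a table, pushes a frame on the calling process's own kernel stack, and makes an indirect call to the registered handler. The syscall numbers 1 and 2, the Process fields, and the handler names are hypothetical; every real kernel defines its own numbering.

```python
import errno
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Process:
    pid: int
    kstack: List[str] = field(default_factory=list)    # this process's kernel stack (frame names)
    written: List[bytes] = field(default_factory=list)  # stand-in for real output


def sys_write(p: Process, fd: int, data: bytes) -> int:
    p.written.append(data)  # fd is ignored in this sketch
    return len(data)


def sys_getpid(p: Process) -> int:
    return p.pid


# Hypothetical syscall numbers for illustration only.
SYSCALL_TABLE: Dict[int, Callable[..., int]] = {1: sys_write, 2: sys_getpid}


def syscall_trap(p: Process, number: int, *args) -> int:
    """Trap entry: runs on P's kernel stack and dispatches through the table."""
    handler = SYSCALL_TABLE.get(number)
    if handler is None:
        return -errno.ENOSYS              # unknown system call number
    p.kstack.append(handler.__name__)     # push a frame for this call on P's kstack
    try:
        return handler(p, *args)          # indirect call through the dispatch table
    finally:
        p.kstack.pop()                    # kstack is empty again on return to user mode


proc = Process(pid=7)
print(syscall_trap(proc, 2), syscall_trap(proc, 1, 1, b"hi"), syscall_trap(proc, 99))
```

Keeping the kernel stack per process means a handler that blocks mid-call leaves its frames on that process's kstack, while the kernel runs other processes on theirs.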
More on VMs Recent CPUs support additional protected mode(s) for hypervisors. When the hypervisor initializes, it selects some set of event types to intercept, and registers handlers for them. Selected machine events occurring in user mode or kernel mode transfer control to a hypervisor handler. For example, a guest OS kernel accessing device registers may cause the physical machine to invoke the hypervisor to intervene. In addition, the VM architecture has another level of indirection in the MMU page tables: the hypervisor can specify and restrict what parts of physical memory are visible to each guest VM. A guest VM kernel can map or address a physical memory frame, or command device DMA I/O to/from a physical frame, if and only if the hypervisor permits it. If any guest VM tries to do anything weird, then the hypervisor regains control and can see or do anything to any part of the physical or virtual machine state before (optionally) restarting the guest VM.
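A toy model of the two controls just described, under illustrative assumptions (the Hypervisor class, event names, and frame numbers are hypothetical): the hypervisor fixes a set of intercepted event types at initialization, and it records which physical frames each guest VM may map or target with DMA.

```python
from typing import Dict, Iterable, Set


class Hypervisor:
    """Toy VMM: a chosen set of intercepted events plus per-VM physical-frame grants."""

    def __init__(self, intercepted_events: Iterable[str]) -> None:
        self.intercepted = frozenset(intercepted_events)  # chosen once, at initialization
        self.granted: Dict[str, Set[int]] = {}            # VM name -> permitted frame numbers

    def grant(self, vm: str, frames: Iterable[int]) -> None:
        self.granted.setdefault(vm, set()).update(frames)

    def route(self, event: str) -> str:
        # Intercepted events go to the hypervisor's handler; the rest stay in the guest.
        return "hypervisor" if event in self.intercepted else "guest kernel"

    def may_access(self, vm: str, frame: int) -> bool:
        # Mapping a frame or aiming DMA at it is allowed iff the hypervisor granted it.
        return frame in self.granted.get(vm, set())


hv = Hypervisor({"device_register_access"})
hv.grant("vm1", {10, 11})
print(hv.route("device_register_access"), hv.route("syscall"))  # hypervisor guest kernel
print(hv.may_access("vm1", 10), hv.may_access("vm2", 10))        # True False
```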
If you are interested…
2.1 The Intel VT-x Extension
In order to improve virtualization performance and simplify VMM implementation, Intel has developed VT-x [37], a virtualization extension to the x86 ISA. AMD also provides a similar extension with a different hardware interface called SVM [3]. The simplest method of adapting hardware to support virtualization is to introduce a mechanism for trapping each instruction that accesses privileged state so that emulation can be performed by a VMM. VT-x embraces a more sophisticated approach, inspired by IBM’s interpretive execution architecture [31], where as many instructions as possible, including most that access privileged state, are executed directly in hardware without any intervention from the VMM. This is possible because hardware maintains a “shadow copy” of privileged state. The motivation for this approach is to increase performance, as traps can be a significant source of overhead. VT-x adopts a design where the CPU is split into two operating modes: VMX root and VMX non-root mode. VMX root mode is generally used to run the VMM and does not change CPU behavior, except to enable access to new instructions for managing VT-x. VMX non-root mode, on the other hand, restricts CPU behavior and is intended for running virtualized guest OSes. Transitions between VMX modes are managed by hardware. When the VMM executes the VMLAUNCH or VMRESUME instruction, hardware performs a VM entry, placing the CPU in VMX non-root mode and executing the guest. Then, when action is required from the VMM, hardware performs a VM exit, placing the CPU back in VMX root mode and jumping to a VMM entry point. Hardware automatically saves and restores most architectural state during both types of transitions. This is accomplished by using buffers in a memory resident data structure called the VM control structure (VMCS).
In addition to storing architectural state, the VMCS contains a myriad of configuration parameters that allow the VMM to control execution and specify which type of events should generate VM exits. This gives the VMM considerable flexibility in determining which hardware is exposed to the guest. For example, a VMM could configure the VMCS so that the HLT instruction causes a VM exit or it could allow the guest to halt the CPU. However, some hardware interfaces, such as the interrupt descriptor table (IDT) and privilege modes, are exposed implicitly in VMX non-root mode and never generate VM exits when accessed. Moreover, a guest can manually request a VM exit by using the VMCALL instruction. Virtual memory is perhaps the most difficult hardware feature for a VMM to expose safely. A straw man solution would be to configure the VMCS so that the guest has access to the page table root register, %CR3. However, this would place complete trust in the guest because it would be possible for it to configure the page table to access any physical memory address, including memory that belongs to the VMM. Fortunately, VT-x includes a dedicated hardware mechanism, called the extended page table (EPT), that can enforce memory isolation on guests with direct access to virtual memory. It works by applying a second, underlying, layer of address translation that can only be configured by the VMM. AMD’s SVM includes a similar mechanism to the EPT, referred to as a nested page table (NPT).
From Dune: Safe User-level Access to Privileged CPU Features, Belay et al. (Stanford), OSDI, October 2012
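The EPT idea can be sketched as two dictionary lookups, assuming 4 KiB pages and single-level tables (real x86 page tables and EPTs are multi-level radix trees): the guest's own table maps guest-virtual pages to guest-physical frames, and a second table that only the VMM configures maps guest-physical frames to host-physical frames. The exception names and table contents are illustrative.

```python
PAGE_SIZE = 4096  # assume 4 KiB pages


class GuestPageFault(Exception):
    """First-layer miss: handled by the guest kernel, as on a real machine."""


class EPTViolation(Exception):
    """Second-layer miss: models a VM exit to the VMM."""


def translate(gva: int, guest_pt: dict, ept: dict) -> int:
    """Guest-virtual -> guest-physical (guest's table) -> host-physical (VMM's EPT)."""
    vpn, offset = divmod(gva, PAGE_SIZE)
    if vpn not in guest_pt:
        raise GuestPageFault(hex(gva))
    gfn = guest_pt[vpn]                    # guest-physical frame chosen by the guest kernel
    if gfn not in ept:
        raise EPTViolation(hex(gfn * PAGE_SIZE + offset))
    return ept[gfn] * PAGE_SIZE + offset   # host frame chosen only by the VMM


guest_pt = {0x10: 0x2, 0x11: 0x7}  # the guest may point its pages anywhere it likes...
ept = {0x2: 0x9A}                  # ...but only frames the VMM mapped are reachable
print(hex(translate(0x10123, guest_pt, ept)))  # 0x9a123
```

Because the guest controls only the first table, handing it %CR3 is safe: any guest-physical frame the VMM did not map simply has no host translation.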
VT in a Nutshell
• New VM mode bit
• Orthogonal to kernel/user mode or rings (CPL)
• If VM mode is off
• Machine looks just like it always did
• If VM bit is on
• Machine is running a guest VM
• “VMX non-root operation”
• Various events cause gated entry into hypervisor
• “virtualization intercept”
• Hypervisor can control which events cause intercepts
• Hypervisor can examine/manipulate guest VM state
There is another motivation for VMs and hypervisors. Application services and computational jobs need access to computing power “on tap”. Virtualization allows the owner of a server to “slice and dice” server resources and allocate the virtual slices out to customers as VMs. The customers can install and manage their own software their own way in their own VMs. That is cloud hosting.
Part 2 Services
Services
[Figure: clients invoke services through RPC or HTTP GET requests.]
End-to-end application delivery Where is your application? Where is your data? Where is your OS? Cloud and Software-as-a-Service (SaaS) Rapid evolution, no user upgrade, no user data management. Agile/elastic deployment on virtual infrastructure.
Networking
Some IPC mechanisms allow communication across a network, e.g., sockets using Internet communication protocols (TCP/IP). Each endpoint on a node (host) has a port number. Each node has one or more interfaces, each on at most one network. Each interface may be reachable on its network by one or more names, e.g., an IP address and an (optional) DNS name.
[Figure: endpoints on node A and node B joined by a connection (channel). Endpoint/port operations: advertise (bind), listen, connect (bind), close, write/send, read/receive.]
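The endpoint operations above correspond to the Berkeley sockets calls. A minimal sketch using Python's standard socket module on the loopback interface, so "node A" and "node B" are the same host here; the function name echo_once and the message are illustrative.

```python
import socket
import threading


def echo_once(message: bytes = b"hello", host: str = "127.0.0.1") -> bytes:
    """Open a TCP channel between two endpoints on this host and echo one message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))            # advertise: bind the endpoint to a port (0 = OS picks)
        server.listen(1)                  # listen for incoming connections
        port = server.getsockname()[1]    # the port number clients must connect to

        def serve() -> None:
            conn, _peer = server.accept()       # a new connection (channel) to the client
            with conn:
                conn.sendall(conn.recv(1024))   # read/receive, then write/send it back

        worker = threading.Thread(target=serve)
        worker.start()
        # connect: the client's endpoint is bound to an ephemeral port automatically
        with socket.create_connection((host, port)) as client:
            client.sendall(message)
            # A single recv is enough for this tiny loopback demo; robust code loops.
            reply = client.recv(1024)
        worker.join()
    return reply


print(echo_once())
```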
SaaS platform elements
[Figure: a browser, a container, and the “Classical OS”.] [wiki.eeng.dcu.ie]
Motivation: “Success disaster” [Graphic from Amazon: Mike Culver, Web Scale Computing]
Part 2 Virtual Cloud hosting
“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
• US National Institute of Standards and Technology (NIST), http://www.csrc.nist.gov/groups/SNS/cloud-computing/
Cloud > server-based computing
• Client/server model (1980s onward)
• Now called Software-as-a-Service (SaaS)
[Figure: a client and server(s).]
Host/guest model
• Service is hosted by a third party.
• flexible programming model
• cloud APIs for service to allocate/link resources
• on-demand: pay as you grow
[Figure: a client uses a service, which runs as a guest hosted by the cloud provider(s).]
IaaS: infrastructure services
Deployment of private clouds is growing rapidly with open IaaS cloud software.
Hosting performance and isolation are determined by the virtualization layer.
Virtual machines: VMware, KVM, etc.
[Figure: layered stack of client, service, platform, OS, VMM, and physical machine.]
PaaS: platform services
PaaS cloud services define the high-level programming models, e.g., for clusters or specific application classes.
Hadoop, grids, batch job services, etc. can also be viewed as members of the PaaS category.
Note: PaaS services can be deployed over IaaS.
[Figure: layered stack of client, service, platform, OS, VMM (optional), and physical machine.]
Varying workload, fixed system: varying performance
Varying workload, varying system: fixed performance
“Elastic Cloud” Resource Control: varying workload, varying system, target performance
Elastic provisioning Managing Energy and Server Resources in Hosting Centers, SOSP, October 2001.
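The elastic-control idea can be illustrated with a simple proportional scaling rule. This is an assumption for illustration, not the resource-management method of the SOSP paper cited above: measure average utilization across the VM pool (a stand-in for performance), then resize the pool so utilization returns to a target, within fixed bounds. The function name, percentages, and bounds are hypothetical.

```python
def desired_vms(current: int, utilization_pct: int, target_pct: int,
                min_vms: int = 1, max_vms: int = 100) -> int:
    """Resize a VM pool so average utilization moves back toward the target."""
    if current < 1 or target_pct < 1:
        raise ValueError("need at least one VM and a positive target")
    # Integer ceiling of current * utilization / target (avoids float rounding).
    needed = -(-current * utilization_pct // target_pct)
    return max(min_vms, min(max_vms, needed))


print(desired_vms(4, 90, 60))  # 6: load grew, so grow the pool
print(desired_vms(4, 30, 60))  # 2: load fell, so release VMs
```

Rounding up errs toward slightly more capacity than the target requires, and the bounds keep a noisy measurement from emptying or exploding the pool.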
EC2: The canonical public cloud
[Figure: a virtual appliance image.]
OpenStack, the Cloud Operating System Management Layer That Adds Automation & Control [Anthony Young @ Rackspace]
IaaS Cloud APIs (OpenStack, EC2)
• Query of availability zones (i.e., clusters in Eucalyptus)
• SSH public key management (add, list, delete)
• VM management (start, list, stop, reboot, get console output)
• Security group management
• Volume and snapshot management (attach, list, detach, create, bundle, delete)
• Image management (bundle, upload, register, list, deregister)
• IP address management (allocate, associate, list, release)
Competing Cloud Models: PaaS vs. IaaS
• Cloud Platform as a Service (PaaS). The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations.
• Cloud Infrastructure as a Service (IaaS). The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
[Logos: Amazon Elastic Compute Cloud (EC2), Eucalyptus, OpenNebula]
Post-note
• The remaining slides weren’t discussed.
• Some give more info on the various forms of cloud computing following the NIST model. Just understand the IaaS and PaaS hosting models.
• The “Adaptation” slides deal with resource management: what assurances does the holder of virtual infrastructure have about how much resource it will receive, and how good its performance will (therefore) be? We’ll discuss this more later.
• The last slide refers to an advanced cloud project at Duke and RENCI.org, partially funded by NSF Global Environment for Network Innovations (geni.net).
Managing images • “Let a thousand flowers bloom.” • Curated image collections are needed! • “Virtual appliance marketplace”
Infrastructure as a Service (IaaS) “Consumers of IaaS have access to virtual computers, network-accessible storage, network infrastructure components, and other fundamental computing resources…and are billed according to the amount or duration of the resources consumed.”