Virtualization … from a reliability perspective Henrik Bærbak Christensen
Credits • Some slides from • E6998 - Virtual Machines, Lecture 2: CPU Virtualization • Scott Devine • VMware, Inc.
Agenda • What is it? • Overview • Classifications • Virtualization of CPU • Virtualization to improve reliability • Fault Detection and Removal • Configuration Testing • Distribution Testing • Record-n-Playback • Fault Tolerance • Recovery by snapshots • Lock-step execution
Classes of Virtualization
What is it? • vir•tu•al (adj): • existing in essence or effect, though not in actual fact • Example • ScummVM is a program which allows you to run certain classic graphical point-and-click adventure games, provided you already have their data files. The clever part about this: ScummVM just replaces the executables shipped with the games, allowing you to play them on systems for which they were never designed!
A Physical Machine • Hardware • Processors, devices, memory, etc. • Software • Built to the given hardware (Instruction Set Architecture, e.g. x86) • Built to the given OS (Application Programming Interface, e.g. Win XP) • OS controls hardware
A Virtual Machine • Hardware Abstraction • Virtual processor, memory, devices, etc. • Virtualization Software • Indirection: decouple hardware and OS • Multiplex physical hardware across guest VMs
Why VMs? • Many advantages (often business related) • Utilization of resources • A VM1 that idles automatically frees resources for VM2 • Of course not the case for physical machines • Isolation • A crash or virus in VM1 does not propagate to VM2 • Encapsulation • A VM is a file: (OS, Apps, Data, Config, run-time state) • Content distribution: Demos as snapshot • Snapshot provides recovery point • Migration to new physical machines by snapshot/restore • In e.g. server farms
Why VMs? • More advantages • Maintenance costs • Maintain 100 Linux-based web server machines versus a few big VM platforms • Legacy VMs • Run ancient apps (like DOS or SCUMM-based games) • Create once, run anywhere • No configuration issues • E.g. complex setup of app suite of database, servers, etc. • … and some reliability related issues that we will come back to…
Classification
Types of VMs • Smith & Nair 2005 • Two superclasses • Process VM • System VM • Both can be subclassed based upon supporting virtualization of same or different ISA (Instruction Set Architecture).
Process / System VM
Process VM • The process VM puts the virtualization border at the process line. • An application process/program executes in a • Logical address space (assigned/handled by the OS) • Using user-level instructions and registers (CPU user mode) • Only doing IO through OS calls or High-Level Library (HLL) calls • Thus a process VM becomes a virtual OS in which a process may execute. • Example: Different ISA Process VM • JavaVM defines its own ISA (stack-based) as well as normal OS operations
System VM • The system VM sets the virtualization border at the system or hardware line • A system/OS executes in a • Physical memory space • Using the full ISA of the underlying machine • Interacting directly with IO • Thus a system VM becomes a virtual machine in which a system as a whole may execute • Example: Same ISA System VM • VMware: Executes a guest OS that runs on the x86 ISA.
Other examples • System VM Different ISA • VirtualPC running Windows on Mac • Classic VMs • Virtual Machine Monitor (VMM) directly on hardware • VMware ESX • Drivers • Hosted VMs • VMM runs as an application in an OS • VMware Workstation • Drivers
Exercise • How would you classify ScummVM?
How does it work? Just a glimpse
Definition • [Rosenblum & Garfinkel 2005] • A CPU architecture is virtualizable if it supports the basic VMM technique of direct execution: executing the virtual machine on the real machine, while letting the VMM retain ultimate control of the CPU. • That is, in a ‘same ISA’ VM, the guest OS's instructions should be executed directly by the hardware • Why? Performance of course! • Ex. JavaVM incurs a performance penalty
Challenges • The problem is that not all instructions are possible to execute directly! • To see the problem we have to dig a bit into modern CPU architectures • I stopped around Z80 and Motorola 68000 • 80286: • First with protected mode… • Support multitasking, process protection, memory mgt., …
A few CPU terms • Supervisor mode/Privileged mode: • “An execution mode on some processors which enables execution of all instructions, including privileged instructions. It may also give access to a different address space, to memory management hardware and to other peripherals. This is the mode in which the operating system usually runs.” (Wikipedia)
Supervisor/User mode • x86 architecture • Ring 0: Kernel mode • Can do anything • OS and device drivers • Ring 3: User mode • Limited instruction set • Apps can fail at any time without impact on the rest of the system! • User applications • User apps must make a system call (call the OS) to interact with hardware, e.g. through device drivers...
Performance • Switching from “user mode” to “kernel mode” is, in most existing systems, very expensive. It has been measured, on the basic request getpid, to cost 1000-1500 cycles on most machines.
Trapping • What happens if code in user mode executes a privileged instruction? • In computing and operating systems, a trap is a type of synchronous interrupt typically caused by an exceptional condition (e.g. division by zero or invalid memory access) in a user process. A trap usually results in a switch to kernel mode, wherein the operating system performs some action before returning control to the originating process.
Virtualization Requirements • If VMware is an app running in Windows, and it runs the Linux OS “inside”, it follows that • The guest kernel, privileged instructions included, has to run in user mode! • VMM: Virtual Machine Monitor • The VMM runs in privileged mode • Everything ‘inside’ runs in user mode
VMM and privileged instructions • OK, so… • Linux runs in user mode inside a VMM • Now the Linux kernel disables interrupts (CLI) • Which is certainly a privileged instruction which is not allowed in user mode • So – what can we do about that? • Emulation • Trap-and-Emulate • Binary Translation
Emulation Example: CPUState
    static struct {
        uint32 GPR[16];
        uint32 LR;
        uint32 PC;
        int IE;
        int IRQ;
    } CPUState;

    void CPU_CLI(void) { CPUState.IE = 0; }
    void CPU_STI(void) { CPUState.IE = 1; }
CLI = Clear Interrupt Flag, STI = Set Interrupt Flag, IE = Interrupt Enabled
• Goal for CPU virtualization techniques • Process normal instructions as fast as possible • Forward privileged instructions to emulation routines
Instruction Interpretation • Emulate Fetch/Decode/Execute pipeline in software • Positives • Easy to implement • Minimal complexity • Negatives • Slow!
Example: Virtualizing the Interrupt Flag w/ Instruction Interpreter
    void CPU_Run(void)
    {
        while (1) {
            inst = Fetch(CPUState.PC);      /* fetch */
            CPUState.PC += 4;
            switch (inst) {                 /* decode; rd, rn, rm are the operand fields of inst */
            case ADD:
                CPUState.GPR[rd] = CPUState.GPR[rn] + CPUState.GPR[rm];
                break;
            …
            case CLI: CPU_CLI(); break;
            case STI: CPU_STI(); break;
            }
            /* deliver a pending interrupt only while the guest has interrupts enabled */
            if (CPUState.IRQ && CPUState.IE) {
                CPUState.IE = 0;
                CPU_Vector(EXC_INT);
            }
        }
    }

    void CPU_CLI(void) { CPUState.IE = 0; }
    void CPU_STI(void) { CPUState.IE = 1; }

    void CPU_Vector(int exc)
    {
        CPUState.LR = CPUState.PC;          /* save the return address */
        CPUState.PC = disTab[exc];          /* jump through the dispatch table */
    }
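The interpreter loop on the slide leaves instruction decoding and the memory model open. The following is a minimal compilable sketch of the same idea, not the original lecture code. The instruction encoding (`encode`, one byte each for opcode, rd, rn, rm), the `OP_HALT` instruction, the explicit `mem`/`dis_tab` parameters, and clearing `IRQ` on delivery are all assumptions made here so the loop can terminate and be tested.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy encoding: [opcode:8][rd:8][rn:8][rm:8], 4 bytes per instruction. */
enum { OP_HALT, OP_ADD, OP_CLI, OP_STI };
enum { EXC_INT = 0, NUM_EXC = 1 };

typedef struct {
    uint32_t GPR[16];
    uint32_t LR, PC;
    int IE;   /* virtual interrupt-enable flag */
    int IRQ;  /* a virtual interrupt is pending */
} CPUState;

uint32_t encode(unsigned op, unsigned rd, unsigned rn, unsigned rm)
{
    return (uint32_t)op << 24 | (rd & 15u) << 16 | (rn & 15u) << 8 | (rm & 15u);
}

/* Take an exception: save the return address, jump via the dispatch table. */
void cpu_vector(CPUState *s, const uint32_t *dis_tab, int exc)
{
    assert(exc >= 0 && exc < NUM_EXC);
    s->LR = s->PC;
    s->PC = dis_tab[exc];
}

/* Interpret until OP_HALT; returns the number of instructions executed. */
int cpu_run(CPUState *s, const uint32_t *mem, uint32_t mem_words,
            const uint32_t *dis_tab)
{
    int steps = 0;
    for (;;) {
        assert(s->PC / 4 < mem_words);
        uint32_t inst = mem[s->PC / 4];            /* fetch */
        s->PC += 4;
        steps++;
        unsigned op = inst >> 24;                  /* decode */
        unsigned rd = (inst >> 16) & 15u;
        unsigned rn = (inst >> 8) & 15u;
        unsigned rm = inst & 15u;
        switch (op) {                              /* execute */
        case OP_HALT: return steps;
        case OP_ADD:  s->GPR[rd] = s->GPR[rn] + s->GPR[rm]; break;
        case OP_CLI:  s->IE = 0; break;            /* emulated, never touches the real flag */
        case OP_STI:  s->IE = 1; break;
        default:      assert(!"unknown opcode"); break;
        }
        if (s->IRQ && s->IE) {                     /* deliver only while enabled */
            s->IE = 0;
            s->IRQ = 0;                            /* simplification: delivery acknowledges the IRQ */
            cpu_vector(s, dis_tab, EXC_INT);
        }
    }
}
```

With an interrupt pending from the start, a guest program `CLI; ADD; STI; HALT` runs the `ADD` undisturbed and only vectors to the handler after the `STI`, which is the behaviour the virtual `IE` flag is meant to provide.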
Trap and Emulate [Figure: the guest OS and applications run unprivileged; page faults, undefined instructions and virtual IRQs trap into the privileged Virtual Machine Monitor, which provides MMU emulation, CPU emulation and I/O emulation.]
The protocol… • The Linux kernel code contains CLI • As the guest runs in user mode, the CPU traps and transfers control to the VMM • The VMM sets an internal flag recording that interrupts are disabled • Interrupts are now not delivered to the guest OS until it executes STI • which again traps, and the VMM clears the internal flag! • From that time on, interrupts are again delivered. • Trap-and-Emulate: • All user mode instructions execute at full speed • Privileged instructions are trapped and emulated…
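The protocol above can be sketched as a small state machine on the monitor side. This is an illustrative model only: the names `Vmm`, `vmm_on_trap` and `vmm_raise_irq` are made up for this sketch, the flag is stored as "enabled" rather than "disabled", and queuing held-back interrupts in a counter is an assumption about how a monitor might defer delivery.

```c
#include <assert.h>
#include <stdbool.h>

/* Privileged instructions the guest may attempt (only the two from the slide). */
typedef enum { INSN_CLI, INSN_STI } PrivInsn;

typedef struct {
    bool virt_ie;    /* the guest's view of "interrupts enabled" */
    int  pending;    /* interrupts held back while virt_ie is false */
    int  delivered;  /* interrupts forwarded to the guest so far */
} Vmm;

void vmm_init(Vmm *v)
{
    v->virt_ie = true;
    v->pending = 0;
    v->delivered = 0;
}

/* A device interrupt arrives for this guest. */
void vmm_raise_irq(Vmm *v)
{
    if (v->virt_ie)
        v->delivered++;   /* guest accepts interrupts: forward now */
    else
        v->pending++;     /* guest executed CLI: hold it back */
}

/* Entered when the CPU traps on a privileged instruction run in user mode. */
void vmm_on_trap(Vmm *v, PrivInsn insn)
{
    switch (insn) {
    case INSN_CLI:
        v->virt_ie = false;            /* emulate: only the virtual flag changes */
        break;
    case INSN_STI:
        v->virt_ie = true;
        v->delivered += v->pending;    /* flush what was held back */
        v->pending = 0;
        break;
    default:
        assert(!"unhandled privileged instruction");
        break;
    }
}
```

The real interrupt flag of the physical CPU is never modified on behalf of the guest; only the monitor's bookkeeping changes, which is what lets several guests share one CPU.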
Issues with Trap and Emulate • Not all architectures support it • Trap costs may be high • Cf. performance measurements earlier
Challenges… • The classic x86 is not virtualizable • i.e. there are privileged-state instructions that do not trap when executed in user mode! • Ex. • POPF = pop CPU flags from stack • The flags contain the interrupt flag bit; in user mode POPF silently leaves that bit unchanged instead of trapping, so a guest kernel's attempt to enable/disable interrupts is lost without the VMM being notified! • Requires advanced techniques beyond trap-and-emulate to cope… • Para Virtualization • Binary Translation
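To make the POPF problem concrete, here is a deliberately simplified model of how x86 protected mode treats the interrupt flag, contrasted with CLI. It only models the IF bit and the CPL/IOPL check; all other flag bits and modes are ignored, and the function and type names are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_IF (1u << 9)   /* interrupt-enable flag, bit 9 of EFLAGS */

typedef struct {
    uint32_t flags;    /* flags register after the instruction */
    bool     trapped;  /* true if the CPU raised a fault (the VMM would get control) */
} InsnResult;

/* POPF: with insufficient privilege (CPL > IOPL) the IF bit is silently
 * preserved and NO fault is raised, so a VMM never sees the attempt. */
InsnResult model_popf(unsigned cpl, unsigned iopl, uint32_t cur, uint32_t popped)
{
    assert(cpl <= 3 && iopl <= 3);
    InsnResult r = { popped, false };
    if (cpl > iopl)
        r.flags = (popped & ~FLAG_IF) | (cur & FLAG_IF);
    return r;
}

/* CLI: with insufficient privilege the CPU faults, so trap-and-emulate works. */
InsnResult model_cli(unsigned cpl, unsigned iopl, uint32_t cur)
{
    assert(cpl <= 3 && iopl <= 3);
    InsnResult r = { cur, false };
    if (cpl > iopl)
        r.trapped = true;
    else
        r.flags = cur & ~FLAG_IF;
    return r;
}
```

A guest kernel demoted to ring 3 that tries to disable interrupts through POPF ends up with IF still set and no trap, whereas the same attempt through CLI traps and can be emulated.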
Para Virtualization • Para virtualization = replacing non-virtualizable portions of the original instruction set with easily virtualizable and more efficient equivalents. • OS code has to be ported! • User apps run unmodified. • Ex. Disco: Change the MIPS interrupt instruction to a read/write of a special memory location in the VMM • More efficient than handling it by trapping • The IRIX OS had to be ported…
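The idea of replacing a privileged instruction by a plain memory access can be sketched as follows. This is not Disco's actual interface: the `SharedVcpu` layout and the three function names are hypothetical, and a real system would also need a hypercall to request delivery of a deferred interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory region shared between the ported guest kernel and the VMM. */
typedef struct {
    volatile bool irq_enabled;  /* written by the guest instead of executing CLI/STI */
    volatile bool irq_pending;  /* set by the VMM when it had to defer an interrupt */
} SharedVcpu;

/* Ported guest kernel: replaces the privileged CLI by an ordinary store (no trap). */
void guest_irq_disable(SharedVcpu *s)
{
    assert(s != NULL);
    s->irq_enabled = false;
}

/* Replaces STI. Returns true if an interrupt was deferred meanwhile, in which
 * case the guest should ask the VMM to deliver it. */
bool guest_irq_enable(SharedVcpu *s)
{
    assert(s != NULL);
    s->irq_enabled = true;
    bool was_pending = s->irq_pending;
    s->irq_pending = false;
    return was_pending;
}

/* VMM side: returns true if the interrupt may be injected now, else defers it. */
bool vmm_try_inject(SharedVcpu *s)
{
    assert(s != NULL);
    if (s->irq_enabled)
        return true;
    s->irq_pending = true;
    return false;
}
```

The cost of toggling interrupts drops from a trap into the monitor to a single store, which is the efficiency argument on the slide; the price is that the guest kernel source must be changed.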
Binary Translation • Basic idea: • Read code blocks before the CPU gets them • Classify each instruction • “Ident”: copy directly to translation cache (TC) • “Inline”: replace instruction by inline equivalent instructions and copy these to TC • “Callouts”: replace instruction by call to emulation code in the VMM • Let the CPU execute the contents of the translation cache instead of the original block • TC: keep the translated block for future execution without any translation
Basic Blocks
Guest code (vPC points at the first instruction):
    mov ebx, eax
    cli
    and ebx, ~0xfff
    mov ebx, cr3
    sti
    ret
Basic block = straight-line code up to and including the next control-flow instruction (here ret).
Binary Translation
    Guest Code (vPC)        Translation Cache (start)
    mov ebx, eax            mov ebx, eax
    cli                     call HANDLE_CLI
    and ebx, ~0xfff         and ebx, ~0xfff
    mov ebx, cr3            mov [CO_ARG], ebx
                            call HANDLE_CR3
    sti                     call HANDLE_STI
    ret                     jmp HANDLE_RET
Binary Translation
    Guest Code (vPC)        Translation Cache (start)
    mov ebx, eax            mov ebx, eax
    cli                     mov [CPU_IE], 0
    and ebx, ~0xfff         and ebx, ~0xfff
    mov ebx, cr3            mov [CO_ARG], ebx
                            call HANDLE_CR3
    sti                     mov [CPU_IE], 1
                            test [CPU_IRQ], 1
                            jne
                            call HANDLE_INTS
    ret                     jmp HANDLE_RET
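The ident / inline / callout classification can be sketched as a translator over one basic block. The sketch works on symbolic opcodes rather than real x86 bytes, and the enum names, the `TransOp` record and the one-entry-per-callout simplification are assumptions of this example, not the encoding any real translator uses.

```c
#include <assert.h>
#include <stddef.h>

/* Symbolic guest instructions, matching the six-instruction example block. */
typedef enum { G_MOV, G_AND, G_CLI, G_STI, G_MOV_CR3, G_RET } GuestOp;

/* What ends up in the translation cache (TC). */
typedef enum {
    T_IDENT,            /* ident: guest instruction copied unchanged */
    T_SET_IE_0,         /* inline: mov [CPU_IE], 0 */
    T_SET_IE_1,         /* inline: mov [CPU_IE], 1 */
    T_CHECK_IRQ,        /* inline: test for a pending interrupt after STI */
    T_CALL_HANDLE_CR3,  /* callout into VMM emulation code */
    T_JMP_HANDLE_RET    /* callout: control flow is resolved by the VMM */
} TransKind;

typedef struct { TransKind kind; GuestOp src; } TransOp;

/* Translate one basic block into tc[]; returns the number of TC entries written.
 * Translation stops after the first control-flow instruction. */
size_t translate_block(const GuestOp *guest, size_t n, TransOp *tc, size_t cap)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        GuestOp g = guest[i];
        assert(out + 2 <= cap);   /* one guest instruction expands to at most two entries */
        switch (g) {
        case G_MOV:
        case G_AND:
            tc[out++] = (TransOp){ T_IDENT, g };
            break;
        case G_CLI:
            tc[out++] = (TransOp){ T_SET_IE_0, g };
            break;
        case G_STI:
            tc[out++] = (TransOp){ T_SET_IE_1, g };
            tc[out++] = (TransOp){ T_CHECK_IRQ, g };
            break;
        case G_MOV_CR3:
            tc[out++] = (TransOp){ T_CALL_HANDLE_CR3, g };
            break;
        case G_RET:
            tc[out++] = (TransOp){ T_JMP_HANDLE_RET, g };
            return out;           /* end of the basic block */
        }
    }
    return out;
}
```

Because the translated block is stored in the TC, the classification cost is paid once per block; later executions of the same guest code run the cached translation directly.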
CPU future…
Hypervisor mode • Modern/Future CPU architectures • Make a CPU with a Ring -1 • The VMM runs in Ring -1, and can run the OS in Ring 0. • Recent CPUs from Intel and AMD offer x86 virtualization instructions for a hypervisor to control Ring 0 hardware access. […] a guest operating system can run Ring 0 operations natively without affecting other guests or the host OS.
Virtualization and Reliable Architectures? What reliability can we get from using VM techniques?
The three approaches • Three approaches to dependable systems • Fault avoidance: simply avoid introducing defects! • Fault detection and removal: Find and remove the defects before they cause failures. • Fault tolerance: Ensure that faults do not lead to failures.
Fault Avoidance • I fail to see any here • Fault avoidance in Sommerville’s perspective classifies a set of static techniques • Programming languages, etc. • VMMs are dynamic techniques…
Fault Detection and Removal • Again, as in testing, the detection aspect is central. How do VMMs help? • Configuration Testing! • HelpDesk: • Large set of snapshots of typical customer configs • Like Internet Explorer 6 on Windows XP • Just resume this snapshot, reproduce the defect and report to development • Configuration suite • Configuration testing on a single tester’s machine • Ex: I tested Linux variants of my book’s code on my Win XP VMware with Ubuntu 9.04 installed
Fault Detection • Record and Playback • http://blogs.vmware.com/workstation/2008/04/enhanced-execut.html • See VMware demo later
Fault Detection • Distributed systems are complex • Make a virtual network with a set of virtual machines • Lower the bandwidth of a network connection • Induce packet loss • Kill machines to find defective behavior in the rest • Test graceful degradation algorithms
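The fault-injection ideas above (packet loss, killed machines) can be sketched as a virtual link with a configurable loss rate. This is a toy model written for this example, not an API of any virtualization product: `VirtLink`, `link_send` and the per-link pseudo-random generator are all hypothetical, and the generator is seeded explicitly so that a failing test run can be repeated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical virtual network link between two VMs. */
typedef struct {
    unsigned loss_per_mille;  /* injected packet loss, 0..1000 */
    uint32_t rng;             /* per-link seed: makes a test run reproducible */
    int      up;              /* 0 models a killed machine or a cut link */
} VirtLink;

/* Small linear congruential generator; adequate for fault injection only. */
uint32_t next_rand(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Returns 1 if the packet is delivered, 0 if the injected fault drops it. */
int link_send(VirtLink *l)
{
    assert(l->loss_per_mille <= 1000u);
    if (!l->up)
        return 0;
    return (next_rand(&l->rng) % 1000u) >= l->loss_per_mille;
}
```

A distribution test would route the application's traffic through such links, raise `loss_per_mille` or clear `up` at chosen moments, and check that the remaining machines degrade gracefully.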
Fault Tolerance • Backward recovery technique • Restore previous state of system and restart • Ex. Hardware/device failure on machine • Migrate last good snapshot to new machine • VMware also works on migrating running VMs • Capture run-time state of running VM • Hot migration allows full hot standby techniques… • Dynamic VM management • Load balancing by creating/destroying VMs • On hardware failure, automatically migrate VMs
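Backward recovery by snapshot boils down to copying the complete machine state and copying it back later. The sketch below shows this on a toy state record; `VmState` with 64 bytes of memory and a program counter is an assumption for illustration, whereas a real snapshot also covers devices, disks and CPU registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the full run-time state of a VM. */
typedef struct {
    uint8_t  memory[64];
    uint32_t pc;
} VmState;

typedef struct {
    VmState saved;
    bool    valid;   /* false until a snapshot has been taken */
} Snapshot;

/* Record a recovery point. */
void snapshot_take(Snapshot *snap, const VmState *vm)
{
    snap->saved = *vm;
    snap->valid = true;
}

/* Backward recovery: discard the current state and resume from the recovery point. */
void snapshot_restore(const Snapshot *snap, VmState *vm)
{
    assert(snap->valid);
    *vm = snap->saved;
}

bool vm_equal(const VmState *a, const VmState *b)
{
    return a->pc == b->pc &&
           memcmp(a->memory, b->memory, sizeof a->memory) == 0;
}
```

Since the snapshot is plain data, restoring it on a different physical machine is the same operation, which is why snapshot-based recovery and migration appear together on the slide.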
Fault Tolerance • Lock-step execution • Primary VM has secondary VM running in shadow operation • If primary fails, the secondary acts as hot standby • http://www.vmware.com/files/pdf/perf-vsphere-fault_tolerance.pdf • See demo later…
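One common way to keep a shadow VM in step is to log every nondeterministic input of the primary and replay it on the secondary; the same log also supports record-and-playback debugging. The sketch below illustrates that idea on a toy deterministic machine. It is not a description of VMware's implementation, and `Vm`, `InputLog` and the step function are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAP 32

typedef struct { uint32_t state; } Vm;

/* Log of nondeterministic inputs (e.g. network packets, timer ticks). */
typedef struct {
    uint32_t inputs[LOG_CAP];
    size_t   len;       /* entries recorded by the primary */
    size_t   replayed;  /* entries already consumed by the secondary */
} InputLog;

/* Deterministic execution: same start state + same inputs => same end state. */
void vm_step(Vm *vm, uint32_t input)
{
    vm->state = vm->state * 31u + input;
}

/* Primary: record the input before acting on it. */
void primary_step(Vm *primary, InputLog *log, uint32_t input)
{
    assert(log->len < LOG_CAP);
    log->inputs[log->len++] = input;
    vm_step(primary, input);
}

/* Secondary: replay everything not yet seen; returns the number of inputs replayed. */
size_t secondary_catch_up(Vm *secondary, InputLog *log)
{
    size_t n = 0;
    while (log->replayed < log->len) {
        vm_step(secondary, log->inputs[log->replayed++]);
        n++;
    }
    return n;
}
```

If the primary fails, the secondary first replays the remaining log entries and then takes over with exactly the state the primary had reached, which is what makes it a hot standby.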
Summary • Virtualization • Isolate Apps+OS from hardware • CPU-near techniques to do this fast • Trap-and-emulate & Binary Translation • Dependability relation • Fault Detection • Configurations; Tricky setups; record-playback • Fault Tolerance • Support hot standby