XEN AND THE ART OF VIRTUALIZATION P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfield, University of Cambridge Computer Laboratory, SOSP 2003
Paper highlights • A very efficient virtual machine hypervisor • Main objectives were • Low overhead • Scalability • Key ideas • Paravirtualization: faster but requires changes to the guest OS • Use of x86 protection rings
Virtual machines • Let different operating systems run at the same time on a single computer • Windows, Linux and Mac OS • A real-time OS and a conventional OS • A production OS and a new OS being tested
How it is done • A hypervisor (also called a VM monitor) defines two or more virtual machines • Each virtual machine has • Its own virtual CPU • Its own virtual physical memory • Its own virtual disk(s)
The virtualization process: figure showing the hypervisor mapping the actual hardware (CPU, memory, disk) onto two sets of virtual hardware (virtual hardware #1 and #2), each with its own virtual CPU, memory, and disk(s)
Reminder • In a conventional OS, • Kernel executes in privileged/supervisor mode • Can do virtually everything • User processes execute in user mode • Cannot modify their page tables • Cannot execute privileged instructions
A conventional architecture: figure showing user processes running in user mode and the kernel running in privileged mode, with system calls crossing the boundary
Two virtual machines: figure showing each VM kernel and its user processes running in user mode, with only the hypervisor running in privileged mode
Explanations (II) • Whenever the kernel of a VM issues a privileged instruction, an interrupt occurs • The hypervisor takes control and does the physical equivalent of what the VM attempted to do: • Must convert virtual RAM addresses into physical RAM addresses • Must convert virtual disk block addresses into physical block addresses
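A minimal C sketch of this trap-and-emulate flow, assuming a hypothetical hypervisor dispatch routine; every name below (hypervisor_trap_handler, translate_frame, translate_block, and so on) is illustrative and does not come from Xen's source:

struct vm;                                   /* per-VM state kept by the hypervisor */

enum trap_kind { TRAP_PAGE_TABLE_WRITE, TRAP_DISK_ACCESS, TRAP_OTHER_PRIV };

struct trap_info { enum trap_kind kind; unsigned long arg0, arg1; };

/* Hypothetical helpers standing in for the real translation machinery. */
extern unsigned long translate_frame(struct vm *vm, unsigned long guest_frame);
extern unsigned long translate_block(struct vm *vm, unsigned long virtual_block);
extern void apply_real_mapping(struct vm *vm, unsigned long pte_addr, unsigned long machine_frame);
extern void issue_real_disk_io(struct vm *vm, unsigned long real_block, unsigned long buffer);
extern void emulate_instruction(struct vm *vm, struct trap_info *t);

void hypervisor_trap_handler(struct vm *vm, struct trap_info *t)
{
    switch (t->kind) {
    case TRAP_PAGE_TABLE_WRITE:
        /* Convert the guest's notion of a RAM frame into a real machine frame. */
        apply_real_mapping(vm, t->arg0, translate_frame(vm, t->arg1));
        break;
    case TRAP_DISK_ACCESS:
        /* Convert a virtual-disk block number into a block on the actual disk. */
        issue_real_disk_io(vm, translate_block(vm, t->arg0), t->arg1);
        break;
    default:
        emulate_instruction(vm, t);          /* other privileged instructions */
        break;
    }
}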
Translating a block address: figure showing the VM kernel asking to access block (x, y) of its virtual disk and the hypervisor translating the request into an access to block (v, w) of the actual disk
Handling I/Os • Difficult task because • Wide variety of devices • Some devices may be shared among several VMs • Printers • Shared disk partition • Want to let Linux and Windows access the same files
Virtual Memory Issues • Each VM kernel manages its own memory • Its page tables map program virtual addresses into what it believes to be physical addresses
The dilemma: figure showing the VM kernel believing that page 735 of process A is stored in page frame 435, while the hypervisor knows it actually resides in page frame 993 of the real RAM
The solution (I) • Address translation must remain fast! • Hypervisor lets each VM kernel manage its own page tables but does not use them • They contain bogus mappings! • It maintains instead its own shadow page tables with the correct mappings • Used to handle TLB misses
Why it works • Most memory accesses go through the TLB • The system can tolerate slower page table updates
The solution (II) • To keep its shadow page tables up to date, the hypervisor must track any changes made by the VM kernels • Mark page tables read-only • Each attempt to update them by a VM kernel results in an interrupt
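A hedged sketch of the shadow page table update path just described: the guest's write traps, the hypervisor emulates it, and a corrected mapping (guest frame translated to a machine frame) is installed in the shadow entry the hardware actually walks. Names such as on_guest_pte_write and guest_pfn_to_machine_fn are illustrative, not Xen's:

typedef unsigned long pte_t;

#define PTE_PRESENT 0x1UL

/* Hypothetical per-VM translation from guest "physical" frames to machine frames. */
extern unsigned long guest_pfn_to_machine_fn(unsigned long gpfn);

void on_guest_pte_write(pte_t *guest_pte, pte_t new_val, pte_t *shadow_pte)
{
    *guest_pte = new_val;                        /* emulate the trapped guest write  */
    if (new_val & PTE_PRESENT) {
        unsigned long gpfn = new_val >> 12;      /* frame the guest thinks it maps   */
        unsigned long mfn  = guest_pfn_to_machine_fn(gpfn);
        *shadow_pte = (mfn << 12) | (new_val & 0xFFFUL);   /* mapping the MMU sees   */
    } else {
        *shadow_pte = 0;                         /* not present in the shadow table  */
    }
}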
Nastiest Issue • The whole VM approach assumes that a kernel executing in user mode will behave exactly like a kernel executing in privileged mode, except that privileged instructions will be trapped • Not true for all architectures! • Intel x86 pop flags (POPF) instruction silently ignores attempts to change the interrupt flag in user mode instead of trapping • …
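A small user-mode program (GCC inline assembly, x86) that illustrates the quirk: POPF executed in ring 3 silently drops the attempt to clear the interrupt flag instead of trapping, so a hypervisor never learns the guest kernel tried to disable interrupts. This demonstrates the hardware behavior and is not taken from the paper:

#include <stdio.h>

int main(void)
{
    unsigned long flags;

    asm volatile("pushf; pop %0" : "=r"(flags));         /* read EFLAGS/RFLAGS      */
    flags &= ~(1UL << 9);                                 /* try to clear IF (bit 9) */
    asm volatile("push %0; popf" : : "r"(flags) : "cc");  /* no trap occurs          */
    asm volatile("pushf; pop %0" : "=r"(flags));
    printf("IF is %s\n", (flags & (1UL << 9)) ? "still set" : "clear");
    return 0;
}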
The VMWare Solution • Mask the issue through clever software • Dynamic "binary translation" when direct execution of code would not work
The Xen Solution • Presenting a virtual machine abstraction that is “similar but not identical to the underlying hardware” • Paravirtualization • Big advantage is faster performance • Big limitation is need to modify guest operating system
Impact on Guest OS • Had to modify • 2,995 lines of Linux code • 1.36 % of total x86 code base • 4,620 lines of Windows XP code • 0.04 % of total x86 code base
Memory management • The virtual machine exported by the hypervisor is not identical to a physical machine • The physical memory allocated to each virtual machine may consist of non-contiguous page frames
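One way to picture this, as a hedged sketch: the guest's contiguous "physical" frame numbers are translated through a per-VM table into whatever machine frames happen to back them. The names and sizes below are assumptions for illustration:

#define GUEST_PAGES 4096                  /* e.g. 16 MB of 4 KB pages, for illustration */

/* Filled in when the VM is built; entries may point anywhere in machine memory. */
static unsigned long phys_to_machine[GUEST_PAGES];

unsigned long guest_pfn_to_mfn(unsigned long gpfn)
{
    return phys_to_machine[gpfn];         /* guest-contiguous, machine-scattered */
}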
Xen Tenets • Support for unmodified application binaries is essential • Supporting full multi-application guest OSes is important • Raises guest OS protection issues • Paravirtualization is necessary to achieve high performance • Bad idea to hide the effect of virtualization from guest OSes
Xen Memory Management • Complicated because the x86 TLB • Is hardware-managed • Has no tags identifying process address spaces • Need to flush the TLB at each context switch
Clever Trick • The top 64 MB region of each address space is reserved for Xen • Can execute Xen code without switching page tables and flushing the TLB
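A sketch of what the reservation looks like, assuming a 32-bit address space; the constant below simply reflects the 64 MB figure above (4 GB minus 64 MB) and is meant as an illustration of the layout rather than a definitive Xen header value:

/* Top 64 MB of the 32-bit address space is off-limits to the guest. */
#define HYPERVISOR_VIRT_START 0xFC000000UL      /* 4 GB - 64 MB */

int addr_reserved_for_xen(unsigned long vaddr)
{
    return vaddr >= HYPERVISOR_VIRT_START;      /* guests may not map this range */
}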
Guest OS protection issues • Must prevent user applications from altering the guest OS • No good solution if the guest OS kernel runs in user mode • Xen takes advantage of the x86 ring architecture
x86 Protection Rings • Concept pioneered by MULTICS • Multiple levels of protection • Level 0 can do everything • Level 1 can interfere with levels 2 and 3 but cannot interfere with level 0 • Level 2 can interfere with level 3 but cannot interfere with levels 0 and 1 • Level 3 has no special privileges
With conventional OSes: figure showing the kernel in ring 0 and user processes in ring 3; rings 1 and 2 are not used
With Xen: figure showing Xen in ring 0, guest OSes in ring 1, and user processes in ring 3
Control transfer (I) • Hypercalls: • Synchronous calls from a domain to the Xen hypervisor • Implemented through a software trap mechanism • Same as conventional system calls
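A hedged sketch of what such a hypercall stub looks like from the guest side, modeled on a system call stub. Early 32-bit Xen used interrupt vector 0x82 with the hypercall number in EAX and arguments in EBX, ECX, and so on; treat the exact convention here as illustrative:

/* Guest-side hypercall stub: trap into the hypervisor, just like a system call. */
static inline long hypercall2(int op, long arg1, long arg2)
{
    long ret;
    asm volatile("int $0x82"                 /* software trap into Xen */
                 : "=a"(ret)
                 : "a"(op), "b"(arg1), "c"(arg2)
                 : "memory");
    return ret;
}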
Control transfer (II) • From Xen to domains: • Asynchronous event mechanism • Akin to Unix signals • Small number of events
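A hedged sketch of the guest side of this event mechanism, written in the style of a signal dispatcher; the names (register_event_callback, event_upcall) and the fixed number of ports are assumptions for illustration, not the actual Xen interface:

#define MAX_EVENT_PORTS 64

typedef void (*event_handler_t)(unsigned int port);

static event_handler_t handlers[MAX_EVENT_PORTS];   /* guest-side dispatch table */

void register_event_callback(unsigned int port, event_handler_t fn)
{
    if (port < MAX_EVENT_PORTS)
        handlers[port] = fn;
}

/* Called from the (hypothetical) upcall entry point when Xen marks events pending. */
void event_upcall(unsigned long long pending)
{
    for (unsigned int port = 0; port < MAX_EVENT_PORTS; port++)
        if ((pending & (1ULL << port)) && handlers[port])
            handlers[port](port);
}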
Data transfer between rings • There is now an additional protection domain between guest OSes and I/O devices • Need a fast mechanism for handling data transfers
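The paper's answer is shared-memory descriptor rings with producer and consumer indices. The sketch below captures that idea in C; the field names and the omitted notification step (which would go through an event channel) are simplifying assumptions rather than the exact Xen ring layout:

#define RING_SIZE 64   /* power of two so indices wrap with a mask */

struct request { unsigned long id; unsigned long sector; void *buffer; };

struct io_ring {
    volatile unsigned int req_prod;      /* advanced by the guest */
    volatile unsigned int req_cons;      /* advanced by Xen       */
    struct request ring[RING_SIZE];
};

/* Guest side: queue a request if there is room; Xen consumes it asynchronously. */
static int queue_request(struct io_ring *r, const struct request *req)
{
    if (r->req_prod - r->req_cons == RING_SIZE)
        return -1;                                   /* ring full */
    r->ring[r->req_prod & (RING_SIZE - 1)] = *req;
    __sync_synchronize();                            /* publish the entry before the index */
    r->req_prod++;
    return 0;
}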
Subsystem virtualization • CPU: • Uses the borrowed virtual time scheduling algorithm (BVT) • Time and timers: • Guest domains have access to both virtual time and real time • Virtual address translation: • Xen is only involved in page table updates
Subsystem virtualization • Privileged instructions: • Validated and executed by Xen
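For the CPU item above, a hedged sketch of the core idea behind Borrowed Virtual Time: each domain accumulates virtual time while it runs, and the scheduler picks the runnable domain with the smallest effective virtual time, where a latency-sensitive domain may temporarily warp (borrow) virtual time to run sooner. This is a simplification of BVT, not Xen's scheduler code:

#include <stddef.h>

struct domain {
    unsigned long long avt;    /* actual virtual time accumulated while running */
    unsigned long long warp;   /* how far this domain may warp back             */
    int warping;               /* currently borrowing virtual time?             */
    int runnable;
};

/* Effective virtual time: warped domains appear "earlier" and run sooner. */
static unsigned long long evt(const struct domain *d)
{
    return d->warping ? d->avt - d->warp : d->avt;
}

/* Pick the runnable domain with the smallest effective virtual time. */
struct domain *pick_next(struct domain *doms, size_t n)
{
    struct domain *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (doms[i].runnable && (best == NULL || evt(&doms[i]) < evt(best)))
            best = &doms[i];
    return best;
}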
Performance Comparison: figure comparing the four systems on several benchmarks (higher values are better!)
Key • L is for native Linux (upper bound) • X is for XenoLinux (Xen + Linux) • V is for VMware Workstation 3.2 + Linux • U is for User-Mode Linux (a port of Linux that runs in user mode on top of Linux)
Conclusions • Xen is fast! • The similar performance of all four solutions on the SPEC 2000 benchmark (the one on the left) should not be surprising: • This benchmark is CPU-bound, makes infrequent I/Os, and interacts very little with the OS • OS performance is essentially irrelevant