Xen and the Art of Virtualization University of Cambridge Presenter: Ashish Gupta
Features • An open infrastructure for global distributed computing • Run multiple services on a single Xenoserver • Envisage running up to 100 per server • Secure and accountable execution • Strong isolation, logging and auditing • Flexible: low-level execution environment • Economical: execute on commodity hardware (x86)
Virtualization techniques • Single OS image (Ensim, VServers) • Group user processes into resource containers • Implement new schedulers in the OS to ensure isolation • Hard to retrofit isolation to conventional OSes • Full virtualization (VMware, Connectix, Bochs) • Run full OSes as unmodified guests • The VMM enforces resource isolation • But it’s hard to efficiently virtualize uncooperative architectures
Paravirtualization Goals • Low virtualization overhead • Performance isolation (also flexibility) • Support full-featured multi-user multi-application OSes
Para-virtualization – Principles ? • Para-virtualization vs. full-virtualization • Expose guest OS to “real resources” (time, MMU etc.) • Better support time sensitive tasks • Allows guest OS optimizations • Correctness issues • The Downside
Three broad aspects • Memory Management • CPU • Device I/O
Memory Management • The VMware approach: shadow page tables
Modifications • Paravirtualization obviates the need for shadow page tables • Guest OSes allocate and manage their own page tables • How?
Mechanism • Updates to page tables must be passed to Xen for validation • Updates may be queued and processed in batches • Validation rules (applied to each PTE): • 1. only map a page if owned by the requesting guest OS • 2. only map a page containing PTEs for read-only access • Xen tracks page ownership and current use
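The two validation rules and the batching of updates can be illustrated with a small sketch. This is not Xen's code: the names `Frame`, `Update`, `Hypervisor`, `enqueue` and `flush` are hypothetical, and the model assumes each machine frame records only its owner and whether it currently holds page-table entries.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    owner: str                   # guest that owns this machine frame
    is_page_table: bool = False  # True if the frame currently holds PTEs


@dataclass
class Update:
    frame_no: int   # machine frame the new PTE would map
    writable: bool  # whether the mapping asks for write access


class Hypervisor:
    """Toy model (hypothetical): tracks frame ownership and validates queued updates."""

    def __init__(self, frames):
        self.frames = frames  # frame number -> Frame
        self.queue = []       # pending (guest, Update) pairs

    def validate(self, guest, upd):
        frame = self.frames.get(upd.frame_no)
        # Rule 1: only map a page owned by the requesting guest.
        if frame is None or frame.owner != guest:
            return False
        # Rule 2: a page containing PTEs may only be mapped read-only.
        if frame.is_page_table and upd.writable:
            return False
        return True

    def enqueue(self, guest, upd):
        # The guest queues updates instead of trapping once per PTE.
        self.queue.append((guest, upd))

    def flush(self):
        # One entry into the hypervisor validates the whole batch.
        results = [self.validate(guest, upd) for guest, upd in self.queue]
        self.queue.clear()
        return results
```

In this sketch a guest would call `enqueue` several times and `flush` once, which is the point of batching: the cost of entering the hypervisor is paid per batch rather than per update.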
Memory Management • The Xen approach
CPU • Efficient because x86 has four privilege levels • Guest OS runs in ring 1, applications in ring 3 • Privileged instructions must be validated and executed by Xen • Exceptions • Guest OS registers handlers with Xen • Handlers stay largely unchanged under para-virtualization • “Fast handlers” for most exceptions: Xen isn’t involved • Page faults: CR2 must be read by Xen, so the fault has to enter Xen
VM ↔ VMM • Guest OS → Xen: Hypercalls • Like system calls • Xen → Guest OS: Events • Like UNIX signals
I/O Virtualization • Need to minimize cost of transferring bulk data via Xen • Copying costs time • Copying pollutes caches • Copying requires intermediate memory • Device classes • Net • Disk • Graphics
I/O Virtualization • Use rings of buffer descriptors • Descriptors are small: cheap to copy and validate • Descriptors refer to bulk data • No need to map or copy the data into Xen’s address space • Exception: checking network packet headers prior to TX • Use zero-copy DMA to transfer bulk data between hardware and guest OS • Net TX: DMA packet payload separately from validated packet header • Net RX: Page-flip receive buffers into guest address space
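A descriptor ring can be sketched as a fixed-size circular queue with a producer index and a consumer index. The sketch below is a simplification under stated assumptions: it models only the request direction (guest posts, Xen consumes), and the class name `DescriptorRing`, its fields, and the `(page, offset, length)` descriptor layout are illustrative rather than Xen's actual structures.

```python
class DescriptorRing:
    """Toy request ring: descriptors are small tuples; bulk data stays in guest pages."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0  # advanced by the guest when it posts a descriptor
        self.req_cons = 0  # advanced by the hypervisor when it consumes one

    def post(self, page, offset, length):
        # Refuse the request when every slot holds an unconsumed descriptor.
        if self.req_prod - self.req_cons == self.size:
            return False
        # Only the descriptor is written to the ring, never the payload itself.
        self.slots[self.req_prod % self.size] = (page, offset, length)
        self.req_prod += 1
        return True

    def consume(self):
        # Nothing to do when the consumer has caught up with the producer.
        if self.req_cons == self.req_prod:
            return None
        desc = self.slots[self.req_cons % self.size]
        self.req_cons += 1
        return desc
```

Because the indices only ever increase and are reduced modulo the ring size on access, a full ring and an empty ring are distinguishable without a separate counter.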
Effect of I/O and OS interaction • SPEC INT2000 score: CPU intensive, little I/O and OS interaction • SPEC WEB99: 180Mb/s TCP traffic, disk read-write on 2GB dataset
Performance Isolation • 4 domains • 2 running PostgreSQL and SPECWEB99 workloads • 2 running anti-social workloads • Disk bandwidth hog: huge number of small file creations • Fork bomb • The bad guys could not kill the good guys • On native Linux: the same workloads rendered the machine completely unusable!
Denali Isolation Kernel University of Washington
Motivation • Functionality pushed into the network: • Google, IMDB, Hotmail, Amazon, EBay, online banking, …lots! • Major players use dedicated hardware. • Lesser services find that cumbersome, expensive and limiting: • Hardware, rack space, bandwidth • Big deployment barrier for little services.
Virtual hosting • Third-party hardware, with small services multiplexed on machines. • Need the ability to run untrusted code. • Likewise for CDNs for dynamic content.
Goals: • strong security • resource control. • Don’t need: resource sharing. • Conventional OSs do not isolate enough • Spectrum of Ideas ! • #1: OSs with Perf isolation • #2: OSs and sandboxing • #3: Exo- / Micro- kernels • #4: Conventional VMs • #5: Isolation Kernel
Isolation Kernel • Focus here is on • Performance with Scaling and • Isolation/Security • Reconsider the exposed Virtual Architecture • Downside • (Linux port ?)
ISA • Biggest challenge for x86 virtualization: • Ambiguous instruction semantics • No support for ambiguous instructions • Two virtual Instructions • Idle-with-timeout • Terminate execution
Memory Architecture • Simple DOS-like architecture: No virtual MMU • Why ? • TLB Problems on x86 : Hardware mapped: Inflexible • Avoids TLB Flushes • Optional Virtual MMU ?
I/O and Interrupt Model • Simpler interfaces to NIC, disk, keyboard, console and timer • Avoid the “chatty” interfaces • Interrupt Model • Physical interrupts → virtual interrupts • Interrupt Dispatch Model • Delays and batches interrupts for non-running VMs • Timing-related interrupts? Real-time apps, games etc.?
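The dispatch model (deliver immediately to the running VM, otherwise queue and deliver as a batch on the next switch) might look like the following sketch. All names here (`InterruptDispatcher`, `raise_irq`, `switch_to`, the `delivered` log) are hypothetical and are not taken from Denali.

```python
from collections import defaultdict


class InterruptDispatcher:
    """Toy model: virtual interrupts for non-running VMs are held and batched."""

    def __init__(self):
        self.running = None
        self.pending = defaultdict(list)  # vm -> interrupts awaiting delivery
        self.delivered = []               # log of (vm, [interrupts]) deliveries

    def raise_irq(self, vm, irq):
        if vm == self.running:
            # The target is on the CPU: deliver straight away.
            self.delivered.append((vm, [irq]))
        else:
            # The target is not running: hold the interrupt, avoiding a context switch.
            self.pending[vm].append(irq)

    def switch_to(self, vm):
        # On a context switch, hand over everything queued as a single batch.
        self.running = vm
        batch = self.pending.pop(vm, [])
        if batch:
            self.delivered.append((vm, batch))
```

The sketch also makes the slide's open question concrete: an interrupt raised for a non-running VM waits until that VM is next scheduled, which is what makes timing-sensitive guests a concern.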
Implementation • Round robin scheduling • Idle-with-timeout compensated with a higher priority for next quantum. • Can use existing compilers (gcc) to generate code • VMs are paged in on demand. VMM always in core
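One way to picture round-robin scheduling with compensation for idle-with-timeout is a normal ready queue plus a higher-priority queue for VMs that gave up the CPU early. This is a guess at the policy's shape, not Denali's scheduler: the two-queue design and the names `Scheduler`, `pick`, `quantum_expired` and `woke_from_idle` are assumptions made for illustration.

```python
from collections import deque


class Scheduler:
    """Toy round-robin scheduler with a boost for VMs returning from idle."""

    def __init__(self, vms):
        self.ready = deque(vms)  # ordinary round-robin queue
        self.boosted = deque()   # VMs that idled and are owed a higher priority

    def pick(self):
        # Boosted VMs run before the ordinary queue. Raises IndexError if both are empty.
        queue = self.boosted if self.boosted else self.ready
        return queue.popleft()

    def quantum_expired(self, vm):
        # A VM that used its full quantum goes to the back of the line.
        self.ready.append(vm)

    def woke_from_idle(self, vm):
        # A VM whose idle-with-timeout ended gets priority for its next quantum.
        self.boosted.append(vm)
```

A VM that calls idle-with-timeout is simply not re-queued after `pick`; it re-enters through `woke_from_idle` when its timeout fires or an interrupt arrives.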
Memory • Virtualized 16MB of physical address space per VM (since no virtual MMU). • Recently they added a MIPS-style virtual MMU, so a guest OS can virtualize its apps’ address space. Overhead? • Pre-allocated, strided swap space. No sharing, so each VM’s space is contiguous.
Networked IO • Ethernet driver moved from guest OS to Denali. Rest of TCP/IP stack stays. • This suffices for early-demuxing received packets into the appropriate VM. • Virtual packet send/recv is 1 PIO each
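Early demultiplexing means the driver decides which VM a received frame belongs to before any protocol processing. The sketch below assumes the decision is made on the destination MAC address, which the slides do not state; the function `demux` and the `mac_to_vm` table are hypothetical.

```python
from collections import defaultdict


def demux(frames, mac_to_vm):
    """Sort raw Ethernet frames into per-VM receive queues by destination MAC."""
    queues = defaultdict(list)
    for frame in frames:
        dst = frame[:6]  # destination MAC is the first 6 bytes of an Ethernet frame
        vm = mac_to_vm.get(dst)
        if vm is not None:
            # The guest's own TCP/IP stack does the rest of the processing.
            queues[vm].append(frame)
        # Frames for unknown addresses are dropped in this sketch.
    return dict(queues)
```

Only the link-layer header is inspected here, which matches the slide's split: the Ethernet driver lives in the VMM while the rest of the stack stays in the guest.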
Guest OS • Guest OS: currently only a library, with no simulated protection boundary there. • Supports a POSIX subset. • Different from a traditional VM : OS more like a process: single user, single task OS ? • Flexibility ?
Evaluation • Network Latency
TCP, HTTP throughput • TCP: BSD-Linux 607Mb/s, Denali-Linux 569Mb/s
Fair comparison? • Denali with library kernel compared against BSD: both have one protection boundary • Denali-Linux will have one real and one simulated protection boundary: different ?
Batching • Reduction in context switching frequency
Scalability • In-core regime – constant performance • disk bound regime - problems
Scalability and block size • Internal fragmentation!
Evaluation summary • Good performance and scalability due to architectural modifications and various techniques • Is the lib OS representative of a real OS?