Operating System Support for Virtual Machines Samuel King, George Dunlap, Peter Chen
Content • Introduction • VM and VMM • Type II VMM • UMLinux and UML • Bottlenecks & Solutions • Performance • Conclusion • Evaluation
Virtual Machines • Developed in the 1960s • Multiple VMs on a single machine • Test applications • Program / debug OS • Simulate networks • Isolate applications • Monitor for intrusions • Inject faults • Resource sharing/hosting
Virtual Machine Monitors • Layer that emulates hardware for an Operating System • The simulated hardware is the Virtual Machine
Types of VMMs • Type I VMM • Efficient • Low overhead • Type II VMM • Simple VMM • VMM mediates between host OS & guest machine
Examples • Type I • IBM VM/370, VMware ESX Server, Xen • Type II • SimOS, User-Mode Linux, UMLinux • Hybrid (physical hardware + Host I/O) • VMware Workstation, VirtualPC
Advantages and Disadvantages • Advantages for designers • OS abstractions ~ VM • OS signals ~ VM interrupts • Virtual timer -> timer interrupt • Disable interrupts -> disable signals, using a flag to defer signals • Advantages for users • Watch and debug the VM execution from the host • Disadvantages • Performance: Type II can be 10x or more slower than Type I
Memory Exception Example • Existing OS abstractions and signals can be used in the VM • A guest application attempts to access data that it doesn't have access to • An invalid memory operation occurs and a SIGSEGV signal is raised • The SIGSEGV handler makes the data available • The faulting access is retried and the data is brought in, transparent to the user
OS Abstraction Code
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096 /* assumed page size */
#endif

/* Fault handler: re-enable access to the faulting page so the access can be retried. */
/* printf and exit are not async-signal-safe; they are used here only for illustration. */
void SIGSEGV_handler (int signo, siginfo_t *info, void *context)
{
    sigset_t mask;
    (void)signo;
    (void)context;
    printf ("ACCESSOR: segfault at address: %p\n", info->si_addr);
    sigemptyset(&mask);
    sigaddset(&mask, SIGSEGV);
    sigprocmask(SIG_UNBLOCK, &mask, 0);
    printf("FIXER: now fixing %p...\n", info->si_addr);
    /* Round the faulting address down to its page boundary. */
    char *p = (char*)((uintptr_t)info->si_addr & ~(uintptr_t)(PAGE_SIZE-1));
    int rc = mprotect(p, PAGE_SIZE, PROT_READ | PROT_WRITE);
    if (rc == -1) {
        perror("mprotect PROT_READ | PROT_WRITE");
        exit(EXIT_FAILURE);
    }
    printf("ACCESSOR: trying again...\n");
}

int main (void)
{
    struct sigaction sa;
    int rc;
    char *p = calloc(16384, 4);
    if (p == NULL) {
        perror("calloc");
        exit(EXIT_FAILURE);
    }
    /* Round up to the first page boundary inside the allocation. */
    int *buffer = (int*)(((uintptr_t)p + PAGE_SIZE - 1) & ~(uintptr_t)(PAGE_SIZE-1));
    /* Make the page inaccessible so that the first access faults. */
    rc = mprotect(buffer, PAGE_SIZE, PROT_NONE);
    if (rc == -1) {
        perror("mprotect PROT_NONE");
        exit(EXIT_FAILURE);
    }
    sa.sa_sigaction = SIGSEGV_handler;
    sigemptyset (&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    if (sigaction (SIGSEGV, &sa, NULL) == -1) {
        printf ("errno set to: %d\n", errno);
        printf ("Error registering SIGSEGV sigaction.\n");
        exit (EXIT_FAILURE);
    }
    printf("\nACCESSOR: trying to access %p\n", (void*)buffer);
    *buffer = 42; /* faults once; the handler fixes the page and the write is retried */
    printf("ACCESSOR: wrote %d\n", *buffer);
    if (*buffer == 42) /* comparison; the original slide used assignment (=) */
        printf("MAIN: read %d: success!\n", *buffer);
    return EXIT_SUCCESS;
}

OUTPUT
ACCESSOR: trying to access 0x89ab000
ACCESSOR: segfault at address: 0x89ab000
FIXER: now fixing 0x89ab000...
ACCESSOR: trying again...
ACCESSOR: wrote 42
MAIN: read 42: success!
Second Classification • VMM interface identical to hardware • IBM VM/370, VMware Server & Workstation • VMM with added OS modifications • Signal handlers: SimOS, UML, UMLinux • Virtualization drivers: Disco, VAX VMM • Microkernels & JVM
UMLinux vs User-Mode Linux (UML) • UMLinux • Single machine process for all guest apps • Guest apps communicate via the guest OS • Faster system calls, network transfers, web server • UML • Separate machine process for each app • Guest apps communicate via shared memory on the host • Faster context switches, kernel building
UMLinux • Performance - guest OS must simulate crossing the top red line. • System call to a library – vertical move • Switch applications – horizontal move
User-Mode Linux • Notice the separate VM instances and separation of guest applications
Goal • Make Type II VMs usable in production • Reduce the overhead of Type II to that of Type I • Done by extending the host OS • Performance within 2x of standalone
Three Switching Bottlenecks • A high number of context switches when moving from guest app to guest OS through the VMM • Ensuring address protection when switching between guest user and guest kernel space • Numerous memory-mapping operations when switching guest applications
1. Guest App. to Kernel Switching • VMM uses ptrace to catch system calls and signals from the guest-machine process. • Creates context switches between the VMM and guest-machine process for the Host OS.
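To give a feel for what catching system calls with ptrace involves, the following Linux-only sketch counts how often a traced child stops at a system call boundary. It is an illustration under assumptions, not the UMLinux VMM: the function name `count_syscall_stops` is hypothetical, and a real VMM would redirect each call to the guest kernel instead of counting it. Every stop costs context switches between the tracer and the traced process, which is the overhead this bottleneck refers to.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv under ptrace and return the number of syscall stops seen
 * (each system call normally produces an entry stop and an exit stop),
 * or -1 if tracing could not be set up.
 */
long count_syscall_stops(char *const argv[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Child: ask to be traced, then become the target program. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127);
    }

    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status))
        return -1; /* child exited early: tracing or exec failed */

    /* Report syscall stops as SIGTRAP|0x80 so they differ from real signals. */
    ptrace(PTRACE_SETOPTIONS, child, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    long stops = 0;
    int sig = 0;
    for (;;) {
        /* Resume the child until its next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)sig) == -1)
            return -1;
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        sig = 0;
        if (WIFSTOPPED(status)) {
            if (WSTOPSIG(status) == (SIGTRAP | 0x80))
                stops++;                 /* syscall boundary */
            else if (WSTOPSIG(status) != SIGTRAP)
                sig = WSTOPSIG(status);  /* forward ordinary signals */
        }
    }
    return stops;
}
```

Calling it with an argument vector such as `{"true", NULL}` shows that even a trivial program triggers a number of stops, each of which bounces through the host kernel to the tracer and back.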
1. Optimization • Move VMM functionality from a separate VMM process into a VMM loadable kernel module • Modify the host OS to give the VMM control over the guest-machine process's system calls and signals
2. Address Protection • Guest-machine process switches between guest user and guest kernel mode • Has to protect access to kernel addresses when switching to user mode • Has to enable access to kernel addresses when switching to kernel mode • This creates a large number of mmaps, reprogramming the page table to switch between R/W and inaccessible
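The protection toggling described above can be sketched with `mprotect` on an anonymous mapping that stands in for guest kernel memory. All names here (`guest_kernel_init`, `enter_guest_user_mode`, `enter_guest_kernel_mode`) are hypothetical, and the unoptimized VMM's exact calls may differ; the point is that every guest mode switch costs at least one host system call that rewrites page protections.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guest_kernel_mem;   /* stands in for the guest kernel's memory */
static size_t guest_kernel_len;
static int protection_changes;   /* host calls spent on mode switches */

/* Reserve `pages` pages of memory for the toy guest kernel. */
int guest_kernel_init(size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || pages == 0)
        return -1;
    guest_kernel_len = pages * (size_t)page;
    void *mem = mmap(NULL, guest_kernel_len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    guest_kernel_mem = mem;
    return 0;
}

/* Switching to guest user mode: guest kernel memory must become inaccessible. */
int enter_guest_user_mode(void)
{
    protection_changes++;
    return mprotect(guest_kernel_mem, guest_kernel_len, PROT_NONE);
}

/* Switching to guest kernel mode: guest kernel memory becomes read/write again. */
int enter_guest_kernel_mode(void)
{
    protection_changes++;
    return mprotect(guest_kernel_mem, guest_kernel_len, PROT_READ | PROT_WRITE);
}

char *guest_kernel_base(void)
{
    return guest_kernel_mem;
}

int guest_protection_changes(void)
{
    return protection_changes;
}
```

A single guest system call needs one switch into guest kernel mode and one back out, so the counter grows by two per call.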
2. Protection using the Current Privilege Level • Ring 0 – used for Host Kernel • Ring 1 – … VM • Ring 2 – … • Ring 3 – user level • Supervisor-only bit in the page table prevents code running in CPU privilege ring 3 from accessing the host operating system’s data.
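As a rough software model of the supervisor-only check, the function below mimics how a 32-bit x86 page table entry gates access by privilege ring. On x86, bit 0 of the entry is the present bit and bit 2 is the user/supervisor bit; the function name `pte_access_allowed` and the macro names are made up for this sketch, and read/write permissions are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT 0x1u /* bit 0: page is mapped */
#define PTE_USER    0x4u /* bit 2: set = ring 3 may access, clear = supervisor only */

/*
 * Model of the hardware check: rings 0-2 count as supervisor and may touch
 * any present page; ring 3 may touch only pages with the user bit set.
 */
bool pte_access_allowed(uint32_t pte, int cpl)
{
    if (!(pte & PTE_PRESENT))
        return false; /* not mapped: page fault regardless of ring */
    if (cpl == 3 && !(pte & PTE_USER))
        return false; /* supervisor-only page touched from user level */
    return true;
}
```

Because ring 1 counts as supervisor for this check, code running in ring 1 can reach supervisor-only pages that ring 3 code cannot.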
Standalone Address Protection • Linux incurs little overhead when trapping to the kernel • Segments allow access to all addresses (1-to-1 mapping, logical to linear address) • Supervisor bit in each page table entry restricts ring-3 processes from accessing kernel code and data
2. Optimization: Segmentation Bounds for Address Protection (Solution 1) • Bound guest user mode to the segment ending at 0x70000000 • Allow guest kernel access to the user range • [Figure: address-space layouts for Linux and Solution 1]
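The effect of the segment bound can be modeled in a few lines. This is a software model only, not code that programs segment registers; the names are hypothetical, and the guest kernel limit of 0xc0000000 is an assumption based on the 3 GB user range that appears elsewhere in these slides.

```c
#include <stdbool.h>
#include <stdint.h>

enum guest_mode { GUEST_USER, GUEST_KERNEL };

#define GUEST_USER_LIMIT   0x70000000u /* bound from the slide */
#define GUEST_KERNEL_LIMIT 0xc0000000u /* assumed: host kernel lives above this */

static uint32_t segment_limit = GUEST_KERNEL_LIMIT;

/* A mode switch only changes the segment bound; no page is remapped. */
void set_guest_mode(enum guest_mode mode)
{
    segment_limit = (mode == GUEST_USER) ? GUEST_USER_LIMIT : GUEST_KERNEL_LIMIT;
}

/* Model of the hardware bounds check applied to every memory access. */
bool segment_allows(uint32_t addr)
{
    return addr < segment_limit;
}
```

Changing one bound replaces the per-page protection changes of the unoptimized design, which is where the speedup comes from.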
2. Alternate Optimization (Solution 2) • Allow guest OS to occupy range from 0x00000000 to 0xc0000000 • Separate guest kernel and user modes by using the page table's supervisor-only bit • Stops guest kernel pages from being accessed by code running in ring 3 • Runs the guest kernel in ring 1
2. Optimization Comparison • [Figure: address-space layouts for Linux, Solution 1, and Solution 2] • Guest kernel can now occupy arbitrary regions instead of only a contiguous block
UML tt/skas3/skas0 Modes • [Figure: guest process layout and guest process address space in tt, skas3, and skas0 modes; regions labeled UML kernel code and data, UML host tracing thread, and process code and data]
3. Guest Application Switching • Switching guest process address space requires swapping the current mapping between virtual pages and the VM’s physical memory file. • munmap called for previous process’s virtual address space • mmap called for each virtual page in the next process, as needed on page-faults
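A toy version of this remapping is sketched below, assuming a temporary file as the VM's physical memory file. The names `vm_init`, `switch_guest_proc`, and `guest_window` are hypothetical. Two simplifications are made: `mmap` with `MAP_FIXED` replaces the previous mapping in place, which stands in for the explicit `munmap`, and pages are mapped eagerly instead of on page faults.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define GUEST_PAGES 2 /* virtual pages per guest address space (toy value) */

static int phys_fd = -1; /* file backing the guest's "physical" memory */
static char *window;     /* virtual range shared by all guest processes */
static long page_size;

/* Create the physical memory file and reserve the virtual window. */
int vm_init(int guest_procs)
{
    page_size = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile(); /* kept open for the lifetime of the VM */
    if (f == NULL || page_size <= 0 || guest_procs <= 0)
        return -1;
    phys_fd = fileno(f);
    if (ftruncate(phys_fd, (off_t)guest_procs * GUEST_PAGES * page_size) == -1)
        return -1;
    void *mem = mmap(NULL, (size_t)(GUEST_PAGES * page_size), PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    window = mem;
    return 0;
}

/* Point every virtual page of the window at the next process's physical pages. */
int switch_guest_proc(int proc)
{
    for (int i = 0; i < GUEST_PAGES; i++) {
        off_t phys = ((off_t)proc * GUEST_PAGES + i) * page_size;
        void *m = mmap(window + (size_t)i * (size_t)page_size, (size_t)page_size,
                       PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
                       phys_fd, phys);
        if (m == MAP_FAILED)
            return -1;
    }
    return 0;
}

char *guest_window(void)
{
    return window;
}
```

The cost grows with the number of pages the next process touches, since each one needs its own host `mmap` call; that per-page cost is what the third optimization removes.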
Costs of Switching ( --- and | ) • Context switching will cause a change in the available memory addresses and in the current privilege level. • [Figure: Process 1 and Process 2 in user mode above the Kernel in kernel mode]
Context Switching (CS 3204 Fall 2007) • intr_entry: saves entire CPU state, switches to kernel stack • switch_threads (in): saves caller's state • switch_threads (out): restores caller's state, kernel stack switch • intr_exit: restores entire CPU state, switches back to user stack, iret • [Figure: Process 1 and Process 2 in user mode above the Kernel in kernel mode]
Costs of Switching ( --- and | ) • Horizontal switching (between applications) • Expensive! • Invalidate the first process’ mapping (unmap) • Validate the second process’ mapping (map) • Vertical switching (to and from OS) • Saves the CPU state of the application • Make the kernel’s address spaces available
Process 1 Active in User Mode (P1) • [Figure: address map from 0 to FFFFFFFF; 3 GB user range below C0000000 holding ucode (1), udata (1), ustack (1); 1 GB kernel range from C0000000 holding kcode, kdata, kbss, kheap, free, with C0400000 marked; physical pages labeled user (1), user (2), kernel, used] • Access possible in user mode: the user range
Process 1 Active in Kernel Mode (P1) • [Figure: same address map, with ucode (1), udata (1), ustack (1) in the user range] • Access possible in user mode: the user range • Access requires kernel mode: the kernel range
Process 2 Active in Kernel Mode (P2) • [Figure: same address map, with ucode (2), udata (2), ustack (2) in the user range] • Access possible in user mode: the user range • Access requires kernel mode: the kernel range
Process 2 Active in User Mode (P2) • [Figure: same address map, with ucode (2), udata (2), ustack (2) in the user range] • Access possible in user mode: the user range
3. Guest App. Switching Solution • Allow a process to have 1024 different address spaces • Each address space is defined by a set of page tables • Host OS is modified to switch between address space definitions using switchguest • switchguest only has to change the pointer to the current first-level page table
switchguest Example • [Figure: guest OS with guest proc a and guest proc b, each with its own page table pointer; the switchguest syscall enters the host operating system] • switchguest has to change the hardware's page table pointer to the next guest process's page table inside the host OS
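Why switchguest is cheap can be seen in a small user-space model. The real switchguest is a host system call, so the struct layout and function signature below are assumptions made purely for illustration; the `current` field stands in for the hardware page table pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ADDRESS_SPACES 1024 /* limit given in the slides */

/* Toy first-level page table. */
struct page_table {
    uint32_t entries[1024];
};

/* One guest-machine process owning many address-space definitions. */
struct guest_machine {
    struct page_table *spaces[MAX_ADDRESS_SPACES];
    struct page_table *current; /* stands in for the hardware page table pointer */
};

/*
 * Model of switchguest: select another address-space definition.
 * No page is unmapped or remapped; only one pointer changes.
 */
int switchguest(struct guest_machine *vm, size_t index)
{
    if (index >= MAX_ADDRESS_SPACES || vm->spaces[index] == NULL)
        return -1;
    vm->current = vm->spaces[index];
    return 0;
}
```

Compared with remapping every page of the next guest process, the switch becomes a constant-time pointer update.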
Performance Testing • Do the three solutions bring the performance of Type II VMM within 2x that of standalone systems? • Test benchmarks: • Null system calls • Switching between guest applications • Transferring data • CPU intensive program • Kernel building • Web-server performance
Testing Setups • Standalone (Host OS) • VMware Workstation (hybrid; serves as the Type I comparison) • UMLinux • With optimization 1 (kernel module) • With optimizations 1 & 2 (bounded segment) • With optimizations 1, 2, & 3 (address spaces)
Null System Call • Guest app has to switch to the guest kernel and then back • First optimization: fewer calls needed to switch to the kernel • Second optimization: switching address protections is faster
Switching Apps (Context Switch) • First optimization: fewer calls needed to switch to the kernel • Second optimization: switching address protections is faster • Third optimization: additional address spaces make switching apps faster
Network Transfer • Appears to hit a limit in transferring data across an Ethernet switch using TCP
CPU-Intensive Program (POV-Ray) • Mainly compute-bound • Little interaction with the guest kernel • Little virtualization overhead
Kernel Build • Numerous guest kernel calls • Each call is trapped by the VMM and signaled to the guest kernel • Second optimization: no need to re-map and protect when switching to the kernel
Web Server (SPECweb99) • Numerous guest kernel calls • Few application switches
Results • On five benchmarks, the optimizations brought performance within 2x of standalone. • One benchmark (null system call) did not reach that goal.
Conclusion from Paper • A Type II VMM (UMLinux) can be optimized to perform similarly to Type I (VMware) • A Type II VMM can perform within 2x of standalone systems in production
Recent Work • UMLinux was renamed FAUmachine • Development on FAUmachine continued through 2004 in Germany at the Univ. Erlangen-Nurnberg • Virtually all research on UMLinux/FAUmachine was conducted by the CoVirt & ReVirt Project at Univ. Michigan (usage of VMs for security services) • CoVirt project now uses various VMs • UMLinux has fallen behind UML in performance and popularity
Evaluation • UMLinux with optimizations, or UML, could be very useful in various commercial and educational situations. • Because UMLinux is slower than standalone, Type I, and other Type II VMMs, it will not become a leading development or run-time platform in practice. • Type II VMMs may overtake Type I VMMs, because they offer similar performance and are easier to design.