This article explains the concept of OS virtualization, its benefits, challenges, and various implementation methods including binary translation, hardware support, and paravirtualization. It also covers memory and I/O virtualization techniques used in virtualized systems.
Outline • Background • What is Virtualization? • Why would we want it? • Why is it hard? • How do we do it? • Choices
What is Virtualization? OS virtualization Create a platform that emulates a hardware platform and allows multiple instances of an OS to use that platform, as though they have full and exclusive access to the underlying hardware
What is Virtualization? [Figure: four guest stacks, each with applications running on its own OS (OS 1 to OS 4), on top of a shared virtualization platform, which runs on the hardware.]
The Problem The OS uses kernel mode / user mode to protect itself. System calls, and privileged instructions executed in user mode, generate a trap (software interrupt) that forces a switch to kernel mode. Sensitive assembly instructions (I/O, MMU control, etc.) must only be executed by the kernel.
The Problem If our VM now runs in user space, we cannot run sensitive instructions in it, since those must trap to kernel space. We would like such instructions to force a trap into the hypervisor. The hypervisor is then responsible for assisting with sensitive instructions.
The Problem • Hardware protection rings • Supervisor mode • Can run any instruction (ring 0). • Trusted not to fail; if it does fail, the system crashes. • User programs use ring 3 • Hypervisor runs in ring 0, guest OS does not
The Problem On x86, some instructions are sensitive but not privileged. Example: POPF, which pops data from the stack into the EFLAGS register. It can be executed from all protection rings but behaves differently outside ring 0: the interrupt flag is part of EFLAGS, yet it only changes in ring 0. POPF is not privileged (it does not trap).
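The POPF behaviour can be illustrated with a toy model. This is a minimal sketch, assuming a simplified protected-mode rule (the interrupt flag is writable only when the current privilege level is at most IOPL, with IOPL assumed to be 0) and modelling only two flags; the function name `popf` and its parameters are hypothetical, chosen for illustration.

```python
IF_BIT = 1 << 9  # interrupt-enable flag (bit 9 of EFLAGS)
CF_BIT = 1 << 0  # carry flag, used here as an example of an ordinary flag


def popf(eflags: int, popped: int, cpl: int, iopl: int = 0) -> int:
    """Return the new EFLAGS after POPF in a simplified model.

    Assumption: only CF and IF are modelled. IF is updated only when
    cpl <= iopl; otherwise it is silently left unchanged and no trap occurs.
    """
    writable = CF_BIT
    if cpl <= iopl:
        writable |= IF_BIT
    # Keep the non-writable bits of the old value, take writable bits from the stack.
    return (eflags & ~writable) | (popped & writable)


# Ring 0 clears IF as requested; ring 3 silently keeps it set.
ring0 = popf(IF_BIT, 0, cpl=0)
ring3 = popf(IF_BIT, 0, cpl=3)
```

In this model a guest kernel demoted to ring 3 would believe it had disabled interrupts, while the flag never changed and the hypervisor was never notified.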
Solution – binary translation Replace problematic instructions dynamically. Read in code, looking for basic blocks, then inspect each basic block to find problematic instructions. If any are found, replace them with VM calls (a process called binary translation). Then cache the block and execute it. Eventually, most basic blocks will be translated and cached, and will run at near-native speed. This can force traps on sensitive non-privileged instructions.
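The scan, replace, and cache loop can be sketched as follows. This is a toy illustration rather than how any real hypervisor is implemented: instructions are plain strings, the `SENSITIVE` and `BRANCHES` sets are an illustrative subset, and `Translator`, `read_basic_block`, and the `VMCALL(...)` notation are hypothetical names.

```python
SENSITIVE = {"POPF", "CLI", "STI"}   # illustrative subset of sensitive instructions
BRANCHES = {"JMP", "RET", "CALL"}    # instructions that end a basic block


def read_basic_block(code, start):
    """Collect instructions from `start` up to and including the next branch."""
    block = []
    for instr in code[start:]:
        block.append(instr)
        if instr in BRANCHES:
            break
    return block


def translate_block(block):
    """Replace each sensitive instruction with a call into the hypervisor."""
    return [f"VMCALL({i})" if i in SENSITIVE else i for i in block]


class Translator:
    def __init__(self, code):
        self.code = code
        self.cache = {}          # block start index -> translated block
        self.translations = 0    # how many blocks were actually translated

    def fetch(self, start):
        """Return the translated block at `start`, translating on first use only."""
        if start not in self.cache:
            block = read_basic_block(self.code, start)
            self.cache[start] = translate_block(block)
            self.translations += 1
        return self.cache[start]


guest_code = ["MOV", "POPF", "ADD", "JMP", "CLI", "RET"]
translator = Translator(guest_code)
first = translator.fetch(0)
again = translator.fetch(0)   # served from the cache, no second translation
```

The cache is what lets repeated execution of the same block approach native speed: the inspection cost is paid once per block.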
Solution – VM hardware Systems with Intel VT-x or AMD SVM (since 2005). New assembly instructions to enter VM mode. The hypervisor runs in ring 0 under root mode; the guest OS runs in ring 0 under non-root mode. Changes are done within a VM-specific state called the VMCS (Virtual Machine Control Structure). Even with VM hardware support, binary translation can still be used to improve performance.
Implementation Type 1 Hypervisor Type 2 Hypervisor Paravirtualization
Type 1 Hypervisor Runs on “bare metal”. The hypervisor is the machine’s kernel. Made for servers; includes an interface for remote / admin access. Examples: Xen, VMware vSphere, etc.
Type 2 Hypervisor Runs from within an OS. Supports guest OSs above it. VM software must include a kernel module. Examples: Oracle VirtualBox, VMware Player, etc.
Paravirtualization Modify Guest OS so that all calls to non-privileged sensitive instructions are changed to hypervisor calls. Much easier (and more efficient) to modify source code than to emulate hardware instructions (as in binary translation).
Problems with Paravirtualization Paravirtualized systems won’t run on native hardware. There are many different paravirtualization systems (VMware, Xen, etc.) that use different commands. Proposed solution: modify the OS kernel so that it calls a special set of procedures to execute sensitive instructions (Virtual Machine Interface). On bare metal, link to a library that implements the code; on a VM, link to a VM-specific library.
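The idea of linking the same kernel against different libraries can be sketched with two interchangeable backends. This is only an illustration of the design, not the actual Virtual Machine Interface: the class names, the single `disable_interrupts` procedure, and the recorded hypercall list are all hypothetical.

```python
class NativeVMI:
    """Bare-metal library: performs the sensitive operation directly."""

    def __init__(self):
        self.interrupts_enabled = True

    def disable_interrupts(self):
        self.interrupts_enabled = False   # stands in for executing CLI
        return "native"


class HypervisorVMI:
    """VM-specific library: asks the hypervisor to do the work instead."""

    def __init__(self):
        self.hypercalls = []

    def disable_interrupts(self):
        self.hypercalls.append("disable_interrupts")  # stands in for a hypercall
        return "hypercall"


def kernel_critical_section(vmi):
    """Kernel code is written once against the interface, not the backend."""
    return vmi.disable_interrupts()


native = NativeVMI()
virtual = HypervisorVMI()
on_metal = kernel_critical_section(native)
on_vm = kernel_critical_section(virtual)
```

The kernel function never names a sensitive instruction itself, which is what allows the same kernel to run on native hardware and on different hypervisors.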
Memory Virtualization OS tracks mapping of virtual memory pages to physical memory page frames. Builds page tables, then updates paging register (trap). Allow hypervisor to manage page mapping, and use shadow page tables for the VMs
Shadow Page Table • Guest page tables map: Guest VA → Guest PA • Shadow tables map: Guest VA → Host PA
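Nested/extended page tables • Requires hardware support • Two “CR3”s (CR3 and EPTP) • MMU translates each guest mapping level [Figure: the guest OS page directory and page table are reached through CR3; the host page table, managed by the hypervisor (VMM software), is reached through EPTP; the hardware (CPU, TLB) walks both.]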
Nested page tables • Guest page table maps: Guest VA → Guest PA • Nested page table maps: Guest PA → Host PA
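The two mappings compose into a single guest-VA-to-host-PA translation. The sketch below assumes single-level tables represented as dictionaries from page number to frame number and a 4 KiB page size; real hardware walks multi-level tables and also translates each guest table level through the nested table, which this toy model leaves out. The function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def translate(table, addr):
    """Translate an address through one single-level table (page -> frame)."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise KeyError(f"page {page} is not mapped")
    return table[page] * PAGE_SIZE + offset


def nested_translate(guest_pt, nested_pt, guest_va):
    """Guest VA -> Guest PA (guest table), then Guest PA -> Host PA (nested table)."""
    guest_pa = translate(guest_pt, guest_va)
    return translate(nested_pt, guest_pa)


guest_pt = {0: 5}     # guest virtual page 0 -> guest physical page 5
nested_pt = {5: 42}   # guest physical page 5 -> host physical page 42
host_pa = nested_translate(guest_pt, nested_pt, 0x123)
```

The page offset passes through both steps unchanged; only the page number is remapped at each stage.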
I/O Virtualization Each guest OS holds its own “partition”, typically implemented as a file or region on disk. The hypervisor must convert a guest OS address (block #) into a physical address in that region. It may convert between storage types. It must deal with DMA (direct memory access) requests.
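The block-number conversion amounts to a bounds check and an offset calculation. This is a minimal sketch, assuming a 512-byte block size and a contiguous region on the host disk; the function name and its parameters are hypothetical.

```python
BLOCK_SIZE = 512  # assumed guest block size in bytes


def guest_block_to_host_offset(block_no, region_start, region_blocks):
    """Map a guest block number to a byte offset inside the guest's host region.

    region_start  : byte offset where the guest's region begins on the host
    region_blocks : number of blocks the guest "partition" contains
    """
    if not 0 <= block_no < region_blocks:
        # The guest must never reach storage outside its own region.
        raise ValueError(f"block {block_no} is outside the guest partition")
    return region_start + block_no * BLOCK_SIZE


offset = guest_block_to_host_offset(3, region_start=1_000_000, region_blocks=100)
```

The bounds check is the isolation guarantee: a guest that asks for a block past the end of its partition gets an error rather than another guest's data.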
Question (Moed B 2017) A virtualized system has a hypervisor that supports shadow page tables. • Briefly describe the process of looking up a virtual address in the system. • What is the advantage of such a system over a system that uses brute force?
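Question (Moed B 2017) [Figure: the guest OS page directory and page table are reached through G-CR3; the hypervisor (VMM software) maintains the shadow page table, which the hardware CR3 points to and the CPU and TLB use; on a miss, an interrupt is raised and the VMM corrects the page table.]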
Question (Moed B 2017) • In such a system, a register in the CPU points to the shadow page table. Given a guest virtual address, there are 2 options: • (1) The shadow tables map the address, and the process is identical to that of a regular system. • (2) The shadow tables do not map the address: a page-fault interrupt is raised and transfers control to code in the hypervisor, which checks whether the address is mapped in the guest's tables (using a variable that holds the address of the guest's top-level table). If the address is not mapped, we return a page fault to the guest; otherwise, we map the page in the shadow tables so that it maps directly to system memory, and return to the guest system (we are now back in case (1)).
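The two cases of the shadow page table lookup can be sketched as a small simulation. This is a toy model under stated assumptions: tables are dictionaries keyed by page number, and the names `ShadowMMU`, `GuestPageFault`, and `hypervisor_traps` are hypothetical, introduced only to count how often the hypervisor is entered.

```python
class GuestPageFault(Exception):
    """Raised when the fault must be forwarded to the guest OS."""


class ShadowMMU:
    def __init__(self, guest_pt, guest_to_host):
        self.guest_pt = guest_pt            # guest virtual page -> guest physical page
        self.guest_to_host = guest_to_host  # guest physical page -> host physical page
        self.shadow = {}                    # guest virtual page -> host physical page
        self.hypervisor_traps = 0

    def access(self, guest_vpage):
        # Case 1: the shadow table already maps the page; no hypervisor involvement.
        if guest_vpage in self.shadow:
            return self.shadow[guest_vpage]
        # Case 2: page fault, control transfers to the hypervisor.
        self.hypervisor_traps += 1
        if guest_vpage not in self.guest_pt:
            # Not mapped by the guest either: hand the page fault to the guest.
            raise GuestPageFault(guest_vpage)
        # Mapped by the guest: fill in the shadow entry, then resume the guest.
        host_page = self.guest_to_host[self.guest_pt[guest_vpage]]
        self.shadow[guest_vpage] = host_page
        return host_page


mmu = ShadowMMU(guest_pt={1: 7}, guest_to_host={7: 99})
first = mmu.access(1)    # faults into the hypervisor once
second = mmu.access(1)   # served directly from the shadow table
```

Only the first access to page 1 enters the hypervisor; a scheme that trapped on every access would increment the counter on each call.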
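Question (Moed B 2017) [Figure: brute-force scheme. The guest OS page directory and page table live inside the VM memory layout managed by the hypervisor (VMM software), and these pages are defined as not R/W; the hardware (CPU, TLB) uses CR3.]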
Question (Moed B 2017) • In a system that uses brute-force accesses, all of the mapping pages are marked as non-readable and non-writable, so every access raises an interrupt. In a shadow page table system, we get an interrupt only on the first access to a given page; from the moment we have mapped it, the guest system can continue working as usual.