Virtualization in Linux Atul Bansal Manish Pal Pulkit Gambhir
Virtualization in a nutshell • Virtualization: running multiple machines on a single piece of hardware • “Real” hardware is invisible to the OS • The OS only sees an abstracted-out picture • Only the Virtual Machine Monitor (VMM) talks to the hardware
More formally … A framework for dividing the resources of a machine into multiple execution environments by using techniques such as: • Hardware and software partitioning • Time-sharing • Partial or complete machine simulation • Emulation … In general, we implement an M-N mapping (M: real resources, N: virtual resources). E.g. multitasking (1-N), cluster computing (M-1)
Motivations • Save on costs: • Many servers consolidated onto 1 machine • Running legacy software • Improve security: • Protection of data by placing it on separate virtual machines • Development and debugging platform
More motivations • Hardware independence of code (Java VM) • Compatibility issues tackled • Server migration is eased • Error and attack containment • Dynamic resource sharing • It looks cool too….
Issues in virtualization • Some interfaces were not designed with virtualization in mind (e.g. processor privilege levels) • The VMM must handle all privileged instructions on behalf of the guests • Need for an extra level of memory segmentation (between virtual machines) • Done entirely by the VMM; guest OSes only see an abstraction of the page table, not the real one
Issues in virtualization • Resource sharing • Map all VM requests to same network card, same DMA controller etc. • Design and management of communication between different virtual machines • Need to show abstracted hardware which has no physical equivalent (emulation)
And despite all that … • Need transparency and near-native performance! • An extremely hard task (Bochs is a very good example of less-than-perfect performance)
Virtual Architecture Basic virtual Architecture of Xen • CPU state • Exception • Interrupt handling • Time • Memory • Devices
CPU State • Xen provides each guest OS with virtual CPUs (only 1 real CPU). • All privileged CPU state is handled by Xen. • Guest OSes are not permitted to perform privileged operations. • A hypercall interface is provided so that a guest OS can execute privileged operations on the CPU through Xen.
Hypercalls • Analogous to the system calls provided by any OS, except that the software interrupt vectors to an entry point within Xen. • Even to set up the interrupt vector table, the OS must invoke Xen hypercalls. • Basically, any privileged operation on the CPU is performed through a hypercall to Xen.
Virtual IDT • A virtual IDT is provided to guest OS for setting up interrupt vector table. • A guest OS can submit a table of trap handlers to Xen via the set_trap_table hypercall. • The exception stack frame presented to a virtual trap handler is identical to its native equivalent.
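The hypercall and virtual IDT slides can be illustrated with a small toy model. This is only a sketch in Python, not Xen code: the class `ToyHypervisor`, its methods, the return codes and the vector range check are all hypothetical names and rules chosen for illustration. Only the hypercall name set_trap_table comes from the slides.

```python
# Toy model only: names and return codes are hypothetical and do not
# mirror Xen's real data structures or ABI.

class ToyHypervisor:
    def __init__(self):
        self.trap_tables = {}  # guest id -> {vector: handler}
        # Dispatch table: hypercall name -> handler inside the hypervisor,
        # much like a system-call table inside a kernel.
        self.hypercalls = {"set_trap_table": self._set_trap_table}

    def hypercall(self, guest_id, name, *args):
        """Single entry point through which guests request privileged work."""
        handler = self.hypercalls.get(name)
        if handler is None:
            return -1  # unknown hypercall is rejected
        return handler(guest_id, *args)

    def _set_trap_table(self, guest_id, table):
        # Validate before installing (assumed rule: vectors 0..255,
        # handlers must be callable).
        for vector, handler in table.items():
            if not (0 <= vector <= 255) or not callable(handler):
                return -1
        self.trap_tables[guest_id] = dict(table)
        return 0

    def deliver_trap(self, guest_id, vector, frame):
        """Bounce a trap to the handler the guest registered, if any."""
        handler = self.trap_tables.get(guest_id, {}).get(vector)
        return handler(frame) if handler else None


xen = ToyHypervisor()
rc = xen.hypercall(1, "set_trap_table",
                   {14: lambda frame: "page fault at " + hex(frame["addr"])})
message = xen.deliver_trap(1, 14, {"addr": 0x1000})
```

The guest never touches the table directly: it hands the whole table to the hypervisor, which checks it and later calls back into the registered handler.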
Interrupt Handling • Interrupts are virtualized by mapping them to event channels. • Events are delivered to the guest OS using a callback supplied via the set_callbacks hypercall. • The guest OS can map these events onto its standard interrupt dispatch mechanisms. • Xen is responsible for determining the guest OS that will handle each physical interrupt source.
Time • Time is important in virtualization, as guest OSes need to be aware of both ‘real time’ and ‘virtual time’ (time of execution). • Xen exports timestamps for system time and wall-clock time to guest operating systems through a shared page of memory.
Time Consistency • All timestamps need to be updated and read atomically. • Xen stores a version number in the shared info page, which is incremented before and after updating the timestamps. • A guest can be sure that it read a consistent state by checking that the version numbers it read before and after reading the timestamps are equal.
Event Channels • Event channels are the basic primitive provided by Xen for event notifications. • They are the Xen equivalent of a hardware interrupt. • Each channel stores one bit of information; the event of interest is signaled by transitioning this bit from 0 to 1. • Notifications are received by a guest via an up-call from Xen.
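The version-number check can be sketched as follows. This is a single-threaded Python model, not Xen code; `SharedTimeInfo`, `hypervisor_update` and `guest_read` are hypothetical names. The extra test that the version is even is an assumption borrowed from the usual sequence-lock pattern: an odd value would mean an update is in progress.

```python
class SharedTimeInfo:
    """Hypothetical stand-in for the time fields of the shared info page."""

    def __init__(self):
        self.version = 0
        self.system_time = 0
        self.wall_clock = 0


def hypervisor_update(info, system_time, wall_clock):
    info.version += 1  # odd: update in progress
    info.system_time = system_time
    info.wall_clock = wall_clock
    info.version += 1  # even again: update finished


def guest_read(info):
    """Retry until both timestamps were read under one stable version."""
    while True:
        before = info.version
        snapshot = (info.system_time, info.wall_clock)
        after = info.version
        if before == after and before % 2 == 0:
            return snapshot


info = SharedTimeInfo()
hypervisor_update(info, system_time=100, wall_clock=200)
times = guest_read(info)
```

The reader takes no lock; it simply retries if the writer touched the page while the timestamps were being read.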
Event Channels (Implementation) • The kernel shared info page (shared_info_t) contains two bitfields for event channels: unsigned long evtchn_pending[…..]; unsigned long evtchn_mask[…..]; • These two specify, respectively, whether an event is pending (evtchn_pending) and whether the event channel is masked (evtchn_mask). • For masked channels, no events will be delivered.
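A minimal sketch of how the two bitfields interact, written as a Python model rather than real Xen code. The field names evtchn_pending and evtchn_mask come from the slide; the word size, the array length and the helper functions `signal`, `set_mask` and `drain` are assumptions made for illustration.

```python
BITS_PER_WORD = 64  # assumption: 64-bit unsigned long
NUM_WORDS = 4       # arbitrary size for this sketch


class ToySharedInfo:
    def __init__(self):
        self.evtchn_pending = [0] * NUM_WORDS
        self.evtchn_mask = [0] * NUM_WORDS


def signal(info, port):
    """Hypervisor side: set the pending bit.

    Returns True when an up-call would be needed, i.e. the bit went
    from 0 to 1 and the channel is not masked.
    """
    word, bit = divmod(port, BITS_PER_WORD)
    was_pending = (info.evtchn_pending[word] >> bit) & 1
    masked = (info.evtchn_mask[word] >> bit) & 1
    info.evtchn_pending[word] |= 1 << bit
    return not was_pending and not masked


def set_mask(info, port, masked):
    word, bit = divmod(port, BITS_PER_WORD)
    if masked:
        info.evtchn_mask[word] |= 1 << bit
    else:
        info.evtchn_mask[word] &= ~(1 << bit)


def drain(info):
    """Guest side: collect and clear every pending, unmasked port."""
    ports = []
    for word in range(NUM_WORDS):
        ready = info.evtchn_pending[word] & ~info.evtchn_mask[word]
        for bit in range(BITS_PER_WORD):
            if (ready >> bit) & 1:
                ports.append(word * BITS_PER_WORD + bit)
        info.evtchn_pending[word] &= ~ready
    return ports
```

A masked channel still records its pending bit in this sketch; the event is only held back until the guest unmasks the channel.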
Virtual CPU Setup • Any guest OS needs to set up a virtual CPU on which it executes. • This includes installing a vector table on the virtual IDT for handling interrupts, page faults, etc. • The guest OS must set up a pair of hypervisor callbacks (notification and entry points for Xen).
Hypercalls for CPU Setup • set_callbacks(………………………..): allows a guest OS to set up the hypervisor callbacks. • set_trap_table(trap_info_t *table): allows a guest OS to set up its IDT. • vcpu_op(……..): a further hypercall for the management of virtual CPUs; it can be used to bootstrap VCPUs, to bring them up and down, and to test their current status.
Start of Day • The start-of-day environment for guest operating systems is different to that provided by the underlying hardware. • Processor is already executing in protected mode with paging enabled. • Domain 0 is created and booted by Xen itself.
Start of Day • For all domains other than dom0, the analogue of the boot-loader is the domain builder. • The domain builder is user-space software running in domain 0. • The domain builder is responsible for building the initial page tables for a domain and loading its kernel image at the appropriate virtual address.
Xen Scheduling • Similar to traditional Linux schedulers, which divide CPU time among userland processes, Xen schedules resources between VMs. • It is like context switching between kernels. • Xen includes kernel boot-time options for scheduling.
Scheduling Algorithms • Atropos: soft real-time scheduler; guarantees absolute CPU shares • Round Robin: characterized by a “quantum” of time • Borrowed Virtual Time: proportional fair shares of CPU time; “penalizes” domains that block often; ctx_allow acts like the “quantum” above
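The proportional-share idea behind Borrowed Virtual Time can be sketched in a few lines. This is a heavily simplified Python model and not Xen's scheduler: it leaves out warping and the handling of blocking domains, and the function name `proportional_share_order` and its parameters are hypothetical. Each domain accumulates virtual time more slowly the larger its weight, and the domain with the smallest virtual time runs next.

```python
def proportional_share_order(weights, quantum, slices):
    """Return the order in which domains run for `slices` quanta.

    weights: dict mapping domain name -> positive weight (CPU share).
    quantum: length of one time slice, in arbitrary units.
    """
    virtual_time = {name: 0.0 for name in weights}
    order = []
    for _ in range(slices):
        # Pick the domain that is furthest behind; ties go to the
        # alphabetically first name so the result is deterministic.
        current = min(sorted(virtual_time), key=lambda n: virtual_time[n])
        order.append(current)
        # A heavier domain is charged less virtual time per quantum.
        virtual_time[current] += quantum / weights[current]
    return order


order = proportional_share_order({"dom0": 2, "domU": 1}, quantum=10, slices=6)
```

With weights 2 and 1, dom0 ends up with twice as many slices as domU over the six quanta.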
Scheduling Algorithms • sEDF: provides weighted CPU sharing; uses real-time algorithms to ensure time guarantees; uses weights as well as slices and periods for scheduling and sharing
System Calls and Scheduling Some scheduling system calls: • nice() • getpriority() • setpriority() • sched_getscheduler() • sched_setscheduler() • sched_getparam() • sched_setparam() • sched_yield() • sched_get_priority_min() • sched_get_priority_max() • sched_rr_get_interval()
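Several of these system calls are exposed through Python's standard `os` module on Linux, which makes them easy to try out. The sketch below assumes a Linux host (the `sched_*` wrappers are not available on every platform) and only reads scheduling state; the function name `describe_scheduling` is hypothetical.

```python
import os


def describe_scheduling():
    """Query the calling process's scheduling settings (Linux only)."""
    info = {
        # nice(0) adds 0 to the niceness and returns the current value
        "nice": os.nice(0),
        # getpriority() for this process (pid 0 means "the caller")
        "priority": os.getpriority(os.PRIO_PROCESS, 0),
        # sched_getscheduler() returns the policy, e.g. os.SCHED_OTHER
        "policy": os.sched_getscheduler(0),
        # static priority range of the round-robin real-time policy
        "rr_priority_min": os.sched_get_priority_min(os.SCHED_RR),
        "rr_priority_max": os.sched_get_priority_max(os.SCHED_RR),
    }
    os.sched_yield()  # voluntarily give up the CPU once
    return info


info = describe_scheduling()
```

Changing a policy or raising a priority (sched_setscheduler, setpriority) usually needs extra privileges, which is why this sketch sticks to the read-only calls.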
Memory management • Xen allocates physical memory to the domains at page granularity. • Domains may receive non-contiguous physical memory. • So Xen makes a distinction between machine memory and pseudo-physical memory. • Machine memory refers to the entire amount of memory installed in the machine. • Pseudo-physical memory, on the other hand, is a per-domain abstraction.
Memory management • Xen maintains a globally readable machine-to-physical table • Each domain is also supplied with a physical-to-machine table which performs the inverse mapping. • Architecture dependent code in guest operating systems can then use the two tables to provide the abstraction of pseudo-physical memory.
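The two tables can be pictured with a small model. This is a Python sketch only: `ToyXenMemory`, `allocate` and the frame numbers are hypothetical, and real tables are arrays of frame numbers rather than Python containers. The point is that a domain sees contiguous pseudo-physical frame numbers (PFNs) even though the machine frame numbers (MFNs) behind them are scattered.

```python
class ToyXenMemory:
    def __init__(self):
        # Global machine-to-physical table: MFN -> PFN
        self.machine_to_physical = {}

    def allocate(self, mfns):
        """Give a domain the listed machine frames.

        Returns the domain's physical-to-machine table, where the list
        index is the PFN and the value is the MFN backing it.
        """
        physical_to_machine = []
        for pfn, mfn in enumerate(mfns):
            if mfn in self.machine_to_physical:
                raise ValueError("machine frame %d already in use" % mfn)
            self.machine_to_physical[mfn] = pfn
            physical_to_machine.append(mfn)
        return physical_to_machine


memory = ToyXenMemory()
p2m = memory.allocate([7, 42, 13])        # scattered machine frames
mfn = p2m[1]                              # PFN 1 is backed by MFN 42
pfn = memory.machine_to_physical[mfn]     # and MFN 42 maps back to PFN 1
```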
Page Table Updates • Read-only access given to page tables • Guest OS must explicitly request any modifications (through hypercalls). • Xen validates all such requests and only applies updates that it deems safe • This is necessary to prevent domains from adding arbitrary mappings to their page tables.
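The validation step can be sketched as a pair of checks applied to every requested update. This is a toy Python model under stated assumptions, not Xen's actual rule set: the class `ToyPageTableGuard`, the method `request_update` and the two rules shown (a guest may only map frames it owns, and frames that hold page tables may never be mapped writable) are chosen for illustration.

```python
class ToyPageTableGuard:
    def __init__(self, owned_frames, page_table_frames):
        self.owned = set(owned_frames)
        self.page_table_frames = set(page_table_frames)
        self.mappings = {}  # virtual page number -> (mfn, writable)

    def request_update(self, vpn, mfn, writable):
        """Apply the update only if it is judged safe; report the outcome."""
        if mfn not in self.owned:
            return False  # would map another domain's memory
        if writable and mfn in self.page_table_frames:
            return False  # page tables must stay read-only to the guest
        self.mappings[vpn] = (mfn, writable)
        return True


guard = ToyPageTableGuard(owned_frames={10, 11, 12}, page_table_frames={12})
ok = guard.request_update(vpn=0, mfn=10, writable=True)
foreign = guard.request_update(vpn=1, mfn=99, writable=False)
```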
Writable Page Tables • Guest OSes may request writable page tables as well. • Xen must still validate modifications to ensure secure partitioning. • Xen thus traps write attempts to certain memory pages.
Handling the trap • Xen temporarily allows write access to that page while at the same time disconnecting it from the page table that is currently in use. • The newly updated entries cannot be used by the MMU until Xen revalidates and reconnects the page. • Reconnection occurs automatically later in a number of situations, e.g. when the domain is preempted.
Shadow Page Tables • Another type of page table • The guest OS uses an independent copy of the page tables • This copy is unknown to the hardware • Xen propagates changes made to the guest's tables to the real ones, and vice versa.
VM assists • Xen provides a number of “assists” for guest memory management. • Hypercall used: vm_assist(unsigned int cmd, unsigned int type); • The cmd parameter describes the action to be taken. • The type parameter describes the kind of assist that is being referred to.
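The disconnect, write, revalidate, reconnect cycle can be modelled as a small state machine. This is a Python sketch under assumptions, not Xen's implementation: the class `ToyWritablePageTable` and its methods are hypothetical, and silently dropping an invalid entry at revalidation is a simplification made for this example.

```python
class ToyWritablePageTable:
    def __init__(self, owned_frames):
        self.owned = set(owned_frames)
        self.entries = {}    # validated entries: index -> mfn
        self.pending = {}    # entries written while disconnected
        self.connected = True

    def guest_write(self, index, mfn):
        if self.connected:
            # First write traps: disconnect the page, then allow writes.
            self.connected = False
        self.pending[index] = mfn

    def visible_to_mmu(self, index):
        """Entries are usable by the MMU only while the page is connected."""
        return self.entries.get(index) if self.connected else None

    def reconnect(self):
        """Revalidate pending entries, then hook the page back in."""
        for index, mfn in self.pending.items():
            if mfn in self.owned:  # assumed validation rule
                self.entries[index] = mfn
        self.pending.clear()
        self.connected = True


table = ToyWritablePageTable(owned_frames={1, 2})
table.guest_write(0, 1)     # valid frame
table.guest_write(1, 99)    # frame the guest does not own
before = table.visible_to_mmu(0)
table.reconnect()           # e.g. when the domain is preempted
after = table.visible_to_mmu(0)
```

The guest gets cheap, direct writes, while the hypervisor still checks every entry before the hardware can use it.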
Conclusions • Virtualization is a very exciting area • Implementation issues still exist • We are still moving toward near-native performance • With hardware-supported virtualization and multi-core, multi-threaded hardware, things are now looking very bright!
A quote to end it • Would PhD virtualization be when several people get a PhD but only one is doing the work? : JoshTriplett on Xen IRC