
Running Commodity Operating Systems on Scalable Multiprocessors with the Disco Virtual Machine Monitor

This paper explores using Disco, a virtual machine monitor, to run the IRIX OS efficiently on scalable shared-memory multiprocessors with minimal OS modifications. Disco is a software layer inserted between the hardware and commodity operating systems: it virtualizes the ccNUMA machine so that multiple operating systems can run concurrently, together forming a high-performance system software base.





Presentation Transcript


  1. Disco: Running Commodity Operating Systems on Scalable Multiprocessors. Edouard Bugnion et al. Presented by Madhura S Rama

  2. Agenda • Goal and Objective • The Problem • Virtual Machine Monitor • Disco – VMM • Experimental Results • Related Work • Conclusion

  3. Goal and Objective • Extend modern OS to run efficiently on shared memory multiprocessors without large changes to the OS. • Use Disco, a Virtual Machine Monitor that can run multiple copies of an OS (IRIX) on a multiprocessor.

  4. Problem • Scalable shared-memory multiprocessors are now widely available in the market. • System software for these machines has trailed behind. • Extensive modifications to the OS (partitioning the system, building a single system image, fault containment, ccNUMA memory management) are necessary for it to support scalable machines – a resource-intensive effort. • High cost and reliability issues.

  5. Disco • A prototype designed to run on FLASH (developed at Stanford), an experimental ccNUMA machine. • Disco combines commodity OSes not designed for scalable shared-memory multiprocessors to form a high-performance system software base. • It is a software layer inserted between the hardware and the OS. • It virtualizes the hardware so that multiple OSes can run concurrently.

  6. ccNUMA Architecture • Provides a single memory image – memory logically belongs to one shared address space. • Because memory is physically distributed, access time is not uniform – Non-Uniform Memory Access (NUMA). • Caches must be kept consistent so that shared variables remain coherent – cache-coherent NUMA (ccNUMA).

  7. Virtualization • Pure emulation: present abstracted hardware; compile code to the abstracted hardware (compilation is not required if the hardware is abstracted properly – binary compatibility is sufficient); interpret the code to run on the real hardware. • Virtualization: efficient; requires 2 privilege levels; user-mode programs run directly on the hardware; privileged instructions are intercepted and translated by the VMM.
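How interception of privileged instructions might look in practice: a minimal C sketch of trap-and-emulate, with a hypothetical vcpu_t structure and operation names (illustrative, not Disco's actual code). The deprivileged guest kernel traps when it touches a privileged register, and the monitor applies the effect to the virtual CPU's state rather than to the real hardware.

```c
/* Hypothetical trap-and-emulate sketch: the guest kernel runs deprivileged,
 * so privileged register accesses trap into the monitor, which emulates them
 * against per-virtual-CPU state instead of the real hardware. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t regs[32];   /* saved general-purpose registers          */
    uint64_t status;     /* virtualized privileged (status) register */
    uint64_t epc;        /* guest program counter at the trap        */
} vcpu_t;

enum priv_op { OP_READ_STATUS, OP_WRITE_STATUS };

/* Called from the monitor's trap handler once the faulting privileged
 * instruction has been decoded into an operation and a register number. */
static void emulate_privileged(vcpu_t *v, enum priv_op op, int reg)
{
    switch (op) {
    case OP_READ_STATUS:              /* guest reads a privileged register  */
        v->regs[reg] = v->status;     /* serve it from the vCPU, not the hw */
        break;
    case OP_WRITE_STATUS:             /* guest writes a privileged register */
        v->status = v->regs[reg];
        break;
    }
    v->epc += 4;                      /* step past the emulated instruction */
}

int main(void)
{
    vcpu_t v = { .status = 0x1, .epc = 0x80000000 };
    v.regs[2] = 0xff;
    emulate_privileged(&v, OP_WRITE_STATUS, 2);   /* e.g. a guest "write status" */
    printf("virtual status = 0x%llx, epc = 0x%llx\n",
           (unsigned long long)v.status, (unsigned long long)v.epc);
    return 0;
}
```

Because ordinary user-mode code never executes privileged instructions, it runs at full hardware speed; only the guest kernel pays the trap-and-emulate cost.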

  8. Virtual Machine Monitor • A software layer between the hardware and the OS. • Virtualizes all the resources • Allows multiple OS to coexist • VM’s communicate using distributed protocols • Small piece of code with minimal implementation effort.

  9. Architecture of Disco

  10. Advantages • By running multiple copies of an OS, VMM handles the challenges of ccNUMA machines: • Scalability – only the monitor and the distributed protocols need to scale to the size of the machine • Fault Containment – system s/w failure contained in the VM. Simplicity of monitors makes these tasks easier.

  11. Contd.. • NUMA memory-management issues – the VMM hides the entire problem from the OS through careful page placement, dynamic page migration and page replication. • A single ccNUMA multiprocessor can run multiple OS versions concurrently – older versions provide a stable platform while newer versions are staged in.

  12. Challenges of VMM • Overheads: • Execution of privileged instructions must be emulated by the VMM. • I/O devices are virtualized – requests must be intercepted and remapped by the VMM. • The code and data of each OS are replicated in the memory of each virtual machine. • The file system buffer cache is replicated in each OS.

  13. Contd… • Resource Management – VMM makes poor resource management decisions due to lack of information • Communication and Sharing – In a naïve implementation, File Sharing is not possible between different VM’s of the same user. Each VM acts as an independent machine in a network.

  14. Disco Implementation • Runs multiple independent virtual machines concurrently on the same h/w • Processors – Disco emulates all instructions, MMU and traps allowing unmodified OS to run on a VM • Physical Memory – Provides an abstraction of main memory residing in contiguous physical address space starting at 0. • I/O Devices – All I/O devices are virtualized and intercepts all communication to emulate/translate the operation.

  15. Disco Implementation • Small size of code, allows for higher degree of tuning – replicated in all memories • Machine-wide data structures are partitioned such that parts accessed by a single processor are in a memory local to that processor

  16. Virtual CPUs • Disco emulates the execution of a virtual CPU by using direct execution on the real CPU – user applications run at the speed of the hardware. • Each virtual CPU has a data structure similar to a process table entry – it contains the saved registers and other state information. • The monitor also maintains the privileged registers and TLB contents needed to emulate privileged instructions.
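A sketch of what that process-table-like structure might contain, with hypothetical field names and a TLB size chosen only for illustration (the paper does not give the exact layout):

```c
/* Hypothetical per-virtual-CPU state block kept by the monitor: saved
 * registers plus the privileged registers and TLB contents that the
 * guest OS believes it owns. */
#include <stdint.h>
#include <stdio.h>

#define GUEST_TLB_SIZE 64             /* illustrative TLB size */

typedef struct { uint64_t hi, lo0, lo1; } tlb_entry_t;

typedef struct {
    uint64_t gpr[32];                 /* saved general-purpose registers  */
    uint64_t pc;                      /* where this virtual CPU resumes   */
    uint64_t status, cause, badvaddr; /* virtualized privileged registers */
    tlb_entry_t tlb[GUEST_TLB_SIZE];  /* the guest's view of the TLB      */
    int in_kernel_mode;               /* mode the guest *thinks* it is in */
} vcpu_state_t;

int main(void)
{
    vcpu_state_t vcpus[4] = {0};          /* e.g. a 4-way virtual machine */
    vcpus[0].pc = 0xffffffff80000000ull;  /* illustrative kernel entry    */
    printf("per-vCPU state: %zu bytes\n", sizeof(vcpu_state_t));
    return 0;
}
```

Scheduling a virtual CPU onto a real processor then amounts to restoring such a block, and descheduling it amounts to saving it, much like an ordinary OS context switch.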

  17. Virtual Physical Memory • Disco maintains a physical-to-machine (40-bit) address mapping. • When the OS tries to insert a virtual-to-physical address mapping into the TLB, Disco emulates the operation and installs the machine address corresponding to that physical address; subsequent accesses incur no extra overhead. • Each VM has a pmap that contains one entry for each of its physical pages.
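A minimal sketch of that extra level of translation, with assumed names (pmap, guest_tlb_insert) and a toy memory size:

```c
/* Hypothetical physical-to-machine remapping on a guest TLB insert.
 * The guest believes it manages physical pages 0..N-1; the monitor's
 * pmap records which machine page actually backs each of them. */
#include <stdint.h>
#include <stdio.h>

#define GUEST_PAGES 1024                 /* pages of guest "physical" memory */

static uint64_t pmap[GUEST_PAGES];       /* guest physical page -> machine page */

/* Guest asks for:       virtual page -> guest physical page.
 * The monitor installs: virtual page -> machine page.       */
static void guest_tlb_insert(uint64_t vpage, uint64_t guest_ppage)
{
    uint64_t mpage = pmap[guest_ppage];  /* the extra level of translation */
    /* a real monitor would write the hardware TLB here */
    printf("TLB: vpage 0x%llx -> machine page 0x%llx (guest asked for 0x%llx)\n",
           (unsigned long long)vpage,
           (unsigned long long)mpage,
           (unsigned long long)guest_ppage);
}

int main(void)
{
    for (uint64_t p = 0; p < GUEST_PAGES; p++)
        pmap[p] = 0x4000 + p;            /* this VM's memory starts at some machine offset */
    guest_tlb_insert(0x10, 0x2);
    return 0;
}
```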

  18. Contd.. • Kernel-mode references on MIPS processors access memory and I/O directly – the OS code and data must be relinked to a mapped address space. • MIPS tags each TLB entry with an Address Space Identifier (ASID). • ASIDs are not virtualized – the TLB must be flushed on VM context switches, though not on the guest's own MMU context switches. • The resulting increase in TLB misses is offset by a second-level software TLB.
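A sketch of how such a second-level software TLB could be organized, with assumed sizes and field names: on a hardware TLB miss the monitor first probes this cache and only forwards the miss to the guest kernel when it does not hit.

```c
/* Hypothetical second-level software TLB: caches recent virtual->machine
 * translations so that many hardware TLB misses can be refilled by the
 * monitor without invoking the guest OS's miss handler. */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define L2_TLB_SIZE 4096               /* number of entries (illustrative) */

typedef struct {
    uint64_t vpage;                    /* guest virtual page             */
    uint64_t mpage;                    /* machine page that backs it     */
    uint32_t asid;                     /* guest address-space identifier */
    bool     valid;
} l2_tlb_entry_t;

static l2_tlb_entry_t l2_tlb[L2_TLB_SIZE];

static unsigned l2_hash(uint64_t vpage, uint32_t asid)
{
    return (unsigned)((vpage ^ asid) & (L2_TLB_SIZE - 1));
}

/* Hardware TLB miss: a hit here is refilled directly; a miss is forwarded
 * to the guest kernel, whose answer is then cached with l2_tlb_fill(). */
static bool l2_tlb_lookup(uint64_t vpage, uint32_t asid, uint64_t *mpage)
{
    l2_tlb_entry_t *e = &l2_tlb[l2_hash(vpage, asid)];
    if (e->valid && e->vpage == vpage && e->asid == asid) {
        *mpage = e->mpage;
        return true;
    }
    return false;
}

static void l2_tlb_fill(uint64_t vpage, uint32_t asid, uint64_t mpage)
{
    l2_tlb[l2_hash(vpage, asid)] =
        (l2_tlb_entry_t){ .vpage = vpage, .mpage = mpage, .asid = asid, .valid = true };
}

int main(void)
{
    uint64_t m = 0;
    l2_tlb_fill(0x10, 7, 0x4012);
    bool hit = l2_tlb_lookup(0x10, 7, &m);
    printf("hit=%d mpage=0x%llx\n", hit, (unsigned long long)m);
    return 0;
}
```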

  19. NUMAness • Cache misses should be satisfied from local memory to avoid the latency of remote accesses. • Disco implements dynamic page migration and replication. • Read-shared pages are replicated; write-shared pages are not. • The migration and replication policy is driven by cache-miss counting. • The memmap contains an entry for each real machine memory page and is used during TLB shootdowns.
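One plausible shape of a miss-counting policy, with made-up thresholds and structure names (the slides and paper describe the policy only qualitatively): a page taking many remote misses from a single node is migrated there, while a read-only page missed on by several nodes is replicated.

```c
/* Hypothetical miss-counting policy sketch: pages taking many remote cache
 * misses are migrated (one hot remote user) or replicated (read-shared).
 * Thresholds, node count and structure names are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define NODES 8
#define MISS_THRESHOLD 64

enum page_action { PAGE_LEAVE, PAGE_MIGRATE, PAGE_REPLICATE };

typedef struct {
    uint32_t miss_count[NODES];   /* remote misses per node (from hw counters)   */
    int      writable;            /* write-shared pages are never replicated     */
    int      home_node;           /* node whose memory currently holds the page  */
} page_stats_t;

static enum page_action classify(const page_stats_t *p)
{
    uint32_t total = 0, hottest = 0;
    int hot_node = p->home_node, sharers = 0;

    for (int n = 0; n < NODES; n++) {
        total += p->miss_count[n];
        if (p->miss_count[n] > 0) sharers++;
        if (p->miss_count[n] > hottest) { hottest = p->miss_count[n]; hot_node = n; }
    }
    if (total < MISS_THRESHOLD || hot_node == p->home_node)
        return PAGE_LEAVE;                                  /* cold, or already local */
    if (sharers > 1)
        return p->writable ? PAGE_LEAVE : PAGE_REPLICATE;   /* read-shared only       */
    return PAGE_MIGRATE;                  /* single hot remote user: move the page    */
}

int main(void)
{
    page_stats_t p = { .miss_count = {0, 120, 0}, .writable = 0, .home_node = 0 };
    printf("action = %d\n", classify(&p));     /* 1 == PAGE_MIGRATE here */
    return 0;
}
```

In a real system, counts like these would come from hardware cache-miss counters rather than software bookkeeping.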

  20. Transparent Page Replication

  21. Virtual I/O Devices • The monitor intercepts all device accesses. • A single VM accessing a device does not require the I/O to be fully virtualized – the monitor only needs to ensure exclusive access. • Interposing on all DMA requests allows disk and memory resources to be shared among virtual machines and allows VMs to communicate with each other.
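A sketch of what interposing on a guest DMA request might look like, using a hypothetical pmap_translate helper: the monitor rewrites the guest's "physical" buffer address into a machine address before handing the request to the real device driver.

```c
/* Hypothetical interposition on a guest DMA request: guest "physical"
 * buffer addresses are translated to machine addresses before the
 * request reaches the real device driver. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t guest_paddr;    /* buffer address as the guest OS sees it */
    uint64_t length;
    uint64_t disk_offset;
    int      is_write;
} dma_request_t;

/* Assumed helper: guest physical -> machine address via the pmap. */
static uint64_t pmap_translate(uint64_t guest_paddr)
{
    return guest_paddr + 0x40000000ull;   /* toy translation for this sketch */
}

static void intercept_dma(const dma_request_t *req)
{
    uint64_t machine_addr = pmap_translate(req->guest_paddr);
    /* the real disk I/O (or a page remap, for data already in memory)
       would be issued here using machine_addr */
    printf("%s %llu bytes at machine 0x%llx (disk offset %llu)\n",
           req->is_write ? "write" : "read",
           (unsigned long long)req->length,
           (unsigned long long)machine_addr,
           (unsigned long long)req->disk_offset);
}

int main(void)
{
    dma_request_t r = { .guest_paddr = 0x2000, .length = 4096,
                        .disk_offset = 8192, .is_write = 0 };
    intercept_dma(&r);
    return 0;
}
```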

  22. Copy-on-write Disks • Disk reads can be serviced by the monitor; if the request size is a multiple of the machine page size, the monitor only has to remap existing machine pages into the VM's physical address space. • Such pages are mapped read-only, and an attempt to modify one generates a copy-on-write fault.
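A minimal copy-on-write sketch under assumed structures (one shared frame, one guest page): the read is satisfied by mapping the shared frame read-only, and the first write gives the faulting VM a private copy.

```c
/* Hypothetical copy-on-write handling for a disk page shared between VMs. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define PAGE_SIZE 4096

typedef struct {
    uint8_t *machine_page;   /* machine page currently backing this guest page */
    int      shared_ro;      /* mapped read-only and shared with other VMs?    */
} guest_page_t;

static uint8_t shared_frame[PAGE_SIZE];   /* one cached disk page, shared by all VMs */

/* Disk read of already-resident data: remap the shared frame, read-only, no copy. */
static void map_shared(guest_page_t *p)
{
    p->machine_page = shared_frame;
    p->shared_ro = 1;
}

/* Write fault on a shared read-only page: copy it, then allow the write. */
static void cow_fault(guest_page_t *p)
{
    if (p->shared_ro) {
        uint8_t *private_copy = malloc(PAGE_SIZE);
        memcpy(private_copy, p->machine_page, PAGE_SIZE);
        p->machine_page = private_copy;
        p->shared_ro = 0;
    }
}

int main(void)
{
    guest_page_t p;
    map_shared(&p);          /* any number of VMs could share this frame for reads */
    cow_fault(&p);           /* this VM writes: it now owns a private copy         */
    printf("private copy: %s\n", p.shared_ro ? "no" : "yes");
    free(p.machine_page);
    return 0;
}
```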

  23. Virtual N/W Interface

  24. OS Changes • Minor changes to the kernel code and data segments (specific to the MIPS architecture). • Disco uses the original device drivers. • Code added to the HAL to pass hints about physical memory to the monitor: requesting a zeroed page, reclaiming unused memory. • A change to the mbuf freelist data structure. • Calls to bcopy replaced by a remap function in the HAL. A sketch of such a hint interface follows below.
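The following is only a guess at the shape such HAL hints could take; the hint names and the monitor_call entry point are hypothetical stand-ins for whatever trapping mechanism the real HAL patch uses.

```c
/* Hypothetical HAL-to-monitor hint interface. monitor_call() stands in
 * for a trapping instruction that transfers control to the monitor. */
#include <stdint.h>

#define HAL_PAGE_SIZE 4096

enum monitor_hint {
    HINT_PAGE_UNUSED,   /* page freed by the OS; the monitor may reclaim it      */
    HINT_PAGE_ZEROED,   /* OS needs a zero-filled page; the monitor can map one  */
    HINT_BCOPY_REMAP    /* large copy the monitor can satisfy by remapping pages */
};

static void monitor_call(enum monitor_hint hint, uint64_t guest_paddr, uint64_t len)
{
    (void)hint; (void)guest_paddr; (void)len;   /* would trap into the monitor here */
}

/* Examples of the kind of code added to the guest HAL. */
void hal_free_page(uint64_t guest_paddr)
{
    monitor_call(HINT_PAGE_UNUSED, guest_paddr, HAL_PAGE_SIZE);
}

void hal_request_zeroed_page(uint64_t guest_paddr)
{
    monitor_call(HINT_PAGE_ZEROED, guest_paddr, HAL_PAGE_SIZE);
}
```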

  25. Experimental Results • Disco is targeted to run on the FLASH machine; because FLASH was not yet available, the SimOS simulator was used to develop and evaluate Disco. • SimOS slowdowns prevented the examination of long-running workloads. • Using short workloads, issues such as CPU and memory overhead, scalability and NUMA memory management were studied.

  26. Execution Overhead • Experiments ran on a uniprocessor, once running IRIX directly on the hardware and once using Disco running IRIX in a single virtual machine. • Overhead ranges from 3% to 16%. • Mainly due to additional TLB misses.

  27. Memory Overhead • Ran a single workload of eight instances of pmake under six different system configurations. • Effective sharing of kernel text and the buffer cache limits the memory overhead of running multiple VMs.

  28. Scalability • Ran the pmake workload under six configurations. • IRIX suffers from high synchronization overheads. • Using a single VM has high overhead; partitioning the workload across eight VMs reduced execution time to 60%.

  29. NUMA • The performance of the UMA machine determines the lower bound for the execution time on the NUMA machine. • Disco achieves a significant performance improvement by enhancing memory locality.

  30. Related Work • System software for scalable shared memory machines • Virtual Machine monitors • Other system software structuring techniques • ccNUMA memory management

  31. Conclusion • System software for scalable shared-memory multiprocessors can be developed without a massive development effort. • Experimental results show that the overhead of virtualization is modest in both processing time and memory footprint. • Disco provides a simple solution to scalability, reliability and NUMA management issues.
