Disco: Running Commodity Operating Systems on Scalable Multiprocessors Edouard et al.

Presented by Madhura S Rama

Agenda • Goal and Objective • The Problem • Virtual Machine Monitor • Disco – VMM • Experimental Results • Related Work • Conclusion

Goal and Objective • Extend modern operating systems to run efficiently on shared-memory multiprocessors without large changes to the OS • Use Disco, a virtual machine monitor (VMM) that can run multiple copies of a commodity OS (IRIX) on a multiprocessor

Problem • Scalable shared-memory multiprocessors are readily available on the market • System software for these machines has trailed behind the hardware • Supporting scalable machines requires extensive OS modifications (partitioning the system, building a single system image, fault containment and ccNUMA management), which is resource intensive • The result is high development cost and reliability problems

Disco • A prototype designed to run on FLASH (developed at Stanford), an experimental ccNUMA machine • Disco combines commodity operating systems that were not designed for scalable shared-memory multiprocessors (SMMPs) into a high-performance system software base • It is a software layer inserted between the hardware and the OS • It virtualizes the hardware so that multiple operating systems run concurrently

ccNUMA Architecture • Provides a single memory image: all memory logically belongs to one shared address space • Because memory is physically distributed, access time is not uniform: Non-Uniform Memory Access (NUMA) • The hardware keeps the processors' caches coherent: Cache-Coherent NUMA (ccNUMA)
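To see why locality matters on a ccNUMA machine, the average miss latency can be modeled as a weighted mix of local and remote accesses. This is a minimal sketch; the 100 ns and 400 ns figures are hypothetical round numbers, not FLASH measurements.

```python
def effective_latency(local_fraction: float, local_ns: float, remote_ns: float) -> float:
    """Average miss latency when `local_fraction` of cache misses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies chosen for illustration only.
LOCAL_NS, REMOTE_NS = 100.0, 400.0

for frac in (0.25, 0.5, 0.9):
    print(f"{frac:.0%} local misses -> {effective_latency(frac, LOCAL_NS, REMOTE_NS):.0f} ns")
```

Raising the local fraction is what careful page placement, migration and replication aim for.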

Virtualization • Pure virtualization: • Present abstracted hardware • Compile code to the abstracted hardware; no compilation is required if the h/w is abstracted properly, since binary compatibility is sufficient • Interpret code to run on the real hardware • Efficient virtualization: • Requires 2 privilege levels • User-mode programs run directly on the h/w • Privileged instructions are intercepted and translated by the VMM
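The efficient (trap-and-emulate) style can be sketched as a loop that runs ordinary instructions directly and hands privileged ones to the monitor. This is a toy model, not Disco's code: the instruction names in `PRIVILEGED` and the `TrapMonitor` class are hypothetical, and strings stand in for decoded machine instructions.

```python
from dataclasses import dataclass, field

# Hypothetical instruction names; a real monitor decodes actual machine instructions.
PRIVILEGED = {"tlb_write", "set_status", "eret"}


@dataclass
class TrapMonitor:
    """Toy VMM that records which instructions it had to emulate."""
    emulated: list = field(default_factory=list)

    def handle_trap(self, instr: str) -> str:
        # A real VMM would update the virtual CPU's state here, not the hardware's.
        self.emulated.append(instr)
        return f"emulated {instr}"


def run_guest(instrs, vmm):
    """Direct execution, with trap-and-emulate for privileged instructions."""
    results = []
    for instr in instrs:
        if instr in PRIVILEGED:
            # The guest runs below the monitor's privilege level, so the hardware traps.
            results.append(vmm.handle_trap(instr))
        else:
            # Ordinary instructions run directly on the real CPU at full speed.
            results.append(f"direct {instr}")
    return results


vmm = TrapMonitor()
print(run_guest(["add", "load", "tlb_write", "store"], vmm))
```

Only the privileged instruction reaches the monitor; everything else runs at hardware speed, which is why two privilege levels are enough.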

Virtual Machine Monitor • A software layer between the hardware and the OS • Virtualizes all the resources of the machine • Allows multiple operating systems to coexist • VMs communicate using distributed protocols • A small piece of code that requires modest implementation effort

Architecture of Disco

Advantages • By running multiple copies of an OS, the VMM addresses the challenges of ccNUMA machines: • Scalability – only the monitor and the distributed protocols need to scale to the size of the machine • Fault containment – a system software failure is contained within its VM; the simplicity of the monitor makes these tasks easier

Contd. • NUMA memory management – the VMM hides the problem from the OS through careful page placement, dynamic page migration and page replication • A single ccNUMA multiprocessor can run multiple OS versions concurrently: older versions provide a stable platform while newer versions are staged in

Challenges of VMMs • Overheads: • Privileged instructions must be emulated by the VMM • I/O devices are virtualized, so requests must be intercepted and remapped by the VMM • The code and data of each OS are replicated in the memory of each virtual machine • The file system buffer cache is replicated in each OS

Contd. • Resource management – the VMM may make poor resource management decisions because it lacks the high-level information that the OS has • Communication and sharing – in a naïve implementation, files cannot be shared between different VMs of the same user; each VM acts as an independent machine on a network

Disco Implementation • Runs multiple independent virtual machines concurrently on the same h/w • Processors – Disco faithfully virtualizes the full instruction set, the MMU and the trap architecture, allowing an unmodified OS to run on a VM • Physical memory – provides an abstraction of main memory residing in a contiguous physical address space starting at address 0 • I/O devices – all I/O devices are virtualized; Disco intercepts all communication with them to emulate or translate the operation

Disco Implementation • The small code size allows a high degree of tuning, and the monitor's code is replicated in every node's memory • Machine-wide data structures are partitioned so that the parts accessed by a single processor reside in memory local to that processor

Virtual CPUs • Disco emulates the execution of a virtual CPU by direct execution on the real CPU, so user applications run at hardware speed • Each virtual CPU has a data structure similar to a process table entry, holding saved registers and other state • The monitor also keeps each virtual CPU's privileged registers and TLB contents, which it uses to emulate privileged instructions
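A virtual CPU can be pictured as a small record of saved state that the monitor swaps on and off a real CPU. The sketch below uses hypothetical names (`VirtualCPU`, `RealCPU`, `emulate_priv_write`) and dictionaries for register files; it only illustrates the save/restore and shadow-register ideas on this slide.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualCPU:
    """Per-vCPU state, analogous to a process-table entry (field names are hypothetical)."""
    regs: dict = field(default_factory=dict)       # saved general-purpose registers
    priv_regs: dict = field(default_factory=dict)  # shadow copies of privileged registers
    tlb: dict = field(default_factory=dict)        # guest-visible TLB contents


class RealCPU:
    """Multiplexes virtual CPUs onto one physical CPU."""

    def __init__(self):
        self.regs = {}
        self.current = None

    def switch_to(self, vcpu: VirtualCPU) -> None:
        # Save the outgoing vCPU's registers, then load the incoming one's.
        if self.current is not None:
            self.current.regs = dict(self.regs)
        self.regs = dict(vcpu.regs)
        self.current = vcpu


def emulate_priv_write(vcpu: VirtualCPU, reg: str, value: int) -> None:
    # A trapped privileged write changes only the vCPU's shadow copy.
    vcpu.priv_regs[reg] = value


cpu = RealCPU()
a, b = VirtualCPU(regs={"pc": 0x100}), VirtualCPU(regs={"pc": 0x200})
cpu.switch_to(a)
cpu.regs["pc"] = 0x104  # vCPU a executes directly for a while
cpu.switch_to(b)
emulate_priv_write(b, "status", 1)
print(hex(a.regs["pc"]), hex(cpu.regs["pc"]), b.priv_regs)
```

Because privileged state lives in the vCPU record rather than in the hardware, one guest's privileged writes never leak into another's.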

Virtual Physical Memory • Disco maintains a mapping from guest physical addresses to 40-bit machine addresses • When the OS tries to insert a virtual-to-physical mapping into the TLB, Disco emulates the operation and substitutes the machine address for that physical address; subsequent accesses through that TLB entry have no overhead • Each VM has a pmap, which contains one entry for each physical page
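The two-level translation can be sketched as follows: the guest OS supplies a virtual-to-physical mapping, and the monitor consults the pmap to install a virtual-to-machine mapping in the real TLB. The 4 KB page size, the method names and the pmap contents are assumptions for illustration.

```python
PAGE_SHIFT = 12          # assumed 4 KB pages for illustration
MACHINE_ADDR_BITS = 40   # machine addresses are 40 bits wide


class VirtualMachine:
    def __init__(self, pmap: dict):
        self.pmap = pmap  # guest physical page number -> machine page number
        self.hw_tlb = {}  # virtual page number -> machine page number (real TLB)

    def guest_tlb_insert(self, vaddr: int, paddr: int) -> None:
        """Emulate the guest's TLB write, substituting the machine page."""
        mpn = self.pmap[paddr >> PAGE_SHIFT]
        self.hw_tlb[vaddr >> PAGE_SHIFT] = mpn

    def translate(self, vaddr: int) -> int:
        # Later accesses go straight through the real TLB, with no monitor involvement.
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        maddr = (self.hw_tlb[vaddr >> PAGE_SHIFT] << PAGE_SHIFT) | offset
        if maddr >= 1 << MACHINE_ADDR_BITS:
            raise ValueError("machine address exceeds 40 bits")
        return maddr


vm = VirtualMachine(pmap={0: 0x5A, 1: 0x13})  # hypothetical page placement
vm.guest_tlb_insert(vaddr=0x4000, paddr=0x1000)
print(hex(vm.translate(0x4ABC)))
```

The guest only ever sees physical page 1; the fact that it lives in machine page 0x13 is known only to the monitor, which is what lets Disco move pages later.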

Contd. • Kernel-mode references on MIPS processors bypass the TLB and access memory and I/O directly, so the OS code and data must be relinked into a mapped address space • MIPS tags each TLB entry with an address space identifier (ASID) • ASIDs are not virtualized, so the TLB must be flushed on VM context switches, but not on MMU context switches within a VM • To offset the resulting increase in TLB misses, Disco adds a second-level software TLB
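The interaction between the TLB flush on VM switches and the second-level software TLB can be modeled as below. This is a simplified sketch: it keeps one software TLB per VM rather than per virtual CPU, uses dictionaries for both TLBs, and `slow_translate` is a hypothetical stand-in for the full emulated path through the guest's page tables and the pmap.

```python
class TLBMonitor:
    """Toy model: hardware TLB flushed on VM switches, backed by per-VM software TLBs."""

    def __init__(self):
        self.hw_tlb = {}       # (asid, vpn) -> machine page; shared by whichever VM runs
        self.l2_tlb = {}       # vm -> {(asid, vpn): machine page}
        self.current_vm = None
        self.slow_refills = 0  # misses that needed the full emulated translation

    def switch_vm(self, vm: str) -> None:
        if vm != self.current_vm:
            # ASIDs are not virtualized: two VMs may use the same ASID,
            # so the hardware TLB must be flushed.
            self.hw_tlb.clear()
            self.current_vm = vm

    def access(self, asid: int, vpn: int, slow_translate) -> int:
        key = (asid, vpn)
        if key in self.hw_tlb:
            return self.hw_tlb[key]
        l2 = self.l2_tlb.setdefault(self.current_vm, {})
        if key not in l2:
            self.slow_refills += 1
            l2[key] = slow_translate(vpn)
        # Refill the hardware TLB cheaply from the software TLB.
        self.hw_tlb[key] = l2[key]
        return l2[key]


m = TLBMonitor()
m.switch_vm("A")
m.access(1, 5, lambda vpn: vpn + 100)
m.switch_vm("B")
m.access(1, 5, lambda vpn: vpn + 200)  # same ASID in a different VM
m.switch_vm("A")
print(m.access(1, 5, lambda vpn: vpn + 100), m.slow_refills)
```

After switching back to VM A, the flushed entry is recovered from A's software TLB instead of going through the slow path again.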

NUMAness • Cache misses should be satisfied from local memory whenever possible to avoid remote-access latency • Disco implements dynamic page replication and migration • Read-shared pages are replicated; write-shared pages are not • The migration and replication policy is driven by cache-miss counting • memmap – contains an entry for each real machine memory page; used during TLB shootdowns
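The per-page decision implied by this policy can be sketched as a function of per-node cache-miss counts. The threshold value and function name are hypothetical. The sketch encodes the rules on this slide (replicate read-shared pages, leave write-shared pages in place) and assumes that a hot page used by only one remote node is migrated to that node.

```python
HOT_THRESHOLD = 64  # hypothetical; the real trigger is a tunable policy parameter


def choose_action(misses_by_node: dict, written: bool, home_node: int,
                  threshold: int = HOT_THRESHOLD) -> str:
    """Pick a NUMA action for one page from per-node cache-miss counts."""
    if sum(misses_by_node.values()) < threshold:
        return "none"  # not hot enough to be worth moving
    accessors = [node for node, count in misses_by_node.items() if count > 0]
    if len(accessors) == 1:
        node = accessors[0]
        # Used by a single node: migrate it there unless it is already local.
        return "none" if node == home_node else f"migrate to node {node}"
    # Shared by several nodes: replicate read-shared pages, leave write-shared ones.
    return "none" if written else "replicate"


print(choose_action({2: 100}, written=False, home_node=0))
print(choose_action({1: 50, 3: 40}, written=False, home_node=0))
print(choose_action({1: 50, 3: 40}, written=True, home_node=0))
```

Write-shared pages are left alone because keeping replicas coherent on every write would cost more than the remote misses it saves.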

Transparent Page Replication

Virtual I/O Devices • The monitor intercepts all device accesses • A device accessed by a single VM does not require virtualizing the I/O; the monitor only needs to ensure exclusive access • Interposing on all DMA requests lets Disco share disk and memory resources among virtual machines and lets VMs communicate with each other

Copy-on-write Disks • Disk reads can be serviced by the monitor; if the request size is a multiple of the machine page size, the monitor only has to remap machine pages into the VM's physical address space • These pages are mapped read-only, and an attempt to modify one generates a copy-on-write fault
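The remap-instead-of-copy behavior can be sketched with per-VM pmaps. This toy model assumes 4 KB pages, one disk block per page, and hypothetical names (`CowDisk`, `read`, `write_fault`); it tracks only which machine page each VM maps and does not copy actual data.

```python
PAGE_SIZE = 4096  # assumed machine page size; one disk block per page for simplicity


class CowDisk:
    """Toy monitor-side disk: shares machine pages across VMs, copies on write."""

    def __init__(self):
        self.cache = {}  # disk block -> machine page already holding its data
        self.next_mpn = 0
        self.disk_reads = 0

    def _alloc(self) -> int:
        mpn, self.next_mpn = self.next_mpn, self.next_mpn + 1
        return mpn

    def read(self, pmap: dict, ppn: int, block: int, nbytes: int = PAGE_SIZE) -> bool:
        if nbytes % PAGE_SIZE:
            return False  # not a page multiple: needs an ordinary copy (not modeled)
        for i in range(nbytes // PAGE_SIZE):
            if block + i not in self.cache:
                self.disk_reads += 1
                self.cache[block + i] = self._alloc()
            # Remap instead of copy: the VM's physical page points at the shared page.
            pmap[ppn + i] = (self.cache[block + i], "read-only")
        return True

    def write_fault(self, pmap: dict, ppn: int) -> None:
        _, mode = pmap[ppn]
        if mode == "read-only":
            # Copy-on-write: give this VM a private, writable page (data copy not modeled).
            pmap[ppn] = (self._alloc(), "writable")


disk = CowDisk()
vm1, vm2 = {}, {}  # per-VM pmaps: guest physical page -> (machine page, mode)
disk.read(vm1, ppn=0, block=7)
disk.read(vm2, ppn=3, block=7)  # second VM shares the same machine page
disk.write_fault(vm2, ppn=3)
print(vm1, vm2, disk.disk_reads)
```

Both VMs read block 7 with a single disk access; only the VM that writes pays for a private copy.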

Virtual N/W Interface

OS Changes • Minor changes to the kernel code and data segments (specific to the MIPS architecture) • Disco uses the original device drivers • Code added to the HAL (hardware abstraction layer) to pass hints to the monitor about physical memory: • Requesting a zeroed page and reclaiming unused memory • A change to the mbuf freelist data structure • Calls to bcopy replaced by a call to a remap function in the HAL

Experimental Results • Disco targets the FLASH machine; because FLASH was not yet available, the SimOS simulator was used to develop and evaluate Disco • SimOS slowdowns prevented examining long-running workloads • Short workloads were used to study CPU and memory overheads, scalability and NUMA memory management

Execution Overhead • Measured on a uniprocessor, once running IRIX directly on the h/w and once running IRIX in a single virtual machine on Disco • Overhead ranges from 3% to 16% • The overhead is mainly due to TLB misses
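The overhead figures compare the same workload run on bare hardware and on Disco. A minimal sketch of that calculation, with hypothetical run times rather than the measured ones:

```python
def overhead_pct(t_native: float, t_virtual: float) -> float:
    """Percentage slowdown of a run under the monitor relative to the bare machine."""
    return 100.0 * (t_virtual - t_native) / t_native


# Hypothetical run times in seconds, not measured values.
print(f"{overhead_pct(10.0, 11.2):.1f}%")
```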

Memory Overhead • Ran a single workload of eight different instances of pmake under six different system configurations • Effective sharing of kernel text and the buffer cache limits the memory overhead of multiple VMs
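An idealized model shows why sharing keeps the memory overhead of multiple VMs down: shared kernel text and cached file data are paid for once, while private data grows with the number of VMs. All sizes below are hypothetical, and real sharing covers only identical pages (the same kernel image, the same disk blocks).

```python
def footprint_mb(n_vms: int, kernel_text_mb: float, buffer_cache_mb: float,
                 private_mb: float, shared: bool) -> float:
    """Idealized memory footprint of n VMs, with or without cross-VM page sharing."""
    if shared:
        # One machine copy of kernel text and cached file data serves every VM.
        return kernel_text_mb + buffer_cache_mb + n_vms * private_mb
    # Without sharing, every VM holds its own copies.
    return n_vms * (kernel_text_mb + buffer_cache_mb + private_mb)


# Hypothetical sizes in MB, chosen only to illustrate the trend.
for shared in (False, True):
    print(shared, footprint_mb(8, kernel_text_mb=2, buffer_cache_mb=20,
                               private_mb=5, shared=shared))
```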

Scalability • Ran the pmake workload under six configurations • The single-VM configuration suffers from high synchronization overheads • Using a single VM therefore has a high overhead; with 8 VMs, execution time dropped to 60%

NUMA • The performance of a UMA machine gives a lower bound on the execution time on the NUMA machine • Disco achieves significant performance improvements by enhancing memory locality

Related Work • System software for scalable shared memory machines • Virtual Machine monitors • Other system software structuring techniques • ccNUMA memory management

Conclusion • System software for scalable SMMPs can be developed without massive development effort • Experimental results show that the overhead of virtualization is modest in both processing time and memory footprint • Disco provides a simple solution to scalability, reliability and NUMA management issues
