Disco

Disco: Running Commodity Operating Systems on Scalable Multiprocessors • Paper by Edouard Bugnion, Scott Devine, and Mendel Rosenblum • Presented by Petar Bujosevic, 05/17/2005

Introduction • More scalable systems on the market • System software trailing hardware • Development is resource intensive • Idea: insert an additional layer of software between OS and HW • Target: FLASH, a ccNUMA multiprocessor • Run multiple copies of commodity OSes on top of the layer

Problem Description • Innovative hardware (scalable shared-memory multiprocessors) • Requires significant changes to system software to exploit the hardware's advantages • High cost: large system SW requires long development time and the resources of powerful SW companies • HW vs. SW: "Impediment to innovation" • Challenges: overhead, resource management, sharing/communication

Virtual Machine Monitors • Virtual machines: independent stand-alone systems that simply happen to share the same hardware • Run operating systems efficiently on scalable multiprocessor systems • Insert an additional layer of software between HW and OS • Reduce the overhead associated with that layer • Small implementation effort with no major changes to the OS • Virtual machines as units of HW fault containment • Monitor handles all NUMA-related issues, so UMA OSes do not need to be made aware of non-uniformity • Challenges: overhead (due to memory replication in each VM); resource management (decisions made without high-level knowledge); communication and sharing (interoperating in a distributed environment)

Disco Architecture • Virtual machine is assigned resources by Disco which manages a pool of processing elements/memory resources • Decouple Operating System from machine hardware. • OS runs on virtual machine

Disco Implementation • Disco emulates the MMU and the trap architecture, allowing unmodified applications and OSes to run on the VM • Frequently used kernel operations can be optimized. For instance, the OS disables interrupts by loading from and storing to special addresses • All I/O devices are virtualized, including network connections and disks, and all accesses to them must pass through Disco to be translated or emulated

Disco Implementation Managing resources • Virtual CPUs • Virtual Physical Memory • Advanced Hardware (NUMA) • Virtual I/O devices • Virtual Network interfaces

Virtual CPUs • Disco schedules each virtual CPU as a task • Sets the real registers to the virtual CPU's registers and runs the task directly • Controlled (supervised) access to memory

Virtual Physical Memory • Disco maintains a physical-to-machine address mapping • Machine addresses are FLASH's 40-bit addresses

Virtual Physical Memory • When a heavyweight OS tries to update the TLB, Disco steps in and applies the physical-to-machine translation. Subsequent memory accesses can then go straight through the TLB • Each VM has an associated pmap in the monitor • Each pmap entry also holds a back pointer to the virtual address that maps it, to help invalidate mappings in the TLB
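The translation step on this slide can be sketched roughly as follows. This is an illustrative model, not Disco's actual code: the class and method names, the dictionary-based TLB, and the 4 KB page size are all assumptions made for the example.

```python
from dataclasses import dataclass, field

PAGE_SHIFT = 12  # assumed 4 KB pages, for illustration only


@dataclass
class PmapEntry:
    machine_page: int                          # machine page backing this physical page
    backmap: set = field(default_factory=set)  # virtual pages currently mapped to it


class Monitor:
    """Hypothetical per-VM view of the monitor's translation state."""

    def __init__(self):
        self.pmap = {}  # physical page number -> PmapEntry
        self.tlb = {}   # virtual page number -> machine page number

    def tlb_insert(self, vaddr, paddr):
        """Called when the guest OS tries to write a virtual-to-physical TLB entry."""
        vpn, ppn = vaddr >> PAGE_SHIFT, paddr >> PAGE_SHIFT
        entry = self.pmap[ppn]
        entry.backmap.add(vpn)                 # remember who maps this page
        self.tlb[vpn] = entry.machine_page     # install virtual-to-machine instead
        return entry.machine_page

    def invalidate(self, ppn):
        """Remove every TLB mapping that points at the given physical page."""
        entry = self.pmap[ppn]
        for vpn in entry.backmap:
            self.tlb.pop(vpn, None)
        entry.backmap.clear()
```

The back pointer is what makes `invalidate` cheap: the monitor does not have to scan the whole TLB to find the virtual pages that map a physical page.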

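Virtual Physical Memory • The MIPS TLB is tagged: each entry carries an address space identifier (ASID) • ASIDs are not virtualized, so the TLB must be flushed on VM context switches • A second-level software TLB caches recent virtual-to-machine translations to reduce the cost of the extra TLB misses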
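A minimal sketch of how the flush and a second-level software TLB could interact, assuming a dictionary stands in for the hardware TLB. The names `VirtualTLB`, `switch_to`, `insert` and `lookup` are hypothetical and not taken from the paper.

```python
class VirtualTLB:
    """Toy model: hardware TLB plus a monitor-managed software cache."""

    def __init__(self):
        self.hw_tlb = {}        # (asid, vpn) -> machine page; models the hardware TLB
        self.l2_tlb = {}        # (vm, asid, vpn) -> machine page; software cache
        self.current_vm = None

    def switch_to(self, vm):
        # ASIDs are not virtualized, so entries of the previous VM cannot stay.
        if vm != self.current_vm:
            self.hw_tlb.clear()
            self.current_vm = vm

    def insert(self, asid, vpn, mpn):
        self.hw_tlb[(asid, vpn)] = mpn
        self.l2_tlb[(self.current_vm, asid, vpn)] = mpn

    def lookup(self, asid, vpn):
        """Return (machine page, source), where source is 'hw', 'l2' or 'miss'."""
        key = (asid, vpn)
        if key in self.hw_tlb:
            return self.hw_tlb[key], "hw"
        mpn = self.l2_tlb.get((self.current_vm, asid, vpn))
        if mpn is not None:
            self.hw_tlb[key] = mpn  # refill without involving the guest OS
            return mpn, "l2"
        return None, "miss"         # would be forwarded to the guest OS
```

In this model a VM that is rescheduled after a flush refills its hardware entries from the software cache rather than taking a full miss into the guest OS.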

NUMA Management • Cache misses are served faster from local memory than from remote memory • Pages used heavily by a single node are migrated to that node; read-shared pages are replicated to the nodes that frequently access them • Write-shared pages are not moved, since maintaining consistency requires remote access anyway • Migration and replication policy is driven by cache miss counting
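The policy on this slide can be illustrated with a small decision function. This is only a sketch: the function name, the per-node miss dictionary and the `hot_threshold` value are assumptions for the example, and the real policy involves further limits that are not modeled here.

```python
def placement_decision(miss_counts, write_shared, hot_threshold=64):
    """Decide what to do with one page.

    miss_counts: dict mapping node id -> cache misses to this page in the
    last sampling period (hypothetical input format).
    write_shared: True if more than one node writes the page.
    Returns (action, nodes) with action in {'leave', 'migrate', 'replicate'}.
    """
    hot = [node for node, count in miss_counts.items() if count >= hot_threshold]
    if not hot or write_shared:
        # Cold pages are not worth moving; write-shared pages would still
        # need remote accesses to stay consistent.
        return ("leave", [])
    if len(hot) == 1:
        return ("migrate", hot)        # one heavy user: move the page there
    return ("replicate", sorted(hot))  # several readers: give each a copy
```

For example, `placement_decision({0: 100, 1: 2}, write_shared=False)` selects migration to node 0, while two hot readers select replication.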

NUMA Management • memmap has an entry for each machine page, recording which virtual machines and virtual addresses reference it • Used during TLB shootdown
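A rough sketch of how a memmap-style reverse map can drive a TLB shootdown. The `Memmap` class and the dictionary-based per-VM TLBs are assumptions made for illustration only.

```python
from collections import defaultdict


class Memmap:
    """Hypothetical reverse map from machine pages to their users."""

    def __init__(self):
        # machine page -> set of (vm, virtual page) pairs that reference it
        self.refs = defaultdict(set)

    def add_mapping(self, mpn, vm, vpn):
        self.refs[mpn].add((vm, vpn))

    def shootdown(self, mpn, tlbs):
        """Invalidate every TLB entry that points at machine page mpn.

        tlbs: dict mapping vm -> {vpn: mpn}, standing in for per-VM TLB state.
        Returns the list of (vm, vpn) entries that were invalidated.
        """
        victims = sorted(self.refs.pop(mpn, set()))
        for vm, vpn in victims:
            tlbs[vm].pop(vpn, None)
        return victims
```

Before a page is migrated or a replica is collapsed, the monitor would run such a shootdown so that no stale virtual-to-machine mapping survives the move.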

Virtual I/O Devices • All device accesses are intercepted by the monitor • Disk reads can be serviced by the monitor; if the request size is a multiple of the machine page size, the monitor only has to remap machine pages into the VM's physical memory address space • Shared pages are mapped read-only and generate a copy-on-write fault if written to
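The remap-then-copy-on-write behaviour can be sketched as below. Everything here is a simplified assumption: `CowDisk`, its counters and its dictionaries are hypothetical stand-ins, and page allocation replaces the real DMA and page copy.

```python
class CowDisk:
    """Toy model of disk-page sharing between virtual machines."""

    def __init__(self):
        self.cache = {}     # disk block -> machine page already holding it
        self.next_page = 0
        self.vm_maps = {}   # (vm, physical page) -> (machine page, writable)

    def _alloc(self):
        self.next_page += 1
        return self.next_page

    def read(self, vm, block, ppn):
        """Map a disk block into a VM's physical page, read-only."""
        if block not in self.cache:
            self.cache[block] = self._alloc()  # stands in for the real disk read
        self.vm_maps[(vm, ppn)] = (self.cache[block], False)
        return self.cache[block]

    def write_fault(self, vm, ppn):
        """Copy-on-write: give the writing VM a private machine page."""
        mpn, writable = self.vm_maps[(vm, ppn)]
        if not writable:
            mpn = self._alloc()                # stands in for copying the page
            self.vm_maps[(vm, ppn)] = (mpn, True)
        return mpn
```

Two VMs reading the same block end up sharing one machine page, and only the VM that writes pays for a private copy.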

Virtual Network Interface • Communication between virtual machines by accessing data in a shared cache • Avoid duplication of data • Use sharing whenever possible • Affects data locality • Figure: Transparent Sharing of Pages over NFS

IRIX, HAL changes • Minor changes to kernel code and data segment (unique to MIPS architecture) • Disco uses the original device drivers • Added code to HAL to pass hints about physical memory to the monitor • Hints: request for a zeroed page, reclamation of unused memory • Change in the mbuf freelist data structure • Calls to bcopy replaced by a remap function in HAL

SPLASHOS • Thin OS, supported directly by Disco (no need for virtual memory subsystem) • Used for parallel scientific applications

Experiments • Setup and Workloads • Execution Overheads • Memory Overheads • Scalability • Dynamic Page Migration and Replication

Related Work • System Software for Scalable Shared Memory Machines • Virtual Machine Monitors • Other System Software Structuring Techniques • ccNUMA Memory Management

Conclusion • Developing system software for scalable shared-memory multiprocessors without a huge development effort • Adding a software layer between commodity OSes and raw HW • Disco resolves problems of traditional virtual machines • Global buffer cache transparently shared across all virtual machines • Low to modest overhead • Scalability and reliability • Low implementation cost

Deficiencies • Hardware failure analysis • Larger vs. smaller number of processors • Virtual Physical Memory on architectures other than MIPS

References • Disco: Running Commodity Operating Systems on Scalable Multiprocessors, by Edouard Bugnion, Scott Devine, and Mendel Rosenblum, 1997 • Modern Operating Systems, Second Edition, Andrew S. Tanenbaum, 2001 • http://www-flash.stanford.edu/Disco • http://www.cs.pdx.edu/~walpole/class/cs533/slides/151.ppt, Jeremy Greenwald, 2005 • http://www.core.org.cn/OcwWeb/Electrical-Engineering-and-Computer-Science/6-828Fall2003/LectureNotes/detail/virtual_machines-.htm • http://www.cs.wisc.edu/~dusseau/Classes/CS736/CS736-S02/ReadingQuestions/Disco.html • http://www.cs.northwestern.edu/~fabianb/classes/cs-443-s05/Disco.pps • http://www.cs.washington.edu/sosp16/ • http://www.cs.berkeley.edu/~zf/cs262a/summary34.htm • http://en.wikipedia.org/wiki/Microkernel
