
Cellular Disco: resource management using virtual clusters on shared memory multiprocessors

Presentation Transcript


  1. Cellular Disco: resource management using virtual clusters on shared memory multiprocessors. Published at ACM SOSP 1999 by K. Govil, D. Teodosiu, Y. Huang, and M. Rosenblum. Presenter: Soumya Eachempati

  2. Motivation • Large-scale shared-memory multiprocessors • Large number of CPUs (32-128) • NUMA architectures • Off-the-shelf OSes do not scale • Cannot handle a large number of resources • Memory management not optimized for NUMA • No fault containment

  3. Existing Solutions • Hardware partitioning • Provides fault containment • Rigid resource allocation • Low resource utilization • Cannot dynamically adapt to the workload • New operating system • Provides flexibility and efficient resource management • Considerable effort and time. Goal: exploit the hardware resources to the fullest with minimal effort, while improving flexibility and fault tolerance.

  4. Solution: DISCO (VMM) • Virtual machine monitor • Addresses NUMA awareness and scalability issues. Issues not addressed by DISCO: • Hardware fault tolerance/containment • Resource management policies

  5. Cellular DISCO • Approach: convert the multiprocessor machine into a virtual cluster • Advantages: • Inherits the benefits of DISCO • Can support legacy OSes transparently • Combines the advantages of hardware partitioning and a new OS • Provides fault containment • Fine-grained resource sharing • Less effort than developing a new OS

  6. Cellular DISCO • Internally structured into semi-independent cells • Much less development effort compared to Hive • No performance loss, even with fault containment. Warranted design decision: the code of Cellular DISCO is assumed to be correct (the monitor is small enough to be thoroughly tested).

  7. Cellular Disco Architecture

  8. Resource Management • Over-commits resources • Gives flexibility to adjust the fraction of resources assigned to each VM • Restrictions on resource allocation due to fault containment • Both CPU and memory load balancing under constraints: • Scalability • Fault containment • Avoiding contention • NUMA-aware placement: first-touch allocation, dynamic migration, and replication of hot memory pages (see the sketch below)
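
As a rough illustration of the NUMA-aware placement policy above, the following C sketch shows first-touch allocation plus migration/replication of hot pages. The structure, field names, and threshold are hypothetical; this is not Cellular Disco's actual code.

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative threshold: remote misses tolerated before a page is moved. */
    #define HOT_PAGE_THRESHOLD 64

    struct machine_page {
        int      home_node;      /* node whose memory currently backs the page */
        uint32_t remote_misses;  /* sampled cache misses from remote nodes     */
        bool     read_only;      /* read-only pages may be replicated instead  */
        bool     replicated;
    };

    /* First touch: back the page with memory from the node of the VCPU that
       first touches it, so most accesses start out local.                     */
    static void first_touch_place(struct machine_page *pg, int touching_node)
    {
        pg->home_node = touching_node;
        pg->remote_misses = 0;
    }

    /* Periodic rebalancing: a page taking many remote misses is replicated if
       read-only, or migrated toward the accessing node otherwise.             */
    static void rebalance_page(struct machine_page *pg, int accessing_node)
    {
        if (pg->home_node == accessing_node)
            return;
        if (++pg->remote_misses < HOT_PAGE_THRESHOLD)
            return;
        if (pg->read_only)
            pg->replicated = true;           /* keep a copy near the accessor */
        else
            pg->home_node = accessing_node;  /* migrate the backing memory    */
        pg->remote_misses = 0;
    }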

  9. Hardware Virtualization • The VM's interface mimics the underlying hardware • Virtual machine resources (user-defined): VCPUs, memory, I/O devices (physical) • Physical vs. machine resources (machine resources allocated dynamically, based on VM priority) • VCPUs map to CPUs; physical pages map to machine pages • The VMM intercepts privileged instructions • 3 modes: user (applications), supervisor (guest OS), kernel (VMM) • In supervisor mode all memory accesses are mapped • The VMM allocates machine memory to back the physical memory • pmap and memmap data structures track the physical-to-machine and machine-to-physical mappings • Second-level software TLB (L2TLB) caches translations
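
To make the physical-vs-machine distinction concrete, here is a hedged C sketch of what the pmap, memmap, and second-level software TLB might look like. The field names and sizes are invented for illustration and do not reflect the actual implementation.

    #include <stdint.h>

    typedef uint64_t machine_pfn_t;    /* frame number in real machine memory */
    typedef uint64_t phys_pfn_t;       /* frame number as seen by the guest   */

    /* pmap: one entry per physical (guest-visible) page of a VM, recording
       the machine page that currently backs it.                              */
    struct pmap_entry {
        machine_pfn_t machine_page;
        uint8_t       valid;
    };

    /* memmap: one entry per machine page, recording which VM and physical
       page it backs, so pages can be reclaimed, migrated, or replicated.     */
    struct memmap_entry {
        int        owner_vm;
        phys_pfn_t backed_phys_page;
    };

    /* Second-level software TLB: caches virtual-to-machine translations so
       that many guest TLB misses can be served without re-walking the
       guest's page tables.                                                   */
    #define L2TLB_SIZE 4096            /* illustrative size */

    struct l2tlb_entry {
        uint64_t      guest_vaddr_page;
        machine_pfn_t machine_page;
        uint8_t       valid;
    };

    struct l2tlb {
        struct l2tlb_entry entries[L2TLB_SIZE];
    };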

  10. Hardware fault containment

  11. Hardware fault containment • The VMM implements fault containment in software • Cell: the unit of fault containment • Inter-cell communication: • Inter-processor RPCs • Messages - no locking needed since they are handled serially • Shared memory for some data structures (pmap, memmap) • Low latency, exactly-once semantics • The trusted system software layer is what makes it safe to use shared memory across cells
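
A minimal sketch of how exactly-once inter-cell messaging can be obtained without locks, assuming per-sender sequence numbers and serialized message handling; this illustrates the idea only and is not Cellular Disco's actual RPC layer.

    #include <stdint.h>

    #define MAX_CELLS 8

    /* A hypothetical inter-cell message; in Cellular Disco such requests are
       carried by fast inter-processor RPCs with exactly-once semantics.      */
    struct cell_msg {
        int      src_cell;
        uint64_t seq;                  /* per-sender sequence number          */
        int      op;                   /* requested operation                 */
        char     payload[64];
    };

    /* Last sequence number executed from each sending cell.  Because messages
       from a given cell are handled serially, this single counter per sender
       is enough to suppress duplicates without any locking.                  */
    static uint64_t last_seq_from[MAX_CELLS];

    /* Returns 1 if the message was executed, 0 if it was a duplicate. */
    static int handle_cell_msg(const struct cell_msg *m,
                               void (*execute)(const struct cell_msg *))
    {
        if (m->seq <= last_seq_from[m->src_cell])
            return 0;                  /* already executed: drop duplicate    */
        execute(m);                    /* run the requested operation         */
        last_seq_from[m->src_cell] = m->seq;
        return 1;
    }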

  12. Implementation 1: MIPS R10000 • 32-processor SGI Origin 2000 • Piggybacked on IRIX 6.4 (host OS) • Guest OS: IRIX 6.2 • Cellular DISCO (CD) is spawned as a multi-threaded kernel process inside the host OS • Additional overhead < 2% (time spent in the host IRIX) • No fault isolation: the IRIX kernel is monolithic • Solution: some host OS support is needed - one copy of the host OS per cell

  13. I/O Request execution • Cellular Disco piggybacked on IRIX kernel

  14. 32-processor MIPS R10000

  15. Characteristics of workloads • Database - decision-support workload • Pmake - I/O-intensive workload • Raytrace - CPU-intensive workload • Web - kernel-intensive web-server workload

  16. Virtualization Overheads

  17. Fault-containment Overheads • Left bar - single-cell configuration • Right bar - 8-cell system

  18. CPU Management • Load-balancing mechanisms: • Three types of VCPU migration - intra-node, inter-node, inter-cell • Intra-node: loss of CPU cache affinity • Inter-node: cost of copying the L2TLB, plus a higher long-term cost from lost node affinity • Inter-cell: loss of both cache and node affinity, and increased fault vulnerability (the VM now depends on one more cell) • The penalty is alleviated by replicating pages after the move • Load-balancing policies: an idle balancer (local load stealer) and a periodic balancer (global redistribution) • Each CPU has a local run queue of VCPUs • Gang scheduling: run all VCPUs of a VM simultaneously (a simplified sketch follows)
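
The following C sketch illustrates the idle balancer's preference ordering when stealing a VCPU (intra-node over inter-node over inter-cell). The data structures and cost values are invented for illustration, not taken from the paper.

    #include <stddef.h>

    struct vcpu;                      /* opaque: we only queue pointers        */

    struct cpu_runq {
        int          node;            /* NUMA node of this CPU                 */
        int          cell;            /* fault-containment cell of this CPU    */
        struct vcpu *queue[16];       /* local run queue of VCPUs              */
        int          len;
    };

    /* Relative cost of stealing a VCPU into 'dst' from 'src': intra-node is
       cheapest (only cache affinity is lost), inter-node adds copying the
       L2TLB and loss of node affinity, inter-cell additionally widens the set
       of cells the VM depends on (fault vulnerability).                       */
    static int steal_cost(const struct cpu_runq *dst, const struct cpu_runq *src)
    {
        if (src->cell != dst->cell)
            return 3;                 /* inter-cell                            */
        if (src->node != dst->node)
            return 2;                 /* inter-node                            */
        return 1;                     /* intra-node                            */
    }

    /* Idle balancer: an idle CPU looks for the run queue it can steal from at
       the lowest cost; the periodic balancer does a similar, global pass.     */
    static struct cpu_runq *pick_steal_victim(struct cpu_runq *dst,
                                              struct cpu_runq **all, size_t n)
    {
        struct cpu_runq *best = NULL;
        int best_cost = 4;
        for (size_t i = 0; i < n; i++) {
            if (all[i] == dst || all[i]->len <= 1)
                continue;             /* skip ourselves and near-idle queues   */
            int c = steal_cost(dst, all[i]);
            if (c < best_cost) {
                best = all[i];
                best_cost = c;
            }
        }
        return best;
    }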

  19. Load Balancing • Low-contention distributed data structure: the load tree • Contention is a concern only on the higher-level (shared) nodes • Each VCPU carries the list of cells it is vulnerable to, so fault-containment constraints can be checked during balancing • Under heavy load the idle balancer is not enough • A local periodic balancer redistributes load within each 8-CPU region
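
A simplified sketch of a load tree: leaves hold per-CPU run-queue lengths and inner nodes hold subtree sums, so a CPU can locate work by descending the tree instead of scanning every CPU. The array layout and sizes are illustrative assumptions, not the paper's data structure.

    /* Complete binary tree over the CPUs, stored in an array: leaves hold each
       CPU's run-queue length, inner nodes hold the sum over their subtree.
       CPUs mostly touch nearby low-level nodes, which keeps contention on the
       shared upper levels low.                                                */
    #define NCPUS 32

    static int load_tree[2 * NCPUS];           /* index 1 is the root          */

    /* A CPU publishes its new run-queue length by updating its leaf and
       propagating the difference up to the root.                              */
    static void leaf_update(int cpu, int runq_len)
    {
        int i = NCPUS + cpu;
        int delta = runq_len - load_tree[i];
        for (; i >= 1; i /= 2)
            load_tree[i] += delta;
    }

    /* Walk down from 'start_node' (the root, or the top of a nearby subtree)
       toward the more loaded child to find a CPU worth stealing from.         */
    static int find_loaded_cpu(int start_node)
    {
        int i = start_node;
        while (i < NCPUS) {
            int left = 2 * i, right = 2 * i + 1;
            i = (load_tree[left] >= load_tree[right]) ? left : right;
        }
        return i - NCPUS;                       /* CPU index at the leaf       */
    }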

  20. CPU Scheduling and Results • Scheduling: pick the highest-priority gang-runnable VCPU that has been waiting, then send RPCs so the VM's other VCPUs are scheduled at the same time • 3 configurations on 32 processors: • One VM with 8 VCPUs running an 8-process raytrace • 4 VMs • 8 VMs (a total of 64 VCPUs) • The pmap is migrated only when all of a VM's VCPUs have been migrated out of a cell • Data pages are also migrated, to make the VM independent of the old cell
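
A hedged sketch of the gang-scheduling decision described above: among waiting VCPUs whose VM is gang-runnable, pick the highest priority, preferring the longer wait on ties; the caller would then send RPCs to the CPUs that will run the VM's remaining VCPUs. Names, fields, and the tie-breaking rule are hypothetical.

    #include <stdbool.h>
    #include <stddef.h>

    struct vm;                         /* opaque: the VM a VCPU belongs to     */

    struct vcpu {
        struct vm *vm;
        int        priority;
        long       wait_time;          /* time spent runnable but not running  */
    };

    /* Among the VCPUs waiting on this CPU, pick the highest-priority one whose
       VM is gang-runnable (all of its VCPUs are ready to run).  The real
       scheduler then RPCs the CPUs holding the VM's other VCPUs so the whole
       gang runs at once.                                                      */
    static struct vcpu *pick_next(struct vcpu **waiting, size_t n,
                                  bool (*vm_gang_runnable)(struct vm *))
    {
        struct vcpu *best = NULL;
        for (size_t i = 0; i < n; i++) {
            struct vcpu *v = waiting[i];
            if (!vm_gang_runnable(v->vm))
                continue;
            if (!best ||
                v->priority > best->priority ||
                (v->priority == best->priority &&
                 v->wait_time > best->wait_time))
                best = v;
        }
        return best;
    }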

  21. Memory Management • Each cell has its own freelist of pages, indexed by home node • Page allocation request: • Satisfied from the local node if possible • Else satisfied from another node in the same cell • Else a page is borrowed from another cell • Memory balancing: • Low-memory thresholds trigger borrowing and lending • Each VM has a priority list of lender cells (see the sketch below)
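
A small C sketch of the allocation fallback chain described above (local node, then same cell, then borrowing from lender cells). Simple counters stand in for the real freelists, and all names and sizes are illustrative.

    #include <stddef.h>

    #define NCELLS 8
    #define NNODES 4

    /* Per-cell count of free pages, indexed by home node; a stand-in for the
       real freelists of machine pages.                                        */
    static int free_pages[NCELLS][NNODES];

    /* Take one page from the given cell/node if available; returns the node
       the page came from, or -1 if that freelist is empty.                    */
    static int take_page(int cell, int node)
    {
        if (free_pages[cell][node] > 0) {
            free_pages[cell][node]--;
            return node;
        }
        return -1;
    }

    /* Allocation policy: prefer the requesting VCPU's node, then any node in
       the same cell, then borrow from lender cells in the VM's priority order
       (each borrowed page widens the VM's fault vulnerability).               */
    static int alloc_page(int cell, int node,
                          const int *lender_cells, size_t n_lenders)
    {
        if (take_page(cell, node) >= 0)
            return node;
        for (int n = 0; n < NNODES; n++)
            if (take_page(cell, n) >= 0)
                return n;
        for (size_t i = 0; i < n_lenders; i++)
            for (int n = 0; n < NNODES; n++)
                if (take_page(lender_cells[i], n) >= 0)
                    return n;
        return -1;                              /* nothing free: must page out */
    }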

  22. Memory Paging • Page replacement: second-chance FIFO • Avoids double-paging overheads • Tracking used pages: uses annotated guest OS routines • Page sharing: explicit marking of shared pages • Redundant paging: avoided by trapping every access to the virtual paging disk
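
For reference, here is a generic second-chance FIFO victim-selection loop in C (the classic algorithm the slide names, not the paper's code): pages are visited in FIFO order, referenced pages get their bit cleared and another pass, and the first unreferenced page is evicted.

    #include <stdbool.h>

    #define NPAGES 1024

    struct page_state {
        bool present;         /* page currently holds VM data                 */
        bool referenced;      /* set when the guest touches the page          */
    };

    static struct page_state pages[NPAGES];
    static int fifo_hand;                      /* next page to consider       */

    /* Returns the index of the page to evict, or -1 if nothing is evictable. */
    static int pick_victim(void)
    {
        for (int scanned = 0; scanned < 2 * NPAGES; scanned++) {
            int i = fifo_hand;
            fifo_hand = (fifo_hand + 1) % NPAGES;
            if (!pages[i].present)
                continue;
            if (pages[i].referenced) {
                pages[i].referenced = false;   /* give a second chance        */
                continue;
            }
            return i;                          /* oldest unreferenced page    */
        }
        return -1;
    }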

  23. Implementation 2: FLASH Simulation • FLASH has hardware fault-recovery support • Simulation of the FLASH architecture on SimOS • A fault injector is used to inject: • Power failures • Link failures • Firmware failures (?) • Results: 100% fault containment

  24. Fault Recovery • Hardware support is needed to: • Determine which resources are still operational • Reconfigure the machine to use only the good resources • Cellular Disco recovery: • Step 1: all cells agree on a liveset of nodes • Step 2: abort RPCs/messages to dead cells • Step 3: kill only the VMs that depend on failed cells (see the sketch below)
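
The three recovery steps can be summarized in a short C sketch. The data structures, stub actions, and sizes below are hypothetical; the point is that damage is limited to VMs that actually depended on a failed cell.

    #include <stdbool.h>
    #include <stdio.h>

    #define NCELLS 8
    #define NVMS   64

    static bool cell_alive[NCELLS];            /* agreed-upon liveset          */
    static bool vm_depends_on[NVMS][NCELLS];   /* cells each VM uses resources from */

    static void abort_rpcs_to(int cell) { printf("abort RPCs to cell %d\n", cell); }
    static void kill_vm(int vm)         { printf("kill VM %d\n", vm); }

    /* Recovery outline: adopt the agreed liveset, abort pending RPCs and
       messages to dead cells, then kill only the VMs that depended on a
       failed cell; all other VMs keep running.                               */
    static void recover(const bool liveset[NCELLS])
    {
        for (int c = 0; c < NCELLS; c++) {
            cell_alive[c] = liveset[c];        /* step 1: agree on liveset    */
            if (!cell_alive[c])
                abort_rpcs_to(c);              /* step 2: drop dead targets   */
        }
        for (int vm = 0; vm < NVMS; vm++)      /* step 3: contain the damage  */
            for (int c = 0; c < NCELLS; c++)
                if (!cell_alive[c] && vm_depends_on[vm][c]) {
                    kill_vm(vm);
                    break;
                }
    }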

  25. Fault-recovery Times • Recovery times are higher for machines with more memory • Recovery requires scanning memory for fault detection

  26. Summary • Virtual machine monitor • Flexible resource management • Legacy OS support • Cellular Disco • Cells provide fault containment • Creates a virtual cluster • Needs hardware support for fault recovery
