Cellular Disco Kingshuk Govil, Dan Teodosiu, Yongjang Huang, Mendel Rosenblum Presented by: Sagnik Bhattacharya
Overview • Problems of current shared memory multiprocessors and our requirements • Cellular Disco as a solution • architecture • prototype • hardware-fault containment • CPU management • Memory management • statistics • Cellular Disco and ubiquitous environments • Conclusion
Problem • Extending modern Operating systems to run efficiently on shared memory multiprocessors. • Software development has not kept pace with hardware development. • Common operating systems fail beyond 12 processors.
What we need…. • the system should be reliable • it should be scalable • it should be fault-tolerant • it should not take too much development time or effort.
Traditional approaches • Hardware partitioning - lacks resource sharing, makes physical clusters. • Software-centric approaches : (significant development time and cost) • modify existing OS • develop new OS
A scenario…. • Smart Space: a control unit and four processors (diagram) • No rebooting necessary
Solution : Cellular Disco • Extension of previous work - Disco • Uses the concept of Virtual machine monitors • Partitions the multiprocessor system into virtual clusters.
Virtual Machine Monitor • VM1 - OS (Win NT) on µP’s 1,2,3 • VM2 - OS (IRIX 6.2) on µP’s 1,3,5,8 • The Virtual Machine Monitor runs between the virtual machines and the hardware
Virtual Machine Monitor: I/O handling • A guest OS issues an I/O request • The Virtual Machine Monitor traps the I/O request and performs the I/O • Once the I/O is done, it sends an interrupt
Issues it addresses • Scalability • NUMA awareness • Hardware fault-containment • Resource management
Prototype • Runs on a 32-processor SGI Origin 2000 • Supports shared memory systems based on the MIPS R10000 architecture. • The prototype runs piggybacked on IRIX 6.4 • The host OS is made dormant and is only used to invoke some device drivers.
Hardware Virtualization • Physical Resources - visible to a virtual machine • Machine Resources - actual resources; allocated by Cellular Disco • CD operates in the kernel mode of the MIPS processor • CD intercepts all system calls.
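The split between physical and machine resources can be pictured as a per-VM lookup table. The following is only a minimal sketch: the class name `PhysicalToMachineMap`, the lazy allocation on first touch, and the page numbers are assumptions made for illustration and are not taken from the slides.

```python
class PhysicalToMachineMap:
    """Hypothetical per-VM table: physical page (seen by the guest) -> machine page."""

    def __init__(self, free_machine_pages):
        self.free = list(free_machine_pages)  # machine pages the monitor may hand out
        self.pmap = {}                        # physical page -> machine page

    def translate(self, physical_page):
        # Assumption: a machine page is allocated the first time a physical page is used.
        if physical_page not in self.pmap:
            if not self.free:
                raise MemoryError("no machine pages left")
            self.pmap[physical_page] = self.free.pop(0)
        return self.pmap[physical_page]


vm1 = PhysicalToMachineMap([100, 101, 102])
first = vm1.translate(0)   # guest page 0 gets the first free machine page
second = vm1.translate(5)  # guest page 5 gets the next one
again = vm1.translate(0)   # repeated lookups return the same machine page
```

The guest only ever names physical pages; which machine page backs each one stays a decision of the monitor.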
Resource Management • CPU management - Each processor maintains its own run queue • Memory Management - Memory borrowing mechanism • Each OS instance is only given as many resources as it can handle. Large applications are split, and communication between the parts is established using shared-memory regions.
CPU Management • VCPU migration : - Intra node (37 µsec) - Inter node (520 µsec) - Inter Cell (1520 µsec)
VCPU migration (diagram) • Labels: Cellular Disco, Interconnect, Cell, Node, CPU, VCPU • Three frames show a VCPU migrating intra node, inter node, and inter cell
CPU Management (contd.) • CPU balancing : - Idle balancer - Periodic balancer • Load balancing scenario
Idle balancer (diagram) • CPU0, CPU1 and CPU2 hold VC A0, VC A1, VC B0 and VC B1; CPU3 is idle • The idle CPU3 asks for a VCPU to run • Check: does the candidate have enough cache affinity to CPU2? NO!! • Final frame: VC B1 moves to CPU3
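The idle balancer check can be sketched as follows. This is an illustration under stated assumptions: the `VCPU` fields, the `AFFINITY_TICKS` cutoff, and the rule "a VCPU that has waited at least that long has lost its cache affinity" are hypothetical stand-ins for whatever metric the real system uses.

```python
from dataclasses import dataclass

AFFINITY_TICKS = 3  # assumed cutoff, not a value from the slides


@dataclass
class VCPU:
    name: str
    last_cpu: int
    idle_ticks: int  # ticks since it last ran on last_cpu (hypothetical metric)


def has_cache_affinity(vcpu):
    # Assumption: a VCPU that ran recently still has warm cache state on its CPU.
    return vcpu.idle_ticks < AFFINITY_TICKS


def idle_balance(idle_cpu, run_queues):
    """Move the first waiting VCPU without cache affinity to idle_cpu; return it or None."""
    for cpu, queue in run_queues.items():
        if cpu == idle_cpu:
            continue
        for vcpu in queue:
            if not has_cache_affinity(vcpu):
                queue.remove(vcpu)
                vcpu.last_cpu = idle_cpu
                vcpu.idle_ticks = 0
                run_queues[idle_cpu].append(vcpu)
                return vcpu
    return None


queues = {0: [], 1: [], 2: [VCPU("B1", 2, 10)], 3: []}
moved = idle_balance(3, queues)  # B1 has waited long enough, so CPU3 takes it
```

A VCPU that still has affinity is left where it is, which mirrors the question asked in the diagram before the migration happens.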
Periodic Balancer • Does depth-first traversal of the load tree • Example load tree: root 4, children 1 and 3, leaves 1, 0, 2, 1
Periodic Balancer • Checks the difference of 2 siblings, ignores it if < 2 • In the example, the leaf pairs (1, 0) and (2, 1) both have Diff = 1
Periodic Balancer • If diff >= 2, does load balancing if benefit > cost • In the example, the children 1 and 3 have Diff = 2
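The traversal in the example can be sketched in a few lines. This is a minimal illustration, assuming a binary load tree built over a list of per-CPU loads; the function name `find_imbalances` is hypothetical, and the benefit > cost test is not modeled because the slides do not define how either quantity is computed.

```python
def find_imbalances(loads, threshold=2):
    """Depth-first walk of a binary load tree built over per-CPU loads.

    Returns (total_load, found), where found lists
    ((left_lo, left_hi), (right_lo, right_hi), left_load - right_load)
    for every sibling pair whose loads differ by at least `threshold`.
    """
    found = []

    def visit(lo, hi):
        if hi - lo == 1:          # leaf: a single CPU
            return loads[lo]
        mid = (lo + hi) // 2
        left = visit(lo, mid)     # load of the left subtree
        right = visit(mid, hi)    # load of the right subtree
        if abs(left - right) >= threshold:
            found.append(((lo, mid), (mid, hi), left - right))
        return left + right

    total = visit(0, len(loads))
    return total, found


total, found = find_imbalances([1, 0, 2, 1])
# total is 4; the only pair reported is CPUs 0-1 versus CPUs 2-3 (loads 1 and 3)
```

The two leaf pairs differ by 1 and are ignored, so only the pair under the root becomes a candidate for balancing.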
Gang Scheduling • For each physical CPU, we select the VCPU that is to run on it. • The VCPU selected is the highest-priority gang-runnable VCPU • A VCPU is gang-runnable if all non-idle VCPU’s of its VM are either • running or, • waiting on run queues of processors running lower-priority VM’s.
Example • VM1 VC’s - 1,3,8 (idle) • VM2 VC’s - 2,4,6 (idle),7 • VM3 VC’s - 5,9 • Wait queues (drawn along a Priority axis, with the currently executing VCPU marked): µP1 : VC1 VC7 VC5; µP2 : VC2 VC1 VC9; µP3 : VC5 VC3 VC4
Example • VM3 (VC’s 5,9) is marked gang runnable
Example • New wait queues (with the new executing VCPU marked): µP1 : VC5 VC7 VC1; µP2 : VC9 VC1 VC2; µP3 : VC5 VC3 VC4
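The gang-runnable condition can be written as a small predicate. The sketch below is an assumption-laden illustration: the VM membership follows the example, but the priorities, the CPU names, and the placement of VCPUs on run queues are made up so that the data is self-consistent.

```python
def is_gang_runnable(vm, vcpus_of, idle, priority, running, waiting_on):
    """True if every non-idle VCPU of `vm` is running, or is waiting on a CPU
    that currently runs a VCPU of a lower-priority VM."""
    owner = {v: m for m, vs in vcpus_of.items() for v in vs}
    executing = set(running.values())
    for v in vcpus_of[vm]:
        if v in idle or v in executing:
            continue
        cpu = waiting_on.get(v)
        if cpu is None:
            return False  # not queued anywhere
        current = running.get(cpu)
        if current is not None and priority[owner[current]] >= priority[vm]:
            return False  # it would have to preempt an equal or higher priority VM
    return True


vcpus_of = {"VM1": [1, 3, 8], "VM2": [2, 4, 6, 7], "VM3": [5, 9]}
idle = {8, 6}
priority = {"VM1": 2, "VM2": 1, "VM3": 3}           # assumed: higher number wins
running = {"P1": 1, "P2": 2, "P3": 3}               # assumed current placement
waiting_on = {5: "P1", 7: "P1", 9: "P2", 4: "P3"}   # assumed run queues

vm3_ok = is_gang_runnable("VM3", vcpus_of, idle, priority, running, waiting_on)
vm2_ok = is_gang_runnable("VM2", vcpus_of, idle, priority, running, waiting_on)
```

With this data VM3 qualifies, because both of its VCPUs wait behind lower-priority VMs, while VM2 does not, because VC4 waits behind a VCPU of the higher-priority VM1.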
Memory Management • Each cell maintains its own freelist, and allocates memory to other cells in its allocation preference list on request (RPC). • Speed - 758 µsec for 4 MB. • A threshold is set for the minimum amount of local free memory • Paging is avoided as far as possible.
Memory Borrowing • freelist - list of free pages in the cell • allocation preference list - list of cells from which borrowing memory is more beneficial than paging.
Memory Borrowing (diagram) • Freelist sizes of Cell 1 to Cell 5 are drawn against the 32 MB lending threshold and the 16 MB borrowing threshold • Sequence shown: a cell asks another cell and is refused; a further cell cannot be asked; the cell then asks another cell, which gives 4 MB
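The borrowing sequence can be sketched as below. The thresholds and the 4 MB chunk come from the slides; everything else is an assumption for illustration, in particular the function name `borrow`, the freelist sizes, and the rule that a lender refuses if lending would push its own freelist below the lending threshold.

```python
LEND_THRESHOLD_MB = 32    # from the slides
BORROW_THRESHOLD_MB = 16  # from the slides


def borrow(cell, free_mb, preference, chunk_mb=4):
    """Try to borrow chunk_mb for `cell`; return the lending cell or None."""
    if free_mb[cell] >= BORROW_THRESHOLD_MB:
        return None  # enough local free memory, no need to borrow
    # Only cells on the allocation preference list may be asked.
    for lender in preference[cell]:
        # Assumed rule: the lender must stay at or above the lending threshold.
        if free_mb[lender] - chunk_mb >= LEND_THRESHOLD_MB:
            free_mb[lender] -= chunk_mb
            free_mb[cell] += chunk_mb
            return lender
    return None


free_mb = {1: 10, 2: 20, 3: 40, 4: 50, 5: 34}  # hypothetical freelist sizes in MB
preference = {1: [2, 3]}                       # cell 1 may only ask cells 2 and 3
lender = borrow(1, free_mb, preference)        # cell 2 refuses, cell 3 gives 4 MB
```

Cell 4 has plenty of free memory here but is never asked, because it is not on the preference list of cell 1.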
Memory Management (contd.) • Paging : Algo - Second Chance FIFO • Page sharing information is kept in a control data structure • Cellular Disco traps all read and write requests made by the Operating Systems
Second-chance FIFO • A reference bit is added to each page in the FIFO scheme • Every time the page is accessed the bit is set to 1 • If the page is selected by FIFO, and the reference bit is 1, then it is set to 0 and another page is looked for. • A page is the target page if it is selected by FIFO and the reference bit is 0
Example • Page fault with two pages in the page table: the oldest page has RB 1, the second oldest page has RB 0 • FIFO points at the oldest page • Second-chance FIFO clears the oldest page’s RB to 0 and points at the second oldest page • Last frame: only the oldest page remains, with RB 0
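The selection rule can be sketched as a short function. This is a minimal illustration, assuming pages are kept in a list ordered oldest first; the name `choose_victim` is hypothetical. A spared page keeps its position in the list, as in the example, whereas many textbook formulations move it to the tail instead.

```python
def choose_victim(pages):
    """Pick a page to evict. `pages` is a list of [name, reference_bit], oldest first.

    The chosen page is removed from the list and its name is returned.
    """
    for i, entry in enumerate(pages):
        if entry[1] == 0:
            del pages[i]      # reference bit is 0: this is the target page
            return entry[0]
        entry[1] = 0          # bit was 1: clear it and look at the next page
    # Every page had its bit set; all bits are now 0, so fall back to the oldest.
    return pages.pop(0)[0]


table = [["oldest", 1], ["second oldest", 0]]
victim = choose_victim(table)
# victim is "second oldest"; "oldest" stays in the table with its bit cleared
```

Plain FIFO would have evicted the oldest page here even though it was recently referenced.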
Hardware fault-containment • Failure rate increases with the number of processors. • Internally structured as a set of semi-independent cells. • Failure in one cell does not impact VM’s running in other cells (localization of faults) • Assumption - CD is a trusted software layer
Cellular Structure Fault in one cell does not affect others
Hardware fault-containment (contd.) • Communication modes - Fast inter-processor RPC - Message • Side benefit - Software fault containment, i.e., individual OS crashes do not impact the system.
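Fault localization can be illustrated with a small helper. This is only a sketch: the function name `surviving_vms`, the VM-to-cell assignment, and the rule that a VM is lost when any cell it uses fails are assumptions for illustration.

```python
def surviving_vms(vm_cells, failed_cell):
    """Return the VMs that use no resources of the failed cell, sorted by name."""
    return sorted(vm for vm, cells in vm_cells.items() if failed_cell not in cells)


# Hypothetical assignment of virtual machines to cells.
vm_cells = {"VM1": {1}, "VM2": {1, 2}, "VM3": {3}}
alive = surviving_vms(vm_cells, failed_cell=2)  # only VM2 touches cell 2
```

VM1 and VM3 keep running after cell 2 fails because they never depended on it.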