DPS: A Distributed Proportional-Share Scheduler in Computing Clusters
Chang-Hao Tsai, Kang G. Shin
The University of Michigan

Cluster Computing
• 10^4 – 10^5 nodes
• MPI Program
  • # nodes
  • Memory/disk
  • Execution time
• Space-sharing
Tsai and Shin, FeBID '07

Low Resource Utilization
• Example: Chess
  • Multiple sub-programs
• Program spec.
  • 16 nodes
• Max instant demand
  • 12 CPUs
[Figure: master node with move generation, evaluation, and simulation sub-programs]

DPS: Distributed Proportional-Share Scheduling
• Decouple specifications
• Encapsulate processes with resource containers
• Automatic dependency inference
• Exchange virtual resources
  • Fully-distributed
  • Bank-assisted

Outline
• Specifications
• Virtualization
• DPS scheduling
• Implementation
• Evaluation
• Conclusions

Decoupling Specifications
• Program specification: defining program structures
  • Number of structural nodes
  • Minimum resource requirements at each node
• Resource specification: defining a resource budget
  • Maximum run-time resource usage

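As a rough illustration of the decoupling above, the two specifications could be kept as separate records and checked against each other. The class names, field names, and the 0.5 per-node minimum below are hypothetical and are not taken from DPS; only the 16-node and 12-CPU figures come from the chess example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgramSpec:
    """Program structure (hypothetical field names)."""
    num_nodes: int                  # number of structural nodes
    min_cpu_share_per_node: float   # minimum resource requirement at each node


@dataclass(frozen=True)
class ResourceSpec:
    """Resource budget (hypothetical field name)."""
    max_total_cpu_share: float      # maximum run-time resource usage


def is_feasible(program: ProgramSpec, budget: ResourceSpec) -> bool:
    """True if the budget covers the minimum requirement of every node."""
    needed = program.num_nodes * program.min_cpu_share_per_node
    return needed <= budget.max_total_cpu_share


# Chess example: 16 structural nodes, budget of 12 CPUs.
# The 0.5 minimum share per node is an assumed value.
chess = ProgramSpec(num_nodes=16, min_cpu_share_per_node=0.5)
budget = ResourceSpec(max_total_cpu_share=12.0)
print(is_feasible(chess, budget))
```

Keeping the two records separate means the budget can change without touching the description of the program's structure.
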
Virtualization
• Provides an isolated environment
  • Performance isolation
  • Security isolation
• Allows transfer of virtual resources
• Enables migration
  • Live migration (Xen)

[Figure: traditional computing node vs. VM hosting machine. Traditional computing node: Parallel Program A and Parallel Program B compete on shared libraries (MPI, PVM) and one OS (Linux/BSD) over the physical resources. VM hosting machine: Parallel Program A and Parallel Program B have adjustable shares, each with its own libraries (MPI; PVM) and OS (Linux; BSD), on a hypervisor (Xen) over the physical resources.]

Node CPU scheduling
• Scheduled by hypervisor (Xen)
• Borrowed Virtual Time (BVT)
• Work-conserving

VM           CPU-share allocation   Actual CPU utilization
A            0.25                   0.33
B            0.50                   0.67
Unassigned   0.25
No extra share available

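The numbers on this slide are consistent with a work-conserving scheduler that hands the unassigned CPU to the runnable VMs in proportion to their allocated shares. A minimal sketch of that arithmetic, assuming both VMs are always runnable; the function name is illustrative and this is not Xen's BVT implementation:

```python
def work_conserving_utilization(shares):
    """Scale allocated shares so that the runnable VMs use the whole CPU."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one VM needs a positive share")
    return {vm: share / total for vm, share in shares.items()}


# Allocations from the slide: A = 0.25, B = 0.50, 0.25 left unassigned.
utilization = work_conserving_utilization({"A": 0.25, "B": 0.50})
for vm, value in utilization.items():
    print(f"{vm}: {value:.2f}")
# prints:
# A: 0.33
# B: 0.67
```
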
Upstream Inference
[Figure: timeline of two VMs. VM 1: Running, Blocked, Running. VM 2: Running throughout. VM 2 unblocks VM 1.]

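One way to read the timeline: VM 1 blocks and resumes only after VM 2 unblocks it, so VM 1 depends on VM 2 and VM 2 can be inferred to be upstream of VM 1. The sketch below counts such unblock events. The event format and the function name are assumptions made for illustration, not the DPS implementation.

```python
from collections import Counter, defaultdict


def infer_upstream(unblock_events):
    """Map each VM to the VM that unblocked it most often.

    unblock_events: iterable of (blocked_vm, unblocking_vm) pairs,
    a hypothetical trace format.
    """
    counts = defaultdict(Counter)
    for blocked_vm, unblocking_vm in unblock_events:
        counts[blocked_vm][unblocking_vm] += 1
    return {vm: seen.most_common(1)[0][0] for vm, seen in counts.items()}


# Hypothetical trace: VM 1 was unblocked twice by VM 2 and once by VM 3.
events = [("VM 1", "VM 2"), ("VM 1", "VM 2"), ("VM 1", "VM 3")]
upstream = infer_upstream(events)
print(upstream)
```
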
Fully-Distributed Share Exchange

                     VM 1   VM 2   VM 3   VM 4
Initial share        0.5    0.5    0.5    0.5
Actual utilization   0.2    0.4    0.5    0.5
Excess share         0.3    0.1    0      0

[Figure: shares transferred between VMs and the resulting new shares; labeled values 0.3, 0.05, 0.05, 0.25, 0.2, 0.8, 0.55]

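The excess-share row follows from the other two: each VM's excess is its initial share minus its actual utilization. A small sketch of that step using the slide's values; the function name and the rounding to two decimals are assumptions, and the subsequent exchange between VMs is not modeled here.

```python
def excess_shares(initial, utilization):
    """Excess share per VM: allocated share minus what the VM actually uses."""
    return {
        # Rounded to two decimals (an assumption) to hide float noise.
        vm: round(max(initial[vm] - utilization[vm], 0.0), 2)
        for vm in initial
    }


initial = {"VM 1": 0.5, "VM 2": 0.5, "VM 3": 0.5, "VM 4": 0.5}
utilization = {"VM 1": 0.2, "VM 2": 0.4, "VM 3": 0.5, "VM 4": 0.5}
excess = excess_shares(initial, utilization)
print(excess)
```
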
Bank-Assisted Share Exchange
[Figure: VM 1, VM 2, VM 3, and VM 4 exchange shares through a Resource Bank]

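In the bank-assisted variant, VMs exchange shares through a central resource bank instead of directly with each other. The slide does not describe the bank's interface, so the class and method names below are purely hypothetical; the sketch only shows the idea of depositing unused share and granting requests from the pooled balance.

```python
class ResourceBank:
    """Hypothetical central pool of unused CPU share."""

    def __init__(self):
        self.balance = 0.0

    def deposit(self, amount):
        """A VM hands over share it is not using."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.balance += amount

    def withdraw(self, requested):
        """Grant as much of the request as the pool can cover."""
        if requested < 0:
            raise ValueError("requested must be non-negative")
        granted = min(requested, self.balance)
        self.balance -= granted
        return granted


bank = ResourceBank()
bank.deposit(0.3)   # e.g. excess share of one VM
bank.deposit(0.1)   # e.g. excess share of another VM
granted = bank.withdraw(0.25)
print(f"granted {granted:.2f}, remaining {bank.balance:.2f}")
```
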
Implementation
[Figure: implementation architecture. Components: domU, UIA, dom0, DPS Daemon (DPSd), Xen, VM Repository, VM Directory, Banker, Network. Labeled interactions: rapid disk image cloning, VM live migration, update utilization / migration request, upstream VM inference, share exchange, scheduling decision.]

Evaluation – Virtualization Overhead

Evaluation – Reducing Program Response Time

Evaluation – DPS in Action!

Conclusions
• Virtualization
  • Isolates execution environment
  • Enables resource transfer
• DPS
  • Decouples program spec. from resource spec.
  • Reduces program response time

Thank you for your attention. Any questions?
{chtsai,kgshin}@eecs.umich.edu