Resource management system for distributed environment B4. Nguyen Tuan Duc
Background • Emerging need for resource management systems for clusters and grids • Several systems exist, but have problems… • Portable Batch System • Sun Grid Engine • …
Goal • Flexible resource management system • Support clusters, grids • Fair-share scheduling • Maximize utilization of resources • Support parallel applications • Reduce load aggregation
Agenda • Background • Goal • Related work • Proposed method • Problems
Related work • Portable Batch System (MRJ 1990s) • Batch queuing system • Automatic load balancing • Parallel job support • Job accounting
Sun Grid Engine • Batch queuing system by Sun Microsystems • Same features as PBS, plus • Job checkpointing • Several add-ons
Problems of batch queuing systems • Resource utilization • Load aggregation • Server accepts too many requests from clients • Limited execution model • Jobs cannot fork, since a process created with fork() does not go into the queue • …
Saito Dai’s system (STDS) • Flexible Resource Management System for Widely Distributed Environment (2006) • No load aggregation • Job scheduling on each node • Independent of the execution model (fork, … OK) • Supports parallel jobs
STDS structure • Two main components • Node searching system (graph searching) • Scheduler (on each node) • Scheduler • Daemon on each node • CPU fair-sharing by ‘nice’ • Node searching system • Create graph from links • Node search → graph search
Our approach • Similar to STDS • Node searching system • Scheduler on each node • But different in … • Node search: no graph searching • Scheduler: kernel scheduler with user accounting (budget scheduler)
Scheduler: Budget scheduling • Budget scheduling • Normal queue & budget queue • Normal queue for interactive processes • Linux 2.6 default scheduler • Budget queue for CPU-hogging processes • Automatic detection of CPU-intensive processes • http://www.logos.ic.i.u-tokyo.ac.jp/~duc/pre/1107.ppt
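A minimal sketch of the two-queue idea above, written in user-space Python rather than as kernel code. The slides do not say how CPU-intensive processes are detected, so the detection rule (fraction of a sampling window spent on the CPU), the 0.8 threshold, and all names (`Process`, `Queues`, `classify`, `CPU_HOG_THRESHOLD`) are assumptions for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: a process that spent more than 80% of the
# sampling window on the CPU is treated as CPU-hogging.
CPU_HOG_THRESHOLD = 0.8


@dataclass
class Process:
    pid: int
    cpu_seconds: float   # CPU time consumed during the sampling window
    wall_seconds: float  # length of the sampling window


@dataclass
class Queues:
    normal: list = field(default_factory=list)  # pids of interactive processes
    budget: list = field(default_factory=list)  # pids of CPU-hogging processes


def cpu_fraction(p: Process) -> float:
    """Fraction of the sampling window the process spent on the CPU."""
    if p.wall_seconds <= 0:
        return 0.0
    return p.cpu_seconds / p.wall_seconds


def classify(processes, threshold=CPU_HOG_THRESHOLD) -> Queues:
    """Place each process in the normal queue or the budget queue."""
    queues = Queues()
    for p in processes:
        if cpu_fraction(p) > threshold:
            queues.budget.append(p.pid)
        else:
            queues.normal.append(p.pid)
    return queues


if __name__ == "__main__":
    sample = [Process(1, 0.05, 1.0), Process(2, 0.97, 1.0)]
    print(classify(sample))
```

In this sketch a process with an empty sampling window stays in the normal queue, so a newly started process is treated as interactive until it has been observed.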
Node searching system • Client-server model • Daemon on each node • Daemon reports CPU state (number of processes, CPU utilization, …) directly to the user • Reports maximum price • Where can users submit jobs from? • From anywhere on the cluster or grid • From their desktop, via the Internet • → Need for a job submission system
Who will determine nodes? • The user! • Users choose nodes appropriate for their jobs • Parallel jobs: idle CPUs or CPUs running low-price jobs • Long-running jobs: idle CPUs, set a low price
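The selection rule for parallel jobs above could be sketched on the client side as follows. This assumes that a job with a higher price takes priority over lower-priced jobs on the same node, which the slides suggest but do not state. `NodeInfo`, `candidate_nodes`, and the `idle_below` cutoff are hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class NodeInfo:
    name: str
    cpu_utilization: float  # 0.0 (idle) to 1.0 (fully busy)
    max_price: float        # highest price among jobs running on the node


def candidate_nodes(nodes, my_price, idle_below=0.1):
    """Return nodes that are idle or run only jobs priced below `my_price`.

    Idle nodes come first, then busy nodes in ascending order of maximum price.
    """
    idle = [n for n in nodes if n.cpu_utilization < idle_below]
    cheap = [n for n in nodes
             if n.cpu_utilization >= idle_below and n.max_price < my_price]
    cheap.sort(key=lambda n: n.max_price)
    return idle + cheap
```

A user submitting a parallel job would take as many nodes from the front of this list as the job needs. For a long-running job, the same function with a low `my_price` returns mostly idle nodes.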
Node searching system (NSS) • NSS should report to users: • CPU utilization • Maximum price • Load (number of processes, …) • … • Daemon on each node sends information about the node to the client • Client runs on the user’s machine • → No heavy load aggregation
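One possible shape for the per-node report listed above, as a sketch. The slides do not define a wire format, so the JSON encoding, the field names, and the helper functions here are assumptions. `os.getloadavg()` is a standard-library call available on Unix systems.

```python
import json
import os
from dataclasses import dataclass, asdict


@dataclass
class NodeReport:
    hostname: str
    cpu_utilization: float  # 0.0 to 1.0
    max_price: float        # highest price among jobs on this node
    load_average: float     # 1-minute load average
    process_count: int


def encode_report(report: NodeReport) -> bytes:
    """Serialize a report for sending from the node daemon to the client."""
    return json.dumps(asdict(report), sort_keys=True).encode("utf-8")


def decode_report(payload: bytes) -> NodeReport:
    """Rebuild a report on the client from the bytes sent by a daemon."""
    return NodeReport(**json.loads(payload.decode("utf-8")))


def local_load_average() -> float:
    """1-minute load average of this machine (Unix only)."""
    return os.getloadavg()[0]
```

Because each daemon sends its report straight to the client on the user's machine, no central server has to collect reports from every node.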
Problems • Possibly heavy load on the user’s client • NAT, firewalls • How can the client connect to the server? • What information is needed? • Only CPU utilization, maximum price, load, average price?