Faucets: Scheduling on Clusters and Across the Grid
Presenter: Sameer Kumar
Team: Sanjay Kalé, Sameer Kumar, Sindhura Bandhakavi, Justin Meyer
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign
http://charm.cs.uiuc.edu/
LACSI 2003
Outline
• High-level description
  • Motivation
  • Faucets, cluster bartering
  • Adaptive jobs, adaptive queuing system (AQS)
• Demo
• Usage and installation
  • How to write an adaptive program
  • Installing and using the AQS
  • Adding your cluster to an existing Faucets server
  • Installing a Faucets server
Motivation
• Demand for high-end compute power, but the resources are
  • Dispersed
    • Which machine would return my results quickest?
  • Hard to use
    • Use ssh to log in, ftp files, choose a queue, create a script, submit
    • Because of the hassle, users submit the same script to the same machine even when a better alternative exists
    • Monitor a running job
• Low operational efficiency of existing computing systems
Solution 1: Faucets
• Motivation #1: dispersed, hard to use
• Central source of compute power
  • Users
  • Providers of compute resources
  • User account not needed on every resource
• Match users and providers
  • Market economy?
  • Cluster bartering
  • QoS requirements, contracts and bidding systems
• GUI or web-based interface
  • Submission
  • Monitoring
Faucets
[Diagram: job submission and monitoring through Faucets. Labels: job specs, bids, file upload, job ID, clusters, job submission, job monitor]
• Parallel systems need to maximize their efficiency!
• http://charm.cs.uiuc.edu/research/faucets
Motivation #2: Inefficient Utilization
[Diagram: a 16-processor system with two jobs, one needing 8 processors and one needing 10. Job A is allocated; Job B conflicts and is queued]
• Current job schedulers can have low system utilization!
Solution: Adaptive Jobs
• Jobs that can shrink or expand the number of processors they run on at runtime
• Improve system utilization and response time
• Properties
  • Min_pe: minimum number of processors, related to the memory requirements of the job
  • Max_pe: maximum number of processors, related to speedup
• The scheduler can take advantage of this adaptivity
Two Adaptive Jobs
[Diagram: a 16-processor system running adaptive jobs A and B. One job has Min_pe = 8 and Max_pe = 16; the other has Min_pe = 1 and Max_pe = 10. Steps shown: allocate A, shrink A, allocate B, B finishes, A expands]
Adaptive Job Scheduler
• Maximize system utilization and minimize response time
• Scheduling decisions
  • Shrink existing jobs when a new job arrives
  • Expand jobs to use all processors when a job finishes
• Processor map sent to the job
  • Bit vector specifying which processors a job is allowed to use
  • Example: 00011100 (use processors 3, 4 and 5)
• Also handles regular (non-adaptive) jobs
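The processor map above can be illustrated with a short Python sketch. It assumes bits are read left to right with processors numbered from 0, which matches the slide's example (00011100 gives 3, 4 and 5); the slide does not state the numbering convention, and the function names are hypothetical, not part of the AQS.

```python
from typing import Iterable, List


def decode_processor_map(bits: str) -> List[int]:
    """Return the processors a job may use (assumed: leftmost bit is processor 0)."""
    if set(bits) - {"0", "1"}:
        raise ValueError("processor map must contain only '0' and '1'")
    return [index for index, bit in enumerate(bits) if bit == "1"]


def encode_processor_map(allowed: Iterable[int], total: int) -> str:
    """Build the bit vector for the given processors on a system with `total` processors."""
    bits = ["0"] * total
    for index in allowed:
        bits[index] = "1"
    return "".join(bits)


print(decode_processor_map("00011100"))  # [3, 4, 5]
```

Shrinking a job then amounts to sending it a map with fewer 1 bits, and expanding it to sending a map with more.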
System Overview
[Diagram. Components shown: GUI client (or web browser), Faucets server, clusters, cluster daemon, adaptive queuing system, PEs]
GUI Client
[Diagram. Components shown: GUI client (or web browser, or command-line client), Faucets server, clusters, cluster daemon, adaptive queuing system, PEs]
Secure Communication
• SSL communication
• Certificate for the Faucets server
  • Public key distributed on the web page and in the code
• One certificate for each cluster daemon (CD)
• Future: Globus
GUI Client
• One JAR file
• Runs on the Win32 platform
• Faucets server certificate included in the code
• GUI client gets the CD certificates from the central server (CS)
Adaptive Jobs
[Diagram. Components shown: GUI client (or web browser), Faucets server, clusters, cluster daemon, local scheduler, PEs]
Adaptive Job Framework
[Diagram. Labels: scheduler, processor map, adaptive application, AMPI, Charm++, load balancer, Converse]
• Applications are written in AMPI or Charm++
• The scheduler controls the processor map for each job
• The processor map is used by the job's load balancer
Charm++
• Charm++: object-based virtualization
• A program is written as a large number of objects, which can migrate
• The number of objects is typically much larger than the number of processors
• The load balancer can remap objects
• Measurement-based load balancing
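To make the remapping idea concrete, here is a minimal Python sketch of a greedy balancer that places measured object loads onto the processors allowed by a processor map. This is not Charm++ code and not the CommLB or RandcentLB strategy; the greedy heuristic, the function name `remap_objects`, and the left-to-right, 0-based reading of the bit vector are all assumptions made for illustration.

```python
import heapq
from typing import Dict


def remap_objects(object_loads: Dict[str, float], processor_map: str) -> Dict[str, int]:
    """Assign each object to a processor, heaviest object first,
    always choosing the allowed processor with the least load so far."""
    allowed = [pe for pe, bit in enumerate(processor_map) if bit == "1"]
    if not allowed:
        raise ValueError("processor map allows no processors")
    # Min-heap of (accumulated load, processor index).
    heap = [(0.0, pe) for pe in allowed]
    heapq.heapify(heap)
    placement: Dict[str, int] = {}
    by_load = sorted(object_loads.items(), key=lambda item: item[1], reverse=True)
    for name, load in by_load:
        pe_load, pe = heapq.heappop(heap)
        placement[name] = pe
        heapq.heappush(heap, (pe_load + load, pe))
    return placement


# Hypothetical measured loads for four migratable objects.
loads = {"obj0": 4.0, "obj1": 3.0, "obj2": 2.0, "obj3": 1.0}
print(remap_objects(loads, "1111"))  # all four processors allowed
print(remap_objects(loads, "1100"))  # job shrunk to processors 0 and 1
```

Because there are more objects than processors, shrinking the map only changes where objects live; the program itself does not need to know how many processors it has.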
Adaptive Charm++ Programs
• A Charm++ program is automatically adaptive if an adaptive load-balancing strategy is used
• Currently CommLB and RandcentLB are adaptive
• Compile with +balancer CommLB
MPI Jobs
• How do we make MPI jobs adaptive?
• AMPI
  • AMPI maps the MPI processes to user-level threads, which can migrate
  • Each thread is embedded in a Charm++ object, which allows load balancing and shrink-expand
Writing Adaptive AMPI Programs
• Build AMPI with an adaptive load-balancing strategy
• Call MPI_MIGRATE() at regular intervals in each MPI process; otherwise the job will not respond to changes in the processor map
• Use specific (adaptive) load balancers
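The "regular intervals" pattern can be sketched as follows. This is Python, not MPI code: `compute_step`, `migrate`, and `migrate_every` are hypothetical names, and `migrate` only stands in for the MPI_MIGRATE() call named above.

```python
from typing import Callable


def run_timesteps(num_steps: int,
                  migrate_every: int,
                  compute_step: Callable[[int], None],
                  migrate: Callable[[], None]) -> None:
    """Run the main loop of one process, pausing periodically so the
    runtime can move work according to the current processor map."""
    for step in range(1, num_steps + 1):
        compute_step(step)
        if step % migrate_every == 0:
            # Stand-in for MPI_MIGRATE(): the point at which a shrink or
            # expand request from the scheduler could take effect.
            migrate()


# Example: 10 timesteps with a migration point after every 4th step.
run_timesteps(10, 4, lambda step: None, lambda: print("migration point"))
```

The interval is a trade-off: a shorter interval lets the job react to the scheduler sooner, while each call adds some overhead.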
Shrink/Expand Overhead

Processors    Shrink time (s)    Expand time (s)
128 / 64      0.61               0.50
64 / 32       0.66               0.54
32 / 16       0.59               0.46
16 / 8        0.56               0.49

Performance of an MD program with 10 MB of migrated data per processor on NCSA Platinum
Adaptive Queuing System
[Diagram. Components shown: GUI client (or web browser), Faucets server, clusters, cluster daemon, adaptive queuing system, PEs]
AQS Features
• Multithreaded
• Reliable and robust
  • Tested on Linux clusters at UIUC
• Supports most features of standard queuing systems
• Manages adaptive jobs, currently those implemented in Charm++ and MPI
• For more details, see http://charm.cs.uiuc.edu/research/faucets/faucets.html
Components
• Database
• Job scheduler
• Compute cluster
Installing the Database
• Download the latest version of MySQL
  • http://www.mysql.com/
• Install, then:
    mysql> create database <dbname>;
    mysql> use <dbname>;
    mysql> create table jobInfo (id mediumint primary key NOT NULL DEFAULT '0' auto_increment, …..)
    mysql> grant all on *.* to <user> identified by <passwd>;
Installing the Scheduler
• cd charm/net-linux/pgms/scheduler;
• make scheduler; make client;
• Edit the Makefile to set the correct path to MySQL
• Running the scheduler as root
  • su
  • chown root scheduler;
  • chmod +s scheduler
  • ./startScheduler
Installing the Scheduler, contd.
• Edit the startScheduler file:
  • Set Database to match the <dbname> used earlier
  • Set PORT to the scheduler's port
  • Set DATABASE_HOST, DATABASE_USER and DATABASE_PASSWD to the database host, user and password
  • Set NODELIST to the node list for the scheduler
Configuring the Cluster
• Users must have access to the cluster only through the queuing system
• Each node runs an rsh daemon
  • Access to rsh is through a restrictive group
  • A job switches to the rsh group before it runs
  • Only the head node can rsh to the other nodes
  • rsh is disabled on the compute nodes
• All connections go through Unix sockets
Using the AQS Locally
• frun runs a job interactively
• fsub submits a batch job
• fkill kills a job
• fjobs lists the running and queued jobs
Scheduling Events
• When:
  • Job arrival
  • Job completion
  • Job requests a change in its number of processors
  • Job suspension
• Scheduling strategy
  • A pluggable component that decides which jobs to schedule
Scheduling Strategy Studied
• Similar to equipartitioning [N. Islam et al.]
• On job arrival and job completion
  • All running jobs and the new one are allocated their minimum number of processors
  • Leftover processors are shared equally, subject to each job's maximum processor usage
  • If the new job cannot be allocated its minimum number of processors, it is queued
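The strategy above can be sketched in Python as follows. This is an illustration under stated assumptions, not the AQS implementation: "shared equally" is read as handing out leftover processors one at a time in round-robin order, and the `Job` class, function names, and example jobs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Job:
    name: str
    min_pe: int
    max_pe: int


def equipartition(jobs: List[Job], total_pe: int) -> Optional[Dict[str, int]]:
    """Give every job its minimum, then share the leftover processors
    round-robin up to each job's maximum. Return None if the minimums do not fit."""
    alloc = {job.name: job.min_pe for job in jobs}
    leftover = total_pe - sum(alloc.values())
    if leftover < 0:
        return None
    growable = [job for job in jobs if alloc[job.name] < job.max_pe]
    while leftover > 0 and growable:
        for job in list(growable):
            if leftover == 0:
                break
            alloc[job.name] += 1
            leftover -= 1
            if alloc[job.name] == job.max_pe:
                growable.remove(job)
    return alloc


def on_job_arrival(running: List[Job], new_job: Job,
                   total_pe: int) -> Tuple[Dict[str, int], bool]:
    """Return (allocation, queued). The new job is queued if its minimum cannot be met."""
    alloc = equipartition(running + [new_job], total_pe)
    if alloc is None:
        return equipartition(running, total_pe) or {}, True
    return alloc, False


# Hypothetical jobs on a 16-processor system.
print(on_job_arrival([Job("J1", 1, 10)], Job("J2", 8, 16), 16))
```

On job completion, calling `equipartition` with the remaining jobs expands them into the freed processors, up to each job's maximum.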
Scheduler Performance

             Adaptive jobs                Traditional jobs
1/λ (s)      MRT (s)   Utilization (%)    MRT (s)   Utilization (%)    lf
500          68        13                 165       9                  0.13
200          76        31                 185       23                 0.32
100          96        60                 233       46                 0.65
64.5         143       88                 396       71                 1.0
60           164       92                 488       76                 1.08

Simulation results on 64 processors with a mean job execution time of 64.5 s
λ = arrival rate; MRT = mean response time; utilization = processor utilization; load factor (lf) = execution time × λ
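The load factor column follows from the definition lf = execution time × λ, where λ is the reciprocal of the mean inter-arrival time. A small Python sketch of that arithmetic, using the 64.5 s mean execution time and the inter-arrival times from the simulation (the function name is hypothetical):

```python
def load_factor(mean_execution_time_s: float, mean_interarrival_time_s: float) -> float:
    """lf = execution time * arrival rate, with arrival rate = 1 / inter-arrival time."""
    arrival_rate = 1.0 / mean_interarrival_time_s
    return mean_execution_time_s * arrival_rate


# Inter-arrival times (1/λ, in seconds) from the simulation runs.
for interarrival in (500, 200, 100, 64.5, 60):
    print(interarrival, load_factor(64.5, interarrival))
```

A load factor of 1.0 means jobs arrive, on average, exactly as fast as one job's execution time; the reported values are these results rounded to two decimal places.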
Experimental Results

             Adaptive jobs                Traditional jobs
1/λ (s)      MRT (s)   Utilization (%)    MRT (s)   Utilization (%)    lf
500          89        17                 109       9                  0.12
200          70        29                 108       23                 0.3
100          76        68                 116       49                 0.6
60           211       99                 303       74                 1.0

Experiments on a Linux cluster with 64 processors and a mean job execution time of 60 s
Adding a Cluster to Faucets
[Diagram. Components shown: GUI client (or web browser), Faucets server, clusters, cluster daemon, local scheduler, PEs]
Adding a New Cluster
• Prerequisites
  • Install Charm++
  • Install the Adaptive Queuing System
• Then
  • Download the Faucets software
    • http://charm.cs.uiuc.edu/
  • Compile the cluster daemon (CD)
    • cd faucets/cd; make
  • Run the cluster daemon (CD)
    • cd ..
    • java cd.ClusterDaemon <central server> <central server port> -p <ClusterDaemon port> <working dir>
Installing a Faucets Server
[Diagram. Components shown: GUI client (or web browser), Faucets server, clusters, cluster daemon, local scheduler, PEs]
Installing a Faucets Server
• Install MySQL
  • Create tables
  • Grant permissions
• Download the JDBC driver
  • http://mmmysql.sourceforge.net/
• Install the central server (CS)
  • Download the Faucets code and unpack it
  • cd faucets/cs; make
  • Edit faucets/cs/db.properties
  • cd faucets
  • java -cp .:/path/to/mm.mysql-2.0.8-bin.jar TheServer
Installing Appspector
• Installation is somewhat involved
• Each application needs a display module written in Java
• Contact us if you want to install it
Summary and Future Work
• Showed how to use and install the Charm++/AMPI adaptive job system
• Download at http://charm.cs.uiuc.edu/research/faucets
• Future
  • Extend the system to other parallel machines
  • Eliminate residual processes
  • Integrate the scheduler with Globus
  • More comprehensive QoS contracts are being developed
  • Sophisticated bidding schemes for the Faucets framework