Decentralized Dynamic Scheduling across Heterogeneous Multi-core Desktop Grids Jaehwan Lee, Pete Keleher, Alan Sussman Department of Computer Science University of Maryland
Multi-core is not enough • Multi-core CPUs are the current trend in desktop computing • It is not easy to exploit multiple cores in a single machine for high-throughput computing • “Multicore Is Bad News for Supercomputers”, S. Moore, IEEE Spectrum, 2008 • We have proposed a decentralized solution for initial job placement in multi-core grids, but..
Motivation and Challenges • Why is dynamic scheduling needed? • Load information from neighbors may be stale • Unpredictable job completion times can change the current load situation • Probabilistic initial job assignment can be improved • Challenges for decentralized dynamic scheduling on multi-core grids • Multiple resource requirements • Decentralized algorithm • No job starvation
Our Contribution • New decentralized dynamic scheduling schemes for multi-core grids • Local scheduling • Internode scheduling • Queue balancing • Experimental results via extensive simulation • Performance competitive with an online centralized scheduler
Outline • Background • Related work • Our approach • Experimental Results • Conclusion & Future Work
Overall System Architecture • P2P grids over a peer-to-peer network (DHT - CAN) • [Diagram: a client assigns a GUID to Job J and injects it at an injection node; heartbeats route Job J to its owner node, which initiates matchmaking and sends Job J to a run node, where it is inserted into a FIFO job queue]
Matchmaking Mechanism in CAN • [Diagram: the CAN space has CPU and Memory dimensions; the client sends Job J with requirements (CJ, MJ) to its owner node, which inserts J and pushes it toward a run node satisfying CPU &gt;= CJ && Memory &gt;= MJ; the run node places J in its FIFO queue]
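The matchmaking predicate above can be sketched as follows. This is a minimal illustration assuming a two-resource (CPU, memory) CAN; the `Node`, `Job`, and `can_run` names are illustrative, not from the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Job:
    cpu: float   # required CPU speed (GHz), C_J
    mem: float   # required memory (GB), M_J

@dataclass
class Node:
    cpu: float   # node CPU speed (GHz)
    mem: float   # node free memory (GB)

def can_run(node: Node, job: Job) -> bool:
    """Matchmaking test from the slide: CPU >= C_J and Memory >= M_J."""
    return node.cpu >= job.cpu and node.mem >= job.mem

# Example: a 2 GHz / 3 GB node can host a 1.5 GHz / 2 GB job.
print(can_run(Node(cpu=2.0, mem=3.0), Job(cpu=1.5, mem=2.0)))  # True
```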
Approach for Multi-core nodes (Lee:IPDPS’10) • Two logical nodes represent a multi-core node • Max-node: maximum & static status • Residue-node: current & dynamic status • Resource management & matchmaking • Dual-CAN: the two kinds of logical nodes form two separate CANs (a primary CAN of max-nodes and a secondary CAN of residue-nodes) • Balloon Model: a residue-node is represented as a simple structure on top of the existing CAN • [Diagram: node B (2 GHz CPU, 3 GB memory, Qlen=1) runs Job 1 (CPU: 2 GHz, Mem: 2 GB), leaving residue-node B' with 1.5 GHz free CPU and 1 GB free memory]
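The max-node/residue-node split can be sketched with two small records: the max-node holds static capacity, while the residue-node shrinks as jobs are placed. This is an illustrative sketch only; the type names and the exact accounting (decrementing memory and one core per job) are assumptions, not the paper's code.

```python
from dataclasses import dataclass

@dataclass
class MaxNode:
    cpu_ghz: float   # static maximum CPU speed
    mem_gb: float    # static total memory
    cores: int       # static core count

@dataclass
class ResidueNode:
    cpu_ghz: float   # currently available CPU speed
    mem_gb: float    # currently free memory
    free_cores: int  # currently idle cores

def place_job(res: ResidueNode, job_cpu: float, job_mem: float) -> bool:
    """Placing a job shrinks the residue-node's dynamic free resources;
    the max-node is never modified (simplified accounting)."""
    if res.free_cores >= 1 and res.cpu_ghz >= job_cpu and res.mem_gb >= job_mem:
        res.mem_gb -= job_mem
        res.free_cores -= 1
        return True
    return False
```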
Outline • Background • Related work • Our approach • Experimental Results • Conclusion & Future Work
Backfilling • Concept • Assumption: job running times must be given • Conservative backfilling: do NOT delay any other waiting job • EASY backfilling: do NOT delay the first job in the queue • Overall performance is closely tied to the accuracy of job running time estimates • [Diagram: CPUs vs. time, showing Jobs 1–4 with a later job backfilled into an idle slot ahead of an earlier one]
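A simplified sketch of the EASY backfilling test, assuming known runtimes (the very assumption the slide notes does not hold in our setting). This simplification only admits a candidate that finishes before the head job's reserved start; the full EASY rule also admits jobs that avoid the head job's reserved CPUs. All names are illustrative.

```python
def easy_backfill_ok(free_cpus: int, candidate, head_reservation_time: float,
                     now: float) -> bool:
    """candidate = (cpus_needed, runtime). The candidate may start now iff
    it fits in the currently free CPUs and finishes before the head job's
    reserved start time, so the head job is never delayed."""
    cpus_needed, runtime = candidate
    return cpus_needed <= free_cpus and now + runtime <= head_reservation_time
```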
Previous approaches under multiple resource requirements • Backfilling under multiple resource requirements (Leinberger:SC’99) • Backfilling on a single machine • Heuristic approaches: Backfill Lowest (BL), Backfill Balanced (BB) • Its assumption is not applicable to our system • Job running times are not given • Job migration to balance K resources between nodes (Leinberger:HCW’00) • Reduces local load imbalance by exchanging jobs, but does not consider overall system load • No backfilling scheme • Assumption: near-homogeneous environment
Outline • Background • Related work • Our approach • Experimental Results • Conclusion & Future Work
Dynamic Scheduling • After initial job assignment (but before the job starts running) • Periodic dynamic scheduling is invoked • Costs of dynamic scheduling • Communication cost • None: local scheduling • Minimal: internode scheduling & queue balancing (a job profile is small) • CPU cost: none • No preemptive scheduling: once a job starts running, it is never stopped by dynamic scheduling
Local Scheduling • Extension of backfilling under multiple resource requirements • Backfilling Counter (BC) • Every job profile keeps this value, initially 0 • The number of other jobs that have bypassed the job in the waiting queue • Only a job whose BC is equal to or greater than the BC of the job at the head of the queue can be backfilled • No job starvation or long waiting times • [Diagram: running job JR; waiting queue J1–J4 with BC values, where J3 is backfilled past the head of the queue]
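The BC rule above can be sketched as a queue scan: pick the first fitting job whose BC is at least the head's BC, and bump the BC of every job it bypasses. This is a hedged sketch under an assumed queue representation (the `QJob` type and the `fits` flag standing in for the multi-resource fit test are illustrative).

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class QJob:
    name: str
    bc: int = 0          # Backfilling Counter, starts at 0
    fits: bool = False   # stand-in for "fits in current residual resources"

def pick_backfill(queue: List[QJob]) -> Optional[QJob]:
    """Remove and return a backfillable job, or None.
    Eligibility: the job fits AND its BC >= BC of the head job."""
    if not queue:
        return None
    head = queue[0]
    for i, job in enumerate(queue):
        if job.fits and job.bc >= head.bc:
            for bypassed in queue[:i]:
                bypassed.bc += 1   # each bypassed job's counter grows,
            queue.pop(i)           # bounding how often it can be overtaken
            return job
    return None
```

Because a bypassed job's BC only grows, it eventually exceeds the counters of newly arriving jobs, which is how the scheme avoids starvation.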
Which job should be backfilled? • If multiple jobs in the queue can be backfilled, choose the best job for better utilization • Backfill Balanced (BB) (Leinberger:SC’99) algorithm • Choose the job with the minimum objective function (= BM x FM) • Balance Measure (BM) • uk : normalized utilization of resource k • rj,k : job j's normalized requirement for resource k • Minimizes unevenness across the utilizations of the multiple resources • Fullness Measure (FM) • Maximizes average utilization
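A hedged sketch of the BB objective follows. The formulas are one plausible instantiation in the spirit of Leinberger et al., not a verbatim transcription: BM = max_k uk / mean_k uk (1.0 when perfectly balanced) penalizes uneven utilization, and FM = 1 - mean_k uk (small when the node is full) rewards high average utilization; BB picks the fitting candidate minimizing BM * FM.

```python
def bb_objective(used, capacity, job_req):
    """used, capacity, job_req: per-resource lists of equal length.
    Returns BM * FM for the projected state if the job starts now."""
    u = [(us + r) / c for us, r, c in zip(used, job_req, capacity)]
    mean_u = sum(u) / len(u)
    bm = max(u) / mean_u if mean_u > 0 else 1.0   # balance measure
    fm = 1.0 - mean_u                             # fullness measure
    return bm * fm

def choose_job(used, capacity, candidates):
    """candidates: list of (job_id, per-resource requirements).
    Returns the id of the fitting job with minimum BM * FM, or None."""
    fitting = [(jid, req) for jid, req in candidates
               if all(us + r <= c for us, r, c in zip(used, req, capacity))]
    if not fitting:
        return None
    return min(fitting, key=lambda jr: bb_objective(used, capacity, jr[1]))[0]
```

For example, on a node with half of each of two resources used, a job that evens out utilization beats one that loads a single resource.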
Internode Scheduling • Extension of local scheduling across nodes • Node Backfilling Counter (NBC) • BC of the job at the head of the node's waiting queue • Only jobs whose BC is equal to or greater than the NBC of the target node can be migrated • No job starvation or long waiting times • [Diagram: jobs J3 and J4 migrating to neighbor nodes (running JA and JB, with NBC 2 and 0) whose NBC conditions their BCs satisfy]
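The NBC condition can be sketched in a few lines, representing a queue simply as the list of its jobs' BC values (an illustrative simplification): a job may migrate to a neighbor only if its BC is at least the neighbor's NBC, so remote jobs cannot overtake, and hence starve, the neighbor's head-of-queue job.

```python
def node_bc(queue_bcs):
    """NBC = BC of the job at the head of the waiting queue (0 if empty)."""
    return queue_bcs[0] if queue_bcs else 0

def can_migrate(job_bc, target_queue_bcs):
    """A job is eligible for migration iff its BC >= the target's NBC."""
    return job_bc >= node_bc(target_queue_bcs)
```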
Internode Scheduling – PUSH vs. PULL • PUSH • A job sender initiates the process • The sender tries to match every job in its queue with residual free resources in its neighbors • If a job can be sent to multiple nodes, pick the destination with the objective function, preferring the node with the fastest CPU • PULL • A job receiver initiates the process • The receiver sends a PULL-Request message to the potential sender (the neighbor with the maximum queue length) • The potential sender checks whether it has a job that can be backfilled and that satisfies the BC condition • If multiple jobs can be sent, it chooses the job with the minimum objective function (= BM x FM) • If no job can be found, it sends a PULL-Reject message to the receiver • The receiver then continues to look for another potential sender among its neighbors
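The PULL handshake above can be sketched as a single round, with the message exchange collapsed into direct calls and the fit test, BC condition, and BM×FM objective passed in as callables (all names and the exact tie-breaking are assumptions for illustration):

```python
def pull_once(receiver_free, neighbors, fits, bc_ok, objective):
    """One PULL round. neighbors: list of (node_id, queue) pairs.
    The receiver tries potential senders in decreasing queue length;
    each sender offers its best eligible job (minimum objective, i.e.
    BM * FM) or effectively replies PULL-Reject, and the receiver
    moves on to the next potential sender."""
    for node_id, queue in sorted(neighbors, key=lambda nq: -len(nq[1])):
        candidates = [j for j in queue
                      if fits(receiver_free, j) and bc_ok(j, queue)]
        if candidates:
            job = min(candidates, key=objective)
            queue.remove(job)          # the sender hands the job over
            return node_id, job
        # else: PULL-Reject; continue with the next potential sender
    return None
```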
Queue Balancing • Local scheduling & internode scheduling • find a job that can start running immediately using the current residual resources • Proactive job migration for queue (load) balancing • A migrated job does not have to start immediately • (Normalized) load of a node (Leinberger:HCW’00) • “Load” for multiple resources is defined as follows • For each resource, sum the requirements of all jobs in the queue • The load of the node is the maximum of those per-resource sums • PUSH & PULL schemes are available • Minimize the total local load (TLL, the sum of the neighbors' loads) • Minimize the maximum local load among neighbors (MLL)
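The load definition and the two balancing objectives translate directly into code. This sketch assumes requirements are already normalized per resource; the function names are illustrative.

```python
def node_load(queued_reqs):
    """queued_reqs: list of per-resource requirement lists (normalized).
    Load = max over resources of the summed queued requirements."""
    if not queued_reqs:
        return 0.0
    per_resource = [sum(job[k] for job in queued_reqs)
                    for k in range(len(queued_reqs[0]))]
    return max(per_resource)

def tll(neighbor_queues):
    """Total local load: the sum of the neighbors' loads."""
    return sum(node_load(q) for q in neighbor_queues)

def mll(neighbor_queues):
    """Maximum local load among the neighbors."""
    return max((node_load(q) for q in neighbor_queues), default=0.0)
```

A balancing move is worthwhile when it lowers TLL or MLL over the neighbor set; note that because the load is a max over resources, moving a job can lower one node's load without raising another's.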
Outline • Background • Related work • Our approach • Experimental Results • Conclusion & Future Work
Experimental Setup • Event-driven simulations • A set of nodes and events • 1000 initial nodes and 5000 job submissions • Jobs are submitted as a Poisson process (with various mean intervals T) • A node has 1, 2, 4 or 8 cores • Job run times follow a uniform distribution (30–90 minutes) • Node capabilities (job requirements) • CPU, memory, disk and the number of cores • A job's requirement for a resource can be omitted (don't care) • Job Constraint Ratio: the probability that each resource type for a job is specified • Steady-state experiments
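The workload described above can be generated in a few lines. This is an illustrative sketch, not the authors' simulator: exponential inter-arrival times yield Poisson arrivals with mean interval T, and runtimes are drawn uniformly from 30–90 minutes.

```python
import random

def generate_jobs(n, mean_interval_s, seed=0):
    """Generate n job-submission events: Poisson arrivals (mean interval
    mean_interval_s seconds) and uniform runtimes in [30, 90] minutes."""
    rng = random.Random(seed)
    t, jobs = 0.0, []
    for i in range(n):
        t += rng.expovariate(1.0 / mean_interval_s)  # Poisson process
        runtime = rng.uniform(30 * 60, 90 * 60)      # 30-90 min, in seconds
        jobs.append({"id": i, "submit": t, "runtime": runtime})
    return jobs
```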
Experimental Setup • Performance metrics • [Timeline: a job is injected into the system, arrives at its owner, arrives at its run node, starts execution, and finishes execution; matchmaking cost + wait time + running time = job turn-around time]
Comparison Models • Centralized scheduler (CENT) • Online, global scheduling mechanism with a single waiting queue • Not feasible in a complete implementation of a P2P system • Combinations of our approaches • Vanilla: no dynamic scheduling (initial job assignment only) • L: local scheduling only • LI: local scheduling + internode scheduling • LIQ: all three methods (local scheduling + internode scheduling + queue balancing) • LI(Q)-PUSH/PULL: LI & LIQ with the PUSH (PULL) option • [Diagram: Job J (CPU 2.0 GHz, Mem 500 MB, Disk 1 GB) submitted to the centralized scheduler]
Performance varying system load • LIQ-PULL &gt; LI-PULL &gt; LIQ-PUSH &gt; LI-PUSH &gt; L &gt;= Vanilla • Internode scheduling plays the major role • PULL is better than PUSH • In an overloaded system, PULL spreads information better because receivers aggressively try to obtain jobs (Demers:PODC’87) • Local scheduling alone cannot guarantee better performance than Vanilla • Our Backfilling Counter does not guarantee that other waiting jobs are never delayed (unlike conservative backfilling)
Overheads • PULL has a higher cost than PUSH • Active search (many trials and errors) • The other schemes are similar to Vanilla • Our schemes do not incur significant additional overhead
Performance varying Job Constraint Ratio • LIQ-PULL: best • LIQ == LI • LIQ-PULL is competitive with CENT • For an 80% Job Constraint Ratio • LIQ-PULL's performance worsens compared to the other ratio cases • Reason: it is difficult to find a capable neighbor for job migration because jobs are more highly constrained
Evaluation Summary • Performance • LIQ-PULL is competitive with CENT • Internode scheduling has the major impact on performance • PULL is better than PUSH (more aggressive search) • Performance stays consistently good regardless of system load and job constraint ratio • Overheads • PULL &gt; PUSH (more aggressive search) • Competitive with Vanilla
Conclusion and Future Work • New decentralized dynamic scheduling for multi-core P2P grids • Local scheduling: backfilling in a single node • Internode scheduling: backfilling across neighbors • Queue balancing: proactive queue (load) adjustment among neighbors • Evaluation via simulation • LIQ-PULL is the best • Competitive performance with CENT • PULL is better than PUSH, but PULL has higher cost • Low overhead (similar to Vanilla) • Future work • Real experiments (in cooperation with the Astronomy Dept.) • Decentralized resource management for heterogeneous asymmetric multi-processors