Job Scheduling for the BlueGene/L System
Elie Krevat, Jose G. Castanos, Jose E. Moreira
Presented by Savitha Krishnamoorthy, CIS 888, The Ohio State University
Motivation
Problems associated with toroidal interconnects:
• Require rectangular, contiguous job partitions
• Introduce fragmentation, which affects utilization and wait time
• Lead to job slowdown
Toroidal Interconnects
• “Endless” (wraparound) connections
• Simple, modular, scalable
• Examples: Cray T3D and T3E machines
• Problems:
  • Nodes are not fully connected and not equidistant
  • Spatial location of nodes is critical when allocating jobs
  • Fragmentation due to rectangular, contiguous partitions
A 2D Torus
[Figure: a 2D torus of eight nodes, labeled Node 1 through Node 8]
Schemes analyzed
• Space-sharing scheduling techniques
• Backfilling
  • Moves a lower-priority job (later in FCFS order) ahead in the queue
  • Causes no delay to the higher-priority job
• Migration
  • On-the-fly defragmentation
FCFS Scheduler
• Places each job so as to maximize the largest free rectangular partition remaining in the torus
• Invoked whenever a job arrives or terminates
• A rectangle may not exist for a job requesting a prime number of nodes
• Simplest algorithm
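The prime-number point can be checked by enumeration. The sketch below is only an illustration, not the scheduler from the paper: the helper name `rectangular_shapes` is hypothetical, and the default 4x4x8 dimensions are taken from the supernode torus described on the BG/L slide. It lists every box shape (a, b, c) whose volume equals the requested job size.

```python
from itertools import product

def rectangular_shapes(size, dims=(4, 4, 8)):
    """Return all (a, b, c) with a*b*c == size that fit in a torus of shape dims.

    Hypothetical helper for illustration; dims defaults to the 4x4x8
    supernode torus of BG/L.
    """
    return [
        (a, b, c)
        for a, b, c in product(*(range(1, d + 1) for d in dims))
        if a * b * c == size
    ]

print(rectangular_shapes(7))   # [(1, 1, 7)]  prime, but fits along the long axis
print(rectangular_shapes(11))  # []           prime and longer than any axis
```

A prime size p can only be laid out as a 1x1xp line, so it has an exact rectangle only when p does not exceed the longest torus dimension.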
FCFS with Backfilling
• Improves system utilization
• Requires an estimate of each job’s execution time
• But overestimating execution time is known not to affect backfilling
• Invoked when the waiting queue is not empty and the FCFS scheduler has halted
FCFS + Migration
• Rearrange running jobs to create a contiguous rectangular free partition
• Empty the torus, then reschedule the jobs
• Decision metrics:
  • FNtor = ratio of free nodes to torus size
  • FNmax = fraction of free nodes in the maximal free partition
Backfilling + Migration
• Schedule via FCFS first
• Rearrange the torus through migration to minimize fragmentation
• Repeat FCFS
• Finally, backfill
The BG/L System
• 32x32x64 3D torus of cells (nodes)
• Each cell has a processor, memory, and links to its 6 neighbors
• Unit of job allocation is an 8x8x8 configuration of cells
• Each unit is a supernode
• BG/L is thus a 4x4x8 torus of supernodes
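The supernode dimensions follow from dividing the cell torus by the allocation unit along each axis. A quick check of that arithmetic (variable names are illustrative only):

```python
cells = (32, 32, 64)    # full torus of cells, from the slide
supernode = (8, 8, 8)   # unit of job allocation, from the slide

# Divide each axis of the cell torus by the supernode edge length.
torus = tuple(c // s for c, s in zip(cells, supernode))
num_supernodes = torus[0] * torus[1] * torus[2]

print(torus)           # (4, 4, 8)
print(num_supernodes)  # 128
```

From the scheduler's point of view the machine is therefore a 128-node torus.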
The Simulation Environment
• Simulator input: job log (arrival time, execution time, size of job) and type of scheduler (FCFS, B, M, B+M)
• 4 primary events:
  • Arrival: when a job is first submitted and placed in the scheduler’s waiting queue
  • Schedule: when a job is allocated onto the torus
  • Start: when the job begins to run (why 1 second?)
  • Finish: when the job completes and is deallocated
Metrics
• Torus size: $N$
• Arrival time of job $j$: $t^a_j$
• Execution time: $t^e_j$
• Size of job: $s_j$
• Start time: $t^s_j$
• Finish time: $t^f_j$
Parameters
• Wait time: $t^w_j = t^s_j - t^a_j$
• Response time: $t^r_j = t^f_j - t^a_j$
• Bounded slowdown: a bound is used because jobs with very short execution times skew the slowdown
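The bounded-slowdown formula itself is not reproduced in this transcript. The sketch below computes the per-job parameters, assuming the commonly used definition max(response, Γ) / max(execution, Γ). The threshold `GAMMA = 10.0` seconds and the `Job` class are assumptions made for illustration, not values taken from the slide.

```python
from dataclasses import dataclass

GAMMA = 10.0  # assumed bound in seconds; the slide does not state it

@dataclass
class Job:
    arrival: float    # t^a_j
    start: float      # t^s_j
    execution: float  # t^e_j
    finish: float     # t^f_j
    size: int         # s_j

def wait_time(job):
    return job.start - job.arrival

def response_time(job):
    return job.finish - job.arrival

def bounded_slowdown(job, gamma=GAMMA):
    # Assumed standard definition: clamp both times to at least gamma so
    # that very short jobs cannot produce huge slowdown values.
    return max(response_time(job), gamma) / max(job.execution, gamma)

short_job = Job(arrival=0.0, start=30.0, execution=2.0, finish=32.0, size=8)
print(wait_time(short_job))         # 30.0
print(response_time(short_job))     # 32.0
print(bounded_slowdown(short_job))  # 3.2
```

Without the bound, the same 2-second job would report a slowdown of 16, which is the skew the slide refers to.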
Parameters contd…
• System utilization, where $T$ is the makespan
• Total unused capacity:
  • $f(t)$ = free nodes at time $t$
  • $q(t)$ = total number of nodes requested at time $t$
  • A measure of work left unused due to lack of jobs
Parameters contd…
• The product $T \times N$ is the maximum capacity the system could deliver over the makespan
• The balance of the system capacity, neither utilized nor unused, is considered lost
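The formulas for these system-level parameters did not survive in the transcript. One reconstruction consistent with the bullet text is given below; the names $w_{util}$, $w_{unused}$, $w_{lost}$ and the exact definition of the makespan are assumptions.

```latex
\begin{align*}
T &= \max_j t^f_j - \min_j t^a_j
  && \text{(makespan, assumed definition)} \\
w_{util} &= \frac{\sum_j s_j \, t^e_j}{T \times N}
  && \text{(system utilization)} \\
w_{unused} &= \frac{\int \max\bigl(0,\; f(t) - q(t)\bigr)\, dt}{T \times N}
  && \text{(capacity unused for lack of jobs)} \\
w_{lost} &= 1 - w_{util} - w_{unused}
  && \text{(balance, considered lost)}
\end{align*}
```

Under this reading, free nodes count as unused only when they exceed what waiting jobs request, that is, when $f(t) > q(t)$. Free nodes that queued jobs want but cannot use, for example because of fragmentation, fall into the lost term.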
Workload characteristics
• Experiments performed on 10,000-job spans of 2 job logs:
  • NASA Ames 128-node iPSC/860
  • SDSC 128-node machines
Performing Migration
• Recall the parameters that determine whether to attempt a migration: FNtor and FNmax
  • FNtor = free nodes / size of torus
  • FNmax = free nodes in maximal free partition / free nodes
• Migration is attempted when FNtor >= 0.1 and FNmax <= 0.7
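The migration test can be written as a small predicate. This is a minimal sketch: the function name and argument names are hypothetical, and only the two ratios and the thresholds 0.1 and 0.7 come from the slide.

```python
def should_attempt_migration(free_nodes, torus_size, max_free_partition,
                             fn_tor_min=0.1, fn_max_max=0.7):
    """Decide whether a migration attempt is worthwhile.

    free_nodes:         number of currently free nodes
    torus_size:         total number of nodes in the torus
    max_free_partition: node count of the largest free rectangular partition
    """
    if free_nodes == 0:
        return False  # nothing to defragment; also avoids division by zero
    fn_tor = free_nodes / torus_size          # FNtor
    fn_max = max_free_partition / free_nodes  # FNmax
    return fn_tor >= fn_tor_min and fn_max <= fn_max_max

# 128-node torus, 32 nodes free, but the largest free rectangle holds only 8.
print(should_attempt_migration(32, 128, 8))   # True: enough free, fragmented
print(should_attempt_migration(32, 128, 32))  # False: free space is contiguous
print(should_attempt_migration(8, 128, 2))    # False: too few free nodes
```

The first threshold skips migration when there is too little free space to gain anything. The second skips it when the free space is already mostly contiguous.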
Comments (+, -)
• Compared the schedulers when applied to fully connected topologies
• Studied the effect of fragmentation on utilization, wait time, and slowdown
• Showed how each scheduler affected utilization
• Could have given average job wait time statistics for each scheduler
• Fragmentation is the important distinction
• Could have compared unused capacity, using the fully connected system as the ideal
Advantage of parameters
• Frequency of migration attempts
• Average benefit of a successful migration
• Comparison of job wait times with:
  • A scheduler that uses the parameters
  • A scheduler that always migrates
POP Algorithm
• Projection of Partitions
• Solves the problem of finding the largest free rectangular partition
• Exhaustive search is $O(M^9)$ for an $M \times M \times M$ torus
• POP is $O(M^5)$
Basic Algorithm
• Given a base location among the $M^3$ possible, first find the largest partition in 1 dimension
• Project onto the adjacent dimension to find the largest partition in 2D
• Project adjacent 2D planes to find the largest partition in 3D
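The Algorithm
• FREEPART = $\{\langle B, S\rangle \mid B = (i,j,k)$ is a base location and $S = (a,b,c)$ is a partition size, such that $\forall x,y,z$ with $i \le x < i+a$, $j \le y < j+b$, $k \le z < k+c$: $\mathrm{Node}(x \bmod M,\, y \bmod M,\, z \bmod M)$ is free$\}$
• Largest 1D partitions PFREEPART are pre-computed for all 3 dimensions in $O(M^4)$ time (for every possible base location)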
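The 1D pre-computation step can be sketched as follows, for a single dimension only. This is an illustrative reading of the slide, not the paper's code: the function name `longest_free_runs_z` and the nested-list representation of the torus are assumptions. For every base location it walks along z with wraparound, which costs O(M) for each of the M^3 bases, so O(M^4) overall.

```python
def longest_free_runs_z(free):
    """For each base (i, j, k), count consecutive free nodes along z.

    free is an M x M x M nested list of booleans (True = node is free).
    The walk wraps around modulo M and is capped at M nodes.
    """
    M = len(free)
    runs = [[[0] * M for _ in range(M)] for _ in range(M)]
    for i in range(M):
        for j in range(M):
            for k in range(M):
                c = 0
                # Extend the 1D partition while the next node is free.
                while c < M and free[i][j][(k + c) % M]:
                    c += 1
                runs[i][j][k] = c
    return runs

M = 3
free = [[[True] * M for _ in range(M)] for _ in range(M)]
free[0][0][1] = False  # one busy node

runs = longest_free_runs_z(free)
print(runs[0][0][2])  # 2: z=2 is free, wraps to z=0, stops at busy z=1
print(runs[0][0][1])  # 0: the base itself is busy
print(runs[1][1][0])  # 3: the whole ring is free
```

Repeating the same walk along x and y gives the largest 1D partition in all three dimensions, which the projection steps then combine into 2D and 3D partitions.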