Priority Scheduling: An Application for the Permutahedron

Ethan Bolker, UMass-Boston and BMC Software. AMS Toronto meeting, September 24, 2000

Plan
• Brief introduction to queueing theory
• Priority scheduling
• Conservation laws and the permutahedron
• Specifying CPU shares: interesting pictures and open questions
References: www.cs.umb.edu/~eb/goalmode
Acknowledgements: Jeff Buzen, Yiping Ding, Dan Keefe, Oliver Chen, Aaron Ball, Tom Larard

Queueing theory
• Workload: stream of jobs visiting a server (ATM, time-shared CPU, printer, …)
• Jobs queue when server is busy
• Input:
  • Arrival rate: λ job/sec
  • Service demand: s sec/job
• Performance metrics:
  • Utilization: u = λs (must be ≤ 1)
  • Response time: r = ???
  • Degradation: d = r/s
  • Queue length: q = λr (Little's law)

Response time computations
• r, d, q measure queueing delay: r ≥ s (d ≥ 1), unless parallel processing is possible
• Randomness really matters
  • r = s (d = 1) if arrivals are scheduled (best case, no waiting)
  • r >> s for bulk arrivals (worst case, maximum delays)
• Theorem. d = 1/(1-u) if arrivals are Poisson and service is exponentially distributed (M/M/1).
  ⇒ r = s/(1-u) (think virtual server with speed 1-u)
  ⇒ q = u/(1-u) (convention: job in service is on queue)

M/M/1
• Essential nonlinearity often counterintuitive
  • at u = 90% average queue length is 0.9/(1-0.9) = 9,
  • average response time is s/(1-0.9) = 10·s,
  • but 1 customer in 10 has no wait at all (10% idle time)
• A useful guide even when hypotheses fail
  • accurate enough (± 20%) for real computer systems
  • d depends only on u: many small jobs have same impact as few large jobs
  • faster system ⇒ smaller s ⇒ smaller u; r = s/(1-u) ⇒ double win: less service, less wait
  • waiting costly, server cheap (telephones): want u ≈ 0
  • server costly (doctors): want u ≈ 1 but scheduled
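The M/M/1 formulas above are easy to check numerically. The following is a minimal sketch, not part of the talk; the function name `mm1_metrics` and its parameter names are chosen here for illustration.

```python
def mm1_metrics(arrival_rate, service_demand):
    """Return (u, r, d, q) for an M/M/1 queue.

    arrival_rate   -- lambda, in jobs/sec
    service_demand -- s, in sec/job
    """
    u = arrival_rate * service_demand          # utilization u = lambda * s
    if not 0 <= u < 1:
        raise ValueError("need 0 <= u < 1 for a stable queue")
    d = 1.0 / (1.0 - u)                        # degradation (M/M/1 theorem)
    r = service_demand * d                     # response time r = s / (1 - u)
    q = arrival_rate * r                       # Little's law: q = lambda * r
    return u, r, d, q


u, r, d, q = mm1_metrics(arrival_rate=0.9, service_demand=1.0)
print(f"u={u:.2f} r={r:.2f} d={d:.2f} q={q:.2f}")
```

With s = 1 sec/job and u = 90% this reproduces the slide's numbers: response time 10·s and average queue length 9.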

Multiple Job Streams
• Multiple workloads, utilizations u1, u2, …
• U = Σ ui < 1
  All degradations equal: di = 1/(1-U)
• Suppose priority scheduling possible
  Study degradation vector V = (d1, d2, …)

Priority Scheduling
• Priority state: order workloads by priority (ties OK)
  • two workloads, 3 states: 12, 21, [12]
  • three workloads, 13 states:
    • 123 (6 = 3! of these ordered states),
    • [12]3 (3 of these),
    • 1[23] (3 of these),
    • [123] (1 state with no priorities)
  • n wkls, f(n) states, n! ordered (simplex lock combos)
• p(s) = prob( state = s ) = fraction of time in state s
• V(s) = degradation vector when state = s (measure this, or compute it using queueing theory)
• V = Σ_s p(s) V(s) (time avg is convex combination)
• Achievable region is convex hull of vectors V(s)
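The state counts on this slide (3 states for two workloads, 13 for three) can be reproduced with a short recurrence: pick which k workloads are tied at top priority, then count the states of the remaining ones. This is a sketch written for this transcript; `count_priority_states` is a name invented here, and the recurrence is a standard way to count orderings with ties, not something stated in the talk.

```python
from math import comb, factorial


def count_priority_states(n):
    """Number of priority states (orderings with ties allowed) of n workloads."""
    f = [1]  # f[0] = 1: no workloads, one (empty) state
    for m in range(1, n + 1):
        # k workloads tied at the top, then any state of the other m - k
        f.append(sum(comb(m, k) * f[m - k] for k in range(1, m + 1)))
    return f[n]


for n in (2, 3, 4):
    print(n, "workloads:", count_priority_states(n), "states,",
          factorial(n), "of them ordered")
```

Note that `math.comb` needs Python 3.8 or later.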

Two workloads
[Figure: d1, d2 plane with the diagonal d1 = d2. Marked points: V(12) (wkl 1 high prio), V([12]) (no priorities, on the diagonal), and V(21). The achievable region is marked between V(12) and V(21).]
• 0.5 V(12) + 0.5 V(21) ≠ V([12])
• note: u1 < u2 ⇒ wkl 2 effect on wkl 1 large

Conservation
• No Free Lunch Theorem. Weighted average degradation is constant, independent of priority scheduling scheme:
  Σ_i (ui/U) di = 1/(1-U)
• Provable from some hypotheses
• Observable in some real systems
• Sometimes false: shortest job first minimizes average response time (printer queues, supermarket express checkout lines)

Conservation
• For any proper set A of workloads, write U(A) = Σ_{i∈A} ui.
  Imagine giving those workloads top priority. Then we can pretend the other wkls don't exist. In that case
  Σ_{i∈A} (ui/U(A)) di = 1/(1-U(A))
  When wkls in A have lower priorities they have higher degradations, so in general
  Σ_{i∈A} (ui/U(A)) di ≥ 1/(1-U(A))
• These 2^n - 2 linear inequalities determine the convex achievable region R
• R is a permutahedron: only n! vertices
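One way to see the n! vertices concretely: in a strictly ordered state, every prefix of the priority order is a set with top priority, so the conservation law holds with equality for each prefix, and the degradations can be solved one workload at a time. The sketch below assumes exactly that reading of the slide; the function names `g`, `vertex`, and `satisfies_conservation` and the sample utilizations are invented for illustration.

```python
from itertools import combinations, permutations


def g(x):
    """Conservation right-hand side in the form sum(u_i * d_i) = U(A) / (1 - U(A))."""
    return x / (1.0 - x)


def vertex(u, order):
    """Degradation vector for a strict priority order (highest priority first).

    Assumption: conservation holds with equality for every prefix of `order`.
    """
    d = [0.0] * len(u)
    used = 0.0  # utilization of the workloads with higher priority
    for w in order:
        d[w] = (g(used + u[w]) - g(used)) / u[w]
        used += u[w]
    return d


def satisfies_conservation(u, d, tol=1e-9):
    """Check the equality for all workloads and the inequality for proper subsets."""
    n = len(u)
    if abs(sum(ui * di for ui, di in zip(u, d)) - g(sum(u))) > tol:
        return False
    for k in range(1, n):
        for A in combinations(range(n), k):
            if sum(u[i] * d[i] for i in A) < g(sum(u[i] for i in A)) - tol:
                return False
    return True


u = [0.2, 0.3, 0.4]  # hypothetical utilizations, U = 0.9
for order in permutations(range(len(u))):
    d = vertex(u, order)
    print(order, [round(x, 3) for x in d], satisfies_conservation(u, d))
```

For the top-priority workload this gives d = 1/(1-u), the same bound that appears on the two workload permutahedron slide.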

Two workload permutahedron
[Figure, built up over three slides: d1, d2 plane with the line u1d1 + u2d2 = U/(1-U). The constraints d1 ≥ 1/(1-u1) and d2 ≥ 1/(1-u2) cut off the segment from V(12) to V(21), which is the achievable region.]

Three workload permutahedron
[Figure: axes d1, d2, d3 with the plane u1d1 + u2d2 + u3d3 = U/(1-U); vertices V(123) and V(213) are labeled.]

Experimental evidence

Four workload permutahedron
• 4! = 24 vertices (ordered states)
• 2^4 - 2 = 14 facets (proper subsets) (conservation constraints)
• 74 faces (states)
Reference: Simplicial geometry and transportation polytopes, Trans. Amer. Math. Soc. 217 (1976) 138.

Scheduling for performance
• Administrator specifies performance goals
  • desired degradations (IBM OS/390) (not today)
  • CPU shares (UNIX offerings from HP, IBM, Sun)
• Operating system dispatches jobs in an attempt to meet goals
• Model predicts degradations by constructing a map: workload performance goals → permutahedron

Specifying CPU shares
• Administrator specifies workload CPU shares
• Share f (0 < f < 1) means workload is guaranteed fraction f of CPU when at least one of its jobs is queued for service; it can get more if some competition is absent
  • share ≠ utilization
  • share ≠ cap
  • share should be renamed guarantee

Map shares to degradations - two workloads -
• Suppose f1 and f2 > 0, f1 + f2 = 1
• Model: System operates in state
  • 12 with probability f1
  • 21 with probability f2
  (independent of who is on queue)
• Average degradation vector: V = f1 V(12) + f2 V(21)
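A small numeric sketch of this two-workload model follows. It assumes the endpoints V(12) and V(21) come from the conservation law: the high-priority workload sees d = 1/(1-u), and the other coordinate is fixed by u1d1 + u2d2 = U/(1-U). The function names and the sample utilizations are hypothetical.

```python
def two_workload_vertices(u1, u2):
    """Return (V(12), V(21)) assuming the conservation law u1*d1 + u2*d2 = U/(1-U)."""
    U = u1 + u2
    total = U / (1.0 - U)
    d1_high = 1.0 / (1.0 - u1)   # wkl 1 at high priority ignores wkl 2
    d2_high = 1.0 / (1.0 - u2)   # wkl 2 at high priority ignores wkl 1
    v12 = (d1_high, (total - u1 * d1_high) / u2)
    v21 = ((total - u2 * d2_high) / u1, d2_high)
    return v12, v21


def degradations_from_shares(f1, f2, u1, u2):
    """V = f1 * V(12) + f2 * V(21), the model on this slide."""
    v12, v21 = two_workload_vertices(u1, u2)
    return tuple(f1 * a + f2 * b for a, b in zip(v12, v21))


u1, u2 = 0.2, 0.6                      # hypothetical utilizations
print(two_workload_vertices(u1, u2))
print(degradations_from_shares(0.5, 0.5, u1, u2))
print(1.0 / (1.0 - (u1 + u2)))         # common degradation with no priorities
```

With these utilizations, equal shares do not give equal degradations, which is the point of the earlier remark that the midpoint of V(12) and V(21) differs from V([12]).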

Model validation

Map shares to degradations - three (n) workloads -
• prob(123) = f1 f2 f3 / [ (f1 + f2 + f3) (f2 + f3) (f3) ]
• Theorem: These n! probabilities sum to 1
  • interesting identity generalizing adding fractions
  • prove by induction, or by coupon collecting
• V = Σ_{ordered states s} prob(s) V(s)
• O(n!), Ω(n!), good enough for n ≤ 9 (12)
• Searching for fast (approximate) algorithm ...
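The theorem that the n! probabilities sum to 1 can be checked exactly with rational arithmetic. This sketch generalizes the prob(123) formula in the obvious way (each factor is a share divided by the sum of the shares not yet placed); `order_probability` and the sample shares are names and values made up for the example.

```python
from fractions import Fraction
from itertools import permutations


def order_probability(shares, order):
    """Probability of a strict priority order (highest first) under the share model."""
    remaining = sum(shares[w] for w in order)   # sum of shares not yet placed
    p = Fraction(1)
    for w in order:
        p *= Fraction(shares[w]) / remaining
        remaining -= shares[w]
    return p


shares = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # hypothetical shares
probs = {order: order_probability(shares, order)
         for order in permutations(range(len(shares)))}
for order, p in probs.items():
    print(order, p)
print("sum =", sum(probs.values()))
```

Enumerating all n! orders like this is the O(n!) computation the slide mentions, so it is only practical for small n.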

Model validation

Map shares to degradations (geometry)
• Interpret shares as barycentric coordinates in the (n-1)-simplex
• Study the geometry of the map from the simplex to the (n-1)-dimensional permutahedron
• Easy when n = 2: each is a line segment and the map is linear

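Mapping a triangle to a hexagon
[Figure, over three slides: the share triangle, with edges f1 = 0 and f3 = 0 and corners f1 = 1 and f3 = 1, is carried by the map M onto the hexagon with vertices 312, 132, 123, 213, 231, 321. Vertices 132 and 123 are labeled "wkl 1 high priority"; 231 and 321 are labeled "wkl 1 low priority". A second panel marks f1 = 0, f1 = 1 and {23}.]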

Implementing fair share scheduling
• Actual Sun/Solaris implementation is subtle
• HP and IBM are black boxes (for me)
• Stochastic solution: randomly choose queued job to dispatch (implement the model rather than model an implementation)
• May require prior computation of priodist(w, p) = prob(wkl w runs at prio p)
  • workload priority probabilities, not state probabilities
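The slide does not say how the random choice is weighted. One plausible reading, sketched below purely as an assumption, is to pick among the workloads that currently have a queued job with probability proportional to their shares. The function `choose_workload`, the workload names, and the share values are all hypothetical.

```python
import random
from collections import Counter


def choose_workload(shares, queued, rng=random):
    """Pick the workload to dispatch next among those with queued jobs.

    Assumption (not from the talk): the choice is weighted by share.
    """
    candidates = sorted(queued)                 # fixed order for reproducibility
    weights = [shares[w] for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


shares = {"A": 0.5, "B": 0.3, "C": 0.2}
rng = random.Random(0)
# Workload B has nothing queued, so A and C split the CPU between them.
picks = Counter(choose_workload(shares, {"A", "C"}, rng) for _ in range(10000))
print({w: n / 10000 for w, n in picks.items()})
```

In this run workload A is chosen about 5/7 of the time, more than its share of 0.5, which matches the idea that a share is a guarantee rather than a cap.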

Priority distributions
• Given degradations, compute a priodist
• A priodist is an n×n matrix with row sums 1
• {priodists} = cartesian product of n (n-1)-simplices
• Map is surjective, not injective
• Look for a well behaved inverse image
  priodist space (dim n(n-1)) → permutahedron (dim n-1)

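Three workload permutahedron
[Figure: hexagon in d1, d2 coordinates crossed by the lines d1 = d2, d2 = d3 and d1 = d3. Vertices are the ordered states 123, 132, 312, 321, 231, 213. The tied states [12]3, 1[23], [13]2, 3[12], [23]1, 2[13] are marked on the edges, and [123] is where the three lines meet.]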

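… dissected into 3! quadrilaterals
[Figure: one of the quadrilaterals, with corners 123, 1[23], [123] and [12]3, shown against the lines d1 = d2 and d2 = d3 in d1, d2 coordinates.]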

… each mapped to from a skew quadrilateral of priodists
Corner priodists (rows = workloads, columns = priorities):
P123:
  1 0 0
  0 1 0
  0 0 1
P1[23]:
  1 0 0
  0 .5 .5
  0 .5 .5
P[12]3:
  .5 .5 0
  .5 .5 0
  0 0 1
P[123]:
  .33 .33 .33
  .33 .33 .33
  .33 .33 .33
(x,y) → xy P123 + x(1-y) P1[23] + (1-x)y P[12]3 + (1-x)(1-y) P[123] → degradation vector in this corner of permutahedron
[Figure: the quadrilateral with corners 123, 1[23], [123], [12]3 and an interior point (x,y).]

Skew quadrilaterals
• Given 4 points P00, P01, P10, P11 ∈ R^m, map the unit square:
  (x,y) → xy P00 + x(1-y) P01 + (1-x)y P10 + (1-x)(1-y) P11
• Easy to generalize to 2^k points
• Analogous to convex hull, which maps barycentric coordinates on a simplex
• Reference for this construction?
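The bilinear map is short to write down in code. The sketch below applies it to the four corner priodists from the previous slide, as read from the transcript, with the .33 entries taken to be 1/3; the function name `skew_quadrilateral` is invented here. Because the four weights are nonnegative and sum to 1, every image matrix still has row sums 1.

```python
def skew_quadrilateral(p00, p01, p10, p11, x, y):
    """Map (x, y) in the unit square to a blend of four matrices (lists of rows).

    Weights follow the slide: xy*P00 + x(1-y)*P01 + (1-x)y*P10 + (1-x)(1-y)*P11.
    """
    weights = (x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y))
    corners = (p00, p01, p10, p11)
    rows, cols = len(p00), len(p00[0])
    return [[sum(w * p[i][j] for w, p in zip(weights, corners))
             for j in range(cols)] for i in range(rows)]


third = 1.0 / 3.0
P123 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P1_23 = [[1, 0, 0], [0, .5, .5], [0, .5, .5]]
P12_3 = [[.5, .5, 0], [.5, .5, 0], [0, 0, 1]]
P_123 = [[third] * 3 for _ in range(3)]

blend = skew_quadrilateral(P123, P1_23, P12_3, P_123, x=0.3, y=0.7)
for row in blend:
    print([round(v, 3) for v in row], "row sum:", round(sum(row), 3))
```

Setting (x, y) to a corner of the unit square returns the matching corner matrix, for example (1, 1) gives P123 and (0, 0) gives P[123].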

Inversion
• Try to locate * = (d1, d2) on the coordinate grid
[Figure: d1, d2 axes with the target point marked.]

Sequential bisection
[Figure, built up over five slides: d1, d2 axes with one more bisection point added at each step.]

… may fail to converge
[Figure: d1, d2 axes with bisection points.]

Tempered sequential bisection
[Figure, built up over three slides: d1, d2 axes with bisection points and additional points marked "o".]
• prove that this converges...
