Priority Scheduling: An Application for the Permutahedron. Ethan Bolker, UMass-Boston and BMC Software. AMS Toronto meeting, September 24, 2000.
Plan
• Brief introduction to queueing theory
• Priority scheduling
• Conservation laws and the permutahedron
• Specifying CPU shares
Interesting pictures and open questions
References: www.cs.umb.edu/~eb/goalmode
Acknowledgements: Jeff Buzen, Yiping Ding, Dan Keefe, Oliver Chen, Aaron Ball, Tom Larard
Queueing theory
• Workload: stream of jobs visiting a server (ATM, time-shared CPU, printer, …)
• Jobs queue when server is busy
• Input:
  • Arrival rate: λ jobs/sec
  • Service demand: s sec/job
• Performance metrics:
  • Utilization: u = λs (must be ≤ 1)
  • Response time: r = ???
  • Degradation: d = r/s
  • Queue length: q = λr (Little’s law)
Response time computations
• r, d, q measure queueing delay
  • r ≥ s (d ≥ 1), unless parallel processing possible
• Randomness really matters
  • r = s (d = 1) if arrivals scheduled (best case, no waiting)
  • r >> s for bulk arrivals (worst case, maximum delays)
• Theorem. d = 1/(1-u) if arrivals are Poisson and service is exponentially distributed (M/M/1).
  • r = s/(1-u) (think virtual server with speed 1-u)
  • q = u/(1-u) (convention: job in service is on queue)
M/M/1
• Essential nonlinearity often counterintuitive
  • at u = 90% average queue length is 0.9/(1-0.9) = 9,
  • average response time is s/(1-0.9) = 10·s,
  • but 1 customer in 10 has no wait at all (10% idle time)
• A useful guide even when hypotheses fail
  • accurate enough (±20%) for real computer systems
  • d depends only on u: many small jobs have same impact as few large jobs
  • faster system ⇒ smaller s ⇒ smaller u ⇒ smaller r = s/(1-u); double win: less service, less wait
  • waiting costly, server cheap (telephones): want u ≈ 0
  • server costly (doctors): want u ≈ 1 but scheduled
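The M/M/1 formulas above are easy to tabulate. The following is a minimal sketch, not part of the talk; the function name `mm1_metrics` and its argument names are chosen here for illustration only.

```python
def mm1_metrics(arrival_rate, service_demand):
    """Return (u, r, d, q) for an M/M/1 queue.

    arrival_rate: jobs per second (lambda in the slides)
    service_demand: seconds per job (s in the slides)
    """
    u = arrival_rate * service_demand        # utilization u = lambda * s
    if not 0 <= u < 1:
        raise ValueError("the M/M/1 formulas need 0 <= u < 1")
    d = 1.0 / (1.0 - u)                      # degradation d = 1/(1-u)
    r = service_demand * d                   # response time r = s/(1-u)
    q = arrival_rate * r                     # Little's law; equals u/(1-u)
    return u, r, d, q


# Tabulate the nonlinearity for a one-second job at increasing load.
for lam in (0.5, 0.9, 0.99):
    u, r, d, q = mm1_metrics(lam, 1.0)
    print(f"u={u:.2f}  r={r:.1f} s  d={d:.1f}  q={q:.1f}")
```

Going from u = 0.9 to u = 0.99 multiplies the response time by about ten, which is the nonlinearity the slide warns about.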
Multiple Job Streams
• Multiple workloads, utilizations u1, u2, …
• U = Σ ui < 1
• All degradations equal (no priorities): di = 1/(1-U)
• Suppose priority scheduling possible
• Study degradation vector V = (d1, d2, …)
Priority Scheduling
• Priority state: order workloads by priority (ties OK)
  • two workloads, 3 states: 12, 21, [12]
  • three workloads, 13 states:
    • 123 (6 = 3! of these ordered states),
    • [12]3 (3 of these),
    • 1[23] (3 of these),
    • [123] (1 state with no priorities)
  • n wkls, f(n) states, n! ordered (simplex lock combos)
• p(s) = prob(state = s) = fraction of time in state s
• V(s) = degradation vector when state = s (measure this, or compute it using queueing theory)
• V = Σ_s p(s) V(s) (time avg is convex combination)
• Achievable region is convex hull of vectors V(s)
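The state counts on this slide (3 for two workloads, 13 for three) can be checked by enumeration. The sketch below is an illustration only; the helper names `count_priority_states`, `priority_states` and `label` are not from the talk. It uses the recurrence "choose the group tied for top priority, then rank the rest".

```python
from functools import lru_cache
from itertools import combinations
from math import comb


@lru_cache(maxsize=None)
def count_priority_states(n):
    """f(n): number of ways to rank n workloads when ties are allowed."""
    if n == 0:
        return 1
    # k workloads tie for top priority; the remaining n - k are ranked below.
    return sum(comb(n, k) * count_priority_states(n - k) for k in range(1, n + 1))


def priority_states(workloads):
    """Yield each state as a tuple of tied groups, highest priority first."""
    workloads = tuple(workloads)
    if not workloads:
        yield ()
        return
    for k in range(1, len(workloads) + 1):
        for top in combinations(workloads, k):
            rest = tuple(w for w in workloads if w not in top)
            for tail in priority_states(rest):
                yield (top,) + tail


def label(state):
    """Write a state in the slide notation, e.g. ((1,), (2, 3)) -> '1[23]'."""
    parts = []
    for group in state:
        text = "".join(str(w) for w in group)
        parts.append(text if len(group) == 1 else "[" + text + "]")
    return "".join(parts)


print([label(s) for s in priority_states((1, 2))])
print(count_priority_states(3), len(list(priority_states((1, 2, 3)))))
```

The enumeration grows quickly, so for larger n only the counting function is practical.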
Two workloads
[Figure, built up over three slides: the (d1, d2) plane with the line d1 = d2. Marked points: V(12) (wkl 1 high prio), V([12]) (no priorities), V(21). Labeled: achievable region.]
• 0.5 V(12) + 0.5 V(21) ≠ V([12])
• note: u1 < u2 ⇒ wkl 2 effect on wkl 1 large
Conservation
• No Free Lunch Theorem. Weighted average degradation is constant, independent of priority scheduling scheme: Σ_i (ui/U) di = 1/(1-U)
• Provable from some hypotheses
• Observable in some real systems
• Sometimes false: shortest job first minimizes average response time (printer queues, supermarket express checkout lines)
Conservation
• For any proper set A of workloads, with U(A) = Σ_{i∈A} ui
  • Imagine giving those workloads top priority. Then can pretend other wkls don’t exist. In that case Σ_{i∈A} (ui/U(A)) di = 1/(1-U(A))
  • When wkls in A have lower priorities they have higher degradations, so in general Σ_{i∈A} (ui/U(A)) di ≥ 1/(1-U(A))
• These 2^n - 2 linear inequalities determine the convex achievable region R
• R is a permutahedron: only n! vertices
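The conservation laws determine the vertices directly. For a strict priority order, every top-priority prefix A satisfies its constraint with equality, so the degradations can be peeled off one workload at a time. The sketch below assumes exactly the conservation model stated on these slides; the function names `vertex` and `satisfies_conservation` are illustrative, and workloads are indexed from 0.

```python
from itertools import combinations, permutations


def g(x):
    """Right-hand side of a conservation law after multiplying by U(A)."""
    return x / (1.0 - x)


def vertex(order, u):
    """Degradation vector V(order); order lists workloads, highest priority first."""
    d = [0.0] * len(u)
    prefix = 0.0                      # utilization of the workloads ranked above w
    for w in order:
        # Equality holds for the prefix with and without w, so
        # u_w * d_w = g(prefix + u_w) - g(prefix).
        d[w] = (g(prefix + u[w]) - g(prefix)) / u[w]
        prefix += u[w]
    return d


def satisfies_conservation(d, u, tol=1e-9):
    """Check the equality for all workloads and the inequality for each proper subset."""
    n = len(u)
    if abs(sum(u[i] * d[i] for i in range(n)) - g(sum(u))) > tol:
        return False
    for k in range(1, n):
        for subset in combinations(range(n), k):
            lhs = sum(u[i] * d[i] for i in subset)
            if lhs < g(sum(u[i] for i in subset)) - tol:
                return False
    return True


u = [0.2, 0.3, 0.1]
for order in permutations(range(len(u))):
    print(order, [round(x, 3) for x in vertex(order, u)], satisfies_conservation(vertex(order, u), u))
```

For two workloads with u = (0.2, 0.3), this gives V(12) = (1.25, 2.5): workload 1 sees only its own load, and workload 2 absorbs the rest of the conserved total.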
Two workload permutahedron
[Figure, built up over three slides: the (d1, d2) plane with the conservation line u1d1 + u2d2 = U/(1-U), the constraints d1 ≥ 1/(1-u1) and d2 ≥ 1/(1-u2), and the achievable region on that line with endpoints V(12) and V(21).]
Three workload permutahedron
[Figure: axes d1, d2, d3; the conservation plane u1d1 + u2d2 + u3d3 = U/(1-U); labeled vertices V(123) and V(213).]
Four workload permutahedron
• 4! = 24 vertices (ordered states)
• 2^4 - 2 = 14 facets (proper subsets) (conservation constraints)
• 74 faces (states)
Simplicial geometry and transportation polytopes, Trans. Amer. Math. Soc. 217 (1976) 138.
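The face counts can be reproduced from the correspondence between faces and priority states: a state with k tied groups gives a face of dimension n - k. The sketch below is an illustration with hypothetical helper names (`stirling2`, `face_counts`); it counts states by number of groups using Stirling numbers of the second kind.

```python
from math import factorial


def stirling2(n, k):
    """Number of ways to split n items into k non-empty unordered blocks."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def face_counts(n):
    """Map face dimension -> number of faces for n workloads.

    k ordered groups can be arranged in k! ways, and each such state
    corresponds to a face of dimension n - k.
    """
    return {n - k: factorial(k) * stirling2(n, k) for k in range(1, n + 1)}


counts = face_counts(4)
print(counts)
# Dimension n - 1 is the whole polytope (the state with no priorities).
print(sum(c for dim, c in counts.items() if dim < 3))
```

For n = 4 this gives 24 vertices, 36 edges and 14 facets, which add up to the 74 faces on the slide; the remaining state [1234] corresponds to the polytope itself.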
Scheduling for performance
• Administrator specifies performance goals
  • desired degradations (IBM OS/390) (not today)
  • CPU shares (UNIX offerings from HP, IBM, Sun)
• Operating system dispatches jobs in an attempt to meet goals
• Model predicts degradations by constructing map: workload performance goals → permutahedron
Specifying CPU shares
• Administrator specifies workload CPU shares
• Share f (0 < f < 1) means workload guaranteed fraction f of CPU when at least one of its jobs is queued for service, can get more if some competition is absent
  • share ≠ utilization
  • share ≠ cap
  • share should be renamed guarantee
Map shares to degradations - two workloads -
• Suppose f1 and f2 > 0, f1 + f2 = 1
• Model: System operates in state
  • 12 with probability f1
  • 21 with probability f2
  (independent of who is on queue)
• Average degradation vector: V = f1 V(12) + f2 V(21)
Map shares to degradations - three (n) workloads -
• prob(123) = f1 f2 f3 / [(f1 + f2 + f3)(f2 + f3)(f3)]
• Theorem: These n! probabilities sum to 1
  • interesting identity generalizing adding fractions
  • prove by induction, or by coupon collecting
• V = Σ_{ordered states s} prob(s) V(s)
• O(n!), Ω(n!), good enough for n ≤ 9 (12)
• Searching for fast (approximate) algorithm ...
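The share model can be written out as a brute-force computation over all n! ordered states. The sketch below follows the prob(123) formula from the slide; the function names and the `vertex_of` callback (any function returning V(s) for an order s) are assumptions made for illustration, and workloads are indexed from 0. All shares must be strictly positive.

```python
from itertools import permutations


def order_probability(order, shares):
    """Probability of a strict priority order, highest priority first."""
    remaining = sum(shares[w] for w in order)
    p = 1.0
    for w in order:
        p *= shares[w] / remaining    # w is picked next from those still unranked
        remaining -= shares[w]
    return p


def share_degradations(shares, vertex_of):
    """V = sum over ordered states s of prob(s) * V(s); costs O(n!) evaluations."""
    n = len(shares)
    total = [0.0] * n
    for order in permutations(range(n)):
        p = order_probability(order, shares)
        v = vertex_of(order)
        for i in range(n):
            total[i] += p * v[i]
    return total


shares = [0.5, 0.3, 0.2]
probs = {order: order_probability(order, shares) for order in permutations(range(3))}
print(probs)
print(sum(probs.values()))
```

With two workloads and f1 + f2 = 1 the formula reduces to prob(12) = f1 and prob(21) = f2, matching the two-workload slide.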
Map shares to degradations (geometry)
• Interpret shares as barycentric coordinates in the (n-1)-simplex
• Study the geometry of the map from the simplex to the (n-1)-dimensional permutahedron
• Easy when n=2: each is a line segment and map is linear
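Mapping a triangle to a hexagon
[Figure: map M from the triangle of shares (labels f1 = 0, f1 = 1, f3 = 0, f3 = 1) to the hexagon with vertices 123, 132, 312, 321, 231, 213; annotations "wkl 1 high priority" and "wkl 1 low priority".]
[Figure: second view with labels f1 = 0, f1 = 1 and {23}.]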
Implementing fair share scheduling
• Actual Sun/Solaris implementation is subtle
• HP and IBM are black boxes (for me)
• Stochastic solution: randomly choose queued job to dispatch (implement the model rather than model an implementation)
• May require prior computation of priodist(w, p) = prob(wkl w runs at prio p)
  • workload priority probabilities, not state probabilities
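One possible reading of the stochastic solution is sketched below. It is an assumption, not the talk's implementation: at each dispatch, pick among the workloads that currently have queued jobs, with probability proportional to share. Under the ordering probabilities of the share model, the highest-priority workload within any subset is chosen with exactly these proportions. The function name `pick_workload` and the dictionary layout are hypothetical.

```python
import random


def pick_workload(queued, shares, rng=random):
    """Choose the workload whose job is dispatched next.

    queued: dict mapping workload name -> number of queued jobs
    shares: dict mapping workload name -> CPU share (positive)
    rng: the random module or a random.Random instance
    Returns None when nothing is queued.
    """
    candidates = [w for w in shares if queued.get(w, 0) > 0]
    if not candidates:
        return None
    weights = [shares[w] for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# With both workloads busy, "batch" should win about 30% of dispatches.
rng = random.Random(2000)
shares = {"web": 0.7, "batch": 0.3}
picks = [pick_workload({"web": 5, "batch": 5}, shares, rng) for _ in range(10000)]
print(picks.count("batch") / len(picks))
```

A workload with nothing queued drops out of the draw, so the others receive more than their share, as the definition of a share requires.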
Priority distributions
• Given degradations, compute a priodist
• A priodist is an n×n matrix with row sums 1
• {priodists} = cartesian product of n (n-1)-simplices
• Map is surjective, not injective
• Look for a well behaved inverse image
priodist space (dim n(n-1)) → permutahedron (dim n-1)
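Three workload permutahedron
[Figure: hexagon drawn against axes d1, d2 with the lines d1 = d2, d2 = d3, d1 = d3. Vertex labels: 123, 132, 312, 321, 231, 213. Edge labels: [12]3, 1[23], [13]2, 3[12], [23]1, 2[13]. Center: [123].]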
… dissected into 3! quadrilaterals
[Figure: the same hexagon cut into six quadrilaterals; the one shown has corners 123, 1[23], [123], [12]3.]
… each mapped to from a skew quadrilateral of priodists
Corner priodists (rows = workloads, columns = priorities):
P123 =
  1   0   0
  0   1   0
  0   0   1
P1[23] =
  1   0   0
  0  .5  .5
  0  .5  .5
P[12]3 =
 .5  .5   0
 .5  .5   0
  0   0   1
P[123] =
 .33 .33 .33
 .33 .33 .33
 .33 .33 .33
(x,y) → xy P123 + x(1-y) P1[23] + (1-x)y P[12]3 + (1-x)(1-y) P[123] → degradation vector in this corner of permutahedron
Skew quadrilaterals
• Given 4 points P00, P01, P10, P11 ∈ R^m, map unit square: (x,y) → xy P00 + x(1-y) P01 + (1-x)y P10 + (1-x)(1-y) P11
• Easy to generalize to 2^k points
• Analogous to convex hull, which maps barycentric coordinates on a simplex
• Reference for this construction?
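The skew quadrilateral map is a bilinear blend of its four corners. A minimal sketch follows, using the slide's own indexing, in which (x, y) = (1, 1) lands on P00; the function name `skew_quad` is an illustrative choice, and points are plain lists of coordinates (a priodist would be flattened row by row).

```python
def skew_quad(x, y, p00, p01, p10, p11):
    """Map (x, y) in the unit square to a point of the skew quadrilateral in R^m."""
    weights = (x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y))
    points = (p00, p01, p10, p11)
    # The four weights are non-negative and sum to 1 on the unit square.
    return [sum(w * p[i] for w, p in zip(weights, points)) for i in range(len(p00))]


# Corners of the unit square map to the four given points.
a, b, c, d = [0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]
print(skew_quad(1, 1, a, b, c, d), skew_quad(1, 0, a, b, c, d))
print(skew_quad(0, 1, a, b, c, d), skew_quad(0, 0, a, b, c, d))
print(skew_quad(0.5, 0.5, a, b, c, d))
```

Along any line with x or y held fixed the map is linear, so the image is a ruled surface when the four points are not coplanar.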
Inversion
• Try to locate * = (d1, d2) on coordinate grid
[Figure: coordinate grid in the (d1, d2) plane with the target point *.]
Sequential bisection
[Figure sequence, five slides: successive steps on the (d1, d2) grid.]
… may fail to converge
[Figure: (d1, d2) grid.]
Tempered sequential bisection
[Figure sequence, three slides: (d1, d2) grid with successive points marked o.]
• prove that this converges...