Scheduling for Performance. UMass-Boston, Ethan Bolker, April 21, 1999
Acknowledgements • Joint work with Jeff Buzen (BMC Software) • BMC Software • Dan Keefe • Yefim Somin • Chen Xiaogang (oliver@cs.umb.edu)
Outline • Impossibly much to cover • Performance metrics for workloads • Beyond priorities • Modeling: degradation as a performance metric • Conservation laws and the permutahedron • Specifying response times (IBM goal mode) • Specifying CPU shares (Sun Fair Share) • Priority distributions • Work in progress
Workload Performance Metrics • Transaction (open) workload: jobs arrive at random from an external source • web or database server, eris with many interactive users • inputs: job arrival rate (throughput), service time • performance metric: response time • Batch (closed) workload: jobs always waiting (latent demand) • weather prediction, data mining • input: job service time • performance metrics: response time, throughput
Beyond priorities • User wants performance assurance: response time (open wkls), throughput (closed wkls) • Single workload: performance depends on resources available (CPU, IO, network) • Multiple workloads: prioritize resource access • Nice isn’t nice - hard to predict performance from priorities • Better: set performance goals, system tunes itself • Examples: IBM Goal Mode, Sun Fair Share, Eclipse, SMART, ...
Tuning by Tinkering (figure: the Administrator adjusts Priority Assignments and observes Workload Performance (Response Time))
Scheduling software (figure: the Administrator sets Performance Goals, which rarely change; the scheduling software measures Workload Performance (Response Time) frequently and sets Priority Assignments)
Modeling • System is dynamic, state changes frequently • Model is a static snapshot, deals in averages and probabilities • Can ask “what if?” inexpensively • Modeler’s measure of performance: degradation = (elapsed time)/(service time) • deg ≥ 1, deg = 1 when no contention (deg < 1 if parallel computation possible) • deg = n for n closed workloads (no priorities)
Modeling One Open Workload • arrival rate λ (job/sec) (Poisson) • service time s (sec/job) (exponential dist’n) • utilization u = λs, 0 ≤ u < 1 • Theorem: deg = 1/(1 - u) • Often a useful guide even when hypotheses fail • depends only on u: many small jobs == few large jobs • faster system → smaller s → smaller u → smaller deg • want u small when waiting is costly (telephones) • want u near 1 when system is costly (supercomputers)
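For concreteness, a minimal sketch of this single-workload formula (plain Python; the rates in the example are invented for illustration):

```python
def degradation(arrival_rate, service_time):
    """Open-workload degradation: elapsed time / service time = 1 / (1 - u)."""
    u = arrival_rate * service_time        # utilization u = lambda * s
    if u >= 1:
        raise ValueError("utilization >= 1: the open workload is unstable")
    return 1.0 / (1.0 - u)

# Example: 8 jobs/sec at 0.1 sec/job gives u = 0.8, so deg = 5:
# an average job takes 5x its bare service time.
print(degradation(8.0, 0.1))
```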
Multiple (open) workloads • Priority state: order workloads by priority (ties OK) • two workloads, 3 states: 12, 21, [12] • three workloads, 13 states: 123 (3! = 6 ordered states), [12]3 (3 of these), 1[23] (3 of these), [123] • n wkls, f(n) states (simplex lock combos), n! ordered • At each time instant, system runs in some state s, V(s) = vector of workload degradations • Measure or model V(s) (operational analysis) • p(s) = prob( state = s ) = fraction of time in state s • V = Σs p(s) V(s) (time average, convex combination)
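A small sketch (Python; the function names are mine) of the two ingredients on this slide: counting the priority states f(n) and forming the time-averaged degradation vector from state probabilities:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_states(n):
    """f(n): priority states of n workloads = ordered set partitions
    (the 'simplex lock' count): f(2) = 3, f(3) = 13, f(4) = 75."""
    if n == 0:
        return 1
    # choose the k workloads tied at the top priority level, then recurse
    return sum(comb(n, k) * num_states(n - k) for k in range(1, n + 1))

def average_degradation(p, V):
    """V = sum over states s of p(s) * V(s): a convex combination of the
    per-state degradation vectors, weighted by time fraction in each state."""
    n = len(next(iter(V.values())))
    return [sum(p[s] * V[s][i] for s in p) for i in range(n)]

print([num_states(k) for k in (1, 2, 3, 4)])   # [1, 3, 13, 75]
```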
Two workloads (general case) (figure: achievable region in the (wkl 1 degradation, wkl 2 degradation) plane, with vertices V(12) (wkl 1 high prio) and V(21), the no-priority point V([12]), and the mixture 0.5 V(12) + 0.5 V(21); note u1 < u2)
Two workloads (conservation) (figure: along the achievable region, (u1 d1 + u2 d2)/(u1 + u2) = constant = avg degradation; vertices V(12) and V(21), mixture 0.5 V(12) + 0.5 V(21), and the no-priority point V([12]) with equal degradations on the line d1 = d2)
Conservation • Theorem: For any priority assignments, (1/util(wkls)) Σw util(w) deg(w) = constant = avg deg • Provable from some hypotheses, observable (false for printer queues) • For any set A of workloads: imagine giving those workloads top priority; discover (measure or model) avg degradation deg(A); then (1/util(A)) Σw∈A util(w) deg(w) ≥ deg(A) • These linear inequalities determine the convex achievable region
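A sketch of how those linear constraints could be checked for a proposed degradation vector (Python; the dictionary interface, tolerance, and subset enumeration are my own assumptions):

```python
from itertools import combinations

def in_achievable_region(util, deg, deg_top, tol=1e-9):
    """util[w], deg[w]: utilization and proposed degradation of workload w.
    deg_top[frozenset(A)]: avg degradation of subset A when A is given top
    priority (measured or modeled).  Conservation says the utilization-weighted
    average degradation over A is at least deg_top[A] for every proper subset,
    with equality for the full workload set."""
    wkls = list(util)
    for r in range(1, len(wkls) + 1):
        for A in combinations(wkls, r):
            lhs = sum(util[w] * deg[w] for w in A) / sum(util[w] for w in A)
            bound = deg_top[frozenset(A)]
            if r < len(wkls) and lhs < bound - tol:
                return False                      # inequality violated
            if r == len(wkls) and abs(lhs - bound) > tol:
                return False                      # conservation equality violated
    return True
```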
Two workloads (conservation) (figure: in the (d1, d2) plane the achievable region is the segment cut out by (u1 d1 + u2 d2)/(u1 + u2) = constant = avg degradation together with d1 ≥ 1/(1 - u1) and d2 ≥ 1/(1 - u2); its endpoints are V(12) and V(21), with V([12]) between them)
Three workloads (figure: in (d1, d2, d3) space the achievable region lies in the plane (u1 d1 + u2 d2 + u3 d3)/(u1 + u2 + u3) = avg degradation, with vertices such as V(123) and V(213))
Three workload permutahedron (figure: a hexagon in the conservation plane; its vertices are the ordered states 123, 132, 312, 321, 231, 213, its edges the tied states [12]3, 1[23], [13]2, 3[12], [23]1, 2[13], and its center [123]; edges correspond to the conservation constraints for subsets such as {12} and {3}; the lines d1 = d2 and d2 = d3 are marked)
Four workload permutahedron 4! = 24 vertices (ordered states) 2^4 - 2 = 14 facets (proper subsets) (conservation constraints) 74 faces (states) Simplicial geometry and transportation polytopes, Trans. Amer. Math. Soc. 217 (1976) 138.
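A quick check of those counts (Python; it counts ordered set partitions by number of blocks, so the 74 here is the number of proper faces, i.e. everything except the whole polytope):

```python
from math import comb

def ordered_partitions_by_blocks(n):
    """Ordered set partitions of n workloads into exactly k blocks, k = 1..n.
    These index the faces of the n-workload permutahedron:
    k = n gives the vertices (ordered states), k = 2 the facets."""
    def surjections(n, k):                 # equals k! * Stirling2(n, k)
        return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return {k: surjections(n, k) for k in range(1, n + 1)}

t = ordered_partitions_by_blocks(4)
print(t[4], t[2], sum(t[k] for k in range(2, 5)))
# 24 vertices, 14 facets, 74 proper faces for four workloads
```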
Scheduling for Performance • Administrator specifies goals - e.g. degradations • Software determines priorities, trying to meet goals • Model maps goals to achievable degradations (figure: workload performance goals mapped onto the achievable region)
IBM OS390 Goal Mode • Administrator specifies workload degradation goals (figure: goal points in the (wkl 1 degradation, wkl 2 degradation) plane may be too generous or too ambitious relative to the achievable region)
Modeling Goal Mode • Find right point in permutahedron for given V • Linear programming solution (Coffman & Mitrani) • Algorithm modeling problem more closely: for each subset A of workloads: scale(A) = factor to force conservation true for A; for each workload w: scale(w) = min { scale(A) | scale(A) < 1 && w ∈ A }, V(w) *= scale(w) // inequalities now OK, scale back to p’hedron if necessary • O(2^n), fast enough, conjecture Θ(2^n) • Refinements for workload importance
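One possible reading of that scaling step as runnable code (Python; the exact scaling rule and the deg_top interface are my interpretation of the slide, not the authors' implementation):

```python
from itertools import combinations

def scale_to_achievable(goal, util, deg_top):
    """goal[w]: requested degradation, util[w]: utilization,
    deg_top[frozenset(A)]: avg degradation of A when A runs at top priority.
    Scale each workload's goal by its tightest subset factor, as the slide sketches."""
    V = dict(goal)
    wkls = list(V)
    scale = {}
    for r in range(1, len(wkls) + 1):
        for A in map(frozenset, combinations(wkls, r)):
            avg = sum(util[w] * V[w] for w in A) / sum(util[w] for w in A)
            scale[A] = deg_top[A] / avg    # factor making conservation exact for A
    for w in wkls:
        factors = [scale[A] for A in scale if scale[A] < 1 and w in A]
        if factors:
            V[w] *= min(factors)
    # slide: "inequalities now OK, scale back to p'hedron if necessary"
    return V
```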
SUN SRM (Solaris Resource Manager) • Administrator specifies workload CPU shares • Share f (0 < f < 1) means wkl guaranteed fraction f of CPU when it’s on run queue, can get more if no competition • Share = utilization only for closed workloads • Model: f1 = 1, f2 = f3 = … = 0 means wkl 1 has preemptive highest priority • Two wkls: V = f1 V(12) + f2 V(21)
Map Shares to Degradations • Three (n) workloads: weight(123) = (f1 · f2 · f3) / ((f1 + f2 + f3)(f2 + f3)(f3)); V = Σ over ordered states s of weight(s) V(s) • Theorem: weights sum to 1 • interesting identity generalizing adding fractions • prove by induction, or by coupon collecting • O(n!), Θ(n!), fast enough for n < 9 (12)
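A sketch of this shares-to-degradations map (Python; the shares in the example are made up) that also checks the weights-sum-to-1 theorem numerically:

```python
from itertools import permutations

def weight(order, f):
    """Weight of ordered state 'order' (a tuple of workload ids): product over
    the order of f[w] / (sum of shares of w and all later workloads)."""
    w, rest = 1.0, list(order)
    for i in order:
        w *= f[i] / sum(f[j] for j in rest)
        rest.remove(i)
    return w

def mix(weights, Vs):
    """V = sum over ordered states s of weight(s) * V(s)."""
    n = len(next(iter(Vs.values())))
    return [sum(weights[s] * Vs[s][i] for s in weights) for i in range(n)]

f = {1: 0.5, 2: 0.3, 3: 0.2}                       # made-up CPU shares
weights = {p: weight(p, f) for p in permutations(f)}
print(sum(weights.values()))                        # ~1.0: the weights sum to 1
```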
Three workload example
Map Shares to Degradations • Normalize: f1 + f2 + f3 = 1 (barycentric coordinates) (figure: the normalized share triangle, from f1 = 1 to f1 = 0, mapped onto the achievable region)
Experimental results for 3 workloads
Mapping a triangle to a hexagon (figure: the normalized share triangle, with corners such as f1 = 1 and f2 = 1 and edges f1 = 0 and f2 = 0, maps onto the three-workload permutahedron hexagon with vertices 123, 132, 312, 321, 231, 213, edges [12]3, 1[23], [13]2, 3[12], [23]1, 2[13], and center [123]; the wkl 1 high priority and wkl 1 low priority sides are marked)
Map Goals to Shares • For open workloads, specifying shares is as unintuitive as specifying priorities • Specify degradation goals • Map to achievable region • Reverse map from achievable region to shares: do { guess shares (bisection argument); compute degradations } until error is acceptably small • 10 * O(n!) is good to 1%
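A rough sketch of that reverse map (Python; it assumes a forward model shares_to_degs like the one sketched earlier, and the proportional update rule is my own guess, since the slide only says guess shares, compute degradations, repeat):

```python
def goals_to_shares(goals, shares_to_degs, iters=10):
    """Invert the shares -> degradations map by iterative refinement.
    goals: target degradations (assumed to lie in the achievable region).
    shares_to_degs: forward model mapping a list of shares to degradations."""
    n = len(goals)
    shares = [1.0 / n] * n                          # start from equal shares
    for _ in range(iters):
        degs = shares_to_degs(shares)
        # a workload whose degradation exceeds its goal needs a larger share
        shares = [s * d / g for s, d, g in zip(shares, degs, goals)]
        total = sum(shares)
        shares = [s / total for s in shares]        # renormalize so shares sum to 1
    return shares
```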
Map degradations to priorities • Real system works with priorities • pdist(w, p) = prob(wkl w at prio p) = fraction of time wkl w spends at priority p • (figure: pdist space (dim n(n-1)) maps onto the achievable region (dim n-1))
Pdists to degradations and back (figure: the hexagonal achievable region splits into 6 pieces, each combinatorially a square; the piece shown has corners 123, [12]3, 1[23], [123] and is bounded by the lines d1 = d2 and d2 = d3)
Pdists to degradations and back (figure: example pdist matrices at the corners of one piece: state 123 is the identity matrix (each workload always at its own priority); 1[23] has rows (1 0 0), (0 .5 .5), (0 .5 .5); [12]3 has rows (.5 .5 0), (.5 .5 0), (0 0 1); [123] has every entry .33)
Work in progress • Model mixed open and closed workloads • Prove algorithms correct • Solaris benchmark studies (under way) • OS390 validation - does data exist? • Write the paper ... • Build a product for IBM/Sun/BMC customers