Scheduling for Performance. UMass-Boston, Ethan Bolker, April 21, 1999
Acknowledgements • Joint work with Jeff Buzen (BMC Software) • BMC Software • Dan Keefe • Yefim Somin • Chen Xiaogang (oliver@cs.umb.edu)
Outline • Impossibly much to cover • Performance metrics for workloads • Beyond priorities • Modeling: degradation as a performance metric • Conservation laws and the permutahedron • Specifying response times (IBM goal mode) • Specifying CPU shares (Sun Fair Share) • Priority distributions • Work in progress
Workload Performance Metrics • Transaction (open) workload: jobs arrive at random from an external source • web or database server, eris with many interactive users • inputs: job arrival rate (throughput), service time • performance metric: response time • Batch (closed) workload: jobs always waiting (latent demand) • weather prediction, data mining • input: job service time • performance metrics: response time, throughput
Beyond priorities • User wants performance assurance: response time (open wkls), throughput (closed wkls) • Single workload: performance depends on resources available (CPU, IO, network) • Multiple workloads: prioritize resource access • Nice isn’t nice - hard to predict performance from priorities • Better: set performance goals, system tunes itself • Examples: IBM Goal Mode, Sun Fair Share, Eclipse, SMART, ...
Tuning by Tinkering (figure: the Administrator adjusts Priority Assignments and observes Workload Performance (Response Time))
Scheduling software (figure: the Administrator sets Performance Goals, which rarely change; the scheduling software measures Workload Performance (Response Time) frequently and sets Priority Assignments)
Modeling • System is dynamic, state changes frequently • Model is a static snapshot, deals in averages and probabilities • Can ask “what if?” inexpensively • Modeler’s measure of performance: degradation = (elapsed time)/(service time) • deg ≥ 1, deg = 1 when no contention (deg < 1 if parallel computation possible) • deg = n for n closed workloads (no priorities)
Modeling One Open Workload • arrival rate λ (job/sec) (Poisson) • service time s (sec/job) (exponential dist’n) • utilization u = λs, 0 ≤ u < 1 • Theorem: deg = 1/(1 - u) • Often a useful guide even when hypotheses fail • depends only on u: many small jobs == few large jobs • faster system → smaller s → smaller u → smaller deg • want u small when waiting is costly (telephones) • want u near 1 when system is costly (supercomputers)
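For concreteness, a minimal sketch of this single-workload formula (plain Python; the rates in the example are invented for illustration):

```python
def degradation(arrival_rate, service_time):
    """Open-workload degradation: elapsed time / service time = 1 / (1 - u)."""
    u = arrival_rate * service_time        # utilization u = lambda * s
    if u >= 1:
        raise ValueError("utilization >= 1: the open workload is unstable")
    return 1.0 / (1.0 - u)

# Example: 8 jobs/sec at 0.1 sec/job gives u = 0.8, so deg = 5:
# an average job takes 5x its bare service time.
print(degradation(8.0, 0.1))
```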
Multiple (open) workloads • Priority state: order workloads by priority (ties OK) • two workloads, 3 states: 12, 21, [12] • three workloads, 13 states: 123 (3! = 6 ordered states), [12]3 (3 of these), 1[23] (3 of these), [123] • n wkls, f(n) states (simplex lock combos), n! ordered • At each time instant, system runs in some state s, V(s) = vector of workload degradations • Measure or model V(s) (operational analysis) • p(s) = prob( state = s ) = fraction of time in state s • V = Σs p(s) V(s) (time average, convex combination)
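A small sketch (Python; the function names are mine) of the two ingredients on this slide: counting the priority states f(n) and forming the time-averaged degradation vector from state probabilities:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_states(n):
    """f(n): priority states of n workloads = ordered set partitions
    (the 'simplex lock' count): f(2) = 3, f(3) = 13, f(4) = 75."""
    if n == 0:
        return 1
    # choose the k workloads tied at the top priority level, then recurse
    return sum(comb(n, k) * num_states(n - k) for k in range(1, n + 1))

def average_degradation(p, V):
    """V = sum over states s of p(s) * V(s): a convex combination of the
    per-state degradation vectors, weighted by time fraction in each state."""
    n = len(next(iter(V.values())))
    return [sum(p[s] * V[s][i] for s in p) for i in range(n)]

print([num_states(k) for k in (1, 2, 3, 4)])   # [1, 3, 13, 75]
```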
Two workloads (general case) (figure: achievable region in the (wkl 1 degradation, wkl 2 degradation) plane, with vertices V(12) (wkl 1 high prio) and V(21), the no-priority point V([12]), and the mixture 0.5 V(12) + 0.5 V(21); note u1 < u2)
Two workloads (conservation) (figure: along the achievable region, (u1 d1 + u2 d2)/(u1 + u2) = constant = avg degradation; vertices V(12) and V(21), mixture 0.5 V(12) + 0.5 V(21), and the no-priority point V([12]) with equal degradations on the line d1 = d2)
Conservation • Theorem: For any priority assignments, (1/util(wkls)) Σw util(w) deg(w) = constant = avg deg • Provable from some hypotheses, observable (false for printer queues) • For any set A of workloads: imagine giving those workloads top priority; discover (measure or model) avg degradation deg(A); then (1/util(A)) Σw∈A util(w) deg(w) ≥ deg(A) • These linear inequalities determine the convex achievable region
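A sketch of how those linear constraints could be checked for a proposed degradation vector (Python; the dictionary interface, tolerance, and subset enumeration are my own assumptions):

```python
from itertools import combinations

def in_achievable_region(util, deg, deg_top, tol=1e-9):
    """util[w], deg[w]: utilization and proposed degradation of workload w.
    deg_top[frozenset(A)]: avg degradation of subset A when A is given top
    priority (measured or modeled).  Conservation says the utilization-weighted
    average degradation over A is at least deg_top[A] for every proper subset,
    with equality for the full workload set."""
    wkls = list(util)
    for r in range(1, len(wkls) + 1):
        for A in combinations(wkls, r):
            lhs = sum(util[w] * deg[w] for w in A) / sum(util[w] for w in A)
            bound = deg_top[frozenset(A)]
            if r < len(wkls) and lhs < bound - tol:
                return False                      # inequality violated
            if r == len(wkls) and abs(lhs - bound) > tol:
                return False                      # conservation equality violated
    return True
```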
Two workloads (conservation) (figure: in the (d1, d2) plane the achievable region is the segment cut out by (u1 d1 + u2 d2)/(u1 + u2) = constant = avg degradation together with d1 ≥ 1/(1 - u1) and d2 ≥ 1/(1 - u2); its endpoints are V(12) and V(21), with V([12]) between them)
Three workloads (figure: in (d1, d2, d3) space the achievable region lies in the plane (u1 d1 + u2 d2 + u3 d3)/(u1 + u2 + u3) = avg degradation, with vertices such as V(123) and V(213))
Three workload permutahedron (figure: a hexagon in the conservation plane; its vertices are the ordered states 123, 132, 312, 321, 231, 213, its edges the tied states [12]3, 1[23], [13]2, 3[12], [23]1, 2[13], and its center [123]; edges correspond to the conservation constraints for subsets such as {12} and {3}; the lines d1 = d2 and d2 = d3 are marked)
Four workload permutahedron 4! = 24 vertices (ordered states) 2^4 - 2 = 14 facets (proper subsets) (conservation constraints) 74 faces (states) Simplicial geometry and transportation polytopes, Trans. Amer. Math. Soc. 217 (1976) 138.
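A quick check of those counts (Python; it counts ordered set partitions by number of blocks, so the 74 here is the number of proper faces, i.e. everything except the whole polytope):

```python
from math import comb

def ordered_partitions_by_blocks(n):
    """Ordered set partitions of n workloads into exactly k blocks, k = 1..n.
    These index the faces of the n-workload permutahedron:
    k = n gives the vertices (ordered states), k = 2 the facets."""
    def surjections(n, k):                 # equals k! * Stirling2(n, k)
        return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return {k: surjections(n, k) for k in range(1, n + 1)}

t = ordered_partitions_by_blocks(4)
print(t[4], t[2], sum(t[k] for k in range(2, 5)))
# 24 vertices, 14 facets, 74 proper faces for four workloads
```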
Scheduling for Performance • Administrator specifies goals - e.g. degradations • Software determines priorities, trying to meet goals • Model maps goals to achievable degradations (figure: workload performance goals mapped onto the achievable region)
IBM OS390 Goal Mode • Administrator specifies workload degradation goals (figure: goal points in the (wkl 1 degradation, wkl 2 degradation) plane may be too generous or too ambitious relative to the achievable region)
Modeling Goal Mode • Find right point in permutahedron for given V • Linear programming solution (Coffman & Mitrani) • Algorithm modeling problem more closely: for each subset A of workloads: scale(A) = factor to force conservation true for A; for each workload w: scale(w) = min { scale(A) | scale(A) < 1 && w ∈ A }, V(w) *= scale(w) // inequalities now OK, scale back to p’hedron if necessary • O(2^n), fast enough, conjecture Θ(2^n) • Refinements for workload importance
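One possible reading of that scaling step as runnable code (Python; the exact scaling rule and the deg_top interface are my interpretation of the slide, not the authors' implementation):

```python
from itertools import combinations

def scale_to_achievable(goal, util, deg_top):
    """goal[w]: requested degradation, util[w]: utilization,
    deg_top[frozenset(A)]: avg degradation of A when A runs at top priority.
    Scale each workload's goal by its tightest subset factor, as the slide sketches."""
    V = dict(goal)
    wkls = list(V)
    scale = {}
    for r in range(1, len(wkls) + 1):
        for A in map(frozenset, combinations(wkls, r)):
            avg = sum(util[w] * V[w] for w in A) / sum(util[w] for w in A)
            scale[A] = deg_top[A] / avg    # factor making conservation exact for A
    for w in wkls:
        factors = [scale[A] for A in scale if scale[A] < 1 and w in A]
        if factors:
            V[w] *= min(factors)
    # slide: "inequalities now OK, scale back to p'hedron if necessary"
    return V
```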
SUN SRM (Solaris Resource Manager) • Administrator specifies workload CPU shares • Share f (0 < f < 1) means wkl guaranteed fraction f of CPU when it’s on run queue, can get more if no competition • Share = utilization only for closed workloads • Model: f1 = 1, f2 = f3 = … = 0 means wkl 1 has preemptive highest priority • Two wkls: V = f1 V(12) + f2 V(21)
Map Shares to Degradations • Three (n) workloads: weight(123) = (f1 · f2 · f3) / ((f1 + f2 + f3)(f2 + f3)(f3)); V = Σ over ordered states s of weight(s) V(s) • Theorem: weights sum to 1 • interesting identity generalizing adding fractions • prove by induction, or by coupon collecting • O(n!), Θ(n!), fast enough for n < 9 (12)
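A sketch of this shares-to-degradations map (Python; the shares in the example are made up) that also checks the weights-sum-to-1 theorem numerically:

```python
from itertools import permutations

def weight(order, f):
    """Weight of ordered state 'order' (a tuple of workload ids): product over
    the order of f[w] / (sum of shares of w and all later workloads)."""
    w, rest = 1.0, list(order)
    for i in order:
        w *= f[i] / sum(f[j] for j in rest)
        rest.remove(i)
    return w

def mix(weights, Vs):
    """V = sum over ordered states s of weight(s) * V(s)."""
    n = len(next(iter(Vs.values())))
    return [sum(weights[s] * Vs[s][i] for s in weights) for i in range(n)]

f = {1: 0.5, 2: 0.3, 3: 0.2}                       # made-up CPU shares
weights = {p: weight(p, f) for p in permutations(f)}
print(sum(weights.values()))                        # ~1.0: the weights sum to 1
```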
Three workload example
Map Shares to Degradations • Normalize: f1 + f2 + f3 = 1 (barycentric coordinates) (figure: the normalized share triangle, from f1 = 1 to f1 = 0, mapped onto the achievable region)
Experimental results for 3 workloads
Mapping a triangle to a hexagon (figure: the normalized share triangle, with corners such as f1 = 1 and f2 = 1 and edges f1 = 0 and f2 = 0, maps onto the three-workload permutahedron hexagon with vertices 123, 132, 312, 321, 231, 213, edges [12]3, 1[23], [13]2, 3[12], [23]1, 2[13], and center [123]; the wkl 1 high priority and wkl 1 low priority sides are marked)
Map Goals to Shares • For open workloads, specifying shares is as unintuitive as specifying priorities • Specify degradation goals • Map to achievable region • Reverse map from achievable region to shares: do { guess shares (bisection argument); compute degradations } until error is acceptably small • 10 * O(n!) is good to 1%
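A rough sketch of that reverse map (Python; it assumes a forward model shares_to_degs like the one sketched earlier, and the proportional update rule is my own guess, since the slide only says guess shares, compute degradations, repeat):

```python
def goals_to_shares(goals, shares_to_degs, iters=10):
    """Invert the shares -> degradations map by iterative refinement.
    goals: target degradations (assumed to lie in the achievable region).
    shares_to_degs: forward model mapping a list of shares to degradations."""
    n = len(goals)
    shares = [1.0 / n] * n                          # start from equal shares
    for _ in range(iters):
        degs = shares_to_degs(shares)
        # a workload whose degradation exceeds its goal needs a larger share
        shares = [s * d / g for s, d, g in zip(shares, degs, goals)]
        total = sum(shares)
        shares = [s / total for s in shares]        # renormalize so shares sum to 1
    return shares
```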
Map degradations to priorities • Real system works with priorities • pdist(w, p) = prob(wkl w at prio p) = fraction of time wkl w spends at priority p • (figure: pdist space (dim n(n-1)) maps onto the achievable region (dim n-1))
Pdists to degradations and back (figure: the hexagonal achievable region splits into 6 pieces, each combinatorially a square; the piece shown has corners 123, [12]3, 1[23], [123] and is bounded by the lines d1 = d2 and d2 = d3)
Pdists to degradations and back (figure: example pdist matrices at the corners of one piece: state 123 is the identity matrix (each workload always at its own priority); 1[23] has rows (1 0 0), (0 .5 .5), (0 .5 .5); [12]3 has rows (.5 .5 0), (.5 .5 0), (0 0 1); [123] has every entry .33)
Work in progress • Model mixed open and closed workloads • Prove algorithms correct • Solaris benchmark studies (under way) • OS390 validation - does data exist? • Write the paper ... • Build a product for IBM/Sun/BMC customers